Python web page list crawler - 菜鸟 - ITeye Blog


l62s

Blog category: python

#-*- encoding: utf-8 -*-
# Python 2 only: htmllib was removed in Python 3.

import formatter
import htmllib
import urllib


class GetLinks(htmllib.HTMLParser):
    """Collect anchors whose link text contains a given keyword."""

    def __init__(self, keyword):
        self.keyword = keyword
        self.links = {}  # maps link text -> href
        f = formatter.NullFormatter()
        htmllib.HTMLParser.__init__(self, f)

    def anchor_bgn(self, href, name, type):
        # Start buffering the anchor's text and remember its target.
        self.save_bgn()
        self.link = href

    def anchor_end(self):
        text = self.save_end().strip()
        if text.find(self.keyword) != -1:
            if self.link and text:
                self.links[text] = self.link


def findall(keyword, strfront, i, strlat):
    # Fetch page number i and print every matching link on it.
    fp = urllib.urlopen(strfront + str(i) + strlat)
    data = fp.read()
    fp.close()

    linkdemo = GetLinks(keyword)
    linkdemo.feed(data)
    linkdemo.close()

    for text, href in linkdemo.links.items():
        print text, "=>", href


strfront = 'http://readthedocs.org/docs/learn-python-the-hard-way-zh_cn-translation/en/latest/ex'
strlat = '.html'
search = 'ex'

# Crawl ex1.html through ex19.html.
for i in range(1, 20):
    findall(search, strfront, i, strlat)
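
The listing above depends on htmllib, which exists only in Python 2. As a rough sketch of how the same idea might look on Python 3, the version below uses html.parser from the standard library instead. The names LinkCollector, collect_links and find_all are made up for this illustration and do not come from the original post, and decoding the page as UTF-8 is an assumption about the target site.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect {link text: href} for anchors whose text contains `keyword`."""

    def __init__(self, keyword):
        super().__init__()
        self.keyword = keyword
        self.links = {}
        self._href = None  # href of the <a> currently open, if it has one
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # A missing or empty href is treated as "no link".
            self._href = dict(attrs).get("href") or None
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            if text and self.keyword in text:
                self.links[text] = self._href
            self._href = None


def collect_links(html, keyword):
    """Parse an HTML string and return the matching links as a dict."""
    parser = LinkCollector(keyword)
    parser.feed(html)
    parser.close()
    return parser.links


def find_all(keyword, url_prefix, i, url_suffix):
    """Fetch page number i and print every matching link on it."""
    with urlopen(url_prefix + str(i) + url_suffix) as resp:
        # Assumption: the page is UTF-8 encoded.
        html = resp.read().decode("utf-8", errors="replace")
    for text, href in collect_links(html, keyword).items():
        print(text, "=>", href)
```

Calling find_all(search, strfront, i, strlat) for i in range(1, 20) would mirror the loop in the original script. Keeping the parsing in collect_links, separate from the download, makes it possible to try the parser on a local HTML string without touching the network.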

2012-05-02 08:51