A Simple Python Application (Using urllib, re, and Other Libraries) - 专注 - ITeye Blog


kingoal

Blog category: Python

Tags: Python, network applications, HTML

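Below is a simple Python application that mainly uses Python's urllib and re libraries. It downloads an index page, saves it to a local file, and then downloads every relative link found on that page. It is very simple and can serve as a template for other network-related Python programs (tested with Python 3).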

#!/usr/bin/env python3
import sys
import re
import urllib.request
from urllib.parse import urlparse

def download(url, flag):
    """Download url and save it locally; if flag == 1, also fetch its relative links."""
    try:
        fd = urllib.request.urlopen(url)  # open the URL; returns a file-like response
        page = fd.read()  # raw bytes of the page
        fd.close()
        unicodePage = page.decode('gb2312')  # decode to str so Chinese characters survive
        tempURL = urlparse(url).geturl()  # normalized URL string
        fileName = tempURL.split('/')[-1]  # last path segment is the file name
        path = tempURL[:len(tempURL) - len(fileName)]  # everything before the file name
        if not fileName:
            fileName = 'index.html'  # URL ended with '/', so fall back to a default name
        print("Downloading: ", tempURL, ";Saving: ", fileName)

        # write the decoded text; UTF-8 can represent every gb2312 character
        with open(fileName, 'w', encoding='utf-8') as writefd:
            writefd.write(unicodePage)
    except Exception as e:
        # report the failure and stop, so the code below never sees undefined names
        print("Failed: ", url, e)
        return

    if flag == 1:  # flag == 1 means this page is the index page
        # first collect the link list,
        # then call download on each link to save its HTML to a file
        pattern = r'a href="([^"]+)"'
        linklist = re.findall(pattern, unicodePage)

        for item in linklist:
            if not item.startswith('http'):  # follow relative links only
                temp = path + item.strip()
                print("!!!!....", temp)
                download(temp, 0)

if len(sys.argv) < 2:  # the URL argument is required
    print("Usage:", sys.argv[0], "<url>")
    sys.exit(1)
url = sys.argv[1]  # get the URL address
download(url, 1)
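
The script builds each link's address by concatenating `path + item`, which works for links such as `a.html` but not for ones such as `../b.html` or `/c.html`. As a possible refinement, not part of the original script, the sketch below resolves links with `urllib.parse.urljoin` from the standard library. The function name `resolve_links`, the sample HTML, and the `example.com` URLs are made up for illustration.

```python
import re
from urllib.parse import urljoin

# Same pattern as in the script: capture the value inside href="..."
LINK_PATTERN = re.compile(r'a href="([^"]+)"')

def resolve_links(base_url, html):
    """Return absolute URLs for every relative link found in html.

    Hypothetical helper for illustration; base_url is the address of
    the page that html was downloaded from.
    """
    resolved = []
    for href in LINK_PATTERN.findall(html):
        href = href.strip()
        if href.startswith('http'):
            continue  # like the script, skip links that are already absolute
        # urljoin handles 'a.html', '../b.html' and '/c.html' uniformly
        resolved.append(urljoin(base_url, href))
    return resolved

# Made-up sample page, used only to exercise the helper
sample = (
    '<a href="a.html">A</a> '
    '<a href="../b.html">B</a> '
    '<a href="/c.html">C</a> '
    '<a href="http://other.example/d.html">D</a>'
)

for link in resolve_links("http://example.com/docs/index.html", sample):
    print(link)
```

In the script, the call `download(temp, 0)` could then receive each resolved URL instead of `path + item.strip()`.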


2009-03-17 20:56