A small problem came up recently while developing with Scrapy, so here are notes on the cause and the fix -- download and installation.
Scrapy is an open-source, single-machine Python crawler built on the Twisted framework. It bundles most of the tooling needed for web scraping, covering both the download side and the extraction side.
Installation environment:
CentOS 5.4, Python 2.7.3
Installation steps:
1. Download Python 2.7: http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tgz
[root@zxy-websgs ~]# wget http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tgz -P /opt
[root@zxy-websgs opt]# tar xvf Python-2.7.3.tgz
[root@zxy-websgs Python-2.7.3]# ./configure
[root@zxy-websgs Python-2.7.3]# make && make install
Verify the Python 2.7 installation:
[root@zxy-websgs Python-2.7.3]# python2.7
Python 2.7.3 (default, Feb 28 2013, 03:08:43)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-50)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()
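Besides starting the interactive prompt, a one-liner makes the check scriptable. A minimal sketch (on CentOS 5 the stock system python is 2.4, which is why confirming the version matters):

```python
import sys

# Print the version of the interpreter actually running; on CentOS 5 the
# system python is 2.4, so this confirms the freshly built 2.7 is in use.
print(sys.version.split()[0])
ok = sys.version_info[:2] >= (2, 7)
print("2.7 or newer: %s" % ok)
```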
2. Install setuptools: http://pypi.python.org/packages/source/s/setuptools/setuptools-0.6c11.tar.gz
[root@zxy-websgs ~]# wget http://pypi.python.org/packages/source/s/setuptools/setuptools-0.6c11.tar.gz -P /opt/
[root@zxy-websgs opt]# tar zxvf setuptools-0.6c11.tar.gz
[root@zxy-websgs setuptools-0.6c11]# python2.7 setup.py install
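To confirm that setuptools registered itself with the new interpreter (the remaining steps all rely on the easy_install command it provides), a quick guarded import can be used; this is just a sketch:

```python
# Guarded import: True means setuptools (and hence easy_install) is
# available to this interpreter, False means the install did not take.
try:
    import setuptools
    have_setuptools = True
except ImportError:
    have_setuptools = False
print("setuptools available: %s" % have_setuptools)
```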
3. Install Twisted
[root@zxy-websgs setuptools-0.6c11]# easy_install Twisted
......
Installed /usr/local/lib/python2.7/site-packages/Twisted-12.3.0-py2.7-linux-x86_64.egg
......
Installed /usr/local/lib/python2.7/site-packages/zope.interface-4.0.4-py2.7-linux-x86_64.egg
Twisted requires zope.interface; both can also be downloaded from the addresses below:
zope.interface:http://pypi.python.org/packages/source/z/zope.interface/zope.interface-4.0.1.tar.gz
twisted:http://twistedmatrix.com/Releases/Twisted/12.1/Twisted-12.1.0.tar.bz2
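As a quick smoke test of the Twisted install, the snippet below chains one callback on a Deferred, the primitive that Scrapy's asynchronous download machinery is built on. The import is guarded in case Twisted is absent, and the payload string is made up:

```python
# Fire a Deferred through a single callback; `results` captures the value,
# demonstrating the callback-chaining model Scrapy runs on.
try:
    from twisted.internet.defer import Deferred
    results = []
    d = Deferred()
    d.addCallback(lambda value: results.append(value))
    d.callback("downloaded page")
except ImportError:
    results = None
print(results)
```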
5. Install w3lib
[root@zxy-websgs setuptools-0.6c11]# easy_install -U w3lib
Searching for w3lib
Reading http://pypi.python.org/simple/w3lib/
Reading http://github.com/scrapy/w3lib
Best match: w3lib 1.2
Downloading http://pypi.python.org/packages/source/w/w3lib/w3lib-1.2.tar.gz#md5=f929d5973a9fda59587b09a72f185a9e
Processing w3lib-1.2.tar.gz
Running w3lib-1.2/setup.py -q bdist_egg --dist-dir /tmp/easy_install-wm_1BB/w3lib-1.2/egg-dist-tmp-2DQHY_
zip_safe flag not set; analyzing archive contents...
Adding w3lib 1.2 to easy-install.pth file
Installed /usr/local/lib/python2.7/site-packages/w3lib-1.2-py2.7.egg
Processing dependencies for w3lib
Finished processing dependencies for w3lib
w3lib:http://pypi.python.org/packages/source/w/w3lib/w3lib-1.2.tar.gz
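w3lib provides small URL and HTML helpers used throughout Scrapy. A hedged example (safe_url_string is one such helper; the import is guarded in case w3lib is missing, and the URL is illustrative):

```python
# safe_url_string percent-encodes unsafe characters so a URL can be
# handed to the downloader; a space is a simple illustrative case.
try:
    from w3lib.url import safe_url_string
    out = safe_url_string("http://example.com/a page")
except ImportError:
    out = None
print(out)
```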
6. Install libxml2, or install lxml via easy_install
[root@zxy-websgs lxml-3.1.0]# easy_install lxml
Verify the lxml installation:
[root@zxy-websgs lxml-3.1.0]# python2.7
Python 2.7.3 (default, Feb 28 2013, 03:08:43)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-50)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import lxml
>>> exit()
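Beyond a bare import, a short parse-and-query round trip shows lxml actually working; XPath queries like this are what Scrapy's selectors delegate to. The HTML fragment is made up, and the import is guarded:

```python
# Parse a small HTML fragment and run an XPath query over it.
try:
    from lxml import html
    doc = html.fromstring("<html><body><h1>hello</h1></body></html>")
    result = doc.xpath("//h1/text()")
except ImportError:
    result = None
print(result)
```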
Alternatively, install libxml2 itself. The official site recommends version 2.6.28 or later, but I couldn't find that release there, so I initially installed 2.6.9; running scrapy then failed with the following error:
Traceback (most recent call last):
File "/usr/local/bin/scrapy", line 5, in <module>
pkg_resources.run_script('Scrapy==0.14.4', 'scrapy')
File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 489, in run_script
File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 1207, in run_script
File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/EGG-INFO/scripts/scrapy", line 4, in <module>
execute()
File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 112, in execute
cmds = _get_commands_dict(inproject)
File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 37, in _get_commands_dict
cmds = _get_commands_from_module('scrapy.commands', inproject)
File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 30, in _get_commands_from_module
for cmd in _iter_command_classes(module):
File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/cmdline.py", line 21, in _iter_command_classes
for module in walk_modules(module_name):
File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/utils/misc.py", line 65, in walk_modules
submod = __import__(fullpath, {}, {}, [''])
File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/commands/shell.py", line 8, in <module>
from scrapy.shell import Shell
File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/shell.py", line 14, in <module>
from scrapy.selector import XPathSelector, XmlXPathSelector, HtmlXPathSelector
File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/selector/__init__.py", line 30, in <module>
from scrapy.selector.libxml2sel import *
File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/selector/libxml2sel.py", line 12, in <module>
from .factories import xmlDoc_from_html, xmlDoc_from_xml
File "/usr/local/lib/python2.7/site-packages/Scrapy-0.14.4-py2.7.egg/scrapy/selector/factories.py", line 14, in <module>
libxml2.HTML_PARSE_NOERROR + \
AttributeError: 'module' object has no attribute 'HTML_PARSE_RECOVER'
Upgrading to version 2.6.21 solved the problem.
libxml2 2.6.21: ftp://xmlsoft.org/libxml2/python/libxml2-python-2.6.21.tar.gz
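To tell in advance whether an installed libxml2 binding is new enough, you can probe for the parser flags named in the traceback. A sketch (the import is guarded, since the bindings may not be installed at all):

```python
# Scrapy's libxml2 selector backend reads these module-level flags;
# bindings as old as 2.6.9 lack them, triggering the AttributeError above.
try:
    import libxml2
except ImportError:
    libxml2 = None

required = ["HTML_PARSE_RECOVER", "HTML_PARSE_NOERROR"]
if libxml2 is None:
    missing = required  # bindings not installed at all
else:
    missing = [name for name in required if not hasattr(libxml2, name)]
print("missing parser flags: %s" % missing)
```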
7. Install pyOpenSSL (optional; mainly so that Scrapy can support HTTPS)
easy_install pyOpenSSL pulled in version 0.13, which failed to install, so I downloaded version 0.11 manually and installed that instead.
[root@zxy-websgs opt]# wget http://launchpadlibrarian.net/58498441/pyOpenSSL-0.11.tar.gz -P /opt
[root@zxy-websgs opt]# tar zxvf pyOpenSSL-0.11.tar.gz
[root@zxy-websgs pyOpenSSL-0.11]# python2.7 setup.py install
pyOpenSSL:http://launchpadlibrarian.net/58498441/pyOpenSSL-0.11.tar.gz
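To check which pyOpenSSL build the interpreter actually picked up, note that the package installs under the module name OpenSSL, not pyOpenSSL. A guarded sketch:

```python
# pyOpenSSL installs as the OpenSSL package; its version string tells
# whether the 0.11 build above (rather than the failed 0.13) is in use.
try:
    import OpenSSL
    version = OpenSSL.__version__
except ImportError:
    version = None
print("pyOpenSSL version: %s" % version)
```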
8. Install Scrapy
[root@zxy-websgs pyOpenSSL-0.11]# easy_install -U Scrapy
Verify the installation:
[root@zxy-websgs pyOpenSSL-0.11]# scrapy
Scrapy 0.16.4 - no active project
Usage:
scrapy <command> [options] [args]
Available commands:
fetch Fetch a URL using the Scrapy downloader
runspider Run a self-contained spider (without creating a project)
settings Get settings values
shell Interactive scraping console
startproject Create new project
version Print Scrapy version
view Open URL in browser, as seen by Scrapy
[ more ] More commands available when run from project directory
Use "scrapy <command> -h" to see more info about a command
scrapy:http://pypi.python.org/packages/source/S/Scrapy/Scrapy-0.14.4.tar.gz
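With the install verified, a minimal spider shows the moving parts. This sketch targets the old BaseSpider API used by Scrapy of this era (scrapy.Spider came later); the spider name, domain, and URL are illustrative, and the import is guarded so the sketch stays readable without Scrapy present:

```python
# Minimal spider for old-style Scrapy: subclass BaseSpider, list the
# start URLs, and implement parse(), which receives each downloaded response.
try:
    from scrapy.spider import BaseSpider
except ImportError:
    BaseSpider = object  # lets the sketch be read without Scrapy installed

class ExampleSpider(BaseSpider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ["http://example.com/"]

    def parse(self, response):
        # Log the URL of every page fetched from start_urls.
        self.log("Visited %s" % response.url)
```

Dropped into a project created with scrapy startproject, it would run as scrapy crawl example.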
Summary:
If pyOpenSSL fails to install on its own, download pyOpenSSL 0.11 and install it first, then run easy_install -U Scrapy to complete the whole installation.
To close the article, here's a programmer joke: The Train
A young programmer and a project manager board a train travelling through the mountains. They find the car nearly full, with only two adjacent empty seats, facing an old grandmother and her young, pretty granddaughter. The two sit down. The programmer and the girl exchange rather flirtatious glances. Then the train enters a tunnel and the car goes pitch black. The sound of a kiss is heard, followed by the loud crack of a slap. The train leaves the tunnel, and none of the four says a word.
The grandmother mutters, "How rude of that young man -- but I'm glad my granddaughter slapped him."
The project manager thinks, "I never expected the programmer to be so bold as to kiss that girl. Too bad she slapped the wrong person and hit me instead."
The pretty girl thinks, "How nice that he kissed me. I hope my grandmother didn't hurt him."
The programmer sits there smiling: "Life is good. How often do you get the chance to kiss a pretty girl and slap your project manager at the same time?"