Writing a crawler with CasperJS

hhg08

Blog category: Crawlers

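I mainly use CasperJS to scrape data from websites, especially data generated by JavaScript that mechanize cannot fetch.
(1) CasperJS is built on top of PhantomJS, so PhantomJS must be installed before CasperJS.
    Both can be downloaded from their sites: http://casperjs.org/ and http://phantomjs.org/
(2) CasperJS runs a script step by step. Of the three methods start, run and then, the first two are required.
(3) evaluate is the bridge into the page's document. Its callback runs inside the page, so functions you define in your own script cannot be used there. It is typically used for element lookups by selector, for example __utils__.findAll("div.rank-s ul.rank1-body li")
(4) Many methods are used together with then. Wrapping a call in then means it runs only after the previous step has finished.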
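(5) capture takes a screenshot of the current page and saves it to a file, which is sometimes very useful:
    // Called inside a step; work_path and p are variables defined elsewhere in the script.
    this.wait(20000, function init() {
        this.capture(work_path + "/" + p + "child" + ".jpg");
    });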
(6) Reading and saving cookies
    Reading:
    var fs = require("fs");
    phantom.cookiesEnabled = true;
    // Load the saved cookies so that page requests send them
    function updateCookie() {
        var cookiefile = work_path + "/cookie.txt";
        var cookies = JSON.parse(fs.read(cookiefile));
        for (var i = 0; i < cookies.length; i++) {
            phantom.addCookie(cookies[i]);
        }
    }
    Saving:
    phantom.cookiesEnabled = true;
    casper.then(function () {
        this.wait(20000, function wait_submit() {
            // Take a screenshot to verify that the login succeeded
            this.capture(work_path + "/is_login.jpg");
            // The third argument of fs.write is the open mode ("w" overwrites the file)
            fs.write(work_path + "/cookie.txt", JSON.stringify(phantom.cookies), "w");
        });
    });
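The cookie handling in item (6) can be wrapped in two small helpers. The following is a minimal sketch, not the author's code: saveCookies, loadCookies and their parameter names are hypothetical. It assumes fsModule is PhantomJS's fs module (or any object with the same write, read and exists methods) and cookieJar is the global phantom object.

```javascript
// Hypothetical helpers (names are illustrative, not part of CasperJS).

// Serialize every cookie held by the cookie jar to a JSON file.
function saveCookies(fsModule, cookieJar, path) {
    // "w" opens the file for writing and overwrites previous content.
    fsModule.write(path, JSON.stringify(cookieJar.cookies), "w");
}

// Read cookies back from the JSON file and add them to the cookie jar.
// Returns how many cookies the jar accepted (0 if the file is missing).
function loadCookies(fsModule, cookieJar, path) {
    if (!fsModule.exists(path)) {
        return 0;
    }
    var cookies = JSON.parse(fsModule.read(path));
    var added = 0;
    cookies.forEach(function (cookie) {
        // phantom.addCookie returns true when the cookie was accepted.
        if (cookieJar.addCookie(cookie)) {
            added += 1;
        }
    });
    return added;
}
```

In a real script this would presumably be called as loadCookies(require("fs"), phantom, work_path + "/cookie.txt") before casper.start, and saveCookies with the same arguments inside a step that runs after the login. Passing the fs module and the cookie jar as arguments keeps the helpers free of globals, so they can be exercised without PhantomJS.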

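What I used in my own project:
1) var casper = require("casper").create();
2) // userAgent is a method, so call it rather than assigning to it
   casper.userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36");
3) casper.start("http://xxxx.com/");
4) casper.then(function () {
       // Simulate keystrokes
       this.sendKeys('#username', 'youname', {reset: true});
       this.sendKeys('#password', 'youpassword', {keepFocus: true});
   });
5) casper.wait(800, function init() {
       this.capture(work_path + "/yzm.png");
       console.log("Captcha screenshot saved, waiting 30s");
   });
   Looking at the screenshot lets you check whether you are already logged in.
6) Log output goes to the console and can also be redirected to a file with >. The handler below forwards console messages from the page to the script's console:
   casper.on('remote.message', function (message) {
       console.log(message);
   });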
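7) Everything else is a series of then steps that collect the page data one step at a time:
   var infoss = casper.evaluate(function () {
       // infos must be declared inside the callback, which runs in the page
       var infos = [];
       var info = {
           // name stands for a value read from the page
           "name": name
       };
       infos.push(info);
       return infos;
   });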
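Putting the steps together, the overall shape of such a script might look like the sketch below. It is only an illustration under stated assumptions: registerSteps is a hypothetical helper, and the URL and the selector "a" are placeholders, not the site or selectors used in the project above.

```javascript
// Sketch only: registerSteps, the URL and the selector are illustrative.
function registerSteps(casper, url, selector) {
    // start() opens the first page and must be called before any step.
    casper.start(url);

    // then() queues a step that runs only after the previous one finishes.
    casper.then(function () {
        // evaluate() runs its callback inside the page. Script variables are
        // not visible there, so the selector is passed in as an argument.
        var texts = this.evaluate(function (sel) {
            var nodes = document.querySelectorAll(sel);
            return Array.prototype.map.call(nodes, function (node) {
                return node.textContent.trim();
            });
        }, selector);
        this.echo(JSON.stringify(texts));
    });

    // run() executes the queued steps; without it nothing happens.
    casper.run();
}

// Execute only when launched by the casperjs binary, where the global
// `phantom` object is defined.
if (typeof phantom !== "undefined") {
    registerSteps(require("casper").create(), "http://casperjs.org/", "a");
}
```

Passing the casper instance in as a parameter is a design choice made for this sketch: it keeps the step definitions separate from the instance configuration (user agent, cookies, event handlers) shown earlier.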

2017-02-17 11:43