What do you all use to build crawlers?

pyzheng

Blog category: Python & crawlers

See the replies here: http://www.v2ex.com/t/62657

42 replies | until 2013-03-18 23:08:21
     1
for4   200 days ago   ♥ 3
Python
+requests
+lxml
+celery
     2
xdeng   200 days ago
@for4 -.-! That's a lot of things to learn.
     3
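for4   200 days ago
@xdeng
The first one is the programming language.
The other three are libraries you may need.

This is what I consider the simplest, easiest-to-learn combination for writing a crawler.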
     4
xieren58   200 days ago
Node + jquery
     5
liuxurong   200 days ago
I use requests + pyquery.

Also,
@for4 what is celery usually used for?
     6
xdeng   200 days ago
@xieren58
@liuxurong Is everyone on this site a web developer?
     7
shinwood   200 days ago   ♥ 2
I've tried Python + Scrapy and it felt pretty good.

http://scrapy.org/
     8
greatghoul   200 days ago
@shinwood It really is a blast to use.
     9
colincat   200 days ago via Android
java
     10
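for4   200 days ago   ♥ 1
@liuxurong
I split the crawler's functionality into small tasks and put them on a task queue as needed. This keeps the crawler's complexity down, and the queue also makes it more robust, for example by retrying failed tasks.
Also, once you use celery your crawler becomes distributed and can easily be deployed across multiple machines.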
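
The idea in this reply (small tasks on a queue, with failed tasks retried) can be sketched roughly as follows. This is a minimal single-process illustration using only the Python standard library, not Celery itself; `run_tasks`, `fake_fetch`, `fake_parse`, the `PAGES` table and the retry limit of 3 are all hypothetical names and values chosen for the example.

```python
from collections import deque

MAX_RETRIES = 3  # hypothetical limit on attempts per URL


def run_tasks(seed_urls, fetch, parse, max_retries=MAX_RETRIES):
    """Work through a queue of fetch tasks, re-queueing failures.

    fetch(url) returns a page body or raises; parse(body) returns links.
    Returns (results, failed): fetched bodies by URL, and URLs given up on.
    """
    queue = deque((url, 0) for url in seed_urls)  # (url, failed attempts so far)
    seen = set(seed_urls)
    results, failed = {}, []

    while queue:
        url, attempts = queue.popleft()
        try:
            body = fetch(url)
        except Exception:
            if attempts + 1 < max_retries:
                queue.append((url, attempts + 1))  # put it back for a later retry
            else:
                failed.append(url)  # out of attempts, give up
            continue
        results[url] = body
        for link in parse(body):
            if link not in seen:  # schedule each URL only once
                seen.add(link)
                queue.append((link, 0))
    return results, failed


# Stand-in "site": each page body is just a space-separated list of links.
PAGES = {"/a": "/b /c", "/b": "", "/c": "/a /dead"}
calls = {}


def fake_fetch(url):
    calls[url] = calls.get(url, 0) + 1
    if url == "/b" and calls[url] == 1:
        raise IOError("temporary failure")  # fails once, then succeeds
    if url not in PAGES:
        raise IOError("not found")  # fails every time
    return PAGES[url]


def fake_parse(body):
    return body.split()


results, failed = run_tasks(["/a"], fake_fetch, fake_parse)
print(sorted(results), failed)
```

In a real deployment the in-memory deque would be replaced by a broker-backed queue so that workers on several machines can pull tasks, which is the distributed setup the reply describes.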
     11
wingoo   200 days ago
scrapy
     12
twm   200 days ago
JAVA PHP
     13
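dulao5   200 days ago
PHP + curl_multi_*

But I should try nodejs from now on: concurrency is easy to implement, and it has more of an advantage when parsing the JS in a page.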
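
For comparison with the curl_multi and nodejs approaches mentioned here, concurrent fetching can be sketched in Python with a thread pool. This is only an illustration of the concurrency idea under stated assumptions: `fetch_all` and `slow_fake_fetch` are hypothetical names, the URLs are placeholders, and the fetch function is a stub that sleeps instead of touching the network.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all(urls, fetch, max_workers=8):
    """Call fetch(url) for every URL concurrently.

    Returns (pages, errors): bodies by URL, and raised exceptions by URL.
    """
    pages, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):  # handle results as they finish
            url = futures[future]
            try:
                pages[url] = future.result()
            except Exception as exc:
                errors[url] = exc  # one failure does not stop the others
    return pages, errors


def slow_fake_fetch(url):
    """Stub that imitates network latency; a real crawler would do HTTP here."""
    time.sleep(0.05)
    if url.endswith("/bad"):
        raise IOError("simulated failure")
    return "<html>%s</html>" % url


urls = ["http://example.invalid/%d" % i for i in range(5)]
urls.append("http://example.invalid/bad")
pages, errors = fetch_all(urls, slow_fake_fetch)
print(len(pages), list(errors))
```

Threads suit this workload because the time is spent waiting on I/O; an event-loop design such as the one nodejs uses reaches the same goal without threads.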
     14
xjay   200 days ago
scrapy
No explanation needed.
     15
PrideChung   200 days ago
ruby+nokogiri
http://nokogiri.org/
     16
amxku   199 days ago
Python
+curl
+celery
     17
1up   199 days ago
http://www.gregreda.com/2013/03/03/web-scraping-101-with-python/ Web Scraping 101 with Python
     18
cloverstd   199 days ago
Python: urllib, urllib2, re
     19
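sobigfish   199 days ago
I wrote one in nodejs for fun a few days ago, but I don't know how to deploy it on a PaaS that only offers web services -,-
cheerio is really nice to use; the syntax is exactly jQuery's.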

require('http');require('cheerio');require('iconv').Iconv;require('mongodb');
     20
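chuck911   199 days ago
Some people insist on using celery even just to write a crawler...

Scrapy feels good because it is built on the event-driven Twisted. I used to love Scrapy too, but after switching to Node for crawlers it felt like trading a heavy homemade cannon for a shoulder-fired rocket launcher.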
     21
atom   199 days ago
@twm
@colincat
As a fellow Java developer, could you recommend which library you use?
     22
sohoer   199 days ago
@atom
JAVA?
HttpURLConnection + Regex = Spider
     23
Linxing   199 days ago via Android
python beautifulsoup urllib, for scraping articles
     24
liuxurong   199 days ago
@for4 Thanks. Is there any Chinese-language documentation for celery?
     25
crazybubble   199 days ago   ♥ 1
@atom I don't recommend using regex for HTML parsing; I recommend jsoup.
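
A small illustration of why a regex is a poor fit for HTML parsing: the sketch below extracts links from the same snippet with a naive regex and with a real parser. It uses Python's standard-library `html.parser` as a stand-in for jsoup (which is a Java library); `LinkExtractor`, `links_with_parser`, `links_with_regex` and `SAMPLE` are hypothetical names made up for this example.

```python
import re
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag the parser encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_with_parser(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def links_with_regex(html):
    # Deliberately naive: only matches a lowercase tag whose first
    # attribute is a double-quoted href.
    return re.findall(r'<a href="([^"]*)"', html)


SAMPLE = """
<a href="/one">one</a>
<a class="nav" href='/two'>two</a>
<A HREF="/three">three</A>
<!-- <a href="/commented-out">old</a> -->
"""

print(links_with_parser(SAMPLE))  # ['/one', '/two', '/three']
print(links_with_regex(SAMPLE))   # ['/one', '/commented-out']
```

The regex misses the single-quoted and uppercase links and picks up one that is commented out; a more elaborate pattern can patch each case, but the parser handles them all without special treatment.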
     26
colincat   199 days ago via iPhone
@sohoer htmlparse httpclient
     27
workaholic   199 days ago   ♥ 1
php+snoopy
     28
akalanala   199 days ago
@crazybubble Seconded.
     29
binux   199 days ago
python + tornado AsyncHTTPClient + PyQuery
     30
sonicwu   199 days ago
Java
+ jsoup

Python
+ Beautiful Soup
+ urllib
+ lxml
     31
dingyaguang117   199 days ago
Python
+ Beautiful Soup
+ lxml
+ Scrapy
     32
atom   199 days ago
@crazybubble
It's a great library; I fell for it as soon as I saw http://try.jsoup.org/
     33
zoran   198 days ago
For Java, you could try this: https://github.com/zhuoran/crawler4j
     34
yangxin0   198 days ago
I've seen someone do it in C.
     35
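Xrong   198 days ago
I'd appreciate recommendations for PHP; I'm planning to write my graduation project with it. Everyone says PHP isn't very convenient for this, but I'm still hoping that anyone who has written one can share source code for reference. Online resources are welcome too.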
     36
zdwalter   197 days ago
phantomjs, casperjs
     37
zhouquanbest   196 days ago
python + pyquery is a great tool.
If you know jquery, you can write it.
     38
nojt7Zm   194 days ago
php
     39
kingwkb   194 days ago
I used python before and have now switched to ruby.

http://s.yanghao.org/
     40
gameending   194 days ago
I've written crawlers in both python and java. python is very concise, and I think java is pretty decent too.
     41
lbj96347   194 days ago
node.js or python. :-)
     42
kdepp   82 days ago
node + cheerio

2013-09-28 20:30