Posted: 2010-01-12
Last edited: 2010-01-12
I use Simple-RSS to fetch the feeds.

About Simple-RSS: Simple RSS is a simple, flexible, extensible, and liberal RSS and Atom reader for Ruby.

    gem install simple-rss

The code:

    require 'rubygems'
    require 'simple-rss' # gem install simple-rss
    require 'open-uri'
    require 'net/http'

    ##
    # Constants
    FS_LEN = 80

    # Fetch the RSS feeds of the hottest blogs; kept simple, just a demo

    ##
    # Get the blog names
    def get_blogs_names
      blog_url = "http://www.iteye.com/blogs"
      blog_html = Net::HTTP.get(URI.parse(blog_url))
      # Capture the subdomain (4+ letters) of every blog link
      reg = /http:\/\/([a-zA-Z]{4,})\.iteye\.com/
      blog_html.scan(reg).flatten
    rescue StandardError => ex
      puts ex
      puts "blog exit"
      []
    end

    ##
    # Fetch and print each blog's RSS
    def get_blog_rss(names)
      return if names.nil? || names.empty?
      names.uniq.each do |name|
        puts name + "*" * FS_LEN
        begin
          # URI.open needs Ruby 2.5+; on older versions use Kernel#open
          rss = SimpleRSS.parse URI.open("http://#{name}.iteye.com/rss")
        rescue StandardError => ex
          puts ex
          puts "rss exit"
          next # rss is nil here, so skip this blog
        end
        puts "-" * FS_LEN
        puts rss.channel.title
        puts rss.channel.link
        rss.items.each do |item|
          puts "-" * 50
          puts "title:" + item.title.to_s
          puts "description:" + item.description.to_s
          puts "link:" + item.link.to_s
          puts "pubDate:" + item.pubDate.to_s
          puts "guid:" + item.guid.to_s
          puts "category:" + item.category.to_s
        end
      end
    end

    ##
    # RUN
    names = get_blogs_names
    get_blog_rss(names)
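The name-extraction step can be tried offline before hitting the site. Below is a minimal sketch of that step alone. The sample HTML, the blog names `alicedev` and `bobcoder`, and the helper name `extract_blog_names` are made up for illustration, and the real page markup may differ. The character class here is `a-zA-Z`, since the range `A-z` would also admit characters such as `_` and `[`.

```ruby
# Hypothetical sample of the blogs page; the real markup may differ.
SAMPLE_HTML = <<~HTML
  <a href="http://alicedev.iteye.com/blog/1">post</a>
  <a href="http://www.iteye.com/news">news</a>
  <a href="http://alicedev.iteye.com/blog/2">post</a>
  <a href="http://bobcoder.iteye.com/">blog</a>
HTML

# Capture subdomains of 4+ letters; "www" is too short to match.
BLOG_HOST = /http:\/\/([a-zA-Z]{4,})\.iteye\.com/

# Hypothetical helper: return the unique blog names found in the HTML.
def extract_blog_names(html)
  html.scan(BLOG_HOST).flatten.uniq
end

p extract_blog_names(SAMPLE_HTML)
# prints ["alicedev", "bobcoder"]
```

`scan` with one capture group returns an array of one-element arrays, which is why `flatten` is needed before `uniq`.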
Posted: 2010-01-20
Nice. But isn't javaeye said to have anti-scraping measures? Careful you don't get your IP banned, ha.
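On the IP-ban concern, one common courtesy is to pause between requests. This is only a sketch under assumptions: the helper name `each_throttled` and the delay values are hypothetical, and nothing here reflects what the site actually enforces.

```ruby
# Hypothetical helper: yield each name, sleeping between iterations.
def each_throttled(names, delay_seconds = 2.0)
  names.each_with_index do |name, index|
    sleep(delay_seconds) if index > 0 # no pause before the first request
    yield name
  end
end

# Usage: print the feed URL that would be fetched for each made-up name.
each_throttled(%w[alicedev bobcoder], 0.1) do |name|
  puts "http://#{name}.iteye.com/rss"
end
```

In the script above, the block body would be the place for the `SimpleRSS.parse` call, so each feed request is spaced out by the chosen delay.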