Hpricot

wutao8818

A Fast, Enjoyable HTML Parser for Ruby

Project page: http://code.whytheluckystiff.net/hpricot/ (moved to http://wiki.github.com/why/hpricot)

#!ruby
 doc.search("//p[@class='posted']")
 #=> #<Hpricot:Elements[{p ...}, {p ...}]>

#!ruby
 (doc/"p.posted")
 #=> #<Hpricot:Elements[{p ...}, {p ...}]>

Both forms find all p elements whose class is posted: the first uses XPath, the second a CSS selector.

#!ruby
 doc.at("body")['onload']

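at returns the first matching element. The most common reason to use it is to work with the element directly, for example to read or write its attributes.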

#!ruby
 (doc/"#elementID").inner_html
 #=> "..<b>contents</b>.."

Get the element's inner content.

#!ruby
 (doc/"#elementID").first.inner_html
 #=> "..<b>contents</b>.."

If the search matches several elements, use .first to access the first one.

#!ruby
 (doc/"#elementID").to_html
 #=> "<div id='elementID'>...</div>"

Get the whole element as HTML, including its own enclosing tag.

#!ruby
 (doc/"p/a/img").each do |img|
   puts img.attributes['class']
 end

Every search returns a collection of elements, which you can iterate over.

#!ruby
 doc.search("div.entryPermalink").at("a")

Returns the first a element found inside the div elements with class entryPermalink.

#!ruby
 # find all paragraphs.
 elements = doc.search("/html/body//p")
 # continue the search by finding any images within those paragraphs.
 (elements/"img")
 #=> #<Hpricot::Elements[{img ...}, {img ...}]>

A search result can itself be searched further.

#!ruby
 # the xpath version
 (doc/"/html/body//p//img")
 # the css version
 (doc/"html > body > p img")
 # ..or symbols work, too!
 (doc/:html/:body/:p/:img)


#!ruby
 (doc/"span.entryPermalink").each do |span|
   span.attributes['class'] = 'newLinks'
 end
 puts doc

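Editing in a loop: every matching span gets a new class attribute, then the modified document is printed.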

#!ruby
 doc.search("div.entryPermalink").search("a") do |link|
   pp link
 end.search("span") do |span|
   pp span
 end

Search for the a elements inside all div elements with class entryPermalink and pretty-print each one.
Then search the same div elements for span elements and pretty-print those as well.

Using CSS Selectors

#!ruby
 doc = Hpricot(open("qwantz.html"))
 (doc/'div img[@src^="http://www.qwantz.com/comics/"]')
   #=> Elements[...]

Find the img elements inside div elements whose src attribute starts with http://www.qwantz.com/comics/.

#!ruby
 puts doc.search('#menu').inner_html

Find the element with id menu and print its inner HTML.

#!ruby
 puts doc.search("span").length

Find all span elements; length gives the number of matches.

#!ruby
 puts (doc/:span).length

Find the span elements using a symbol; the result is the same as above.

Selecting by Class

#!ruby
 doc.search(".entryTitle").each do |title|
   puts title.inner_html
 end

Find all elements with class entryTitle and print their contents.

#!ruby
 (doc/"div.entryTitle").remove

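XPath can do much the same with //div[@class='entryTitle'], but the CSS form is more robust here.
When an element carries several classes, as in the div below, that XPath expression compares the whole attribute value and no longer matches.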

<div class="entryTitle dark">

#!ruby
 (doc/"div[@class~='entryTitle']").remove
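The ~= operator follows the CSS attribute-selector rule: it matches when the attribute value, split on whitespace, contains the given word, which is why it still finds the multi-class div above. The plain-Ruby sketch below only illustrates that rule next to an exact comparison. The helper names exact_match? and word_match? are made up for this illustration and are not part of Hpricot.

```ruby
# Illustration only: these helpers mimic the selector rules in plain Ruby.
# They are hypothetical names, not Hpricot methods.

# [@class='word'] compares the whole attribute value.
def exact_match?(attr_value, word)
  attr_value == word
end

# [@class~='word'] looks for the word among the whitespace-separated tokens.
def word_match?(attr_value, word)
  attr_value.split.include?(word)
end

klass = "entryTitle dark"
puts exact_match?(klass, "entryTitle")  # false
puts word_match?(klass, "entryTitle")   # true
puts word_match?(klass, "entry")        # false
```

The last line shows that ~= is a whole-word test, so a partial class name does not match.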

#!ruby
 (doc/"div.entryPermalink a").empty

Find all a elements under div elements with class entryPermalink and empty them.

The same thing written by stacking CSS selectors:

#!ruby
 (doc/"div.entryPermalink"/"a").empty

#!ruby
 doc.search("div.entryPermalink > a").
   prepend("<b>found you on the left</b>").
   append("<b>found you on the right</b>")

To match only direct children, use the > combinator.

#!ruby
 doc.search("input[@checked]")

Find input elements that have a checked attribute.

#!ruby
 doc.at("a[@name='part_two']")

Find the first a element whose name attribute is part_two.

#!ruby
 doc.search("*[@onclick*='document.location']").each do |ele|
   ele.remove_attribute('onclick')
 end

Find every element whose onclick attribute contains document.location and remove that attribute.
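The attribute operators differ only in how the value is compared: ^= (used earlier with the img src) tests the start of the value, while *= tests for a substring anywhere in it. As a rough plain-Ruby analogy, not Hpricot's implementation, they behave like String#start_with? and String#include?. The sample onclick and src values below are invented for the example.

```ruby
# Plain-Ruby analogy for two attribute operators (sample values are made up).
onclick = "document.location='/archive'; return false"
src     = "http://www.qwantz.com/comics/comic2-100.png"

# [@onclick*='document.location'] : the value contains the text anywhere
puts onclick.include?("document.location")             # true

# [@src^='http://www.qwantz.com/comics/'] : the value starts with the text
puts src.start_with?("http://www.qwantz.com/comics/")  # true

# A prefix test is stricter than a substring test
puts onclick.start_with?("location")                   # false
```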

#!ruby
 doc.search("p:not(.blue)")

The negation selector: the set of p elements whose class is not blue.

Searching Hpricot With XPath

#!ruby
 require 'open-uri'  # provides URI#read
 doc = Hpricot(URI.parse("http://we-make-money-not-art.com/").read)
 (doc/'//div/img')
   #=> Elements[...]

XPath support currently has some limitations.
A search expression that begins with / is treated as XPath.

2008-08-06 10:19