Two small NekoHTML examples

gcgmh


Blog categories:

Parser_html

HTML

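// 1. Extract the keywords and description from a page's <meta> tags
import java.io.FileInputStream;
import java.io.InputStream;

import org.cyberneko.html.parsers.DOMParser;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// The class name is arbitrary; the original post did not show it.
public class MetaExample {
	public static void main(String[] argv) throws Exception {
		DOMParser parser = new DOMParser();
		// Fallback encoding, used when the page does not declare its own charset
		parser.setProperty(
				"http://cyberneko.org/html/properties/default-encoding",
				"gb2312");
		// Pass a byte stream rather than a Reader: a Reader hands the parser
		// already-decoded characters, so the encoding property would have no effect.
		try (InputStream in = new FileInputStream("d:/163.html")) {
			parser.parse(new InputSource(in));
		}

		Document doc = parser.getDocument();
		// NekoHTML upper-cases element names by default, hence "META"
		NodeList list = doc.getElementsByTagName("META");
		for (int i = 0, n = list.getLength(); i < n; i++) {
			Element e = (Element) list.item(i);
			String name = e.getAttribute("name");
			if (name.equalsIgnoreCase("keywords")) {
				System.out.println("keywords: " + e.getAttribute("content"));
			}
			if (name.equalsIgnoreCase("description")) {
				System.out.println("description: " + e.getAttribute("content"));
			}
		}
	}
}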
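The same lookup can be wrapped in a small helper that returns every named meta tag at once. This is only a sketch under stated assumptions: the class `MetaTags` and its methods are hypothetical names, not part of NekoHTML, and it works on any `org.w3c.dom.Document`, such as the one returned by `parser.getDocument()`. The sample document is built by hand so the sketch runs with the JDK alone.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of NekoHTML.
class MetaTags {

    /** Collects name -> content for every META element that has a name attribute. */
    static Map<String, String> collect(Document doc) {
        Map<String, String> result = new LinkedHashMap<>();
        // "*" matches every element, so the tag name can be compared ignoring case.
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (!e.getTagName().equalsIgnoreCase("meta")) {
                continue;
            }
            // getAttribute returns "" when the attribute is missing.
            String name = e.getAttribute("name").trim().toLowerCase(Locale.ROOT);
            if (!name.isEmpty()) {
                // Keep the first occurrence if a name appears twice.
                result.putIfAbsent(name, e.getAttribute("content"));
            }
        }
        return result;
    }

    /** Builds a tiny hand-made DOM (an assumption for the demo, not a parsed page). */
    static Document buildSample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element head = doc.createElement("HEAD");
            doc.appendChild(head);
            head.appendChild(meta(doc, "name", "Keywords", "news, sports"));
            head.appendChild(meta(doc, "name", "description", "A sample page"));
            head.appendChild(meta(doc, "http-equiv", "Content-Type", "text/html"));
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    private static Element meta(Document doc, String keyAttr, String key, String content) {
        Element m = doc.createElement("META");
        m.setAttribute(keyAttr, key);
        m.setAttribute("content", content);
        return m;
    }

    public static void main(String[] args) {
        System.out.println(collect(buildSample()));
    }
}
```

Lower-casing the names means callers can ask for `keywords` regardless of how the page spelled it, and the `http-equiv` entry is skipped because it has no `name` attribute.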

==========================================================================
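// 2. Use DOMFragmentParser to extract all the text on a page. Nothing is filtered out,
//    so text from unwanted tags is printed too; filter those tags beforehand if needed.
import org.apache.html.dom.HTMLDocumentImpl;
import org.cyberneko.html.parsers.DOMFragmentParser;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.w3c.dom.html.HTMLDocument;

// The class name is arbitrary; the original post did not show it.
public class FragmentTextExample {

	public static void main(String[] argv) throws Exception {
		DOMFragmentParser parser = new DOMFragmentParser();
		HTMLDocument document = new HTMLDocumentImpl();
		DocumentFragment fragment = document.createDocumentFragment();
		// The first argument is a system identifier (URL); the parser fetches the page itself
		parser.parse("http://sports.sina.com.cn/f1/2009-09-21/20104599271.shtml", fragment);
		print(fragment, "");
	}

	/** Prints every text node under the given node, indented by its depth in the tree. */
	public static void print(Node node, String indent) {
		// Uncomment to inspect the tree structure:
		// System.out.println(indent + node.getClass().getName());
		// System.out.println(node.getNodeType());

		if (node.getNodeType() == Node.TEXT_NODE) {
			System.out.println(indent + node.getNodeValue());
		}
		Node child = node.getFirstChild();
		while (child != null) {
			print(child, indent + " ");
			child = child.getNextSibling();
		}
	}

}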
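The filtering mentioned in the comment on the second example could look roughly like the sketch below. It is an illustration under assumptions, not the original author's code: the class `TextExtractor`, the set of skipped tags, and the hand-built sample DOM are all hypothetical. It only relies on `org.w3c.dom`, so it should work on the `DocumentFragment` filled in by `DOMFragmentParser` as well.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, not part of NekoHTML.
class TextExtractor {

    // Assumption: text inside these elements is noise for body-text extraction.
    private static final Set<String> SKIPPED =
            new HashSet<>(Arrays.asList("SCRIPT", "STYLE", "NOSCRIPT"));

    /** Returns the text under node, one non-blank text node per line. */
    static String extractText(Node node) {
        StringBuilder out = new StringBuilder();
        collect(node, out);
        return out.toString();
    }

    private static void collect(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && SKIPPED.contains(node.getNodeName().toUpperCase(Locale.ROOT))) {
            return; // drop the whole subtree
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                if (out.length() > 0) {
                    out.append('\n');
                }
                out.append(text);
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, out);
        }
    }

    /** Builds a tiny hand-made DOM (an assumption for the demo, not a parsed page). */
    static Document buildSample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element body = doc.createElement("BODY");
            doc.appendChild(body);
            body.appendChild(element(doc, "H1", "Title"));
            body.appendChild(element(doc, "SCRIPT", "var x = 1;"));
            body.appendChild(element(doc, "P", "  Hello world  "));
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    private static Element element(Document doc, String tag, String text) {
        Element e = doc.createElement(tag);
        e.setTextContent(text);
        return e;
    }

    public static void main(String[] args) {
        System.out.println(extractText(buildSample()));
    }
}
```

With the parser from the second example, the call would be `TextExtractor.extractText(fragment)` in place of `print(fragment, "")`. Skipping whitespace-only text nodes also removes most of the blank lines the unfiltered version prints.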


2009-09-22 10:10