Parsing an HTML document with HtmlParser (example)

hao861002

Blog category: Java software

HTML Java .net

import org.htmlparser.Node;
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.tags.LinkTag;
import org.htmlparser.util.NodeList;
import com.yao.http.HttpRequester;
import com.yao.http.HttpRespons;

/**
 * Parses an HTML document in Java with HtmlParser and lists every
 * hyperlink (<a> tag) found in the document.
 *
 * @author YYmmiinngg
 */
public class Test {
    public static void main(String[] args) {
        try {
            /*
             * First, use the HttpRequester and HttpRespons classes to fetch the
             * body of an HTTP response (the HTML document). Both classes ship
             * with the htmlloader library, which can be downloaded from
             * http://download.csdn.net/source/321516; alternatively, copy their
             * source from my article "Sending an HTTP request in Java and
             * reading the HTTP response: examples and applications".
             * HtmlParser itself can be downloaded from
             * http://download.csdn.net/source/321507.
             */
            HttpRequester request = new HttpRequester();
            HttpRespons hr = request.sendGet("http://www.baidu.com");
            Parser parser = Parser.createParser(hr.getContent(), hr
                    .getContentEncoding());
            try {
                // Use a filter to select the <a> tags
                NodeList nodeList = parser
                        .extractAllNodesThatMatch(new NodeFilter() {
                            // Implement accept() to decide which nodes are kept
                            public boolean accept(Node node) {
                                // Keep only link (<a>) tags
                                return node instanceof LinkTag;
                            }
                        });
                // Print each link's text and its target URL
                for (int i = 0; i < nodeList.size(); i++) {
                    LinkTag n = (LinkTag) nodeList.elementAt(i);
                    System.out.print(n.getStringText() + " ==>> ");
                    System.out.println(n.extractLink());
                }
            } catch (Exception e) {
                e.printStackTrace();
            }

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
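
The HttpRequester and HttpRespons classes used above come from a separate download and are not part of the JDK. If that library is not at hand, the fetching step could be approximated with the standard library alone. The following is only a minimal sketch under that assumption: the class name SimpleHttpFetcher and its methods readAll, charsetOf and get are hypothetical names introduced here, not part of htmlloader or HtmlParser, and the 5-second timeouts and the UTF-8 fallback are arbitrary choices.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical JDK-only stand-in for the HttpRequester/HttpRespons pair. */
class SimpleHttpFetcher {

    /** Reads the whole stream, closes it, and decodes the bytes with the given charset. */
    static String readAll(InputStream in, Charset charset) {
        try (InputStream input = in) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = input.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Extracts the charset from a Content-Type header value; falls back to UTF-8 (an assumption). */
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).replace("\"", "").trim();
                    try {
                        if (Charset.isSupported(name)) {
                            return Charset.forName(name);
                        }
                    } catch (IllegalArgumentException ignored) {
                        // Malformed charset name: use the fallback below.
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    /** Sends a GET request and returns the response body as text. */
    static String get(String url) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(5000); // arbitrary timeout for this sketch
            conn.setReadTimeout(5000);
            try {
                return readAll(conn.getInputStream(), charsetOf(conn.getContentType()));
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Fetches and prints a page only when a URL is passed on the command line.
        if (args.length > 0) {
            System.out.println(get(args[0]));
        }
    }
}
```

With this in place, the string returned by SimpleHttpFetcher.get("http://www.baidu.com") could be passed to Parser.createParser in place of hr.getContent(), together with the name of the charset that was used to decode it.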

2008-12-08 16:43