jsouptest

lan13217


Blog category: jsoup

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;


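public class T {

	public static void main(String[] args) throws IOException {
		// Fetch the listing page. Other Connection options can be chained
		// before get() or post(), for example:
		//   .data("query", "Java").userAgent("Mozilla").cookie("auth", "token").timeout(3000)
		Document doc = Jsoup.connect("http://www.xxxx.net/new/new_1.htm").get();

		// The article body sits between two marker comments.
		// (?s) lets '.' match newlines, (?i) ignores case. Compiled once, outside the loops.
		Pattern pattern = Pattern.compile("(?si)<!--NEWSZW_HZH_BEGIN-->(.+?)<!--NEWSZW_HZH_END-->");

		Elements containers = doc.select("div.main_l_l");
		for (Element container : containers) {
			Elements links = container.select("div.list_body a");
			for (Element link : links) {
				// absUrl resolves a relative href against the page URL;
				// it returns "" when no usable URL can be built.
				String href = link.absUrl("href");
				if (href.isEmpty()) {
					continue;
				}
				System.out.println("Start:" + href);

				Document art = Jsoup.connect(href).get();
				// first() returns null when nothing matches, unlike get(0), which throws.
				Element titleElement = art.select("h1").first();
				Element contentElement = art.select("#art_content").first();
				if (titleElement == null || contentElement == null) {
					System.out.println("Skipped (no h1 or #art_content): " + href);
					continue;
				}
				String title = titleElement.html();
				String content = contentElement.html();

				// Keep only the text between the markers. If the markers occur
				// more than once, the last pair wins; without markers the full HTML is kept.
				Matcher m = pattern.matcher(content);
				while (m.find()) {
					content = m.group(1);
				}
				System.out.println("*************title********************");
				System.out.println(title);
				System.out.println("*************content********************");
				System.out.println(content);
			}
		}
	}
}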
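
The marker-based extraction step can be tried on its own, without any network access. The following is a minimal sketch using only the standard library. The class name `MarkerExtractor`, the method `extractBetweenMarkers`, and the sample HTML are all hypothetical and chosen for illustration. Unlike the loop above, which keeps the last match, this version returns the first marker pair and trims surrounding whitespace.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original listing.
public class MarkerExtractor {

    // DOTALL lets '.' match newlines; the reluctant (.+?) stops at the first END marker.
    private static final Pattern BODY = Pattern.compile(
            "<!--NEWSZW_HZH_BEGIN-->(.+?)<!--NEWSZW_HZH_END-->",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    /**
     * Returns the trimmed text between the first marker pair,
     * or the input unchanged when no marker pair is present.
     */
    public static String extractBetweenMarkers(String html) {
        Matcher m = BODY.matcher(html);
        return m.find() ? m.group(1).trim() : html;
    }

    public static void main(String[] args) {
        // Made-up sample input standing in for the #art_content HTML.
        String sample = "<div>ad</div>\n"
                + "<!--NEWSZW_HZH_BEGIN-->\n"
                + "<p>Body text</p>\n"
                + "<!--NEWSZW_HZH_END-->\n"
                + "<div>footer</div>";
        System.out.println(extractBetweenMarkers(sample)); // prints <p>Body text</p>
    }
}
```

Returning the input unchanged when the markers are missing mirrors the original behavior, where `content` simply keeps the full `#art_content` HTML.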


2014-02-19 14:11