Operating on HTML with Java

zmx955


html java regular expressions

import java.util.regex.Matcher;
import java.util.regex.Pattern;


public class MatcherTest{

public String Html() 
{ 
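		// HttpRequest is the author's own helper class (not shown in this post);
		// Request(url) presumably returns the page source as a String.
		HttpRequest hq = new HttpRequest();

       String htmlStr = hq.Request("http://www.funshion.com"); // string containing HTML tags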
         
        // System.out.println(htmlStr.length());
     
       //   String reg = "(?<=http\\://[a-zA-Z0-9]{0,100}[.]{0,1})[^.\\s]*?\\.(com|cn|net|org|biz|info|cc|tv)";
       //   Pattern p = Pattern.compile(reg,Pattern.CASE_INSENSITIVE);
       //   Matcher m = p.matcher(htmlStr);
         // System.out.println(m);
        
       
    
           String regEx_script = "<[\\s]*?script[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?script[\\s]*?>"; 
           // regex for <script>...</script> blocks
           Pattern p = Pattern.compile(regEx_script, Pattern.CASE_INSENSITIVE);
           Matcher m = p.matcher(htmlStr);
           htmlStr = m.replaceAll(" "); 
           
           String regEx_style = "<[\\s]*?style[^>]*?>[\\s\\S]*?<[\\s]*?\\/[\\s]*?style[\\s]*?>"; 
           // regex for <style>...</style> blocks
           p =Pattern.compile(regEx_style, Pattern.CASE_INSENSITIVE);
           m = p.matcher(htmlStr);
           htmlStr = m.replaceAll(" ");
           
           String regEx_html = "<[^>]+>"; 
           // regex for any remaining HTML tag
           p =Pattern.compile(regEx_html, Pattern.CASE_INSENSITIVE);
           m = p.matcher(htmlStr);
           htmlStr = m.replaceAll(" ");
           
           String regEx_houhtml = "/[^>]+>"; 
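           // regex for leftover tag fragments such as "/div>" (it also matches any other text that runs from a "/" to the next ">")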
           p =Pattern.compile(regEx_houhtml, Pattern.CASE_INSENSITIVE);
           m = p.matcher(htmlStr);
           htmlStr = m.replaceAll(" ");
           
           String regEx_spe="\\&[^;]+;";
           // regex for character entities such as &nbsp;
           p =Pattern.compile(regEx_spe, Pattern.CASE_INSENSITIVE);
           m = p.matcher(htmlStr);
           htmlStr = m.replaceAll(" ");
     
           String regEx_blank=" +";
           // regex for runs of spaces (the replacement below is "", so spaces are removed, not collapsed)
           p =Pattern.compile(regEx_blank, Pattern.CASE_INSENSITIVE);
           m = p.matcher(htmlStr);
           htmlStr = m.replaceAll("");
          
           String regEx_table="\t+";
           // regex for runs of tab characters
           p =Pattern.compile(regEx_table, Pattern.CASE_INSENSITIVE);
           m = p.matcher(htmlStr);
           htmlStr = m.replaceAll(" ");
           
           String regEx_enter="\n+";
           // regex for runs of newlines
           p =Pattern.compile(regEx_enter, Pattern.CASE_INSENSITIVE);
           m = p.matcher(htmlStr);
           htmlStr = m.replaceAll("");
           System.out.println(htmlStr);
          
      
           return htmlStr;
}

public static void main(String[] args) {
	MatcherTest hl = new MatcherTest();
	hl.Html();
}
}
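
The listing above depends on an HttpRequest class that the post does not show. As a rough, self-contained sketch of the same idea, the class below applies similar regex steps to a string and uses only the JDK. Everything named here (HtmlTextExtractor, stripHtml, fetch) is hypothetical and not the author's code. It also departs from the original in three places: whitespace is collapsed to a single space instead of being deleted, the entity pattern refuses to span whitespace, and the "/[^>]+>" step is left out. The fetch helper assumes the page is UTF-8, which may not hold for every site.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class HtmlTextExtractor {

    // Patterns are compiled once and reused across calls.
    private static final Pattern SCRIPT = Pattern.compile(
            "<\\s*script[^>]*>[\\s\\S]*?<\\s*/\\s*script\\s*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern STYLE = Pattern.compile(
            "<\\s*style[^>]*>[\\s\\S]*?<\\s*/\\s*style\\s*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    // Assumption: an entity never contains whitespace, so "a & b; c" is left alone.
    private static final Pattern ENTITY = Pattern.compile("&[^;\\s]+;");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Removes script/style blocks, tags and entities, then collapses whitespace. */
    public static String stripHtml(String html) {
        String text = SCRIPT.matcher(html).replaceAll(" ");
        text = STYLE.matcher(text).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        text = ENTITY.matcher(text).replaceAll(" ");
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    /** Hypothetical stand-in for the missing HttpRequest helper; assumes UTF-8 content. */
    public static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
            return sb.toString();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // A fixed sample keeps the demo independent of the network.
        String sample = "<html><head><style>p { color: red; }</style>"
                + "<script>var x = 1 < 2;</script></head>"
                + "<body><p>Hello&nbsp;<b>world</b></p></body></html>";
        System.out.println(stripHtml(sample)); // prints "Hello world"
    }
}
```

Regex-based stripping like this is only a heuristic: it can misfire on comments, CDATA sections, or a ">" inside an attribute value, so a real HTML parser is the safer choice when accuracy matters.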


2012-07-16 13:18