Removing HTML tags in Java

wandejun1012

Blog category: java

Regular expressions are enough for this. See the code below:

import java.util.regex.Pattern;

public class HTMLSpirit {
    // Matches a <script> element together with its body
    private static final Pattern P_SCRIPT =
            Pattern.compile("<script[^>]*?>[\\s\\S]*?</script>", Pattern.CASE_INSENSITIVE);
    // Matches a <style> element together with its body
    private static final Pattern P_STYLE =
            Pattern.compile("<style[^>]*?>[\\s\\S]*?</style>", Pattern.CASE_INSENSITIVE);
    // Matches any remaining HTML tag
    private static final Pattern P_HTML = Pattern.compile("<[^>]+>");

    public static String delHTMLTag(String htmlStr) {
        htmlStr = P_SCRIPT.matcher(htmlStr).replaceAll(""); // strip script blocks first
        htmlStr = P_STYLE.matcher(htmlStr).replaceAll("");  // then strip style blocks
        htmlStr = P_HTML.matcher(htmlStr).replaceAll("");   // finally strip the remaining tags

        return htmlStr.trim(); // return the plain text
    }
}
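The order of the three passes matters: if only the generic tag pattern were applied, the bodies of script and style elements would survive as plain text. The sketch below is a condensed restatement of the same idea for demonstration. The class name `StripDemo` and the methods `strip` and `tagsOnly` are hypothetical names that are not part of the original code.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: shows why script/style blocks are removed before the generic tag pass.
final class StripDemo {
    private static final Pattern SCRIPT =
            Pattern.compile("<script[^>]*?>[\\s\\S]*?</script>", Pattern.CASE_INSENSITIVE);
    private static final Pattern STYLE =
            Pattern.compile("<style[^>]*?>[\\s\\S]*?</style>", Pattern.CASE_INSENSITIVE);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Three passes: script blocks, style blocks, then every remaining tag.
    static String strip(String html) {
        String s = SCRIPT.matcher(html).replaceAll("");
        s = STYLE.matcher(s).replaceAll("");
        return TAG.matcher(s).replaceAll("").trim();
    }

    // Single pass for comparison: removes the tags but keeps script and style bodies.
    static String tagsOnly(String html) {
        return TAG.matcher(html).replaceAll("").trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><STYLE>p{color:red}</STYLE>"
                + "<script type=\"text/javascript\">var a = 1;</script></head>"
                + "<body><p>Hello, <b>world</b>!</p></body></html>";
        System.out.println(strip(html));    // Hello, world!
        System.out.println(tagsOnly(html)); // p{color:red}var a = 1;Hello, world!
    }
}
```

Regular expressions of this kind work for simple, well-formed fragments. For arbitrary real-world pages a proper HTML parser is usually the safer choice.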


Removing HTML markup from page content in Java
Another way to strip the HTML markup from page content in Java:

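/**
 * Strips HTML markup from a string.<br>
 * The input must be well-formed: every '<' needs a matching '>',
 * otherwise ordinary text between an unmatched pair is deleted along with the tags.
 *
 * @param content the text to clean
 * @return the text with the markup removed
 */
public static String stripHtml(String content) {
    // Replace opening <p> tags, with or without attributes, by a line break
    content = content.replaceAll("(?i)<p(\\s[^>]*)?>", "\r\n");
    // Replace <br> and <br/> by a line break
    content = content.replaceAll("(?i)<br\\s*/?>", "\r\n");
    // Remove everything else between < and >, including tags that span several lines
    content = content.replaceAll("(?s)<.*?>", "");
    // Decode HTML entities; HTMLDecoder is an external helper, so this step stays commented out
    // content = HTMLDecoder.decode(content);
    return content;
}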
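Two points in the snippet above can be made concrete. First, the Javadoc warns that unpaired angle brackets cause ordinary text to be deleted. Second, the commented-out `HTMLDecoder.decode(content)` call suggests that entities such as `&lt;` should be turned back into characters after the tags are gone. The sketch below illustrates both. `EntityDecodeSketch`, `removeTags`, and `decodeBasicEntities` are hypothetical names. The decoder covers only a handful of entities and maps `&nbsp;` to an ordinary space as a simplification; it is not a substitute for whatever `HTMLDecoder` did in the original project.

```java
// Hypothetical sketch: the pairing caveat plus a minimal entity decoder.
final class EntityDecodeSketch {

    // Same generic pattern as in stripHtml: everything from '<' up to the next '>'.
    static String removeTags(String text) {
        return text.replaceAll("(?s)<.*?>", "");
    }

    // Decodes a few common entities. "&amp;" is handled last so that
    // an escaped entity such as "&amp;lt;" becomes "&lt;" and not "<".
    static String decodeBasicEntities(String text) {
        return text.replace("&nbsp;", " ")   // simplification: real &nbsp; is U+00A0
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&#39;", "'")
                   .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        // The comparison operators are mistaken for one tag and removed with the text between them.
        System.out.println(removeTags("if a < b and c > d then"));

        // Properly escaped input survives tag removal and can be decoded afterwards.
        String escaped = "<p>if a &lt; b and c &gt; d then</p>";
        System.out.println(decodeBasicEntities(removeTags(escaped)));
    }
}
```

Decoding has to happen after tag removal. Decoding first would reintroduce raw `<` and `>` characters that the tag pattern would then treat as markup.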

Reference URL: http://xiejincheng.blog.51cto.com/2307724/722731

2014-08-22 10:39