Literature Related to Web Page Content Extraction

touchinsert


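Web page content extraction means extracting large blocks of content from a web page, for example the body text of a news article. Some related literature is listed below.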

[1] Ziegler, C. & Skubacz, M. Content Extraction from News Pages Using Particle Swarm Optimization on Linguistic and Structural Features. WI '07: Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, IEEE Computer Society, 2007, 242-249.
[2] Reis, D. C.; Golgher, P. B.; Silva, A. S. & Laender, A. F. Automatic web news extraction using tree edit distance. WWW '04: Proceedings of the 13th international conference on World Wide Web, ACM, 2004, 502-511.
[3] Gupta, S.; Kaiser, G.; Neistadt, D. & Grimm, P. DOM-based content extraction of HTML documents. WWW '03: Proceedings of the 12th international conference on World Wide Web, ACM, 2003, 207-214.
[4] Gupta, S.; Kaiser, G. E.; Grimm, P.; Chiang, M. F. & Starren, J. Automating Content Extraction of HTML Documents. World Wide Web, Kluwer Academic Publishers, 2005, 8, 179-224.
[5] Gupta, S.; Kaiser, G. & Stolfo, S. Extracting context to improve accuracy for HTML content extraction. WWW '05: Special interest tracks and posters of the 14th international conference on World Wide Web, ACM, 2005, 1114-1115.
[6] Gupta, S.; Becker, H.; Kaiser, G. & Stolfo, S. Verifying genre-based clustering approach to content extraction. WWW '06: Proceedings of the 15th international conference on World Wide Web, ACM, 2006, 875-876.
[7] Gibson, J.; Wellner, B. & Lubar, S. Adaptive web-page content identification. WIDM '07: Proceedings of the 9th annual ACM international workshop on Web information and data management, ACM, 2007, 105-112.
[8] Lin, S. & Ho, J. Discovering informative content blocks from Web documents. KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2002, 588-593.
[9] Debnath, S.; Mitra, P. & Giles, C. L. Automatic extraction of informative blocks from webpages. SAC '05: Proceedings of the 2005 ACM symposium on Applied computing, ACM, 2005, 1722-1726.
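[10] Wang Qi, Tang Shiwei, Yang Dongqing, Wang Tengjiao. 基于DOM的网页主题信息自动提取 (Automatic extraction of web page topic information based on DOM). Journal of Computer Research and Development, 2004, 41(10).
[11] Hu Guoping, Zhang Wei, Wang Renhua. 基于双层决策的新闻网页正文精确抽取 (Accurate extraction of news web page body text based on two-layer decision). Journal of Chinese Information Processing, 2006, 20(06).
[12] Sun Chengjie, Guan Yi. 基于统计的网页正文信息抽取方法的研究 (Research on a statistics-based method for extracting web page body text). Journal of Chinese Information Processing, 2004, 18(05).
[13] Huang Wenbei, Yang Jing, Gu Junzhong. 基于分块的网页正文信息提取算法研究 (Research on a block-based algorithm for extracting web page body text). Journal of Computer Applications, 2007, 27.
[14] Zhao Xinxin, Suo Hongguang, Liu Yushu. 基于标记窗的网页正文信息提取方法 (A tag-window-based method for extracting web page body text). Application Research of Computers, 2007, 24(03).
[15] Zhao Wen, Tang Jianxiong, Gao Qingfeng. 基于统计的中文网页正文抽取的研究 (Research on statistics-based body text extraction from Chinese web pages). Computer Knowledge and Technology, 2008, 1(1).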