
Python Object-Oriented Basics: Regular Expressions

Author: edisonlz (Beijing)	Posted: 2010-03-02

This post covers the following:
1. Regular expressions (re)
2. The URL library (urllib)
3. Debugging
4. Object-oriented encapsulation

    #encoding=utf-8
    '''
    Learning Python regular expressions
    url : http://docs.python.org/library/re.html
    parse html url : http://www.boddie.org.uk/python/HTML.html
    author : liuzheng
    '''
    import re
    import urllib

    # Parse the JavaEye blog channel
    class ParseHTML:
        '''
        Parse HTML for information.
        Parses a JavaEye page.
        '''
        def __init__(self, url):
            self.url = url

        # Fetch the page and hand the HTML to the matcher
        def parse(self):
            sock = urllib.urlopen(self.url)
            html = sock.read()
            self.__puts(html)

        # Print the data matched in the HTML
        def __puts(self, html):
            b = re.compile(r"<a href='([\w./:\\]+?)'[\s]title=([^<>]+?)[\s]target=([^<>]+?)>([^<>]+?)</a>", re.I)
            m = re.findall(b, html)
            # Is there an encoding problem here? I am not sure; can anyone help explain?
            print m

    if __name__ == '__main__':
        url = "http://www.iteye.com/blogs"
        p = ParseHTML(url)
        p.parse()

        if __debug__:
            print "debugging is %s" % __debug__

        print "regular" + "* " * 30

        # match (variable renamed from str so the builtin is not shadowed)
        phone = "800-820-8800"
        m = re.match(r"(\d{3})-(\d{3})-(\d{4})", phone)
        print "result : ", m.groups()

        # split
        print "split : %s" % re.split(r'\W', 'Words, words, words.')

        # findall
        text = "He was carefully disguised but captured quickly by police."
        print "findall :%s" % re.findall(r"\w+ly", text)

        # sub
        text = "hello world!"
        print "sub:%s" % re.sub(r"\s+", "--", text)

Notice: ITeye articles are copyrighted by their authors and protected by law. Reproduction without the author's written permission is prohibited.
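On the encoding question in the post: one plausible reading is that `print m` shows escaped byte sequences instead of Chinese characters, because Python 2 prints a list using the repr of each item. The post's code is Python 2 (print statements, `urllib.urlopen`); the following is only a minimal Python 3 sketch of the same ideas, where decoding the page bytes before matching avoids that symptom. The names `LINK_RE`, `extract_links`, and `sample` are hypothetical and not from the post, the pattern is a looser variant of the original that accepts either quote style, and the page encoding is assumed to be UTF-8. The sample is a hard-coded byte string, so nothing is fetched from the network.

```python
import re

# Hypothetical pattern (not the post's original): accepts a single- or
# double-quoted href value and skips any other attributes on the tag.
LINK_RE = re.compile(
    r"""<a\s+[^<>]*?href=(['"])(?P<href>[^'"<>]+)\1[^<>]*>(?P<text>[^<>]+)</a>""",
    re.I,
)


def extract_links(raw, encoding="utf-8"):
    """Decode raw page bytes first, then return (href, text) pairs."""
    # Assumption: the page is UTF-8; undecodable bytes are replaced, not fatal.
    html = raw.decode(encoding, errors="replace")
    return [(m.group("href"), m.group("text").strip())
            for m in LINK_RE.finditer(html)]


# Hard-coded stand-in for the bytes a real page fetch would return.
sample = ("<a href='http://www.iteye.com/blogs' title='blogs' "
          "target='_blank'>博客频道</a>").encode("utf-8")
links = extract_links(sample)
print(links)  # [('http://www.iteye.com/blogs', '博客频道')]

# The four re calls from the post, written for Python 3.
groups = re.match(r"(\d{3})-(\d{3})-(\d{4})", "800-820-8800").groups()
print("result :", groups)  # ('800', '820', '8800')

parts = re.split(r"\W", "Words, words, words.")
print("split :", parts)  # ['Words', '', 'words', '', 'words', '']

adverbs = re.findall(
    r"\w+ly", "He was carefully disguised but captured quickly by police.")
print("findall :", adverbs)  # ['carefully', 'quickly']

joined = re.sub(r"\s+", "--", "hello world!")
print("sub :", joined)  # hello--world!
```

Note that splitting on a single `\W` yields empty strings wherever two non-word characters are adjacent (here a comma followed by a space); splitting on `\W+` instead would collapse them.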