simohayha

Peter Norvig's spelling corrector, written in Python

The article is here:
http://www.norvig.com/spell-correct.html
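import re, string, collections

def words(text):
    # Lowercase the text and extract every run of letters a-z as a word
    return re.findall('[a-z]+', text.lower())

def train(features):
    # Count word frequencies; unseen words default to 1 (add-one smoothing)
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model

# Build the word-frequency model from a large English text
with open('Documents/holmes.txt') as f:
    NWORDS = train(words(f.read()))

def edits1(word):
    # Every string exactly one edit away from word
    n = len(word)
    return set([word[0:i]+word[i+1:] for i in range(n)] +                                      # deletion
               [word[0:i]+word[i+1]+word[i]+word[i+2:] for i in range(n-1)] +                  # transposition
               [word[0:i]+c+word[i+1:] for i in range(n) for c in string.ascii_lowercase] +    # alteration
               [word[0:i]+c+word[i:] for i in range(n+1) for c in string.ascii_lowercase])     # insertion

def known_edits2(word):
    # Known words reachable with two edits (only known ones are kept, to limit memory)
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

def known(candidates):
    # Keep only the candidates that occur in the training corpus
    return set(w for w in candidates if w in NWORDS)

def correct(word):
    # Prefer the word itself, then known words 1 edit away, then 2 edits away,
    # and finally the word unchanged; among the candidates pick the most frequent.
    # NWORDS.get() reads without inserting unknown words into the defaultdict.
    return max(known([word]) or known(edits1(word)) or known_edits2(word) or [word],
               key=lambda w: NWORDS.get(w, 1))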
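To see how candidates are chosen without needing holmes.txt, here is a minimal self-contained sketch of the same corrector (Python 3) trained on a tiny inline corpus. The `CORPUS` string is a hypothetical stand-in for a real training text, chosen only so the results are easy to check by hand: a known word wins outright, otherwise known words one edit away, then two edits away, with ties broken by corpus frequency.

```python
import collections
import re
import string

def words(text):
    """Lowercase the text and return every run of letters a-z."""
    return re.findall('[a-z]+', text.lower())

def train(features):
    """Count word frequencies; every unseen word defaults to 1."""
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model

# Hypothetical toy corpus standing in for holmes.txt (assumption for this demo)
CORPUS = "the cat sat on the mat the cat ate the rat spelling spelling correct"
NWORDS = train(words(CORPUS))

def edits1(word):
    """All strings one deletion, transposition, alteration or insertion away."""
    n = len(word)
    letters = string.ascii_lowercase
    return set([word[:i] + word[i+1:] for i in range(n)] +
               [word[:i] + word[i+1] + word[i] + word[i+2:] for i in range(n - 1)] +
               [word[:i] + c + word[i+1:] for i in range(n) for c in letters] +
               [word[:i] + c + word[i:] for i in range(n + 1) for c in letters])

def known(candidates):
    """The subset of candidates that occur in the training corpus."""
    return set(w for w in candidates if w in NWORDS)

def known_edits2(word):
    """Known words reachable by applying edits1 twice."""
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

def correct(word):
    # Known word, else known at distance 1, else distance 2, else unchanged
    candidates = (known([word]) or known(edits1(word))
                  or known_edits2(word) or [word])
    # .get() avoids adding unknown words to the defaultdict as a side effect
    return max(candidates, key=lambda w: NWORDS.get(w, 1))

print(correct("hat"))      # → cat  (cat, sat, mat, rat are all 1 edit away; cat is most frequent)
print(correct("speling"))  # → spelling  (one insertion)
print(correct("spelin"))   # → spelling  (two insertions)
print(correct("xyzzy"))    # → xyzzy  (nothing known within 2 edits, returned unchanged)
```

Because `train` starts every count at 1, "cat" (seen twice) scores 3 while "sat", "mat" and "rat" score 2, which is why "hat" is corrected to "cat".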


A master is a master: these few lines of code were written on a plane.
Comments
#1 bruce.fine 2007-07-30
Impressive, truly impressive. Still working my way through it.
