sillycat

DiveIntoPython (16)


English book URL:
http://diveintopython.org/toc/index.html

Chapter 17.Dynamic functions

17.1.Diving in
The rules for making singular nouns into plural nouns are varied and complex.

If you grew up in an English-speaking country or learned English in a formal school setting, you're probably familiar with the basic rules:

1.If a word ends in S, X, or Z, add ES. “Bass” becomes “basses”, “fax” becomes “faxes”, and “waltz” becomes “waltzes”.

2.If a word ends in a noisy H, add ES; if it ends in a silent H, just add S. What's a noisy H? One that gets combined with other letters to make a sound that you can hear. So “coach” becomes “coaches” and “rash” becomes “rashes”, because you can hear the CH and SH sounds when you say them. But “cheetah” becomes “cheetahs”, because the H is silent.

3.If a word ends in Y that sounds like I, change the Y to IES; if the Y is combined with a vowel to sound like something else, just add S. So “vacancy” becomes “vacancies”, but “day” becomes “days”.

4.If all else fails, just add S and hope for the best.

5.There are a lot of exceptions. “Man” becomes “men” and “woman” becomes “women”, but “human” becomes “humans”. “Mouse” becomes “mice” and “louse” becomes “lice”, but “house” becomes “houses”. “Knife” becomes “knives” and “wife” becomes “wives”, but “lowlife” becomes “lowlifes”. And don't even get me started on words that are their own plural, like “sheep”, “deer”, and “haiku”.

17.2.plural.py, stage 1
So you're looking at words, which at least in English are strings of characters. And you have rules that say you need to find different combinations of characters, and then do different things to them. This sounds like a job for regular expressions.

example 17.1.plural1.py
import re

def plural(noun):                           
    if re.search('[sxz]$', noun):            
        return re.sub('$', 'es', noun)       
    elif re.search('[^aeioudgkprt]h$', noun):
        return re.sub('$', 'es', noun)      
    elif re.search('[^aeiou]y$', noun):     
        return re.sub('y$', 'ies', noun)    
    else:                                   
        return noun + 's'

The square brackets mean “match exactly one of these characters”. So [sxz] means “s, or x, or z”, but only one of them. The $ should be familiar; it matches the end of string. So you're checking to see if noun ends with s, x, or z.
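A quick way to check this reading of the pattern is to try it on a few words. The sketch below assumes Python 3 (the book's own examples target Python 2), and the helper name ends_in_sxz is made up for illustration.

```python
import re

def ends_in_sxz(noun):
    # [sxz] matches exactly one of s, x, or z; $ anchors it at the end.
    return re.search('[sxz]$', noun) is not None

print(ends_in_sxz('fax'))        # True
print(ends_in_sxz('waltz'))      # True
print(ends_in_sxz('saxophone'))  # False: s and x appear, but not at the end
```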

example 17.2.Introducing re.sub
>>> import re
>>> re.search('[abc]','Mark')
<_sre.SRE_Match object at 0x0142F870>
>>> re.sub('[abc]','o','Mark')
'Mork'
>>> re.sub('[abc]','o','rock')
'rook'
>>> re.sub('[abc]','o','caps')
'oops'

You might think this would turn caps into oaps, but it doesn't. re.sub replaces all of the matches, not just the first one. So this regular expression turns caps into oops, because both the c and the a get turned into o.
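If you do want only the first match replaced, re.sub accepts an optional count argument. A small sketch, assuming Python 3:

```python
import re

# Default: every match is replaced.
print(re.sub('[abc]', 'o', 'caps'))           # oops

# count=1 limits the substitution to the first match only.
print(re.sub('[abc]', 'o', 'caps', count=1))  # oaps
```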

example 17.3.Back to plural1.py
import re

def plural(noun):                           
    if re.search('[sxz]$', noun):           
        return re.sub('$', 'es', noun)       
    elif re.search('[^aeioudgkprt]h$', noun):
        return re.sub('$', 'es', noun)       
    elif re.search('[^aeiou]y$', noun):     
        return re.sub('y$', 'ies', noun)    
    else:                                   
        return noun + 's'

Look closely, this is another new variation. The ^ as the first character inside the square brackets means something special: negation. [^abc] means “any single character except a, b, or c”. So [^aeioudgkprt] means any character except a, e, i, o, u, d, g, k, p, r, or t. Then that character needs to be followed by h, followed by end of string. You're looking for words that end in H where the H can be heard.

Same pattern here: match words that end in Y, where the character before the Y is not a, e, i, o, or u. You're looking for words that end in Y that sounds like I.
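To see all four branches in action, you can run the function against the sample words from the rules in 17.1. This is a sketch in Python 3; the function body is copied from plural1.py, and the word list is just a sample.

```python
import re

def plural(noun):
    if re.search('[sxz]$', noun):
        return re.sub('$', 'es', noun)
    elif re.search('[^aeioudgkprt]h$', noun):
        return re.sub('$', 'es', noun)
    elif re.search('[^aeiou]y$', noun):
        return re.sub('y$', 'ies', noun)
    else:
        return noun + 's'

for word in ('bass', 'fax', 'coach', 'rash', 'cheetah', 'vacancy', 'day'):
    print(word, '->', plural(word))
```

The irregular nouns from 17.1 are not handled: “sheep” falls through to the last branch and comes back as “sheeps”.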

example 17.4.More on negation regular expressions
>>> import re
>>> re.search('[^aeiou]y$','vacancy')
<_sre.SRE_Match object at 0x0142FA30>
>>> re.search('[^aeiou]y$','boy')
>>> re.search('[^aeiou]y$','day')
>>> re.search('[^aeiou]y$','pita')

vacancy matches this regular expression, because it ends in cy, and c is not a, e, i, o, or u.

boy does not match, because it ends in oy, and you specifically said that the character before the y could not be o. day does not match, because it ends in ay. pita does not match, because it does not end in y.

example 17.5.More on re.sub
>>> re.sub('y$','ies','vacancy')
'vacancies'
>>> re.sub('y$','ies','agency')
'agencies'
>>> re.sub('([^aeiou])y$',r'\1ies','vacancy')
'vacancies'

Most of it should look familiar: you're using a remembered group, which you learned in Section 7.6, “Case study: Parsing Phone Numbers”, to remember the character before the y. Then in the substitution string, you use a new syntax, \1, which means “hey, that first group you remembered? put it here”. In this case, you remember the c before the y, and then when you do the substitution, you substitute c in place of c, and ies in place of y. (If you have more than one remembered group, you can use \2 and \3 and so on.)
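Here is a short sketch of remembered groups in a substitution, assuming Python 3. The second call uses an invented input string purely to show \1 and \2 together.

```python
import re

# One group: keep the consonant before the y, then append "ies".
print(re.sub('([^aeiou])y$', r'\1ies', 'agency'))      # agencies

# Two groups: \1 and \2 refer to them in order, so they can be swapped.
print(re.sub(r'(\w+)-(\w+)', r'\2-\1', 'left-right'))  # right-left
```

The raw-string prefix (r'...') keeps Python from interpreting the backslash before the regular expression engine sees it.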

17.3.plural.py, stage 2

example 17.6.plural2.py
import re

def match_sxz(noun):                         
    return re.search('[sxz]$', noun)         

def apply_sxz(noun):                         
    return re.sub('$', 'es', noun)           

def match_h(noun):                           
    return re.search('[^aeioudgkprt]h$', noun)

def apply_h(noun):                           
    return re.sub('$', 'es', noun)           

def match_y(noun):                           
    return re.search('[^aeiou]y$', noun)     
       
def apply_y(noun):                           
    return re.sub('y$', 'ies', noun)         

def match_default(noun):                     
    return 1                                 
       
def apply_default(noun):                     
    return noun + 's'                        

rules = ((match_sxz, apply_sxz),
         (match_h, apply_h),
         (match_y, apply_y),
         (match_default, apply_default)
         )                                    

def plural(noun):                            
    for matchesRule, applyRule in rules:      
        if matchesRule(noun):                 
            return applyRule(noun)            

This version looks more complicated (it's certainly longer), but it does exactly the same thing: try to match four different rules, in order, and apply the appropriate regular expression when a match is found. The difference is that each individual match and apply rule is defined in its own function, and the functions are then listed in this rules variable, which is a tuple of tuples.
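One payoff of this layout is that a new rule is just another pair of functions in the tuple; plural itself does not change. The sketch below (Python 3) adds a hypothetical rule for the nouns from 17.1 that are their own plural. The names UNCHANGED, match_unchanged, and apply_unchanged are invented for illustration, and the h and y rules are left out to keep it short.

```python
import re

# Hypothetical extra rule: a few nouns named earlier are their own plural.
UNCHANGED = ('sheep', 'deer', 'haiku')

def match_unchanged(noun):
    return noun in UNCHANGED

def apply_unchanged(noun):
    return noun

def match_sxz(noun):
    return re.search('[sxz]$', noun)

def apply_sxz(noun):
    return re.sub('$', 'es', noun)

def match_default(noun):
    return True

def apply_default(noun):
    return noun + 's'

# Order matters: the first matching rule wins, so the new rule goes first.
rules = ((match_unchanged, apply_unchanged),
         (match_sxz, apply_sxz),
         (match_default, apply_default))

def plural(noun):
    # Same loop as stage 2: it never needs to know which rules exist.
    for matches_rule, apply_rule in rules:
        if matches_rule(noun):
            return apply_rule(noun)

print(plural('sheep'))  # sheep
print(plural('fax'))    # faxes
print(plural('day'))    # days
```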

example 17.7.Unrolling the plural function
def plural(noun):
    if match_sxz(noun):
        return apply_sxz(noun)
    if match_h(noun):
        return apply_h(noun)
    if match_y(noun):
        return apply_y(noun)
    if match_default(noun):
        return apply_default(noun)

17.4.plural.py, stage 3

example 17.8.plural3.py
import re

rules = \
  (
    (
     lambda word: re.search('[sxz]$', word),
     lambda word: re.sub('$', 'es', word)
    ),
    (
     lambda word: re.search('[^aeioudgkprt]h$', word),
     lambda word: re.sub('$', 'es', word)
    ),
    (
     lambda word: re.search('[^aeiou]y$', word),
     lambda word: re.sub('y$', 'ies', word)
    ),
    (
     lambda word: re.search('$', word),
     lambda word: re.sub('$', 's', word)
    )
   )                                          

def plural(noun):                            
    for matchesRule, applyRule in rules:      
        if matchesRule(noun):                
            return applyRule(noun)

This is the same set of rules as you defined in stage 2. The only difference is that instead of defining named functions like match_sxz and apply_sxz, you have “inlined” those function definitions directly into the rules list itself, using lambda functions.
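To convince yourself that the inlined form is equivalent, you can compare a named function with its lambda counterpart. A minimal sketch in Python 3; match_sxz_inline is a made-up name.

```python
import re

def match_sxz(noun):
    return re.search('[sxz]$', noun)

# The same logic as an anonymous function bound to a name.
match_sxz_inline = lambda word: re.search('[sxz]$', word)

for word in ('fax', 'day'):
    print(word, bool(match_sxz(word)), bool(match_sxz_inline(word)))
```

Both columns agree for every word: True for “fax”, False for “day”.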

17.5.plural.py, stage 4

example 17.9.plural4.py

import re

def buildMatchAndApplyFunctions((pattern, search, replace)): 
    matchFunction = lambda word: re.search(pattern, word)     
    applyFunction = lambda word: re.sub(search, replace, word)
    return (matchFunction, applyFunction)        

buildMatchAndApplyFunctions is a function that builds other functions dynamically. It takes pattern, search and replace (actually it takes a tuple, but more on that in a minute), and you can build the match function using the lambda syntax to be a function that takes one parameter (word) and calls re.search with the pattern that was passed to the buildMatchAndApplyFunctions function, and the word that was passed to the match function you're building. Whoa.
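The def line above relies on tuple parameters, which Python 3 removed (PEP 3113). A rough Python 3 equivalent, written as a sketch with a snake_case name that is not from the book, unpacks the tuple inside the body instead. The closure behaviour is the same.

```python
import re

def build_match_and_apply_functions(rule):
    # Unpack the (pattern, search, replace) tuple inside the body.
    pattern, search, replace = rule

    def match_function(word):
        # pattern comes from the enclosing call: this is the closure.
        return re.search(pattern, word)

    def apply_function(word):
        return re.sub(search, replace, word)

    return (match_function, apply_function)

matches, apply = build_match_and_apply_functions(('[sxz]$', '$', 'es'))
if matches('fax'):
    print(apply('fax'))  # faxes
```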

example 17.10.plural4.py continued
patterns = \
  (
    ('[sxz]$', '$', 'es'),
    ('[^aeioudgkprt]h$', '$', 'es'),
    ('(qu|[^aeiou])y$', 'y$', 'ies'),
    ('$', '$', 's')
  )                                                
rules = map(buildMatchAndApplyFunctions, patterns)

This line is magic. It takes the list of strings in patterns and turns them into a list of functions. How? By mapping the strings to the buildMatchAndApplyFunctions function, which just happens to take three strings as parameters and return a tuple of two functions. This means that rules ends up being exactly the same as the previous example: a list of tuples, where each tuple is a pair of functions, where the first function is the match function that calls re.search, and the second function is the apply function that calls re.sub.
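One caveat if you try this on Python 3: map returns a lazy iterator there, and it is exhausted after a single pass, so plural would only work the first time it is called. The sketch below assumes Python 3 and builds a real list with a list comprehension; the function names are adapted for illustration, not taken from the book.

```python
import re

def build_match_and_apply_functions(rule):
    pattern, search, replace = rule
    return (lambda word: re.search(pattern, word),
            lambda word: re.sub(search, replace, word))

patterns = (
    ('[sxz]$', '$', 'es'),
    ('[^aeioudgkprt]h$', '$', 'es'),
    ('(qu|[^aeiou])y$', 'y$', 'ies'),
    ('$', '$', 's'),
)

# A list can be iterated as many times as plural is called.
rules = [build_match_and_apply_functions(p) for p in patterns]

def plural(noun):
    for matches_rule, apply_rule in rules:
        if matches_rule(noun):
            return apply_rule(noun)

print(plural('soliloquy'))  # soliloquies (the qu alternative matches)
print(plural('boy'))        # boys
```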

example 17.11.Unrolling the rules definition

example 17.12.plural4.py, finishing up

example 17.13.Another look at buildMatchAndApplyFunctions

example 17.14.Expanding tuples when calling functions
>>> def foo((a,b,c)):
...     print c
...     print b
...     print a
...
>>> parameters = ('apple','bear','catnap')
>>> foo(parameters)
catnap
bear
apple
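The foo((a,b,c)) form only works on Python 2. On Python 3, a comparable effect can be had by declaring three ordinary parameters and expanding the tuple at the call site with *. A minimal sketch:

```python
def foo(a, b, c):
    print(c)
    print(b)
    print(a)

parameters = ('apple', 'bear', 'catnap')
# The * expands the tuple into three positional arguments.
foo(*parameters)
```

This prints catnap, bear, and apple, in that order.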

17.6.plural.py, stage 5
First, let's create a text file that contains the rules you want. No fancy data structures, just space- (or tab-)delimited strings in three columns. You'll call it rules.en; “en” stands for English. These are the rules for pluralizing English nouns. You could add other rule files for other languages later.

example 17.15.rules.en
[sxz]$                  $               es
[^aeioudgkprt]h$        $               es
[^aeiou]y$              y$              ies
$                       $                s

example 17.16.plural5.py
import re
import string                                                                    

def buildRule((pattern, search, replace)):                                       
    return lambda word: re.search(pattern, word) and re.sub(search, replace, word)

def plural(noun, language='en'):                            
    lines = file('rules.%s' % language).readlines()         
    patterns = map(string.split, lines)                     
    rules = map(buildRule, patterns)                        
    for rule in rules:                                     
        result = rule(noun)                                 
        if result: return result

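The key line is the return statement in buildRule:
return lambda word: re.search(pattern, word) and re.sub(search, replace, word)
It combines the match and apply steps into one function. If re.search finds no match it returns None, so the and expression short-circuits and the rule returns None; otherwise the rule returns the result of re.sub. This will let you accomplish the same thing as having two functions, but you'll need to call it differently, as you'll see in a minute.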
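Here is a sketch of the combined rule in Python 3, where file() and string.split are no longer available. To stay runnable without a file on disk, it keeps the rules in a string; RULES_EN, build_rule, and the rules_text parameter are assumptions made for this example.

```python
import re

# Stand-in for the contents of rules.en.
RULES_EN = """\
[sxz]$                  $               es
[^aeioudgkprt]h$        $               es
[^aeiou]y$              y$              ies
$                       $               s
"""

def build_rule(fields):
    pattern, search, replace = fields
    # One function does both jobs: None on no match, new string on match.
    return lambda word: re.search(pattern, word) and re.sub(search, replace, word)

def plural(noun, rules_text=RULES_EN):
    # str.split with no arguments splits on any run of whitespace.
    patterns = [line.split() for line in rules_text.splitlines()]
    for rule in map(build_rule, patterns):
        result = rule(noun)
        if result:
            return result

first_rule = build_rule(('[sxz]$', '$', 'es'))
print(first_rule('fax'))  # faxes
print(first_rule('day'))  # None
print(plural('coach'))    # coaches
```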

17.7.plural.py, stage 6

example 17.17.plural6.py
import re

def rules(language):                                                                
    for line in file('rules.%s' % language):                                        
        pattern, search, replace = line.split()                                     
        yield lambda word: re.search(pattern, word) and re.sub(search, replace, word)

def plural(noun, language='en'):     
    for applyRule in rules(language):
        result = applyRule(noun)     
        if result: return result   

This uses a technique called generators, which I'm not even going to try to explain until you look at a simpler example first.

example 17.18.Introducing generators
>>> def make_counter(x):
...     print 'entering make_counter'
...     while 1:
...         yield x
...         print 'incrementing x'
...         x = x + 1
...
>>> counter = make_counter(2)
>>> counter
<generator object make_counter at 0x01367508>
>>> counter.next()
entering make_counter
2
>>> counter.next()
incrementing x
3
>>> counter.next()
incrementing x
4

The presence of the yield keyword in make_counter means that this is not a normal function. It is a special kind of function which generates values one at a time. You can think of it as a resumable function. Calling it will return a generator that can be used to generate successive values of x.

The make_counter function returns a generator object.

The first time you call the next() method on the generator object, it executes the code in make_counter up to the first yield statement, and then returns the value that was yielded. In this case, that will be 2, because you originally created the generator by calling make_counter(2).

Repeatedly calling next() on the generator object resumes where you left off and continues until you hit the next yield statement. The next line of code waiting to be executed is the print statement that prints incrementing x, and then after that the x = x + 1 statement that actually increments it. Then you loop through the while loop again, and the first thing you do is yield x, which returns the current value of x (now 3).

Since make_counter sets up an infinite loop, you could theoretically do this forever, and it would just keep incrementing x and spitting out values. But let's look at more productive uses of generators instead.
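On Python 3 the generator's next() method is gone; you use the built-in next() function instead, and print is a function. The same counter, as a sketch for Python 3:

```python
def make_counter(x):
    print('entering make_counter')
    while True:
        yield x
        print('incrementing x')
        x = x + 1

counter = make_counter(2)
print(next(counter))  # prints "entering make_counter", then 2
print(next(counter))  # prints "incrementing x", then 3
print(next(counter))  # prints "incrementing x", then 4
```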

example 17.19.Using generators instead of recursion
def fibonacci(max):
    a, b = 0, 1      
    while a < max:
        yield a      
        a, b = b, a+b

The Fibonacci sequence is a sequence of numbers where each number is the sum of the two numbers before it. It starts with 0 and 1, goes up slowly at first, then more and more rapidly. To start the sequence, you need two variables: a starts at 0, and b starts at 1.

a is the current number in the sequence, so yield it.

b is the next number in the sequence, so assign that to a, but also calculate the next value (a+b) and assign that to b for later use. Note that this happens in parallel; if a is 3 and b is 5, then a, b = b, a+b will set a to 5 (the previous value of b) and b to 8 (the sum of the previous values of a and b).

example 17.20.Generators in for loops
>>> for n in fibonacci(1000):
...     print n,
...
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987

You can use a generator like fibonacci in a for loop directly. The for loop will create the generator object and successively call the next() method to get values to assign to the for loop index variable (n).

Each time through the for loop, n gets a new value from the yield statement in fibonacci, and all you do is print it out. Once fibonacci runs out of numbers (a reaches or exceeds max, which in this case is 1000), then the for loop exits gracefully.
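A for loop is not the only consumer: anything that iterates can drain a generator. The sketch below assumes Python 3 and renames the parameter to limit (so it does not shadow the built-in max); it collects the values into a list and shows that the limit itself is excluded.

```python
def fibonacci(limit):
    a, b = 0, 1
    while a < limit:
        yield a
        # Both right-hand values are computed before either name is rebound.
        a, b = b, a + b

print(list(fibonacci(1000)))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987]

print(list(fibonacci(13)))
# [0, 1, 1, 2, 3, 5, 8]  (13 itself is not yielded)
```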

example 17.21.Generators that generate dynamic functions
def rules(language):                                                                
    for line in file('rules.%s' % language):                                         
        pattern, search, replace = line.split()                                      
        yield lambda word: re.search(pattern, word) and re.sub(search, replace, word)

def plural(noun, language='en'):     
    for applyRule in rules(language): 
        result = applyRule(noun)     
        if result: return result

What do you yield? A function, built dynamically with lambda, that is actually a closure (it uses the local variables pattern, search, and replace as constants). In other words, rules is a generator that spits out rule functions.
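One thing to watch with closures built in a loop: the lambda looks up pattern, search, and replace when it is called, not when it is created. plural is safe because it uses each rule before asking the generator for the next one. If you collected all the rules into a list first, every rule would see the values from the last line. The sketch below (Python 3) shows the effect and one common workaround, binding the values as default arguments. RULE_LINES, rules_late, and rules_bound are hypothetical names, and the rules come from a list rather than a file so the example runs on its own.

```python
import re

RULE_LINES = ['[sxz]$ $ es', '$ $ s']

def rules_late():
    for line in RULE_LINES:
        pattern, search, replace = line.split()
        yield lambda word: re.search(pattern, word) and re.sub(search, replace, word)

def rules_bound():
    for line in RULE_LINES:
        pattern, search, replace = line.split()
        # Default arguments are evaluated when the lambda is created,
        # so each function keeps its own copy of the three strings.
        yield lambda word, p=pattern, s=search, r=replace: (
            re.search(p, word) and re.sub(s, r, word))

# Used one at a time, as plural does, the first rule works as intended.
first_late = next(rules_late())
print(first_late('fax'))  # faxes

# Collected up front, the late-binding version sees only the last line.
late = list(rules_late())
bound = list(rules_bound())
print(late[0]('fax'))     # faxs
print(bound[0]('fax'))    # faxes
```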
