`
sillycat
  • 浏览: 2543277 次
  • 性别: Icon_minigender_1
  • 来自: 成都
社区版块
存档分类
最新评论

DiveIntoPython(十七)

阅读更多
DiveIntoPython(十七)

英文书地址:
http://diveintopython.org/toc/index.html

Chapter 18.Performance Tuning
Performance tuning is a many-splendored thing. Just because Python is an interpreted language doesn't mean you shouldn't worry about code optimization. But don't worry about it too much.

18.1.Diving in
Let's start here: are you sure you need to do it at all? Is your code really so bad? Is it worth the time to tune it? Over the lifetime of your application, how much time is going to be spent running that code, compared to the time spent waiting for a remote database server, or waiting for user input?

This is not to say that code optimization is worthless, but you need to look at the whole system and decide whether it's the best use of your time. Every minute you spend optimizing code is a minute you're not spending adding new features, or writing documentation, or playing with your kids, or writing unit tests.

There are several subtle variations of the Soundex algorithm. This is the one used in this chapter:

Keep the first letter of the name as-is.
Convert the remaining letters to digits, according to a specific table:
B, F, P, and V become 1.
C, G, J, K, Q, S, X, and Z become 2.
D and T become 3.
L becomes 4.
M and N become 5.
R becomes 6.
All other letters become 9.
Remove consecutive duplicates.
Remove all 9s altogether.
If the result is shorter than four characters (the first letter plus three digits), pad the result with trailing zeros.
if the result is longer than four characters, discard everything after the fourth character.
For example, my name, Pilgrim, becomes P942695. That has no consecutive duplicates, so nothing to do there. Then you remove the 9s, leaving P4265. That's too long, so you discard the excess character, leaving P426.

Another example: Woo becomes W99, which becomes W9, which becomes W, which gets padded with zeros to become W000.

example 18.1.soundex/stage1/soundex1a.py
import string, re

charToSoundex = {"A": "9",
                 "B": "1",
                 "C": "2",
                 "D": "3",
                 "E": "9",
                 "F": "1",
                 "G": "2",
                 "H": "9",
                 "I": "9",
                 "J": "2",
                 "K": "2",
                 "L": "4",
                 "M": "5",
                 "N": "5",
                 "O": "9",
                 "P": "1",
                 "Q": "2",
                 "R": "6",
                 "S": "2",
                 "T": "3",
                 "U": "9",
                 "V": "1",
                 "W": "9",
                 "X": "2",
                 "Y": "9",
                 "Z": "2"}

def soundex(source):
    "convert string to Soundex equivalent"

    # Soundex requirements:
    # source string must be at least 1 character
    # and must consist entirely of letters
    allChars = string.uppercase + string.lowercase
    if not re.search('^[%s]+$' % allChars, source):
        return "0000"

    # Soundex algorithm:
    # 1. make first character uppercase
    source = source[0].upper() + source[1:]
   
    # 2. translate all other characters to Soundex digits
    digits = source[0]
    for s in source[1:]:
        s = s.upper()
        digits += charToSoundex[s]

    # 3. remove consecutive duplicates
    digits2 = digits[0]
    for d in digits[1:]:
        if digits2[-1] != d:
            digits2 += d
       
    # 4. remove all "9"s
    digits3 = re.sub('9', '', digits2)
   
    # 5. pad end with "0"s to 4 characters
    while len(digits3) < 4:
        digits3 += "0"
       
    # 6. return first 4 characters
    return digits3[:4]

if __name__ == '__main__':
    from timeit import Timer
    names = ('Woo', 'Pilgrim', 'Flingjingwaller')
    for name in names:
        statement = "soundex('%s')" % name
        t = Timer(statement, "from __main__ import soundex")
        print name.ljust(15), soundex(name), min(t.repeat())

18.2.Using the timeit Module

example 18.2.Introducing timeit
>>> import timeit
>>> t = timeit.Timer("soundex.soundex('Pilgrim')","import soundex")
>>> t.timeit()
7.8085661725163389
>>> t.repeat(3,2000000)
[15.873824745882509, 16.936368728427773, 16.090063681277933]

The timeit module defines one class, Timer, which takes two arguments. Both arguments are strings. The first argument is the statement you wish to time; in this case, you are timing a call to the Soundex function within the soundex with an argument of 'Pilgrim'. The second argument to the Timer class is the import statement that sets up the environment for the statement.

Once you have the Timer object, the easiest thing to do is call timeit(), which calls your function 1 million times and returns the number of seconds it took to do it.

The other major method of the Timer object is repeat(), which takes two optional arguments. The first argument is the number of times to repeat the entire test, and the second argument is the number of times to call the timed statement within each test. Both arguments are optional, and they default to 3 and 1000000 respectively. The repeat() method returns a list of the times each test cycle took, in seconds.

Python has a handy min function that takes a list and returns the smallest value:
>>> min(t.repeat(3,1000000))
8.2049416895164313

18.3.Optimizing Regular Expressions

example soundex/stage1/soundex1a.py

allChars = string.uppercase + string.lowercase
    if not re.search('^[%s]+$' % allChars, source):
        return "0000"

if __name__ == '__main__':
    from timeit import Timer
    names = ('Woo', 'Pilgrim', 'Flingjingwaller')
    for name in names:
        statement = "soundex('%s')" % name
        t = Timer(statement, "from __main__ import soundex")
        print name.ljust(15), soundex(name), min(t.repeat())

E:\book\opensource\python\diveintopython-5.4\py\soundex\stage1>python soundex1a.py
Woo             W000    21.6845738774
Pilgrim         P426 25.4686735325
Flingjingwaller F452 33.7811945373

example soundex/stage1/soundex1b.py
if not re.search('^[A-Za-z]+$', source):
        return "0000"

E:\book\opensource\python\diveintopython-5.4\py\soundex\stage1>python soundex1b.py
Woo             W000 20.6642844272
Pilgrim         P426 24.2337135027
Flingjingwaller F452 32.8864372427

example soundex/stage1/soundex1c.py

isOnlyChars = re.compile('^[A-Za-z]+$').search
def soundex(source):
    if not isOnlyChars(source):
        return "0000"

E:\book\opensource\python\diveintopython-5.4\py\soundex\stage1>python soundex1c.py
Woo             W000 17.2678421419
Pilgrim         P426 20.8697745104
Flingjingwaller F452 29.3527870162

example soundex/stage1/soudex1d.py
if not source:
        return "0000"
    for c in source:
        if not ('A' <= c <= 'Z') and not ('a' <= c <= 'z'):
            return "0000"

It turns out that this technique in soundex1d.py is not faster than using a compiled regular expression (although it is faster than using a non-compiled regular expression):
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage1>python soundex1d.py
Woo             W000 17.5661093798
Pilgrim         P426 23.8642883383
Flingjingwaller F452 35.9031505401

It turns out that Python offers an obscure string method. You can be excused for not knowing about it, since it's never been mentioned in this book. The method is called isalpha(), and it checks whether a string contains only letters.

example soundex/stage1/soudnex1e.py
if (not source) and (not source.isalpha()):
        return "0000"

E:\book\opensource\python\diveintopython-5.4\py\soundex\stage1>python soundex1e.py
Woo             W000 15.3807154261
Pilgrim         P426 19.2102524203
Flingjingwaller F452 27.7341740361

example 18.3.Best Result So Far:soundex/stage1/soundex1e.py

18.4.Optimizing Dictionary Lookups

example soundex/stage1/soundex1c.py
def soundex(source):
    # ... input check omitted for brevity ...
    source = source[0].upper() + source[1:]
    digits = source[0]
    for s in source[1:]:
        s = s.upper()
        digits += charToSoundex[s]

example soundex/stage2/soundex2a.py
def soundex(source):
    # ...
    source = source.upper()
    digits = source[0] + "".join(map(lambda c: charToSoundex[c], source[1:]))

Surprisingly, soundex2a.py is not faster:
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage2>python soundex2a.py
Woo             W000 18.0709193487
Pilgrim         P426 21.4308902388
Flingjingwaller F452 28.9357734014

The overhead of the anonymous lambda function kills any performance you gain by dealing with the string as a list of characters.

example soundex/stage2/soundex2b.py uses a list comprehension instead of ↦ and lambda:
source = source.upper()
    digits = source[0] + "".join([charToSoundex[c] for c in source[1:]])
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage2>python soundex2b.py
Woo             W000 14.9096245473
Pilgrim         P426 17.5887675668
Flingjingwaller F452 22.4958635804

It's time for a radically different approach. Dictionary lookups are a general purpose tool. Dictionary keys can be any length string (or many other data types), but in this case we are only dealing with single-character keys and single-character values. It turns out that Python has a specialized function for handling exactly this situation: the string.maketrans function.

example soundex/stage2/soundex2c.py
allChar = string.uppercase + string.lowercase
charToSoundex = string.maketrans(allChar, "91239129922455912623919292" * 2)
def soundex(source):
    # ...
    digits = source[0].upper() + source[1:].translate(charToSoundex)

string.maketrans creates a translation matrix between two strings: the first argument and the second argument. In this case, the first argument is the string ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz, and the second argument is the string 9123912992245591262391929291239129922455912623919292.

It's the same conversion pattern we were setting up longhand with a dictionary. A maps to 9, B maps to 1, C maps to 2, and so forth. But it's not a dictionary; it's a specialized data structure that you can access using the string method translate, which translates each character into the corresponding digit, according to the matrix defined by string.maketrans.

E:\book\opensource\python\diveintopython-5.4\py\soundex\stage2>python soundex2c.py
Woo             W000 13.0717325805
Pilgrim         P426 15.2769533558
Flingjingwaller F452 19.284963033

example 18.4.Best Result So Far: soundex/stage2/soundex2c.py

18.5.Optimizing List Operations
Here's the code we have so far, in soundex/stage2/soundex2c.py:
digits2 = digits[0]
    for d in digits[1:]:
        if digits2[-1] != d:
            digits2 += d

The first thing to consider is whether it's efficient to check digits[-1] each time through the loop. Are list indexes expensive? Would we be better off maintaining the last digit in a separate variable, and checking that instead?

example soundex/stage3/soundex3a.py
digits2 = ''
    last_digit = ''
    for d in digits:
        if d != last_digit:
            digits2 += d
            last_digit = d

soundex3a.py does not run any faster than soundex2c.py, and may even be slightly slower
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage3>python soundex3a.py
Woo             W000 13.0359651855
Pilgrim         P426 14.5332295026
Flingjingwaller F452 18.2346787092

Let's try something radically different. If it's possible to treat a string as a list of characters, it should be possible to use a list comprehension to iterate through the list. The problem is, the code needs access to the previous character in the list, and that's not easy to do with a straightforward list comprehension.

example soundex/stage3/soundex3b.py
digits2 = "".join([digits[i] for i in range(len(digits))
                       if i == 0 or digits[i-1] != digits[i]])

Is this faster? In a word, no.
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage3>python soundex3b.py
Woo             W000 13.821099609
Pilgrim         P426 16.882249839
Flingjingwaller F452 22.9767347526

Python can convert a string into a list of characters with a single command: list('abc') returns ['a', 'b', 'c']. Furthermore, lists can be modified in place very quickly. Instead of incrementally building a new list (or string) out of the source string, why not move elements around within a single list?

example soundex/stage3/soundex3c.py
    digits = list(source[0].upper() + source[1:].translate(charToSoundex))
    i=0
    for item in digits:
        if item==digits[i]: continue
        i+=1
        digits[i]=item
    del digits[i+1:]
    digits2 = "".join(digits)
Is this faster than soundex3a.py or soundex3b.py? No, in fact it's the slowest method yet:
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage3>python soundex3c.py
Woo             W000 16.1901624368
Pilgrim         P426 18.2924290657
Flingjingwaller F452 22.718192396

We haven't made any progress here at all, except to try and rule out several “clever” techniques. The fastest code we've seen so far was the original, most straightforward method (soundex2c.py). Sometimes it doesn't pay to be clever.

example 18.5.Best Result So Far:soundex/stage2/soundex2c.py

18.6.Optimizing String Manipulation
example soundex/stage2/soundex2c.py
    digits3 = re.sub('9', '', digits2)
    while len(digits3) < 4:
        digits3 += "0"
    return digits3[:4]

The first thing to consider is replacing that regular expression with a loop. This code is from soundex/stage4/soundex4a.py:
example soundex/stage4/soundex4a.py
digits3 = ''
    for d in digits2:
        if d != '9':
            digits3 += d
Is soundex4a.py faster? Yes it is:
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage4>python soundex4a.py
Woo             W000 9.3276673432
Pilgrim         P426 11.0247101238
Flingjingwaller F452 15.2871535349

But wait a minute. A loop to remove characters from a string? We can use a simple string method for that. Here's soundex/stage4/soundex4b.py:
digits3 = digits2.replace('9', '')

Is soundex4b.py faster? That's an interesting question. It depends on the input:
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage4>python soundex4b.py
Woo             W000 6.70664232465
Pilgrim         P426 7.46835952614
Flingjingwaller F452 10.2336799789

Performance optimizations aren't always uniform; tuning that makes one case faster can sometimes make other cases slower. In this case, the majority of cases will benefit from the change, so let's leave it at that, but the principle is an important one to remember.

examples:
# 5. pad end with "0"s to 4 characters
while len(digits3) < 4:
        digits3 += "0"
# 6. return first 4 characters
return digits3[:4]

example soundex/stage4/soundex4c.py:
digits3 += '000'
return digits3[:4]
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage4>python soundex4c.py
Woo             W000 5.39382044366
Pilgrim         P426 7.24943058405
Flingjingwaller F452 10.2510975557

example soundex/stage4/soundex4d.py:
return (digits2.replace('9', '') + '000')[:4]
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage4>python soundex4d.py
Woo             W000 5.23971342727
Pilgrim         P426 7.14076314168
Flingjingwaller F452 10.4394474717

It is also significantly less readable, and for not much performance gain. Is that worth it? I hope you have good comments. Performance isn't everything. Your optimization efforts must always be balanced against threats to your program's readability and maintainability.

分享到:
评论

相关推荐

    dive into python3 (中文版)

    Python是一种广泛使用的高级编程语言,以其简洁明了的语法和强大的功能而闻名。《深入Python3(中文版)》是一本系统介绍Python 3的书籍,旨在帮助读者深入学习Python 3的基本知识与应用。本文将根据给定文件的信息...

    《Dive Into Python 3中文版》PDF

    《Dive Into Python 3中文版》是一本深入学习Python 3编程语言的教程,适合初学者和有一定编程基础的开发者。这本书详细介绍了Python 3的各种特性,包括语法、数据结构、函数、类、模块、异常处理、输入/输出、网络...

    Dive into Python3

    《Dive into Python3》的压缩包文件名为diveintopython3-r860-2010-01-13,这可能表示它是2010年1月13日发布的第860个修订版。这个版本可能包含了作者对初版的修正和更新,以适应Python 3的最新发展。 通过阅读这...

    Dive Into Python 中文译文版

    PDF版本的《Dive Into Python 中文译文版》(diveintopython-pdfzh-cn-5.4b.zip)提供了完整的书籍内容,涵盖了Python的基础知识到高级特性。书中通过实际案例引导读者深入学习,包括但不限于变量、数据类型、控制...

    DiveIntoPython

    《Dive Into Python》是一本深受编程初学者和有经验开发者喜爱的Python编程教程。这本书以其深入浅出的讲解方式,让学习者能够快速掌握Python编程语言的核心概念和实际应用,特别是对于想要涉足Web开发领域的读者,...

    深入Python (Dive Into Python)

    深入python,深入Python (Dive Into Python) 译者序 by limodou 主页(http://phprecord.126.com) Python论坛 本书英文名字为《Dive Into Python》,其发布遵守 GNU 的自由文档许可证(Free Document Lience)的...

    Dive into python

    dive into python英文原版,Dive Into Python 3 covers Python 3 and its differences from Python 2. Compared to Dive Into Python, it’s about 20% revised and 80% new material. The book is now complete, ...

    Dive Into Python 2 中文版

    《Dive Into Python 2 中文版》是一本深度探讨Python编程语言的教程,适合已经有一定编程基础,希望深入理解Python特性和应用的读者。这本书以其详尽的解释和丰富的实例,为Python初学者和进阶者提供了全面的学习...

    Dive Into Python 3

    《深入Python 3》是一本全面且深入介绍Python 3编程语言的电子书籍,旨在帮助读者从...压缩包中的文件“diveintomark-diveintopython3-793871b”很可能是该书的源代码或HTML文件,可以配合阅读,加深对书中示例的理解。

    Dive Into Python 3 无水印pdf

    Dive Into Python 3 英文无水印pdf pdf所有页面使用FoxitReader和PDF-XChangeViewer测试都可以打开 本资源转载自网络,如有侵权,请联系上传者或csdn删除 本资源转载自网络,如有侵权,请联系上传者或csdn删除

    Dive Into Python 3, r870 (2010).pdf

    Didyoureadtheoriginal“DiveIntoPython”?Didyoubuyit onpaper?(Ifso,thanks!)AreyoureadytotaketheplungeintoPython3?…Ifso,readon.(Ifnoneofthat istrue,you’dbebetteroffstartingatthebeginning.) Python3...

    Dive Into Python V5.4

    《Dive Into Python V5.4》是一本深入学习Python编程语言的经典教程,以其详尽的解释和丰富的实例深受程序员们的喜爱。这个版本是官方提供的最新版本,它不仅包含了PDF格式的完整书籍,还附带了书中所有示例代码,为...

    diveintopython-examples-5.4.rar

    diveintopython-examples-5.4.rardiveintopython-examples-5.4.rardiveintopython-examples-5.4.rardiveintopython-examples-5.4.rar

    diveintopython3

    在“diveintopython3-master”这个压缩包中,包含了这本书的所有源代码示例。通过这些代码,我们可以学习到以下关键知识点: 1. **Python基础**:包括变量、数据类型(如整型、浮点型、字符串、列表、元组、字典)...

    dive-into-python3 (英文版)+深入python3(中文版)

    《Dive Into Python3》和《深入Python3》是两本深受Python爱好者欢迎的书籍,分别提供了英文和中文的学习资源,旨在帮助读者全面理解和掌握Python3编程语言。这两本书覆盖了Python3的基础语法、高级特性以及实际应用...

    Dive Into Python中文版

    Dive Into Python中文版,精心整理,epub版本方便阅读,下载阅读.

    Dive Into Python 3 中文版

    ### Dive Into Python 3 中文版 - 安装Python 3 #### 标题解析 - **Dive Into Python 3 中文版**:这本书名表明了内容将深入讲解Python 3的各项特性和使用方法,适合希望深入了解Python 3编程语言的读者。 #### ...

Global site tag (gtag.js) - Google Analytics