- 浏览: 2555902 次
- 性别:
- 来自: 成都
文章分类
最新评论
-
nation:
你好,在部署Mesos+Spark的运行环境时,出现一个现象, ...
Spark(4)Deal with Mesos -
sillycat:
AMAZON Relatedhttps://www.godad ...
AMAZON API Gateway(2)Client Side SSL with NGINX -
sillycat:
sudo usermod -aG docker ec2-use ...
Docker and VirtualBox(1)Set up Shared Disk for Virtual Box -
sillycat:
Every Half an Hour30 * * * * /u ...
Build Home NAS(3)Data Redundancy -
sillycat:
3 List the Cron Job I Have>c ...
Build Home NAS(3)Data Redundancy
DiveIntoPython(十七)
英文书地址:
http://diveintopython.org/toc/index.html
Chapter 18.Performance Tuning
Performance tuning is a many-splendored thing. Just because Python is an interpreted language doesn't mean you shouldn't worry about code optimization. But don't worry about it too much.
18.1.Diving in
Let's start here: are you sure you need to do it at all? Is your code really so bad? Is it worth the time to tune it? Over the lifetime of your application, how much time is going to be spent running that code, compared to the time spent waiting for a remote database server, or waiting for user input?
This is not to say that code optimization is worthless, but you need to look at the whole system and decide whether it's the best use of your time. Every minute you spend optimizing code is a minute you're not spending adding new features, or writing documentation, or playing with your kids, or writing unit tests.
There are several subtle variations of the Soundex algorithm. This is the one used in this chapter:
Keep the first letter of the name as-is.
Convert the remaining letters to digits, according to a specific table:
B, F, P, and V become 1.
C, G, J, K, Q, S, X, and Z become 2.
D and T become 3.
L becomes 4.
M and N become 5.
R becomes 6.
All other letters become 9.
Remove consecutive duplicates.
Remove all 9s altogether.
If the result is shorter than four characters (the first letter plus three digits), pad the result with trailing zeros.
if the result is longer than four characters, discard everything after the fourth character.
For example, my name, Pilgrim, becomes P942695. That has no consecutive duplicates, so nothing to do there. Then you remove the 9s, leaving P4265. That's too long, so you discard the excess character, leaving P426.
Another example: Woo becomes W99, which becomes W9, which becomes W, which gets padded with zeros to become W000.
example 18.1.soundex/stage1/soundex1a.py
import string, re
charToSoundex = {"A": "9",
"B": "1",
"C": "2",
"D": "3",
"E": "9",
"F": "1",
"G": "2",
"H": "9",
"I": "9",
"J": "2",
"K": "2",
"L": "4",
"M": "5",
"N": "5",
"O": "9",
"P": "1",
"Q": "2",
"R": "6",
"S": "2",
"T": "3",
"U": "9",
"V": "1",
"W": "9",
"X": "2",
"Y": "9",
"Z": "2"}
def soundex(source):
"convert string to Soundex equivalent"
# Soundex requirements:
# source string must be at least 1 character
# and must consist entirely of letters
allChars = string.uppercase + string.lowercase
if not re.search('^[%s]+$' % allChars, source):
return "0000"
# Soundex algorithm:
# 1. make first character uppercase
source = source[0].upper() + source[1:]
# 2. translate all other characters to Soundex digits
digits = source[0]
for s in source[1:]:
s = s.upper()
digits += charToSoundex[s]
# 3. remove consecutive duplicates
digits2 = digits[0]
for d in digits[1:]:
if digits2[-1] != d:
digits2 += d
# 4. remove all "9"s
digits3 = re.sub('9', '', digits2)
# 5. pad end with "0"s to 4 characters
while len(digits3) < 4:
digits3 += "0"
# 6. return first 4 characters
return digits3[:4]
if __name__ == '__main__':
from timeit import Timer
names = ('Woo', 'Pilgrim', 'Flingjingwaller')
for name in names:
statement = "soundex('%s')" % name
t = Timer(statement, "from __main__ import soundex")
print name.ljust(15), soundex(name), min(t.repeat())
18.2.Using the timeit Module
example 18.2.Introducing timeit
>>> import timeit
>>> t = timeit.Timer("soundex.soundex('Pilgrim')","import soundex")
>>> t.timeit()
7.8085661725163389
>>> t.repeat(3,2000000)
[15.873824745882509, 16.936368728427773, 16.090063681277933]
The timeit module defines one class, Timer, which takes two arguments. Both arguments are strings. The first argument is the statement you wish to time; in this case, you are timing a call to the Soundex function within the soundex with an argument of 'Pilgrim'. The second argument to the Timer class is the import statement that sets up the environment for the statement.
Once you have the Timer object, the easiest thing to do is call timeit(), which calls your function 1 million times and returns the number of seconds it took to do it.
The other major method of the Timer object is repeat(), which takes two optional arguments. The first argument is the number of times to repeat the entire test, and the second argument is the number of times to call the timed statement within each test. Both arguments are optional, and they default to 3 and 1000000 respectively. The repeat() method returns a list of the times each test cycle took, in seconds.
Python has a handy min function that takes a list and returns the smallest value:
>>> min(t.repeat(3,1000000))
8.2049416895164313
18.3.Optimizing Regular Expressions
example soundex/stage1/soundex1a.py
allChars = string.uppercase + string.lowercase
if not re.search('^[%s]+$' % allChars, source):
return "0000"
if __name__ == '__main__':
from timeit import Timer
names = ('Woo', 'Pilgrim', 'Flingjingwaller')
for name in names:
statement = "soundex('%s')" % name
t = Timer(statement, "from __main__ import soundex")
print name.ljust(15), soundex(name), min(t.repeat())
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage1>python soundex1a.py
Woo W000 21.6845738774
Pilgrim P426 25.4686735325
Flingjingwaller F452 33.7811945373
example soundex/stage1/soundex1b.py
if not re.search('^[A-Za-z]+$', source):
return "0000"
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage1>python soundex1b.py
Woo W000 20.6642844272
Pilgrim P426 24.2337135027
Flingjingwaller F452 32.8864372427
example soundex/stage1/soundex1c.py
isOnlyChars = re.compile('^[A-Za-z]+$').search
def soundex(source):
if not isOnlyChars(source):
return "0000"
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage1>python soundex1c.py
Woo W000 17.2678421419
Pilgrim P426 20.8697745104
Flingjingwaller F452 29.3527870162
example soundex/stage1/soudex1d.py
if not source:
return "0000"
for c in source:
if not ('A' <= c <= 'Z') and not ('a' <= c <= 'z'):
return "0000"
It turns out that this technique in soundex1d.py is not faster than using a compiled regular expression (although it is faster than using a non-compiled regular expression):
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage1>python soundex1d.py
Woo W000 17.5661093798
Pilgrim P426 23.8642883383
Flingjingwaller F452 35.9031505401
It turns out that Python offers an obscure string method. You can be excused for not knowing about it, since it's never been mentioned in this book. The method is called isalpha(), and it checks whether a string contains only letters.
example soundex/stage1/soudnex1e.py
if (not source) and (not source.isalpha()):
return "0000"
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage1>python soundex1e.py
Woo W000 15.3807154261
Pilgrim P426 19.2102524203
Flingjingwaller F452 27.7341740361
example 18.3.Best Result So Far:soundex/stage1/soundex1e.py
18.4.Optimizing Dictionary Lookups
example soundex/stage1/soundex1c.py
def soundex(source):
# ... input check omitted for brevity ...
source = source[0].upper() + source[1:]
digits = source[0]
for s in source[1:]:
s = s.upper()
digits += charToSoundex[s]
example soundex/stage2/soundex2a.py
def soundex(source):
# ...
source = source.upper()
digits = source[0] + "".join(map(lambda c: charToSoundex[c], source[1:]))
Surprisingly, soundex2a.py is not faster:
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage2>python soundex2a.py
Woo W000 18.0709193487
Pilgrim P426 21.4308902388
Flingjingwaller F452 28.9357734014
The overhead of the anonymous lambda function kills any performance you gain by dealing with the string as a list of characters.
example soundex/stage2/soundex2b.py uses a list comprehension instead of ↦ and lambda:
source = source.upper()
digits = source[0] + "".join([charToSoundex[c] for c in source[1:]])
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage2>python soundex2b.py
Woo W000 14.9096245473
Pilgrim P426 17.5887675668
Flingjingwaller F452 22.4958635804
It's time for a radically different approach. Dictionary lookups are a general purpose tool. Dictionary keys can be any length string (or many other data types), but in this case we are only dealing with single-character keys and single-character values. It turns out that Python has a specialized function for handling exactly this situation: the string.maketrans function.
example soundex/stage2/soundex2c.py
allChar = string.uppercase + string.lowercase
charToSoundex = string.maketrans(allChar, "91239129922455912623919292" * 2)
def soundex(source):
# ...
digits = source[0].upper() + source[1:].translate(charToSoundex)
string.maketrans creates a translation matrix between two strings: the first argument and the second argument. In this case, the first argument is the string ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz, and the second argument is the string 9123912992245591262391929291239129922455912623919292.
It's the same conversion pattern we were setting up longhand with a dictionary. A maps to 9, B maps to 1, C maps to 2, and so forth. But it's not a dictionary; it's a specialized data structure that you can access using the string method translate, which translates each character into the corresponding digit, according to the matrix defined by string.maketrans.
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage2>python soundex2c.py
Woo W000 13.0717325805
Pilgrim P426 15.2769533558
Flingjingwaller F452 19.284963033
example 18.4.Best Result So Far: soundex/stage2/soundex2c.py
18.5.Optimizing List Operations
Here's the code we have so far, in soundex/stage2/soundex2c.py:
digits2 = digits[0]
for d in digits[1:]:
if digits2[-1] != d:
digits2 += d
The first thing to consider is whether it's efficient to check digits[-1] each time through the loop. Are list indexes expensive? Would we be better off maintaining the last digit in a separate variable, and checking that instead?
example soundex/stage3/soundex3a.py
digits2 = ''
last_digit = ''
for d in digits:
if d != last_digit:
digits2 += d
last_digit = d
soundex3a.py does not run any faster than soundex2c.py, and may even be slightly slower
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage3>python soundex3a.py
Woo W000 13.0359651855
Pilgrim P426 14.5332295026
Flingjingwaller F452 18.2346787092
Let's try something radically different. If it's possible to treat a string as a list of characters, it should be possible to use a list comprehension to iterate through the list. The problem is, the code needs access to the previous character in the list, and that's not easy to do with a straightforward list comprehension.
example soundex/stage3/soundex3b.py
digits2 = "".join([digits[i] for i in range(len(digits))
if i == 0 or digits[i-1] != digits[i]])
Is this faster? In a word, no.
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage3>python soundex3b.py
Woo W000 13.821099609
Pilgrim P426 16.882249839
Flingjingwaller F452 22.9767347526
Python can convert a string into a list of characters with a single command: list('abc') returns ['a', 'b', 'c']. Furthermore, lists can be modified in place very quickly. Instead of incrementally building a new list (or string) out of the source string, why not move elements around within a single list?
example soundex/stage3/soundex3c.py
digits = list(source[0].upper() + source[1:].translate(charToSoundex))
i=0
for item in digits:
if item==digits[i]: continue
i+=1
digits[i]=item
del digits[i+1:]
digits2 = "".join(digits)
Is this faster than soundex3a.py or soundex3b.py? No, in fact it's the slowest method yet:
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage3>python soundex3c.py
Woo W000 16.1901624368
Pilgrim P426 18.2924290657
Flingjingwaller F452 22.718192396
We haven't made any progress here at all, except to try and rule out several “clever” techniques. The fastest code we've seen so far was the original, most straightforward method (soundex2c.py). Sometimes it doesn't pay to be clever.
example 18.5.Best Result So Far:soundex/stage2/soundex2c.py
18.6.Optimizing String Manipulation
example soundex/stage2/soundex2c.py
digits3 = re.sub('9', '', digits2)
while len(digits3) < 4:
digits3 += "0"
return digits3[:4]
The first thing to consider is replacing that regular expression with a loop. This code is from soundex/stage4/soundex4a.py:
example soundex/stage4/soundex4a.py
digits3 = ''
for d in digits2:
if d != '9':
digits3 += d
Is soundex4a.py faster? Yes it is:
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage4>python soundex4a.py
Woo W000 9.3276673432
Pilgrim P426 11.0247101238
Flingjingwaller F452 15.2871535349
But wait a minute. A loop to remove characters from a string? We can use a simple string method for that. Here's soundex/stage4/soundex4b.py:
digits3 = digits2.replace('9', '')
Is soundex4b.py faster? That's an interesting question. It depends on the input:
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage4>python soundex4b.py
Woo W000 6.70664232465
Pilgrim P426 7.46835952614
Flingjingwaller F452 10.2336799789
Performance optimizations aren't always uniform; tuning that makes one case faster can sometimes make other cases slower. In this case, the majority of cases will benefit from the change, so let's leave it at that, but the principle is an important one to remember.
examples:
# 5. pad end with "0"s to 4 characters
while len(digits3) < 4:
digits3 += "0"
# 6. return first 4 characters
return digits3[:4]
example soundex/stage4/soundex4c.py:
digits3 += '000'
return digits3[:4]
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage4>python soundex4c.py
Woo W000 5.39382044366
Pilgrim P426 7.24943058405
Flingjingwaller F452 10.2510975557
example soundex/stage4/soundex4d.py:
return (digits2.replace('9', '') + '000')[:4]
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage4>python soundex4d.py
Woo W000 5.23971342727
Pilgrim P426 7.14076314168
Flingjingwaller F452 10.4394474717
It is also significantly less readable, and for not much performance gain. Is that worth it? I hope you have good comments. Performance isn't everything. Your optimization efforts must always be balanced against threats to your program's readability and maintainability.
英文书地址:
http://diveintopython.org/toc/index.html
Chapter 18.Performance Tuning
Performance tuning is a many-splendored thing. Just because Python is an interpreted language doesn't mean you shouldn't worry about code optimization. But don't worry about it too much.
18.1.Diving in
Let's start here: are you sure you need to do it at all? Is your code really so bad? Is it worth the time to tune it? Over the lifetime of your application, how much time is going to be spent running that code, compared to the time spent waiting for a remote database server, or waiting for user input?
This is not to say that code optimization is worthless, but you need to look at the whole system and decide whether it's the best use of your time. Every minute you spend optimizing code is a minute you're not spending adding new features, or writing documentation, or playing with your kids, or writing unit tests.
There are several subtle variations of the Soundex algorithm. This is the one used in this chapter:
Keep the first letter of the name as-is.
Convert the remaining letters to digits, according to a specific table:
B, F, P, and V become 1.
C, G, J, K, Q, S, X, and Z become 2.
D and T become 3.
L becomes 4.
M and N become 5.
R becomes 6.
All other letters become 9.
Remove consecutive duplicates.
Remove all 9s altogether.
If the result is shorter than four characters (the first letter plus three digits), pad the result with trailing zeros.
if the result is longer than four characters, discard everything after the fourth character.
For example, my name, Pilgrim, becomes P942695. That has no consecutive duplicates, so nothing to do there. Then you remove the 9s, leaving P4265. That's too long, so you discard the excess character, leaving P426.
Another example: Woo becomes W99, which becomes W9, which becomes W, which gets padded with zeros to become W000.
example 18.1.soundex/stage1/soundex1a.py
import string, re
charToSoundex = {"A": "9",
"B": "1",
"C": "2",
"D": "3",
"E": "9",
"F": "1",
"G": "2",
"H": "9",
"I": "9",
"J": "2",
"K": "2",
"L": "4",
"M": "5",
"N": "5",
"O": "9",
"P": "1",
"Q": "2",
"R": "6",
"S": "2",
"T": "3",
"U": "9",
"V": "1",
"W": "9",
"X": "2",
"Y": "9",
"Z": "2"}
def soundex(source):
"convert string to Soundex equivalent"
# Soundex requirements:
# source string must be at least 1 character
# and must consist entirely of letters
allChars = string.uppercase + string.lowercase
if not re.search('^[%s]+$' % allChars, source):
return "0000"
# Soundex algorithm:
# 1. make first character uppercase
source = source[0].upper() + source[1:]
# 2. translate all other characters to Soundex digits
digits = source[0]
for s in source[1:]:
s = s.upper()
digits += charToSoundex[s]
# 3. remove consecutive duplicates
digits2 = digits[0]
for d in digits[1:]:
if digits2[-1] != d:
digits2 += d
# 4. remove all "9"s
digits3 = re.sub('9', '', digits2)
# 5. pad end with "0"s to 4 characters
while len(digits3) < 4:
digits3 += "0"
# 6. return first 4 characters
return digits3[:4]
if __name__ == '__main__':
from timeit import Timer
names = ('Woo', 'Pilgrim', 'Flingjingwaller')
for name in names:
statement = "soundex('%s')" % name
t = Timer(statement, "from __main__ import soundex")
print name.ljust(15), soundex(name), min(t.repeat())
18.2.Using the timeit Module
example 18.2.Introducing timeit
>>> import timeit
>>> t = timeit.Timer("soundex.soundex('Pilgrim')","import soundex")
>>> t.timeit()
7.8085661725163389
>>> t.repeat(3,2000000)
[15.873824745882509, 16.936368728427773, 16.090063681277933]
The timeit module defines one class, Timer, which takes two arguments. Both arguments are strings. The first argument is the statement you wish to time; in this case, you are timing a call to the Soundex function within the soundex with an argument of 'Pilgrim'. The second argument to the Timer class is the import statement that sets up the environment for the statement.
Once you have the Timer object, the easiest thing to do is call timeit(), which calls your function 1 million times and returns the number of seconds it took to do it.
The other major method of the Timer object is repeat(), which takes two optional arguments. The first argument is the number of times to repeat the entire test, and the second argument is the number of times to call the timed statement within each test. Both arguments are optional, and they default to 3 and 1000000 respectively. The repeat() method returns a list of the times each test cycle took, in seconds.
Python has a handy min function that takes a list and returns the smallest value:
>>> min(t.repeat(3,1000000))
8.2049416895164313
18.3.Optimizing Regular Expressions
example soundex/stage1/soundex1a.py
allChars = string.uppercase + string.lowercase
if not re.search('^[%s]+$' % allChars, source):
return "0000"
if __name__ == '__main__':
from timeit import Timer
names = ('Woo', 'Pilgrim', 'Flingjingwaller')
for name in names:
statement = "soundex('%s')" % name
t = Timer(statement, "from __main__ import soundex")
print name.ljust(15), soundex(name), min(t.repeat())
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage1>python soundex1a.py
Woo W000 21.6845738774
Pilgrim P426 25.4686735325
Flingjingwaller F452 33.7811945373
example soundex/stage1/soundex1b.py
if not re.search('^[A-Za-z]+$', source):
return "0000"
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage1>python soundex1b.py
Woo W000 20.6642844272
Pilgrim P426 24.2337135027
Flingjingwaller F452 32.8864372427
example soundex/stage1/soundex1c.py
isOnlyChars = re.compile('^[A-Za-z]+$').search
def soundex(source):
if not isOnlyChars(source):
return "0000"
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage1>python soundex1c.py
Woo W000 17.2678421419
Pilgrim P426 20.8697745104
Flingjingwaller F452 29.3527870162
example soundex/stage1/soudex1d.py
if not source:
return "0000"
for c in source:
if not ('A' <= c <= 'Z') and not ('a' <= c <= 'z'):
return "0000"
It turns out that this technique in soundex1d.py is not faster than using a compiled regular expression (although it is faster than using a non-compiled regular expression):
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage1>python soundex1d.py
Woo W000 17.5661093798
Pilgrim P426 23.8642883383
Flingjingwaller F452 35.9031505401
It turns out that Python offers an obscure string method. You can be excused for not knowing about it, since it's never been mentioned in this book. The method is called isalpha(), and it checks whether a string contains only letters.
example soundex/stage1/soudnex1e.py
if (not source) and (not source.isalpha()):
return "0000"
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage1>python soundex1e.py
Woo W000 15.3807154261
Pilgrim P426 19.2102524203
Flingjingwaller F452 27.7341740361
example 18.3.Best Result So Far:soundex/stage1/soundex1e.py
18.4.Optimizing Dictionary Lookups
example soundex/stage1/soundex1c.py
def soundex(source):
# ... input check omitted for brevity ...
source = source[0].upper() + source[1:]
digits = source[0]
for s in source[1:]:
s = s.upper()
digits += charToSoundex[s]
example soundex/stage2/soundex2a.py
def soundex(source):
# ...
source = source.upper()
digits = source[0] + "".join(map(lambda c: charToSoundex[c], source[1:]))
Surprisingly, soundex2a.py is not faster:
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage2>python soundex2a.py
Woo W000 18.0709193487
Pilgrim P426 21.4308902388
Flingjingwaller F452 28.9357734014
The overhead of the anonymous lambda function kills any performance you gain by dealing with the string as a list of characters.
example soundex/stage2/soundex2b.py uses a list comprehension instead of ↦ and lambda:
source = source.upper()
digits = source[0] + "".join([charToSoundex[c] for c in source[1:]])
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage2>python soundex2b.py
Woo W000 14.9096245473
Pilgrim P426 17.5887675668
Flingjingwaller F452 22.4958635804
It's time for a radically different approach. Dictionary lookups are a general purpose tool. Dictionary keys can be any length string (or many other data types), but in this case we are only dealing with single-character keys and single-character values. It turns out that Python has a specialized function for handling exactly this situation: the string.maketrans function.
example soundex/stage2/soundex2c.py
allChar = string.uppercase + string.lowercase
charToSoundex = string.maketrans(allChar, "91239129922455912623919292" * 2)
def soundex(source):
# ...
digits = source[0].upper() + source[1:].translate(charToSoundex)
string.maketrans creates a translation matrix between two strings: the first argument and the second argument. In this case, the first argument is the string ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz, and the second argument is the string 9123912992245591262391929291239129922455912623919292.
It's the same conversion pattern we were setting up longhand with a dictionary. A maps to 9, B maps to 1, C maps to 2, and so forth. But it's not a dictionary; it's a specialized data structure that you can access using the string method translate, which translates each character into the corresponding digit, according to the matrix defined by string.maketrans.
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage2>python soundex2c.py
Woo W000 13.0717325805
Pilgrim P426 15.2769533558
Flingjingwaller F452 19.284963033
example 18.4.Best Result So Far: soundex/stage2/soundex2c.py
18.5.Optimizing List Operations
Here's the code we have so far, in soundex/stage2/soundex2c.py:
digits2 = digits[0]
for d in digits[1:]:
if digits2[-1] != d:
digits2 += d
The first thing to consider is whether it's efficient to check digits[-1] each time through the loop. Are list indexes expensive? Would we be better off maintaining the last digit in a separate variable, and checking that instead?
example soundex/stage3/soundex3a.py
digits2 = ''
last_digit = ''
for d in digits:
if d != last_digit:
digits2 += d
last_digit = d
soundex3a.py does not run any faster than soundex2c.py, and may even be slightly slower
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage3>python soundex3a.py
Woo W000 13.0359651855
Pilgrim P426 14.5332295026
Flingjingwaller F452 18.2346787092
Let's try something radically different. If it's possible to treat a string as a list of characters, it should be possible to use a list comprehension to iterate through the list. The problem is, the code needs access to the previous character in the list, and that's not easy to do with a straightforward list comprehension.
example soundex/stage3/soundex3b.py
digits2 = "".join([digits[i] for i in range(len(digits))
if i == 0 or digits[i-1] != digits[i]])
Is this faster? In a word, no.
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage3>python soundex3b.py
Woo W000 13.821099609
Pilgrim P426 16.882249839
Flingjingwaller F452 22.9767347526
Python can convert a string into a list of characters with a single command: list('abc') returns ['a', 'b', 'c']. Furthermore, lists can be modified in place very quickly. Instead of incrementally building a new list (or string) out of the source string, why not move elements around within a single list?
example soundex/stage3/soundex3c.py
digits = list(source[0].upper() + source[1:].translate(charToSoundex))
i=0
for item in digits:
if item==digits[i]: continue
i+=1
digits[i]=item
del digits[i+1:]
digits2 = "".join(digits)
Is this faster than soundex3a.py or soundex3b.py? No, in fact it's the slowest method yet:
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage3>python soundex3c.py
Woo W000 16.1901624368
Pilgrim P426 18.2924290657
Flingjingwaller F452 22.718192396
We haven't made any progress here at all, except to try and rule out several “clever” techniques. The fastest code we've seen so far was the original, most straightforward method (soundex2c.py). Sometimes it doesn't pay to be clever.
example 18.5.Best Result So Far:soundex/stage2/soundex2c.py
18.6.Optimizing String Manipulation
example soundex/stage2/soundex2c.py
digits3 = re.sub('9', '', digits2)
while len(digits3) < 4:
digits3 += "0"
return digits3[:4]
The first thing to consider is replacing that regular expression with a loop. This code is from soundex/stage4/soundex4a.py:
example soundex/stage4/soundex4a.py
digits3 = ''
for d in digits2:
if d != '9':
digits3 += d
Is soundex4a.py faster? Yes it is:
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage4>python soundex4a.py
Woo W000 9.3276673432
Pilgrim P426 11.0247101238
Flingjingwaller F452 15.2871535349
But wait a minute. A loop to remove characters from a string? We can use a simple string method for that. Here's soundex/stage4/soundex4b.py:
digits3 = digits2.replace('9', '')
Is soundex4b.py faster? That's an interesting question. It depends on the input:
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage4>python soundex4b.py
Woo W000 6.70664232465
Pilgrim P426 7.46835952614
Flingjingwaller F452 10.2336799789
Performance optimizations aren't always uniform; tuning that makes one case faster can sometimes make other cases slower. In this case, the majority of cases will benefit from the change, so let's leave it at that, but the principle is an important one to remember.
examples:
# 5. pad end with "0"s to 4 characters
while len(digits3) < 4:
digits3 += "0"
# 6. return first 4 characters
return digits3[:4]
example soundex/stage4/soundex4c.py:
digits3 += '000'
return digits3[:4]
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage4>python soundex4c.py
Woo W000 5.39382044366
Pilgrim P426 7.24943058405
Flingjingwaller F452 10.2510975557
example soundex/stage4/soundex4d.py:
return (digits2.replace('9', '') + '000')[:4]
E:\book\opensource\python\diveintopython-5.4\py\soundex\stage4>python soundex4d.py
Woo W000 5.23971342727
Pilgrim P426 7.14076314168
Flingjingwaller F452 10.4394474717
It is also significantly less readable, and for not much performance gain. Is that worth it? I hope you have good comments. Performance isn't everything. Your optimization efforts must always be balanced against threats to your program's readability and maintainability.
发表评论
-
NodeJS12 and Zlib
2020-04-01 07:44 479NodeJS12 and Zlib It works as ... -
Traefik 2020(1)Introduction and Installation
2020-03-29 13:52 339Traefik 2020(1)Introduction and ... -
Private Registry 2020(1)No auth in registry Nginx AUTH for UI
2020-03-18 00:56 440Private Registry 2020(1)No auth ... -
Buffer in NodeJS 12 and NodeJS 8
2020-02-25 06:43 390Buffer in NodeJS 12 and NodeJS ... -
NodeJS ENV Similar to JENV and PyENV
2020-02-25 05:14 482NodeJS ENV Similar to JENV and ... -
Prometheus HA 2020(3)AlertManager Cluster
2020-02-24 01:47 426Prometheus HA 2020(3)AlertManag ... -
Serverless with NodeJS and TencentCloud 2020(5)CRON and Settings
2020-02-24 01:46 340Serverless with NodeJS and Tenc ... -
GraphQL 2019(3)Connect to MySQL
2020-02-24 01:48 252GraphQL 2019(3)Connect to MySQL ... -
GraphQL 2019(2)GraphQL and Deploy to Tencent Cloud
2020-02-24 01:48 454GraphQL 2019(2)GraphQL and Depl ... -
GraphQL 2019(1)Apollo Basic
2020-02-19 01:36 330GraphQL 2019(1)Apollo Basic Cl ... -
Serverless with NodeJS and TencentCloud 2020(4)Multiple Handlers and Running wit
2020-02-19 01:19 316Serverless with NodeJS and Tenc ... -
Serverless with NodeJS and TencentCloud 2020(3)Build Tree and Traverse Tree
2020-02-19 01:19 323Serverless with NodeJS and Tenc ... -
Serverless with NodeJS and TencentCloud 2020(2)Trigger SCF in SCF
2020-02-19 01:18 297Serverless with NodeJS and Tenc ... -
Serverless with NodeJS and TencentCloud 2020(1)Running with Component
2020-02-19 01:17 314Serverless with NodeJS and Tenc ... -
NodeJS MySQL Library and npmjs
2020-02-07 06:21 293NodeJS MySQL Library and npmjs ... -
Python Library 2019(1)requests and aiohttp
2019-12-18 01:12 264Python Library 2019(1)requests ... -
NodeJS Installation 2019
2019-10-20 02:57 577NodeJS Installation 2019 Insta ... -
Monitor Tool 2019(2)Monit on Multiple Instances and Email Alerts
2019-10-18 10:57 271Monitor Tool 2019(2)Monit on Mu ... -
Sqlite Database 2019(1)Sqlite3 Installation and Docker phpsqliteadmin
2019-09-05 11:24 379Sqlite Database 2019(1)Sqlite3 ... -
Supervisor 2019(2)Ubuntu and Multiple Services
2019-08-19 10:53 380Supervisor 2019(2)Ubuntu and Mu ...
相关推荐
Python是一种广泛使用的高级编程语言,以其简洁明了的语法和强大的功能而闻名。《深入Python3(中文版)》是一本系统介绍Python 3的书籍,旨在帮助读者深入学习Python 3的基本知识与应用。本文将根据给定文件的信息...
《Dive Into Python 3中文版》是一本深入学习Python 3编程语言的教程,适合初学者和有一定编程基础的开发者。这本书详细介绍了Python 3的各种特性,包括语法、数据结构、函数、类、模块、异常处理、输入/输出、网络...
《Dive into Python3》的压缩包文件名为diveintopython3-r860-2010-01-13,这可能表示它是2010年1月13日发布的第860个修订版。这个版本可能包含了作者对初版的修正和更新,以适应Python 3的最新发展。 通过阅读这...
PDF版本的《Dive Into Python 中文译文版》(diveintopython-pdfzh-cn-5.4b.zip)提供了完整的书籍内容,涵盖了Python的基础知识到高级特性。书中通过实际案例引导读者深入学习,包括但不限于变量、数据类型、控制...
《Dive Into Python》是一本深受编程初学者和有经验开发者喜爱的Python编程教程。这本书以其深入浅出的讲解方式,让学习者能够快速掌握Python编程语言的核心概念和实际应用,特别是对于想要涉足Web开发领域的读者,...
- **在线地址**:本书可通过官方网址http://diveintopython.org/(英文原版)及http://www.woodpecker.org.cn/diveintopython(中文版)获取。 - **版本更新**:建议通过官方渠道获取最新版本,确保内容的准确性和...
深入python,深入Python (Dive Into Python) 译者序 by limodou 主页(http://phprecord.126.com) Python论坛 本书英文名字为《Dive Into Python》,其发布遵守 GNU 的自由文档许可证(Free Document Lience)的...
dive into python英文原版,Dive Into Python 3 covers Python 3 and its differences from Python 2. Compared to Dive Into Python, it’s about 20% revised and 80% new material. The book is now complete, ...
《Dive Into Python 2 中文版》是一本深度探讨Python编程语言的教程,适合已经有一定编程基础,希望深入理解Python特性和应用的读者。这本书以其详尽的解释和丰富的实例,为Python初学者和进阶者提供了全面的学习...
《深入Python 3》是一本全面且深入介绍Python 3编程语言的电子书籍,旨在帮助读者从...压缩包中的文件“diveintomark-diveintopython3-793871b”很可能是该书的源代码或HTML文件,可以配合阅读,加深对书中示例的理解。
Dive Into Python 3 英文无水印pdf pdf所有页面使用FoxitReader和PDF-XChangeViewer测试都可以打开 本资源转载自网络,如有侵权,请联系上传者或csdn删除 本资源转载自网络,如有侵权,请联系上传者或csdn删除
Didyoureadtheoriginal“DiveIntoPython”?Didyoubuyit onpaper?(Ifso,thanks!)AreyoureadytotaketheplungeintoPython3?…Ifso,readon.(Ifnoneofthat istrue,you’dbebetteroffstartingatthebeginning.) Python3...
《Dive Into Python V5.4》是一本深入学习Python编程语言的经典教程,以其详尽的解释和丰富的实例深受程序员们的喜爱。这个版本是官方提供的最新版本,它不仅包含了PDF格式的完整书籍,还附带了书中所有示例代码,为...
diveintopython-examples-5.4.rardiveintopython-examples-5.4.rardiveintopython-examples-5.4.rardiveintopython-examples-5.4.rar
在“diveintopython3-master”这个压缩包中,包含了这本书的所有源代码示例。通过这些代码,我们可以学习到以下关键知识点: 1. **Python基础**:包括变量、数据类型(如整型、浮点型、字符串、列表、元组、字典)...
《Dive Into Python3》和《深入Python3》是两本深受Python爱好者欢迎的书籍,分别提供了英文和中文的学习资源,旨在帮助读者全面理解和掌握Python3编程语言。这两本书覆盖了Python3的基础语法、高级特性以及实际应用...
Dive Into Python中文版,精心整理,epub版本方便阅读,下载阅读.
### Dive Into Python 3 中文版 - 安装Python 3 #### 标题解析 - **Dive Into Python 3 中文版**:这本书名表明了内容将深入讲解Python 3的各项特性和使用方法,适合希望深入了解Python 3编程语言的读者。 #### ...