`
sillycat
  • 浏览: 2556071 次
  • 性别: Icon_minigender_1
  • 来自: 成都
社区版块
存档分类
最新评论

DiveIntoPython(十四)

阅读更多
DiveIntoPython(十四)

英文书地址:
http://diveintopython.org/toc/index.html

Chapter 15.Refactoring

15.1.Handling bugs

example 15.1.The bug
>>> import roman5
>>> roman5.fromRoman("")
0
>>>

Remember in the previous section when you kept seeing that an empty string would match the regular expression you were using to check for valid Roman numerals? Well, it turns out that this is still true for the final version of the regular expression. And that's a bug; you want an empty string to raise an InvalidRomanNumeralError exception just like any other sequence of characters that don't represent a valid Roman numeral.

example 15.2.Testing for the bug(romantest61.py)
class FromRomanBadInput(unittest.TestCase):                                     

    # previous test cases omitted for clarity (they haven't changed)

    def testBlank(self):
        """fromRoman should fail with blank string"""
        self.assertRaises(roman.InvalidRomanNumeralError, roman.fromRoman, "")

example 15.3.Output of romantest61.py against roman61.py
E:\book\opensource\python\diveintopython-5.4\py\roman\stage6>python romantest61.py
..F..........
======================================================================
FAIL: fromRoman should fail with blank string
----------------------------------------------------------------------
Traceback (most recent call last):
  File "romantest61.py", line 123, in testBlank
    self.assertRaises(roman61.InvalidRomanNumeralError, roman61.fromRoman, "")
AssertionError: InvalidRomanNumeralError not raised

----------------------------------------------------------------------
Ran 13 tests in 0.328s

FAILED (failures=1)

example 15.4.Fixing the bug(roman62.py)

def fromRoman(s):
    """convert Roman numeral to integer"""
    if not s:
        raise InvalidRomanNumeralError, 'Input can not be blank'
    if not re.search(romanNumeralPattern, s):
        raise InvalidRomanNumeralError, 'Invalid Roman numeral: %s' % s

    result = 0
    index = 0
    for numeral, integer in romanNumeralMap:
        while s[index:index+len(numeral)] == numeral:
            result += integer
            index += len(numeral)
    return result

example 15.5.Output of romantest62.py against roman62.py

15.2.Handling changing requirements
Most customers don't know what they want until they see it, and even if they do, they aren't that good at articulating what they want precisely enough to be useful.And even if they do, they'll want more in the next release anyway. So be prepared to update your test cases as requirements change.

Suppose, for instance, that you wanted to expand the range of the Roman numeral conversion functions. Remember the rule that said that no character could be repeated more than three times? Well, the Romans were willing to make an exception to that rule by having 4 M characters in a row to represent 4000. If you make this change, you'll be able to expand the range of convertible numbers from 1..3999 to 1..4999.

example 15.6.Modifying test cases for new requirements(romantest71.py)
class KnownValues(unittest.TestCase):
    knownValues = (
...snip...
                    (4000, 'MMMM'),                                      
                    (4500, 'MMMMD'),
                    (4888, 'MMMMDCCCLXXXVIII'),
                    (4999, 'MMMMCMXCIX'))

The existing known values don't change (they're all still reasonable values to test), but you need to add a few more in the 4000 range. Here I've included 4000 (the shortest), 4500 (the second shortest), 4888 (the longest), and 4999 (the largest).

example 15.7.Output of romantest71.py against roman71.py

example 15.8.Coding the new requirements(roman72.py)
def toRoman(n):
    """convert integer to Roman numeral"""
    if not (0 < n < 5000):                                                        
        raise OutOfRangeError, "number out of range (must be 1..4999)"
    if int(n) <> n:
        raise NotIntegerError, "non-integers can not be converted"

    result = ""
    for numeral, integer in romanNumeralMap:
        while n >= integer:
            result += numeral
            n -= integer
    return result

#Define pattern to detect valid Roman numerals
romanNumeralPattern = '^M?M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$'

def fromRoman(s):
    """convert Roman numeral to integer"""
    if not s:
        raise InvalidRomanNumeralError, 'Input can not be blank'
    if not re.search(romanNumeralPattern, s):
        raise InvalidRomanNumeralError, 'Invalid Roman numeral: %s' % s

    result = 0
    index = 0
    for numeral, integer in romanNumeralMap:
        while s[index:index+len(numeral)] == numeral:
            result += integer
            index += len(numeral)
    return result

toRoman only needs one small change, in the range check. Where you used to check 0 < n < 4000, you now check 0 < n < 5000. And you change the error message that you raise to reflect the new acceptable range (1..4999 instead of 1..3999). You don't need to make any changes to the rest of the function; it handles the new cases already. (It merrily adds 'M' for each thousand that it finds; given 4000, it will spit out 'MMMM'. The only reason it didn't do this before is that you explicitly stopped it with the range check.)

The only change is to romanNumeralPattern; if you look closely, you'll notice that you added another optional M in the first section of the regular expression. This will allow up to 4 M characters instead of 3, meaning you will allow the Roman numeral equivalents of 4999 instead of 3999.

example 15.9.Output of romantest72.py against roman72.py

15.3.Refactoring
The best thing about comprehensive unit testing is not the feeling you get when all your test cases finally pass, or even the feeling you get when someone else blames you for breaking their code and you can actually prove that you didn't. The best thing about unit testing is that it gives you the freedom to refactor mercilessly.

Refactoring is the process of taking working code and making it work better. Usually, “better” means “faster”, although it can also mean “using less memory”, or “using less disk space”, or simply “more elegantly”. Whatever it means to you, to your project, in your environment, refactoring is important to the long-term health of any program.

It's probably not worth trying to do away with the regular expression altogether (it would be difficult, and it might not end up any faster), but you can speed up the function by precompiling the regular expression.

example 15.10.Compiling regular expressions
>>> import re
>>> pattern = '^M?M?M?$'
>>> re.search(pattern,'M')
<_sre.SRE_Match object at 0x013985D0>
>>> compiledPattern = re.compile(pattern)
>>> compiledPattern
<_sre.SRE_Pattern object at 0x01397380>
>>> dir(compiledPattern)
['__copy__', '__deepcopy__', 'findall', 'finditer', 'match', 'scanner', 'search', 'split', 'sub', 'subn']
>>> compiledPattern.search('M')
<_sre.SRE_Match object at 0x01398608>

This is the new syntax: re.compile takes a regular expression as a string and returns a pattern object. Note there is no string to match here. Compiling a regular expression has nothing to do with matching it against any specific strings (like 'M'); it only involves the regular expression itself.

The compiled pattern object returned from re.compile has several useful-looking functions, including several (like search and sub) that are available directly in the re module.

Whenever you are going to use a regular expression more than once, you should compile it to get a pattern object, then call the methods on the pattern object directly.

example 15.11.Compiled regular expressions in roman81.py

romanNumeralPattern = \
    re.compile('^M?M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$')

def fromRoman(s):
    """convert Roman numeral to integer"""
    if not s:
        raise InvalidRomanNumeralError, 'Input can not be blank'
    if not romanNumeralPattern.search(s):                                   
        raise InvalidRomanNumeralError, 'Invalid Roman numeral: %s' % s

    result = 0
    index = 0
    for numeral, integer in romanNumeralMap:
        while s[index:index+len(numeral)] == numeral:
            result += integer
            index += len(numeral)
    return result

This looks very similar, but in fact a lot has changed. romanNumeralPattern is no longer a string; it is a pattern object which was returned from re.compile.

example 15.12.Output of romantest81.py against roman81.py
E:\book\opensource\python\diveintopython-5.4\py\roman\stage8>python romantest81.py
.............
----------------------------------------------------------------------
Ran 13 tests in 0.375s

example 15.13.roman82.py
#old version
#romanNumeralPattern = \
#   re.compile('^M?M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$')

#new version
romanNumeralPattern = \
    re.compile('^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$')

example 15.14.Output of romantest82.py against roman82.py
E:\book\opensource\python\diveintopython-5.4\py\roman\stage8>python romantest82.py
.............
----------------------------------------------------------------------
Ran 13 tests in 0.375s

I time-tested just the regular expressions, and found that the search function is 11% faster with this syntax.

example 15.15.roman83.py
#old version
#romanNumeralPattern = \
#   re.compile('^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$')

#new version
romanNumeralPattern = re.compile('''
    ^                   # beginning of string
    M{0,4}              # thousands - 0 to 4 M's
    (CM|CD|D?C{0,3})    # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
                        #            or 500-800 (D, followed by 0 to 3 C's)
    (XC|XL|L?X{0,3})    # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
                        #        or 50-80 (L, followed by 0 to 3 X's)
    (IX|IV|V?I{0,3})    # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
                        #        or 5-8 (V, followed by 0 to 3 I's)
    $                   # end of string
    ''', re.VERBOSE)

the re.VERBOSE flag, which tells Python that there are in-line comments within the regular expression itself. The comments and all the whitespace around them are not considered part of the regular expression; the re.compile function simply strips them all out when it compiles the expression. This new, “verbose” version is identical to the old version, but it is infinitely more readable.

example 15.16.Output of romantest83.py against roman83.py

This new, “verbose” version runs at exactly the same speed as the old version. In fact, the compiled pattern objects are the same, since the re.compile function strips out all the stuff you added.

15.4.Postscript
The biggest headache (and performance drain) in the program as it is currently written is the regular expression, which is required because you have no other way of breaking down a Roman numeral. But there's only 5000 of them; why don't you just build a lookup table once, then simply read that?

example 15.17.roman9.py
#Roman numerals must be less than 5000
MAX_ROMAN_NUMERAL = 4999

#Define digit mapping
romanNumeralMap = (('M',  1000),
                   ('CM', 900),
                   ('D',  500),
                   ('CD', 400),
                   ('C',  100),
                   ('XC', 90),
                   ('L',  50),
                   ('XL', 40),
                   ('X',  10),
                   ('IX', 9),
                   ('V',  5),
                   ('IV', 4),
                   ('I',  1))

#Create tables for fast conversion of roman numerals.
#See fillLookupTables() below.
toRomanTable = [ None ]  # Skip an index since Roman numerals have no zero
fromRomanTable = {}

def toRoman(n):
    """convert integer to Roman numeral"""
    if not (0 < n <= MAX_ROMAN_NUMERAL):
        raise OutOfRangeError, "number out of range (must be 1..%s)" % MAX_ROMAN_NUMERAL
    if int(n) <> n:
        raise NotIntegerError, "non-integers can not be converted"
    return toRomanTable[n]

def fromRoman(s):
    """convert Roman numeral to integer"""
    if not s:
        raise InvalidRomanNumeralError, "Input can not be blank"
    if not fromRomanTable.has_key(s):
        raise InvalidRomanNumeralError, "Invalid Roman numeral: %s" % s
    return fromRomanTable[s]

def toRomanDynamic(n):
    """convert integer to Roman numeral using dynamic programming"""
    result = ""
    for numeral, integer in romanNumeralMap:
        if n >= integer:
            result = numeral
            n -= integer
            break
    if n > 0:
        result += toRomanTable[n]
    return result

def fillLookupTables():
    """compute all the possible roman numerals"""
    #Save the values in two global tables to convert to and from integers.
    for integer in range(1, MAX_ROMAN_NUMERAL + 1):
        romanNumber = toRomanDynamic(integer)
        toRomanTable.append(romanNumber)
        fromRomanTable[romanNumber] = integer

fillLookupTables()

example 15.18.Output of romantest9.py against roman9.py
E:\book\opensource\python\diveintopython-5.4\py\roman\stage9>python romantest9.py
.............
----------------------------------------------------------------------
Ran 13 tests in 0.078s

OK

It is much faster.

分享到:
评论

相关推荐

    dive into python3 (中文版)

    Python是一种广泛使用的高级编程语言,以其简洁明了的语法和强大的功能而闻名。《深入Python3(中文版)》是一本系统介绍Python 3的书籍,旨在帮助读者深入学习Python 3的基本知识与应用。本文将根据给定文件的信息...

    《Dive Into Python 3中文版》PDF

    《Dive Into Python 3中文版》是一本深入学习Python 3编程语言的教程,适合初学者和有一定编程基础的开发者。这本书详细介绍了Python 3的各种特性,包括语法、数据结构、函数、类、模块、异常处理、输入/输出、网络...

    Dive into Python3

    《Dive into Python3》的压缩包文件名为diveintopython3-r860-2010-01-13,这可能表示它是2010年1月13日发布的第860个修订版。这个版本可能包含了作者对初版的修正和更新,以适应Python 3的最新发展。 通过阅读这...

    Dive Into Python 中文译文版

    PDF版本的《Dive Into Python 中文译文版》(diveintopython-pdfzh-cn-5.4b.zip)提供了完整的书籍内容,涵盖了Python的基础知识到高级特性。书中通过实际案例引导读者深入学习,包括但不限于变量、数据类型、控制...

    DiveIntoPython

    《Dive Into Python》是一本深受编程初学者和有经验开发者喜爱的Python编程教程。这本书以其深入浅出的讲解方式,让学习者能够快速掌握Python编程语言的核心概念和实际应用,特别是对于想要涉足Web开发领域的读者,...

    dive into python(中文版)

    - **在线地址**:本书可通过官方网址http://diveintopython.org/(英文原版)及http://www.woodpecker.org.cn/diveintopython(中文版)获取。 - **版本更新**:建议通过官方渠道获取最新版本,确保内容的准确性和...

    深入Python (Dive Into Python)

    深入python,深入Python (Dive Into Python) 译者序 by limodou 主页(http://phprecord.126.com) Python论坛 本书英文名字为《Dive Into Python》,其发布遵守 GNU 的自由文档许可证(Free Document Lience)的...

    Dive into python

    dive into python英文原版,Dive Into Python 3 covers Python 3 and its differences from Python 2. Compared to Dive Into Python, it’s about 20% revised and 80% new material. The book is now complete, ...

    Dive Into Python 3

    《深入Python 3》是一本全面且深入介绍Python 3编程语言的电子书籍,旨在帮助读者从...压缩包中的文件“diveintomark-diveintopython3-793871b”很可能是该书的源代码或HTML文件,可以配合阅读,加深对书中示例的理解。

    Dive Into Python 2 中文版

    《Dive Into Python 2 中文版》是一本深度探讨Python编程语言的教程,适合已经有一定编程基础,希望深入理解Python特性和应用的读者。这本书以其详尽的解释和丰富的实例,为Python初学者和进阶者提供了全面的学习...

    Dive Into Python 3 无水印pdf

    Dive Into Python 3 英文无水印pdf pdf所有页面使用FoxitReader和PDF-XChangeViewer测试都可以打开 本资源转载自网络,如有侵权,请联系上传者或csdn删除 本资源转载自网络,如有侵权,请联系上传者或csdn删除

    Dive Into Python 3, r870 (2010).pdf

    Didyoureadtheoriginal“DiveIntoPython”?Didyoubuyit onpaper?(Ifso,thanks!)AreyoureadytotaketheplungeintoPython3?…Ifso,readon.(Ifnoneofthat istrue,you’dbebetteroffstartingatthebeginning.) Python3...

    diveintopython3

    在“diveintopython3-master”这个压缩包中,包含了这本书的所有源代码示例。通过这些代码,我们可以学习到以下关键知识点: 1. **Python基础**:包括变量、数据类型(如整型、浮点型、字符串、列表、元组、字典)...

    Dive Into Python V5.4

    《Dive Into Python V5.4》是一本深入学习Python编程语言的经典教程,以其详尽的解释和丰富的实例深受程序员们的喜爱。这个版本是官方提供的最新版本,它不仅包含了PDF格式的完整书籍,还附带了书中所有示例代码,为...

    diveintopython-examples-5.4.rar

    diveintopython-examples-5.4.rardiveintopython-examples-5.4.rardiveintopython-examples-5.4.rardiveintopython-examples-5.4.rar

    dive-into-python3 (英文版)+深入python3(中文版)

    《Dive Into Python3》和《深入Python3》是两本深受Python爱好者欢迎的书籍,分别提供了英文和中文的学习资源,旨在帮助读者全面理解和掌握Python3编程语言。这两本书覆盖了Python3的基础语法、高级特性以及实际应用...

    Dive Into Python中文版

    Dive Into Python中文版,精心整理,epub版本方便阅读,下载阅读.

    Dive Into Python 3 中文版

    ### Dive Into Python 3 中文版 - 安装Python 3 #### 标题解析 - **Dive Into Python 3 中文版**:这本书名表明了内容将深入讲解Python 3的各项特性和使用方法,适合希望深入了解Python 3编程语言的读者。 #### ...

Global site tag (gtag.js) - Google Analytics