DiveIntoPython (5)

sillycat

Blog category:

Scripts

OS Python F#MongoDB Django

Original English book:
http://diveintopython.org/toc/index.html

Chapter 6. Exceptions and File Handling
6.1. Handling Exceptions
Like many other programming languages, Python has exception handling via try...except blocks.

When an error occurs and nothing handles it, the exception is printed (depending on your IDE, perhaps in an intentionally jarring shade of red), and that is that. This is called an unhandled exception. When the exception is raised, there is no code to explicitly notice it and deal with it, so it bubbles its way back to the default behavior built into Python, which is to spit out some debugging information and give up. In the IDE, that's no big deal, but if that happened while your actual Python program was running, the entire program would come to a screeching halt.

Sometimes an exception is really because you have a bug in your code (like accessing a variable that doesn't exist), but many times, an exception is something you can anticipate. If you're opening a file, it might not exist. If you're connecting to a database, it might be unavailable, or you might not have the correct security credentials to access it. If you know a line of code may raise an exception, you should handle the exception using a try...except block.

example 6.1. Opening a Non-Existent File
try:
    fsock = open("/notthere")
except IOError:
    print "The file does not exist, exiting gracefully"
print "This line will always print"

The file does not exist, exiting gracefully
This line will always print

Without the try...except block, opening the non-existent file /notthere would raise an IOError exception. Since there would be no explicit check for an IOError exception, Python would just print out some debugging information about what happened and then give up.

In example 6.1, you try to open the same non-existent file, but this time you do it within a try...except block.

When the open function raises an IOError exception, you're ready for it. The except IOError: line catches the exception and executes your own block of code, which in this case just prints a more pleasant error message. Execution then continues after the block, which is why the last line always prints.
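For comparison, here is a minimal sketch of the same pattern written for Python 3, wrapped in a hypothetical helper named open_or_report. In Python 3, IOError is just an alias for OSError, and a missing file raises its subclass FileNotFoundError, so the same except clause still catches it.

```python
def open_or_report(path):
    """Hypothetical helper: try to open `path` and report what happened."""
    try:
        fsock = open(path)
    except IOError:  # in Python 3, IOError is an alias for OSError
        return "The file does not exist, exiting gracefully"
    fsock.close()
    return "Opened and closed " + path

message = open_or_report("/notthere")
print(message)
```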

6.1.1. Using Exceptions for Other Purposes
The next example demonstrates how to use an exception to support platform-specific functionality. This code comes from the getpass module, a wrapper module for getting a password from the user. Getting a password is accomplished differently on UNIX, Windows, and Mac OS platforms, but this code encapsulates all of those differences.

example 6.2. Supporting Platform-Specific Functionality
try:
    import termios, TERMIOS
except ImportError:
    try:
        import msvcrt
    except ImportError:
        try:
            from EasyDialogs import AskPassword
        except ImportError:
            getpass = default_getpass
        else:
            getpass = AskPassword
    else:
        getpass = win_getpass
else:
    getpass = unix_getpass

A try...except block can have an else clause, like an if statement. If no exception is raised during the try block, the else clause is executed afterwards. In this case, that means that the from EasyDialogs import AskPassword import worked, so you should bind getpass to the AskPassword function. Each of the other try...except blocks has similar else clauses to bind getpass to the appropriate function when you find an import that works.
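The same try...except...else fallback chain can also be written as a loop. This is a sketch in Python 3 syntax with a hypothetical helper, first_available; it relies on importlib.import_module, which raises ImportError (or its subclass ModuleNotFoundError in Python 3.6+) when a module cannot be imported.

```python
import importlib

def first_available(*names):
    """Return the first module in `names` that can be imported, or None."""
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue          # this platform lacks the module; try the next one
        else:
            return module     # runs only if the import above succeeded
    return None

# termios exists on UNIX and msvcrt on Windows; json is a portable last resort.
backend = first_available("termios", "msvcrt", "json")
print(backend.__name__)
```

As in the getpass code, the order of the names encodes the preference, and the else clause is where you commit to the module that worked.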

6.2. Working with File Objects
example 6.3. Opening a File
>>> f = open("d:/data/LastName.mp3","rb")
>>> f
<open file 'd:/data/LastName.mp3', mode 'rb' at 0x0141BB10>
>>> f.mode
'rb'
>>> f.name
'd:/data/LastName.mp3'

The open function can take up to three parameters: a filename, a mode, and a buffering parameter. Only the first one, the filename, is required; the other two are optional. If not specified, the file is opened for reading in text mode. Here you are opening the file for reading in binary mode.

The open function returns a file object (by now, this should not surprise you). A file object has several useful attributes, such as mode and name.

6.2.1. Reading Files
example 6.4. Reading a File
>>> f
<open file 'd:/data/LastName.mp3', mode 'rb' at 0x0141BB10>
>>> f.tell()
0L
>>> f.seek(-128,2)
>>> f.tell()
5313754L
>>> tagData = f.read(128)
>>> tagData
'TAGLast Name \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00Carrie Underwood\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00Last Name \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00                              \x0c'
>>> f.tell()
5313882L

A file object maintains state about the file it has open. The tell method of a file object tells you your current position in the open file. Since you haven't done anything with this file yet, the current position is 0, which is the beginning of the file.

The seek method of a file object moves to another position in the open file. The second parameter specifies what the first one means; 0 means move to an absolute position (counting from the start of the file), 1 means move to a relative position (counting from the current position), and 2 means move to a position relative to the end of the file. Since the MP3 tags you're looking for are stored at the end of the file, you use 2 and tell the file object to move to a position 128 bytes from the end of the file.

The read method reads a specified number of bytes from the open file and returns a string with the data that was read. The optional parameter specifies the maximum number of bytes to read. If no parameter is specified, read will read until the end of the file. (You could have simply said read() here, since you know exactly where you are in the file and you are, in fact, reading the last 128 bytes.) The read data is assigned to the tagData variable, and the current position is updated based on how many bytes were read.

The tell method confirms that the current position has moved. If you do the math, you'll see that after reading 128 bytes, the position has been incremented by 128.
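The same seek/tell/read sequence can be sketched in Python 3 syntax as a hypothetical helper, read_id3v1_block. It works on any binary file object; here an io.BytesIO buffer stands in for a real MP3 opened with mode "rb", and in Python 3 read returns bytes rather than str. The helper checks the size first because seeking to 128 bytes before the end of a shorter real file raises an error.

```python
import io

TAG_SIZE = 128  # an ID3v1 tag occupies the last 128 bytes of an MP3 file

def read_id3v1_block(f):
    """Return the trailing 128-byte block of binary file object `f` if it
    starts with b"TAG"; return None if the file is too short or has no tag."""
    f.seek(0, 2)                 # whence=2: position relative to the end
    if f.tell() < TAG_SIZE:      # at the end, tell() equals the file size
        return None
    f.seek(-TAG_SIZE, 2)         # move to 128 bytes before the end
    block = f.read(TAG_SIZE)     # read advances the position by 128
    return block if block[:3] == b"TAG" else None

# An in-memory stand-in for an MP3: 1000 bytes of audio plus a 128-byte tag.
fake_mp3 = io.BytesIO(b"\xff" * 1000 + b"TAG" + b"\x00" * 125)
block = read_id3v1_block(fake_mp3)
```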

6.2.2. Closing Files
Open files consume system resources, and depending on the file mode, other programs may not be able to access them. It's important to close files as soon as you're finished with them.

example 6.5. Closing a File
>>> f
<open file 'd:/data/LastName.mp3', mode 'rb' at 0x0141BB10>
>>> f.closed
False
>>> f.close()
>>> f
<closed file 'd:/data/LastName.mp3', mode 'rb' at 0x0141BB10>
>>> f.closed
True
>>> f.seek(0)
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
ValueError: I/O operation on closed file
>>> f.tell()
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
ValueError: I/O operation on closed file
>>> f.read()
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
ValueError: I/O operation on closed file
>>> f.close()

The closed attribute of a file object indicates whether the object has a file open or not. In this case, the file is still open (closed is False).

To close a file, call the close method of the file object. This frees the lock (if any) that you were holding on the file, flushes buffered writes (if any) that the system hadn't gotten around to actually writing yet, and releases the system resources.

Just because a file is closed doesn't mean that the file object ceases to exist. The variable f will continue to exist until it goes out of scope or gets manually deleted. However, none of the methods that manipulate an open file will work once the file has been closed; they all raise an exception.

Calling close on a file object whose file is already closed does not raise an exception; it fails silently.
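The usual way to guarantee that a file gets closed is the with statement (built in since Python 2.6), which calls close for you when the block ends, even if an exception is raised inside it. A minimal sketch in Python 3 syntax, using a throwaway file in a temporary directory:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.txt")

with open(path, "w") as f:     # the with statement closes f when the block ends
    f.write("hello")
    closed_inside = f.closed   # False: still inside the block

closed_after = f.closed        # True: the file was closed on exit
f.close()                      # closing an already-closed file is a silent no-op

try:
    f.read()
except ValueError as e:        # I/O on a closed file raises ValueError
    error_message = str(e)
```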

6.2.3. Handling I/O Errors
example 6.6. File Objects in MP3FileInfo
        try:
            fsock = open(filename, "rb", 0)
            try:
                fsock.seek(-128, 2)
                tagdata = fsock.read(128)
            finally:
                fsock.close()
            .
            .
            .
        except IOError:
            pass

This is new: a try...finally block. Once the file has been opened successfully by the open function, you want to make absolutely sure that you close it, even if an exception is raised by the seek or read methods. That's what a try...finally block is for: code in the finally block will always be executed, even if something in the try block raises an exception. Think of it as code that gets executed on the way out, regardless of what happened before.
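To see the guarantee in action without a real disk error, here is a sketch in Python 3 syntax that mirrors the nesting of example 6.6. FakeFile is a hypothetical stand-in whose read method always fails; the inner finally still closes it, and the outer except IOError then swallows the error.

```python
class FakeFile:
    """Hypothetical file-like object whose read() always fails."""
    def __init__(self):
        self.closed = False

    def seek(self, offset, whence=0):
        pass  # pretend the seek succeeded

    def read(self, size=-1):
        raise IOError("simulated read error")

    def close(self):
        self.closed = True


def read_tag(fsock):
    try:
        try:
            fsock.seek(-128, 2)
            return fsock.read(128)
        finally:
            fsock.close()  # runs even though read() raised
    except IOError:
        return None        # the outer handler ignores the error, like `pass` above


fake = FakeFile()
result = read_tag(fake)
```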

6.2.4. Writing to Files
As you would expect, you can also write to files in much the same way that you read from them. There are two basic file modes:
"Append" mode will add data to the end of the file.
"Write" mode will overwrite the file.

Either mode will create the file automatically if it doesn't already exist.

example 6.7. Writing to Files
>>> logfile = open("d:/data/test.log","w")
>>> logfile.write("test success!")
>>> logfile.close()
>>> print file("d:/data/test.log").read()
test success!
>>> logfile = open("d:/data/test.log","a")
>>> logfile.write("line 2")
>>> logfile.close()
>>> print file("d:/data/test.log").read()
test success!line 2
>>> logfile.closed
True
>>> logfile
<closed file 'd:/data/test.log', mode 'a' at 0x0141BBB0>

You can add data to the newly opened file with the write method of the file object returned by open.

file is a synonym for open. This one-liner opens the file, reads its contents, and prints them.

The "a" mode opens test.log for appending. You could do this even if the file didn't exist, because opening a file for appending will create it if necessary. But unlike "w", appending will never harm the existing contents of the file.

You can write a newline with the "\n" character. Since you didn't do this, everything you wrote to the file ended up smooshed together on the same line.
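Here is the same write-then-append sequence as a sketch in Python 3 syntax, this time ending each line with "\n" so the two writes land on separate lines. The log file lives in a temporary directory rather than d:/data so the sketch can run anywhere.

```python
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "test.log")  # throwaway location

with open(log_path, "w") as logfile:   # "w" creates the file, or truncates it if it exists
    logfile.write("test success!\n")   # "\n" ends the line
with open(log_path, "a") as logfile:   # "a" appends and keeps the existing contents
    logfile.write("line 2\n")

with open(log_path) as f:
    contents = f.read()

print(contents)
```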

6.3. Iterating with for Loops
example 6.8. Introducing the for Loop
>>> li = ['a','b','e']
>>> for s in li:
...     print s
...
a
b
e
>>> print "\n".join(li)
a
b
e

The syntax for a for loop is similar to list comprehensions. li is a list, and s will take the value of each element in turn, starting from the first element.

As the "\n".join(li) line shows, you often don't need a loop at all. That is the reason you haven't seen the for loop yet: you haven't needed it. It's amazing how often you use for loops in other languages when all you really want is a join or a list comprehension.

example 6.9. Simple Counters
>>> for i in range(5):
...     print i
...
0
1
2
3
4
>>> li = ['a','b','c','d','e']
>>> for i in range(len(li)):
...     print li[i]
...
a
b
c
d
e

Don't ever do this. This is Visual Basic-style thinking. Break out of it. Just iterate through the list, as shown in the previous example.
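If you really do need each element's position as well as its value, the built-in enumerate (available since Python 2.3) yields (index, element) pairs, so you still never write range(len(li)). A short sketch in Python 3 syntax:

```python
li = ['a', 'b', 'c', 'd', 'e']

lines = []
for i, s in enumerate(li):          # i counts from 0; s is the element itself
    lines.append("%d: %s" % (i, s))

print("\n".join(lines))
```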

example 6.10. Iterating Through a Dictionary
>>> import os
>>> for k,v in os.environ.items():
...     print '%s=%s' % (k,v)
...
TMP=C:\DOCUME~1\sillycat\LOCALS~1\Temp
COMPUTERNAME=LUOHUA
USERDOMAIN=LUOHUA
COMMONPROGRAMFILES=C:\Program Files\Common Files
...snip...
>>> print "\n".join(["%s=%s" % (k,v)
... for k,v in os.environ.items()])
TMP=C:\DOCUME~1\sillycat\LOCALS~1\Temp
COMPUTERNAME=LUOHUA
USERDOMAIN=LUOHUA
...snip...

With multi-variable assignment and list comprehensions, you can replace the entire for loop with a single statement. Whether you actually do this in real code is a matter of personal coding style. I like it because it makes it clear that what I'm doing is mapping a dictionary into a list, then joining the list into a single string. Other programmers prefer to write this out as a for loop. The output is the same in either case, although this version is slightly faster, because there is only one print statement instead of many.
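Both styles can be compared side by side in a small sketch (Python 3 syntax). The env dictionary is a hypothetical stand-in for os.environ so the output is predictable, and the items are sorted because dictionary order is not something to rely on.

```python
# Hypothetical stand-in for os.environ.
env = {"TMP": "C:\\Temp", "COMPUTERNAME": "LUOHUA", "USERDOMAIN": "LUOHUA"}

# Loop version: one append per key/value pair.
loop_lines = []
for k, v in sorted(env.items()):
    loop_lines.append("%s=%s" % (k, v))

# Single-statement version: map the dictionary into a list, then join it.
joined = "\n".join(["%s=%s" % (k, v) for k, v in sorted(env.items())])

print(joined)
```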

example 6.11. for Loop in MP3FileInfo
    tagDataMap = {"title"   : (  3,  33, stripnulls),
                  "artist"  : ( 33,  63, stripnulls),
                  "album"   : ( 63,  93, stripnulls),
                  "year"    : ( 93,  97, stripnulls),
                  "comment" : ( 97, 126, stripnulls),
                  "genre"   : (127, 128, ord)}
    .
    .
    .
            if tagdata[:3] == "TAG":
                for tag, (start, end, parseFunc) in self.tagDataMap.items():
                    self[tag] = parseFunc(tagdata[start:end])

Once you read the last 128 bytes of the file, bytes 3 through 32 of those are always the song title, 33 through 62 are always the artist name, 63 through 92 are the album name, and so forth. Note that tagDataMap is a dictionary of tuples, and each tuple contains two integers and a function reference.

This looks complicated, but it's not. The structure of the for variables matches the structure of the elements of the list returned by items. Remember that items returns a list of tuples of the form (key, value). One element of that list is ("title", (3, 33, <function stripnulls>)) (dictionaries are unordered, so you can't rely on which element comes first), so when the loop reaches it, tag gets "title", start gets 3, end gets 33, and parseFunc gets the function stripnulls.
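As a self-contained sketch in Python 3 syntax, the same table-driven loop can parse a synthetic 128-byte block. parse_id3v1, field, and this version of stripnulls are hypothetical helpers written for the illustration (the book's own stripnulls lives in fileinfo.py). In Python 3 the tag data is bytes, and ord accepts a one-byte bytes object.

```python
def stripnulls(data):
    """Hypothetical helper: turn NUL padding into spaces, then strip."""
    return data.replace(b"\x00", b" ").strip()

# (start, end, parser) for each field of a 128-byte ID3v1 block
tagDataMap = {"title"   : (  3,  33, stripnulls),
              "artist"  : ( 33,  63, stripnulls),
              "album"   : ( 63,  93, stripnulls),
              "year"    : ( 93,  97, stripnulls),
              "comment" : ( 97, 126, stripnulls),
              "genre"   : (127, 128, ord)}

def parse_id3v1(tagdata):
    info = {}
    if tagdata[:3] == b"TAG":
        # Unpack each (key, (start, end, parseFunc)) pair in the loop header.
        for tag, (start, end, parseFunc) in tagDataMap.items():
            info[tag] = parseFunc(tagdata[start:end])
    return info

def field(text, width):
    """Encode `text` and pad it with NULs to exactly `width` bytes."""
    return text.encode("ascii").ljust(width, b"\x00")

sample = (b"TAG" + field("Last Name", 30) + field("Carrie Underwood", 30)
          + field("Last Name", 30) + field("", 4) + field("", 29)
          + b"\x00" + b"\x0c")
info = parse_id3v1(sample)
```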

6.4. Using sys.modules
Modules, like everything else in Python, are objects. Once imported, you can always get a reference to a module through the global dictionary sys.modules.

example 6.12. Introducing sys.modules
>>> import sys
>>> print '\n'.join(sys.modules.keys())
heapq
code
pywin.framework.cmdline
pywin.idle.traceback
pywin.framework.sys
functools
...snip...

>>> print sys.version
2.6.4 (r264:75706, Jan 22 2010, 16:41:54) [MSC v.1500 32 bit (Intel)]
>>> print sys.version_info
(2, 6, 4, 'final', 0)

The sys module contains system-level information, such as the version of Python you're running (sys.version or sys.version_info), and system-level options such as the maximum allowed recursion depth (sys.getrecursionlimit() and sys.setrecursionlimit()).

sys.modules is a dictionary containing all the modules that have ever been imported since Python was started; the key is the module name, the value is the module object. Note that this is more than just the modules your program has imported. Python preloads some modules on startup, and if you're using a Python IDE, sys.modules contains all the modules imported by all the programs you've run within the IDE.

example 6.13. Using sys.modules
>>> import fileinfo
>>> fileinfo
<module 'fileinfo' from 'E:\book\opensource\python\diveintopython-5.4\py\fileinfo.pyc'>
>>> sys.modules['fileinfo']
<module 'fileinfo' from 'E:\book\opensource\python\diveintopython-5.4\py\fileinfo.pyc'>

As new modules are imported, they are added to sys.modules. This explains why importing the same module twice is very fast: Python has already loaded and cached the module in sys.modules, so importing the second time is simply a dictionary lookup.
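A quick way to confirm the caching, sketched in Python 3 syntax (json is just a convenient standard-library module): the object stored in sys.modules and the names bound by repeated imports are all the same module object.

```python
import sys
import json

cached = sys.modules["json"]   # key: module name, value: module object
import json as json_again      # a repeated import is just a lookup in sys.modules

same_object = cached is json and json_again is json
print(same_object)
```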

The next example shows how to use the __module__ class attribute with the sys.modules dictionary to get a reference to the module in which a class is defined.

example 6.14. The __module__ Class Attribute
>>> from fileinfo import MP3FileInfo
>>> MP3FileInfo.__module__
'fileinfo'
>>> sys.modules[MP3FileInfo.__module__]
<module 'fileinfo' from 'E:\book\opensource\python\diveintopython-5.4\py\fileinfo.pyc'>

Every Python class has a built-in class attribute __module__, which is the name of the module in which the class is defined.

Combining this with the sys.modules dictionary, you can get a reference to the module in which a class is defined.
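The same lookup works for any class, which makes it a handy way to find where a class really lives. In this Python 3 sketch, JSONDecoder is importable from the json package, but its __module__ reveals that it is defined in the json.decoder submodule.

```python
import sys
from json import JSONDecoder

module_name = JSONDecoder.__module__   # 'json.decoder': where the class is defined
module = sys.modules[module_name]      # the module object, not just its name
print(module_name)
```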

example 6.15. sys.modules in fileinfo.py
def getFileInfoClass(filename, module=sys.modules[FileInfo.__module__]):
    "get file info class from filename extension"
    subclass = "%sFileInfo" % os.path.splitext(filename)[1].upper()[1:]
    return hasattr(module, subclass) and getattr(module, subclass) or FileInfo

You'll plow through this line later, after you dive into the os module.

In English, this line of code says, “If this module has the class named by subclass then return it, otherwise return the base class FileInfo.”
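The and...or trick works here only because a class object is always true; if the middle operand could ever be false, the expression would silently fall through to FileInfo. getattr also accepts a default as its third argument, which states the same rule directly. A minimal sketch in Python 3 syntax; FileInfo, MP3FileInfo, and the handlers namespace are hypothetical stand-ins for the real fileinfo module.

```python
import os
import types

class FileInfo(dict):
    """Hypothetical stand-in for the base class in fileinfo.py."""

class MP3FileInfo(FileInfo):
    """Hypothetical stand-in for the MP3 handler."""

# Plays the role of sys.modules[FileInfo.__module__] in this sketch.
handlers = types.SimpleNamespace(FileInfo=FileInfo, MP3FileInfo=MP3FileInfo)

def getFileInfoClass(filename, module=handlers):
    "get file info class from filename extension"
    subclass = "%sFileInfo" % os.path.splitext(filename)[1].upper()[1:]
    # getattr with a default replaces `hasattr(...) and getattr(...) or FileInfo`
    return getattr(module, subclass, FileInfo)
```

Adding a handler for another file type then only means adding another attribute (another class) to the module.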

6.5. Working with Directories

example 6.16. Constructing Pathnames
>>> import os
>>> os.path.join("d:/data/","LastName.mp3")
'd:/data/LastName.mp3'
>>> os.path.expanduser("~")
'C:\\Documents and Settings\\sillycat'
>>> os.path.join(os.path.expanduser("~"),"python")
'C:\\Documents and Settings\\sillycat\\python'

expanduser will expand a pathname that uses ~ to represent the current user's home directory. This works on any platform where users have a home directory, like Windows, UNIX, and Mac OS X; it has no effect on classic Mac OS (before OS X).
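A short sketch of both functions in Python 3 syntax; the exact home directory depends on the machine, and the music/ap subdirectory names are hypothetical.

```python
import os

home = os.path.expanduser("~")                   # current user's home directory
music = os.path.join(home, "music", "ap")        # join inserts the right separator
untouched = os.path.expanduser("data/file.txt")  # no leading ~, returned unchanged
print(music)
```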

example 6.17. Splitting Pathnames
>>> os.path.split("d:/data/LastName.mp3")
('d:/data', 'LastName.mp3')
>>> (filepath,filename) = os.path.split('d:/data/LastName.mp3')
>>> filepath
'd:/data'
>>> filename
'LastName.mp3'
>>> (shortname,extension) = os.path.splitext(filename)
>>> shortname
'LastName'
>>> extension
'.mp3'

os.path also contains a function splitext, which splits a filename and returns a tuple containing the filename and the file extension. You use the same technique to assign each of them to separate variables.
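Here are the same two calls in Python 3 syntax. The sketch uses posixpath, the POSIX implementation behind os.path on UNIX, so the results are the same on every host; the pathname is hypothetical. Note that splitext splits off only the last extension.

```python
import posixpath  # POSIX rules regardless of the host OS

filepath, filename = posixpath.split("/music/ap/mahadeva.mp3")
shortname, extension = posixpath.splitext(filename)
multi = posixpath.splitext("archive.tar.gz")  # only the last extension is split off
```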

example 6.18. Listing Directories
>>> os.listdir("d:\\data\\")
['inventory.xls', 'INVESC10010650\xc5\xcc\xb5\xe3.xls', 'LastName.mp3', 'sample.html', 'test.log', 'test.xls']
>>>
>>> dirname = "c:\\"
>>> os.listdir(dirname)
['.rnd', '360Downloads', '360Rec', 'app-engine-patch-sample', 'AUTOEXEC.BAT', 'avatar.xml', 'bar.emf', 'bea', 'boot.ini', 'bootfont.bin', 'CONFIG.SYS', 'DELL', 'Django-1.1.1', 'Documents and Settings', 'DsoFramer', 'google_appengine', 'heap', 'Intel', 'IO.SYS', 'mongodb-win32-i386-1.2.3', 'MSDOS.SYS', 'MSOCache', 'm_agent_attribs.cfg', 'np-mswmp.dll', 'NTDETECT.COM', 'ntldr', 'pagefile.sys', 'Program Files', 'Python26', 'RECYCLER', 'System Volume Information', 'test.dat', 'WINDOWS', 'YingSoft']
>>> [f for f in os.listdir(dirname)
... if os.path.isdir(os.path.join(dirname,f))]
['360Downloads', '360Rec', 'app-engine-patch-sample', 'bea', 'DELL', 'Django-1.1.1', 'Documents and Settings', 'DsoFramer', 'google_appengine', 'heap', 'Intel', 'mongodb-win32-i386-1.2.3', 'MSOCache', 'Program Files', 'Python26', 'RECYCLER', 'System Volume Information', 'WINDOWS', 'YingSoft']

The listdir function takes a pathname and returns a list of the contents of the directory.

listdir returns both files and folders, with no indication of which is which.

os.path also has an isdir function which returns True if the path represents a directory, and False otherwise. You can use this to get a list of the subdirectories within a directory. Since listdir returns bare names, you have to join each name with dirname before testing it.

You can use list filtering and the isfile function of the os.path module to separate the files from the folders. isfile takes a pathname and returns True if the path represents a file, and False otherwise. You can use os.getcwd() to get the current working directory.
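Both filters side by side, as a sketch in Python 3 syntax that builds a throwaway directory (the albums and LastName.mp3 names are hypothetical) so the expected result is known:

```python
import os
import tempfile

# Throwaway directory holding one subdirectory and one file.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "albums"))
open(os.path.join(root, "LastName.mp3"), "w").close()

entries = os.listdir(root)  # bare names, files and folders mixed together
# listdir returns names only, so join each with root before testing it.
dirs = [f for f in entries if os.path.isdir(os.path.join(root, f))]
files = [f for f in entries if os.path.isfile(os.path.join(root, f))]
cwd = os.getcwd()           # current working directory, as an absolute path
```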

example 6.19. Listing Directories in fileinfo.py
def listDirectory(directory, fileExtList):
    "get list of file info objects for files of particular extensions"
    fileList = [os.path.normcase(f)
                for f in os.listdir(directory)]
    fileList = [os.path.join(directory, f)
                for f in fileList
                if os.path.splitext(f)[1] in fileExtList]

Iterating through the list with f, you use os.path.normcase(f) to normalize the case according to operating system defaults. normcase is a useful little function that compensates for case-insensitive operating systems that think that mahadeva.mp3 and mahadeva.MP3 are the same file. For instance, on Windows and classic Mac OS, normcase will convert the entire filename to lowercase; on UNIX-compatible systems, it will return the filename unchanged.

For each file you care about, you use os.path.join(directory, f) to construct the full pathname of the file, and return a list of the full pathnames.
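The filtering half of listDirectory can be tried on its own. This Python 3 sketch uses a hypothetical helper, list_matching_files, and a throwaway directory. On Windows, normcase would also let SONG.MP3 match ".mp3"; on UNIX it returns names unchanged, so the match stays case-sensitive there.

```python
import os
import tempfile

def list_matching_files(directory, file_ext_list):
    """Hypothetical helper: return full paths of files whose extension is listed."""
    names = [os.path.normcase(f) for f in os.listdir(directory)]
    return [os.path.join(directory, f) for f in names
            if os.path.splitext(f)[1] in file_ext_list]

# Throwaway directory with one matching and one non-matching file.
root = tempfile.mkdtemp()
for name in ("song.mp3", "notes.txt"):
    open(os.path.join(root, name), "w").close()

mp3_paths = list_matching_files(root, [".mp3"])
```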

Whenever possible, you should use the functions in os and os.path for file, directory, and path manipulations. These modules are wrappers for platform-specific modules, so functions like os.path.split work on UNIX, Windows, Mac OS, and any other platform supported by Python.

example 6.20. Listing Directories with glob
>>> os.listdir("d:\\data\\")
['inventory.xls', 'INVESC10010650\xc5\xcc\xb5\xe3.xls', 'LastName.mp3', 'sample.html', 'test.log', 'test.xls']
>>> import glob
>>> glob.glob("d:\\data\\*.mp3")
['d:\\data\\LastName.mp3']
>>> glob.glob("d:\\data\\L*.mp3")
['d:\\data\\LastName.mp3']
>>> glob.glob("d:\\*\\*.mp3")
['d:\\data\\LastName.mp3']

The glob module, on the other hand, takes a wildcard and returns the full path of all files and directories matching the wildcard. Here the wildcard is a directory path plus "*.mp3", which will match all .mp3 files. Note that each element of the returned list already includes the full path of the file.

Now consider this scenario: you have a music directory, c:\music, with several subdirectories within it, with .mp3 files within each subdirectory. You can get a list of all of those with a single call to glob, by using two wildcards at once. One wildcard is the "*.mp3" (to match .mp3 files), and one wildcard is within the directory path itself, to match any subdirectory within c:\music. The last call in example 6.20, glob.glob("d:\\*\\*.mp3"), applies the same technique to the subdirectories of d:\.
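The two-wildcard pattern can be sketched in Python 3 syntax against a throwaway music directory; the album names ap and kairo are hypothetical. glob returns results in arbitrary order, so they are sorted here.

```python
import glob
import os
import tempfile

# Throwaway music directory with one .mp3 file in each album subdirectory.
music = tempfile.mkdtemp()
for album in ("ap", "kairo"):
    os.mkdir(os.path.join(music, album))
    open(os.path.join(music, album, album + ".mp3"), "w").close()

# One wildcard in the directory part, one in the file name.
found = sorted(glob.glob(os.path.join(music, "*", "*.mp3")))
print(found)
```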

6.6. Putting It All Together
Once again, all the dominoes are in place. You've seen how each line of code works. Now let's step back and see how it all fits together.

example 6.21. listDirectory
def listDirectory(directory, fileExtList):
    "get list of file info objects for files of particular extensions"
    fileList = [os.path.normcase(f) for f in os.listdir(directory)]
    fileList = [os.path.join(directory, f) for f in fileList \
                if os.path.splitext(f)[1] in fileExtList]
    def getFileInfoClass(filename, module=sys.modules[FileInfo.__module__]):
        "get file info class from filename extension"
        subclass = "%sFileInfo" % os.path.splitext(filename)[1].upper()[1:]
        return hasattr(module, subclass) and getattr(module, subclass) or FileInfo
    return [getFileInfoClass(f)(f) for f in fileList]

The nested function getFileInfoClass can be called only from the function in which it is defined, listDirectory. As with any other function, you don't need an interface declaration or anything fancy; just define the function and code it.

Now that you've seen the os module, this line should make more sense. It gets the extension of the file (os.path.splitext(filename)[1]), forces it to uppercase (.upper()), slices off the dot ([1:]), and constructs a class name out of it with string formatting. So c:\music\ap\mahadeva.mp3 becomes .mp3 becomes .MP3 becomes MP3 becomes MP3FileInfo.

Having constructed the name of the handler class that would handle this file, you check to see if that handler class actually exists in this module. If it does, you return the class, otherwise you return the base class FileInfo. This is a very important point: this function returns a class. Not an instance of a class, but the class itself.

Note that listDirectory is completely generic. It doesn't know ahead of time which types of files it will be getting, or which classes are defined that could potentially handle those files. It inspects the directory for the files to process, and then introspects its own module to see what special handler classes (like MP3FileInfo) are defined. You can extend this program to handle other types of files simply by defining an appropriately-named class: HTMLFileInfo for HTML files, DOCFileInfo for Word .doc files, and so forth. listDirectory will handle them all, without modification, by handing off the real work to the appropriate classes and collating the results.

2010-03-15 22:43
Category: Programming languages