[Python Source Code Study] bytecode

tomhibolu

Source: xxx.py file or string	==>	Bytecode: may be cached in xxx.pyc	==>	Result
Library: pythonX.dll / libpythonX.X.a	Library: pythonX.dll / libpythonX.X.a
C API: `Py_CompileString***(...)`	C API: `PyEval_Eval***(...)`
Python: compile	Python: eval

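Python code is first compiled to bytecode, and only then executed by the interpreter.
The bytecode can be cached in .pyc or .pyo files.
In the CPython source, bytecode corresponds to a PyCodeObject struct.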

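Generating .pyc files

.py files that are used via import are compiled to .pyc files automatically. How can we compile one by hand?

In interactive mode or from code: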

>>> import py_compile
>>> py_compile.compile('hello.py')
>>>

Or from the command line:

python3 -m py_compile hello.py

The generated file (the result on my machine):

__pycache__/hello.cpython-32.pyc

To compile every file in the current directory to .pyc, use the compileall module:

python -m compileall .

Both py_compile and compileall ultimately rely on the compile() function from the builtins module.
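
The same two steps can also be driven from a script. The following is a minimal sketch, assuming a throwaway temporary directory and a made-up hello.py; the exact name of the resulting .pyc depends on the interpreter version, so it is computed rather than hard-coded.

```python
import compileall
import importlib.util
import os
import py_compile
import tempfile

# Hypothetical scratch directory and source file, created only for this demo.
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "hello.py")
with open(source_path, "w") as f:
    f.write("print('hello')\n")

# Compile a single file; the return value is the path of the .pyc that was written.
pyc_path = py_compile.compile(source_path, doraise=True)

# Where the import system expects the cached bytecode for this source file.
expected_path = importlib.util.cache_from_source(source_path)

# Compile every .py file under the directory; a true result means no failures.
all_ok = compileall.compile_dir(workdir, quiet=1)

print(pyc_path)
```

The printed path ends in something like __pycache__/hello.cpython-NN.pyc, where NN reflects the running interpreter.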

builtins

In the Python runtime, the builtins module provides:

compile()	compiles source to bytecode, producing a code object (PyCodeObject)
eval(), exec()	execute it

An example:

>>> a = "1+2"
>>> b = compile(a, "test.py", 'single')
>>> type(b)
<class 'code'>
>>> eval(b)
3
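
One detail of the session above is worth spelling out: in 'single' mode the 3 is printed by the compiled code itself (through sys.displayhook), and eval() returns None. The sketch below compares the three compile modes; the variable names are made up for the illustration.

```python
import contextlib
import io

source = "1+2"

# 'eval' mode: the code object evaluates to the value of the expression.
eval_code = compile(source, "test.py", "eval")
eval_result = eval(eval_code)

# 'single' mode: the value is passed to sys.displayhook, which prints it;
# eval() itself returns None.
single_code = compile(source, "test.py", "single")
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    single_result = eval(single_code)
printed = captured.getvalue()

# 'exec' mode: statements are run for their side effects in a namespace.
namespace = {}
exec(compile("x = 1 + 2", "test.py", "exec"), namespace)

print(eval_result, single_result, repr(printed), namespace["x"])
```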

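These correspond to the following two families of functions in the C high-level interface:

`Py_CompileString***(...)`	compiles Python code to bytecode
`PyEval_Eval***(...)`	executes that bytecode

Code

compile(), eval() and exec() are functions of the builtins module, so let's look in

Python/bltinmodule.c

at the method table defined there: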

static PyMethodDef builtin_methods[] = {
//...
  {"compile", (PyCFunction)builtin_compile, METH_VARARGS|METH_KEYWORDS, compile_doc},
//...
  {"eval",  builtin_eval,       METH_VARARGS, eval_doc},
  {"exec",  builtin_exec,       METH_VARARGS, exec_doc},
//...
  {NULL,    NULL},
};

Among these:

builtin_compile() calls PyAST_CompileEx or Py_CompileStringExFlags

static PyObject *
builtin_compile(PyObject *self, PyObject *args, PyObject *kwds)
{
....
    is_ast = PyAST_Check(cmd);
    if (is_ast) {
...
            result = (PyObject*)PyAST_CompileEx(mod, filename,
...
        goto finally;
    }
...
    result = Py_CompileStringExFlags(str, filename, start[mode], &cf, optimize);
    goto finally;

finally:
    Py_DECREF(filename_obj);
    return result;
}

eval() calls PyEval_EvalCode (for bytecode) or PyRun_StringFlags (for a string)

static PyObject *
builtin_eval(PyObject *self, PyObject *args)
{
...
    if (PyCode_Check(cmd)) {
        return PyEval_EvalCode(cmd, globals, locals);
    }

    cf.cf_flags = PyCF_SOURCE_IS_UTF8;
    str = source_as_string(cmd, "eval", "string, bytes or code", &cf);
...
    (void)PyEval_MergeCompilerFlags(&cf);
    result = PyRun_StringFlags(str, Py_eval_input, globals, locals, &cf);
    Py_XDECREF(tmp);
    return result;
}

Good, that finally connects the C code to the Python code.

PyCodeObject

The bytecode mentioned above is, concretely, a PyCodeObject in the source (it corresponds to the code type in Python):

Definition

First, the definition of the struct:

/* Bytecode object */
typedef struct {
    PyObject_HEAD
    int co_argcount;            /* #arguments, except *args */
    int co_kwonlyargcount;      /* #keyword only arguments */
    int co_nlocals;             /* #local variables */
    int co_stacksize;           /* #entries needed for evaluation stack */
    int co_flags;               /* CO_..., see below */
    PyObject *co_code;          /* instruction opcodes */
    PyObject *co_consts;        /* list (constants used) */
    PyObject *co_names;         /* list of strings (names used) */
    PyObject *co_varnames;      /* tuple of strings (local variable names) */
    PyObject *co_freevars;      /* tuple of strings (free variable names) */
    PyObject *co_cellvars;      /* tuple of strings (cell variable names) */
    /* The rest doesn't count for hash or comparisons */
    PyObject *co_filename;      /* unicode (where it was loaded from) */
    PyObject *co_name;          /* unicode (name, for reference) */
    int co_firstlineno;         /* first source line number */
    PyObject *co_lnotab;        /* string (encoding addr<->lineno mapping) See
                                   Objects/lnotab_notes.txt for details. */
    void *co_zombieframe;     /* for optimization only (see frameobject.c) */
    PyObject *co_weakreflist;   /* to support weakrefs to code objects */
} PyCodeObject;

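What does each member mean? The comments in the source already explain that, so let's go straight to how to inspect these members from Python.

Inspecting the members of a code object

Python exposes them through a thin wrapper, so we can read these members directly. An example: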

>>> c = compile("1+2", "test.py", "single")
>>> c.co_argcount
0
>>> c.co_code
b'd\x03\x00Fd\x02\x00S'
>>> c.co_consts
(1, 2, None, 3)
>>> c.co_name
'<module>'
>>> c.co_filename
'test.py'
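
The same attributes are available on the code object of an ordinary function, where fields such as co_argcount and co_varnames are more interesting than for a module-level expression. A small sketch, assuming a made-up function add:

```python
def add(a, b):
    total = a + b
    return total

# Every function carries its compiled code object in __code__.
code = add.__code__

print(code.co_name)         # name of the function
print(code.co_argcount)     # number of positional arguments
print(code.co_nlocals)      # number of local variables, arguments included
print(code.co_varnames)     # names of the local variables
print(code.co_filename)     # file the code was compiled from
print(code.co_firstlineno)  # first source line of the function
print(type(code.co_code))   # the raw bytecode is a bytes object
```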

In the compile("1+2", ...) session above, co_code is the bytecode itself: d\x03\x00Fd\x02\x00S

So how do we read these bytes?

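Bytecode

co_code written as decimal byte values: 100 3 0 70 100 2 0 83

100	opcode: LOAD_CONST
3	low byte of the argument: index 3 in co_consts (0-based), the constant 3
0	high byte of the argument
70	opcode: PRINT_EXPR
100	opcode: LOAD_CONST
2	low byte of the argument: index 2 in co_consts (0-based), the constant None
0	high byte of the argument
83	opcode: RETURN_VALUE

The opcodes are defined in Include/opcode.h.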
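
The manual decoding above can be written as a short loop. This is only a sketch for the byte layout shown in this article (Python 3.2, where an opcode of 90 or above is followed by a two-byte little-endian argument); the opcode table holds just the three values quoted above, and the function name decode is made up. CPython 3.6 and later use a different, fixed two-byte instruction format, so this loop does not apply to co_code from a modern interpreter.

```python
# Opcode numbers quoted in the article (Python 3.2's Include/opcode.h).
OPNAMES = {100: "LOAD_CONST", 70: "PRINT_EXPR", 83: "RETURN_VALUE"}
HAVE_ARGUMENT = 90  # in this old layout, opcodes >= 90 carry a 2-byte argument


def decode(co_code):
    """Return (offset, name, argument) tuples for the pre-3.6 bytecode layout."""
    instructions = []
    offset = 0
    while offset < len(co_code):
        opcode = co_code[offset]
        name = OPNAMES.get(opcode, "<%d>" % opcode)
        if opcode >= HAVE_ARGUMENT:
            # The argument is little-endian: low byte first, then high byte.
            arg = co_code[offset + 1] | (co_code[offset + 2] << 8)
            instructions.append((offset, name, arg))
            offset += 3
        else:
            instructions.append((offset, name, None))
            offset += 1
    return instructions


# The literal bytes from the article's session, not taken from the running interpreter.
decoded = decode(b'd\x03\x00Fd\x02\x00S')
for item in decoded:
    print(item)
```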

Reading opcodes this way is painful, though. Fortunately, Python provides the dis module.

dis

Let's use it on the earlier example:

>>> c = compile("1+2", "test.py", "single")
>>> import dis
>>> dis.dis(c)
  1           0 LOAD_CONST               3 (3) 
              3 PRINT_EXPR           
              4 LOAD_CONST               2 (None) 
              7 RETURN_VALUE

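Much clearer. The leading 1 is the source line number, and the number in front of each instruction name is its offset within co_code.

dis is a very useful tool, though I haven't yet learned how to make good use of it.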
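
One possible use, as a sketch: besides printing a listing, dis can hand back the instructions as objects, which makes it easy to inspect what a function compiles to. This assumes Python 3.4 or later (where dis.get_instructions exists) and a made-up function add; the exact instruction names differ between interpreter versions, so no output is shown.

```python
import dis


def add(a, b):
    return a + b


# Each Instruction records its offset, opcode name and a readable argument.
instructions = list(dis.get_instructions(add))
opnames = [ins.opname for ins in instructions]

for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)
```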

2011-09-10 18:25