Repost: Python in Google (notes taken at PyCon)

TITLE OF PAPER: Python at Google
URL OF PRESENTATION: --not available--
PRESENTED BY: Greg Stein
REPRESENTING: Google

CONFERENCE: PyCon 2005
DATE: March 25, 2005
LOCATION: Marvin Theater
--------------------------------------------------------------------------

REAL-TIME NOTES / ANNOTATIONS OF THE PAPER:
{If you've contributed, add your name, e-mail & URL at the bottom}

[ A new copy of the O'Reilly Python Success Stories booklet will be produced.
Contact Stephan Deibel @ pythonology.org ]


"Python has been an important part of Google since the beginning, and remains so as the system grows and evolves. Today dozens of Google engineers use Python, and we're looking for more people with skills in this language"
-- Peter Norvig, Director of Search Quality at Google

My background
Python developer
10 years
Contributed to Python itself
Authored a number of modules and applications
ViewCVS

Open Source Guy
Contributed to numerous projects (including Python)
Current chairman of the Apache Software Foundation
ViewCVS, written entirely in Python
Contributed to Subversion, Apache server

"We consider Python to be our 'secret sauce'"
--Paul Everitt, talking about Digital Creations, circa 1996
This is a recognition of how Python can help a business.

My view of Python in the workplace
Python at eShop
1995 "What in the world is Python?"
1996 "This is great stuff."
(MS acquired eShop in '96)

Python at Microsoft
1996: "It's called what?"
1997: "You actually shipped Python code?" (MerchantServer 1.0)
1998: "Nice prototype. We'll rewrite it in the next version." And they
did, in C++.


Python in the workplace (continued)
Python at CollabNet
2001: "No, we don't really use Python here." (they used Java)
2003: "Definitely! Write that in Python"

Python caught on here like a virus, moving from developer to developer.

Python at Google
2004 "Of *course* we use Python. Why wouldn't we?"


Changing attitudes over time
Small companies eventually "Got it" ahead of the curve
Champion was needed

Larger Companies follow Python's growth curve
Supporting environment was needed

A number of factors made Python possible in larger organizations:
It is now possible. Here's why:
Python had to grow for it to become "business acceptable"
Large enough talent pool - "where are we going to be able to find these people?"
Support services: Books, Consulting, World Wide Web
Follow the trailblazers
Python passed the tipping point years ago
Not a problem to incorporate it into your business, lots of support,
consulting

Business advantage
"These are some of the reasons we use Python at Google"
Highly adaptable
Changing requirements
- You need a language that is very flexible, so you can adapt your tools during development
Changes in computing environment

Rapid development
For new and experienced developers
The market moves very, very quickly; you want to be able to keep up with it. If it takes two years for you to respond to something that is needed today, you're behind the curve.

Easy to maintain - most important point in Greg's view
You can come back a year later, look at that code, and understand what
is going on

Google's programming environment
Primary Languages
C++
Java
Python

If you want to write a piece of something else, like Perl, you have to
almost get special permission. (Exceptions in ops, but for actual
product stuff, see above)

Miscellaneous
Some Perl used by Operations (others almost have to get permission to use Perl)
PHP creeps in for internal webapps
Saw Ruby sneaking around
Small amount of C#

In actual product stuff: C++, Java, Python


SWIG is your friend
SWIG: Simplified Wrapper Interface Generator
www.swig.org
Started by David Beazley
Multi-language environment
A lot of people at Google don't know Python and produce C++ code.
SWIG pulls these "islands" together--they have a lot of stuff lying
around written in various languages. SWIG examines a C++ header file
and auto-generates Python bindings
So for all of our libraries that we have - for parsing HTML,
crawling HTTP and so on - they are made available to Python
using SWIG.
Good for Google programmers who use C++ but don't know Python
Very fast mechanism for integration

Integrated into build system
Makes it very easy for us to add a rule into our build system to just add a library into our Python dependency module

Where do we use it?
Across our internal network
Across a system lifecycle
Live Services

Basic Network
[Diagram of development pushing through infrastructure to servers]

Some usage to support development
Wrappers for Version control (Perforce) (JB note: Perforce can output
marshalled Python objects -- very cool, extremely useful for scripting. Also see svn SWIG mention in Q&A)
They improved branch management.
Running unit tests on checkin
People "earn" their ability to check in after they understand code
guidelines, etc.
Automatically enforce style guidelines
Build System (itself written in Python)
Packaging
We've got giant bundles of code and giant bundles of data which need to
be delivered up to the servers.
Packaging system is built in Python
Third generation of this system
Ability to roll back a version
We can keep iterating and moving forward because we're building all this stuff in Python
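
The note above about Perforce emitting marshalled Python objects can be illustrated with a minimal sketch. It assumes the tool writes a stream of back-to-back marshalled dictionaries (as Perforce's `p4 -G` option is documented to do); the record fields and the in-memory stand-in for the command output are made up for illustration and are not from the talk.

```python
import io
import marshal


def read_marshalled_records(stream):
    """Yield objects from a binary stream of back-to-back marshal dumps."""
    while True:
        try:
            yield marshal.load(stream)
        except EOFError:
            # marshal.load raises EOFError once the stream is exhausted.
            return


# Stand-in for the output of a command such as `p4 -G changes`.
# The keys and values here are hypothetical.
fake_output = io.BytesIO()
for record in ({"code": "stat", "change": "101"},
               {"code": "stat", "change": "102"}):
    marshal.dump(record, fake_output)
fake_output.seek(0)

records = list(read_marshalled_records(fake_output))
changes = [r["change"] for r in records]
print(changes)
```

In a real script the stream would be the standard output of a subprocess rather than a BytesIO buffer, which is what makes this kind of output convenient for wrappers.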

Some usage in the network infrastructure
Binary/data pusher
Figures out best way to send stuff from one place to
another -- dev to data center, etc
We're on third/fourth generation of this, keep increasing the scale of
the problem. Python's making that possible - able to iterate quickly
Package repository

Some usage on production servers
Monitoring
Is this thing still alive? Is it running? Does it think it's healthy? Is
it seeing problems with the hard disk? Is the CPU temperature fine?
All of this information is gathered with a little Python program running on the server, then collected by another Python program.
Auto-restart
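
As a rough illustration of the "little Python program running on the server" idea, here is a minimal sketch of a health probe. The metrics, the threshold, and the function names are assumptions chosen for the example; the talk does not describe Google's actual checks.

```python
import json
import os
import shutil
import time


def collect_health(path="/"):
    """Gather a few basic health figures for the local machine."""
    usage = shutil.disk_usage(path)
    report = {
        "timestamp": time.time(),
        "disk_free_fraction": usage.free / usage.total,
    }
    # os.getloadavg() is not available on every platform (e.g. Windows)
    # and may raise OSError if the load average cannot be obtained.
    if hasattr(os, "getloadavg"):
        try:
            report["load_1min"] = os.getloadavg()[0]
        except OSError:
            pass
    return report


def is_healthy(report, min_disk_free=0.05):
    """Apply a hypothetical threshold; a real policy would check far more."""
    return report["disk_free_fraction"] >= min_disk_free


report = collect_health()
line = json.dumps(report)  # what an agent might send to a collector
print(line, is_healthy(report))
```

A separate collector process could then read one such JSON line per server and decide whether anything needs restarting.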

Complete the Lifecycle
Log reporting
We generate a "large" amount of log information
Data is pulled back from the servers
Analyzed using lots of Python tools
Ad group needs to spot fraudulent clicks. This is a constant cat-and-mouse
game with the script kiddies writing fraudulent ad clickers.
Easy to alter the reports based on ever-changing needs
Every time we find some way people are fraudulently clicking our ads, we
patch that hole. It's a continuous process.
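
To make the "easy to alter the reports" point concrete, here is a small sketch of one kind of report: flagging IP addresses that click the same ad unusually often. The log format, the threshold, and the sample data are all invented for the example and say nothing about how Google actually detects fraud.

```python
from collections import Counter


def suspicious_ips(click_log_lines, max_clicks_per_ad=3):
    """Return IPs that clicked the same ad more often than the threshold."""
    counts = Counter()
    for line in click_log_lines:
        # Assumed log format: "<timestamp> <ip> <ad_id>"
        _timestamp, ip, ad_id = line.split()
        counts[(ip, ad_id)] += 1
    return sorted({ip for (ip, _ad), n in counts.items()
                   if n > max_clicks_per_ad})


sample_log = [
    "1111590000 10.0.0.1 ad42",
    "1111590001 10.0.0.1 ad42",
    "1111590002 10.0.0.2 ad42",
    "1111590003 10.0.0.1 ad42",
    "1111590004 10.0.0.3 ad7",
    "1111590005 10.0.0.1 ad42",
    "1111590006 10.0.0.3 ad7",
]

flagged = suspicious_ips(sample_log)
print(flagged)
```

Changing the rule (a different threshold, a time window, grouping by network) is a few lines of change, which is the kind of quick iteration the notes describe.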


Python-based services
Google Groups
"Python Old-timers" David Jeske and Brandon Long (of eGroups and
Neotonic/ClearSilver) are the leads on Groups.
All built using Python code
Highly pythonic
They didn't use that giant mountain of C++ stuff
code.google.com
Stein and DiBona
Others? We have so much going on...

How code.google.com was built (block diagram)
/\ \/
Front end Stuff
/\ \/
code.google.com
SWIG
Google Stuff

The funky front end stuff deals with denial of service attacks, reporting, blocking IPs known to be bad
We get to take advantage because we've wrapped this
The HTTP server it's built on has all of the reporting and monitoring things on it - the "Google Stuff"


code.google.com
goopy package - support for functional style programming
Functional stuff to start with
Place to put future modules

Closing
We have a lot of Python code, covering a broad range of needs.
Python has helped Google for many, many years.
SWIG is underrated.
I saw a little rant on Guido's blog (Guido shakes head) - it's kind of difficult to get your head wrapped around it but when you need access to some library of functionality from Python you don't need to go and build it yourself - you can use SWIG to wrap it automatically. This fits the Python ideal of smart reuse.
We are now starting to open-source some of the pile.


Questions and Answers (a good 25 minutes for these)

Q: When are you going to open source the build system? (Guido)
A: I don't know. If I recall, Greg has talked about it
Chris DiBona: We're thinking of releasing some of our wrappers around
Perforce first

Q: About SWIG, have you looked at the Boost::Python library?
A: I did see that come up recently; I don't think we use it a lot but it has
been mentioned. I'll take a closer look at it.

Q: What about ctypes?
A: I saw that a while ago on a different project. As far as I know we don't use
it, SWIG works well with our build system
Q: elaborates on ctypes/SWIG differences. While SWIG will build a
Python wrapper for a given C lib, ctypes will let you dynamically load up a C
lib and call its functions.
A: calldll does something similar in the Windows environment.
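
The difference the questioner describes (loading a C library dynamically and calling its functions, with no generated wrapper) looks roughly like the sketch below. It assumes a POSIX-like system where the C standard library can be located at run time. ctypes was a third-party package at the time of the talk and has shipped in the standard library since Python 2.5.

```python
import ctypes
import ctypes.util

# Locate and load the C standard library at run time. On Linux and macOS
# find_library("c") usually returns a library name; if it returns None,
# CDLL(None) on POSIX systems exposes the symbols already loaded into the
# process, which include the C library.
libc_name = ctypes.util.find_library("c")
libc = ctypes.CDLL(libc_name)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"PyCon 2005")
print(length)
```

No build step is involved here, which is the contrast with SWIG: SWIG generates and compiles wrapper code ahead of time, while ctypes describes the C function signatures in Python at run time.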

Q: Do you do anything in regard to network monitoring / SNMP with Python?
A: We do have a very large internal network, lots of traffic, the Ops guys do
have monitors to watch the flow, have to schedule moving large (100 GB or 1
TB-size) files.

Q: (Alex Martelli - who is starting at Google in three days) Back to the
wrapping issue. SWIG and ctypes will not help at all with C++ templates -
Boost is better in this regard. SWIG has been extended to support templates
recently.
A: We do use some templates, but we normally try to avoid them and use SWIG. In
that sense, SWIG works well for us. Some of the template stuff I'd like
better access to, and I end up having to do some extra goo to get things
working.

Q: What is missing from the Python ecosystem?
A: (Anna Ravenscroft, Alex's wife, yells "Alex") But we've solved that problem.

Today they are mostly using Python 2.2, trying to figure out how to use
Python 2.3 -- big upgrade problem

Q: How do you evangelize people who are happy with C++ and SQL and don't seem to
want to try Python?
A: We make it easy to use any of the languages, and don't really force people to
use a different language. The different applications are based on what the
team understands best. We make it easy for all of these things to interact -
if you have a server written in Java we have a custom RPC system that helps
bridge the gap and communicate with other servers.

Q: How many software engineers roughly does Google employ (Steve Holden)?
A: I do know that the public employee count is over 3,000 employees as of
December, but I don't know the break-out in terms of numbers of engineers.
It's hundreds of engineers but I can't really say any more.
Some of the apps written in Java (Blogger) can communicate with C++ using
RPC, so not using Python is not a problem.

Q: You must have masses of linguistic data (terabytes). How do you access that
data so fast?
A: Yes. I don't know, I don't work in that area. As far as speed, "we just
throw servers at it."

Q: Within Google, is there anything for which Python is considered inappropriate?
A: Is there anything where Python is not appropriate? Well yeah, something like
our indexing system where we scan the web pages and produce an index. Python
is good, and fast, and IronPython is even faster, but it's not fast enough.
We use C for that.

For other things, it's based on the engineering team. We make it possible for
the teams to use what language they like.

Personally, I'd like to see more Python, so some of the things I've been
doing have been working on enabling that.

Q: What kind of bug-tracking system do you use?
A: Bug tracking. Our system is not that good.
We have one, anybody in their right mind has one
Bugzilla derivative
MS has an awesome bug tracking system
Even what I had at collab.net was better
Google's looking at different options for fixing that system.

Q: I want to jump in with another comment on wrapping. I have a plotting library
in C++ with heavy use of templates and I tried wrapping it in three different
things (cxx, Boost, and SWIG). SWIG is actually pretty good now, swig
template support is much better than it used to be. Boost makes things way,
way too big.
A: Based on this feedback it seems like Boost is capable in certain environments
and is definitely worth looking at. Need to evaluate before using.

Q: SWIG performance in real time environment?
A: It is a non-issue. However, I was challenged about this at MS: someone said
"Python won't be fast enough!" I said, "how fast does it have to be? 1000
pages per second?" He couldn't say. So I said "then just don't worry about
it unless it proves too slow."
We did go ahead and rewrite some of the Python stuff into ActiveX COM objects
and ASP and... it was slower (laughter and applause).
Much time in Python is spent outside the interpreter loop; much time is
spent, e.g., in the String object, which is written in C.

[On code.google.com] There's still that Global Interpreter Lock in there, but
I still saw some SERIOUS page performance on that thing. Don't be afraid of
bringing Python into your projects. Your bottleneck will be the network
bandwidth (some person on a 56kbps line), not Python.

Q: Mentioned a number of languages used at Google. We use Python because it's
terser (among other reasons). Can you speculate on lines of code in various
languages at Google? (Do you even know total lines of code at Google?)
A: I have no idea. It's a LOT.
Joke from audience: the code counter is still running!
C++ is probably the majority, probably followed by Python.
C++, Python, Java - gut feeling

Q: Five years from now, if people are right about Moore's law, there will be more
multiprocessor systems. What about the project to get rid of the Global
Interpreter Lock that you did a few years ago?
A: Wow. Yeah, that was a few years ago. Back in '96 I made a few patches to
Python 1.4 to get rid of the GIL. We used that at MS to make free threaded
COM objects. We were getting a lot of lock contention. We had to protect
different data structures - like in Python there are pools of frame objects
which had to be protected (??). Things were blocking around those pools. For
2 processors there was a bonus, but for 3 or 4 it was actually slower.

Free threading - Python's thread state was one of the benefits from that set
of patches. sys.exc_info was another.

The Global Interpreter Lock hasn't actually been a problem.

Q: Every once in a while, you are going to introduce a bug into the system. How
do you guys debug across the language boundaries?
A: We don't have any particular tools, or anything like that. Have libraries for
logging. My favorite technique is adding print statements (applause/
laughter). It would be wonderful if we had special tools but we don't.

Some people ask what IDE they should use for cross language Java/Python
development. Eclipse is quite good, but even that doesn't have any
cross-language stuff.


Q: Do you have any current hobby projects that you are working on that you can
talk about?
A: Stuff outside Google they can't tell me not to talk about.
Subversion based wiki (subwiki)
svn exposes its libraries to Python via SWIG
You could build a new svn client or interact with a server from Python
ViewCVS does this
subwiki uses the svn repository to store the wiki pages
Googly stuff - mostly code.google.com

Q: What does Google have to say about web application frameworks?
A: It's a tough one. Lot of stuff set up in C++. code.google.com was not built
using an off-the-shelf framework; we used Google's custom HTTP server.

GMail is not written in Python. I don't actually know if it's C++ or Java. (Chris DiBona: it's Java.)

Q: Followup - is there anything that Google can contribute (via open source) in the web framework arena?
A: Got a lot of stuff we've been talking about moving into the open source arena. Stuff that relies on Google-specific stuff won't be interesting outside of Google.


Q: Tim O'Reilly talked about Google redefining applications. In this view we're
sort of moving away from Google 1.0. When you upgrade, what sort of staging
environment do you have?
A: We definitely have staging environments. That is one of the things built into the
systems I talked about for moving things out. The main web server -
www.google.com - is a BIG chunk of code and data - because we have
translations and stuff for everything. In any case, they're called canary
servers (chuckles from crowd) - we put stuff on the canary servers and see if
they're going to fall over. Also, because we get so much traffic we can turn
a knob and expose something like 1% of our traffic to those servers. If they
don't fall over, we expose some more.
The "turning the knob" is a little command line tool written in Python.
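
One common way to implement this kind of "knob" is to hash each request into a bucket and send a configurable percentage of buckets to the canary servers. The sketch below shows that general technique only; the hashing scheme, the function name, and the request identifiers are assumptions, since the notes say nothing about how Google's tool works internally.

```python
import hashlib


def goes_to_canary(request_id, canary_percent):
    """Deterministically route about canary_percent% of requests to canaries."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 10000  # bucket in 0..9999
    return bucket < canary_percent * 100


# Hypothetical request identifiers, used only to measure the split.
requests = [f"request-{i}" for i in range(20_000)]
share = sum(goes_to_canary(r, 1.0) for r in requests) / len(requests)
print(f"share of traffic on canaries: {share:.3%}")
```

Because the routing is a pure function of the request identifier, the same request always lands on the same side, and raising the percentage only adds traffic to the canaries without reshuffling what was already there.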

Q: (Alex Martelli) Prompted by your mention of unwrapping pieces so they can be
open source. It actually sounds like something that's a very good software
engineering exercise, because it forces decoupling from your proprietary
stuff. Even if we never open source the actual pieces, just having done the
unwrapping seems like a big advantage.
A: It would be a big advantage if we were distributing code. For us, a 50 MB
executable is not a problem, though you'd never try to push that to a client
too often. While it would be an interesting engineering exercise and would
improve the code it has not been a priority.

Chris DiBona followup: Opening your code tends to make it better, for example in our (?)malloc library we said it worked faster for these situations, and when we looked at it we found a bug in our code.


--------------------------------------------------------------------------
REFERENCES: {as documents / sites are referenced add them below}
http://www.swig.org
http://code.google.com

--------------------------------------------------------------------------
QUOTES:
"We don't do that at Microsoft; we ship C++ code"
"Python passed the tipping point years ago"
"[You can] read [Python] in 2 hours, program in it in 2 days, be productive for the company in 2 weeks."
"We use a LOT of SWIG"
"We've got quite a few servers..." (laughter)
"I've worked in large environments before, but nothing on the order of this"
"We have a lot of log data"
"Today we're using primarily Python 2.2 deployed on our servers, but we're trying to work out how to move to Python 2.3."
"Our bug tracking system is not that good"
"Pushing bits out to some guy on a 56k modem IS your bottleneck. Pulling records out of a database is your bottleneck. It's very rarely going to be Python."
"I think we probably have more Python code than we have Java" - a guess
"I think we probably have more Python than we do Java, because of all of those tools and things for supporting the environment, wrappers and all these things."
"Mr. Ascher. That's Dr. Ascher, to you."
"My favourite debugging environment is PRINT."

--------------------------------------------------------------------------
CONTRIBUTORS: {add your name, e-mail address and URL below}
Ted Leung