hqman

Six Lessons from Python Development at Dropbox (Syncing One Million Files Every Fifteen Minutes)

 

Dropbox saves one million files every 15 minutes, more than the number of tweets Twitter users post in the same time. That mind-blowing statistic was revealed by Rian Hunter, a Dropbox engineer, in his presentation "How Dropbox Did It and How Python Helped" at PyCon 2011.

The first part of the presentation is Dropbox lore: origin stories and other foundational myths. We learn that Dropbox is a San Francisco startup that makes what is probably one of the most popular file synchronization and sharing tools in the world. It ships Python on the desktop and supports millions of users, with more joining every day.

About halfway through, the talk turns technical. Not much was revealed about how Dropbox handles this massive scale, but there were a number of good lessons to ponder:

  1. Use Python
    • 99.9% of their code is Python. It's used on the server backend, desktop client, website controller logic, API backend, and analytics.
    • They can't use Python on Android due to memory constraints.
    • Dropbox runs on Windows, Mac, and Linux from a single Python code base, using tools like PyObjC, wxPython, ctypes, py2exe, py2app, and PyWin32.
    • Pros: 
      • Developers talk to each other and express ideas in Python
      • Easy to learn, easy to read, easy to write, easy for new people to pick up.
    • Cons: 
      • Don't be silly. 
      • OK, it can use too much memory and be too slow. That's not a big deal on the server side: just buy bigger machines. On the client side, you can't get an old PowerPC user to upgrade.
      • Coding in a mixed Python and C environment creates problems because it's hard to profile across the language boundary, which is exactly what you need to do when fixing memory and CPU problems.
      • Memory fragmentation issues are one reason why scripting languages may not be a good idea for long-running processes.
  2. Just Work Baby
    • Shouldn't matter what file system you are on, what OS you are using, what applications you are using. The product should always just work.
    • Python helped them iterate fast through all the different error cases they experienced on the wide variety of platforms they support.
  3. Release Early
    • Code something in a day and release it. Python makes that easy.
  4. Use C for Inner Loops - Optimizing CPU is easy
    • A way to handle the "too slow" problem.
    • Optimize inner loops to reduce CPU time.
    • Looping in Python vs. C took 2.88s vs. 1.61s, so about 44% of the Python run time was interpreter overhead.
    • Python VM bytecode dispatches are really slow. 
    • Many tools exist for profiling CPU. 
    • CPU optimizations are usually limited to small code sections.
  5. Poll - Polling 30 Million Clients All Over the World Doesn't Scale
    • Created an HTTP notification structure so clients don't have to poll the server.
  6. Custom Memory Allocator - Optimizing Memory is Hard
    • This was their biggest problem for a while. The client could use huge amounts of memory that was never freed: a large sync could take up to 1.5GB, whereas now the client rarely uses more than 100MB.
    • Hard because: 
      • Few tools exist for profiling memory for Python and C
      • Memory bloat has so many causes: leaks in Python and C code; memory fragmentation; inefficient use of memory.
    • Fixing obvious memory inefficiencies didn't help. They thought there was a memory leak, but there wasn't.
    • The problem turned out to be memory fragmentation, which happens when memory blocks of different sizes are continually allocated and freed. Free memory ends up scattered in small holes, so large contiguous blocks can no longer be allocated even when plenty of memory is free in total. CPython never moves live objects to compact the heap (reference counting and the cycle collector free objects but don't defragment), so the holes couldn't be reused and the heap kept growing to satisfy new requests.
    • The solution was a custom allocator. File metadata objects grow a lot during transfers, so the obvious low-hanging fruit was a custom allocator for them, written in C on top of mmap.
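
Lesson 4's figures can be checked with a little arithmetic: (2.88 - 1.61) / 2.88 ≈ 0.44, so roughly 44% of the Python loop's time goes to interpreter overhead. The sketch below, assuming CPython, makes the same kind of comparison by timing a hand-written Python loop against the built-in `sum()`, whose loop runs in C. The function names and workload size are illustrative assumptions, not Dropbox's benchmark, and absolute timings will vary by machine.

```python
import timeit


def python_loop_sum(values):
    # Each iteration runs several bytecodes through the VM's dispatch loop.
    total = 0
    for v in values:
        total += v
    return total


def c_loop_sum(values):
    # The built-in sum() iterates in C, with no per-element bytecode dispatch.
    return sum(values)


def overhead_fraction(slow_seconds, fast_seconds):
    """Share of the slow run time that the fast version avoids."""
    return (slow_seconds - fast_seconds) / slow_seconds


if __name__ == "__main__":
    data = list(range(200_000))  # illustrative workload size (assumption)
    assert python_loop_sum(data) == c_loop_sum(data)
    t_py = timeit.timeit(lambda: python_loop_sum(data), number=20)
    t_c = timeit.timeit(lambda: c_loop_sum(data), number=20)
    print(f"Python loop {t_py:.3f}s, C loop {t_c:.3f}s, "
          f"overhead {overhead_fraction(t_py, t_c):.0%}")
    # Figures quoted in the talk: 2.88s vs. 1.61s.
    print(f"Talk figures: overhead {overhead_fraction(2.88, 1.61):.0%}")  # 44%
```

Moving a hot loop into C, whether via a builtin, a C extension, or a library that already loops in C, is the kind of change lesson 4 describes. It only pays off where a profiler shows that loop dominating.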

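Lesson 5 gives no protocol details beyond "an HTTP notification structure." One common way to build such a structure is long polling. Each client holds a request open until the server has something new or a timeout expires, then reconnects. The sketch below models only the server-side wait/notify logic in-process with `threading.Condition`. `NotificationHub`, its methods, and the version-counter scheme are hypothetical, and a real service would put this behind an HTTP endpoint.

```python
import threading


class NotificationHub:
    """Hypothetical in-process stand-in for a long-poll notification endpoint.

    Clients block in wait_for_change() until the server-side version moves
    past the version they already have, or until the timeout expires,
    instead of asking "anything new?" on a fixed timer.
    """

    def __init__(self):
        self._version = 0
        self._cond = threading.Condition()

    def publish_change(self):
        # Server side: called when something the clients care about changes.
        with self._cond:
            self._version += 1
            self._cond.notify_all()

    def wait_for_change(self, known_version, timeout):
        # Returns the new version, or None if nothing changed before the
        # timeout (the client would then simply reconnect and wait again).
        with self._cond:
            changed = self._cond.wait_for(
                lambda: self._version > known_version, timeout=timeout)
            return self._version if changed else None


if __name__ == "__main__":
    hub = NotificationHub()
    received = []
    client = threading.Thread(
        target=lambda: received.append(hub.wait_for_change(0, timeout=5.0)))
    client.start()
    hub.publish_change()  # server side: a file changed
    client.join()
    print(received)  # [1]
```

With plain polling, every client sends a request every interval whether or not anything changed. With this scheme an idle client costs one open request per timeout period, and changes reach waiting clients as soon as `publish_change()` runs.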
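Lesson 6's allocator is described only as "a custom allocator in C using mmap" for file metadata. The idea behind that kind of allocator can be sketched in Python under stated assumptions. The sketch reserves one anonymous `mmap` region, splits it into fixed-size slots, and keeps a free list. Because every slot has the same size, a freed slot always fits the next request, so allocate/free churn reuses holes rather than fragmenting the heap. `SlotArena`, the slot count, and the 16-byte `META` record layout are hypothetical; this is not Dropbox's code.

```python
import mmap
import struct

# Hypothetical fixed-size metadata record: (file_id, size_in_bytes).
META = struct.Struct("<QQ")


class SlotArena:
    """Fixed-size slot allocator over one anonymous mmap region (a sketch)."""

    def __init__(self, slot_size, slot_count):
        self.slot_size = slot_size
        # fileno -1 requests an anonymous, zero-filled mapping.
        self._buf = mmap.mmap(-1, slot_size * slot_count)
        # Free list of slot indices; pop() returns the most recently freed.
        self._free = list(range(slot_count - 1, -1, -1))

    def alloc(self):
        if not self._free:
            raise MemoryError("arena full")
        return self._free.pop()

    def free(self, slot):
        # Same-sized slots mean any freed slot can serve the next alloc().
        self._free.append(slot)

    def write(self, slot, data):
        if len(data) > self.slot_size:
            raise ValueError("record larger than slot")
        start = slot * self.slot_size
        self._buf[start:start + len(data)] = data

    def read(self, slot, length):
        start = slot * self.slot_size
        return self._buf[start:start + length]


if __name__ == "__main__":
    arena = SlotArena(META.size, slot_count=1024)
    a = arena.alloc()
    arena.write(a, META.pack(42, 4096))
    b = arena.alloc()
    arena.write(b, META.pack(7, 100))
    arena.free(a)
    c = arena.alloc()  # reuses the slot freed by a
    print(a, b, c)  # 0 1 0
    print(META.unpack(arena.read(b, META.size)))  # (7, 100)
```
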
Future Directions

  • Dropbox on toasters. File sharing on toasters will be really big.
  • They see folders as a unifying metaphor for storing, organizing, and accessing data in the cloud and on any device, anywhere, anytime. 


