Everything is Unix

Recently there has been some chatter on various programming blogs about how we should be using classic Unix features to build more scalable infrastructure. This all started when Ryan Tomayko wrote “I like Unicorn because it’s Unix”. The gist of that post was that Eric Wong’s Unicorn, an HTTP server written in Ruby, performs extremely well despite being written in Ruby because Eric wasn’t afraid to drop down to the lower-level Unix system calls instead of using the language’s traditional higher-level abstractions.

Ryan does an excellent job of explaining exactly how following age-old Unix design patterns and using those system calls is the right way to go. He provided the code for a simple pre-forking TCP “echo server” that can handle clients very efficiently. Not to be outdone, advocates of other popular scripting languages stepped forward with examples of doing the same thing. In “Python is Unix”, Jacob Kaplan-Moss provides a Python implementation of the same echo server. In “Perl is Unix”, Aristotle Pagaltzis presents a Perl implementation of a pre-forking echo server as well.

What you notice in each example is that the code is surprisingly readable and simple, letting the operating system do the really heavy lifting. In fact, they’re all very similar. Most of the real differences are syntactic sugar from the particular language. And that’s the whole point of these examples. Linux and Unix have some amazing built-in facilities for solving common problems and they’ve been around for a long time. But the reality is that many of the people coding in higher-level languages like Ruby, Python, or Perl may not even be aware of them. Making matters worse, as Ryan points out, is a lack of documentation. Some of the high-level languages (Ruby in particular) do a poor job of describing the low-level calls they expose or why you might use them. So if you don’t already have more than a basic understanding of Unix systems programming to fall back on, the odds are really not in your favor.

I won’t reiterate the benefits of the networking system calls that Ryan, Jacob, and Aristotle showcase in their examples. But I would like to consider a few other Unixisms that are often overlooked and can make some classes of problems easier to solve.

Multi-process with fork()

Ryan touches on this a bit in his post, but I’d like to draw a bit more attention to the power of the fork() system call. When you have a lot of work to do, using fork() to make one or more “clones” of your process so you can divide and conquer works quite well, especially on multi-core CPUs. Using waitpid(), the parent process has a reliable mechanism for waiting for all the workers to exit().
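To make the divide-and-conquer pattern concrete, here is a minimal Python sketch. The names run_workers and work are made up for illustration, and it assumes each job's result fits in an exit status (0 to 255). The parent forks one child per job and then calls waitpid() on each child in turn.

```python
import os


def run_workers(jobs, work):
    """Fork one child per job and return {pid: exit_status}.

    `work` is a hypothetical callable that takes a job and returns an
    integer exit status between 0 and 255.
    """
    pids = []
    for job in jobs:
        pid = os.fork()
        if pid == 0:
            # Child process: do the work, then exit immediately so we
            # never fall back into the parent's code path.
            try:
                code = int(work(job))
            except BaseException:
                code = 1
            os._exit(code & 0xFF)
        pids.append(pid)

    # Parent process: block until every worker has exited.
    results = {}
    for pid in pids:
        _, raw_status = os.waitpid(pid, 0)
        if os.WIFEXITED(raw_status):
            results[pid] = os.WEXITSTATUS(raw_status)
        else:
            results[pid] = -1  # killed by a signal
    return results


if __name__ == "__main__":
    print(run_workers([0, 3, 7], lambda n: n))
```

The children call os._exit() rather than returning, so a worker can never continue running the parent's loop by accident.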

One of the main benefits of fork() is that child processes will inherit almost everything from their parent. That can be especially helpful if there’s a large amount of identical information that each worker needs to have fast access to. The parent can read that data in before forking and each child will inherit it. Thanks to copy-on-write, the children share the very same physical memory pages the parent had, and a page is copied only when one of the processes writes to it. So if the parent reads in 512MB of data and then forks 10 children, you don’t need 5GB of RAM to support the children. Unless they start modifying the data, memory bloat will not be an issue.

The best part of all is that this is all completely automatic, on by default, and something you get for free. There’s no low-level programming required on your part to get these benefits. The operating system knows how to do these things and does them quite well.
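As a rough illustration of loading data once and sharing it, the Python sketch below has the parent hold a list and fork workers that each sum one slice of it. The name parallel_sum and the use of a pipe to send results back are assumptions made for this example, and it assumes workers is at least 1.

```python
import os


def parallel_sum(data, workers):
    """Sum `data` with `workers` forked children that read the parent's list."""
    chunk = (len(data) + workers - 1) // workers
    children = []
    for i in range(workers):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child process: `data` is already in memory, inherited from the parent.
            os.close(read_fd)
            code = 0
            try:
                part = sum(data[i * chunk:(i + 1) * chunk])
                os.write(write_fd, str(part).encode())
            except BaseException:
                code = 1
            os._exit(code)
        # Parent process: keep only the read end of this child's pipe.
        os.close(write_fd)
        children.append((pid, read_fd))

    total = 0
    for pid, read_fd in children:
        with os.fdopen(read_fd) as pipe:
            total += int(pipe.read())  # read until the child closes its end
        os.waitpid(pid, 0)
    return total


if __name__ == "__main__":
    numbers = list(range(1_000_000))  # loaded once, before forking
    print(parallel_sum(numbers, 4))
```

One caveat for interpreted languages: CPython updates reference counts even when an object is only read, which dirties pages and causes some of them to be copied, so the sharing is usually less complete than it would be in C.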

Atomic File Operations and Locking

Often, when you have multiple processes all trying to fetch data from the same pool or perform the same service, you can end up with a race condition that is hard to recreate and debug. You may end up with multiple processes believing they have exclusive access to a given resource, duplicating each other’s work and potentially causing a myriad of problems that could be challenging to undo.

A classic Unix solution to this problem is to use an atomic file operation. The common choice is to use one of the atomic filesystem metadata operations, such as link() or rename(). The basic idea is this: if you have multiple processes all trying to get exclusive access to a resource, you can use a file to mediate that. Each process will try to create /tmp/data.lock and write its process ID (PID) into it. The process that wins is then allowed to use the resource, removing the file when done. For added safety, processes should check to see if the current lock holder is alive. If not, they may treat the lock as stale and remove it.

But simply trying to create the file and write a PID into it is not atomic. You could check to see if the file exists, create it, and write to it. That’s the simple approach that sounds good on paper, but if multiple processes are trying to do that at once, you end up with a number of possible races. The traditional solution is for each process to create a uniquely named file such as /tmp/data.lock.$PID (though that can be improved too) and then use an atomic file operation that results in creating /tmp/data.lock. The two common choices for an atomic operation are rename() and link().

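The rename() call is what the Unix command mv uses under the hood. It atomically changes the file from its temporary name to the final name, but on POSIX systems it silently replaces an existing destination, so it cannot tell a process that it lost the race. That makes rename() better suited to atomically publishing a finished file than to arbitrating a lock. The link() call will try to create a new directory entry (a hard link) that references the same file, and it fails with EEXIST if that name is already taken, which is exactly the test a lock needs. In either case, the underlying file (the inode) does not change, only the metadata in the file system does. And those changes are atomic.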
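A minimal Python sketch of the link()-based variant might look like the following. The function names are invented for this example, and the liveness check uses kill() with signal 0, which is one common way to probe a PID rather than something prescribed here. Removing a stale lock is itself racy, so the sketch only reports whether the holder is alive.

```python
import os


def acquire_lock(lock_path):
    """Try to take the lock; return True on success, False if it is held."""
    tmp_path = f"{lock_path}.{os.getpid()}"
    with open(tmp_path, "w") as tmp:
        tmp.write(str(os.getpid()))
    try:
        # link() is atomic and fails if lock_path already exists.
        os.link(tmp_path, lock_path)
        return True
    except FileExistsError:
        return False
    finally:
        # The temporary name is no longer needed either way.
        os.unlink(tmp_path)


def lock_holder_alive(lock_path):
    """Report whether the PID recorded in the lock file is a live process."""
    with open(lock_path) as lock:
        pid = int(lock.read())
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def release_lock(lock_path):
    """Give up the lock by removing the lock file."""
    os.unlink(lock_path)


if __name__ == "__main__":
    path = "/tmp/data.lock"
    if acquire_lock(path):
        try:
            print("lock acquired, doing work")
        finally:
            release_lock(path)
    else:
        print("lock is held; holder alive:", lock_holder_alive(path))
```

Because the PID is written to the temporary file before link() is called, the lock file is never visible in a half-written state.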

This works well as long as you’re not using NFS. That’s a whole can of worms all its own.

Others

Those are just a few examples of letting the OS do some of your heavy lifting in sticky situations. If you’ve been working in scripting languages for a while but haven’t spent much time looking at lower-level facilities like this, it might be worth making the time to do so. You might be surprised by how much you can learn and how much better your code could be. In addition to those described so far, I suggest learning more about signals and how your language of choice implements them. And if you have the time and inclination, I highly recommend a copy of Advanced Programming in the Unix Environment (second edition) by W. Richard Stevens and Stephen Rago. It’s truly a classic that provides rich examples of so many useful things that Unix can handle for you.

Have you found yourself simplifying code and making it more reliable by stepping back and letting the lower-level system calls do the hard work? Tell us about it in the comments.
