dikar

fsync() Across Platforms

Blog category: LINUX

When using MongoDB, the default flush of cached data to disk is done with fsync. I found a good article on the subject, so I am sharing it here.

Reposted from: http://www.humboldt.co.uk/2009/03/fsync-across-platforms.html


When an application writes a file, the data does not become permanent immediately. The write operation first moves the data into the operating system cache in RAM, where it is vulnerable to system crashes and loss of power. The second step is the transfer to the hard disk, which normally has write caching enabled. The disk acknowledges the data straight away, but keeps it in the disk write cache, which is still volatile memory. The data is now safe from a system crash¹, but not from loss of power. On a modern disk, this may be 16MB or more of data in an unknown state.
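The two steps above are visible from a program. The sketch below assumes a POSIX-like system, and the helper name `write_durably` is hypothetical. Writing to or flushing a file object only moves the bytes into the operating system cache. The explicit `os.fsync()` call is what asks the kernel to send them to the disk. Whether that call also empties the disk's own write cache depends on the platform, as the sections below describe.

```python
import os


def write_durably(path: str, data: bytes) -> None:
    """Write data to path and ask the OS to push it to the disk (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(data)         # bytes go into Python's userspace buffer
        f.flush()             # hand them to the OS; they now sit in the OS cache in RAM
        os.fsync(f.fileno())  # ask the kernel to transfer the cached data to the disk
```

A caller would use it as `write_durably("state.dat", b"payload")`. For a newly created file, the Linux fsync(2) man page adds a further requirement. An explicit fsync() on the containing directory is also needed to ensure that the directory entry has reached the disk.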

As performance enhancements in ext4 have made committing data to disk a contentious issue, I’ve written a note on how different platforms handle data consistency.

Delayed Allocation

The root of the latest problem is an optimisation called delayed allocation. In delayed allocation, the file system does not decide where on the disk to save the file until it is necessary to transfer the data to the disk. Linux users have become accustomed to the ordered data mode of ext3, where file data is written to disk before changes to metadata². Ordered data mode only writes out data when it knows the destination on disk, so with delayed allocation the metadata will go to disk first. If an application creates a file before a system crash, the file may exist after the crash, but with zero length. This caused user complaints when implemented in XFS, and again when implemented in ext4.

ext4 has implemented a workaround for the common case of creating a new file with a temporary name, then renaming it to its final name. This produces an approximation to the ext3 behaviour by allocating blocks when a file is renamed.
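An application can make the temporary-name-then-rename pattern safe without relying on the ext4 heuristic, in two steps:

1. Call fsync() on the new file before the rename, so its data is on disk before any name points at it.
2. Call fsync() on the directory, so the rename itself is recorded. Per footnote 2, the file name is not part of the file's metadata.

The sketch below is POSIX-only. The helper name `replace_atomically` and the `.tmp` suffix are hypothetical choices. Python documents `os.replace()` as an atomic rename on POSIX when source and destination are on the same file system.

```python
import os


def replace_atomically(path: str, data: bytes) -> None:
    """Replace path with data so a crash should leave either the old or the new contents."""
    tmp_path = path + ".tmp"  # hypothetical naming; must live on the same file system
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # file data reaches the disk before the rename
    os.replace(tmp_path, path)  # atomic rename over the old file (POSIX)
    # Persist the directory entry change as well; POSIX-only (fails on Windows).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The data fsync is the step that delayed allocation makes essential. Without it, the rename's metadata may reach the disk before the data, producing the zero-length files described above.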

The Platforms

Apple

Apple implements delayed allocation in HFS+. When an application calls fsync() on a file, HFS+ allocates disk blocks for the file data and transfers that data to the disk, but fsync() does not wait for the disk to finish writing the blocks from its cache. A complete flush to disk requires the F_FULLFSYNC command of fcntl().

Reports of zero length files after crashes are rare on Mac OS X, suggesting that system applications are well behaved and the window of opportunity for corruption is short. It is advisable to call fsync() for safety here.
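On macOS, Python's `fcntl` module exposes `F_FULLFSYNC`, so the full flush described above can be requested from Python as well. The helper name `full_fsync` is hypothetical. The fallback to plain `fsync()` is an assumed design choice, not something the original article prescribes. It covers platforms without the constant and file systems that reject the call.

```python
import os

try:
    import fcntl  # not available on Windows
except ImportError:
    fcntl = None


def full_fsync(fd: int) -> None:
    """Flush fd through the disk's write cache where the platform allows it (hypothetical helper)."""
    full = getattr(fcntl, "F_FULLFSYNC", None) if fcntl is not None else None
    if full is not None:
        try:
            # macOS: ask the drive to commit its write cache to permanent storage.
            fcntl.fcntl(fd, full)
            return
        except OSError:
            pass  # some file systems reject F_FULLFSYNC; fall back to fsync()
    # Other platforms, or fallback: ordinary fsync(), which may leave data in the disk cache.
    os.fsync(fd)
```

F_FULLFSYNC is noticeably slower than fsync(), so it is usually reserved for data that must survive a power loss.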

Microsoft

The allocation strategy of NTFS is not visible externally, but experiments suggest that it does not implement delayed allocation. Applications can open files with the FILE_FLAG_WRITE_THROUGH flag, which causes all writes to be sent directly to the disk. The FlushFileBuffers() call ensures that data from a file is written to the disk, then flushed from the disk's write cache and committed.
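The Win32 calls named above can be reached from Python through `ctypes`. The sketch below is Windows-only and has these properties:

- The helper name `write_through` is hypothetical.
- The numeric constants are the values from the Windows SDK headers.
- Error handling is minimal.

The helper opens the file with `FILE_FLAG_WRITE_THROUGH`, writes, and then calls `FlushFileBuffers()`. For ordinary Python files, CPython documents that `os.fsync()` on Windows calls the MS `_commit()` function instead.

```python
import ctypes
import sys

# Values from the Windows SDK headers.
GENERIC_WRITE = 0x40000000
CREATE_ALWAYS = 2
FILE_ATTRIBUTE_NORMAL = 0x00000080
FILE_FLAG_WRITE_THROUGH = 0x80000000


def write_through(path: str, data: bytes) -> None:
    """Write with FILE_FLAG_WRITE_THROUGH, then FlushFileBuffers (hypothetical, Windows only)."""
    if sys.platform != "win32":
        raise OSError("write_through() uses Win32 APIs and only runs on Windows")
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.CreateFileW.restype = wintypes.HANDLE
    kernel32.CreateFileW.argtypes = [
        wintypes.LPCWSTR, wintypes.DWORD, wintypes.DWORD, ctypes.c_void_p,
        wintypes.DWORD, wintypes.DWORD, wintypes.HANDLE,
    ]
    kernel32.WriteFile.argtypes = [
        wintypes.HANDLE, ctypes.c_char_p, wintypes.DWORD,
        ctypes.POINTER(wintypes.DWORD), ctypes.c_void_p,
    ]
    kernel32.FlushFileBuffers.argtypes = [wintypes.HANDLE]
    kernel32.CloseHandle.argtypes = [wintypes.HANDLE]

    handle = kernel32.CreateFileW(
        path, GENERIC_WRITE, 0, None, CREATE_ALWAYS,
        FILE_ATTRIBUTE_NORMAL | FILE_FLAG_WRITE_THROUGH, None,
    )
    if handle == wintypes.HANDLE(-1).value:  # INVALID_HANDLE_VALUE
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        written = wintypes.DWORD(0)
        # With write-through, WriteFile sends the data on to the disk.
        if not kernel32.WriteFile(handle, data, len(data), ctypes.byref(written), None):
            raise ctypes.WinError(ctypes.get_last_error())
        # Also flush the disk's write cache for this file.
        if not kernel32.FlushFileBuffers(handle):
            raise ctypes.WinError(ctypes.get_last_error())
    finally:
        kernel32.CloseHandle(handle)
```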

Windows Vista and Server 2008 introduce a new mechanism: Transactional NTFS. This allows applications to perform database style transactions in the filesystem, ensuring that a set of file operations either completes or fails entirely.

Linux

The ext3 file system allocates disk blocks immediately on write. When combined with the ordered data mode, this ensures that application data is written consistently. Unfortunately, fsync() on ext3 has developed a reputation as an expensive operation, so developers avoid it. fsync() on ext3 writes all file data to the disk and waits for the disk to commit the data from its write cache, but only if the file metadata has changed and the file system is not mapped via LVM³.

fsync() on ext4 allocates disk blocks for the file data, then writes the data to disk and waits for the disk to commit the data from its write cache, with the same limitations as ext3.

Unfortunately, Linux does not offer an equivalent of F_FULLFSYNC or FlushFileBuffers() unless the hard disk write cache is disabled.

Summary

The table below shows how to achieve different levels of consistency on recent versions of each platform covered above.

Consistency level                       Mac OS X      Windows                   Linux
Write to disk without cache flush       fsync()       FILE_FLAG_WRITE_THROUGH   fsync()
Write to disk and wait for cache flush  F_FULLFSYNC   FlushFileBuffers()        None⁴
Transactions                            None          Transactional NTFS        None

¹ Ignoring some worst case scenarios.
² Metadata is data about the file, such as timestamps and permissions. On most Linux filesystems the file name is not part of the metadata, as the file may have multiple hard links.
³ This should be fixed in kernel 2.6.29 for simple cases.
⁴ If the hard disk write cache is disabled, fsync() on ext3 and ext4 will provide a complete sync to disk.

2011-12-09 14:31
Category: Operating systems