Advantages of Kosmix's KFS vs. HDFS

A post about KFS vs. HDFS

October 02, 2007


I was excited to learn last week that my friends at Kosmix have decided to open source a project long in the works: the Kosmix Distributed File System, or KFS (see the official blog post). A number of people have commented on this release, including Ethan Stock of zVents, who plans to use KFS along with their HyperTable clone of BigTable, and Rich Skrenta, who gives an excellent list of features of KFS.

Now, as a dumb product manager, my biggest questions were about KFS vs. HDFS, the distributed file system built by the Hadoop project. Powerset already makes extensive use of the Hadoop stack, including HDFS. So I asked Sriram Rao, the lead engineer of KFS, if he could explain to me what the difference is between HDFS and KFS. Here are some of his answers, which I think give more insight into why Kosmix chose to build KFS.

So why did Kosmix build KFS instead of using HDFS? Apparently, KFS and HDFS were developed in parallel. KFS was implemented over 2006-2007, and Kosmix now feels it is in a releasable state. One reason to stick with KFS over HDFS is that HDFS is written in Java while Kosmix's back end is written in C++, and they were worried about the speed of the JNI interface.

File writing - HDFS follows a write-once, read-many model: when writing a file, you have to write from the start to the end, and that is it. Conversely, in KFS you can write to a file as many times as you want, write anywhere in the file (i.e., seek and write), and append to an existing file. I've heard that Yahoo is working to fix this problem in HDFS, but it still isn't implemented.

Data integrity - Currently, with HDFS, after you write to a file, the data becomes “visible” to other apps only when the application closes the file. So, if the process crashes before closing the file, the written data is lost. With KFS, the data becomes visible when it gets pushed out to the chunkservers. For performance, clients cache data; when the cache is full or when the application chooses, the data gets flushed out.

Data rebalancing - KFS has rudimentary support for automatic rebalancing. When you add new nodes or space utilization changes among nodes, the system may migrate chunks from over-utilized nodes to under-utilized nodes. HDFS doesn’t have such support now.
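
To make the rebalancing idea a bit more concrete, here is a toy sketch in Python of what "migrate chunks from over-utilized nodes to under-utilized nodes" could look like. This is not KFS code and I have no idea how KFS actually decides what to move: the function name `plan_rebalance`, the `high`/`low` utilization thresholds, the equal-sized chunks, and the per-node capacity are all assumptions made up for illustration.

```python
from typing import Dict, List, Tuple


def plan_rebalance(
    chunks: Dict[str, List[str]],
    capacity: int,
    high: float = 0.8,   # hypothetical "over-utilized" threshold
    low: float = 0.5,    # hypothetical "under-utilized" threshold
) -> List[Tuple[str, str, str]]:
    """Return a list of (chunk_id, source_node, destination_node) moves.

    Toy model: every chunk has the same size and every node can hold
    `capacity` chunks. Not the real KFS policy, just a greedy sketch.
    """
    # Work on a copy so the caller's view of the cluster is not mutated.
    state = {node: list(ids) for node, ids in chunks.items()}
    moves: List[Tuple[str, str, str]] = []

    def utilization(node: str) -> float:
        return len(state[node]) / capacity

    while True:
        src = max(state, key=utilization)  # fullest node
        dst = min(state, key=utilization)  # emptiest node
        # Stop when nothing is over-utilized or nothing is under-utilized.
        if utilization(src) <= high or utilization(dst) >= low:
            break
        # Guard against ping-ponging a chunk between two near-equal nodes.
        if len(state[src]) - len(state[dst]) < 2:
            break
        chunk = state[src].pop()
        state[dst].append(chunk)
        moves.append((chunk, src, dst))
    return moves


# Example cluster: node "a" is nearly full, "b" is nearly empty.
cluster = {
    "a": [f"c{i}" for i in range(9)],
    "b": ["c9", "c10"],
    "c": ["c11", "c12", "c13", "c14", "c15"],
}
moves = plan_rebalance(cluster, capacity=10)
for chunk, src, dst in moves:
    print(f"move {chunk}: {src} -> {dst}")
```

The sketch only plans moves; a real system would also have to copy the chunk data, keep replicas on distinct nodes, and throttle migration so it doesn't compete with client traffic.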

Hopefully I transcribed these accurately! Definitely check out the KFS project, as the more people contributing, the better. Powerset will be evaluating KFS in the coming weeks to see if it has any features that can propel us ahead of using HDFS.
