weed-fs: http://code.google.com/p/weed-fs/
Written in Go, with very little code and just three executables. Quite impressive.
[root@ghost-rider weedfs]# ls
weedclient weedmaster weedvolume
Deployment test.
Server: 192.168.2.100
1. Start the master service:
[root@ghost-rider weedfs]# ./weedmaster
2012/07/25 15:10:15 Volume Size Limit is 32768 MB
2012/07/25 15:10:15 Setting file id sequence 10000
2012/07/25 15:10:15 Start directory service at http://127.0.0.1:9333
2012/07/25 15:13:09 Saving file id sequence 20000 to /tmp/directory.seq
2. Start the volume service, storing data in the local /tmp directory:
weedvolume -dir="/tmp" -volumes=0-4 -mserver="localhost:9333" -port=8080 -publicUrl="localhost:8080" &
[root@ghost-rider weedfs]# 2012/07/25 15:11:03 Store started on dir: /tmp with 5 volumes
2012/07/25 15:11:03 store joined at localhost:9333
2012/07/25 15:11:03 Start storage service at http://127.0.0.1:8080 public url localhost:8080
Client side:
1. First, get an automatically assigned id, a unique identifier for the file:
A:\>curl http://192.168.2.100:9333/dir/assign
{"count":"1","fid":"3,2711f0c5341e","publicUrl":"localhost:8080","url":"127.0.0.
1:8080"}
A:\>ls
12.png 2012-07-25_114343.png despath output_1.jpg temp
2012-07-25_103150.png 2012-07-25_121223.png logo.jpg srcpath
2. Upload 12.png from the current directory to the server:
A:\>curl -F file=@12.png http://192.168.2.100:8080/3,2711f0c5341e
{"size":1049185}
3. The uploaded file can now be accessed directly from a browser:
http://192.168.2.100:8080/3,2711f0c5341e
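As a side note, the same two client steps (assign, then multipart upload) can be scripted. Below is a rough Go sketch assuming only the addresses and file name from this test and the JSON field names visible in the responses above; it is an illustration, not weed-fs client code.

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"os"
)

type assignResult struct {
	Fid       string `json:"fid"`
	URL       string `json:"url"`
	PublicURL string `json:"publicUrl"`
}

func main() {
	// Step 1: ask the master for a file id and a volume server address.
	resp, err := http.Get("http://192.168.2.100:9333/dir/assign")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	var a assignResult
	if err := json.NewDecoder(resp.Body).Decode(&a); err != nil {
		panic(err)
	}

	// Step 2: multipart POST the file content to <volume server>/<fid>.
	// In this single-host setup the returned url is 127.0.0.1:8080; from a
	// remote client you would use the server's reachable address, as the
	// curl command above does with 192.168.2.100:8080.
	f, err := os.Open("12.png")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	var buf bytes.Buffer
	w := multipart.NewWriter(&buf)
	part, _ := w.CreateFormFile("file", "12.png")
	io.Copy(part, f)
	w.Close()

	uploadURL := "http://" + a.URL + "/" + a.Fid
	resp2, err := http.Post(uploadURL, w.FormDataContentType(), &buf)
	if err != nil {
		panic(err)
	}
	defer resp2.Body.Close()
	body, _ := io.ReadAll(resp2.Body)
	fmt.Println(a.Fid, string(body)) // e.g. 3,2711f0c5341e {"size":1049185}
}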
Mount another directory (the -volumes=5-7 flag sets the range of volume ids owned by this server):
[root@ghost-rider weedfs]# mkdir /var/weedfs
[root@ghost-rider weedfs]# ./weedvolume -dir="/var/weedfs" -volumes=5-7 -mserver="localhost:9333" -port=8081 -publicUrl="localhost:8081" &
[3] 31467
[root@ghost-rider weedfs]# 2012/07/25 15:23:35 Store started on dir: /var/weedfs with 3 volumes
2012/07/25 15:23:35 store joined at localhost:9333
2012/07/25 15:23:35 Start storage service at http://127.0.0.1:8081 public url localhost:8081
Check which files the directory now contains:
[root@ghost-rider weedfs]# cd /var/weedfs/
[root@ghost-rider weedfs]# ls
5.dat 5.idx 6.dat 6.idx 7.dat 7.idx
If you change the port in the URL, e.g. http://192.168.2.100:8081/3,2711f0c5341e,
the server panics outright. That handling is clearly not polished yet. Also, is the master a single point of failure? What about replication?
2012/07/25 15:26:49 http: panic serving 192.168.2.151:10935: runtime error: invalid memory address or nil pointer dereference
/home/chris/apps/go/src/pkg/net/http/server.go:576 (0x44e357)
/home/chris/apps/go/src/pkg/runtime/proc.c:1443 (0x411327)
/home/chris/apps/go/src/pkg/runtime/runtime.c:128 (0x411df3)
/home/chris/apps/go/src/pkg/runtime/thread_linux.c:209 (0x414ce6)
/home/chris/apps/go/src/pkg/sync/atomic/asm_amd64.s:12 (0x4e0e6c)
/home/chris/apps/go/src/pkg/sync/mutex.go:40 (0x48b2d2)
/home/chris/dev/workspace/home/weed-fs/src/pkg/storage/volume.go:87 (0x453cb1)
/home/chris/dev/workspace/home/weed-fs/src/pkg/storage/store.go:101 (0x453386)
/home/chris/dev/workspace/home/weed-fs/src/cmd/weedvolume/weedvolume.go:56 (0x4010f9)
/home/chris/dev/workspace/home/weed-fs/src/cmd/weedvolume/weedvolume.go:39 (0x400e3e)
/home/chris/apps/go/src/pkg/net/http/server.go:690 (0x442303)
/home/chris/apps/go/src/pkg/net/http/server.go:926 (0x443185)
/home/chris/apps/go/src/pkg/net/http/server.go:656 (0x442116)
/home/chris/apps/go/src/pkg/runtime/proc.c:271 (0x40f42d)
It turns out the request was simply wrong: port 8081 serves volumes 5-7, while I asked for volume 3. Still, this should surface as a clearer error.
A file id consists of 3 parts: the first number is the volume id, an unsigned 32-bit integer; the second is the file key, an unsigned 64-bit integer; the third is a cookie, an unsigned 32-bit integer generated randomly to prevent guessing.
So the maximum length of an id as a string is 8+1+16+8 = 33 characters.
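To make that layout concrete, here is a small Go sketch that splits an fid such as 3,2711f0c5341e into its three parts, assuming (per the sizes above) that the last 8 hex characters of the second segment are the cookie and the rest is the file key; parseFid is an illustrative helper, not a weed-fs function.

package main

import (
	"fmt"
	"strconv"
	"strings"
)

func parseFid(fid string) (volumeID uint32, key uint64, cookie uint32, err error) {
	parts := strings.SplitN(fid, ",", 2)
	if len(parts) != 2 || len(parts[1]) <= 8 {
		return 0, 0, 0, fmt.Errorf("invalid fid %q", fid)
	}
	v, err := strconv.ParseUint(parts[0], 10, 32)
	if err != nil {
		return 0, 0, 0, err
	}
	keyHex, cookieHex := parts[1][:len(parts[1])-8], parts[1][len(parts[1])-8:]
	k, err := strconv.ParseUint(keyHex, 16, 64)
	if err != nil {
		return 0, 0, 0, err
	}
	c, err := strconv.ParseUint(cookieHex, 16, 32)
	if err != nil {
		return 0, 0, 0, err
	}
	return uint32(v), k, uint32(c), nil
}

func main() {
	// prints: 3 10001 4039652382 <nil>
	// i.e. volume 3, key 0x2711, cookie 0xf0c5341e
	fmt.Println(parseFid("3,2711f0c5341e"))
}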
Since servers may change, the actual address where a file is stored can change too; the lookup API returns the current address of a volume:
[root@ghost-rider weedfs]# curl http://localhost:9333/dir/lookup?volumeId=3
{"Url":"127.0.0.1:8080","PublicUrl":"localhost:8080"}
Besides getting a unique id from the API, you can also specify one yourself; the id should not be easy to guess.
A:\>curl -F file=@12.png http://192.168.2.100:8080/3,123
{"size":1049185}
The file is still accessible, but the server logs an error:
2012/07/25 15:54:32 Invalid fid 123 length 3
A self-generated id should still follow the volume id, file key, cookie layout described above:
A:\>curl -F file=@12.png http://192.168.2.100:8080/3,123412345678
{"size":1049185}
You can also append an arbitrary file extension, which makes the URL easier to work with:
http://192.168.2.100:8080/3,123412345678.png
OK, that wraps up the test. Here is a brief introduction to weed-fs.
weed-fs architecture overview (translated)
Most distributed file systems split each file into chunks, and a central master keeps the mapping from file names and chunk indexes to chunk handles, along with metadata about which servers hold which chunks.
As a consequence, the central master cannot handle large numbers of small files efficiently, and since every request has to go through the chunk master, response times are bound to drop under heavy concurrent load.
The Weed-FS master server manages data volumes rather than data blocks. Each data volume is 32GB and can hold a large number of (small) files; each storage node can own many volumes. The master node only has to keep the metadata about the volumes, which is very little data and rarely changes.
The actual file metadata is stored inside each volume on the volume servers, so a volume server manages metadata only for its own files. At just 16 bytes per file, all of the file metadata fits in memory, so each file access performs only a single disk operation.
For comparison, the xfs inode_t structure in Linux takes 536 bytes.
Master Server and Volume Server
The architecture is extremely simple. The actual data lives in volumes on the storage nodes; one volume server can hold multiple volumes, and both reads and writes can be protected with basic authentication.
All volumes are managed by the master server, which holds the mapping from volume id to volume server. This information rarely changes and can be cached well.
For each write request, the master server generates a key, a growing 64-bit unsigned integer. Since writes are generally less frequent than reads, a single master service can handle a large amount of concurrency.
Write and Read Files
When a client sends a write request, the master returns the file id together with the volume server url and public url (the same fields as in the /dir/assign response above).
The client then contacts the volume node itself and uploads the file content via REST.
When a client reads a file by its id, it asks the master node (or a local cache) for the volume server address by volume id to get the actual file URL, then either fetches the content itself or hands the URL to a browser to fetch directly.
Storage Size
In the current implementation each volume is 8 x 2^32 = 32GB, because weed-fs aligns contents to 8 bytes. By changing two lines of code, at the cost of some padding space, it can easily be extended to 64GB, 128GB, or more. Up to 2^32 volumes are supported, so the theoretical total capacity is 8 x 2^32 x 2^32 = 8 x 4G x 4G = 128GG bytes, i.e. 2^67 bytes, or 128 exbibytes. An individual file cannot be larger than its volume.
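That arithmetic can be sanity-checked with a few lines of Go (a rough sketch assuming 8-byte alignment, 32-bit offsets within a volume, and 32-bit volume ids; nothing weed-fs specific):

package main

import "fmt"

func main() {
	volumeBytes := uint64(8) << 32                 // 8 * 2^32 bytes per volume
	fmt.Println(volumeBytes>>30, "GiB per volume") // 32 GiB per volume
	// Total: 32 GiB * 2^32 volumes = 2^67 bytes = 128 EiB (the "128GG bytes").
	fmt.Println(float64(volumeBytes)*float64(uint64(1)<<32)/(1<<60), "EiB total")
}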
Saving Memory
All file metadata on a volume server is served from memory with no disk access; each file costs a 16-byte map entry of <64bit key, 32bit offset, 32bit size>. You don't need to worry about this: the disk will certainly fill up before the memory does.
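A minimal sketch of what such a 16-byte entry could look like in Go (the struct name and layout are illustrative, not taken from the weed-fs source):

package main

import (
	"fmt"
	"unsafe"
)

type needleEntry struct {
	Key    uint64 // 64-bit file key
	Offset uint32 // 32-bit offset into the volume's .dat file
	Size   uint32 // 32-bit file size
}

func main() {
	fmt.Println(unsafe.Sizeof(needleEntry{})) // prints 16
}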
Comparison with Other File Systems
HDFS:
HDFS uses chunks and suits large files; weed-fs is an ideal store for small files: fast, with good support for high concurrency.
MogileFS:
WeedFS has only 2 components: a directory server and storage nodes.
MogileFS has 3 components: trackers, a database, and storage nodes.
The more layers, the slower the access, the more complex the operations, and the higher the failure rate.
GlusterFS:
weed-fs is not POSIX compliant and is a deliberately simple implementation; GlusterFS is POSIX compliant and more complex.
Mongo's GridFS splits files into chunks and stores the metadata in a central mongodb; every read and write has to query the metadata first, so concurrency cannot scale, and the rest hardly matters.
TODO:
weed-fs will provide a fail-over master server node (similar to Hadoop's secondary namenode).
Weed-FS will support multiple copies of the data; right now there is only one. Depending on demand, multiple copies and even data-center awareness and optimization will follow.
In short: small but powerful.
The original introduction is attached below; my translation is rough, so compare against it yourself. :)
Weed-FS is a simple and highly scalable distributed file system. There are two objectives:
to store billions of files!
to serve the files fast!
Instead of supporting full POSIX file system semantics, Weed-FS chooses to implement only a key~file mapping. Similar to the word "NoSQL", you can call it "NoFS".
Instead of managing all file metadata in a central master, Weed-FS chooses to manage file volumes in the central master, and lets volume servers manage files and their metadata. This relieves concurrency pressure from the central master and spreads file metadata into volume servers' memories, allowing faster file access with just one disk read operation!
Weed-FS models after Facebook's Haystack design paper.
By default, the master node runs on port 9333, and the volume nodes run on port 8080. Here I will start one master node and two volume nodes, on ports 8080 and 8081. Ideally they should be started from different machines; here I just use localhost as an example.
Weed-FS uses HTTP REST operations to write, read, delete. The return results are JSON or JSONP format.
Start Master Server
> ./weedmaster
Start Volume Servers
> weedvolume -dir="/tmp" -volumes=0-4 -mserver="localhost:9333" -port=8080 -publicUrl="localhost:8080" &
> weedvolume -dir="/tmp/data2" -volumes=5-7 -mserver="localhost:9333" -port=8081 -publicUrl="localhost:8081" &
Here is a simple usage on how to save a file:
> curl http://localhost:9333/dir/assign
{"fid":"3,01637037d6","url":"127.0.0.1:8080","publicUrl":"localhost:8080"}
First, send a HTTP request to get an fid and a volume server url.
> curl -F file=@/home/chris/myphoto.jpg http://127.0.0.1:8080/3,01637037d6
{"size": 43234}
Second, send a HTTP multipart POST request to the volume server url+'/'+fid, to really store the file content.
Now you can save the fid, 3,01637037d6 in this case, to some database field.
The number 3 here, is a volume id. After the comma, it's one file key, 01, and a file cookie, 637037d6.
The volume id is an unsigned 32 bit integer. The file key is an unsigned 64bit integer. The file cookie is an unsigned 32bit integer, used to prevent URL guessing.
The file key and file cookie are both coded in hex. You can store the tuple in your own format, or simply store the fid as string, in theory, you would need 8+1+16+8=33 bytes. A char(33) would be enough, if not more than enough, since most usage would not need 2^32 volumes.
Here is the example on how to render the URL.
> curl http://localhost:9333/dir/lookup?volumeId=3
{"Url":"127.0.0.1:8080","PublicUrl":"localhost:8080"}
First look up the volume server's URL by the file's volumeId. However, since usually there are not too many volume servers, and volumes do not move often, you can cache the results most of the time.
Now you can take the public url, render the url or directly read from the volume server via url:
http://localhost:8080/3,01637037d6.jpg
Notice we add a file extension ".jpg" here. It's optional and just one way for the client to specify the file content type.
Usually distributed file systems split each file into chunks, and a central master keeps a mapping of a filename and a chunk index to chunk handles, and also which chunks each chunk server has.
This has the drawback that the central master can not handle many small files efficiently, and since all read requests need to go through the chunk master, responses would be slow for many concurrent web users.
Instead of managing chunks, Weed-FS chooses to manage data volumes in the master server. Each data volume is 32GB in size, and can hold a lot of files. And each storage node can have many data volumes. So the master node only needs to store the metadata about the volumes, which is a fairly small amount of data and pretty stale most of the time.
The actual file metadata is stored in each volume on volume servers. Since each volume server only manages metadata of files on its own disk, with only 16 bytes for each file, all file access can read file metadata just from memory and only needs one disk operation to actually read file data.
For comparison, consider that an xfs inode_t structure in Linux is 536 bytes.
Master Server and Volume Server
The architecture is fairly simple. The actual data is stored in volumes on storage nodes. One volume server can have multiple volumes, and can both support read and write access with basic authentication.
All volumes are managed by a master server. The master server contains volume id to volume server mapping. This is fairly static information, and could be cached easily.
On each write request, the master server also generates a file key, which is a growing 64bit unsigned integer. Since the write requests are not as busy as read requests, one master server should be able to handle the concurrency well.
Write and Read files
When a client sends a write request, the master server returns an fid and a volume server url for the file. The client then contacts the volume node and POSTs the file content via REST.
When a client needs to read a file based on its fid, it can ask the master server for the volume server url by the volume id, or take it from cache. Then the client can HTTP GET the content via REST, or just render the URL on web pages and let browsers fetch the content.
Please see the example for details on write-read process.
In the current implementation, each volume can be 8 x 2^32 = 32G bytes in size. This is because contents are aligned to 8 bytes. It can easily be increased to 64G, or 128G, or more, by changing 2 lines of code, at the cost of some wasted padding space due to alignment.
There can be 2^32 volumes. So the total system size is 8 x 2^32 x 2^32 = 8 x 4G x 4G = 128GG bytes. (Sorry, I don't know the word for giga of giga bytes.)
Each individual file size is limited to the volume size.
All file meta information on the volume server is readable from memory without disk access. Each file just takes a 16-byte map entry of <64bit key, 32bit offset, 32bit size>. Of course, each map entry has its own space cost in the map. But usually the disk runs out before the memory does.
Compared to Other File Systems
Frankly, I don't use other distributed file systems too often. All seems more complicated than necessary. Please correct me if anything here is wrong.
HDFS uses the chunk approach for each file, and is ideal for streaming large files.
WeedFS is ideal for serving relatively smaller files quickly and concurrently.
Compared to MogileFS
WeedFS has 2 components: directory server, storage nodes.
MogileFS has 3 components: tracers, database, storage nodes.
One more layer means slower access, more operation complexity, more failure possibility.
Compared to GlusterFS
WeedFS is not POSIX compliant, and has simple implementation.
GlusterFS is POSIX compliant, much more complex.
Compared to Mongo's GridFS
Mongo's GridFS splits files into chunks and manages the chunks in the central mongodb. For every read or write request, the database needs to query the metadata. It's OK if this is not a bottleneck yet, but for a lot of concurrent reads this unnecessary query could slow things down.
On the contrary, Weed-FS uses large file volumes of 32G size to store lots of files, and only manages file volumes in the master server. Each volume manages its file metadata itself. So all the file metadata is spread across the volume nodes' memories, and just one disk read is needed.
Weed-FS will support fail-over master server.
Weed-FS will support multiple copies of the data. Right now, data has just one copy. Depending on demands, multiple copy, and even data-center awareness and optimization will be implemented.
Weed-FS may add more optimization for pictures. For example, automatically resizing pictures when storing them.
WeedFS does not plan to add namespaces.
To use WeedFS, the namespace is supposed to be managed by the clients. Many use cases, like a user's avatar picture, do not really need namespaces. Actually, it takes some effort to create and maintain the file path in order to avoid too many files under a directory.
Advanced users can actually create the namespace layer on top of the Key-file store, just like how the common file system creates the namespace on top of inode for each file.
./weedmaster
./weedvolume -dir="/var/weedfs1" -volumes=0-4 -mserver="localhost:9333" -port=8080 -publicUrl="localhost:8080" &
./weedvolume -dir="/var/weedfs2" -volumes=5-7 -mserver="localhost:9333" -port=8081 -publicUrl="localhost:8081" &