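The demo below writes one hundred IntWritable/Text pairs to a SequenceFile (the output path is taken from args[0]), printing the byte offset at which each record will be written: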
```java
package cn.edu.xmu.dm.mpdemo.ioformat;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.Text;

/**
 * desc: SequenceFileWriter
 * <code>SequenceFileWriteDemo</code>
 *
 * @author chenwq (irwenqiang@gmail.com)
 * @version 1.0 2012/05/19
 */
public class SequenceFileWriteDemo {

    private static final String[] DATA = {
            "One, two, buckle my shoe",
            "Three, four, shut the door",
            "Five, six, pick up sticks",
            "Seven, eight, lay them straight",
            "Nine, ten, a big fat hen" };

    public static void main(String[] args) throws IOException {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        Path path = new Path(uri);

        IntWritable key = new IntWritable();
        Text value = new Text();
        SequenceFile.Writer writer = null;
        try {
            /*
             * fs:    the FileSystem the file is written to
             * conf:  the Configuration object
             * key:   the key's class
             * value: the value's class
             */
            writer = SequenceFile.createWriter(fs, conf, path,
                    key.getClass(), value.getClass());
            // Block-compressed variant (see the size comparison below):
            // writer = SequenceFile.createWriter(fs, conf, path,
            //         key.getClass(), value.getClass(), CompressionType.BLOCK);
            for (int i = 0; i < 100; i++) {
                key.set(100 - i);
                value.set(DATA[i % DATA.length]);
                // getLength() is the current file offset, i.e. where this
                // record will start.
                System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key, value);
                writer.append(key, value);
            }
        } finally {
            IOUtils.closeStream(writer);
        }
    }
}
```
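The matching read demo iterates over every record in the file. A SequenceFile stores its key and value class names in the file header, so the reader can instantiate them via ReflectionUtils without compile-time knowledge of the types; records that fall immediately after a sync point are flagged with a `*`: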
```java
package cn.edu.xmu.dm.mpdemo.ioformat;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

/**
 * desc: SequenceFileReader
 * <code>SequenceFileReadDemo</code>
 *
 * @author chenwq (irwenqiang@gmail.com)
 * @version 1.0 2012/05/19
 */
public class SequenceFileReadDemo {

    public static void main(String[] args) throws IOException {
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        Path path = new Path(uri);

        SequenceFile.Reader reader = null;
        try {
            reader = new SequenceFile.Reader(fs, path, conf);
            Writable key = (Writable) ReflectionUtils.newInstance(
                    reader.getKeyClass(), conf);
            Writable value = (Writable) ReflectionUtils.newInstance(
                    reader.getValueClass(), conf);
            long position = reader.getPosition();
            while (reader.next(key, value)) {
                String syncSeen = reader.syncSeen() ? "*" : "";
                System.out.printf("[%s%s]\t%s\t%s\n", position, syncSeen, key, value);
                position = reader.getPosition(); // beginning of next record
            }
        } finally {
            IOUtils.closeStream(reader);
        }
    }
}
```
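The sync points surfaced above also support random access. This is a minimal sketch, not part of the original post, reusing the reader, key, and value from the demo; the offset 360 is an arbitrary illustrative value:

```java
// Hedged sketch: random access on the SequenceFile.Reader created above.
// The byte offset below is illustrative only.
reader.sync(360);                     // advance to the first sync point after byte 360
long boundary = reader.getPosition(); // now positioned on a record boundary
reader.next(key, value);              // read the first record past the sync point

reader.seek(boundary);                // seek() is only valid at an exact record boundary
reader.next(key, value);              // re-reads the same record
```

Note that sync(p) positions the reader at the end of the file when no sync point follows p, in which case next() simply returns false.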
Size comparison after enabling block compression:
```
root@ubuntu:~# hadoop fs -ls mpdemo/
Found 2 items
-rw-r--r--   3 root supergroup       4788 2012-05-19 00:11 /user/root/mpdemo/seqinput
-rw-r--r--   3 root supergroup        484 2012-05-19 00:17 /user/root/mpdemo/seqinputblock
```
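The block-compressed file is roughly a tenth of the uncompressed size (484 vs. 4788 bytes), which is expected here because the demo cycles through the same five strings, giving the codec highly repetitive blocks to work with. Below is a minimal sketch of the block-compressed writer, matching the commented-out line in the write demo above; it relies on whatever default codec the cluster is configured with:

```java
// Hedged sketch: the same writer, with block compression enabled.
// CompressionType.BLOCK compresses batches of records together, which
// usually compresses better than per-record compression.
writer = SequenceFile.createWriter(fs, conf, path,
        key.getClass(), value.getClass(), CompressionType.BLOCK);
```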