
Hadoop SequenceFile Writer And Reader

 
package cn.edu.xmu.dm.mpdemo.ioformat;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.Text;

/**
 * desc: writes 100 (IntWritable, Text) records to the SequenceFile named by args[0]
 * <code>SequenceFileWriteDemo</code>
 * 
 * @author chenwq (irwenqiang@gmail.com)
 * @version 1.0 2012/05/19
 */
public class SequenceFileWriteDemo {
	private static final String[] DATA = { "One, two, buckle my shoe",
			"Three, four, shut the door", "Five, six, pick up sticks",
			"Seven, eight, lay them straight", "Nine, ten, a big fat hen" };

	public static void main(String[] args) throws IOException {
		String uri = args[0];
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(URI.create(uri), conf);
		Path path = new Path(uri);

		IntWritable key = new IntWritable();
		Text value = new Text();
		SequenceFile.Writer writer = null;
		try {
			/**
			 * fs: the FileSystem the file is written to
			 * conf: the configuration object
			 * path: the output file
			 * key.getClass(): the key type (IntWritable)
			 * value.getClass(): the value type (Text)
			 */
			writer = SequenceFile.createWriter(fs, conf, path, key.getClass(),
					value.getClass());
			// For a block-compressed file, use this call instead:
//			writer = SequenceFile.createWriter(fs, conf, path, key.getClass(),
//					value.getClass(), CompressionType.BLOCK);
			for (int i = 0; i < 100; i++) {
				key.set(100 - i);
				value.set(DATA[i % DATA.length]);
				System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key,
						value);
				writer.append(key, value);
			}
		} finally {
			IOUtils.closeStream(writer);
		}
	}
}
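
Before reading the file back, it helps to know exactly which pairs the loop appends. The stdlib-only sketch below (the class name WriteOrderSketch is hypothetical, and it uses no Hadoop types) reproduces the demo's key/value scheme: keys count down from 100 to 1 while the values cycle through the five DATA strings, so the reader should print them back in that same order.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Stdlib-only sketch (no Hadoop) of the pairs SequenceFileWriteDemo appends. */
public class WriteOrderSketch {
    static final String[] DATA = { "One, two, buckle my shoe",
            "Three, four, shut the door", "Five, six, pick up sticks",
            "Seven, eight, lay them straight", "Nine, ten, a big fat hen" };

    /** Same scheme as the demo loop: key counts down from count, values cycle through DATA. */
    static List<Map.Entry<Integer, String>> records(int count) {
        List<Map.Entry<Integer, String>> out = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            out.add(new SimpleEntry<Integer, String>(count - i, DATA[i % DATA.length]));
        }
        return out;
    }

    public static void main(String[] args) {
        // Print the first six pairs; the sixth wraps back to DATA[0].
        for (Map.Entry<Integer, String> e : records(100).subList(0, 6)) {
            System.out.printf("%d\t%s%n", e.getKey(), e.getValue());
        }
    }
}
```

Its main prints keys 100 down to 95; the sixth pair (key 95) wraps back to "One, two, buckle my shoe".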

 

 

 

package cn.edu.xmu.dm.mpdemo.ioformat;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

/**
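 * desc: reads the SequenceFile named by args[0] and prints each record's start position (marked * when the previous next() call passed a sync point), key and value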
 * <code>SequenceFileReadDemo</code>
 * 
 * @author chenwq (irwenqiang@gmail.com)
 * @version 1.0 2012/05/19
 */
public class SequenceFileReadDemo {
	public static void main(String[] args) throws IOException {
		String uri = args[0];
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(URI.create(uri), conf);
		Path path = new Path(uri);

		SequenceFile.Reader reader = null;
		try {
			reader = new SequenceFile.Reader(fs, path, conf);
			Writable key = (Writable) ReflectionUtils.newInstance(
					reader.getKeyClass(), conf);
			Writable value = (Writable) ReflectionUtils.newInstance(
					reader.getValueClass(), conf);
			long position = reader.getPosition();
			while (reader.next(key, value)) {
				String syncSeen = reader.syncSeen() ? "*" : "";
				System.out.printf("[%s%s]\t%s\t%s\n", position, syncSeen, key,
						value);
				position = reader.getPosition(); // beginning of next record
			}
		} finally {
			IOUtils.closeStream(reader);
		}
	}
}
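
The read loop captures position before each next() call, so every printed offset is where that record begins, and the value is refreshed afterwards for the following record. The sketch below shows the same bookkeeping with only the Java standard library. It is NOT the SequenceFile on-disk format: the class name and the record layout (a 4-byte int key followed by a writeUTF-encoded value) are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Stdlib-only analogue of the SequenceFileReadDemo loop (NOT the SequenceFile format).
 * Hypothetical record layout: a 4-byte int key followed by a writeUTF-encoded value.
 */
public class PositionTrackingSketch {
    static final String[] DATA = { "One, two, buckle my shoe",
            "Three, four, shut the door", "Five, six, pick up sticks",
            "Seven, eight, lay them straight", "Nine, ten, a big fat hen" };

    /** Serializes n records with the writer demo's scheme: key n - i, value DATA[i % DATA.length]. */
    static byte[] write(int n) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int i = 0; i < n; i++) {
                out.writeInt(n - i);
                out.writeUTF(DATA[i % DATA.length]);
            }
        }
        return bytes.toByteArray();
    }

    /** Reads every record back, recording each record's start offset before reading it. */
    static List<Long> recordOffsets(byte[] file) throws IOException {
        List<Long> offsets = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            long position = 0; // start of the record about to be read
            while (in.available() > 0) {
                in.readInt(); // key
                in.readUTF(); // value
                offsets.add(position);
                position = file.length - in.available(); // beginning of next record
            }
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        byte[] file = write(100);
        List<Long> offsets = recordOffsets(file);
        System.out.printf("%d records, %d bytes, last record starts at %d%n",
                offsets.size(), file.length, offsets.get(offsets.size() - 1));
    }
}
```

Because the first value is pure ASCII, the second record starts at 4 + 2 + 24 = 30 bytes (int key, UTF length prefix, 24 characters).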

 

 

 

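Size comparison after using BLOCK compression (seqinput was written with the default createWriter call, seqinputblock with the CompressionType.BLOCK variant):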

 

root@ubuntu:~# hadoop fs -ls mpdemo/
Found 2 items
-rw-r--r--   3 root supergroup       4788 2012-05-19 00:11 /user/root/mpdemo/seqinput
-rw-r--r--   3 root supergroup        484 2012-05-19 00:17 /user/root/mpdemo/seqinputblock
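
As a ratio, using the two sizes from the listing, the BLOCK-compressed file is roughly a tenth the size of seqinput. BLOCK compression compresses the keys and values of many records together, and the demo's 100 values repeat only five distinct strings, which plausibly explains most of the saving:

```latex
\frac{484\ \text{bytes}}{4788\ \text{bytes}} \approx 0.101
\qquad\Longleftrightarrow\qquad
\frac{4788}{484} \approx 9.9
```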
 

 


