
hadoop-serializations


1. Writable

Note: parts of this code are adapted from another blog (see References); this is an integrated and tidied-up version.

 

package test;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.DefaultStringifier;

public class serializerWritable {

	/**
	 * @param args
	 */
	public static void main(String[] args) {

		Configuration conf = new Configuration();
		conf.set(
				"io.serializations",
				// TestSerializer uses Java serialization.
				// To run that test case, uncomment the JavaSerialization line below.
//				"org.apache.hadoop.io.serializer.JavaSerialization," + 
				"org.apache.hadoop.io.serializer.WritableSerialization"
				);
		TestSerializerWritable ts = new TestSerializerWritable(1, "测试呀");
		DefaultStringifier<TestSerializerWritable> ds = new DefaultStringifier<TestSerializerWritable>(
				conf, TestSerializerWritable.class);
		String s = null;
		try {
			s = ds.toString(ts);	// serializes ts by calling its write(DataOutput) method
		} catch (IOException e) {
			e.printStackTrace();
		}
		// With Java serialization, the printed string is much longer than this one.
		System.out.println(s);
		TestSerializerWritable tsxp = null;
		try {
			tsxp = ds.fromString(s); // deserializes by calling readFields(DataInput) on a new instance
		} catch (IOException e) {
			e.printStackTrace();
		}
		System.out.println(tsxp.getA() + ":" + tsxp.getB());
	}

}

package test;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class TestSerializerWritable implements Writable{

	private int a;
	private String b;
	public TestSerializerWritable( ) {
		
	}

	public TestSerializerWritable(int a, String b) {
		super();
		this.a = a;
		this.b = b;
	}

	public int getA() {
		return a;
	}

	public void setA(int a) {
		this.a = a;
	}

	public String getB() {
		return b;
	}

	public void setB(String b) {
		this.b = b;
	}

	@Override
	public void write(DataOutput out) throws IOException {
		out.writeInt(a);
		out.writeUTF(b);
		
	}

	@Override
	public void readFields(DataInput in) throws IOException {
		a = in.readInt();
		b = in.readUTF();
//		byte[] bb = new byte[1];
//		in.readFully(bb);
//		b = new String(bb);
	}
} 
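To see what travels through the string round trip above without needing a Hadoop classpath, here is a minimal sketch that uses only the JDK. It assumes that DefaultStringifier serializes the object to bytes and then Base64-encodes them; that is my understanding of its behavior, not something the listing above shows. The class `RoundTripSketch` and its `encode`/`decode` helpers are hypothetical names, and the sample value is plain ASCII to keep the sketch independent of source-file encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Base64;

// Hypothetical stand-in: mirrors the field layout of TestSerializerWritable
// (an int followed by a String) without depending on Hadoop.
class RoundTripSketch {

    // Same field order as write(DataOutput): a 4-byte int, then a
    // length-prefixed string in modified UTF-8. The Base64 step is an
    // assumption about how the bytes are turned into a printable string.
    static String encode(int a, String b) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(a);
            out.writeUTF(b);
        }
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    // Same field order as readFields(DataInput): read the int, then the string.
    static Object[] decode(String text) throws IOException {
        byte[] raw = Base64.getDecoder().decode(text);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            int a = in.readInt();
            String b = in.readUTF();
            return new Object[] { a, b };
        }
    }

    public static void main(String[] args) throws IOException {
        String s = encode(1, "hello");
        Object[] fields = decode(s);
        System.out.println(s);
        System.out.println(fields[0] + ":" + fields[1]);
    }
}
```

The fields must be read in exactly the order they were written, because the byte stream carries no field names or type tags.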

 

Here is a tip: do not use Java serialization for the objects you write to a SequenceFile.
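One reason for this advice is size. The sketch below compares the two encodings for the same pair of fields, using only the JDK. `SizeComparisonSketch` and its nested `Record` class are hypothetical names, not part of the code above; `Record` is a Serializable twin of TestSerializerWritable, and the "Writable-style" method simply repeats what write(DataOutput) does.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SizeComparisonSketch {

    // Hypothetical Serializable twin of TestSerializerWritable.
    static class Record implements Serializable {
        private static final long serialVersionUID = 1L;
        final int a;
        final String b;

        Record(int a, String b) {
            this.a = a;
            this.b = b;
        }
    }

    // Writes only the field values, as write(DataOutput) does:
    // 4 bytes for the int, 2 length bytes plus the encoded string.
    static int writableStyleSize(Record r) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(r.a);
            out.writeUTF(r.b);
        }
        return buffer.size();
    }

    // Java serialization also writes a stream header and a class descriptor
    // (class name, serialVersionUID, field names and types).
    static int javaSerializationSize(Record r) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(r);
        }
        return buffer.size();
    }

    public static void main(String[] args) throws IOException {
        Record r = new Record(1, "hello");
        System.out.println("Writable-style bytes:     " + writableStyleSize(r));
        System.out.println("Java serialization bytes: " + javaSerializationSize(r));
    }
}
```

For small records the class descriptor dominates the Java-serialized output, so the gap is proportionally largest exactly where a file holds many tiny records.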

 

 

 

//TODO

 

 

References:

http://blog.sina.com.cn/s/blog_5cec1e1d0100oi8p.html
