Write operations
As introduced in the previous post, from Hadoop 2.x onward SequenceFile.Writer is gradually retiring its many createWriter() overloads in favor of a single, cleaner createWriter() method: apart from the Configuration parameter, everything else is passed as a SequenceFile.Writer.Option. The option parameters provided by the new API are:
FileOption
FileSystemOption
StreamOption
BufferSizeOption
BlockSizeOption
ReplicationOption
KeyClassOption
ValueClassOption
MetadataOption
ProgressableOption
CompressionOption
These options cover the various use cases, and there is no fixed order among them, which cuts down on the amount of code and makes it more direct and easier to understand. Let's look at the method first; a concrete example is given further down.
createWriter

public static org.apache.hadoop.io.SequenceFile.Writer createWriter(Configuration conf,
                                                                     org.apache.hadoop.io.SequenceFile.Writer.Option... opts)
                                                              throws IOException

Create a new Writer with the given options.

Parameters:
    conf - the configuration to use
    opts - the options to create the file with
Returns:
    a new Writer
Throws:
    IOException
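To make the "any order" point concrete, here is a minimal sketch (not from the original post; the class name, file path, and option values are illustrative assumptions) that combines several of the options listed above in a single call:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class OptionOrderSketch {

  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // Only the Configuration is positional; the Option arguments may appear in any order.
    SequenceFile.Writer writer = SequenceFile.createWriter(conf,
        SequenceFile.Writer.bufferSize(8192),                        // BufferSizeOption
        SequenceFile.Writer.file(new Path("file:///tmp/demo.seq")),  // FileOption
        SequenceFile.Writer.keyClass(IntWritable.class),             // KeyClassOption
        SequenceFile.Writer.valueClass(Text.class),                  // ValueClassOption
        SequenceFile.Writer.compression(SequenceFile.CompressionType.NONE)); // CompressionOption
    try {
      writer.append(new IntWritable(1), new Text("hello"));
    } finally {
      writer.close();
    }
  }
}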
Hadoop: The Definitive Guide, 4th edition, provides a SequenceFileWriteDemo example:
// cc SequenceFileWriteDemo Writing a SequenceFile
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
// vv SequenceFileWriteDemo
public class SequenceFileWriteDemo {

  private static final String[] DATA = {
    "One, two, buckle my shoe",
    "Three, four, shut the door",
    "Five, six, pick up sticks",
    "Seven, eight, lay them straight",
    "Nine, ten, a big fat hen"
  };

  public static void main(String[] args) throws IOException {
    String uri = args[0];
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(uri), conf);
    Path path = new Path(uri);

    IntWritable key = new IntWritable();
    Text value = new Text();
    SequenceFile.Writer writer = null;
    try {
      writer = SequenceFile.createWriter(fs, conf, path,
          key.getClass(), value.getClass());

      for (int i = 0; i < 100; i++) {
        key.set(100 - i);
        value.set(DATA[i % DATA.length]);
        System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key, value);
        writer.append(key, value);
      }
    } finally {
      IOUtils.closeStream(writer);
    }
  }
}
// ^^ SequenceFileWriteDemo
Let's rewrite the createWriter() call from the example above using the consolidated new method; the code is as follows:
package org.apache.hadoop.io;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.SequenceFile.Writer;
import org.apache.hadoop.io.SequenceFile.Writer.FileOption;
import org.apache.hadoop.io.SequenceFile.Writer.KeyClassOption;
import org.apache.hadoop.io.SequenceFile.Writer.ValueClassOption;
import org.apache.hadoop.io.Text;

public class THT_testSequenceFile2 {

  private static final String[] DATA = { "One, two, buckle my shoe",
      "Three, four, shut the door", "Five, six, pick up sticks",
      "Seven, eight, lay them straight", "Nine, ten, a big fat hen" };

  public static void main(String[] args) throws IOException {
    // String uri = args[0];
    String uri = "file:///D://B.txt";
    Configuration conf = new Configuration();
    Path path = new Path(uri);

    IntWritable key = new IntWritable();
    Text value = new Text();
    SequenceFile.Writer writer = null;

    // Note: FileOption / KeyClassOption / ValueClassOption are nested classes that
    // are not public, which is presumably why this demo is declared in the
    // org.apache.hadoop.io package so it can cast to them directly.
    SequenceFile.Writer.FileOption option1 = (FileOption) Writer.file(path);
    SequenceFile.Writer.KeyClassOption option2 = (KeyClassOption) Writer.keyClass(key.getClass());
    SequenceFile.Writer.ValueClassOption option3 = (ValueClassOption) Writer.valueClass(value.getClass());

    try {
      writer = SequenceFile.createWriter(conf, option1, option2, option3,
          Writer.compression(CompressionType.RECORD));

      for (int i = 0; i < 10; i++) {
        key.set(1 + i);
        value.set(DATA[i % DATA.length]);
        System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key, value);
        writer.append(key, value);
      }
    } finally {
      IOUtils.closeStream(writer);
    }
  }
}
The output is as follows (the number in brackets is writer.getLength(), i.e. the file offset at which the next record is appended):
2015-11-06 22:15:05,027 INFO compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.deflate]
[128] 1 One, two, buckle my shoe
[173] 2 Three, four, shut the door
[220] 3 Five, six, pick up sticks
[264] 4 Seven, eight, lay them straight
[314] 5 Nine, ten, a big fat hen
[359] 6 One, two, buckle my shoe
[404] 7 Three, four, shut the door
[451] 8 Five, six, pick up sticks
[495] 9 Seven, eight, lay them straight
[545] 10 Nine, ten, a big fat hen
The generated file: (screenshot of the resulting D:/B.txt omitted)
Read operations
The option parameters provided by the new API:
FileOption - which file to read
InputStreamOption - read from an already opened input stream (see the sketch after this list)
StartOption
LengthOption - read only as many bytes as the configured length
BufferSizeOption
OnlyHeaderOption
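Besides the FileOption used in the demo below, a Reader can also be built on top of an already opened FSDataInputStream via the InputStreamOption. Here is a minimal sketch of that (not from the original post; the class name is made up, and it assumes the B.txt file written above):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.Reader;

public class StreamReadSketch {

  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    Path path = new Path("file:///D://B.txt"); // the file produced by the write demo
    FileSystem fs = FileSystem.get(path.toUri(), conf);

    FSDataInputStream in = fs.open(path);
    SequenceFile.Reader reader = null;
    try {
      // InputStreamOption: hand the Reader a stream instead of a Path.
      reader = new SequenceFile.Reader(conf, Reader.stream(in));
      System.out.println("key class:   " + reader.getKeyClassName());
      System.out.println("value class: " + reader.getValueClassName());
    } finally {
      IOUtils.closeStream(reader);
      IOUtils.closeStream(in);
    }
  }
}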
Here is the source code, based directly on the latest API:
package org.apache.hadoop.io;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.Reader;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class THT_testSequenceFile3 {

  public static void main(String[] args) throws IOException {
    // String uri = args[0];
    String uri = "file:///D://B.txt";
    Configuration conf = new Configuration();
    Path path = new Path(uri);

    SequenceFile.Reader.Option option1 = Reader.file(path);
    SequenceFile.Reader.Option option2 = Reader.length(174); // read only the first 174 bytes

    SequenceFile.Reader reader = null;
    try {
      reader = new SequenceFile.Reader(conf, option1, option2);
      Writable key = (Writable) ReflectionUtils.newInstance(
          reader.getKeyClass(), conf);
      Writable value = (Writable) ReflectionUtils.newInstance(
          reader.getValueClass(), conf);
      long position = reader.getPosition();
      while (reader.next(key, value)) {
        String syncSeen = reader.syncSeen() ? "*" : "";
        System.out.printf("[%s%s]\t%s\t%s\n", position, syncSeen, key, value);
        position = reader.getPosition(); // beginning of next record
      }
    } finally {
      IOUtils.closeStream(reader);
    }
  }
}
I set a length option here, so reading stops at byte 174 and only the first two records come back; the output is:
2015-11-06 22:53:00,602 INFO compress.CodecPool (CodecPool.java:getDecompressor(181)) - Got brand-new decompressor [.deflate]
[128] 1 One, two, buckle my shoe
[173] 2 Three, four, shut the door
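The bracketed numbers are record boundaries, so they can also be fed back into the reader directly. As a small sketch (not in the original post; it assumes the offsets printed above, e.g. 173 as the start of the second record), reader.seek() jumps straight to such a boundary:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.Reader;
import org.apache.hadoop.io.Text;

public class SeekSketch {

  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    SequenceFile.Reader reader = null;
    try {
      reader = new SequenceFile.Reader(conf, Reader.file(new Path("file:///D://B.txt")));
      // 173 is where the second record starts, as printed above;
      // seek() must be given an exact record boundary like this.
      reader.seek(173);
      IntWritable key = new IntWritable();
      Text value = new Text();
      if (reader.next(key, value)) {
        System.out.println(key + "\t" + value); // expected: 2  Three, four, shut the door
      }
    } finally {
      IOUtils.closeStream(reader);
    }
  }
}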