1. A read() variant of Example 3-3:
package com.tht.hdfs;

//cc FileSystemDoubleCat Displays files from a Hadoop filesystem on standard output twice, by using seek

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

//vv FileSystemDoubleCat
public class FileSystemDoubleCat {

  public static void main(String[] args) throws Exception {
    // String uri = args[0];
    String uri = "hdfs://121.1.253.251:9000/in/core-site.xml";
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(uri), conf);
    FSDataInputStream in = null;
    byte[] b = new byte[500];
    try {
      in = fs.open(new Path(uri));
      IOUtils.copyBytes(in, System.out, 4096, false);
      //in.seek(0); // go back to the start of the file
      //IOUtils.copyBytes(in, System.out, 4096, false);
      // Positioned read: up to 300 bytes from file offset 83 into b[10..].
      // The return value is the number of bytes actually read and may be
      // less than requested, so print only that many bytes.
      int n = in.read(83, b, 10, 300);
      System.out.println(new String(b, 10, n, "UTF-8"));
    } finally {
      IOUtils.closeStream(in);
    }
  }
}
// ^^ FileSystemDoubleCat
The third edition of the English original explains this as follows:
FSDataInputStream also implements the PositionedReadable interface for reading parts
of a file at a given offset:
public interface PositionedReadable {

  public int read(long position, byte[] buffer, int offset, int length)
      throws IOException;

  public void readFully(long position, byte[] buffer, int offset, int length)
      throws IOException;

  public void readFully(long position, byte[] buffer) throws IOException;
}
The read() method reads up to length bytes from the given position in the file into the
buffer at the given offset in the buffer. The return value is the number of bytes actually
read; callers should check this value, as it may be less than length.
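This "short read" contract is the same one plain Java streams follow, so the readFully() behavior the book describes can be sketched without Hadoop on the classpath. A minimal illustration (using ByteArrayInputStream as a stand-in for FSDataInputStream):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Sketch (not Hadoop-specific): read() may return fewer bytes than asked
// for, so a readFully-style helper must loop until `length` bytes arrive.
public class ReadLoop {

  // Reads exactly `length` bytes into buffer[offset..], like readFully().
  public static void readFully(InputStream in, byte[] buffer, int offset, int length)
      throws IOException {
    int done = 0;
    while (done < length) {
      int n = in.read(buffer, offset + done, length - done);
      if (n < 0) {
        throw new IOException("EOF before " + length + " bytes were read");
      }
      done += n;
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] buf = new byte[5];
    InputStream in = new ByteArrayInputStream("hello world".getBytes(StandardCharsets.UTF_8));
    readFully(in, buf, 0, 5);
    System.out.println(new String(buf, StandardCharsets.UTF_8)); // hello
  }
}
```

This is why the earlier example captures the return value of in.read(83, b, 10, 300) instead of assuming all 300 bytes were filled.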
2. The implementation of Example 3-7:
This uses the class from the book, RegexExcludePathFilter (which excludes matching paths).
//cc RegexExcludePathFilter A PathFilter for excluding paths that match a regular expression

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

//vv RegexExcludePathFilter
public class RegexExcludePathFilter implements PathFilter {

  private final String regex;

  public RegexExcludePathFilter(String regex) {
    this.regex = regex;
  }

  public boolean accept(Path path) {
    return !path.toString().matches(regex);
  }
}
//^^ RegexExcludePathFilter
Write a test class:
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class GlobStatus {

  public static void main(String[] args) throws IOException {
    String uri = "hdfs://121.1.253.251:9000/in/*";
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(uri), conf);
    FileStatus[] status = fs.globStatus(new Path(uri), new RegexExcludePathFilter("^.*/"));
    Path[] listedPaths = FileUtil.stat2Paths(status);
    for (Path p : listedPaths) {
      System.out.println(p);
    }
  }
}
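The core of RegexExcludePathFilter is just an inverted String.matches() call, and matches() must match the entire path string, not a substring. That logic can be exercised without a cluster; a plain-Java sketch, using the date-based regex from the book's discussion and hypothetical HDFS paths for illustration:

```java
// Sketch of RegexExcludePathFilter's test without Hadoop on the classpath.
// String.matches() anchors at both ends, so "^.*/2007/12/31$" only rejects
// paths that END with /2007/12/31.
public class ExcludeFilterDemo {

  private final String regex;

  public ExcludeFilterDemo(String regex) {
    this.regex = regex;
  }

  // Same logic as RegexExcludePathFilter.accept(Path), on a plain String.
  public boolean accept(String path) {
    return !path.matches(regex);
  }

  public static void main(String[] args) {
    // Hypothetical paths, chosen only to illustrate the filter.
    ExcludeFilterDemo f = new ExcludeFilterDemo("^.*/2007/12/31$");
    System.out.println(f.accept("hdfs://host:9000/data/2007/12/31")); // false: excluded
    System.out.println(f.accept("hdfs://host:9000/data/2008/01/01")); // true: kept
  }
}
```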
| glob | name | matches |
| --- | --- | --- |
| * | asterisk | Matches zero or more characters |
| ? | question mark | Matches a single character |
| [ab] | character class | Matches a single character in the set {a, b} |
| [^ab] | negated character class | Matches a single character that is not in the set {a, b} |
| [a-b] | character range | Matches a single character in the (closed) range [a, b], where a is lexicographically less than or equal to b |
| [^a-b] | negated character range | Matches a single character that is not in the (closed) range [a, b], where a is lexicographically less than or equal to b |
| {a,b} | alternation | Matches either expression a or b |
| \c | escaped character | Matches character c when it is a metacharacter |

Wildcards and their meanings
3. Coherency model
Consider the following example:
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CoherencyModel {

  public static void main(String[] args) throws Exception {
    String uri = "hdfs://121.1.253.251:9000/in/";
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(uri), conf);
    // With Path p = new Path("p") instead, the output would become
    // hdfs://121.1.253.251:9000/user/hadoop/p
    Path p = new Path(uri + "/p");
    OutputStream out = fs.create(p);
    out.write("content for tht test".getBytes("UTF-8"));
    out.flush();
    out.close(); // implicitly performs the sync() method
    System.out.println(fs.getFileStatus(p).getPath());
  }
}
The output is:
hdfs://121.1.253.251:9000/in/p