tenght
Hadoop: The Definitive Guide, 3rd Edition, Chapter 3 Supplement: Viewing Files and Regular Expressions

 

1. The read() implementation for Example 3-3:

package com.tht.hdfs;

//cc FileSystemDoubleCat Displays files from a Hadoop filesystem on standard output twice, by using seek
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

//vv FileSystemDoubleCat
public class FileSystemDoubleCat {

	public static void main(String[] args) throws Exception {
		// String uri = args[0];
		String uri = "hdfs://121.1.253.251:9000/in/core-site.xml";
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(URI.create(uri), conf);
		FSDataInputStream in = null;
		byte[] b = new byte[500];
		try {
			in = fs.open(new Path(uri));
			IOUtils.copyBytes(in, System.out, 4096, false);
			// in.seek(0); // go back to the start of the file
			// IOUtils.copyBytes(in, System.out, 4096, false);

			// Positioned read: up to 300 bytes from file position 83 into b,
			// starting at offset 10 in the buffer. The return value is the
			// number of bytes actually read, which may be less than requested.
			int n = in.read(83, b, 10, 300);
			if (n > 0) {
				System.out.println(new String(b, 10, n, "UTF-8"));
			}
		} finally {
			IOUtils.closeStream(in);
		}
	}
}
// ^^ FileSystemDoubleCat

The English third edition explains this as follows:

FSDataInputStream also implements the PositionedReadable interface for reading parts of a file at a given offset:

public interface PositionedReadable {
  public int read(long position, byte[] buffer, int offset, int length)
    throws IOException;
  
  public void readFully(long position, byte[] buffer, int offset, int length)
    throws IOException;
  
  public void readFully(long position, byte[] buffer) throws IOException;
}

The read() method reads up to length bytes from the given position in the file into the
buffer at the given offset in the buffer. The return value is the number of bytes actually
read; callers should check this value, as it may be less than length.
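Trying this contract against a live HDFS cluster requires the setup above, but the same position/buffer/offset/length semantics can be illustrated locally with the JDK's FileChannel.read(ByteBuffer, long), which is likewise a positioned read that does not move the stream's own position. A minimal sketch (the class name, helper method, and temp file are invented for illustration, not part of the Hadoop API):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PositionedReadDemo {
	// Reads up to len bytes from 'position' in the channel into buf starting at
	// 'offset'; returns the number of bytes actually read, which may be fewer
	// than requested, so callers must check it (same rule as PositionedReadable).
	static int readAt(FileChannel ch, long position, byte[] buf, int offset, int len)
			throws IOException {
		ByteBuffer dst = ByteBuffer.wrap(buf, offset, len);
		return ch.read(dst, position); // leaves the channel's own position untouched
	}

	public static void main(String[] args) throws IOException {
		Path tmp = Files.createTempFile("posread", ".txt");
		try {
			Files.write(tmp, "0123456789".getBytes("UTF-8"));
			byte[] buf = new byte[16];
			try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
				// Ask for 16 bytes from position 4 of a 10-byte file:
				// only 6 bytes are available, so n comes back smaller.
				int n = readAt(ch, 4, buf, 0, buf.length);
				System.out.println("asked for " + buf.length + ", got " + n
						+ ": " + new String(buf, 0, n, "UTF-8"));
			}
		} finally {
			Files.deleteIfExists(tmp);
		}
	}
}
```

Running it prints that only 6 of the 16 requested bytes arrived, which is exactly why the book warns that the return value must be checked.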

2. The implementation of Example 3-7:

This uses the RegexExcludePathFilter class from the book, which excludes paths that match the regular expression.


//cc RegexExcludePathFilter A PathFilter for excluding paths that match a regular expression

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

//vv RegexExcludePathFilter
public class RegexExcludePathFilter implements PathFilter {

	private final String regex;

	public RegexExcludePathFilter(String regex) {
		this.regex = regex;
	}

	@Override
	public boolean accept(Path path) {
		return !path.toString().matches(regex);
	}
}
//^^ RegexExcludePathFilter

Write a test class:

import org.apache.hadoop.conf.Configuration;  
import org.apache.hadoop.fs.FileStatus;  
import org.apache.hadoop.fs.FileSystem;  
import org.apache.hadoop.fs.FileUtil;  
import org.apache.hadoop.fs.Path;  
  
import java.io.IOException;  
import java.net.URI;  

public class GlobStatus {  
    public static void main(String[] args) throws IOException {  
        String uri = "hdfs://121.1.253.251:9000/in/*";  
        Configuration conf = new Configuration();  
        FileSystem fs = FileSystem.get(URI.create(uri), conf);  
  
        FileStatus[] status = fs.globStatus(new Path(uri), new RegexExcludePathFilter("^.*/"));
        Path[] listedPaths = FileUtil.stat2Paths(status);  
        for (Path p : listedPaths) {  
            System.out.println(p);  
        }  
    }  
} 
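Note that String.matches() must match the entire string, so the filter excludes a path only when the whole of path.toString() matches the regex. The accept() logic can be checked standalone, without a cluster; the class name and paths below are invented, and the pattern is just a sample:

```java
public class FilterLogicDemo {
	// Mirrors RegexExcludePathFilter.accept(): reject a path only when its
	// full string representation matches the regex.
	static boolean accept(String path, String regex) {
		return !path.matches(regex);
	}

	public static void main(String[] args) {
		String regex = "^.*/2007/12/31$";
		System.out.println(accept("hdfs://host/2007/12/31", regex)); // false: excluded
		System.out.println(accept("hdfs://host/2008/01/01", regex)); // true: kept
	}
}
```

This also shows why the "^.*/" pattern in the test class above excludes nothing: no full path string ends with "/", so accept() returns true for every path.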

Glob      Name                      Matches
*         asterisk                  Matches zero or more characters
?         question mark             Matches a single character
[ab]      character class           Matches a single character in the set {a, b}
[^ab]     negated character class   Matches a single character that is not in the set {a, b}
[a-b]     character range           Matches a single character in the (closed) range [a, b], where a is lexicographically less than or equal to b
[^a-b]    negated character range   Matches a single character that is not in the (closed) range [a, b], where a is lexicographically less than or equal to b
{a,b}     alternation               Matches either expression a or b
\c        escaped character         Matches character c when it is a metacharacter

Glob wildcards and their meanings
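To experiment with these wildcards without a cluster, the JDK's PathMatcher supports a very similar glob syntax; two caveats are that java.nio writes negation as [!ab] rather than [^ab], and that Hadoop's globStatus() expands patterns against actual directory listings instead of matching strings. A small sketch, with all paths invented:

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class GlobDemo {
	// Returns true when 'path' matches the glob 'pattern' under JDK glob rules.
	static boolean globMatches(String pattern, String path) {
		PathMatcher m = FileSystems.getDefault().getPathMatcher("glob:" + pattern);
		return m.matches(Paths.get(path));
	}

	public static void main(String[] args) {
		System.out.println(globMatches("/2007/{12,01}/*", "/2007/12/31")); // true: alternation plus *
		System.out.println(globMatches("/200?/*/*", "/2008/01/01"));      // true: ? matches one character
		System.out.println(globMatches("/2007/*", "/2007/12/31"));        // false: * does not cross '/'
	}
}
```

The last case mirrors Hadoop's behavior: a single * stops at path-separator boundaries, which is why multi-level patterns need one wildcard per level.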
3. Coherency model

Consider the following example:

import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CoherencyModel {
	public static void main(String[] args) throws Exception {
		String uri = "hdfs://121.1.253.251:9000/in/";
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(URI.create(uri), conf);
		// With Path p = new Path("p") instead, the output would be the path
		// relative to the home directory: hdfs://121.1.253.251:9000/user/hadoop/p
		Path p = new Path(uri + "/p");
		OutputStream out = fs.create(p);
		out.write("content for tht test".getBytes("UTF-8"));
		out.flush();
		out.close(); // close() implicitly performs the sync() method
		System.out.println(fs.getFileStatus(p).getPath());
	}
}

The output is:
hdfs://121.1.253.251:9000/in/p
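Observing the real coherency model requires HDFS itself, but the flush-versus-sync distinction has a rough local-filesystem analogue in FileOutputStream.getFD().sync(): flush() only empties user-space buffers, while sync() forces the bytes down so other readers are guaranteed to see them, which is the role sync()/close() play in HDFS. A sketch under that analogy (SyncDemo and writeAndSync are invented names):

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SyncDemo {
	// Writes data, flushes the stream's user-space buffer, then forces the
	// bytes to the storage device; returns the file size visible afterwards.
	static long writeAndSync(Path target, byte[] data) throws IOException {
		try (FileOutputStream out = new FileOutputStream(target.toFile())) {
			out.write(data);
			out.flush();        // data may still sit in OS caches after this
			out.getFD().sync(); // force it to the device, analogous to HDFS sync()
			return Files.size(target);
		} // close() would also complete the write, as in the HDFS example above
	}

	public static void main(String[] args) throws IOException {
		Path tmp = Files.createTempFile("sync", ".txt");
		try {
			System.out.println(writeAndSync(tmp, "content for tht test".getBytes("UTF-8")));
		} finally {
			Files.deleteIfExists(tmp);
		}
	}
}
```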





