HDFS File Operations

yu06206


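During last winter break we finished setting up our Hadoop cluster, which gave us a first version of our own cloud platform, and we also tried out remote access to HDFS. Over the past few days I reviewed and re-tested remote operations on HDFS.
HDFS file operations
Format HDFS
Command: user@namenode:hadoop$ bin/hadoop namenode -format
Start HDFS
Command: user@namenode:hadoop$ bin/start-dfs.sh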

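List the files on HDFS

Command: user@namenode:hadoop$ bin/hadoop dfs -ls

Using the Hadoop API (the method below returns, for each block of a file, the hosts that store that block)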

// Returns, for each block of the file, the hosts that store a replica of it.
// Returns null if the file cannot be accessed.
public List<String[]> GetFileBlockHost(Configuration conf, String FileName) {
		try {
			List<String[]> list = new ArrayList<String[]>();
			FileSystem hdfs = FileSystem.get(conf);
			Path path = new Path(FileName);
			FileStatus fileStatus = hdfs.getFileStatus(path);

			// Ask for the locations of all blocks covering the whole file
			BlockLocation[] blkLocations = hdfs.getFileBlockLocations(
					fileStatus, 0, fileStatus.getLen());

			int blkCount = blkLocations.length;
			for (int i = 0; i < blkCount; i++) {
				String[] hosts = blkLocations[i].getHosts();
				list.add(hosts);
			}
			return list;
		} catch (IOException e) {
			e.printStackTrace();
		}
		return null;
	}
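
The method above gets back one entry per block because HDFS stores a file as a sequence of fixed-size blocks. As a rough illustration only, the plain-Java sketch below works out how a file would be split. It does not call Hadoop; the class name BlockMath, its method names, and the 64 MB block size are assumptions chosen for the example (the real block size comes from the cluster configuration).

```java
// Hypothetical helper, not part of Hadoop: arithmetic for fixed-size blocks.
class BlockMath {
    // Number of blocks needed for a file of fileLen bytes (the last block may be partial).
    static long blockCount(long fileLen, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        if (fileLen <= 0) {
            return 0;
        }
        return (fileLen + blockSize - 1) / blockSize; // ceiling division
    }

    // Byte offset at which block number 'index' (0-based) starts.
    static long blockOffset(long index, long blockSize) {
        return index * blockSize;
    }

    // Length in bytes of block number 'index'; only the last block can be shorter.
    static long blockLength(long index, long fileLen, long blockSize) {
        return Math.min(blockSize, fileLen - blockOffset(index, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 64 * mb;  // assumed example value
        long fileLen = 200 * mb;   // assumed example file size
        long n = blockCount(fileLen, blockSize);
        for (long i = 0; i < n; i++) {
            System.out.println("block " + i
                    + " offset=" + blockOffset(i, blockSize)
                    + " length=" + blockLength(i, fileLen, blockSize));
        }
    }
}
```

Under these assumptions a 200 MB file occupies four blocks, the last of which holds only 8 MB, so GetFileBlockHost would return a list of four host arrays for such a file.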

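Create a directory on HDFS
Command: user@namenode:hadoop$ bin/hadoop dfs -mkdir /directoryName

Using the Hadoop API (note that this example creates a file and returns its output stream; a directory can be created with FileSystem.mkdirs)

// Create a new file on HDFS; the caller is responsible for closing the returned stream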
	public FSDataOutputStream CreateFile(Configuration conf, String FileName) {
		try {
			FileSystem hdfs = FileSystem.get(conf);
			Path path = new Path(FileName);
			FSDataOutputStream outputStream = hdfs.create(path);
			return outputStream;
		} catch (IOException e) {
			e.printStackTrace();
		}
		return null;
	}

Upload a file to HDFS
Command: user@namenode:hadoop$ bin/hadoop dfs -put fileName /user/yourUserName/

Using the Hadoop API

// Upload a local file to HDFS
	public void PutFile(Configuration conf, String srcFile, String dstFile) {
		try {
			FileSystem hdfs = FileSystem.get(conf);
			Path srcPath = new Path(srcFile);
			Path dstPath = new Path(dstFile);
			hdfs.copyFromLocalFile(srcPath, dstPath);
		} catch (IOException e) {
			e.printStackTrace();
		}
	}

Export data from HDFS

Command: user@namenode:hadoop$ bin/hadoop dfs -cat foo

Using the Hadoop API

// Read a file from HDFS and print it to standard output
	public void ReadFile(Configuration conf, String FileName) {
		try {
			FileSystem hdfs = FileSystem.get(conf);
			FSDataInputStream dis = hdfs.open(new Path(FileName));
			IOUtils.copyBytes(dis, System.out, 4096, false);
			dis.close();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
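
IOUtils.copyBytes in the method above moves data from one stream to another through a fixed-size buffer. For readers who have not seen that pattern, here is a rough plain-Java equivalent. StreamCopy and its copy method are hypothetical names, not part of Hadoop, and an in-memory stream stands in for the HDFS input stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper illustrating a buffered stream copy.
class StreamCopy {
    // Copies everything from 'in' to 'out' using a buffer of bufSize bytes.
    // Returns the number of bytes copied. Does not close either stream.
    static long copy(InputStream in, OutputStream out, int bufSize) throws IOException {
        if (bufSize <= 0) {
            throw new IllegalArgumentException("bufSize must be positive");
        }
        byte[] buf = new byte[bufSize];
        long total = 0;
        int n;
        // read() returns -1 once the end of the stream is reached
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n); // write only the bytes actually read
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello hdfs".getBytes("UTF-8");
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), sink, 4);
        System.out.println(copied + " bytes copied: " + sink.toString("UTF-8"));
    }
}
```

The key detail is writing only the n bytes returned by each read, since the final read usually fills just part of the buffer.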

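Shut down HDFS
Command: user@namenode:hadoop$ bin/stop-dfs.sh

HDFS global status information

Command: bin/hadoop dfsadmin -report

This produces a global status report. It contains basic information about the HDFS cluster as well as some details for each machine.

Everything above operates on HDFS locally, that is, on an Ubuntu machine with the Hadoop environment configured. A client can also operate on HDFS remotely from a Windows system. The principle is much the same: the client only needs the IP address and port that the cluster's NameNode exposes in order to reach HDFS.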
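
To make the "IP and port" point concrete, the small sketch below takes apart the kind of hdfs:// URI used in the remote client code, using only java.net.URI. The class name HdfsUriParts and the describe method are hypothetical names for illustration; the address is the example one used in this post.

```java
import java.net.URI;

// Hypothetical helper: shows which parts of an hdfs:// URI identify the NameNode.
class HdfsUriParts {
    // Returns a one-line summary of the scheme, host, port and path of the URI.
    static String describe(String uriText) {
        URI uri = URI.create(uriText);
        return "scheme=" + uri.getScheme()
                + " host=" + uri.getHost()
                + " port=" + uri.getPort()
                + " path=" + uri.getPath();
    }

    public static void main(String[] args) {
        // host and port locate the NameNode; the path is the file inside HDFS
        System.out.println(describe("hdfs://192.168.1.11:9000/usr/yujing/test.txt"));
    }
}
```

The host and port are what FileSystem.get(URI, Configuration) uses to find the NameNode, while the path part names the file or directory inside HDFS.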

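import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;

/**
 * Operations on HDFS
 * @author yujing
 *
 */
public class Write {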
	public static void main(String[] args) {
		try {
			uploadTohdfs();
			readHdfs();
			getDirectoryFromHdfs();
		} catch (FileNotFoundException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}

	/** Upload a local file to HDFS */
	public static void uploadTohdfs() throws FileNotFoundException, IOException {
		String localSrc = "D:/qq.txt";
		String dst = "hdfs://192.168.1.11:9000/usr/yujing/test.txt";
		InputStream in = new BufferedInputStream(new FileInputStream(localSrc));
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(URI.create(dst), conf);
		// progress() is called back periodically while data is being written
		OutputStream out = fs.create(new Path(dst), new Progressable() {
			public void progress() {
				System.out.println(".");
			}
		});
		// The final argument 'true' closes both streams when the copy finishes
		IOUtils.copyBytes(in, out, 4096, true);
		System.out.println("File uploaded successfully");
	}

	/** Read a file from HDFS and save it to the local disk */
	private static void readHdfs() throws FileNotFoundException, IOException {
		String dst = "hdfs://192.168.1.11:9000/usr/yujing/test.txt";
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(URI.create(dst), conf);
		FSDataInputStream hdfsInStream = fs.open(new Path(dst));

		OutputStream out = new FileOutputStream("d:/qq-hdfs.txt");
		byte[] ioBuffer = new byte[1024];
		int readLen = hdfsInStream.read(ioBuffer);

		// read() returns -1 at the end of the file
		while (-1 != readLen) {
			out.write(ioBuffer, 0, readLen);
			readLen = hdfsInStream.read(ioBuffer);
		}
		System.out.println("File read successfully");
		out.close();
		hdfsInStream.close();
		fs.close();
	}

	/**
	 * Append content to the end of a file on HDFS. Note: appending requires
	 * <property><name>dfs.support.append</name><value>true</value></property>
	 * in hdfs-site.xml.
	 */
	private static void appendToHdfs() throws FileNotFoundException,
			IOException {
		String dst = "hdfs://192.168.1.11:9000/usr/yujing/test.txt";
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(URI.create(dst), conf);
		FSDataOutputStream out = fs.append(new Path(dst));

		// Write the whole byte array once
		byte[] data = "zhangzk add by hdfs java api".getBytes();
		out.write(data, 0, data.length);
		out.close();
		fs.close();
	}

	/** Delete a file or directory from HDFS */
	private static void deleteFromHdfs() throws FileNotFoundException,
			IOException {
		String dst = "hdfs://192.168.1.11:9000/usr/yujing";
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(URI.create(dst), conf);
		// 'true' deletes recursively, which is needed when dst is a non-empty directory
		fs.delete(new Path(dst), true);
		fs.close();
	}

	/** List the files and directories under a path on HDFS */
	private static void getDirectoryFromHdfs() throws FileNotFoundException,
			IOException {
		String dst = "hdfs://192.168.1.11:9000/usr/yujing";
		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(URI.create(dst), conf);
		FileStatus fileList[] = fs.listStatus(new Path(dst));
		int size = fileList.length;
		for (int i = 0; i < size; i++) {
			System.out.println("name: " + fileList[i].getPath().getName()
					+ "\t\tsize: " + fileList[i].getLen());
		}
		fs.close();
	}

}

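Cluster information can be viewed at http://<host IP>:50030, which in a default Hadoop 1.x setup is the JobTracker web interface. The files uploaded to HDFS can be browsed through the NameNode web interface, which by default is at http://<host IP>:50070.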


2012-02-07 11:43

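Comment 2, blackproof, 2012-12-11

How can the encoding problem be solved?

Comment 1, napolengogo, 2012-11-14

With remote access, when the client is a machine outside the cluster, there is a permission problem. How did you solve this?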
