HDFS Java API Operation Examples

Author: landyer


Blog category: hadoop

Tags: Java Hadoop OS CentOS Apache

20:55 2010-6-2

Runtime environment:

Hadoop 0.20.2

CentOS 5.4

java version "1.6.0_20-ea"

Hadoop is configured as a single-node setup.


This post mainly follows this article:

http://myjavanotebook.blogspot.com/2008/05/hadoop-file-system-tutorial.html

1. Copy a file from the local file system to HDFS

The srcFile variable needs to contain the full name (path + file name) of the file in the local file system.

The dstFile variable needs to contain the desired full name of the file in the Hadoop file system.

Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path srcPath = new Path(srcFile);
Path dstPath = new Path(dstFile);
hdfs.copyFromLocalFile(srcPath, dstPath);

2. Create an HDFS file

The fileName variable contains the path and name of the file in the Hadoop file system.

The content of the file is the buff variable, which is an array of bytes.

// byte[] buff - the content of the file
Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path path = new Path(fileName);
FSDataOutputStream outputStream = hdfs.create(path);
outputStream.write(buff, 0, buff.length);
// close the stream so the data is flushed and the file is completed in HDFS
outputStream.close();

3. Rename an HDFS file

To rename a file in the Hadoop file system, we need the full name (path + name) of the file we want to rename. The rename method returns true if the file was renamed, otherwise false.

Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path fromPath = new Path(fromFileName);
Path toPath = new Path(toFileName);
boolean isRenamed = hdfs.rename(fromPath, toPath);

4. Delete an HDFS file

To delete a file in the Hadoop file system, we need the full name (path + name) of the file we want to delete. The delete method returns true if the file was deleted, otherwise false. Its second argument controls recursion: pass true to delete a directory together with everything under it.

Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path path = new Path(fileName);
boolean isDeleted = hdfs.delete(path, false);

Recursive delete:

Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path path = new Path(fileName);
boolean isDeleted = hdfs.delete(path, true);

5. Get the last modification time of an HDFS file

To get the last modification time of a file in the Hadoop file system, we need the full name (path + name) of the file.

Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path path = new Path(fileName);
FileStatus fileStatus = hdfs.getFileStatus(path);
long modificationTime = fileStatus.getModificationTime(); // milliseconds since the epoch
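getModificationTime() returns milliseconds since the Unix epoch (1970-01-01 00:00:00 UTC), so printing it through java.util.Date uses the JVM's default time zone. The plain-JDK sketch below, with a hypothetical helper formatTimestamp, formats such a value in an explicit zone. The sample value is an assumption: it corresponds, to the second, to the timestamp in the run log further down (Wed Jun 02 18:29:14 CST 2010), reading CST as China Standard Time (UTC+8, Asia/Shanghai).

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class ModTimeFormat {
    // Hypothetical helper: formats a millisecond timestamp, such as the value
    // returned by FileStatus.getModificationTime(), in an explicit time zone.
    static String formatTimestamp(long millisSinceEpoch, String zoneId) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone(zoneId));
        return fmt.format(new Date(millisSinceEpoch));
    }

    public static void main(String[] args) {
        long epoch = 0L;              // the epoch itself
        long sample = 1275474554000L; // 2010-06-02 10:29:14 UTC (assumed sample value)
        System.out.println(formatTimestamp(epoch, "UTC"));            // 1970-01-01 00:00:00
        System.out.println(formatTimestamp(sample, "UTC"));           // 2010-06-02 10:29:14
        System.out.println(formatTimestamp(sample, "Asia/Shanghai")); // 2010-06-02 18:29:14
    }
}
```

Passing the zone explicitly keeps the output identical no matter which machine in the cluster runs the client.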

6. Check whether a file exists in HDFS

To check the existence of a file in the Hadoop file system, we need the full name (path + name) of the file we want to check. The exists method returns true if the file exists, otherwise false.

Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path path = new Path(fileName);
boolean isExists = hdfs.exists(path);

7. Get the locations of a file in the HDFS cluster

A file can exist on more than one node in the Hadoop file system cluster for two reasons:

HDFS splits each file into blocks and, depending on the cluster configuration, stores different blocks on different nodes.

Depending on the cluster configuration, HDFS also keeps more than one copy (replica) of each block on different nodes for redundancy; the default is three.

Note: the original article passes path to getFileBlockLocations here, which is wrong. In the Hadoop 0.20.2 API used here, the method takes a FileStatus rather than a Path, so path has to be changed to fileStatus:

Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path path = new Path(fileName);
FileStatus fileStatus = hdfs.getFileStatus(path);
// pass fileStatus, not path
BlockLocation[] blkLocations = hdfs.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());
int blkCount = blkLocations.length;
for (int i = 0; i < blkCount; i++) {
    String[] hosts = blkLocations[i].getHosts();
    // Do something with the block hosts
}
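Calling getFileBlockLocations(fileStatus, 0, fileStatus.getLen()) asks for every block that overlaps the byte range from 0 to the end of the file, and each returned BlockLocation carries an offset, a length and the hosts holding that block. The plain-Java sketch below only reproduces the offset/length arithmetic for a file cut into fixed-size blocks, so the numbers can be checked by hand. BlockRange and splitIntoBlocks are hypothetical names, and the 64 MB block size (the usual dfs.block.size default in Hadoop 0.20) is an assumption; real locations must still come from the NameNode through the API above.

```java
import java.util.ArrayList;
import java.util.List;

class BlockSplit {
    // Hypothetical stand-in for the offset/length part of a BlockLocation.
    static final class BlockRange {
        final long offset;
        final long length;

        BlockRange(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }

        @Override
        public String toString() {
            return "offset=" + offset + " length=" + length;
        }
    }

    // Cuts a file of fileLen bytes into consecutive blocks of at most blockSize bytes.
    // Every block is full except possibly the last one.
    static List<BlockRange> splitIntoBlocks(long fileLen, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<BlockRange> blocks = new ArrayList<BlockRange>();
        for (long off = 0; off < fileLen; off += blockSize) {
            blocks.add(new BlockRange(off, Math.min(blockSize, fileLen - off)));
        }
        return blocks;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // assumed block size: 64 MB
        long fileLen = 150L * 1024 * 1024;  // a hypothetical 150 MB file
        // Prints three ranges: two full 64 MB blocks and a final 22 MB block
        for (BlockRange b : splitIntoBlocks(fileLen, blockSize)) {
            System.out.println(b);
        }
    }
}
```

Replication does not change this arithmetic; it only affects getHosts(). With a replication factor of three, each block can list up to three hosts, while a single-node setup has only one DataNode to list.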

8. Get a list of the host names of all nodes in the HDFS cluster

This method casts the FileSystem object to a DistributedFileSystem object, so it works only when Hadoop is configured to use HDFS (the single-node setup used in this post qualifies, as the run output below shows). If Hadoop runs on the local machine only, without HDFS, FileSystem.get returns a local file system and the cast throws a ClassCastException.

Configuration config = new Configuration();
FileSystem fs = FileSystem.get(config);
DistributedFileSystem hdfs = (DistributedFileSystem) fs;
DatanodeInfo[] dataNodeStats = hdfs.getDataNodeStats();
String[] names = new String[dataNodeStats.length];
for (int i = 0; i < dataNodeStats.length; i++) {
    names[i] = dataNodeStats[i].getHostName();
}

Example program

/*
 * Demonstrates operating on HDFS through its Java API.
 */
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.hdfs.*;
import org.apache.hadoop.hdfs.protocol.*;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.Date;

public class DFSOperater {
    /**
     * @param args
     */
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        try {
            // Get a list of all the nodes' host names in the HDFS cluster
            FileSystem fs = FileSystem.get(conf);
            DistributedFileSystem hdfs = (DistributedFileSystem) fs;
            DatanodeInfo[] dataNodeStats = hdfs.getDataNodeStats();
            String[] names = new String[dataNodeStats.length];
            System.out.println("list of all the nodes in HDFS cluster:"); // print info
            for (int i = 0; i < dataNodeStats.length; i++) {
                names[i] = dataNodeStats[i].getHostName();
                System.out.println(names[i]); // print info
            }

            Path f = new Path("/user/cluster/dfs.txt");

            // check if a file exists in HDFS
            boolean isExists = fs.exists(f);
            System.out.println("The file exists? [" + isExists + "]");

            // if the file exists, delete it
            if (isExists) {
                boolean isDeleted = hdfs.delete(f, false); // false: not recursive
                if (isDeleted) System.out.println("now delete " + f.getName());
            }

            // create and write (overwrite = true, default buffer size)
            System.out.println("create and write [" + f.getName() + "] to hdfs:");
            FSDataOutputStream os = fs.create(f, true);
            for (int i = 0; i < 10; i++) {
                // plain UTF-8 bytes, so "hadoop fs -cat" and the line-based read below both see ordinary text
                os.write("test hdfs ".getBytes("UTF-8"));
            }
            os.write("\n".getBytes("UTF-8"));
            os.close();

            // get the locations of a file in HDFS
            System.out.println("locations of file in HDFS:");
            FileStatus filestatus = fs.getFileStatus(f);
            BlockLocation[] blkLocations = fs.getFileBlockLocations(filestatus, 0, filestatus.getLen());
            int blkCount = blkLocations.length;
            for (int i = 0; i < blkCount; i++) {
                String[] hosts = blkLocations[i].getHosts();
                // Arrays.toString prints the host names; println(hosts) alone prints only the
                // array's type and hash code, e.g. [Ljava.lang.String;@72ffb in the run log below
                System.out.println(Arrays.toString(hosts));
            }

            // get HDFS file last modification time
            long modificationTime = filestatus.getModificationTime(); // measured in milliseconds since the epoch
            Date d = new Date(modificationTime);
            System.out.println(d);

            // reading from HDFS: read the text back line by line as UTF-8.
            // (DataInputStream.readUTF() only decodes data written by writeUTF(); applied to
            // writeChars() output it yields a garbled, truncated line like "est hdfs ..." in the run log below.)
            System.out.println("read [" + f.getName() + "] from hdfs:");
            BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(f), "UTF-8"));
            System.out.println(reader.readLine());
            reader.close();
        } catch (Exception e) {
            // TODO: handle exception
            e.printStackTrace();
        }
    }
}

After compiling, I copied the jar to node1 and ran it there (sadly, I haven't figured out how to use the Eclipse plugin):

[cluster /opt/hadoop/source]$cp /opt/winxp/hadoop/dfs_operator.jar .
[cluster /opt/hadoop/source]$hadoop jar dfs_operator.jar DFSOperater
list of all the nodes in HDFS cluster:
node1
The file exists? [true]
now delete dfs.txt
create and write [dfs.txt] to hdfs:
locations of file in HDFS:
[Ljava.lang.String;@72ffb
Wed Jun 02 18:29:14 CST 2010
read [dfs.txt] from hdfs:
est hdfs test hdfs test hdfs test hdfs test hdfs test hdfs
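The line [Ljava.lang.String;@72ffb in the output above is not a host name. Passing a String[] straight to System.out.println falls back to Object.toString(), which prints the array's type descriptor, an @ sign and a hexadecimal hash code. The small stand-alone sketch below (class and helper names are hypothetical; node1 is taken from the log) contrasts that with java.util.Arrays.toString, which prints the elements.

```java
import java.util.Arrays;

class PrintHosts {
    // What println(Object) shows for an array: Object.toString(), i.e. the
    // type descriptor "[Ljava.lang.String;" followed by '@' and a hex hash code.
    static String defaultToString(Object array) {
        return String.valueOf(array);
    }

    // What is usually wanted: the elements themselves.
    static String formatHosts(String[] hosts) {
        return Arrays.toString(hosts);
    }

    public static void main(String[] args) {
        String[] hosts = {"node1"}; // e.g. the result of BlockLocation.getHosts()
        System.out.println(defaultToString(hosts)); // [Ljava.lang.String;@ plus a hash code
        System.out.println(formatHosts(hosts));     // [node1]
    }
}
```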

The run succeeded! Check the output file:

[cluster /opt/hadoop/source]$hadoop fs -cat dfs.txt
test hdfs test hdfs test hdfs test hdfs test hdfs test hdfs test hdfs test hdfs test hdfs test hdfs
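The line read back by the program (est hdfs test hdfs ...) is shorter than what hadoop fs -cat shows, and it begins at "est". This is consistent with writing via writeChars(), which emits every char as two bytes with no length prefix, and reading via readUTF(), which expects a 2-byte length followed by modified UTF-8 as produced by writeUTF(). Because FSDataOutputStream and FSDataInputStream extend java.io.DataOutputStream and java.io.DataInputStream, the effect can be reproduced locally with in-memory streams. The sketch below does that; the class and helper names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class CharsVsUtf {
    // Writes the text the way the example program in this post does: writeChars()
    // emits every char as two bytes (high byte first), with no length prefix.
    static byte[] writeWithWriteChars() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (int i = 0; i < 10; i++) {
                out.writeChars("test hdfs ");
            }
            out.writeChars("\n");
            out.close();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException("in-memory write failed", e);
        }
    }

    // Writes one string with writeUTF(): a 2-byte length, then modified UTF-8.
    static byte[] writeWithWriteUtf(String text) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(text);
            out.close();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException("in-memory write failed", e);
        }
    }

    // readUTF() takes the first two bytes as a length and decodes that many bytes.
    // On writeChars() output those bytes are 0x00 0x74 (the 't'), i.e. a length of 116.
    static String readWithReadUtf(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            String s = in.readUTF();
            in.close();
            return s;
        } catch (IOException e) {
            throw new IllegalStateException("in-memory read failed", e);
        }
    }

    public static void main(String[] args) {
        String garbled = readWithReadUtf(writeWithWriteChars());
        // The 0x00 high bytes decode to NUL characters; drop them to see the visible text
        System.out.println(garbled.replace("\0", ""));
        // est hdfs test hdfs test hdfs test hdfs test hdfs test hdfs
        System.out.println(readWithReadUtf(writeWithWriteUtf("test hdfs")));
        // test hdfs
    }
}
```

The remedy is to use one convention on both sides: writeUTF() with readUTF() for a single length-prefixed string, or plain bytes (for example from getBytes("UTF-8")) read back through a Reader when the file should also look like ordinary text under hadoop fs -cat.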


2011-05-03 11:10

