20:55 2010-6-2
Runtime environment:
Hadoop 0.20.2
CentOS 5.4
java version "1.6.0_20-ea"
Hadoop is configured in single-node mode.
The run output is shown at the end of this post; the API notes below mainly follow a reference article.
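For FileSystem.get(config) in the snippets below to talk to HDFS rather than the local file system, the Hadoop configuration files (core-site.xml / hadoop-site.xml for 0.20.x) must be on the classpath, or the default file system can be set in code. A minimal sketch, assuming the NameNode listens at hdfs://localhost:9000 (the address is an assumption; adjust it to your own setup):

Configuration config = new Configuration();
// fs.default.name is the default file system property used by Hadoop 0.20.x;
// hdfs://localhost:9000 is only an assumed address for a single-node setup
config.set("fs.default.name", "hdfs://localhost:9000");
FileSystem hdfs = FileSystem.get(config);
System.out.println("connected to: " + hdfs.getUri());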
1.Copy a file from the local file system to HDFS
The srcFile variable needs to contain the full name (path + file name) of the file in the local file system.
The dstFile variable needs to contain the desired full name of the file in the Hadoop file system.
Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path srcPath = new Path(srcFile);
Path dstPath = new Path(dstFile);
hdfs.copyFromLocalFile(srcPath, dstPath);
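The reverse direction, copying a file from HDFS back to the local file system, is not covered by the reference article but works symmetrically with copyToLocalFile. A minimal sketch, reusing the same (hypothetical) srcFile/dstFile names from above:

Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
// copy the HDFS file back to the local file system
hdfs.copyToLocalFile(new Path(dstFile), new Path(srcFile));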
2.Create HDFS file
The fileName variable contains the file name and path in the Hadoop file system.
The content of the file is the buff variable which is an array of bytes.
//byte[] buff - The content of the file
Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path path = new Path(fileName);
FSDataOutputStream outputStream = hdfs.create(path);
outputStream.write(buff, 0, buff.length);
outputStream.close(); // close the stream so the data is flushed to HDFS
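Reading a file back is not part of the original list, although the example program at the end of this post opens a file with FileSystem.open. A minimal sketch of reading a whole HDFS file into memory (assuming the file is small enough to fit):

Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path path = new Path(fileName);
FileStatus status = hdfs.getFileStatus(path);
byte[] data = new byte[(int) status.getLen()]; // assumes the file fits in memory
FSDataInputStream inputStream = hdfs.open(path);
inputStream.readFully(data);                   // read the entire file
inputStream.close();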
3.Rename HDFS file
In order to rename a file in Hadoop file system, we need the full name (path + name) of
the file we want to rename. The rename method returns true if the file was renamed, otherwise false.
Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path fromPath = new Path(fromFileName);
Path toPath = new Path(toFileName);
boolean isRenamed = hdfs.rename(fromPath, toPath);
4.Delete HDFS file
In order to delete a file in Hadoop file system, we need the full name (path + name)
of the file we want to delete. The delete method returns true if the file was deleted, otherwise false.
Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path path = new Path(fileName);
boolean isDeleted = hdfs.delete(path, false);

Recursive delete:
Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path path = new Path(fileName);
boolean isDeleted = hdfs.delete(path, true);
5.Get HDFS file last modification time
In order to get the last modification time of a file in Hadoop file system,
we need the full name (path + name) of the file.
Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path path = new Path(fileName);
FileStatus fileStatus = hdfs.getFileStatus(path);
long modificationTime = fileStatus.getModificationTime();
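getModificationTime() returns milliseconds since the epoch. To display it as a readable date, it can simply be wrapped in a java.util.Date, as the example program at the end of this post does:

java.util.Date lastModified = new java.util.Date(modificationTime);
System.out.println("Last modified: " + lastModified);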
6.Check if a file exists in HDFS
In order to check the existence of a file in the Hadoop file system,
we need the full name (path + name) of the file we want to check.
The exists method returns true if the file exists, otherwise false.
Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path path = new Path(fileName);
boolean isExists = hdfs.exists(path);
7.Get the locations of a file in the HDFS cluster
A file can exist on more than one node in the Hadoop file system cluster for two reasons:
Based on the HDFS cluster configuration, Hadoop saves parts of files on different nodes in the cluster.
Based on the HDFS cluster configuration, Hadoop saves more than one copy of each file on different nodes for redundancy (The default is three).
Configuration config = new Configuration();
FileSystem hdfs = FileSystem.get(config);
Path path = new Path(fileName);
FileStatus fileStatus = hdfs.getFileStatus(path);
// Note: the reference article passed path as the first argument here, which is wrong;
// getFileBlockLocations() takes the FileStatus, not the Path.
BlockLocation[] blkLocations = hdfs.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());
int blkCount = blkLocations.length;
for (int i = 0; i < blkCount; i++) {
    String[] hosts = blkLocations[i].getHosts();
    // Do something with the block hosts
}
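As a concrete use of the block hosts, they can simply be printed. Note that printing the String[] directly (as the example program below does) only prints the array reference, so something like java.util.Arrays.toString() is needed to see the host names:

for (BlockLocation blk : blkLocations) {
    // print the host names of the data nodes holding this block
    System.out.println("block hosts: " + java.util.Arrays.toString(blk.getHosts()));
}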
8. Get a list of all the node host names in the HDFS cluster
This method casts the FileSystem object to a DistributedFileSystem object.
It will work only when Hadoop is configured as a cluster.
Running Hadoop on the local machine only, in a non-cluster configuration,
will cause this method to throw an exception.
Configuration config = new Configuration();
FileSystem fs = FileSystem.get(config);
DistributedFileSystem hdfs = (DistributedFileSystem) fs;
DatanodeInfo[] dataNodeStats = hdfs.getDataNodeStats();
String[] names = new String[dataNodeStats.length];
for (int i = 0; i < dataNodeStats.length; i++) {
    names[i] = dataNodeStats[i].getHostName();
}
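Since the cast fails when the default file system is not HDFS (for example in local mode), a defensive check can be added instead of letting the exception propagate. A minimal sketch:

Configuration config = new Configuration();
FileSystem fs = FileSystem.get(config);
if (fs instanceof DistributedFileSystem) {
    DistributedFileSystem hdfs = (DistributedFileSystem) fs;
    DatanodeInfo[] dataNodeStats = hdfs.getDataNodeStats();
    for (DatanodeInfo node : dataNodeStats) {
        System.out.println(node.getHostName());
    }
} else {
    System.out.println("not connected to a DistributedFileSystem: " + fs.getClass().getName());
}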
Program example
/*
 * Demonstrates the HDFS Java API
 */
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.hdfs.*;
import org.apache.hadoop.hdfs.protocol.*;
import java.util.Date;

public class DFSOperater {

    /**
     * @param args
     */
    public static void main(String[] args) {

        Configuration conf = new Configuration();

        try {
            // Get a list of all the node host names in the HDFS cluster
            FileSystem fs = FileSystem.get(conf);
            DistributedFileSystem hdfs = (DistributedFileSystem) fs;
            DatanodeInfo[] dataNodeStats = hdfs.getDataNodeStats();
            String[] names = new String[dataNodeStats.length];
            System.out.println("list of all the nodes in HDFS cluster:"); // print info
            for (int i = 0; i < dataNodeStats.length; i++) {
                names[i] = dataNodeStats[i].getHostName();
                System.out.println(names[i]); // print info
            }

            Path f = new Path("/user/cluster/dfs.txt");

            // Check if a file exists in HDFS
            boolean isExists = fs.exists(f);
            System.out.println("The file exists? [" + isExists + "]");

            // If the file exists, delete it
            if (isExists) {
                boolean isDeleted = hdfs.delete(f, false); // false: not recursive
                if (isDeleted) System.out.println("now delete " + f.getName());
            }

            // Create and write
            System.out.println("create and write [" + f.getName() + "] to hdfs:");
            FSDataOutputStream os = fs.create(f, true, 0);
            for (int i = 0; i < 10; i++) {
                os.writeChars("test hdfs ");
            }
            os.writeChars("\n");
            os.close();

            // Get the locations of a file in HDFS
            System.out.println("locations of file in HDFS:");
            FileStatus filestatus = fs.getFileStatus(f);
            BlockLocation[] blkLocations = fs.getFileBlockLocations(filestatus, 0, filestatus.getLen());
            int blkCount = blkLocations.length;
            for (int i = 0; i < blkCount; i++) {
                String[] hosts = blkLocations[i].getHosts();
                // Do something with the block hosts
                System.out.println(hosts); // note: prints the array reference, not the host names
            }

            // Get HDFS file last modification time
            long modificationTime = filestatus.getModificationTime(); // milliseconds since the epoch
            Date d = new Date(modificationTime);
            System.out.println(d);

            // Reading from HDFS
            System.out.println("read [" + f.getName() + "] from hdfs:");
            FSDataInputStream dis = fs.open(f);
            // note: readUTF() does not match writeChars(), so it misreads the first characters,
            // which is why the output below starts with "est hdfs ..."
            System.out.println(dis.readUTF());
            dis.close();

        } catch (Exception e) {
            // TODO: handle exception
            e.printStackTrace();
        }
    }
}
After compiling, I copied the jar to node1 and ran it there (sadly, I don't know how to use the Eclipse plugin).
[cluster /opt/hadoop/source]$cp /opt/winxp/hadoop/dfs_operator.jar .
[cluster /opt/hadoop/source]$hadoop jar dfs_operator.jar DFSOperater
list of all the nodes in HDFS cluster:
node1
The file exists? [true]
now delete dfs.txt
create and write [dfs.txt] to hdfs:
locations of file in HDFS:
[Ljava.lang.String;@72ffb
Wed Jun 02 18:29:14 CST 2010
read [dfs.txt] from hdfs:
est hdfs test hdfs test hdfs test hdfs test hdfs test hdfs
The run succeeded! Check the output file:
[cluster /opt/hadoop/source]$hadoop fs -cat dfs.txt
test hdfs test hdfs test hdfs test hdfs test hdfs test hdfs test hdfs test hdfs test hdfs test hdfs