[hadoop@hadoopmaster test]$ hadoop distcp hdfs://hadoopmaster:9000/user/hive/warehouse/jacktest.db hdfs://hadoopmaster:9000/jacktest/todir
15/11/18 05:39:30 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[hdfs://hadoopmaster:9000/user/hive/warehouse/jacktest.db], targetPath=hdfs://hadoopmaster:9000/jacktest/todir, targetPathExists=true, preserveRawXattrs=false}
15/11/18 05:39:30 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/192.168.1.50:8032
15/11/18 05:39:31 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
15/11/18 05:39:31 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
15/11/18 05:39:31 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/192.168.1.50:8032
15/11/18 05:39:32 INFO mapreduce.JobSubmitter: number of splits:2
15/11/18 05:39:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1447853441917_0001
15/11/18 05:39:32 INFO impl.YarnClientImpl: Submitted application application_1447853441917_0001
15/11/18 05:39:33 INFO mapreduce.Job: The url to track the job: http://hadoopmaster:8088/proxy/application_1447853441917_0001/
15/11/18 05:39:33 INFO tools.DistCp: DistCp job-id: job_1447853441917_0001
15/11/18 05:39:33 INFO mapreduce.Job: Running job: job_1447853441917_0001
15/11/18 05:39:41 INFO mapreduce.Job: Job job_1447853441917_0001 running in uber mode : false
15/11/18 05:39:41 INFO mapreduce.Job: map 0% reduce 0%
15/11/18 05:39:48 INFO mapreduce.Job: map 50% reduce 0%
15/11/18 05:39:50 INFO mapreduce.Job: map 100% reduce 0%
15/11/18 05:39:50 INFO mapreduce.Job: Job job_1447853441917_0001 completed successfully
15/11/18 05:39:50 INFO mapreduce.Job: Counters: 33
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=216204
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1220
HDFS: Number of bytes written=24
HDFS: Number of read operations=31
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
Job Counters
Launched map tasks=2
Other local map tasks=2
Total time spent by all maps in occupied slots (ms)=10356
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=10356
Total vcore-seconds taken by all map tasks=10356
Total megabyte-seconds taken by all map tasks=10604544
Map-Reduce Framework
Map input records=3
Map output records=0
Input split bytes=272
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=156
CPU time spent (ms)=1320
Physical memory (bytes) snapshot=342798336
Virtual memory (bytes) snapshot=1753182208
Total committed heap usage (bytes)=169869312
File Input Format Counters
Bytes Read=924
File Output Format Counters
Bytes Written=0
org.apache.hadoop.tools.mapred.CopyMapper$Counter
BYTESCOPIED=24
BYTESEXPECTED=24
COPY=3
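The CopyMapper counters at the end of the log (BYTESCOPIED and BYTESEXPECTED, both 24 here) give a quick way to confirm that the job copied what it planned to copy. A minimal sketch of checking them from a saved log follows; `check_distcp_counters` is a hypothetical helper name, not part of Hadoop, and it assumes the log contains both counters in the `NAME=value` form shown above.

```shell
#!/bin/sh
# Sketch: compare the BYTESCOPIED and BYTESEXPECTED counters from a DistCp log.
# check_distcp_counters is a hypothetical helper, not a Hadoop command.

check_distcp_counters() {
    # Reads a DistCp log on stdin.
    # Exit status: 0 if both counters are present and equal,
    #              1 if they differ, 2 if either counter is missing.
    awk -F= '
        /BYTESCOPIED/   { copied = $2 }
        /BYTESEXPECTED/ { expected = $2 }
        END {
            if (copied == "" || expected == "") exit 2
            exit ((copied == expected) ? 0 : 1)
        }'
}
```

One possible use is to save the client output with `hadoop distcp SRC DST 2>&1 | tee distcp.log` and then run `check_distcp_counters < distcp.log`. A missing counter is reported separately (status 2) because a run that copies nothing may not print BYTESCOPIED at all.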
[hadoop@hadoopmaster test]$ hadoop fs -ls /jacktest
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2015-11-18 05:39 /jacktest/todir
[hadoop@hadoopmaster test]$ hadoop fs -ls /jacktest/todir
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2015-11-18 05:39 /jacktest/todir/jacktest.db
[hadoop@hadoopmaster test]$ hadoop fs -ls /jacktest/todir/jacktest.db
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2015-11-18 05:39 /jacktest/todir/jacktest.db/test1
[hadoop@hadoopmaster test]$ hadoop fs -ls /jacktest/todir/jacktest.db/test1
Found 1 items
-rw-r--r-- 1 hadoop supergroup 24 2015-11-18 05:39 /jacktest/todir/jacktest.db/test1/test.body
[hadoop@hadoopmaster test]$ hadoop fs -cat /jacktest/todir/jacktest.db/test1/test.body
1,jack
2,josson
3,gavin
[hadoop@hadoopmaster test]$
hive> create table test1(id int,name string) row format delimited fields terminated by ',';
OK
Time taken: 0.454 seconds
hive> select * from test1;
OK
Time taken: 0.65 seconds
hive> show create table test1;
OK
CREATE TABLE `test1`(
`id` int,
`name` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://hadoopmaster:9000/user/hive/warehouse/jacktest.db/test1'
TBLPROPERTIES (
'transient_lastDdlTime'='1447853584')
Time taken: 0.152 seconds, Fetched: 13 row(s)
[hadoop@hadoopmaster test]$ vi test.body
1,jack
2,josson
3,gavin
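About protocols
If the two clusters run different Hadoop versions, copying with the hdfs scheme may fail because their RPC systems are incompatible. In that case you can read the source through hftp, which runs over HTTP; the destination must still be an hdfs address, because hftp is read-only. For example:
hadoop distcp hftp://namenode:50070/user/hadoop/input hdfs://namenode:9000/user/hadoop/input1
The recommended replacement for hftp is webhdfs. Both the source and the destination can be webhdfs addresses, which keeps the copy compatible across versions.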
hadoop distcp hftp://hadoopmaster:50070/user/hive/warehouse/jacktest.db hdfs://hadoopmaster:9000/jacktest/todir1
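For comparison, the webhdfs variant recommended above might look like the following sketch. It only prints the command (a dry run) rather than running it. The assumptions, none of which come from the original test, are that WebHDFS is enabled on the cluster (`dfs.webhdfs.enabled`), that it is served on the same NameNode HTTP port 50070, and that `/jacktest/todir2` is a hypothetical target directory.

```shell
#!/bin/sh
# Sketch: build a distcp command that reads and writes through webhdfs.
# Assumptions: WebHDFS is enabled and served on the NameNode HTTP port 50070;
# /jacktest/todir2 is a hypothetical destination used only for illustration.

NN_HOST="hadoopmaster"
NN_HTTP_PORT="50070"   # NameNode HTTP port, not the RPC port 9000

# Print the webhdfs URI for an absolute HDFS path.
webhdfs_uri() {
    printf 'webhdfs://%s:%s%s\n' "$NN_HOST" "$NN_HTTP_PORT" "$1"
}

SRC=$(webhdfs_uri /user/hive/warehouse/jacktest.db)
DST=$(webhdfs_uri /jacktest/todir2)

# Dry run: print the command instead of executing it.
echo hadoop distcp "$SRC" "$DST"
```

Removing the `echo` would run the copy. Between two real clusters, the host in each URI would be the NameNode of the respective cluster.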
[hadoop@hadoopmaster test]$ hadoop fs -mkdir /jacktest/todir1
[hadoop@hadoopmaster test]$ hadoop distcp hftp://hadoopmaster:9000/user/hive/warehouse/jacktest.db hdfs://hadoopmaster:9000/jacktest/todir1
15/11/18 05:44:32 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[hftp://hadoopmaster:9000/user/hive/warehouse/jacktest.db], targetPath=hdfs://hadoopmaster:9000/jacktest/todir1, targetPathExists=true, preserveRawXattrs=false}
15/11/18 05:44:32 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/192.168.1.50:8032
15/11/18 05:44:33 ERROR tools.DistCp: Invalid input:
org.apache.hadoop.tools.CopyListing$InvalidInputException: hftp://hadoopmaster:9000/user/hive/warehouse/jacktest.db doesn't exist
at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:84)
at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:353)
at org.apache.hadoop.tools.DistCp.execute(DistCp.java:160)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:121)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:401)
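The attempt above fails because the hftp URI uses port 9000, which is the NameNode RPC port on this cluster. hftp runs over HTTP, so it needs the NameNode web port instead (50070 in the retry that follows). One way to look that address up rather than guess it is sketched below. It assumes the `hdfs` client is on the PATH and configured for the source cluster; the fallback value is simply the host and port used in this post.

```shell
#!/bin/sh
# Sketch: find the NameNode HTTP address to use in hftp:// or webhdfs:// URIs.
# Assumption: the hdfs client is installed and points at the source cluster.
# If it is not available, fall back to the address used in this post.

namenode_http_address() {
    if command -v hdfs >/dev/null 2>&1; then
        # Reads dfs.namenode.http-address from the client configuration.
        hdfs getconf -confKey dfs.namenode.http-address
    else
        echo "hadoopmaster:50070"
    fi
}

echo "hftp://$(namenode_http_address)/user/hive/warehouse/jacktest.db"
```

If the property is not set explicitly, the reported value may be a wildcard address such as `0.0.0.0:50070`. In that case only the port is useful and the NameNode host name still has to be supplied by hand.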
[hadoop@hadoopmaster test]$ hadoop distcp hftp://hadoopmaster:50070/user/hive/warehouse/jacktest.db hdfs://hadoopmaster:9000/jacktest/todir1
15/11/18 05:45:10 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[hftp://hadoopmaster:50070/user/hive/warehouse/jacktest.db], targetPath=hdfs://hadoopmaster:9000/jacktest/todir1, targetPathExists=true, preserveRawXattrs=false}
15/11/18 05:45:10 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/192.168.1.50:8032
15/11/18 05:45:11 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
15/11/18 05:45:11 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
15/11/18 05:45:11 INFO client.RMProxy: Connecting to ResourceManager at hadoopmaster/192.168.1.50:8032
15/11/18 05:45:11 INFO mapreduce.JobSubmitter: number of splits:2
15/11/18 05:45:11 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1447853441917_0002
15/11/18 05:45:11 INFO impl.YarnClientImpl: Submitted application application_1447853441917_0002
15/11/18 05:45:12 INFO mapreduce.Job: The url to track the job: http://hadoopmaster:8088/proxy/application_1447853441917_0002/
15/11/18 05:45:12 INFO tools.DistCp: DistCp job-id: job_1447853441917_0002
15/11/18 05:45:12 INFO mapreduce.Job: Running job: job_1447853441917_0002
15/11/18 05:45:18 INFO mapreduce.Job: Job job_1447853441917_0002 running in uber mode : false
15/11/18 05:45:18 INFO mapreduce.Job: map 0% reduce 0%
15/11/18 05:45:24 INFO mapreduce.Job: map 50% reduce 0%
15/11/18 05:45:26 INFO mapreduce.Job: map 100% reduce 0%
15/11/18 05:45:26 INFO mapreduce.Job: Job job_1447853441917_0002 completed successfully
15/11/18 05:45:26 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=216208
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1200
HDFS: Number of bytes written=24
HDFS: Number of read operations=25
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
HFTP: Number of bytes read=0
HFTP: Number of bytes written=0
HFTP: Number of read operations=0
HFTP: Number of large read operations=0
HFTP: Number of write operations=0
Job Counters
Launched map tasks=2
Other local map tasks=2
Total time spent by all maps in occupied slots (ms)=10014
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=10014
Total vcore-seconds taken by all map tasks=10014
Total megabyte-seconds taken by all map tasks=10254336
Map-Reduce Framework
Map input records=3
Map output records=0
Input split bytes=272
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=104
CPU time spent (ms)=2240
Physical memory (bytes) snapshot=345600000
Virtual memory (bytes) snapshot=1751683072
Total committed heap usage (bytes)=169869312
File Input Format Counters
Bytes Read=928
File Output Format Counters
Bytes Written=0
org.apache.hadoop.tools.mapred.CopyMapper$Counter
BYTESCOPIED=24
BYTESEXPECTED=24
COPY=3
[hadoop@hadoopmaster test]$