With the connection in place, the next step is to create a job:
sqoop:000> create job --xid 1 --type import
Creating job for connection with id 1
Please fill following values to create new job object
Name: dimDate
Database configuration
Schema name: dbo
Table name: dimDate
Table SQL statement:
Table column names:
Partition column name:
Nulls in partition column:
Boundary query:
Output configuration
Storage type:
0 : HDFS
Choose: 0
Output format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
Choose: 0
Compression format:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
Choose: 0
Output directory: /home/dimDate
Throttling resources
Extractors:
Loaders:
New job was successfully created with validation status FINE and persistent id 1
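The saved job definition can be reviewed at any time from the same shell; a quick check, using the job id reported above:

sqoop:000> show job --jid 1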
-------------------------------------------------------------
Now run the job:
sqoop:000> start job --jid 1
Submission details
Job ID: 1
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2014-03-20 07:01:12 PDT
Lastly updated by: root
External ID: job_1395223907193_0001
http://localhost.localdomain:8088/proxy/application_1395223907193_0001/
2014-03-20 07:01:12 PDT: BOOTING - Progress is not available
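Side note: should a submission need to be aborted, the shell also has a stop command (same job id):

sqoop:000> stop job --jid 1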
Open http://XXXXXXX:8088/proxy/application_1395223907193_0001/ in a browser and you can see the job running.
The status can also be checked from the shell:
sqoop:000> status job --jid 1
Submission details
Job ID: 1
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2014-03-20 07:01:12 PDT
Lastly updated by: root
External ID: job_1395223907193_0001
http://localhost.localdomain:8088/proxy/application_1395223907193_0001/
2014-03-20 07:03:31 PDT: RUNNING - 0.00 %
Check again a little later:
sqoop:000> status job --jid 1
Submission details
Job ID: 1
Server URL: http://localhost:12000/sqoop/
Created by: root
Creation date: 2014-03-20 07:01:12 PDT
Lastly updated by: root
External ID: job_1395223907193_0001
http://localhost.localdomain:8088/proxy/application_1395223907193_0001/
2014-03-20 07:04:46 PDT: SUCCEEDED
Counters:
org.apache.hadoop.mapreduce.JobCounter
SLOTS_MILLIS_MAPS: 772203
MB_MILLIS_MAPS: 790735872
TOTAL_LAUNCHED_MAPS: 10
MILLIS_MAPS: 772203
VCORES_MILLIS_MAPS: 772203
SLOTS_MILLIS_REDUCES: 0
OTHER_LOCAL_MAPS: 10
org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
BYTES_WRITTEN: 129332
org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
BYTES_READ: 0
org.apache.hadoop.mapreduce.TaskCounter
MAP_INPUT_RECORDS: 0
MERGED_MAP_OUTPUTS: 0
PHYSICAL_MEMORY_BYTES: 612765696
SPILLED_RECORDS: 0
COMMITTED_HEAP_BYTES: 161021952
CPU_MILLISECONDS: 8390
FAILED_SHUFFLE: 0
VIRTUAL_MEMORY_BYTES: 3890085888
SPLIT_RAW_BYTES: 1391
MAP_OUTPUT_RECORDS: 1188
GC_TIME_MILLIS: 2962
org.apache.hadoop.mapreduce.FileSystemCounter
FILE_WRITE_OPS: 0
FILE_READ_OPS: 0
FILE_LARGE_READ_OPS: 0
FILE_BYTES_READ: 0
HDFS_BYTES_READ: 1391
FILE_BYTES_WRITTEN: 934750
HDFS_LARGE_READ_OPS: 0
HDFS_WRITE_OPS: 20
HDFS_READ_OPS: 40
HDFS_BYTES_WRITTEN: 129332
org.apache.sqoop.submission.counter.SqoopCounters
ROWS_READ: 1188
Job executed successfully
That should mean success; let's verify in HDFS:
[root@localhost /]# hadoop fs -ls /home/
14/03/20 08:17:08 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
drwxr-xr-x   - root supergroup          0 2014-03-20 07:04 /home/dimDate
-rw-r--r--   1 root supergroup        511 2014-03-20 08:04 /home/dimDate.sql
[root@localhost /]# hdfs dfs -ls /home/dimDate/
14/03/20 08:17:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 11 items
-rw-r--r--   1 root supergroup          0 2014-03-20 07:04 /home/dimDate/_SUCCESS
-rw-r--r--   1 root supergroup      20748 2014-03-20 07:04 /home/dimDate/part-m-00000
-rw-r--r--   1 root supergroup      22248 2014-03-20 07:04 /home/dimDate/part-m-00001
-rw-r--r--   1 root supergroup      17461 2014-03-20 07:04 /home/dimDate/part-m-00002
-rw-r--r--   1 root supergroup      25573 2014-03-20 07:04 /home/dimDate/part-m-00003
-rw-r--r--   1 root supergroup      14132 2014-03-20 07:04 /home/dimDate/part-m-00004
-rw-r--r--   1 root supergroup      25693 2014-03-20 07:04 /home/dimDate/part-m-00005
-rw-r--r--   1 root supergroup          0 2014-03-20 07:04 /home/dimDate/part-m-00006
-rw-r--r--   1 root supergroup          0 2014-03-20 07:04 /home/dimDate/part-m-00007
-rw-r--r--   1 root supergroup          0 2014-03-20 07:04 /home/dimDate/part-m-00008
-rw-r--r--   1 root supergroup       3477 2014-03-20 07:04 /home/dimDate/part-m-00009
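A quick sanity check, tying this back to the counters: SqoopCounters reported ROWS_READ: 1188, and with TEXT_FILE output each row becomes one line, so counting lines across all part files should return the same number:

hdfs dfs -cat /home/dimDate/part-m-* | wc -l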
OK — dimDate has been split into ten part files, one per map task (TOTAL_LAUNCHED_MAPS: 10 above; Extractors was left at its default). To see the actual rows, just cat one of them:
hdfs dfs -cat /home/dimDate/part-m-00001
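If a single local file is more convenient than ten shards, the part files can also be merged on the way out (a minimal sketch; /tmp/dimDate.txt is an arbitrary local destination):

hdfs dfs -getmerge /home/dimDate /tmp/dimDate.txt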
As for incremental imports: after some digging, it seems this simply cannot be done yet. From the Sqoop mailing list:
Sqoop 2 currently can't do incremental imports, however implementing this feature is definitely in our plan!
So for now the only option is to wait.
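In the meantime, a possible workaround is to fall back to classic Sqoop 1 (1.4.x), whose command-line tool does support incremental imports. A sketch only — the connection string, credentials and check column below are placeholders, not values from this setup:

sqoop import \
  --connect 'jdbc:sqlserver://<host>:1433;database=<db>' \
  --username <user> --password '<password>' \
  --table dimDate \
  --target-dir /home/dimDate \
  --incremental append \
  --check-column DateKey \
  --last-value 20140320

Here --check-column names a monotonically increasing column and --last-value is the highest value already imported; only rows beyond it are appended on the next run.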
Looking at /home/sqoop-1.99.3/@LOGDIR@/sqoop.log (yes, the literal @LOGDIR@ directory — the placeholder is not substituted in this release), I found an error:
Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: Call From localhost.admin/127.0.0.1 to 0.0.0.0:10020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:212)
    at com.sun.proxy.$Proxy28.registerApplicationMaster(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.AMRMProtocolPBClientImpl.registerApplicationMaster(AMRMProtocolPBClientImpl.java:100)
    ... 12 more
So the Hadoop 2.x job history port has not been wired up yet and the client falls back to 0.0.0.0:10020. Configure the JobHistory Server address in mapred-site.xml (mapreduce.jobhistory.address is a MapReduce property, so it belongs there rather than in core-site.xml):
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>master:10020</value>
</property>
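After updating the configuration, also make sure the JobHistory Server itself is running, otherwise nothing listens on 10020 (assuming a standard Hadoop 2.x layout under $HADOOP_HOME):

$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver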