Using Sqoop

 

(A reminder from bingdongguke for readers of this article: some of the file download links may no longer work; you can find current links and versions yourself on the Apache site.)

Download Sqoop:

[root@localhost download]# wget http://cloud.github.com/downloads/cloudera/sqoop/sqoop-1.3.0.tar.gz

wget http://labs.mop.com/apache-mirror//incubator/sqoop/stable/sqoop-1.4.1-incubating__hadoop-0.20.tar.gz

Synchronize the clocks across all nodes.

See http://zhengzhuangjie.iteye.com/blog/1409336 for details.

You may need to copy hadoop-core-0.20.203.0.jar and commons-collections-3.2.1.jar into sqoop-1.4.1-incubating__hadoop-0.20/lib.
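A sketch of the unpack-and-copy steps, assuming everything lives under /download as in the shell sessions below (if commons-collections-3.2.1.jar is not in Hadoop's lib directory, fetch it from the Apache Commons site):

cd /download
tar -xzf sqoop-1.4.1-incubating__hadoop-0.20.tar.gz
cp hadoop-0.20.203.0/hadoop-core-0.20.203.0.jar \
   hadoop-0.20.203.0/lib/commons-collections-3.2.1.jar \
   sqoop-1.4.1-incubating__hadoop-0.20/lib/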

Grant access privileges on the remote MySQL server to be used (10.0.30.67):

[root@localhost ~]# mysql -uroot -p123456

mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' WITH GRANT OPTION;

Query OK, 0 rows affected (0.01 sec)
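It may also help to flush privileges and then verify, from a Hadoop node, that the remote server accepts connections (a sketch; the host and credentials are the ones used in this post):

mysql -uroot -p123456 -e "FLUSH PRIVILEGES;"
mysql -h 10.0.30.67 -uroot -e "SHOW DATABASES;"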

Configure conf/sqoop-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

<!-- Put Sqoop-specific properties in this file. -->

<configuration>

<!--
Set the value of this property to explicitly enable third-party
ManagerFactory plugins.

If this is not used, you can alternately specify a set of ManagerFactories
in the $SQOOP_CONF_DIR/managers.d/ subdirectory. Each file should contain
one or more lines like:
manager.class.name[=/path/to/containing.jar]

Files will be consulted in lexicographical order only if this property
is unset.
-->

<property>
<name>sqoop.connection.factories</name>
<value>com.cloudera.sqoop.manager.DefaultManagerFactory</value>
<description>A comma-delimited list of ManagerFactory implementations
which are consulted, in order, to instantiate ConnManager instances
used to drive connections to databases.
</description>
</property>


<!--
Set the value of this property to enable third-party tools.

If this is not used, you can alternately specify a set of ToolPlugins
in the $SQOOP_CONF_DIR/tools.d/ subdirectory. Each file should contain
one or more lines like:
plugin.class.name[=/path/to/containing.jar]

Files will be consulted in lexicographical order only if this property
is unset.
-->
<!--
<property>
<name>sqoop.tool.plugins</name>
<value></value>
<description>A comma-delimited list of ToolPlugin implementations
which are consulted, in order, to register SqoopTool instances which
allow third-party tools to be used.
</description>
</property>
-->

<!--
By default, the Sqoop metastore will auto-connect to a local embedded
database stored in ~/.sqoop/. To disable metastore auto-connect, uncomment
this next property.
-->

<property>
<name>sqoop.metastore.client.enable.autoconnect</name>
<value>false</value>
<description>If true, Sqoop will connect to a local metastore
for job management when no other metastore arguments are
provided.
</description>
</property>


<!--
The auto-connect metastore is stored in ~/.sqoop/. Uncomment
these next arguments to control the auto-connect process with
greater precision.
-->

<property>
<name>sqoop.metastore.client.autoconnect.url</name>
<value>jdbc:mysql://10.0.30.67/cncloud?useUnicode=true&amp;characterEncoding=utf-8</value>
<description>The connect string to use when connecting to a
job-management metastore. If unspecified, uses ~/.sqoop/.
You can specify a different path here.
</description>
</property>
<property>
<name>sqoop.metastore.client.autoconnect.username</name>
<value>root</value>
<description>The username to bind to the metastore.
</description>
</property>
<property>
<name>sqoop.metastore.client.autoconnect.password</name>
<value>123456</value>
<description>The password to bind to the metastore.
</description>
</property>


<!--
For security reasons, by default your database password will not be stored in
the Sqoop metastore. When executing a saved job, you will need to
reenter the database password. Uncomment this setting to enable saved
password storage. (INSECURE!)
-->

<property>
<name>sqoop.metastore.client.record.password</name>
<value>true</value>
<description>If true, allow saved passwords in the metastore.
</description>
</property>

<!--
SERVER CONFIGURATION: If you plan to run a Sqoop metastore on this machine,
you should uncomment and set these parameters appropriately.

You should then configure clients with:
sqoop.metastore.client.autoconnect.url =
jdbc:hsqldb:hsql://&lt;server-name&gt;:&lt;port&gt;/sqoop
-->

<property>
<name>sqoop.metastore.server.location</name>
<value>/tmp/sqoop-metastore/shared.db</value>
<description>Path to the shared metastore database files.
If this is not set, it will be placed in ~/.sqoop/.
</description>
</property>

<property>
<name>sqoop.metastore.server.port</name>
<value>16000</value>
<description>Port that this metastore should listen on.
</description>
</property>


</configuration>

[root@nn0001 bin]# ./sqoop help
Error: /usr/lib/hadoop does not exist!
Please set $HADOOP_HOME to the root of your Hadoop installation.
[root@nn0001 bin]# vim /etc/profile

HADOOP_HOME=/download/hadoop-0.20.203.0
HBASE_HOME=/download/hbase-0.90.5

export HADOOP_HOME
export HBASE_HOME

[root@nn0001 bin]# source /etc/profile

[root@nn0001 bin]# ./sqoop import --help

When importing MySQL data, Sqoop depends on mysql-connector-java-*.jar at run time, so you need to download mysql-connector-java-*.jar and copy it into sqoop-1.4.1-incubating__hadoop-0.20/lib.
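For example (the connector version here is hypothetical; use whichever mysql-connector-java jar you downloaded):

cp mysql-connector-java-5.1.18-bin.jar /download/sqoop-1.4.1-incubating__hadoop-0.20/lib/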

[root@nn0001 bin]# ./sqoop list-tables --connect jdbc:mysql://10.0.30.67/cncloud --username root
12/02/29 11:32:49 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
T_AUTHORITY
T_GROUP
T_GROUP_SHARE
T_GROUP_USER
T_HFILE
T_OLINK
T_OLINK_HFILE
T_RESOURCE
T_RESOURCE_AUTHORITY
T_ROLE
T_ROLE_AUTHORITY
T_USER
T_USER_ROLE
T_USER_SHARE

[root@nn0001 bin]# ./sqoop import --connect jdbc:mysql://10.0.30.67/cncloud --username root --table cncloud
12/02/29 11:19:29 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
12/02/29 11:19:29 INFO tool.CodeGenTool: Beginning code generation
12/02/29 11:19:29 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `cncloud` AS t LIMIT 1
12/02/29 11:19:29 ERROR manager.SqlManager: Error executing statement: com.mysql.jdbc.exceptions.MySQLSyntaxErrorException: Table 'cncloud.cncloud' doesn't exist
com.mysql.jdbc.exceptions.MySQLSyntaxErrorException: Table 'cncloud.cncloud' doesn't exist
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:936)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2985)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1631)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1723)
at com.mysql.jdbc.Connection.execSQL(Connection.java:3283)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1332)
at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1467)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:487)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:496)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:194)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:178)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:114)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1235)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1060)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:82)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:367)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:453)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
12/02/29 11:19:29 ERROR tool.ImportTool: Imported Failed: Attempted to generate class with no columns!

[root@nn0001 bin]# ./sqoop import --connect jdbc:mysql://10.0.30.67/cncloud --username root --table T_USER

12/02/29 11:23:44 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
12/02/29 11:23:44 INFO tool.CodeGenTool: Beginning code generation
12/02/29 11:23:45 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `T_USER` AS t LIMIT 1
12/02/29 11:23:46 INFO orm.CompilationManager: HADOOP_HOME is /download/hadoop-0.20.203.0/bin/..
Note: /tmp/sqoop-root/compile/3e41a89256b74520169e9f08b714538c/T_USER.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
12/02/29 11:23:59 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/3e41a89256b74520169e9f08b714538c/T_USER.jar
12/02/29 11:23:59 WARN manager.MySQLManager: It looks like you are importing from mysql.
12/02/29 11:23:59 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
12/02/29 11:23:59 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
12/02/29 11:23:59 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
12/02/29 11:23:59 INFO mapreduce.ImportJobBase: Beginning import of T_USER
12/02/29 11:24:02 INFO ipc.Client: Retrying connect to server: nn0001/10.0.30.235:9001. Already tried 0 time(s).
12/02/29 11:24:03 INFO ipc.Client: Retrying connect to server: nn0001/10.0.30.235:9001. Already tried 1 time(s).
12/02/29 11:24:04 INFO ipc.Client: Retrying connect to server: nn0001/10.0.30.235:9001. Already tried 2 time(s).
12/02/29 11:24:05 INFO ipc.Client: Retrying connect to server: nn0001/10.0.30.235:9001. Already tried 3 time(s).
12/02/29 11:24:06 INFO ipc.Client: Retrying connect to server: nn0001/10.0.30.235:9001. Already tried 4 time(s).
12/02/29 11:24:07 INFO ipc.Client: Retrying connect to server: nn0001/10.0.30.235:9001. Already tried 5 time(s).
12/02/29 11:24:08 INFO ipc.Client: Retrying connect to server: nn0001/10.0.30.235:9001. Already tried 6 time(s).
12/02/29 11:24:09 INFO ipc.Client: Retrying connect to server: nn0001/10.0.30.235:9001. Already tried 7 time(s).
12/02/29 11:24:10 INFO ipc.Client: Retrying connect to server: nn0001/10.0.30.235:9001. Already tried 8 time(s).
12/02/29 11:24:11 INFO ipc.Client: Retrying connect to server: nn0001/10.0.30.235:9001. Already tried 9 time(s).
12/02/29 11:24:11 ERROR tool.ImportTool: Encountered IOException running import job: java.net.ConnectException: Call to nn0001/10.0.30.235:9001 failed on connection exception: java.net.ConnectException: Connection refused
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1057)
at org.apache.hadoop.ipc.Client.call(Client.java:1033)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
at org.apache.hadoop.mapred.$Proxy1.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:364)
at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:460)
at org.apache.hadoop.mapred.JobClient.init(JobClient.java:454)
at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:437)
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:477)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:475)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:464)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:494)
at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:141)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:201)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:413)
at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:97)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:380)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:453)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:406)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:414)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:527)
at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:187)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1164)
at org.apache.hadoop.ipc.Client.call(Client.java:1010)
... 26 more

If the run fails and you execute it again, this error appears:

12/02/29 11:27:04 ERROR orm.CompilationManager: Could not rename /tmp/sqoop-root/compile/35b5389c7196008f8e1a66b0061f0cd5/T_USER.java to /download/sqoop-1.4.1-incubating__hadoop-0.20/bin/./T_USER.java
java.io.IOException: Destination '/download/sqoop-1.4.1-incubating__hadoop-0.20/bin/./T_USER.java' already exists
at org.apache.commons.io.FileUtils.moveFile(FileUtils.java:1811)
at org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:227)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:83)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:367)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:453)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)

Clean up the leftovers: remove any HDFS output with ./hadoop dfs -rmr xxx, and delete the locally generated code files:

[root@nn0001 bin]# rm /tmp/sqoop-root/compile/35b5389c7196008f8e1a66b0061f0cd5/T_USER.java
[root@nn0001 bin]# rm /tmp/sqoop-root/compile/35b5389c7196008f8e1a66b0061f0cd5/T_USER.class
[root@nn0001 bin]# rm /tmp/sqoop-root/compile/35b5389c7196008f8e1a66b0061f0cd5/T_USER.jar
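If the failed job had already created its output directory in HDFS, remove that as well before re-running; a sketch, assuming the default output directory named after the table (see the dfs listing further below):

/download/hadoop-0.20.203.0/bin/hadoop dfs -rmr /user/root/T_USER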

The root cause of the connection-refused error above is that MapReduce was not started:

[root@nn0001 bin]# ./start-mapred.sh
starting jobtracker, logging to /download/hadoop-0.20.203.0/bin/../logs/hadoop-root-jobtracker-nn0001.out
dn0001: starting tasktracker, logging to /download/hadoop-0.20.203.0/bin/../logs/hadoop-root-tasktracker-dn0001.out
dn0003: starting tasktracker, logging to /download/hadoop-0.20.203.0/bin/../logs/hadoop-root-tasktracker-dn0003.out
dn0002: starting tasktracker, logging to /download/hadoop-0.20.203.0/bin/../logs/hadoop-root-tasktracker-dn0002.out

[root@nn0001 bin]# ./sqoop import --connect jdbc:mysql://10.0.30.67/cncloud --username root --table T_USER
12/02/29 12:37:09 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
12/02/29 12:37:09 INFO tool.CodeGenTool: Beginning code generation
12/02/29 12:37:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `T_USER` AS t LIMIT 1
12/02/29 12:37:10 INFO orm.CompilationManager: HADOOP_HOME is /download/hadoop-0.20.203.0/bin/..
Note: /tmp/sqoop-root/compile/7354a8be9b8fae6811150fd2fffac7a0/T_USER.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
12/02/29 12:37:21 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/7354a8be9b8fae6811150fd2fffac7a0/T_USER.jar
12/02/29 12:37:21 WARN manager.MySQLManager: It looks like you are importing from mysql.
12/02/29 12:37:21 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
12/02/29 12:37:21 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
12/02/29 12:37:21 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
12/02/29 12:37:21 INFO mapreduce.ImportJobBase: Beginning import of T_USER
12/02/29 12:37:30 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`ID`), MAX(`ID`) FROM `T_USER`
12/02/29 12:37:30 WARN db.TextSplitter: Generating splits for a textual index column.
12/02/29 12:37:30 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
12/02/29 12:37:30 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
12/02/29 12:37:32 INFO mapred.JobClient: Running job: job_201202291233_0001
12/02/29 12:37:33 INFO mapred.JobClient: map 0% reduce 0%
12/02/29 12:38:22 INFO mapred.JobClient: map 25% reduce 0%
12/02/29 12:38:25 INFO mapred.JobClient: map 50% reduce 0%
12/02/29 12:38:35 INFO mapred.JobClient: map 75% reduce 0%
12/02/29 12:38:37 INFO mapred.JobClient: map 100% reduce 0%
12/02/29 12:38:48 INFO mapred.JobClient: Job complete: job_201202291233_0001
12/02/29 12:38:49 INFO mapred.JobClient: Counters: 14
12/02/29 12:38:49 INFO mapred.JobClient: Job Counters
12/02/29 12:38:49 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=104029
12/02/29 12:38:49 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/02/29 12:38:49 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/02/29 12:38:49 INFO mapred.JobClient: Launched map tasks=4
12/02/29 12:38:49 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
12/02/29 12:38:49 INFO mapred.JobClient: File Output Format Counters
12/02/29 12:38:49 INFO mapred.JobClient: Bytes Written=928
12/02/29 12:38:49 INFO mapred.JobClient: FileSystemCounters
12/02/29 12:38:49 INFO mapred.JobClient: HDFS_BYTES_READ=701
12/02/29 12:38:49 INFO mapred.JobClient: FILE_BYTES_WRITTEN=125482
12/02/29 12:38:49 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=928
12/02/29 12:38:49 INFO mapred.JobClient: File Input Format Counters
12/02/29 12:38:49 INFO mapred.JobClient: Bytes Read=0
12/02/29 12:38:49 INFO mapred.JobClient: Map-Reduce Framework
12/02/29 12:38:49 INFO mapred.JobClient: Map input records=3
12/02/29 12:38:49 INFO mapred.JobClient: Spilled Records=0
12/02/29 12:38:49 INFO mapred.JobClient: Map output records=3
12/02/29 12:38:49 INFO mapred.JobClient: SPLIT_RAW_BYTES=701
12/02/29 12:38:49 INFO mapreduce.ImportJobBase: Transferred 928 bytes in 86.7251 seconds (10.7005 bytes/sec)
12/02/29 12:38:49 INFO mapreduce.ImportJobBase: Retrieved 3 records.
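The TextSplitter warnings above appear because the split column (T_USER's primary key ID) is textual. A hedged workaround is to force a single mapper so no splitting happens, or, if the table has an integral column, name it with --split-by:

./sqoop import --connect jdbc:mysql://10.0.30.67/cncloud --username root --table T_USER -m 1
# or: --split-by <integer column> keeps parallelism without textual splits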

[root@nn0001 bin]# /download/hadoop-0.20.203.0/bin/hadoop dfs -lsr /

drwxr-xr-x - root supergroup 0 2012-02-29 12:37 /user
drwxr-xr-x - root supergroup 0 2012-02-29 12:37 /user/root
drwxr-xr-x - root supergroup 0 2012-02-29 12:38 /user/root/T_USER
-rw-r--r-- 3 root supergroup 0 2012-02-29 12:38 /user/root/T_USER/_SUCCESS
drwxr-xr-x - root supergroup 0 2012-02-29 12:37 /user/root/T_USER/_logs
drwxr-xr-x - root supergroup 0 2012-02-29 12:37 /user/root/T_USER/_logs/history
-rw-r--r-- 3 root supergroup 12907 2012-02-29 12:37 /user/root/T_USER/_logs/history/job_201202291233_0001_1330490252595_root_T%5FUSER.jar
-rw-r--r-- 3 root supergroup 26958 2012-02-29 12:37 /user/root/T_USER/_logs/history/job_201202291233_0001_conf.xml
-rw-r--r-- 3 root supergroup 311 2012-02-29 12:38 /user/root/T_USER/part-m-00000
-rw-r--r-- 3 root supergroup 0 2012-02-29 12:38 /user/root/T_USER/part-m-00001
-rw-r--r-- 3 root supergroup 305 2012-02-29 12:38 /user/root/T_USER/part-m-00002
-rw-r--r-- 3 root supergroup 312 2012-02-29 12:38 /user/root/T_USER/part-m-00003
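To spot-check the imported rows (Sqoop writes comma-delimited text files by default):

/download/hadoop-0.20.203.0/bin/hadoop dfs -cat /user/root/T_USER/part-m-00000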

The above demonstrated importing database data into HDFS; next we demonstrate importing data from HDFS into MySQL.

If the target server already has the database and the corresponding tables, nothing needs to be created; otherwise, create the database and the tables first.

Because the data is read from all the datanodes and written into MySQL, MySQL must allow client access from all IPs, as configured above.

Also, the table must not already contain records with the same IDs as the imported data, or the export will fail.

[root@nn0001 bin]# ./sqoop export --connect jdbc:mysql://10.0.30.235/cncloud --export-dir /user/root/T_ROLE --username root --table T_ROLE

Note: the value passed to --export-dir is a path on HDFS, not a local path.

The following is quoted from http://thomas0988.iteye.com/blog/1383463

Sqoop 1.3 commands

No.  Command            Class                 Description
1    import             ImportTool            Import data from a relational database (a table or a query) into HDFS
2    export             ExportTool            Export data from HDFS into a relational database
3    codegen            CodeGenTool           Generate Java code for a database table and package it into a jar
4    create-hive-table  CreateHiveTableTool   Create a Hive table
5    eval               EvalSqlTool           Run a SQL statement and view the results
6    import-all-tables  ImportAllTablesTool   Import every table of a database into HDFS
7    job                JobTool
8    list-databases     ListDatabasesTool     List all database names
9    list-tables        ListTablesTool        List all tables of a database
10   merge              MergeTool
11   metastore          MetastoreTool
12   help               HelpTool              Show help
13   version            VersionTool           Show the version
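For example, list-databases and eval can be tried against the same MySQL server used above (a sketch):

./sqoop list-databases --connect jdbc:mysql://10.0.30.67/ --username root
./sqoop eval --connect jdbc:mysql://10.0.30.67/cncloud --username root --query "SELECT COUNT(*) FROM T_USER"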

Next come Sqoop's generic parameters, followed by the parameters specific to each of the 13 commands above. The generic parameters are divided into Common arguments, Incremental import arguments, Output line formatting arguments, Input parsing arguments, Hive arguments, HBase arguments, and Generic Hadoop command-line arguments, explained one group at a time below:
1. Common arguments: general parameters, mainly for the relational database connection; an example combining them follows the table.

No.  Argument            Description                                               Example
1    connect             URL of the relational database to connect to             jdbc:mysql://localhost/sqoop_datas
2    connection-manager  Connection manager class; rarely needed
3    driver              JDBC driver class
4    hadoop-home         Hadoop installation directory                            /home/guoyun/hadoop
5    help                Show help information
6    password            Password for the relational database
7    username            Username for the relational database
8    verbose             Print more information (lowers the log level); this flag takes no value
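Putting several common arguments together (a sketch using the sample URL from the table; credentials are placeholders):

./sqoop list-tables --connect jdbc:mysql://localhost/sqoop_datas --username root --password 123456 --verbose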
===========================================
Sqoop import fails with a "java heap space" error
When importing data with Sqoop, once the data volume grows large, the map/reduce phase starts failing with java heap space errors. There are two fixes:
1. Increase the JVM heap of each child task.
Edit mapred-site.xml and add the following properties:
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx512M</value>
</property>
<property>
<name>mapred.reduce.child.java.opts</name>
<value>-Xmx512M</value>
</property>
<property>
<name>mapred.map.child.java.opts</name>
<value>-Xmx512M</value>
</property>

2. Increase the number of maps via sqoop's -m option. With more maps, each child task processes less data and needs less heap, staying within Hadoop's configured java heap space; an example follows below.
sqoop ... -m <number of maps>
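For example, with the import used earlier (the mapper count here is arbitrary):

./sqoop import --connect jdbc:mysql://10.0.30.67/cncloud --username root --table T_USER -m 8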

Importing into Hive

[root@nn0001 bin]# ./sqoop import --connect jdbc:mysql://10.0.30.67/cncloud --username root --table T_USER --hive-import

12/03/01 14:54:03 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Cannot run program "hive": java.io.IOException: error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
at java.lang.Runtime.exec(Runtime.java:593)
at java.lang.Runtime.exec(Runtime.java:509)
at org.apache.sqoop.util.Executor.exec(Executor.java:76)
at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:344)
at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:297)
at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:239)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:392)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:453)
at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
Caused by: java.io.IOException: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
at java.lang.ProcessImpl.start(ProcessImpl.java:65)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
... 15 more

This error occurs mainly because Hive's environment variable has not been configured, so the hive executable cannot be found.

Configure it in /etc/profile:

HIVE_HOME=/download/hive-0.8.1

export HIVE_HOME
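Then reload the profile and sanity-check that the hive binary is reachable (a sketch, assuming the path above):

source /etc/profile
$HIVE_HOME/bin/hive -e "SHOW TABLES;"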

The command above seems to import the database data only into Hadoop's HDFS (after the import I could not find the corresponding table in Hive, yet the data was in Hadoop; however, when I used the command below to import another table, I found the earlier T_ROLE table had already appeared in Hive, with data, while T_USER had only been created as an empty table with no records; any explanation is welcome).

[root@nn0001 bin]# ./sqoop create-hive-table --connect jdbc:mysql://10.0.30.67/cncloud --username root --table T_USER --hive-table T_USER

12/03/01 16:04:43 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
12/03/01 16:04:43 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
12/03/01 16:04:43 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
12/03/01 16:04:44 INFO hive.HiveImport: Removing temporary files from import process: T_USER/_logs
12/03/01 16:04:44 INFO hive.HiveImport: Loading uploaded data into Hive
12/03/01 16:04:45 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `T_USER` AS t LIMIT 1
12/03/01 16:04:45 WARN hive.TableDefWriter: Column LOGIN_FAILED_DATE had to be cast to a less precise type in Hive
12/03/01 16:04:45 WARN hive.TableDefWriter: Column LAST_LOGIN_DATE had to be cast to a less precise type in Hive
12/03/01 16:04:45 WARN hive.TableDefWriter: Column UPDATED_DATE had to be cast to a less precise type in Hive
12/03/01 16:04:45 WARN hive.TableDefWriter: Column CREATED_DATE had to be cast to a less precise type in Hive
12/03/01 16:04:49 INFO hive.HiveImport: WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
12/03/01 16:04:51 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/download/hive-0.8.1/lib/hive-common-0.8.1.jar!/hive-log4j.properties
12/03/01 16:04:51 INFO hive.HiveImport: Hive history file=/tmp/root/hive_job_log_root_201203011604_1032827888.txt
12/03/01 16:05:17 INFO hive.HiveImport: OK
12/03/01 16:05:17 INFO hive.HiveImport: Time taken: 24.125 seconds
12/03/01 16:05:17 INFO hive.HiveImport: Hive import complete.

[root@nn0001 bin]# ./sqoop import --connect jdbc:mysql://10.0.30.67/cncloud --username root --table T_USER --hive-import --hive-overwrite

This command, surprisingly, does import the data into Hive; presumably something in the operations above went wrong.

Exporting data to MySQL. Naturally, the database table must already exist, or an error is raised:

[root@nn0001 bin]# ./sqoop export --connect jdbc:mysql://10.0.30.235/cncloud --username root --table T_USER --export-dir /user/hive/warehouse/t_user

java.util.NoSuchElementException
at java.util.AbstractList$Itr.next(AbstractList.java:350)
at T_USER.__loadFromFields(T_USER.java:553)
at T_USER.parse(T_USER.java:502)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:77)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:36)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:183)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:253)

This error is caused by a mismatch between the fields Sqoop parses from the files and the columns of the MySQL table. You therefore need to pass Sqoop an extra argument telling it the file's field delimiter so it can parse the fields correctly. Hive's default field delimiter is '\001'.

[root@nn0001 bin]# ./sqoop export --connect jdbc:mysql://10.0.30.235/cncloud --username root --table T_USER --export-dir /user/hive/warehouse/t_user --input-fields-terminated-by '\t'
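Note the command above passes '\t'; if the data was instead written by a Sqoop --hive-import with Hive's default delimiter, '\001' would be the matching value (a hedged variant):

./sqoop export --connect jdbc:mysql://10.0.30.235/cncloud --username root --table T_USER --export-dir /user/hive/warehouse/t_user --input-fields-terminated-by '\001'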

Detailed official documentation (English):

http://incubator.apache.org/sqoop/docs/1.4.1-incubating/SqoopUserGuide.html

