`

sqoop安装及数据迁移

阅读更多

下载sqoop

[root@localhost download]# wget http://cloud.github.com/downloads/cloudera/sqoop/sqoop-1.3.0.tar.gz

 

wget http://labs.mop.com/apache-mirror//incubator/sqoop/stable/sqoop-1.4.1-incubating__hadoop-0.20.tar.gz

 

 

同步各个节点时间

详见 http://zhengzhuangjie.iteye.com/blog/1409336

 

 

有可能需要把hadoop-core-0.20.203.0.jar和commons-collections-3.2.1.jar拷贝到sqoop-1.4.1-incubating__hadoop-0.20/lib下面

 

设置要访问的远程MySQL服务器权限(10.0.30.67):

[root@localhost ~]# mysql -uroot -p123456

 

mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' WITH GRANT OPTION

Query OK, 0 rows affected (0.01 sec)

 

配置sqoop.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.
-->

<!-- Put Sqoop-specific properties in this file. -->

<configuration>

  <!--
    Set the value of this property to explicitly enable third-party
    ManagerFactory plugins.

    If this is not used, you can alternately specify a set of ManagerFactories
    in the $SQOOP_CONF_DIR/managers.d/ subdirectory.  Each file should contain
    one or more lines like:
      manager.class.name[=/path/to/containing.jar]

    Files will be consulted in lexicographical order only if this property
    is unset.
  -->

  <property>
    <name>sqoop.connection.factories</name>
    <value>com.cloudera.sqoop.manager.DefaultManagerFactory</value>
    <description>A comma-delimited list of ManagerFactory implementations
      which are consulted, in order, to instantiate ConnManager instances
      used to drive connections to databases.
    </description>
  </property>


  <!--
    Set the value of this property to enable third-party tools.

    If this is not used, you can alternately specify a set of ToolPlugins
    in the $SQOOP_CONF_DIR/tools.d/ subdirectory.  Each file should contain
    one or more lines like:
      plugin.class.name[=/path/to/containing.jar]

    Files will be consulted in lexicographical order only if this property
    is unset.
  -->
  <!--
  <property>
    <name>sqoop.tool.plugins</name>
    <value></value>
    <description>A comma-delimited list of ToolPlugin implementations
      which are consulted, in order, to register SqoopTool instances which
      allow third-party tools to be used.
    </description>
  </property>
  -->

  <!--
    By default, the Sqoop metastore will auto-connect to a local embedded
    database stored in ~/.sqoop/. To disable metastore auto-connect, uncomment
    this next property.
  -->

  <property>
    <name>sqoop.metastore.client.enable.autoconnect</name>
    <value>false</value>
    <description>If true, Sqoop will connect to a local metastore
      for job management when no other metastore arguments are
      provided.
    </description>
  </property>


  <!--
    The auto-connect metastore is stored in ~/.sqoop/. Uncomment
    these next arguments to control the auto-connect process with
    greater precision.
  -->

  <property>
    <name>sqoop.metastore.client.autoconnect.url</name>
    <value>jdbc:mysql://10.0.30.67/cncloud?useUnicode=true&amp;characterEncoding=utf-8</value>
    <description>The connect string to use when connecting to a
      job-management metastore. If unspecified, uses ~/.sqoop/.
      You can specify a different path here.
    </description>
  </property>
  <property>
    <name>sqoop.metastore.client.autoconnect.username</name>
    <value>root</value>
    <description>The username to bind to the metastore.
    </description>
  </property>
  <property>
    <name>sqoop.metastore.client.autoconnect.password</name>
    <value>123456</value>
    <description>The password to bind to the metastore.
    </description>
  </property>


  <!--
    For security reasons, by default your database password will not be stored in
    the Sqoop metastore. When executing a saved job, you will need to
    reenter the database password. Uncomment this setting to enable saved
    password storage. (INSECURE!)
  -->

  <property>
    <name>sqoop.metastore.client.record.password</name>
    <value>true</value>
    <description>If true, allow saved passwords in the metastore.
    </description>
  </property>

 

  <!--
    SERVER CONFIGURATION: If you plan to run a Sqoop metastore on this machine,
    you should uncomment and set these parameters appropriately.

    You should then configure clients with:
       sqoop.metastore.client.autoconnect.url =
       jdbc:hsqldb:hsql://&lt;server-name&gt;:&lt;port&gt;/sqoop
  -->

  <property>
    <name>sqoop.metastore.server.location</name>
    <value>/tmp/sqoop-metastore/shared.db</value>
    <description>Path to the shared metastore database files.
    If this is not set, it will be placed in ~/.sqoop/.
    </description>
  </property>

  <property>
    <name>sqoop.metastore.server.port</name>
    <value>16000</value>
    <description>Port that this metastore should listen on.
    </description>
  </property>


</configuration>

 

[root@nn0001 bin]# ./sqoop help
Error: /usr/lib/hadoop does not exist!
Please set $HADOOP_HOME to the root of your Hadoop installation.
[root@nn0001 bin]# vim /etc/profile

HADOOP_HOME=/download/hadoop-0.20.203.0
HBASE_HOME=/download/hbase-0.90.5

export HADOOP_HOME
export HBASE_HOME

 

[root@nn0001 bin]# source /etc/profile

 

[root@nn0001 bin]# ./sqoop import --help

 

 

sqoop导入mysql数据运行过程中依赖mysql-connector-java-*.jar,所以你需要下载mysql-connector-java-*.jar并复制到sqoop-1.4.1-incubating__hadoop-0.20/lib中。

 

[root@nn0001 bin]# ./sqoop list-tables --connect jdbc:mysql://10.0.30.67/cncloud --username root
12/02/29 11:32:49 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
T_AUTHORITY
T_GROUP
T_GROUP_SHARE
T_GROUP_USER
T_HFILE
T_OLINK
T_OLINK_HFILE
T_RESOURCE
T_RESOURCE_AUTHORITY
T_ROLE
T_ROLE_AUTHORITY
T_USER
T_USER_ROLE
T_USER_SHARE

 

 

[root@nn0001 bin]# ./sqoop import --connect jdbc:mysql://10.0.30.67/cncloud --username root --table cncloud
12/02/29 11:19:29 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
12/02/29 11:19:29 INFO tool.CodeGenTool: Beginning code generation
12/02/29 11:19:29 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `cncloud` AS t LIMIT 1
12/02/29 11:19:29 ERROR manager.SqlManager: Error executing statement: com.mysql.jdbc.exceptions.MySQLSyntaxErrorException: Table 'cncloud.cncloud' doesn't exist
com.mysql.jdbc.exceptions.MySQLSyntaxErrorException: Table 'cncloud.cncloud' doesn't exist
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:936)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2985)
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1631)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1723)
        at com.mysql.jdbc.Connection.execSQL(Connection.java:3283)
        at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1332)
        at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1467)
        at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:487)
        at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:496)
        at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:194)
        at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:178)
        at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:114)
        at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1235)
        at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1060)
        at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:82)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:367)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:453)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
        at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
12/02/29 11:19:29 ERROR tool.ImportTool: Imported Failed: Attempted to generate class with no columns!

 

[root@nn0001 bin]# ./sqoop import --connect jdbc:mysql://10.0.30.67/cncloud --username root --table T_USER

12/02/29 11:23:44 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
12/02/29 11:23:44 INFO tool.CodeGenTool: Beginning code generation
12/02/29 11:23:45 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `T_USER` AS t LIMIT 1
12/02/29 11:23:46 INFO orm.CompilationManager: HADOOP_HOME is /download/hadoop-0.20.203.0/bin/..
Note: /tmp/sqoop-root/compile/3e41a89256b74520169e9f08b714538c/T_USER.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
12/02/29 11:23:59 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/3e41a89256b74520169e9f08b714538c/T_USER.jar
12/02/29 11:23:59 WARN manager.MySQLManager: It looks like you are importing from mysql.
12/02/29 11:23:59 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
12/02/29 11:23:59 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
12/02/29 11:23:59 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
12/02/29 11:23:59 INFO mapreduce.ImportJobBase: Beginning import of T_USER
12/02/29 11:24:02 INFO ipc.Client: Retrying connect to server: nn0001/10.0.30.235:9001. Already tried 0 time(s).
12/02/29 11:24:03 INFO ipc.Client: Retrying connect to server: nn0001/10.0.30.235:9001. Already tried 1 time(s).
12/02/29 11:24:04 INFO ipc.Client: Retrying connect to server: nn0001/10.0.30.235:9001. Already tried 2 time(s).
12/02/29 11:24:05 INFO ipc.Client: Retrying connect to server: nn0001/10.0.30.235:9001. Already tried 3 time(s).
12/02/29 11:24:06 INFO ipc.Client: Retrying connect to server: nn0001/10.0.30.235:9001. Already tried 4 time(s).
12/02/29 11:24:07 INFO ipc.Client: Retrying connect to server: nn0001/10.0.30.235:9001. Already tried 5 time(s).
12/02/29 11:24:08 INFO ipc.Client: Retrying connect to server: nn0001/10.0.30.235:9001. Already tried 6 time(s).
12/02/29 11:24:09 INFO ipc.Client: Retrying connect to server: nn0001/10.0.30.235:9001. Already tried 7 time(s).
12/02/29 11:24:10 INFO ipc.Client: Retrying connect to server: nn0001/10.0.30.235:9001. Already tried 8 time(s).
12/02/29 11:24:11 INFO ipc.Client: Retrying connect to server: nn0001/10.0.30.235:9001. Already tried 9 time(s).
12/02/29 11:24:11 ERROR tool.ImportTool: Encountered IOException running import job: java.net.ConnectException: Call to nn0001/10.0.30.235:9001 failed on connection exception: java.net.ConnectException: Connection refused
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1057)
        at org.apache.hadoop.ipc.Client.call(Client.java:1033)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
        at org.apache.hadoop.mapred.$Proxy1.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:364)
        at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:460)
        at org.apache.hadoop.mapred.JobClient.init(JobClient.java:454)
        at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:437)
        at org.apache.hadoop.mapreduce.Job$1.run(Job.java:477)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapreduce.Job.connect(Job.java:475)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:464)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:494)
        at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:141)
        at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:201)
        at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:413)
        at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:97)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:380)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:453)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
        at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:406)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:414)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:527)
        at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:187)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1164)
        at org.apache.hadoop.ipc.Client.call(Client.java:1010)
        ... 26 more

 

 如果执行失败,再次执行,会出现错误:

12/02/29 11:27:04 ERROR orm.CompilationManager: Could not rename /tmp/sqoop-root/compile/35b5389c7196008f8e1a66b0061f0cd5/T_USER.java to /download/sqoop-1.4.1-incubating__hadoop-0.20/bin/./T_USER.java
java.io.IOException: Destination '/download/sqoop-1.4.1-incubating__hadoop-0.20/bin/./T_USER.java' already exists
        at org.apache.commons.io.FileUtils.moveFile(FileUtils.java:1811)
        at org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:227)
        at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:83)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:367)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:453)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
        at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)

 

使用./hadoop dfs -rmr xxx删除即可

[root@nn0001 bin]# rm /tmp/sqoop-root/compile/35b5389c7196008f8e1a66b0061f0cd5/T_USER.java
[root@nn0001 bin]# rm /tmp/sqoop-root/compile/35b5389c7196008f8e1a66b0061f0cd5/T_USER.class
[root@nn0001 bin]# rm /tmp/sqoop-root/compile/35b5389c7196008f8e1a66b0061f0cd5/T_USER.jar

 引起上面问题的原因是因为没有启动mapreduce

[root@nn0001 bin]# ./start-mapred.sh
starting jobtracker, logging to /download/hadoop-0.20.203.0/bin/../logs/hadoop-root-jobtracker-nn0001.out
dn0001: starting tasktracker, logging to /download/hadoop-0.20.203.0/bin/../logs/hadoop-root-tasktracker-dn0001.out
dn0003: starting tasktracker, logging to /download/hadoop-0.20.203.0/bin/../logs/hadoop-root-tasktracker-dn0003.out
dn0002: starting tasktracker, logging to /download/hadoop-0.20.203.0/bin/../logs/hadoop-root-tasktracker-dn0002.out

 

[root@nn0001 bin]# ./sqoop import --connect jdbc:mysql://10.0.30.67/cncloud --username root --table T_USER
12/02/29 12:37:09 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
12/02/29 12:37:09 INFO tool.CodeGenTool: Beginning code generation
12/02/29 12:37:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `T_USER` AS t LIMIT 1
12/02/29 12:37:10 INFO orm.CompilationManager: HADOOP_HOME is /download/hadoop-0.20.203.0/bin/..
Note: /tmp/sqoop-root/compile/7354a8be9b8fae6811150fd2fffac7a0/T_USER.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
12/02/29 12:37:21 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/7354a8be9b8fae6811150fd2fffac7a0/T_USER.jar
12/02/29 12:37:21 WARN manager.MySQLManager: It looks like you are importing from mysql.
12/02/29 12:37:21 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
12/02/29 12:37:21 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
12/02/29 12:37:21 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
12/02/29 12:37:21 INFO mapreduce.ImportJobBase: Beginning import of T_USER
12/02/29 12:37:30 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`ID`), MAX(`ID`) FROM `T_USER`
12/02/29 12:37:30 WARN db.TextSplitter: Generating splits for a textual index column.
12/02/29 12:37:30 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
12/02/29 12:37:30 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
12/02/29 12:37:32 INFO mapred.JobClient: Running job: job_201202291233_0001
12/02/29 12:37:33 INFO mapred.JobClient:  map 0% reduce 0%
12/02/29 12:38:22 INFO mapred.JobClient:  map 25% reduce 0%
12/02/29 12:38:25 INFO mapred.JobClient:  map 50% reduce 0%
12/02/29 12:38:35 INFO mapred.JobClient:  map 75% reduce 0%
12/02/29 12:38:37 INFO mapred.JobClient:  map 100% reduce 0%
12/02/29 12:38:48 INFO mapred.JobClient: Job complete: job_201202291233_0001
12/02/29 12:38:49 INFO mapred.JobClient: Counters: 14
12/02/29 12:38:49 INFO mapred.JobClient:   Job Counters
12/02/29 12:38:49 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=104029
12/02/29 12:38:49 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/02/29 12:38:49 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/02/29 12:38:49 INFO mapred.JobClient:     Launched map tasks=4
12/02/29 12:38:49 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
12/02/29 12:38:49 INFO mapred.JobClient:   File Output Format Counters
12/02/29 12:38:49 INFO mapred.JobClient:     Bytes Written=928
12/02/29 12:38:49 INFO mapred.JobClient:   FileSystemCounters
12/02/29 12:38:49 INFO mapred.JobClient:     HDFS_BYTES_READ=701
12/02/29 12:38:49 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=125482
12/02/29 12:38:49 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=928
12/02/29 12:38:49 INFO mapred.JobClient:   File Input Format Counters
12/02/29 12:38:49 INFO mapred.JobClient:     Bytes Read=0
12/02/29 12:38:49 INFO mapred.JobClient:   Map-Reduce Framework
12/02/29 12:38:49 INFO mapred.JobClient:     Map input records=3
12/02/29 12:38:49 INFO mapred.JobClient:     Spilled Records=0
12/02/29 12:38:49 INFO mapred.JobClient:     Map output records=3
12/02/29 12:38:49 INFO mapred.JobClient:     SPLIT_RAW_BYTES=701
12/02/29 12:38:49 INFO mapreduce.ImportJobBase: Transferred 928 bytes in 86.7251 seconds (10.7005 bytes/sec)
12/02/29 12:38:49 INFO mapreduce.ImportJobBase: Retrieved 3 records.

 

 

[root@nn0001 bin]# /download/hadoop-0.20.203.0/bin/hadoop dfs -lsr /

drwxr-xr-x   - root supergroup          0 2012-02-29 12:37 /user
drwxr-xr-x   - root supergroup          0 2012-02-29 12:37 /user/root
drwxr-xr-x   - root supergroup          0 2012-02-29 12:38 /user/root/T_USER
-rw-r--r--   3 root supergroup          0 2012-02-29 12:38 /user/root/T_USER/_SUCCESS
drwxr-xr-x   - root supergroup          0 2012-02-29 12:37 /user/root/T_USER/_logs
drwxr-xr-x   - root supergroup          0 2012-02-29 12:37 /user/root/T_USER/_logs/history
-rw-r--r--   3 root supergroup      12907 2012-02-29 12:37 /user/root/T_USER/_logs/history/job_201202291233_0001_1330490252595_root_T%5FUSER.jar
-rw-r--r--   3 root supergroup      26958 2012-02-29 12:37 /user/root/T_USER/_logs/history/job_201202291233_0001_conf.xml
-rw-r--r--   3 root supergroup        311 2012-02-29 12:38 /user/root/T_USER/part-m-00000
-rw-r--r--   3 root supergroup          0 2012-02-29 12:38 /user/root/T_USER/part-m-00001
-rw-r--r--   3 root supergroup        305 2012-02-29 12:38 /user/root/T_USER/part-m-00002
-rw-r--r--   3 root supergroup        312 2012-02-29 12:38 /user/root/T_USER/part-m-00003

 

 

上面是演示如何把数据的数据导入到hdfs中,下面演示如何把hdfs中的数据导入到mysql

 

如果新的服务器上有数据库和相应的表,则不需要新建,否则要先新建数据库及相应的表

 

因为数据是从所有的datanode上获取并导入到mysql中,所以mysql需要设置允许所有ip客户端访问,如上面设置

 

同时表里面不能有跟导入的数据id相同的记录,否则会报错

 

[root@nn0001 bin]# ./sqoop export --connect jdbc:mysql://10.0.30.235/cncloud --export-dir /user/root/T_ROLE --username root --table T_ROLE

 

注意:红色字体是hdfs上面的路径,不是本地路径

 

 

 

下面引自http://thomas0988.iteye.com/blog/1383463

 

Sqoop 1.3 命令
 
序号    命令/command        类                                       说明
1        impor                     ImportTool                         从关系型数据库中导入数据(来自表或者查询语句)到HDFS中
2        export                    ExportTool                         将HDFS中的数据导入到关系型数据库中
3        codegen                 CodeGenTool                     获取数据库中某张表数据生成Java并打成jar包
4        create-hive-table   CreateHiveTableTool          创建Hive表
5        eval                        EvalSqlTool                        查看SQL执行结果
6        import-all-tables     ImportAllTablesTool           导入某个数据库下所有表到HDFS中
7        job                          JobTool 
8        list-databases         ListDatabasesTool            列出所有数据库名
9        list-tables                ListTablesTool                   列出某个数据库下所有表
10      merge                     MergeTool 
11      metastore               MetastoreTool 
12      help                         HelpTool                            查看帮助
13      version                    VersionTool                        查看版本
 
        接着列出Sqoop的各种通用参数,然后针对以上13个命令列出他们自己的参数.Sqoop通用参数又分Common arguments,Incremental import arguments,Output line formatting arguments,Input parsing arguments,Hive arguments,HBase arguments,Generic Hadoop command-line arguments,下面一一说明:
       1.Common arguments  通用参数,主要是针对关系型数据库链接的一些参数
 
 
序号            参数                                 说明                               样例
1                connect                           连接关系型数据库的          URL jdbc:mysql://localhost/sqoop_datas
2                connection-manager       连接管理类,一般不用 
3                driver                              连接驱动 
4                hadoop-home                 hadoop目录                    /home/guoyun/hadoop
5                help                                查看帮助信息 
6                password                       连接关系型数据库的密码 
7                username                       链接关系型数据库的用户名 
8                verbose                          查看更多的信息,其实是将日志级别调低 该参数后面不接值
===========================================
sqoop 导入数据 报 java heap space 错误
使用sqoop导入数据时,当数据量变大时,在map/reduce的过程中就会提示java heap space error。经过总结,解决方法有两个:
1、 修改每个运行子进程的jvm大小
修改mapred-site.xml文件,添加以下属性:
<property>
 <name>mapred.child.java.opts</name>
 <value>-Xmx512M</value>
</property>
<property>
 <name>mapred.reduce.child.java.opts</name>
 <value>-Xmx512M</value>
</property>
<property>
 <name>mapred.map.child.java.opts</name>?
 <value>-Xmx512M</value>
</property>

2、 增加map数量,通过sqoop -m 选项指定更多的map。通过更多的map,降少每个子进程占用的heap space,避免超出hadoop设置的java heap space 大小
sqoop ... -m <map 数量>

 

导入hive

[root@nn0001 bin]# ./sqoop import --connect jdbc:mysql://10.0.30.67/cncloud --username root --table T_USER --hive-import

12/03/01 14:54:03 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Cannot run program "hive": java.io.IOException: error=2, No such file or directory
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
        at java.lang.Runtime.exec(Runtime.java:593)
        at java.lang.Runtime.exec(Runtime.java:509)
        at org.apache.sqoop.util.Executor.exec(Executor.java:76)
        at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:344)
        at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:297)
        at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:239)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:392)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:453)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
        at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
Caused by: java.io.IOException: java.io.IOException: error=2, No such file or directory
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
        at java.lang.ProcessImpl.start(ProcessImpl.java:65)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
        ... 15 more

 

出现这个错误主要是因为hive没有配置环境变量,hadoop找不到程序路径导致的

在/etc/profile配置

HIVE_HOME=/download/hive-0.8.1

export HIVE_HOME

 

上面的命令似乎把数据库的数据导入到Hadoop的hdfs中(因为导入后我在hive中没有找到相应的表,在hadoop中却又相应的数据,但是当我使用下面的命令导入另外一个表时,却发现之前的T_ROLE表已经在hive中了,而且还有数据,而T_USER只是创建了一个表而已,没有相应的数据记录,求解?)

[root@nn0001 bin]# ./sqoop create-hive-table --connect jdbc:mysql://10.0.30.67/cncloud --username root --table T_USER --hive-table T_USER

12/03/01 16:04:43 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
12/03/01 16:04:43 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
12/03/01 16:04:43 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
12/03/01 16:04:44 INFO hive.HiveImport: Removing temporary files from import process: T_USER/_logs
12/03/01 16:04:44 INFO hive.HiveImport: Loading uploaded data into Hive
12/03/01 16:04:45 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `T_USER` AS t LIMIT 1
12/03/01 16:04:45 WARN hive.TableDefWriter: Column LOGIN_FAILED_DATE had to be cast to a less precise type in Hive
12/03/01 16:04:45 WARN hive.TableDefWriter: Column LAST_LOGIN_DATE had to be cast to a less precise type in Hive
12/03/01 16:04:45 WARN hive.TableDefWriter: Column UPDATED_DATE had to be cast to a less precise type in Hive
12/03/01 16:04:45 WARN hive.TableDefWriter: Column CREATED_DATE had to be cast to a less precise type in Hive
12/03/01 16:04:49 INFO hive.HiveImport: WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
12/03/01 16:04:51 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/download/hive-0.8.1/lib/hive-common-0.8.1.jar!/hive-log4j.properties
12/03/01 16:04:51 INFO hive.HiveImport: Hive history file=/tmp/root/hive_job_log_root_201203011604_1032827888.txt
12/03/01 16:05:17 INFO hive.HiveImport: OK
12/03/01 16:05:17 INFO hive.HiveImport: Time taken: 24.125 seconds
12/03/01 16:05:17 INFO hive.HiveImport: Hive import complete.

 

[root@nn0001 bin]# ./sqoop import --connect jdbc:mysql://10.0.30.67/cncloud --username root --table T_USER --hive-import --hive-overwrite

这个命令又能奇迹般把数据导入到hive中,估计是上面的操作有什么错误。

 

 

导出数据到MySQL,当然数据库表要先存在,否则会报错:

[root@nn0001 bin]# ./sqoop export --connect jdbc:mysql://10.0.30.235/cncloud --username root --table T_USER --export-dir /user/hive/warehouse/t_user

 

java.util.NoSuchElementException
        at java.util.AbstractList$Itr.next(AbstractList.java:350)
        at T_USER.__loadFromFields(T_USER.java:553)
        at T_USER.parse(T_USER.java:502)
        at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:77)
        at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:36)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:183)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.Child.main(Child.java:253)

 

此错误的原因为sqoop解析文件的字段与MySql数据库的表的字段对应不上造成的。因此需要在执行的时候给sqoop增加参数,告诉sqoop文件的分隔符,使它能够正确的解析文件字段。hive默认的字段分隔符为'\001'

[root@nn0001 bin]# ./sqoop export --connect jdbc:mysql://10.0.30.235/cncloud --username root --table T_USER --export-dir /user/hive/warehouse/t_user --input-fields-terminated-by '\t'

 

官方详细英文文档:

http://incubator.apache.org/sqoop/docs/1.4.1-incubating/SqoopUserGuide.html

分享到:
评论

相关推荐

    Sqoop安装及MySQL、HDFS、Hive、HBase 数据迁移实验

    内容概要:本文档提供了详细的 Sqoop 安装指南以及在 Ubuntu 上运行 Sqoop 实验的具体步骤,涵盖从环境搭建、 Sqoop 基础配置、 MySQL 连接器获取到常见数据迁移操作。文中介绍了通过 Sqoop 在 MySQL 和 HDFS 之间、...

    Sqoop安装与使用

    Sqoop 是一款方便的在传统型数据库与 Hadoop 之间进行数据迁移的工具,充分利用 MapReduce 并行特点以批处理的方式加快数据传输。Sqoop 工具是 Hadoop 下连接关系型数据库和 Hadoop 的桥梁,支持关系型数据库和 hive...

    大数据技术基础实验报告-sqoop的安装配置与应用.doc

    Sqoop作为大数据处理的重要工具,简化了Hadoop和RDBMS之间的数据迁移工作。正确配置和使用Sqoop,可以有效地实现大数据的导入导出,提高数据处理的效率。在实验中,了解每个步骤的细节和注意事项,对于理解大数据...

    sqoop安装与使用

    ### Sqoop 安装与使用详解 #### 一、Sqoop功能概述 Sqoop是一款用于在Hadoop和关系型...以上介绍了如何安装和使用Sqoop的基本流程及常用命令,通过这些命令可以帮助用户轻松地在Hadoop与关系型数据库之间迁移数据。

    使用sqoop抽取mysql数据

    在这个过程中,Sqoop 提供了高效、灵活的数据传输机制,支持多种数据库,并且可以进行并行操作,提高数据迁移效率。 在“使用sqoop抽取mysql数据”的场景中,我们需要了解以下关键知识点: 1. ** Sqoop 安装与配置...

    Sqoop企业级大数据迁移方案全方位实战视频教程

    企业级数据迁移及同步需求 2.Sqoop的设计思想 3.Sqoop与Hadoop的关系 4.Sqoop的分布式实现原理 5.Sqoop的企业级版本选型 6.Sqoop的部署安装及配置 第二章:Sqoop数据导入实战开发 1.Sqoop导入开发参数详解 ...

    sqoop导入数据到hive中,数据不一致

    在大数据处理领域,Sqoop和Hive是两个重要的工具,分别用于数据迁移和数据仓库管理。当使用Sqoop将数据导入Hive时,有时可能会遇到数据不一致的问题,这可能是由于多种原因引起的。本文将深入探讨这个问题,并提供...

    大数据系列2020-数据迁移工具资料汇总(sqoop、kettle、datax).zip

    本资料汇总主要聚焦于三大常用的数据迁移工具:Sqoop、Kettle和DataX,它们各有特色,广泛应用于不同的场景。接下来,我们将深入探讨这三个工具的核心功能、应用场景以及使用技巧。 Sqoop 是Apache开发的一款用于...

    sqoop把mysql数据导入hbase2.1.6

    在这个过程中,我们需要使用Apache Sqoop工具,它是一个用于在Hadoop和传统数据库间进行数据迁移的工具。在本文中,我们将详细介绍如何进行这个操作,以及解决可能出现的问题。 首先,Sqoop不仅支持将数据从关系型...

    sqoop 安装与简单使用

    Sqoop 是 Apache 开源项目中的一款工具,专门用于在 Hadoop 和关系型数据库管理系统(RDBMS)之间进行数据迁移。它提供了一个命令行界面,允许用户方便地将数据导入到 Hadoop 的 HDFS 中,或者从 HDFS 导出到 RDBMS...

    大数据课程-Hadoop集群程序设计与开发-10.Sqoop数据迁移_lk_edit.pptx

    【大数据课程-Hadoop集群程序设计与开发-10.Sqoop数据迁移】是针对大数据领域的一门课程,主要讲解如何使用Sqoop工具进行数据在Hadoop集群与关系型数据库之间的迁移。Sqoop是一款由Apache开发的开源软件,专门用于...

    sqoop安装工具

    Sqoop是一款开源的数据迁移工具,主要用于在Hadoop与传统关系型数据库之间进行数据的导入导出。它提供了高效、方便的接口,使得大数据处理变得更加灵活。在这个“sqoop安装工具”压缩包中,包含了 Sqoop 的安装包...

    Sqoop 安装.docx

    Sqoop 是一个用于在 Apache Hadoop 和传统关系型数据库之间进行数据迁移的工具。它允许用户将大规模数据从 RDBMS(关系型数据库管理系统)导入到 Hadoop 分布式文件系统(HDFS),或者从 HDFS 导出数据回 RDBMS。在...

    星环大数据平台_Sqoop数据导入.pdf

    星环大数据平台使用的Sqoop是一种在Hadoop与传统关系数据库之间进行数据迁移的工具。Sqoop利用MapReduce的分布式并行处理机制来实现数据的高效导入导出。在星环大数据平台中使用Sqoop,可以实现对数据的批量迁移,这...

    《Hadoop系统搭建及项目实践》课件10Hadoop 与RDMS数据迁移工具Sqoop.pdf

    《Hadoop系统搭建及项目实践》课件10Hadoop 与RDMS数据迁移工具Sqoop.pdf《Hadoop系统搭建及项目实践》课件10Hadoop 与RDMS数据迁移工具Sqoop.pdf《Hadoop系统搭建及项目实践》课件10Hadoop 与RDMS数据迁移工具Sqoop...

    Sqoop与HBase间数据高效迁移实践与注意事项

    使用场景及目标:在需要将HBase数据迁移到其他系统(如关系型数据库、大数据平台、云存储、消息队列等)时,本文档为实现高效、安全、完整的数据迁移提供了全面指导和支持。 阅读建议:读者应该先熟悉Hadoop、HBase...

    电商数仓项目(九) Sqoop安装与配置

    在大数据处理领域,Sqoop是一款用于在Hadoop和传统关系型数据库之间进行数据迁移的工具。本电商数仓项目中的第九部分重点介绍了如何安装和配置Sqoop,这对于构建高效的数据仓库系统至关重要。以下是对 Sqoop 的详细...

    Hadoop-Sqoop配置

    Sqoop 是一种数据迁移工具,主要用于在结构化数据源和 Hadoop 之间进行数据迁移。Sqoop 可以连接各种数据源,如 MySQL、PostgreSQL、Oracle 等,并将数据导入 Hadoop 中。 Sqoop 的主要特点是可以将结构化数据源...

    hadoop大数据平台技术与应用--第7章数据迁移工具Sqoop.pdf

    ### Sqoop数据导入与导出 数据导入是指将关系数据库中的数据迁移到HDFS,而数据导出是指将HDFS中的数据迁移到关系数据库。Sqoop支持多种导入方式,包括导入单个表、导入数据库中所有表以及导入大型机上的数据集。在...

Global site tag (gtag.js) - Google Analytics