
The Most Detailed Guide to Importing Data into Hive with Sqoop (Part 2)


1) Importing data from a relational database into a Hive table

Table 1. Common arguments

Argument | Description
--connect <jdbc-uri> | Specify JDBC connect string
--connection-manager <class-name> | Specify connection manager class to use
--driver <class-name> | Manually specify JDBC driver class to use
--hadoop-home <dir> | Override $HADOOP_HOME
--help | Print usage instructions
-P | Read password from console
--password <password> | Set authentication password
--username <username> | Set authentication username
--verbose | Print more information while working
--connection-param-file <filename> | Optional properties file that provides connection parameters
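
A minimal sketch of how these common arguments fit together (the JDBC URL, user, and source table are placeholders reusing the example further down; -P is used instead of --password to avoid the insecure-password warning that appears in the logs below):

# Hedged sketch: common connection arguments only; the Hive-specific options are introduced next.
./sqoop import \
  --connect jdbc:mysql://192.168.1.101/test \
  --username root \
  -P \
  --verbose \
  --table people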

 

Arguments for importing data into Hive

Table 7. Hive arguments:

Argument | Description
--hive-home <dir> | Override $HIVE_HOME
--hive-import | Import tables into Hive (uses Hive's default delimiters if none are set)
--hive-overwrite | Overwrite existing data in the Hive table
--create-hive-table | If set, the job fails if the target Hive table already exists. By default this property is false
--hive-table <table-name> | Sets the table name to use when importing into Hive
--hive-drop-import-delims | Drops \n, \r, and \01 from string fields when importing into Hive
--hive-delims-replacement | Replaces \n, \r, and \01 in string fields with a user-defined string when importing into Hive
--hive-partition-key | Name of the Hive field on which imported data is partitioned
--hive-partition-value <v> | String value that serves as the partition key for the data imported into Hive in this job
--map-column-hive <map> | Override the default mapping from SQL type to Hive type for configured columns
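
A hedged sketch of how the Hive arguments above can be combined in one import (the partition field dt, its value, and the id=INT column mapping are made up for illustration; the connection, source table, and Hive table names reuse the example below):

# Hedged sketch: import into a Hive table partitioned on a hypothetical dt field.
./sqoop import \
  --connect jdbc:mysql://192.168.1.101/test --username root -P \
  --table people \
  --hive-import --hive-table emps --hive-overwrite \
  --hive-drop-import-delims \
  --hive-partition-key dt --hive-partition-value '20160623' \
  --map-column-hive id=INT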

Example:

[root@hadoop0 bin]# ./sqoop import  --connect jdbc:mysql://192.168.1.101/test  -username root --password root --table people --hive-import --hive-table emps --fields-terminated-by ',' 

Warning: /opt/bigdata/sqoop-1.4.6/bin/../../hcatalog does not exist! HCatalog jobs will fail.

Please set $HCAT_HOME to the root of your HCatalog installation.

Warning: /opt/bigdata/sqoop-1.4.6/bin/../../accumulo does not exist! Accumulo imports will fail.

Please set $ACCUMULO_HOME to the root of your Accumulo installation.

99/06/23 19:07:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6

99/06/23 19:07:49 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

99/06/23 19:07:49 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.

99/06/23 19:07:49 INFO tool.CodeGenTool: Beginning code generation

99/06/23 19:07:50 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `people` AS t LIMIT 1

99/06/23 19:07:50 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `people` AS t LIMIT 1

 

99/06/23 19:07:50 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/bigdata/hadoop272

 

99/06/23 19:08:09 INFO mapreduce.Job: Job job_local417795562_0001 completed successfully

99/06/23 19:08:10 INFO mapreduce.Job: Counters: 20

        File System Counters

                FILE: Number of bytes read=18029933

                FILE: Number of bytes written=18472806

                FILE: Number of read operations=0

                FILE: Number of large read operations=0

                FILE: Number of write operations=0

                HDFS: Number of bytes read=0

                HDFS: Number of bytes written=7294

                HDFS: Number of read operations=4

                HDFS: Number of large read operations=0

                HDFS: Number of write operations=3

        Map-Reduce Framework

                Map input records=108

                Map output records=108

                Input split bytes=87

                Spilled Records=0

                Failed Shuffles=0

                Merged Map outputs=0

                GC time elapsed (ms)=17

                Total committed heap usage (bytes)=84185088

        File Input Format Counters 

                Bytes Read=0

        File Output Format Counters 

                Bytes Written=7294

99/06/23 19:08:10 INFO mapreduce.ImportJobBase: Transferred 7.123 KB in 16.6249 seconds (438.7394 bytes/sec)

99/06/23 19:08:10 INFO mapreduce.ImportJobBase: Retrieved 108 records.

99/06/23 19:08:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `people` AS t LIMIT 1

99/06/23 19:08:10 INFO hive.HiveImport: Loading uploaded data into Hive

99/06/23 19:08:51 INFO hive.HiveImport: SLF4J: Class path contains multiple SLF4J bindings.

99/06/23 19:08:52 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/opt/bigdata/hive2.0/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]

99/06/23 19:08:52 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/opt/bigdata/hbase-1.1.5/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

99/06/23 19:08:52 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/opt/bigdata/hadoop272/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

99/06/23 19:08:52 INFO hive.HiveImport: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

99/06/23 19:08:54 INFO hive.HiveImport: 

99/06/23 19:08:54 INFO hive.HiveImport: Logging initialized using configuration in file:/opt/bigdata/hive2.0/conf/hive-log4j2.properties

99/06/23 19:09:27 INFO hive.HiveImport: OK

99/06/23 19:09:27 INFO hive.HiveImport: Time taken: 3.378 seconds

99/06/23 19:09:32 INFO hive.HiveImport: Loading data to table default.emps

99/06/23 19:09:35 INFO hive.HiveImport: OK

99/06/23 19:09:35 INFO hive.HiveImport: Time taken: 7.827 seconds

99/06/23 19:09:36 INFO hive.HiveImport: Hive import complete.

99/06/23 19:09:36 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.

[root@hadoop0 bin]# 
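
To sanity-check the result, the new Hive table can be queried directly; a hedged example (the default.emps table name and the 108-record count come from the run above):

# Hedged sketch: count the rows that landed in the Hive table created above.
hive -e "SELECT COUNT(*) FROM default.emps;"   # expect 108 for this run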

Error troubleshooting:

 

99/06/23 18:57:26 INFO mapreduce.Job: Running job: job_local1178174559_0001

99/06/23 18:57:26 INFO mapred.LocalJobRunner: OutputCommitter set in config null

99/06/23 18:57:26 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1

99/06/23 18:57:26 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter

99/06/23 18:57:26 INFO mapred.LocalJobRunner: Waiting for map tasks

99/06/23 18:57:26 INFO mapred.LocalJobRunner: Starting task: attempt_local1178174559_0001_m_000000_0

99/06/23 18:57:26 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1

99/06/23 18:57:27 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]

99/06/23 18:57:27 INFO db.DBInputFormat: Using read commited transaction isolation

99/06/23 18:57:27 INFO mapred.MapTask: Processing split: 1=1 AND 1=1

99/06/23 18:57:27 INFO mapred.LocalJobRunner: map task executor complete.

99/06/23 18:57:27 WARN mapred.LocalJobRunner: job_local1178174559_0001

java.lang.Exception: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class people not found

        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)

        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)

Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class people not found

        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)

        at org.apache.sqoop.mapreduce.db.DBConfiguration.getInputClass(DBConfiguration.java:403)

        at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.createDBRecordReader(DataDrivenDBInputFormat.java:237)

        at org.apache.sqoop.mapreduce.db.DBInputFormat.createRecordReader(DBInputFormat.java:263)

        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:515)

        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:758)

        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)

        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)

        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

        at java.util.concurrent.FutureTask.run(FutureTask.java:262)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

        at java.lang.Thread.run(Thread.java:745)

Caused by: java.lang.ClassNotFoundException: Class people not found

        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)

        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)

        ... 12 more

99/06/23 18:57:27 INFO mapreduce.Job: Job job_local1178174559_0001 running in uber mode : false

99/06/23 18:57:27 INFO mapreduce.Job:  map 0% reduce 0%

99/06/23 18:57:27 INFO mapreduce.Job: Job job_local1178174559_0001 failed with state FAILED due to: NA

99/06/23 18:57:27 INFO mapreduce.Job: Counters: 0

99/06/23 18:57:27 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead

99/06/23 18:57:27 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 8.0751 seconds (0 bytes/sec)

99/06/23 18:57:27 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead

99/06/23 18:57:27 INFO mapreduce.ImportJobBase: Retrieved 0 records.

99/06/23 18:57:27 ERROR tool.ImportTool: Error during import: Import job failed!

Solution: copy the table's generated .class and .jar files from the local filesystem under /tmp into the bin directory of the Sqoop installation, then run the command again.
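
A minimal sketch of that fix, assuming the compile directory hash that appears in the re-run below (ef3e50779e6c6fd3f28e16401ec54afc) and the Sqoop install path from the warnings above; the hash changes on every run, so check /tmp/sqoop-root/compile/ first:

# Hedged sketch: copy the generated class and jar for the people table next to the sqoop launcher.
cp /tmp/sqoop-root/compile/ef3e50779e6c6fd3f28e16401ec54afc/people.class /opt/bigdata/sqoop-1.4.6/bin/
cp /tmp/sqoop-root/compile/ef3e50779e6c6fd3f28e16401ec54afc/people.jar /opt/bigdata/sqoop-1.4.6/bin/

Alternatively, Sqoop's --bindir code-generation option can point the compiled objects at a fixed directory, which avoids the manual copy.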

 

[root@hadoop0 bin]# ./sqoop import  --connect jdbc:mysql://192.168.1.101/test  -username root --password root --table people --hive-import --hive-table emps --fields-terminated-by ',' 

Warning: /opt/bigdata/sqoop-1.4.6/bin/../../hcatalog does not exist! HCatalog jobs will fail.

Please set $HCAT_HOME to the root of your HCatalog installation.

Warning: /opt/bigdata/sqoop-1.4.6/bin/../../accumulo does not exist! Accumulo imports will fail.

Please set $ACCUMULO_HOME to the root of your Accumulo installation.

99/06/23 19:07:28 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6

99/06/23 19:07:28 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.

99/06/23 19:07:29 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.

99/06/23 19:07:29 INFO tool.CodeGenTool: Beginning code generation

99/06/23 19:07:29 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `people` AS t LIMIT 1

99/06/23 19:07:29 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `people` AS t LIMIT 1

99/06/23 19:07:29 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/bigdata/hadoop272

Note: /tmp/sqoop-root/compile/ef3e50779e6c6fd3f28e16401ec54afc/people.java uses or overrides a deprecated API.

Note: Recompile with -Xlint:deprecation for details.

99/06/23 19:07:32 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/ef3e50779e6c6fd3f28e16401ec54afc/people.jar

99/06/23 19:07:32 WARN manager.MySQLManager: It looks like you are importing from mysql.

99/06/23 19:07:32 WARN manager.MySQLManager: This transfer can be faster! Use the --direct

99/06/23 19:07:32 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.

99/06/23 19:07:32 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)

99/06/23 19:07:32 INFO mapreduce.ImportJobBase: Beginning import of people

99/06/23 19:07:32 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/opt/bigdata/hadoop272/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/opt/bigdata/hbase-1.1.5/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

99/06/23 19:07:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

99/06/23 19:07:33 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar

99/06/23 19:07:34 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

99/06/23 19:07:34 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps

99/06/23 19:07:34 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id

99/06/23 19:07:34 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=

99/06/23 19:07:35 ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://hadoop0:9000/user/root/people already exists

        at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)

        at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:266)

        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)

        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)

        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)

        at java.security.AccessController.doPrivileged(Native Method)

        at javax.security.auth.Subject.doAs(Subject.java:415)

        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)

        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)

        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)

        at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:196)

        at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:169)

        at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:266)

        at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)

        at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)

        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)

        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)

        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)

        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)

        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)

        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)

        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

Solution: simply delete the existing HDFS output directory.

[root@hadoop0 bin]# ./hadoop fs -rmr /user/root/people

rmr: DEPRECATED: Please use 'rm -r' instead.

99/06/23 19:07:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

99/06/23 19:07:44 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.

Deleted /user/root/people
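
An alternative sketch that avoids the manual cleanup (assuming your Sqoop 1.4.x build includes the flag): the --delete-target-dir import option asks Sqoop to remove an existing target directory before the import starts.

# Hedged sketch: let Sqoop remove /user/root/people itself before importing.
./sqoop import --connect jdbc:mysql://192.168.1.101/test --username root -P \
  --table people --delete-target-dir \
  --hive-import --hive-table emps --fields-terminated-by ','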

 

 
