1) Importing data from a relational database into a Hive table
Table 1. Common arguments

| Argument | Description |
| --- | --- |
| --connect <jdbc-uri> | Specify JDBC connect string |
| --connection-manager <class-name> | Specify connection manager class to use |
| --driver <class-name> | Manually specify JDBC driver class to use |
| --hadoop-home <dir> | Override $HADOOP_HOME |
| --help | Print usage instructions |
| -P | Read password from console |
| --password <password> | Set authentication password |
| --username <username> | Set authentication username |
| --verbose | Print more information while working |
| --connection-param-file <filename> | Optional properties file that provides connection parameters |
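
For reference, a minimal sketch combining several of the common arguments above (the target directory is illustrative and not part of the walkthrough below; -P avoids the insecure --password usage that the logs later warn about):

```bash
# Sketch only: the target directory is a placeholder; -P prompts for the
# password on the console instead of putting it on the command line.
./sqoop import \
  --connect jdbc:mysql://192.168.1.101/test \
  --username root \
  -P \
  --verbose \
  --table people \
  --target-dir /user/root/people_raw
```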
Arguments for importing data into Hive

Table 7. Hive arguments:

| Argument | Description |
| --- | --- |
| --hive-home <dir> | Override $HIVE_HOME |
| --hive-import | Import tables into Hive (uses Hive's default delimiters if none are set) |
| --hive-overwrite | Overwrite existing data in the Hive table |
| --create-hive-table | If set, the job will fail if the target Hive table already exists. By default this property is false |
| --hive-table <table-name> | Sets the table name to use when importing to Hive |
| --hive-drop-import-delims | Drops \n, \r, and \01 from string fields when importing to Hive |
| --hive-delims-replacement | Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive |
| --hive-partition-key | Name of the Hive field to partition the imported data on |
| --hive-partition-value <v> | String value that serves as the partition key for data imported into Hive in this job |
| --map-column-hive <map> | Override default mapping from SQL type to Hive type for configured columns |
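
The walkthrough below only uses --hive-import, --hive-table, and --fields-terminated-by; as a sketch of how the partition-related arguments combine, assuming a hypothetical partition column dt and an arbitrary partition value:

```bash
# Sketch: import into a Hive table partitioned on a hypothetical "dt" key.
# --hive-drop-import-delims strips \n, \r, and \01 from string fields so rows
# do not break Hive's line-delimited text format.
./sqoop import \
  --connect jdbc:mysql://192.168.1.101/test \
  --username root \
  -P \
  --table people \
  --hive-import \
  --hive-table emps \
  --hive-partition-key dt \
  --hive-partition-value '2016-06-23' \
  --hive-drop-import-delims \
  --fields-terminated-by ','
```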
Example:
[root@hadoop0 bin]# ./sqoop import --connect jdbc:mysql://192.168.1.101/test -username root --password root --table people --hive-import --hive-table emps --fields-terminated-by ','
Warning: /opt/bigdata/sqoop-1.4.6/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/bigdata/sqoop-1.4.6/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
99/06/23 19:07:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
99/06/23 19:07:49 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
99/06/23 19:07:49 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
99/06/23 19:07:49 INFO tool.CodeGenTool: Beginning code generation
99/06/23 19:07:50 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `people` AS t LIMIT 1
99/06/23 19:07:50 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `people` AS t LIMIT 1
99/06/23 19:07:50 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/bigdata/hadoop272
99/06/23 19:08:09 INFO mapreduce.Job: Job job_local417795562_0001 completed successfully
99/06/23 19:08:10 INFO mapreduce.Job: Counters: 20
File System Counters
FILE: Number of bytes read=18029933
FILE: Number of bytes written=18472806
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=0
HDFS: Number of bytes written=7294
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Map-Reduce Framework
Map input records=108
Map output records=108
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=17
Total committed heap usage (bytes)=84185088
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=7294
99/06/23 19:08:10 INFO mapreduce.ImportJobBase: Transferred 7.123 KB in 16.6249 seconds (438.7394 bytes/sec)
99/06/23 19:08:10 INFO mapreduce.ImportJobBase: Retrieved 108 records.
99/06/23 19:08:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `people` AS t LIMIT 1
99/06/23 19:08:10 INFO hive.HiveImport: Loading uploaded data into Hive
99/06/23 19:08:51 INFO hive.HiveImport: SLF4J: Class path contains multiple SLF4J bindings.
99/06/23 19:08:52 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/opt/bigdata/hive2.0/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
99/06/23 19:08:52 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/opt/bigdata/hbase-1.1.5/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
99/06/23 19:08:52 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/opt/bigdata/hadoop272/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
99/06/23 19:08:52 INFO hive.HiveImport: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
99/06/23 19:08:54 INFO hive.HiveImport:
99/06/23 19:08:54 INFO hive.HiveImport: Logging initialized using configuration in file:/opt/bigdata/hive2.0/conf/hive-log4j2.properties
99/06/23 19:09:27 INFO hive.HiveImport: OK
99/06/23 19:09:27 INFO hive.HiveImport: Time taken: 3.378 seconds
99/06/23 19:09:32 INFO hive.HiveImport: Loading data to table default.emps
99/06/23 19:09:35 INFO hive.HiveImport: OK
99/06/23 19:09:35 INFO hive.HiveImport: Time taken: 7.827 seconds
99/06/23 19:09:36 INFO hive.HiveImport: Hive import complete.
99/06/23 19:09:36 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.
[root@hadoop0 bin]#
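
Once the import completes, the result can be spot-checked from the shell (a sketch; it assumes the hive CLI is on the PATH, and the count should match the "Retrieved 108 records" line in the log above):

```bash
# Sanity check of the imported table (sketch; assumes hive is on the PATH).
hive -e "SELECT COUNT(*) FROM default.emps;"
hive -e "SELECT * FROM default.emps LIMIT 5;"
```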
Troubleshooting:
99/06/23 18:57:26 INFO mapreduce.Job: Running job: job_local1178174559_0001
99/06/23 18:57:26 INFO mapred.LocalJobRunner: OutputCommitter set in config null
99/06/23 18:57:26 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
99/06/23 18:57:26 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
99/06/23 18:57:26 INFO mapred.LocalJobRunner: Waiting for map tasks
99/06/23 18:57:26 INFO mapred.LocalJobRunner: Starting task: attempt_local1178174559_0001_m_000000_0
99/06/23 18:57:26 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
99/06/23 18:57:27 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
99/06/23 18:57:27 INFO db.DBInputFormat: Using read commited transaction isolation
99/06/23 18:57:27 INFO mapred.MapTask: Processing split: 1=1 AND 1=1
99/06/23 18:57:27 INFO mapred.LocalJobRunner: map task executor complete.
99/06/23 18:57:27 WARN mapred.LocalJobRunner: job_local1178174559_0001
java.lang.Exception: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class people not found
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class people not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
at org.apache.sqoop.mapreduce.db.DBConfiguration.getInputClass(DBConfiguration.java:403)
at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.createDBRecordReader(DataDrivenDBInputFormat.java:237)
at org.apache.sqoop.mapreduce.db.DBInputFormat.createRecordReader(DBInputFormat.java:263)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:515)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:758)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Class people not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
... 12 more
99/06/23 18:57:27 INFO mapreduce.Job: Job job_local1178174559_0001 running in uber mode : false
99/06/23 18:57:27 INFO mapreduce.Job: map 0% reduce 0%
99/06/23 18:57:27 INFO mapreduce.Job: Job job_local1178174559_0001 failed with state FAILED due to: NA
99/06/23 18:57:27 INFO mapreduce.Job: Counters: 0
99/06/23 18:57:27 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
99/06/23 18:57:27 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 8.0751 seconds (0 bytes/sec)
99/06/23 18:57:27 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
99/06/23 18:57:27 INFO mapreduce.ImportJobBase: Retrieved 0 records.
99/06/23 18:57:27 ERROR tool.ImportTool: Error during import: Import job failed!
Solution: copy the class and jar files generated for the table under /tmp on the local file system into the bin directory of the Sqoop installation, then run the command again and it succeeds. (The local MapReduce job cannot find the generated `people` ORM class on its classpath; making the compiled files visible from the directory Sqoop is launched in works around the ClassNotFoundException above.)
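A sketch of the copy step (the hash-named compile directory changes on every run, so substitute whatever currently exists under /tmp/sqoop-root/compile/ on your machine):

```bash
# Sketch: copy the generated ORM class and jar for "people" next to the sqoop
# launcher so the local MapReduce job can load them. The hash directory is
# the one printed by orm.CompilationManager in your own log; it will differ.
cp /tmp/sqoop-root/compile/ef3e50779e6c6fd3f28e16401ec54afc/people*.class /opt/bigdata/sqoop-1.4.6/bin/
cp /tmp/sqoop-root/compile/ef3e50779e6c6fd3f28e16401ec54afc/people.jar    /opt/bigdata/sqoop-1.4.6/bin/
```

If your Sqoop version supports it, pointing code generation at a fixed directory with --bindir is another way to keep the generated files somewhere predictable.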
[root@hadoop0 bin]# ./sqoop import --connect jdbc:mysql://192.168.1.101/test -username root --password root --table people --hive-import --hive-table emps --fields-terminated-by ','
Warning: /opt/bigdata/sqoop-1.4.6/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/bigdata/sqoop-1.4.6/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
99/06/23 19:07:28 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
99/06/23 19:07:28 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
99/06/23 19:07:29 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
99/06/23 19:07:29 INFO tool.CodeGenTool: Beginning code generation
99/06/23 19:07:29 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `people` AS t LIMIT 1
99/06/23 19:07:29 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `people` AS t LIMIT 1
99/06/23 19:07:29 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/bigdata/hadoop272
Note: /tmp/sqoop-root/compile/ef3e50779e6c6fd3f28e16401ec54afc/people.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
99/06/23 19:07:32 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/ef3e50779e6c6fd3f28e16401ec54afc/people.jar
99/06/23 19:07:32 WARN manager.MySQLManager: It looks like you are importing from mysql.
99/06/23 19:07:32 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
99/06/23 19:07:32 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
99/06/23 19:07:32 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
99/06/23 19:07:32 INFO mapreduce.ImportJobBase: Beginning import of people
99/06/23 19:07:32 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/bigdata/hadoop272/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/bigdata/hbase-1.1.5/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
99/06/23 19:07:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
99/06/23 19:07:33 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
99/06/23 19:07:34 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
99/06/23 19:07:34 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
99/06/23 19:07:34 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
99/06/23 19:07:34 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
99/06/23 19:07:35 ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://hadoop0:9000/user/root/people already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:266)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:196)
at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:169)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:266)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Solution: delete the existing output directory and re-run the import.
[root@hadoop0 bin]# ./hadoop fs -rmr /user/root/people
rmr: DEPRECATED: Please use 'rm -r' instead.
99/06/23 19:07:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
99/06/23 19:07:44 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/root/people
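
Alternatively, if the Sqoop version in use supports it, --delete-target-dir lets the import remove the stale directory itself (a sketch, not taken from the session above):

```bash
# Sketch: --delete-target-dir removes /user/root/people on HDFS before the
# import runs, avoiding the FileAlreadyExistsException shown above.
./sqoop import \
  --connect jdbc:mysql://192.168.1.101/test \
  --username root \
  -P \
  --table people \
  --delete-target-dir \
  --hive-import \
  --hive-table emps \
  --fields-terminated-by ','
```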