
Sqoop1: Sqoop-HCatalog Integration

 

I created an HCatalog table with the Hue metastore manager and submitted a Sqoop job with HCatalog options through Hue. The command is shown below:

import --connect jdbc:mysql://192.168.122.1:3306/sample --username zhj 
--password 123456 --table sample_user --split-by user_id -m 2 
--hcatalog-table sample_raw.sample_user

 Error:

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], 
main() threw exception, org/apache/hcatalog/mapreduce/HCatOutputFormat
java.lang.NoClassDefFoundError: org/apache/hcatalog/mapreduce/HCatOutputFormat
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], 
main() threw exception, org/apache/hadoop/hive/conf/HiveConf$ConfVars
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf$ConfVars

 

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], 
main() threw exception, java.lang.NoClassDefFoundError: javax/jdo/JDOException
com.google.common.util.concurrent.ExecutionError: 
java.lang.NoClassDefFoundError: javax/jdo/JDOException
	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2232)
....
Caused by: java.lang.ClassNotFoundException: javax.jdo.JDOException

Similar issues:
https://issues.apache.org/jira/browse/HCATALOG-380
https://issues.apache.org/jira/browse/PIG-2666

 

java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory
	at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1168)

 

 

 

A:

a. Copy all jars from hive-0.12.0-bin/hcatalog/share/hcatalog to oozie-4.0.1/share/lib/sqoop.

This solves the first error, but the second error appears once the first one disappears.

 

I tried the following ways to fix the second one.

1) Change hadoop-env.sh, adding the hcatalog jars (copied to the cluster nodes from the local oozie-4.0.1/share/lib/hcatalog directory) to HADOOP_CLASSPATH. This failed.

2) Copy all jars from share/lib/hcatalog to share/lib/sqoop and upgrade the sharelib on HDFS. This solves the second error but leads to the third one.

3) Copy all jars from share/lib/hive/ to share/lib/sqoop/ and upgrade the sharelib.

// Up to this point the first three errors are solved, but a fourth one appears.

4) The fourth error is caused by a version mismatch:

"datanucleus-api-jdo-3.0.0-release.jar" does NOT contain org.datanucleus.jdo.JDOPersistenceManagerFactory; it contains "org.datanucleus.api.jdo.JDOPersistenceManagerFactory".
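A quick way to confirm which package a given DataNucleus jar actually ships is to list its entries (a sketch; the path is illustrative, so point it at the jar you actually deployed):

```shell
# List the classes in the DataNucleus JDO API jar and look for the
# persistence-manager factory class.
unzip -l /opt/hive-0.12.0-bin/lib/datanucleus-api-jdo-3.2.1.jar \
  | grep JDOPersistenceManagerFactory
# The 3.x jars should list org/datanucleus/api/jdo/JDOPersistenceManagerFactory.class
```

If the grep comes back empty, the jar on your sharelib path is the wrong generation for the class name in the stack trace.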

In oozie-4.0.1 I found:

./share/lib/sqoop/datanucleus-rdbms-2.0.3.jar
./share/lib/sqoop/datanucleus-connectionpool-2.0.3.jar
./share/lib/sqoop/datanucleus-core-2.0.3.jar
./share/lib/sqoop/datanucleus-enhancer-2.0.3.jar
./share/lib/hive/datanucleus-rdbms-2.0.3.jar
./share/lib/hive/datanucleus-connectionpool-2.0.3.jar
./share/lib/hive/datanucleus-core-2.0.3.jar
./share/lib/hive/datanucleus-enhancer-2.0.3.jar

and  in hive-0.12.0-bin

./lib/datanucleus-core-3.2.2.jar
./lib/datanucleus-rdbms-3.2.1.jar
./lib/datanucleus-api-jdo-3.2.1.jar

 

Copy datanucleus-core-3.2.2.jar, datanucleus-rdbms-3.2.1.jar, and datanucleus-api-jdo-3.2.1.jar to share/lib/sqoop, and upgrade the sharelib.

 

Note: Oozie was compiled against a Hive version other than 0.12.0, which is what leads to these errors.

 

 

http://stackoverflow.com/questions/11494267/class-org-datanucleus-jdo-jdopersistencemanagerfactory-was-not-found

 

******************************************************************************

0. Retain the original jars in share/lib/sqoop.

1. Copy all jars from hive-0.12.0-bin/hcatalog/share/hcatalog to oozie-4.0.1/share/lib/sqoop.

2. Copy all jars from hive-0.12.0-bin/lib/ to oozie-4.0.1/share/lib/sqoop.

3. Copy sqoop-1.4.4.bin__hadoop-2.0.4-alpha/sqoop-1.4.4.jar to oozie-4.0.1/share/lib/sqoop.

4. Update the sharelib.
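The recipe above can be sketched as a shell script. The install paths and the NameNode URI are assumptions, so adjust them to your layout; the sharelib command assumes Oozie 4.0.1's oozie-setup.sh:

```shell
#!/bin/sh
# Sketch of steps 0-4, assuming Hive, Sqoop, and Oozie are unpacked under /opt.
HIVE_HOME=/opt/hive-0.12.0-bin
SQOOP_HOME=/opt/sqoop-1.4.4.bin__hadoop-2.0.4-alpha
OOZIE_HOME=/opt/oozie-4.0.1

# Steps 1-3: add the HCatalog, Hive, and Sqoop jars to the sqoop sharelib,
# keeping the jars that were already there (step 0).
cp "$HIVE_HOME"/hcatalog/share/hcatalog/*.jar "$OOZIE_HOME"/share/lib/sqoop/
cp "$HIVE_HOME"/lib/*.jar                     "$OOZIE_HOME"/share/lib/sqoop/
cp "$SQOOP_HOME"/sqoop-1.4.4.jar              "$OOZIE_HOME"/share/lib/sqoop/

# Step 4: re-create the sharelib on HDFS so actions pick up the new jars
# (the NameNode URI is an assumption).
cd "$OOZIE_HOME"
bin/oozie-setup.sh sharelib create -fs hdfs://namenode:8020 -locallib share
```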

All the errors listed above disappear, but a new one comes:

 

 java.io.IOException: NoSuchObjectException(message:inok_datamine.inok_user table not found)

I guess the Sqoop job submitted through Hue cannot access the Hive metastore, but I could not fix this at first.

 

Good news: I set the following property in hive-site.xml:

  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://192.168.122.1:9083</value>
  </property>

then uploaded it to HDFS as hive/hive-site.xml and added it to the Sqoop job in Hue.

 

Start the metastore with:

hive --service metastore  //default port is 9083
hive --service metastore -p <port_num>
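After starting it, a quick sanity check that the metastore is actually up and listening helps rule out connectivity problems (a sketch; it assumes the default port 9083, the metastore host from the command above, and that nc is available):

```shell
# Start a standalone Hive metastore in the background and capture its log.
nohup hive --service metastore > metastore.log 2>&1 &
sleep 10   # give it time to bind the thrift port

# Probe the thrift port from the client side.
nc -z 192.168.122.1 9083 && echo "metastore reachable on 9083"
```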

Then the last error may appear:

32064 [main] INFO  org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities  - HCatalog full table schema fields = [user_id, user_name, first_letter, live_city]
33238 [main] INFO  org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities  - HCatalog table partitioning key fields = []
33241 [main] ERROR org.apache.sqoop.Sqoop  - Got exception running Sqoop: java.lang.NullPointerException
Intercepting System.exit(1)
 details
java.lang.NullPointerException
	at org.apache.hcatalog.data.schema.HCatSchema.get(HCatSchema.java:99)
	at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureHCat(SqoopHCatUtilities.java:344)
	at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureImportOutputFormat(SqoopHCatUtilities.java:658)
	at org.apache.sqoop.mapreduce.ImportJobBase.configureOutputFormat(ImportJobBase.java:98)
	at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:232)
	at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:600)
	at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:413)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
	at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:203)
	at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:172)
	at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:37)
	at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:45)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:226)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Intercepting System.exit(1)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
 
After two days of struggle, I finally realized that the Hive metastore is the same as HCatalog in Hive 0.12, so the --hcatalog-* options can be replaced with --hive-* ones. The following command is correct:
    import --connect jdbc:mysql://192.168.122.1:3306/sample --username zhj   
    --password 123456 --table sample_user --split-by user_id -m 2   
    --hive-database sample_raw --hive-table sample_user --hive-import
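For comparison, the HCatalog-style invocation also works if the database is passed separately via --hcatalog-database rather than as a dotted prefix on the table name. This is a sketch of the CLI form (untested here; same connection parameters as above):

```shell
sqoop import --connect jdbc:mysql://192.168.122.1:3306/sample --username zhj \
  --password 123456 --table sample_user --split-by user_id -m 2 \
  --hcatalog-database sample_raw --hcatalog-table sample_user
```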
 

 

**********************************************************************************

The property shown below (an Oozie HCatAccessorService setting, despite appearing in my hive-site.xml) confused me:

<!-- HCatAccessorService -->
   <property>
        <name>oozie.service.HCatAccessorService.jmsconnections</name>
        <value>
        default=java.naming.factory.initial#org.apache.activemq.jndi.ActiveMQInitialContextFactory;java.naming.provider.url#tcp://localhost:61616;connectionFactoryNames#ConnectionFactory
        </value>
        <description>
        Specify the map  of endpoints to JMS configuration properties. In general, endpoint
        identifies the HCatalog server URL. "default" is used if no endpoint is mentioned
        in the query. If some JMS property is not defined, the system will use the property
        defined jndi.properties. jndi.properties files is retrieved from the application classpath.
        Mapping rules can also be provided for mapping Hcatalog servers to corresponding JMS providers.
        hcat://${1}.${2}.server.com:8020=java.naming.factory.initial#Dummy.Factory;java.naming.provider.url#tcp://broker.${2}:61616
        </description>
   </property>

 

 

 

 

see: http://stackoverflow.com/questions/11494267/class-org-datanucleus-jdo-jdopersistencemanagerfactory-was-not-found

 

A similar problem appears when using HCatalog from Pig; see: http://ylzhj02.iteye.com/admin/blogs/2043781

 

 

NOTE:With the support for HCatalog added to Sqoop, any HCatalog job depends on a set of jar files being available both on the Sqoop client host and where the Map/Reduce tasks run. To run HCatalog jobs, the environment variable HADOOP_CLASSPATH must be set up as shown below before launching the Sqoop HCatalog jobs.

HADOOP_CLASSPATH=$(hcat -classpath)

export HADOOP_CLASSPATH

The necessary HCatalog dependencies will be copied to the distributed cache automatically by the Sqoop job.

I added the above two lines to ~/.bashrc and hive-0.12.0-bin/conf/hive-env.sh, but it did not work.

 -----------------

 NoSuchObjectException(message:default.'inok_datamine.inok_user' table not found)

My Sqoop script command contained:

--hcatalog-table 'inok_datamine.inok_user'

The script above is missing --hcatalog-database. The correct form is:

--hcatalog-database inok_datamine --hcatalog-table inok_user

 

References

Official documents:

http://gethue.com/hadoop-tutorial-how-to-access-hive-in-pig-with/

https://cwiki.apache.org/confluence/display/Hive/HCatalog

http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/

 http://www.micmiu.com/bigdata/sqoop/sqoop-setup-and-demo/

 

 

 

 
