
Sqoop1: Sqoop-HCatalog Integration

 

I created an HCatalog table with the Hue metastore manager and submitted a Sqoop job with HCatalog options through Hue. The command is shown below:

import --connect jdbc:mysql://192.168.122.1:3306/sample --username zhj 
--password 123456 --table sample_user --split-by user_id -m 2 
--hcatalog-table sample_raw.sample_user

 Error:

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], 
main() threw exception, org/apache/hcatalog/mapreduce/HCatOutputFormat
java.lang.NoClassDefFoundError: org/apache/hcatalog/mapreduce/HCatOutputFormat
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], 
main() threw exception, org/apache/hadoop/hive/conf/HiveConf$ConfVars
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf$ConfVars

 

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], 
main() threw exception, java.lang.NoClassDefFoundError: javax/jdo/JDOException
com.google.common.util.concurrent.ExecutionError: 
java.lang.NoClassDefFoundError: javax/jdo/JDOException
	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2232)
....
Caused by: java.lang.ClassNotFoundException: javax.jdo.JDOException

Similar issues:
https://issues.apache.org/jira/browse/HCATALOG-380
https://issues.apache.org/jira/browse/PIG-2666

 

java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory
	at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1168)

 

 

 

A:

a. Copy all jars from hive-0.12.0-bin/hcatalog/share/hcatalog to oozie-4.0.1/share/lib/sqoop.

This solves the first error, but the second error appears once the first one disappears.

 

I tried the following ways to fix the second one.

1) Change hadoop-env.sh, adding the hcatalog jars (copied to the cluster nodes from the local oozie-4.0.1/share/lib/hcatalog directory) to HADOOP_CLASSPATH. This failed.

2) Copy all jars from share/lib/hcatalog to share/lib/sqoop and upgrade the sharelib on HDFS. This solves the second error but leads to the third one.

3) Copy all jars from share/lib/hive/ to share/lib/sqoop/ and upgrade the sharelib.

// Up to this point the first three errors are solved, but a fourth one appears.

4) The fourth error is caused by a version mismatch:

"datanucleus-api-jdo-3.0.0-release.jar" does NOT contain org.datanucleus.jdo.JDOPersistenceManagerFactory; it contains "org.datanucleus.api.jdo.JDOPersistenceManagerFactory".
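A quick way to confirm which package a given DataNucleus jar actually ships is to list its entries (a sketch; the path is illustrative, so point it at the jar you actually deployed):

```shell
# List the classes in the DataNucleus JDO API jar and look for the
# persistence-manager factory class.
unzip -l /opt/hive-0.12.0-bin/lib/datanucleus-api-jdo-3.2.1.jar \
  | grep JDOPersistenceManagerFactory
# The 3.x jars should list org/datanucleus/api/jdo/JDOPersistenceManagerFactory.class
```

If the grep comes back empty, the jar on your sharelib path is the wrong generation for the class name in the stack trace.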

In oozie-4.0.1 I found:

./share/lib/sqoop/datanucleus-rdbms-2.0.3.jar
./share/lib/sqoop/datanucleus-connectionpool-2.0.3.jar
./share/lib/sqoop/datanucleus-core-2.0.3.jar
./share/lib/sqoop/datanucleus-enhancer-2.0.3.jar
./share/lib/hive/datanucleus-rdbms-2.0.3.jar
./share/lib/hive/datanucleus-connectionpool-2.0.3.jar
./share/lib/hive/datanucleus-core-2.0.3.jar
./share/lib/hive/datanucleus-enhancer-2.0.3.jar

and  in hive-0.12.0-bin

./lib/datanucleus-core-3.2.2.jar
./lib/datanucleus-rdbms-3.2.1.jar
./lib/datanucleus-api-jdo-3.2.1.jar

 

Copy datanucleus-core-3.2.2.jar, datanucleus-rdbms-3.2.1.jar, and datanucleus-api-jdo-3.2.1.jar to share/lib/sqoop, and upgrade the sharelib.

 

Note: Oozie was compiled against a Hive version other than 0.12.0, which is what leads to these errors.

 

 

http://stackoverflow.com/questions/11494267/class-org-datanucleus-jdo-jdopersistencemanagerfactory-was-not-found

 

******************************************************************************

0. Retain the original jars in share/lib/sqoop.

1. Copy all jars from hive-0.12.0-bin/hcatalog/share/hcatalog to oozie-4.0.1/share/lib/sqoop.

2. Copy all jars from hive-0.12.0-bin/lib/ to oozie-4.0.1/share/lib/sqoop.

3. Copy sqoop-1.4.4.bin__hadoop-2.0.4-alpha/sqoop-1.4.4.jar to oozie-4.0.1/share/lib/sqoop.

4. Update the sharelib.
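The recipe above can be sketched as a shell script. The install paths and the NameNode URI are assumptions, so adjust them to your layout; the sharelib command assumes Oozie 4.0.1's oozie-setup.sh:

```shell
#!/bin/sh
# Sketch of steps 0-4, assuming Hive, Sqoop, and Oozie are unpacked under /opt.
HIVE_HOME=/opt/hive-0.12.0-bin
SQOOP_HOME=/opt/sqoop-1.4.4.bin__hadoop-2.0.4-alpha
OOZIE_HOME=/opt/oozie-4.0.1

# Steps 1-3: add the HCatalog, Hive, and Sqoop jars to the sqoop sharelib,
# keeping the jars that were already there (step 0).
cp "$HIVE_HOME"/hcatalog/share/hcatalog/*.jar "$OOZIE_HOME"/share/lib/sqoop/
cp "$HIVE_HOME"/lib/*.jar                     "$OOZIE_HOME"/share/lib/sqoop/
cp "$SQOOP_HOME"/sqoop-1.4.4.jar              "$OOZIE_HOME"/share/lib/sqoop/

# Step 4: re-create the sharelib on HDFS so actions pick up the new jars
# (the NameNode URI is an assumption).
cd "$OOZIE_HOME"
bin/oozie-setup.sh sharelib create -fs hdfs://namenode:8020 -locallib share
```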

All the errors listed above disappear, but a new one comes:

 

 java.io.IOException: NoSuchObjectException(message:inok_datamine.inok_user table not found)

I guess the Sqoop job submitted through Hue cannot access the Hive metastore, but I could not fix this at first.

 

Good news: I set the following property in hive-site.xml:

  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://192.168.122.1:9083</value>
  </property>

then uploaded it to HDFS as hive/hive-site.xml and added it to the Sqoop job in Hue.

 

Start the metastore with:

hive --service metastore  //default port is 9083
hive --service metastore -p <port_num>
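After starting it, a quick sanity check that the metastore is actually up and listening helps rule out connectivity problems (a sketch; it assumes the default port 9083, the metastore host from the command above, and that nc is available):

```shell
# Start a standalone Hive metastore in the background and capture its log.
nohup hive --service metastore > metastore.log 2>&1 &
sleep 10   # give it time to bind the thrift port

# Probe the thrift port from the client side.
nc -z 192.168.122.1 9083 && echo "metastore reachable on 9083"
```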

Then the last error may appear:

32064 [main] INFO  org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities  - HCatalog full table schema fields = [user_id, user_name, first_letter, live_city]
33238 [main] INFO  org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities  - HCatalog table partitioning key fields = []
33241 [main] ERROR org.apache.sqoop.Sqoop  - Got exception running Sqoop: java.lang.NullPointerException
Intercepting System.exit(1)
 details
java.lang.NullPointerException
	at org.apache.hcatalog.data.schema.HCatSchema.get(HCatSchema.java:99)
	at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureHCat(SqoopHCatUtilities.java:344)
	at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureImportOutputFormat(SqoopHCatUtilities.java:658)
	at org.apache.sqoop.mapreduce.ImportJobBase.configureOutputFormat(ImportJobBase.java:98)
	at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:232)
	at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:600)
	at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:413)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
	at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:203)
	at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:172)
	at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:37)
	at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:45)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:226)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Intercepting System.exit(1)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
 
After two days of struggle, I finally realized that the Hive metastore is the same as HCatalog in Hive 0.12, so the --hcatalog-* options can be replaced with --hive-* ones. The following command is correct:
    import --connect jdbc:mysql://192.168.122.1:3306/sample --username zhj   
    --password 123456 --table sample_user --split-by user_id -m 2   
    --hive-database sample_raw --hive-table sample_user --hive-import
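For comparison, the HCatalog-style invocation also works if the database is passed separately via --hcatalog-database rather than as a dotted prefix on the table name. This is a sketch of the CLI form (untested here; same connection parameters as above):

```shell
sqoop import --connect jdbc:mysql://192.168.122.1:3306/sample --username zhj \
  --password 123456 --table sample_user --split-by user_id -m 2 \
  --hcatalog-database sample_raw --hcatalog-table sample_user
```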
 

 

**********************************************************************************

The property shown below (an Oozie HCatAccessorService setting, despite appearing in my hive-site.xml) confused me:

<!-- HCatAccessorService -->
   <property>
        <name>oozie.service.HCatAccessorService.jmsconnections</name>
        <value>
        default=java.naming.factory.initial#org.apache.activemq.jndi.ActiveMQInitialContextFactory;java.naming.provider.url#tcp://localhost:61616;connectionFactoryNames#ConnectionFactory
        </value>
        <description>
        Specify the map  of endpoints to JMS configuration properties. In general, endpoint
        identifies the HCatalog server URL. "default" is used if no endpoint is mentioned
        in the query. If some JMS property is not defined, the system will use the property
        defined jndi.properties. jndi.properties files is retrieved from the application classpath.
        Mapping rules can also be provided for mapping Hcatalog servers to corresponding JMS providers.
        hcat://${1}.${2}.server.com:8020=java.naming.factory.initial#Dummy.Factory;java.naming.provider.url#tcp://broker.${2}:61616
        </description>
   </property>

 

 

 

 

see: http://stackoverflow.com/questions/11494267/class-org-datanucleus-jdo-jdopersistencemanagerfactory-was-not-found

 

A similar problem appears when using HCatalog from Pig; see: http://ylzhj02.iteye.com/admin/blogs/2043781

 

 

NOTE:With the support for HCatalog added to Sqoop, any HCatalog job depends on a set of jar files being available both on the Sqoop client host and where the Map/Reduce tasks run. To run HCatalog jobs, the environment variable HADOOP_CLASSPATH must be set up as shown below before launching the Sqoop HCatalog jobs.

HADOOP_CLASSPATH=$(hcat -classpath)

export HADOOP_CLASSPATH

The necessary HCatalog dependencies will be copied to the distributed cache automatically by the Sqoop job.

I added the above two lines to ~/.bashrc and hive-0.12.0-bin/conf/hive-env.sh, but it did not work.

 -----------------

 NoSuchObjectException(message:default.'inok_datamine.inok_user' table not found)

My Sqoop script command contained:

--hcatalog-table 'inok_datamine.inok_user'

The script above is missing --hcatalog-database. The correct form is:

--hcatalog-database inok_datamine --hcatalog-table inok_user

 

References

Official documents:

http://gethue.com/hadoop-tutorial-how-to-access-hive-in-pig-with/

https://cwiki.apache.org/confluence/display/Hive/HCatalog

http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/

 http://www.micmiu.com/bigdata/sqoop/sqoop-setup-and-demo/

 

 

 

 
