
Spark fails to read an LZO-compressed file: java.lang.ClassNotFoundException: Class com.hadoop.compression


Well, this is a problem I had never paid any attention to before, but today I am writing it up anyway.

Configuration

Hadoop core-site.xml configuration

<property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.LzmaCodec</value>
</property>

<property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>

 

The I/O compression codec in use is LZO.
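As a quick sanity check on a codec list like the one above, the following sketch prints the codec classes that live outside the `org.apache.hadoop.*` namespace, which are the ones most likely to need an extra JAR (here, hadoop-lzo) on the classpath. The function name `external_codecs` and the package-prefix heuristic are assumptions made for illustration, not part of Hadoop or Spark.

```shell
# Sketch (assumption: any class outside org.apache.hadoop.* comes from an
# add-on JAR). Prints one such class per line for a comma-separated value.
external_codecs() (
  # Class names never contain whitespace, so strip any that crept into the value.
  value=$(printf '%s' "$1" | tr -d '[:space:]')
  IFS=','
  set -f                                  # no glob expansion while splitting
  for codec in $value; do
    case "$codec" in
      ''|org.apache.hadoop.*) ;;          # empty field or assumed built-in
      *) printf '%s\n' "$codec" ;;
    esac
  done
)
```

Passing it the `io.compression.codecs` value from core-site.xml would list `com.hadoop.compression.lzo.LzoCodec` and `com.hadoop.compression.lzo.LzopCodec`, the two classes Spark must be able to load.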

spark-env.sh configuration

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/cluster/apps/hadoop/lib/native
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/home/cluster/apps/hadoop/lib/native
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/cluster/apps/hadoop/share/hadoop/yarn/:/home/cluster/apps/hadoop/share/hadoop/yarn/lib/:/home/cluster/apps/hadoop/share/hadoop/common/:/home/cluster/apps/hadoop/share/hadoop/common/lib/:/home/cluster/apps/hadoop/share/hadoop/hdfs/:/home/cluster/apps/hadoop/share/hadoop/hdfs/lib/:/home/cluster/apps/hadoop/share/hadoop/mapreduce/:/home/cluster/apps/hadoop/share/hadoop/mapreduce/lib/:/home/cluster/apps/hadoop/share/hadoop/tools/lib/:/home/cluster/apps/spark/spark-1.4.1/lib/

 

Steps to reproduce

Start spark-shell and run the following code:

val lzoFile = sc.textFile("/tmp/data/lzo/part-m-00000.lzo")
lzoFile.count

 

The full error

java.lang.RuntimeException: Error in configuring object 
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) 
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) 
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) 
        at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:190) 
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203) 
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219) 
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217) 
        at scala.Option.getOrElse(Option.scala:120) 
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:217) 
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219) 
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217) 
        at scala.Option.getOrElse(Option.scala:120) 
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:217) 
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219) 
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217) 
        at scala.Option.getOrElse(Option.scala:120) 
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:217) 
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32) 
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219) 
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217) 
        at scala.Option.getOrElse(Option.scala:120) 
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:217) 
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1781) 
        at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:885) 
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) 
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) 
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:286) 
        at org.apache.spark.rdd.RDD.collect(RDD.scala:884) 
        at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:105) 
        at org.apache.spark.sql.hive.HiveContext$QueryExecution.stringResult(HiveContext.scala:503) 
        at org.apache.spark.sql.hive.thriftserver.AbstractSparkSQLDriver.run(AbstractSparkSQLDriver.scala:58) 
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:283) 
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423) 
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:218) 
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) 
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
        at java.lang.reflect.Method.invoke(Method.java:606) 
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665) 
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170) 
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193) 
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112) 
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 
Caused by: java.lang.reflect.InvocationTargetException 
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
        at java.lang.reflect.Method.invoke(Method.java:606) 
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) 
        ... 45 more 
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found. 
        at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:135) 
        at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:175) 
        at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45) 
        ... 50 more 
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found 
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1803) 
        at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:128) 
        ... 52 more
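In a trace this long, the line that matters is the last "Caused by:" entry. A minimal sketch for pulling it out of a saved log follows; the helper name `root_cause` is hypothetical, and it assumes the trace is fed in on standard input.

```shell
# Sketch: print the innermost cause of a Java stack trace read from stdin.
root_cause() {
  # Keep only the "Caused by:" lines, take the last one, drop the prefix.
  grep '^Caused by: ' | tail -n 1 | sed 's/^Caused by: //'
}
```

Run against the trace above, this would leave only the ClassNotFoundException line for com.hadoop.compression.lzo.LzoCodec, which points at a classpath problem rather than a malformed core-site.xml.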

 

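So how do we fix it?

At first I suspected a formatting problem in Hadoop's core-site.xml, so I asked a colleague to help me trace through the Hadoop source code. That confirmed it is not a Hadoop problem.
Then I thought about it some more. I had run into a similar problem before, and back then I configured spark-env.sh like this: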

export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/home/stark_summer/opt/hadoop/hadoop-2.3.0-cdh5.1.0/lib/native/Linux-amd64-64/*:/home/stark_summer/opt/hadoop/hadoop-2.3.0-cdh5.1.0/share/hadoop/common/hadoop-lzo-0.4.15-cdh5.1.0.jar:/home/stark_summer/opt/spark/spark-1.3.1-bin-hadoop2.3/lib/*
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/stark_summer/opt/hadoop/hadoop-2.3.0-cdh5.1.0/share/hadoop/common/hadoop-lzo-0.4.15-cdh5.1.0.jar:/home/stark_summer/opt/spark/spark-1.3.1-bin-hadoop2.3/lib/*

 

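This configuration is what fixed the problem back then, but that was so long ago that I had completely forgotten about it. That is the benefit of blogging regularly: every problem you run into gets written down.
Hm? If I name the specific .jar file, everything works. But does Spark really require a * to pick up all the JARs in a directory? That is quite different from Hadoop, where we specify a directory of JARs as /xxx/yyy/lib/.
Spark, on the other hand, needs /xxx/yyy/lib/* before it loads the JARs in that directory; otherwise it fails with the error above. (This matches how the JVM itself treats classpath entries: a bare directory only contributes loose .class files and resources, while dir/* expands to every JAR in that directory.)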
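To spot this mistake before launching spark-shell, a classpath string can be scanned for bare directories that hold JAR files. The sketch below is only an illustration: the function name `bare_jar_dirs` is hypothetical, and it assumes the classpath is colon-separated and readable on the local machine.

```shell
# Sketch: print classpath entries that are bare directories containing JARs.
# A JVM would ignore those JARs unless the entry is written as "dir/*".
bare_jar_dirs() (
  IFS=':'
  set -f                                  # keep "dir/*" entries literal
  for entry in $1; do
    case "$entry" in
      ''|*'*') continue ;;                # empty, or already a wildcard entry
    esac
    [ -d "$entry" ] || continue           # explicit JAR files are fine as-is
    set +f                                # allow globbing for the JAR probe
    for jar in "$entry"/*.jar; do
      if [ -e "$jar" ]; then
        printf '%s\n' "$entry"
        break
      fi
    done
    set -f
  done
)
```

Calling `bare_jar_dirs "$SPARK_CLASSPATH"` with the original configuration would be expected to list directories such as the Hadoop common lib directory, assuming they contain JARs on that machine.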

The modified spark-env.sh

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/cluster/apps/hadoop/lib/native
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/home/cluster/apps/hadoop/lib/native
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/cluster/apps/hadoop/share/hadoop/yarn/*:/home/cluster/apps/hadoop/share/hadoop/yarn/lib/*:/home/cluster/apps/hadoop/share/hadoop/common/*:/home/cluster/apps/hadoop/share/hadoop/common/lib/*:/home/cluster/apps/hadoop/share/hadoop/hdfs/*:/home/cluster/apps/hadoop/share/hadoop/hdfs/lib/*:/home/cluster/apps/hadoop/share/hadoop/mapreduce/*:/home/cluster/apps/hadoop/share/hadoop/mapreduce/lib/*:/home/cluster/apps/hadoop/share/hadoop/tools/lib/*:/home/cluster/apps/spark/spark-1.4.1/lib/*
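Typing the /* suffix by hand for ten directories is easy to get wrong. One possible way to avoid that, sketched here with a hypothetical helper `wildcard_classpath`, is to build the string from a plain list of directories.

```shell
# Sketch: join directories into a classpath where every entry ends in "/*".
wildcard_classpath() (
  cp=''
  for dir in "$@"; do
    dir="${dir%/}"                        # drop one trailing slash, if any
    cp="${cp:+$cp:}$dir/*"                # quoted, so "*" stays literal
  done
  printf '%s\n' "$cp"
)

# Possible use in spark-env.sh (HADOOP_SHARE is an illustrative variable):
#   HADOOP_SHARE=/home/cluster/apps/hadoop/share/hadoop
#   export SPARK_CLASSPATH="$SPARK_CLASSPATH:$(wildcard_classpath \
#     "$HADOOP_SHARE/common" "$HADOOP_SHARE/common/lib")"
```

The asterisk has to reach the JVM unexpanded, which is why the helper keeps it inside double quotes instead of letting the shell glob it.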

 

Running the code above again now works without any problem.

But, but, but...

If I change /home/cluster/apps/hadoop/lib/native to /home/cluster/apps/hadoop/lib/native/*:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/cluster/apps/hadoop/lib/native/*
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/home/cluster/apps/hadoop/lib/native/*
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/cluster/apps/hadoop/share/hadoop/yarn/*:/home/cluster/apps/hadoop/share/hadoop/yarn/lib/*:/home/cluster/apps/hadoop/share/hadoop/common/*:/home/cluster/apps/hadoop/share/hadoop/common/lib/*:/home/cluster/apps/hadoop/share/hadoop/hdfs/*:/home/cluster/apps/hadoop/share/hadoop/hdfs/lib/*:/home/cluster/apps/hadoop/share/hadoop/mapreduce/*:/home/cluster/apps/hadoop/share/hadoop/mapreduce/lib/*:/home/cluster/apps/hadoop/share/hadoop/tools/lib/*:/home/cluster/apps/spark/spark-1.4.1/lib/*

 

Damn it, then it fails with the following error:

spark.repl.class.uri=http://10.32.24.78:52753) error [Ljava.lang.StackTraceElement;@4efb0b1f
2015-09-11 17:52:02,357 ERROR [main] spark.SparkContext (Logging.scala:logError(96)) - Error initializing SparkContext.
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:68)
    at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:60)
    at org.apache.spark.scheduler.EventLoggingListener.<init>(EventLoggingListener.scala:69)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:513)
    at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1017)
    at $line3.$read$$iwC$$iwC.<init>(<console>:9)
    at $line3.$read$$iwC.<init>(<console>:18)
	at $line3.$read.<init>(<console>:20)
	at $line3.$read$.<init>(<console>:24)
	at $line3.$read$.<clinit>(<console>)
	at $line3.$eval$.<init>(<console>:7)
	at $line3.$eval$.<clinit>(<console>)
	at $line3.$eval.$print(<console>)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
	at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
	at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
	at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
	at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
	at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
	at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:123)
    at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:122)
	at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324)
	at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:122)
	at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64)
	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974)
	at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:157)
	at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64)
	at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:106)
	at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64)
	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:991)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
	at org.apache.spark.repl.Main$.main(Main.scala:31)
	at org.apache.spark.repl.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException
    at org.apache.spark.io.SnappyCompressionCodec.<init>(CompressionCodec.scala:155)
    ... 56 more

 

At this point all I can say is:

You city folks sure know how to have fun. I admit defeat~
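The post leaves this second failure unexplained. One plausible reading, offered here as an assumption and not something the trace proves, is that LD_LIBRARY_PATH and SPARK_LIBRARY_PATH are lists of directories, and the dir/* wildcard is a feature of the Java classpath only, so a native/* entry names no real directory. A minimal sketch that flags such entries follows; the function name `bad_native_entries` is hypothetical.

```shell
# Sketch: print entries of a colon-separated library path that are not
# existing directories (for example, ones ending in "/*").
bad_native_entries() (
  IFS=':'
  set -f                                  # treat "*" as a literal character
  for entry in $1; do
    [ -n "$entry" ] || continue           # skip empty fields from "::" or a leading ":"
    [ -d "$entry" ] || printf '%s\n' "$entry"
  done
)
```

With the last configuration, `bad_native_entries "$LD_LIBRARY_PATH"` would be expected to print /home/cluster/apps/hadoop/lib/native/*, while the earlier working value without the wildcard would print nothing if that directory exists.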

Please respect original work and do not repost: http://blog.csdn.net/stark_summer/article/details/48375999
