Compressing Hive map output

love敏小仪


Category: Study notes

The intermediate output of map tasks can also be compressed, and compressing it has no effect on the final output of the job.

In hadoop-site.xml:

<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
  <description>Should the outputs of the maps be compressed before being
               sent across the network. Uses SequenceFile compression.
  </description>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.LzoCodec</value>
  <description>If the map outputs are compressed, how should they be
               compressed?
  </description>
</property>

Alternatively, configure it in hive-site.xml:

<property>
  <name>hive.exec.compress.intermediate</name>
  <value>true</value>
  <description>Should the outputs of the maps be compressed before being
               sent across the network. Uses SequenceFile compression.
  </description>
</property>
<property>
  <name>hive.intermediate.compression.codec</name>
  <value>org.apache.hadoop.io.compress.LzoCodec</value>
  <description>If the map outputs are compressed, how should they be
               compressed?
  </description>
</property>

Or set it directly in a Hive script:

set hive.exec.compress.intermediate=true;

set hive.intermediate.compression.codec=org.apache.hadoop.io.compress.LzoCodec;

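For intermediate results, LZO is the recommended codec: it is fast, and it does not consume as much CPU as other compression methods.

Enabling LZO does come with a licensing issue, though, so make sure the LZO package is installed separately on every machine in the cluster.

Note that LZO shipped with Hadoop 0.19.1 but was removed from 0.20 onward because of its license, so it has to be installed separately.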

First, register the LZO codec by adding the following to hadoop-site.xml:

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.LzoCodec</value>
  <description>A list of the compression codec classes that can be used
               for compression/decompression.</description>
</property>

The remaining compression settings are the same as shown above.
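As a quick sanity check on settings like the ones above, the properties can be read back from a Hadoop-style configuration file. The following is only a minimal sketch, not part of Hadoop or Hive: the function names `load_properties` and `map_output_compression` are made up for illustration, and it assumes the usual `<configuration><property>...` layout used by hadoop-site.xml and job.xml. The "codec is listed in io.compression.codecs" check simply mirrors the setup described in this post.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Return a {name: value} dict from Hadoop-style configuration XML."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def map_output_compression(props):
    """Summarize the map output compression settings (hypothetical helper)."""
    enabled = props.get("mapred.compress.map.output", "false").lower() == "true"
    codec = props.get("mapred.map.output.compression.codec")
    # Codecs registered through io.compression.codecs, as a clean list.
    registered = [
        c.strip()
        for c in props.get("io.compression.codecs", "").split(",")
        if c.strip()
    ]
    return {
        "enabled": enabled,
        "codec": codec,
        "codec_registered": codec in registered,
    }


SAMPLE = """
<configuration>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.LzoCodec</value>
  </property>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.LzoCodec</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    print(map_output_compression(load_properties(SAMPLE)))
```

To use it on a real cluster, one would pass in the text of the actual configuration file instead of `SAMPLE`.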

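Once everything is configured, you can check in job.xml whether a running job's configuration has compression enabled. You can also run

hadoop fs -cat <output-file> | more

to check. The header of the output file records the file's format, including the key and value class names and whether the file is compressed. With compression enabled, you will see output similar to the following: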

SEQ"org.apache.hadoop.io.BytesWritableorg.apache.hadoop.io.Text*org.apache.hadoop.io.compress.DefaultCodec...
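The header shown above can also be decoded programmatically instead of by eye. Below is a rough sketch that parses the first bytes of a SequenceFile, assuming the commonly documented header layout (the `SEQ` magic, a version byte, length-prefixed key and value class names, two boolean flags, then the codec class name when compression is on). The function name `parse_sequence_file_header` is hypothetical, and the sketch only handles class names shorter than 128 bytes, where the length prefix is a single byte.

```python
import io


def _read_exact(stream, n):
    """Read exactly n bytes or fail loudly."""
    data = stream.read(n)
    if len(data) != n:
        raise ValueError("truncated SequenceFile header")
    return data


def _read_string(stream):
    """Read a length-prefixed UTF-8 string (single-byte length only)."""
    length = _read_exact(stream, 1)[0]
    if length > 127:
        raise ValueError("multi-byte lengths are not handled in this sketch")
    return _read_exact(stream, length).decode("utf-8")


def parse_sequence_file_header(data):
    """Extract class names and compression info from raw header bytes."""
    stream = io.BytesIO(data)
    if _read_exact(stream, 3) != b"SEQ":
        raise ValueError("not a SequenceFile")
    version = _read_exact(stream, 1)[0]
    if version < 5:
        raise ValueError("header versions below 5 are not handled here")
    key_class = _read_string(stream)
    value_class = _read_string(stream)
    # Each flag is one byte; any non-zero value means true.
    compressed = _read_exact(stream, 1) != b"\x00"
    block_compressed = _read_exact(stream, 1) != b"\x00"
    codec = _read_string(stream) if compressed else None
    return {
        "version": version,
        "key_class": key_class,
        "value_class": value_class,
        "compressed": compressed,
        "block_compressed": block_compressed,
        "codec": codec,
    }


def _hadoop_string(text):
    """Build a length-prefixed string for the demo header below."""
    raw = text.encode("utf-8")
    return bytes([len(raw)]) + raw


# Demo header assembled by hand to resemble the output shown in this post.
DEMO_HEADER = (
    b"SEQ\x06"
    + _hadoop_string("org.apache.hadoop.io.BytesWritable")
    + _hadoop_string("org.apache.hadoop.io.Text")
    + b"\x01\x00"
    + _hadoop_string("org.apache.hadoop.io.compress.DefaultCodec")
)

if __name__ == "__main__":
    print(parse_sequence_file_header(DEMO_HEADER))
```

This also explains the stray `"` and `*` characters in the sample output: they are the length bytes of the class names that follow them (34 and 42), which happen to be printable. To try it on real data, one could save the first few hundred bytes of the output file locally and pass them to the function.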

2013-04-09 13:34