Spring Hadoop Series, Part 1

dalan_123

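I. Requirements

1. Spring for Apache Hadoop 2.1 is built and tested with JDK 7 (the minimum requirement is JDK 6) and Spring Framework 4.1, and is built against Apache Hadoop 2.6 by default.

2. Spring for Apache Hadoop 2.1 supports the following Hadoop versions and distributions: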

Apache Hadoop 2.4.1

Apache Hadoop 2.5.2

Apache Hadoop 2.6.0

Pivotal HD 2.1

Cloudera CDH5(2.5.0-CDH5.3.0)

Hortonworks Data Platform 2.0

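Any distribution based on the Apache Hadoop 2.2.x line should work with Spring for Apache Hadoop 2.1. HBase 0.94.11, Hive 0.11.0, and Pig 0.1 and later are also supported.

When using Spring for Apache Hadoop, start from your Hadoop version and check which versions of the other frameworks are compatible with it.

II. Spring and Hadoop

(1) Hadoop configuration

Whether you run against a local Hadoop installation or a remote Hadoop cluster, Hadoop must be configured and bootstrapped correctly before jobs can be submitted. The steps are described below. From here on, Spring for Apache Hadoop is abbreviated as SHDP.

Step 1: Use the SHDP namespace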

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:hdp="http://www.springframework.org/schema/hadoop"
   xsi:schemaLocation="
    http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
    http://www.springframework.org/schema/hadoop
    http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

   <bean/>

   <hdp:configuration/>
</beans>

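Note the namespace declarations above. You can also make the hadoop namespace the default one in place of beans, so that SHDP elements no longer need the hdp: prefix while the bean elements are written as <beans:...>: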

<?xml version="1.0" encoding="UTF-8"?>
<beans:beans
xmlns="http://www.springframework.org/schema/hadoop"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:beans="http://www.springframework.org/schema/beans"
   xsi:schemaLocation="
    http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
    http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

    <beans:bean id ... >

    <configuration ...>

</beans:beans>

Step 2: SHDP JavaConfig

import org.springframework.context.annotation.Configuration;
import org.springframework.data.hadoop.config.annotation.EnableHadoop;
import org.springframework.data.hadoop.config.annotation.SpringHadoopConfigurerAdapter;
import org.springframework.data.hadoop.config.annotation.builders.HadoopConfigConfigurer;

// @EnableHadoop only takes effect on a class that is also annotated with @Configuration.
@Configuration
@EnableHadoop
public class Config extends SpringHadoopConfigurerAdapter {

  @Override
  public void configure(HadoopConfigConfigurer config) throws Exception {
    // Address of the HDFS filesystem to connect to.
    config
      .fileSystemUri("hdfs://localhost:8021");
  }

}

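The HadoopConfigConfigurer parameter collects the Hadoop configuration settings. @EnableHadoop must be placed on a class that is annotated with @Configuration.

Step 3: Configure Hadoop

To use Hadoop you first need a Configuration object, which holds settings such as the job tracker address and the input and output formats. SHDP simplifies creating it.

<hdp:configuration> defines a ConfigurationFactoryBean whose bean name defaults to hadoopConfiguration.

In some cases you need to add specific configuration resources:

<hdp:configuration resources="classpath:/custom-site.xml, classpath:/hq-site.xml"/>

This adds the two configuration files to the Hadoop Configuration. Besides resources, Hadoop settings can also be supplied as properties, written inline in the element body: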

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:hdp="http://www.springframework.org/schema/hadoop"
    xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
        http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

     <hdp:configuration>
        fs.defaultFS=hdfs://localhost:8020
        hadoop.tmp.dir=/tmp/hadoop
        electric=sea
     </hdp:configuration>
</beans>

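The common parameters fs.defaultFS, mapred.job.tracker, and yarn.resourcemanager.address can also be set through the element attributes file-system-uri, job-tracker-uri, and rm-manager-uri. Property values may use Spring property placeholders, and properties can be merged from several sources, as the following configurations show: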

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:hdp="http://www.springframework.org/schema/hadoop"
    xmlns:context="http://www.springframework.org/schema/context"
    xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
        http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
        http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

     <hdp:configuration>
        fs.defaultFS=${hd.fs}
        hadoop.tmp.dir=file://${java.io.tmpdir}
        hangar=${number:18}
     </hdp:configuration>

     <context:property-placeholder location="classpath:hadoop.properties" />
</beans>
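
The placeholder syntax in the configuration above, including the default value in ${number:18}, can be illustrated with a small stand-alone sketch. This is not Spring's implementation: PlaceholderSketch and resolve are hypothetical names, and the code only mimics the "${key}" and "${key:default}" forms on plain strings.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Spring or SHDP) that mimics ${key} and ${key:default}.
final class PlaceholderSketch {

    // group(1) is the key, group(2) is the optional default after the first ':'.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([^}:]+)(?::([^}]*))?\\}");

    static String resolve(String text, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String key = m.group(1);
            String fallback = m.group(2); // null when no default was given
            String value = values.containsKey(key) ? values.get(key) : fallback;
            if (value == null) {
                throw new IllegalArgumentException("Unresolved placeholder: " + key);
            }
            // quoteReplacement keeps '$' and '\' in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With an empty value map, resolve("hangar=${number:18}", ...) falls back to the default and yields hangar=18, while a key without a default and without a value is reported as an error.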

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:hdp="http://www.springframework.org/schema/hadoop"
    xmlns:context="http://www.springframework.org/schema/context"
    xmlns:util="http://www.springframework.org/schema/util"
    xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
        http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
        http://www.springframework.org/schema/util http://www.springframework.org/schema/util/spring-util.xsd
        http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

   <!-- merge the local properties, the props bean and the two properties files -->
   <hdp:configuration properties-ref="props" properties-location="cfg-1.properties, cfg-2.properties">
      star=chasing
      captain=eo
   </hdp:configuration>

   <util:properties id="props" location="props.properties"/>
</beans>

<!-- default name is 'hadoopConfiguration' -->
<hdp:configuration>
    fs.defaultFS=${hd.fs}
    hadoop.tmp.dir=file://${java.io.tmpdir}
</hdp:configuration>

<hdp:configuration id="custom" configuration-ref="hadoopConfiguration">
    fs.defaultFS=${custom.hd.fs}
</hdp:configuration>

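In the definitions above, when the same key is set more than once, the later value overrides the earlier one. The custom bean therefore inherits every setting from hadoopConfiguration and replaces only fs.defaultFS.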
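
The "later value wins" rule can be sketched with plain java.util.Properties. This is only an illustration of the override semantics, not SHDP code: ConfigMergeSketch, merge, and of are hypothetical names, and the exact order in which SHDP reads its sources is not shown here.

```java
import java.util.Properties;

// Hypothetical helper (not part of SHDP) showing "later source overrides earlier source".
final class ConfigMergeSketch {

    // Copies each source into one Properties object in order;
    // a key that appears again in a later source replaces the earlier value.
    static Properties merge(Properties... sources) {
        Properties merged = new Properties();
        for (Properties source : sources) {
            merged.putAll(source);
        }
        return merged;
    }

    // Builds a Properties object from alternating key/value arguments.
    static Properties of(String... keyValues) {
        Properties p = new Properties();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            p.setProperty(keyValues[i], keyValues[i + 1]);
        }
        return p;
    }
}
```

Merging a parent holding fs.defaultFS and hadoop.tmp.dir with a child that sets only fs.defaultFS keeps the parent's hadoop.tmp.dir and takes the child's fs.defaultFS, which mirrors how the custom bean relates to hadoopConfiguration.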

Attributes of hdp:configuration

Name                  Values                Description
configuration-ref     Bean reference        Reference to an existing Configuration bean.
properties-ref        Bean reference        Reference to an existing Properties bean.
properties-location   Comma-delimited list  List of Spring Resource paths.
resources             Comma-delimited list  List of Spring Resource paths.
register-url-handler  Boolean               Registers an HDFS URL handler in the running VM. This operation can be executed at most once in a given JVM, hence the default is false.
file-system-uri       String                The HDFS filesystem address. Equivalent to the fs.defaultFS property.
job-tracker-uri       String                Job tracker address for Hadoop v1. Equivalent to the mapred.job.tracker property.
rm-manager-uri        String                The YARN resource manager address for Hadoop v2. Equivalent to the yarn.resourcemanager.address property.
user-keytab           String                Security keytab.
user-principal        String                User security principal.
namenode-principal    String                Namenode security principal.
rm-manager-principal  String                Resource manager security principal.
security-method       String                The security method for Hadoop.

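III. Command-Line Support

Currently spring-data-hadoop-boot-2.1.2.RELEASE.jar supports only the hadoopConfiguration and fsShell beans. The Groovy script below pulls it in with @Grab and lists /tmp recursively: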

@Grab('org.springframework.data:spring-data-hadoop-boot:2.1.2.RELEASE')

import org.springframework.data.hadoop.fs.FsShell

public class App implements CommandLineRunner {

  @Autowired FsShell shell

  void run(String... args) {
    shell.lsr("/tmp").each() {println "> ${it.path}"}
  }

}

2015-11-05 23:18