Using the Oozie Java Action

ghost_face

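1. Write the main function for the custom logic

Purpose: collect information about a given directory (the number of files in it and its modification time) and pass that information back to Oozie.

The code is as follows: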

package myTest.oozie;
 
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
 
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.Date;
import java.util.Properties;
 
public class GetDirInfo {
    // The output of a Java action is a properties file.
    private static final String OOZIE_ACTION_OUTPUT_PROPERTIES = "oozie.action.output.properties";
 
    public static void main(String[] args) throws Exception {
        String dirPath = "/user/abc/archer";
        String propKey0 = "dir.num-files";
        String propVal0 = "-1";
        String propKey1 = "dir.modified-time";
        String propVal1 = "-1";
        System.out.println("Directory path: '" + dirPath + "'");
 
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path hadoopDir = new Path(dirPath);
        if (fs.exists(hadoopDir)) {
            // Reuse the FileSystem handle obtained above instead of looking it up again.
            FileStatus[] files = fs.listStatus(hadoopDir);
            int numFilesInDir = files.length;
            propVal0 = Integer.toString(numFilesInDir);
            FileStatus fileStatus = fs.getFileStatus(hadoopDir);
            Date date = new Date(fileStatus.getModificationTime());
            // toLocaleString() is deprecated and locale-dependent; kept to preserve the original output format.
            propVal1 = date.toLocaleString();
        }
        System.out.printf("%s,%s%n", propVal0, propVal1);
        // Oozie passes the path of the output properties file through this system property.
        String oozieProp = System.getProperty(OOZIE_ACTION_OUTPUT_PROPERTIES);
        if (oozieProp != null) {
            File propFile = new File(oozieProp);
            Properties props = new Properties();
            props.setProperty(propKey0, propVal0);
            props.setProperty(propKey1, propVal1);
            // try-with-resources closes the stream even if store() throws.
            try (OutputStream os = new FileOutputStream(propFile)) {
                props.store(os, "");
            }
        } else {
            throw new RuntimeException(OOZIE_ACTION_OUTPUT_PROPERTIES
                    + " System property not defined");
        }
    }
}
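
The hand-off to Oozie at the end of main can be isolated into a small helper. The following is a minimal sketch that uses only the JDK, so it can be tried without a Hadoop cluster. The class name ActionOutputWriter and the method writeOutput are hypothetical names introduced for illustration; they are not part of Oozie.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper (not part of Oozie): writes key/value pairs to the file
// whose path is given by the oozie.action.output.properties system property.
public class ActionOutputWriter {
    private static final String OOZIE_ACTION_OUTPUT_PROPERTIES = "oozie.action.output.properties";

    public static void writeOutput(Map<String, String> values) throws IOException {
        String path = System.getProperty(OOZIE_ACTION_OUTPUT_PROPERTIES);
        if (path == null) {
            throw new IllegalStateException(OOZIE_ACTION_OUTPUT_PROPERTIES
                    + " system property not defined");
        }
        Properties props = new Properties();
        for (Map.Entry<String, String> entry : values.entrySet()) {
            props.setProperty(entry.getKey(), entry.getValue());
        }
        // try-with-resources closes the stream even if store() throws.
        try (OutputStream os = new FileOutputStream(new File(path))) {
            props.store(os, "");
        }
    }
}
```

With a helper like this, GetDirInfo.main could finish by putting dir.num-files and dir.modified-time into a map and making a single writeOutput call.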

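2. Write the workflow.xml file

The file is shown below. <main-class> names the class to run, and <capture-output/> tells Oozie to capture the output of the Java action.

The workflow proceeds as follows:

Step one: the Java action collects the information about the given directory and writes the result to the file named by the "oozie.action.output.properties" system property, which passes it back to Oozie. On success the workflow moves on to the next step; otherwise it takes the error transition.

Step two: the EL function expression ${wf:actionData('getDirInfo')['dir.num-files'] eq 41} reads the captured output and tests it. If the number of files in the directory equals 41, the workflow goes to end; otherwise it goes to the error node.

Note: in this example, the given directory contains 41 files.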

<workflow-app xmlns='uri:oozie:workflow:0.2' name='getDirInformation'>
 
    <start to='getDirInfo' />
 
    <!-- STEP ONE -->
    <action name='getDirInfo'>
         <!-- Writes 2 properties:
              dir.num-files: -1 if the dir doesn't exist, otherwise the # of files in the dir;
              dir.modified-time: -1 if the dir doesn't exist, otherwise the modified time of the dir -->
         <java>
             <job-tracker>10.1.1.26:54311</job-tracker>
             <name-node>hdfs://10.1.1.26:31054</name-node>
            <configuration>
                <property>
                    <name>mapred.queue.name</name>
                    <value>default</value>
                </property>
            </configuration>
             <main-class>myTest.oozie.GetDirInfo</main-class>
             <arg></arg>
             <capture-output />
         </java>
         <ok to="makeIngestDecision" />
         <error to="fail" />
     </action>
 
     <!-- STEP TWO -->
     <decision name="makeIngestDecision">
         <switch>
             <!-- exactly 41 files -->
             <case to="end">
                ${wf:actionData('getDirInfo')['dir.num-files'] eq 41}
             </case>
             <!-- any other count, including -1 when the directory doesn't exist -->
             <case to="fail">
              ${wf:actionData('getDirInfo')['dir.num-files'] ne 41}
             </case>
             <default to="fail"/>
         </switch>
     </decision>
 
     <kill name="fail">
          <message>File count is not 41, or the Java action failed; error
            message[${wf:errorMessage(wf:lastErrorNode())}]</message>
     </kill>
     <end name='end' />
</workflow-app>
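
To make the makeIngestDecision node above more concrete, here is a rough Java sketch of the same check performed outside Oozie. It loads properties in the format the Java action writes and returns the name of the node the workflow would move to. The class DecisionSketch and the method nextNode are hypothetical; in a real run Oozie evaluates the EL expression itself, not code like this.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical illustration of the decision node; not how Oozie is implemented.
public class DecisionSketch {

    // Returns "end" when dir.num-files equals 41, otherwise "fail".
    public static String nextNode(Reader actionOutput) throws IOException {
        Properties props = new Properties();
        props.load(actionOutput);
        // GetDirInfo writes -1 when the directory does not exist; use the same default here.
        String numFiles = props.getProperty("dir.num-files", "-1");
        return Integer.parseInt(numFiles.trim()) == 41 ? "end" : "fail";
    }

    public static void main(String[] args) throws IOException {
        System.out.println(nextNode(new StringReader("dir.num-files=41\n"))); // prints "end"
        System.out.println(nextNode(new StringReader("dir.num-files=-1\n"))); // prints "fail"
    }
}
```

Because the second case in the workflow is the exact negation of the first, the default transition is only a safety net; the sketch folds both into a single comparison.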