A Custom Hive UDF for Handling Date and Time Formats

zl520878

Blog categories:

Hive
Hadoop
Java

At work I often need to convert dates from one format to another. The usual way to do this is with an expression like the following:

from_unixtime(to_unix_timestamp('20150813','yyyyMMdd'),'yyyy-MM-dd')

2015-08-13

If the date also has to be shifted forward or backward by some number of days, the expression gets longer:

date_sub(from_unixtime(to_unix_timestamp('20150813','yyyyMMdd'),'yyyy-MM-dd'),5)

2015-08-08

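If the result must then be converted back to the compact integer-style form (yyyyMMdd), two more layers of nesting are needed: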

from_unixtime(to_unix_timestamp(date_sub(from_unixtime(to_unix_timestamp('20150813','yyyyMMdd'),'yyyy-MM-dd'),5),'yyyy-MM-dd'),'yyyyMMdd')

20150808

This is inconvenient to write, and the resulting code is hard to read.
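To make clear what the nested built-ins are doing, here is a minimal sketch of the same three steps (parse, shift by days, reformat) in plain Java. The class and method names (`NestedConversionSketch`, `shiftDays`) are hypothetical and are not part of Hive or of the UDF in this post.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

public class NestedConversionSketch {

    // Hypothetical helper: parse with one pattern, shift by whole days,
    // then format with another pattern.
    static String shiftDays(String input, String inPattern, String outPattern, int days) {
        Date parsed;
        try {
            parsed = new SimpleDateFormat(inPattern).parse(input);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + input, e);
        }
        Calendar cal = Calendar.getInstance();
        cal.setTime(parsed);
        cal.add(Calendar.DAY_OF_MONTH, days);
        return new SimpleDateFormat(outPattern).format(cal.getTime());
    }

    public static void main(String[] args) {
        // Format change only
        System.out.println(shiftDays("20150813", "yyyyMMdd", "yyyy-MM-dd", 0)); // 2015-08-13
        // Shift back five days and keep the compact format
        System.out.println(shiftDays("20150813", "yyyyMMdd", "yyyyMMdd", -5));  // 20150808
    }
}
```

In Java the whole job is one parse and one format, which is roughly the convenience the UDF aims to bring to HiveQL.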

So I wrote the UDF below. First, here is what it looks like in use:

myfunc('date_format','20150813','yyyyMMdd','yyyyMMdd-5d')

20150808

A more complex example:

myfunc('date_format','2015-08-13 12:45:22','yyyy-MM-dd HH:mm:ss','yyyyMMddHHmmss-5d+3y-1s+1M')

20180908124521

Usage and notes:

hive> add jar /home/sospdm/Myfunc.jar;
Added /home/sospdm/Myfunc.jar to class path
Added resource: /home/sospdm/Myfunc.jar
hive> create temporary function myfunc as 'sospdm.udf.Myfunc';

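The UDF accepts a variable number of arguments. The first argument is the name of an internal method; dispatching on this name makes it easy to add more functions to the UDF later. At present only 'date_format' is supported. For 'date_format', the second argument is the input date string, the third is the format of the input, and the fourth is the output format, optionally followed by offsets to add or subtract. Each offset is a sign, a number, and a unit letter (y, M, d, H, m, or s), for example -5d.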
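As a rough illustration of how the fourth argument is split into an output pattern and a set of offsets, the sketch below reuses the regular expression from the UDF code in this post. The class and method names (`OffsetSpecSketch`, `basePattern`, `offsets`) are hypothetical and exist only for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OffsetSpecSketch {

    // Same pattern as the UDF: a sign, 1 to 12 digits, then one unit letter.
    private static final Pattern OFFSET =
            Pattern.compile("((\\+|-)\\d{1,12})([A-Za-z])");

    // Everything before the first offset token is the output date pattern.
    static String basePattern(String spec) {
        Matcher m = OFFSET.matcher(spec);
        return m.find() ? spec.substring(0, m.start()) : spec;
    }

    // Collect unit -> amount; a unit that appears twice keeps its last value.
    static Map<String, Integer> offsets(String spec) {
        Map<String, Integer> result = new LinkedHashMap<String, Integer>();
        Matcher m = OFFSET.matcher(spec);
        while (m.find()) {
            result.put(m.group(3), Integer.parseInt(m.group(1)));
        }
        return result;
    }

    public static void main(String[] args) {
        String spec = "yyyyMMddHHmmss-5d+3y-1s+1M";
        System.out.println(basePattern(spec)); // yyyyMMddHHmmss
        System.out.println(offsets(spec));     // {d=-5, y=3, s=-1, M=1}
    }
}
```

A pattern such as yyyy-MM-dd is left untouched, because its hyphens are not followed by digits and so do not match the offset expression.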

The code is as follows:

package sospdm.udf;
    
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.exec.Description;

import java.util.Date;
import java.util.Calendar;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.text.SimpleDateFormat;
    
    
    
    @Description(name = "myfunc",
            value = "_FUNC_(func_nm, param1, param2, [param3], ...)",
            extended = "Example:\n"
                     + "> select _FUNC_('date_format', '20150813', 'yyyyMMdd', 'yyyyMMdd-5d') from src;\n"
                     + "Return:\n"
                     + "20150808")
    
    
    public class Myfunc extends UDF{
    	

        // ************ Date format conversion and date arithmetic ************ //

        // Shifts dt by the offsets in num, always applied in the fixed order
        // {years, months, days, hours, minutes, seconds}.
        private static Date getDateAfter(Date dt, int[] num) {
            Calendar now = Calendar.getInstance();
            now.setTime(dt);
            now.add(Calendar.YEAR, num[0]);
            now.add(Calendar.MONTH, num[1]);
            now.add(Calendar.DAY_OF_MONTH, num[2]);
            now.add(Calendar.HOUR_OF_DAY, num[3]);
            now.add(Calendar.MINUTE, num[4]);
            now.add(Calendar.SECOND, num[5]);
            return now.getTime();
        }
		
		
        private static String getfmt(String str_input, String str_inputformat, String str_outputformat) {
            try {
                // Parse the input date with the input format.
                SimpleDateFormat in_fmt = new SimpleDateFormat(str_inputformat);
                Date dt_input = in_fmt.parse(str_input);

                // Offset tokens in str_outputformat: a sign, 1 to 12 digits, and a unit letter.
                // Amounts outside the int range fail in Integer.parseInt, so the method returns null.
                Pattern p = Pattern.compile("((\\+|-)\\d{1,12})([A-Za-z])");
                Matcher m = p.matcher(str_outputformat);

                if (!m.find()) {
                    // No offsets: the whole string is the output pattern.
                    return new SimpleDateFormat(str_outputformat).format(dt_input);
                }

                // Everything before the first offset token is the output pattern.
                SimpleDateFormat out_fmt = new SimpleDateFormat(str_outputformat.substring(0, m.start()));

                // One offset per unit, defaulting to 0.
                Map<String, Integer> map = new HashMap<String, Integer>();
                map.put("y", 0);
                map.put("M", 0);
                map.put("d", 0);
                map.put("H", 0);
                map.put("m", 0);
                map.put("s", 0);

                // Group 3 (the unit letter) is the key, group 1 (the signed amount) is the value.
                do {
                    map.put(m.group(3), Integer.parseInt(m.group(1)));
                } while (m.find());

                int[] num_arr = {
                    map.get("y"),
                    map.get("M"),
                    map.get("d"),
                    map.get("H"),
                    map.get("m"),
                    map.get("s")
                };
                return out_fmt.format(getDateAfter(dt_input, num_arr));
            } catch (Exception e) {
                // Any parse or format failure is logged and reported as NULL.
                e.printStackTrace();
                return null;
            }
        }
		
		
		
		
        // ************************ UDF entry point ************************ //

        public String evaluate(String func_nm, String... s) {
            try {
                switch (func_nm) {
                // Date format conversion
                case "date_format":
                    return getfmt(s[0], s[1], s[2]);
                default:
                    return null;
                }
            } catch (Exception e) {
                // For example too few arguments or a NULL func_nm: return NULL
                // rather than handing the function name back as if it were data.
                e.printStackTrace();
                return null;
            }
        }
    	
    }
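One detail of the code above is worth spelling out: `getDateAfter` always applies the offsets in the order year, month, day, hour, minute, second, whatever order they appear in the fourth argument. Near the end of a month the order can change the result. The sketch below shows this with plain `java.util.Calendar`; the class and method names (`OffsetOrderSketch`, `shift`) are hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;

public class OffsetOrderSketch {

    // Hypothetical helper: apply two Calendar field offsets in the given order.
    static String shift(String date, int firstField, int firstAmount,
                        int secondField, int secondAmount) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");
        Calendar cal = Calendar.getInstance();
        try {
            cal.setTime(fmt.parse(date));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + date, e);
        }
        cal.add(firstField, firstAmount);
        cal.add(secondField, secondAmount);
        return fmt.format(cal.getTime());
    }

    public static void main(String[] args) {
        // Month first, as getDateAfter does: Jan 31 -> Feb 28 -> Feb 27
        System.out.println(shift("2015-01-31",
                Calendar.MONTH, 1, Calendar.DAY_OF_MONTH, -1)); // 2015-02-27
        // Day first: Jan 31 -> Jan 30 -> Feb 28
        System.out.println(shift("2015-01-31",
                Calendar.DAY_OF_MONTH, -1, Calendar.MONTH, 1)); // 2015-02-28
    }
}
```

So with this UDF, '-1d+1M' and '+1M-1d' give the same result, namely the month-first one.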

2015-09-16 11:56