Analyzing Hadoop Logs with Hive

 



1. Log Format Analysis
First, look at the format of Hadoop's logs. Each entry occupies one line, and its fields are, in order: date, time, level, related class, and message. For example:

 

2013-03-06 15:23:48,132 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = ubuntu/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 1.1.1
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1 -r 1411108; compiled by 'hortonfo' on Mon Nov 19 10:48:11 UTC 2012
************************************************************/
2013-03-06 15:23:48,288 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2013-03-06 15:23:48,298 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2013-03-06 15:23:48,299 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2013-03-06 15:23:48,299 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2013-03-06 15:23:48,423 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2013-03-06 15:23:48,427 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2013-03-06 15:23:53,094 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Registered FSDatasetStatusMBean
2013-03-06 15:23:53,102 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened data transfer server at 50010
2013-03-06 15:23:53,105 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 1048576 bytes/s
2013-03-06 15:23:58,189 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2013-03-06 15:23:58,331 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2013-03-06 15:23:58,346 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dfs.webhdfs.enabled = false
2013-03-06 15:23:58,346 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50075
2013-03-06 15:23:58,346 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50075 webServer.getConnectors()[0].getLocalPort() returned 50075
2013-03-06 15:23:58,346 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50075
2013-03-06 15:23:58,347 INFO org.mortbay.log: jetty-6.1.26
2013-03-06 15:23:58,719 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50075
2013-03-06 15:23:58,724 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm registered.
2013-03-06 15:23:58,726 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source DataNode registered.
2013-03-06 15:24:03,904 INFO org.apache.hadoop.ipc.Server: Starting SocketReader
2013-03-06 15:24:03,909 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcDetailedActivityForPort50020 registered.
2013-03-06 15:24:03,909 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source RpcActivityForPort50020 registered.
2013-03-06 15:24:03,910 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnRegistration = DatanodeRegistration(localhost.localdomain:50010, storageID=DS-2039125727-127.0.1.1-50010-1362105928671, infoPort=50075, ipcPort=50020)
2013-03-06 15:24:03,922 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Finished generating blocks being written report for 1 volumes in 0 seconds
2013-03-06 15:24:03,926 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting asynchronous block report scan
2013-03-06 15:24:03,926 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.11.157:50010, storageID=DS-2039125727-127.0.1.1-50010-1362105928671, infoPort=50075, ipcPort=50020)In DataNode.run, data = FSDataset{dirpath='/home/hadoop/hadoop-datastore/dfs/data/current'}
2013-03-06 15:24:03,932 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting
2013-03-06 15:24:03,932 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2013-03-06 15:24:03,934 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Finished asynchronous block report scan in 8ms
2013-03-06 15:24:03,934 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020: starting
2013-03-06 15:24:03,934 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50020: starting
2013-03-06 15:24:03,950 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50020: starting
2013-03-06 15:24:03,951 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: using BLOCKREPORT_INTERVAL of 3600000msec Initial delay: 0msec
2013-03-06 15:24:03,956 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Reconciled asynchronous block report against current state in 1 ms
2013-03-06 15:24:03,961 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 12 blocks took 1 msec to generate and 5 msecs for RPC and NN processing
2013-03-06 15:24:03,962 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting Periodic block scanner.
2013-03-06 15:24:03,962 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Generated rough (lockless) block report in 0 ms
2013-03-06 15:24:03,962 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Reconciled asynchronous block report against current state in 0 ms
2013-03-06 15:24:04,004 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2013-03-06 15:24:04,047 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_3810479607061332370_1201
2013-03-06 15:24:34,274 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_8724520321365706382_1202 src: /192.168.11.157:42695 dest: /192.168.11.157:50010
2013-03-06 15:24:34,282 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.11.157:42695, dest: /192.168.11.157:50010, bytes: 4, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-328627796_1, offset: 0, srvID: DS-2039125727-127.0.1.1-50010-1362105928671, blockid: blk_8724520321365706382_1202, duration: 1868644
2013-03-06 15:24:34,282 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_8724520321365706382_1202 terminating
2013-03-06 15:24:36,967 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Scheduling block blk_3810479607061332370_1201 file /home/hadoop/hadoop-datastore/dfs/data/current/blk_3810479607061332370 for deletion
2013-03-06 15:24:36,969 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleted block blk_3810479607061332370_1201 at file /home/hadoop/hadoop-datastore/dfs/data/current/blk_3810479607061332370
2013-03-06 15:24:42,130 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-7687594967083109639_1203 src: /192.168.11.157:42698 dest: /192.168.11.157:50010
2013-03-06 15:24:42,135 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.11.157:42698, dest: /192.168.11.157:50010, bytes: 3, op: HDFS_WRITE, cliID: DFSClient_hb_m_localhost.localdomain,60000,1362554661390_792638511_9, offset: 0, srvID: DS-2039125727-127.0.1.1-50010-1362105928671, blockid: blk_-7687594967083109639_1203, duration: 1823671
2013-03-06 15:24:42,135 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_-7687594967083109639_1203 terminating
2013-03-06 15:24:42,159 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_8851175106166281673_1204 src: /192.168.11.157:42699 dest: /192.168.11.157:50010
2013-03-06 15:24:42,162 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.11.157:42699, dest: /192.168.11.157:50010, bytes: 38, op: HDFS_WRITE, cliID: DFSClient_hb_m_localhost.localdomain,60000,1362554661390_792638511_9, offset: 0, srvID: DS-2039125727-127.0.1.1-50010-1362105928671, blockid: blk_8851175106166281673_1204, duration: 496431
2013-03-06 15:24:42,163 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_8851175106166281673_1204 terminating
2013-03-06 15:24:42,177 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.11.157:50010, dest: /192.168.11.157:42700, bytes: 42, op: HDFS_READ, cliID: DFSClient_hb_m_localhost.localdomain,60000,1362554661390_792638511_9, offset: 0, srvID: DS-2039125727-127.0.1.1-50010-1362105928671, blockid: blk_8851175106166281673_1204, duration: 598594
2013-03-06 15:24:42,401 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-3564732110216498100_1206 src: /192.168.11.157:42701 dest: /192.168.11.157:50010
2013-03-06 15:24:42,402 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.11.157:42701, dest: /192.168.11.157:50010, bytes: 109, op: HDFS_WRITE, cliID: DFSClient_hb_m_localhost.localdomain,60000,1362554661390_792638511_9, offset: 0, srvID: DS-2039125727-127.0.1.1-50010-1362105928671, blockid: blk_-3564732110216498100_1206, duration: 465158
2013-03-06 15:24:42,404 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_-3564732110216498100_1206 terminating
2013-03-06 15:24:42,593 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_2602280850343619161_1208 src: /192.168.11.157:42702 dest: /192.168.11.157:50010
2013-03-06 15:24:42,594 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.11.157:42702, dest: /192.168.11.157:50010, bytes: 111, op: HDFS_WRITE, cliID: DFSClient_hb_m_localhost.localdomain,60000,1362554661390_792638511_9, offset: 0, srvID: DS-2039125727-127.0.1.1-50010-1362105928671, blockid: blk_2602280850343619161_1208, duration: 457596
2013-03-06 15:24:42,595 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_2602280850343619161_1208 terminating
2013-03-06 15:24:42,620 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-8499292753361571333_1208 src: /192.168.11.157:42703 dest: /192.168.11.157:50010
2013-03-06 15:24:42,673 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_2168216133004853837_1209 src: /192.168.11.157:42704 dest: /192.168.11.157:50010
2013-03-06 15:24:42,676 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.11.157:42704, dest: /192.168.11.157:50010, bytes: 848, op: HDFS_WRITE, cliID: DFSClient_hb_m_localhost.localdomain,60000,1362554661390_792638511_9, offset: 0, srvID: DS-2039125727-127.0.1.1-50010-1362105928671, blockid: blk_2168216133004853837_1209, duration: 705024
2013-03-06 15:24:42,676 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_2168216133004853837_1209 terminating
2013-03-06 15:24:42,691 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.11.157:50010, dest: /192.168.11.157:42705, bytes: 340, op: HDFS_READ, cliID: DFSClient_hb_m_localhost.localdomain,60000,1362554661390_792638511_9, offset: 512, srvID: DS-2039125727-127.0.1.1-50010-1362105928671, blockid: blk_2168216133004853837_1209, duration: 913742
2013-03-06 15:24:42,709 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.11.157:50010, dest: /192.168.11.157:42706, bytes: 856, op: HDFS_READ, cliID: DFSClient_hb_m_localhost.localdomain,60000,1362554661390_792638511_9, offset: 0, srvID: DS-2039125727-127.0.1.1-50010-1362105928671, blockid: blk_2168216133004853837_1209, duration: 462507
2013-03-06 15:24:42,724 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.11.157:50010, dest: /192.168.11.157:42707, bytes: 340, op: HDFS_READ, cliID: DFSClient_hb_m_localhost.localdomain,60000,1362554661390_792638511_9, offset: 512, srvID: DS-2039125727-127.0.1.1-50010-1362105928671, blockid: blk_2168216133004853837_1209, duration: 364763
2013-03-06 15:24:42,726 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.11.157:50010, dest: /192.168.11.157:42708, bytes: 856, op: HDFS_READ, cliID: DFSClient_hb_m_localhost.localdomain,60000,1362554661390_792638511_9, offset: 0, srvID: DS-2039125727-127.0.1.1-50010-1362105928671, blockid: blk_2168216133004853837_1209, duration: 432228
2013-03-06 15:24:42,739 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.11.157:42703, dest: /192.168.11.157:50010, bytes: 421, op: HDFS_WRITE, cliID: DFSClient_hb_m_localhost.localdomain,60000,1362554661390_792638511_9, offset: 0, srvID: DS-2039125727-127.0.1.1-50010-1362105928671, blockid: blk_-8499292753361571333_1208, duration: 116933097
2013-03-06 15:24:42,739 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_-8499292753361571333_1208 terminating
2013-03-06 15:24:42,759 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-6232731177153285690_1209 src: /192.168.11.157:42709 dest: /192.168.11.157:50010
2013-03-06 15:24:42,764 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.11.157:42709, dest: /192.168.11.157:50010, bytes: 134, op: HDFS_WRITE, cliID: DFSClient_hb_m_localhost.localdomain,60000,1362554661390_792638511_9, offset: 0, srvID: DS-2039125727-127.0.1.1-50010-1362105928671, blockid: blk_-6232731177153285690_1209, duration: 2742705
2013-03-06 15:24:42,765 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_-6232731177153285690_1209 terminating
2013-03-06 15:24:42,803 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_6878738047819289992_1210 src: /192.168.11.157:42710 dest: /192.168.11.157:50010
2013-03-06 15:24:42,806 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.11.157:42710, dest: /192.168.11.157:50010, bytes: 727, op: HDFS_WRITE, cliID: DFSClient_hb_m_localhost.localdomain,60000,1362554661390_792638511_9, offset: 0, srvID: DS-2039125727-127.0.1.1-50010-1362105928671, blockid: blk_6878738047819289992_1210, duration: 1048999
2013-03-06 15:24:42,807 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_6878738047819289992_1210 terminating
2013-03-06 15:24:49,347 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.11.157:50010, dest: /192.168.11.157:42716, bytes: 340, op: HDFS_READ, cliID: DFSClient_hb_rs_localhost.localdomain,60020,1362554662758_1605864397_26, offset: 512, srvID: DS-2039125727-127.0.1.1-50010-1362105928671, blockid: blk_2168216133004853837_1209, duration: 317106
2013-03-06 15:24:49,359 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.11.157:50010, dest: /192.168.11.157:42717, bytes: 856, op: HDFS_READ, cliID: DFSClient_hb_rs_localhost.localdomain,60020,1362554662758_1605864397_26, offset: 0, srvID: DS-2039125727-127.0.1.1-50010-1362105928671, blockid: blk_2168216133004853837_1209, duration: 460452
2013-03-06 15:24:49,455 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.11.157:50010, dest: /192.168.11.157:42718, bytes: 516, op: HDFS_READ, cliID: DFSClient_hb_rs_localhost.localdomain,60020,1362554662758_1605864397_26, offset: 0, srvID: DS-2039125727-127.0.1.1-50010-1362105928671, blockid: blk_2168216133004853837_1209, duration: 264641
2013-03-06 15:24:49,456 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.11.157:50010, dest: /192.168.11.157:42719, bytes: 516, op: HDFS_READ, cliID: DFSClient_hb_rs_localhost.localdomain,60020,1362554662758_1605864397_26, offset: 0, srvID: DS-2039125727-127.0.1.1-50010-1362105928671, blockid: blk_2168216133004853837_1209, duration: 224282
2013-03-06 15:24:50,615 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-55581707144444311_1211 src: /192.168.11.157:42722 dest: /192.168.11.157:50010
2013-03-06 15:38:17,696 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at ubuntu/127.0.0.1
************************************************************/

 

2. Hive Table Definition
The Hive table is defined as follows. Fields are terminated by a single space, so each log line maps onto the columns described above, and the comma collection delimiter splits a timestamp such as 15:23:48,132 into the time array:

 

create table if not exists loginfo(
    rdate string,
    time array<string>,
    type string,
    relateclass string,
    information1 string,
    information2 string,
    information3 string)
row format delimited fields terminated by ' '
collection items terminated by ','  
map keys terminated by  ':';
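
To make the mapping concrete, here is a minimal sketch (not part of the original post; the class and variable names are illustrative) of how this space-delimited layout splits one line of the DataNode log into the seven declared columns:

import java.util.Arrays;

public class LogLineSplitDemo {
	public static void main(String[] args) {
		String line = "2013-03-06 15:23:48,132 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:";
		// "fields terminated by ' '" splits on every single space, so only the
		// first words of the message land in information1..information3.
		String[] fields = line.split(" ");
		String rdate = fields[0];                    // 2013-03-06
		String[] time = fields[1].split(",");        // ["15:23:48", "132"] -> the array<string> column
		String type = fields[2];                     // INFO
		String relateclass = fields[3];              // org.apache.hadoop.hdfs.server.datanode.DataNode:
		String information1 = fields.length > 4 ? fields[4] : null; // STARTUP_MSG:
		System.out.println(rdate + " " + Arrays.toString(time) + " " + type + " "
				+ relateclass + " " + information1);
	}
}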

 

 

3. MySQL Table Definition

 

drop table if exists  hadooplog;
create table hadooplog(
    id int(11) not null auto_increment,
    rdate varchar(50)  null,
    time varchar(50) default null,
    type varchar(50) default null,
    relateclass tinytext default null,
    information longtext default null,
    primary key (id)
) engine=innodb default charset=utf8;

 

 

4. Program Code

    1) DBHelper: establishes the connections to Hive and MySQL

 

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/**
 * Manages the JDBC connections to the Hive and MySQL databases.
 * 
 * @author 吖大哥
 * 
 */
public class DBHelper {

	private static Connection connToHive = null;
	private static Connection connToMySQL = null;

	private DBHelper() {
	}

	// Get the Hive connection; if it has already been initialized, return it directly
	public static Connection getHiveConn() throws SQLException {
		if (connToHive == null) {
			try {
				Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
			} catch (ClassNotFoundException err) {
				err.printStackTrace();
				System.exit(1);
			}
			// hadoop3 is the host of the cluster node where Hive runs
			connToHive = DriverManager.getConnection(
					"jdbc:hive://hadoop3:10000/default", "hive", "mysql");
		}
		return connToHive;
	}

	// Get the MySQL connection
	public static Connection getMySQLConn() throws SQLException {
		if (connToMySQL == null) {
			try {
				Class.forName("com.mysql.jdbc.Driver");
			} catch (ClassNotFoundException err) {
				err.printStackTrace();
				System.exit(1);
			}

			// hadoop2 is the host where MySQL is installed in the cluster
			connToMySQL = DriverManager
					.getConnection(
							"jdbc:mysql://hadoop2:3306/ha?useUnicode=true&characterEncoding=UTF8",
							"root", "hadoop"); // note: the charset must be written as UTF8 here, not UTF-8
		}
		return connToMySQL;
	}

	public static void closeHiveConn() throws SQLException {
		if (connToHive != null) {
			connToHive.close();
		}
	}

	public static void closeMySQLConn() throws SQLException {
		if (connToMySQL != null) {
			connToMySQL.close();
		}
	}

	public static void main(String[] args) throws SQLException {
		System.out.println(getMySQLConn());
		closeMySQLConn();
	}

}
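
The driver class org.apache.hadoop.hive.jdbc.HiveDriver and the jdbc:hive:// URL above target the original HiveServer (HiveServer1), which matches the Hadoop 1.x era setup used in this post. As an aside that is not part of the original code: if you connect through HiveServer2 instead, the driver class and URL prefix change, roughly as in the sketch below (host, database and credentials simply reuse the values above as assumptions):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class Hive2ConnDemo {
	public static void main(String[] args) throws ClassNotFoundException, SQLException {
		Class.forName("org.apache.hive.jdbc.HiveDriver"); // HiveServer2 driver class
		Connection conn = DriverManager.getConnection(
				"jdbc:hive2://hadoop3:10000/default", "hive", "mysql"); // assumed host and credentials
		System.out.println(conn);
		conn.close();
	}
}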

 

 

     2) HiveUtil: a utility class for Hive operations:

 

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/**
 * Utility class for working with Hive data
 * 
 * @author 吖大哥
 * 
 */
public class HiveUtil {

	// Create a table
	public static void createTable(String sql) throws SQLException {
		Connection conn = DBHelper.getHiveConn();
		Statement stmt = conn.createStatement();
		ResultSet res = stmt.executeQuery(sql);
	}

	// Query data by the given condition
	public static ResultSet queryData(String sql) throws SQLException {
		Connection conn = DBHelper.getHiveConn();
		Statement stmt = conn.createStatement();
		ResultSet res = stmt.executeQuery(sql);
		return res;
	}

	// Load data into the table
	public static void loadData(String sql) throws SQLException {
		Connection conn = DBHelper.getHiveConn();
		Statement stmt = conn.createStatement();
		ResultSet res = stmt.executeQuery(sql);
	}

	// Copy the queried rows into MySQL
	public static void hiveToMySQL(ResultSet res) throws SQLException {
		Connection conn = DBHelper.getMySQLConn();
		Statement stmt = conn.createStatement();
		while (res.next()) {
			String rdate = res.getString(1);
			String time = res.getString(2);
			String type = res.getString(3);
			String relateclass = res.getString(4);
			String information = res.getString(5) + res.getString(6)
					+ res.getString(7);
			StringBuffer sql = new StringBuffer();
			sql.append("insert into hadooplog values(0,'");
			sql.append(rdate + "','");
			sql.append(time + "','");
			sql.append(type + "','");
			sql.append(relateclass + "','");
			sql.append(information + "')");

			int i = stmt.executeUpdate(sql.toString());
		}
	}
}
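
The string-concatenated INSERT in hiveToMySQL is fine for this demo but breaks as soon as a log field contains a single quote. A hedged alternative (an assumption, not part of the original post; it additionally needs import java.sql.PreparedStatement and Java 7+ for try-with-resources) would copy the rows with a parameterized, batched statement:

	// Hypothetical variant of hiveToMySQL: parameterized insert, executed as one batch.
	public static void hiveToMySQLBatch(ResultSet res) throws SQLException {
		Connection conn = DBHelper.getMySQLConn();
		String insert = "insert into hadooplog(rdate, time, type, relateclass, information) "
				+ "values (?, ?, ?, ?, ?)";
		try (PreparedStatement ps = conn.prepareStatement(insert)) {
			while (res.next()) {
				ps.setString(1, res.getString(1)); // rdate
				ps.setString(2, res.getString(2)); // time[0]
				ps.setString(3, res.getString(3)); // type
				ps.setString(4, res.getString(4)); // relateclass
				ps.setString(5, res.getString(5) + res.getString(6) + res.getString(7)); // information1..3
				ps.addBatch();
			}
			ps.executeBatch();
		}
	}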

 

 

    3) AnalyseHadoopLog: the log analysis driver class

 

import java.sql.ResultSet;
import java.sql.SQLException;

/**
 * Analyzes the Hadoop log
 * 
 * @author 吖大哥
 * 
 */
public class AnalyseHadoopLog {
	public static void main(String[] args) throws SQLException {
		StringBuffer sql = new StringBuffer();

		// Step 1: create the table in Hive
		sql.append("create table if not exists loginfo( ");
		sql.append("rdate string,  ");
		sql.append("time array<string>, ");
		sql.append("type string, ");
		sql.append("relateclass string, ");
		sql.append("information1 string, ");
		sql.append("information2 string, ");
		sql.append("information3 string)  ");
		sql.append("row format delimited fields terminated by ' '  ");
		sql.append("collection items terminated by ','   ");
		sql.append("map keys terminated by  ':'");

		System.out.println(sql);
		HiveUtil.createTable(sql.toString());

		// Step 2: load the Hadoop log file
		sql.delete(0, sql.length());
		sql.append("load data local inpath ");
		sql.append("'/home/hadoop01/hadooplog'");
		sql.append(" overwrite into table loginfo");
		System.out.println(sql);
		HiveUtil.loadData(sql.toString());

		// Step 3: query the useful information
		sql.delete(0, sql.length());
		sql.append("select rdate,time[0],type,relateclass,");
		sql.append("information1,information2,information3 ");
		sql.append("from loginfo where type='INFO'");
		System.out.println(sql);
		ResultSet res = HiveUtil.queryData(sql.toString());

		// Step 4: transform the queried rows and save them into MySQL
		HiveUtil.hiveToMySQL(res);

		// Step 5: close the Hive connection
		DBHelper.closeHiveConn();

		// Step 6: close the MySQL connection
		DBHelper.closeMySQLConn();
	}
}

 

 

5. Viewing the Results

    1) Data in Hive (partial output):

 

hive> select * from loginfo
    > ;
OK
2013-03-06      ["15:23:48","132"]      INFO    org.apache.hadoop.hdfs.server.datanode.DataNode:        STARTUP_MSG:    NULL
/************************************************************   null    NULL    NULL    NULL    NULL    NULL
STARTUP_MSG:    ["Starting"]    DataNode        NULL    NULL    NULL    NULL
STARTUP_MSG:    []              host    =       ubuntu/127.0.0.1        NULL
STARTUP_MSG:    []              args    =       []      NULL
STARTUP_MSG:    []              version =       1.1.1   NULL
STARTUP_MSG:    []              build   =       https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1      -r
************************************************************/   null    NULL    NULL    NULL    NULL    NULL
2013-03-06      ["15:23:48","288"]      INFO    org.apache.hadoop.metrics2.impl.MetricsConfig:  loaded  properties      from
2013-03-06      ["15:23:48","298"]      INFO    org.apache.hadoop.metrics2.impl.MetricsSourceAdapter:   MBean   for     source
2013-03-06      ["15:23:48","299"]      INFO    org.apache.hadoop.metrics2.impl.MetricsSystemImpl:      Scheduled       snapshot period
2013-03-06      ["15:23:48","299"]      INFO    org.apache.hadoop.metrics2.impl.MetricsSystemImpl:      DataNode        metrics  system
2013-03-06      ["15:23:48","423"]      INFO    org.apache.hadoop.metrics2.impl.MetricsSourceAdapter:   MBean   for     source
2013-03-06      ["15:23:48","427"]      WARN    org.apache.hadoop.metrics2.impl.MetricsSystemImpl:      Source  name    u

 

 

    2) Data in MySQL (partial output):

 

mysql> select * from hadooplog;
+----+------------+----------+------+--------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------+
| id | rdate      | time     | type | relateclass                                                  | information                                                                                                     |
+----+------------+----------+------+--------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------+
|  1 | 2013-03-06 | 15:23:48 | INFO | org.apache.hadoop.hdfs.server.datanode.DataNode:             | STARTUP_MSG:null                                                                                                |
|  2 | 2013-03-06 | 15:23:48 | INFO | org.apache.hadoop.metrics2.impl.MetricsConfig:               | loadedpropertiesfrom                                                                                            |
|  3 | 2013-03-06 | 15:23:48 | INFO | org.apache.hadoop.metrics2.impl.MetricsSourceAdapter:        | MBeanforsource                                                                                                  |
|  4 | 2013-03-06 | 15:23:48 | INFO | org.apache.hadoop.metrics2.impl.MetricsSystemImpl:           | Scheduledsnapshotperiod                                                                                         |
|  5 | 2013-03-06 | 15:23:48 | INFO | org.apache.hadoop.metrics2.impl.MetricsSystemImpl:           | DataNodemetricssystem                                                                                           |
|  6 | 2013-03-06 | 15:23:48 | INFO | org.apache.hadoop.metrics2.impl.MetricsSourceAdapter:        | MBeanforsource                                                                                                  |
|  7 | 2013-03-06 | 15:23:53 | INFO | org.apache.hadoop.hdfs.server.datanode.DataNode:             | RegisteredFSDatasetS

 

 

 

 

