Article list
flatten
players = load 'baseball' as (name:chararray, team:chararray, position:bag{t:(p:chararray)}, bat:map[]);
pos = foreach players generate name, flatten(position) as position;
bypos = group pos by position;
Jorge Posada,New York Yankees,{(Catcher),(Designated_hitter)},...
==>
Jorge Pos ...
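To make the effect of flatten concrete, here is a minimal Python sketch (not Pig) of the statements above. It assumes the bag can be modelled as a list of one-field tuples; the helper name flatten_position is hypothetical and only mimics what flatten does to the single sample record shown.

```python
from collections import defaultdict

# The sample record from the excerpt; the 'bat' map is omitted because it is elided there.
players = [
    ("Jorge Posada", "New York Yankees", [("Catcher",), ("Designated_hitter",)]),
]

def flatten_position(records):
    """Mimic `foreach players generate name, flatten(position)`:
    emit one output record per tuple in the bag."""
    for name, _team, positions in records:
        for (position,) in positions:
            yield (name, position)

pos = list(flatten_position(players))

# Mimic `group pos by position`: each key maps to the bag of matching records.
bypos = defaultdict(list)
for record in pos:
    bypos[record[1]].append(record)

print(pos)  # [('Jorge Posada', 'Catcher'), ('Jorge Posada', 'Designated_hitter')]
```

One consequence of this cross-product behaviour: a record whose bag is empty produces no output records at all.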
ZooKeeper: Install
- Blog category:
- ZooKeeper
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
Download from the official website: http://zookeeper.apache.org/
#tar -zxf zookeeper-3.4.6.tar.gz
#cd zookeeper-3.4.6
----conf/zoo.cfg ...
HBase: Install
- Blog category:
- HBase
ENV: hadoop2.3.0 ubuntu12.04 64 hue3.5.0 pig0.12.0 hive0.12.0 oozie4.0.0
Install
Download the tarball from http://mirrors.cnnic.cn/apache/hbase/hbase-0.98.0/
#tar -xzvf hbase-0.98.0-hadoop2-bin.tar.gz
#cd hbase-0.98.0-hadoop2
Configure
Note: the following configuration is suited to fully- ...
Relational Operations
foreach
foreach takes a set of expressions and applies them to every record in the data pipeline.
A = load 'input' as (user:chararray, id:long, address:chararray, phone:chararray, preferences:map[]);
B = foreach A generate user, id;
prices = load 'NYSE_daily' as (exc ...
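As a rough illustration of what "applies them to every record" means, the following Python sketch (not Pig) projects two fields out of each row, the way B = foreach A generate user, id; does. The sample rows and the helper foreach_generate are hypothetical and exist only for this example.

```python
# Hypothetical rows following the schema (user, id, address, phone, preferences).
A = [
    ("alice", 1, "12 Main St", "555-0100", {"lang": "en"}),
    ("bob", 2, "34 Oak Ave", "555-0101", {}),
]

def foreach_generate(relation, *exprs):
    """Apply every expression to every record and build one output tuple per input record."""
    return [tuple(expr(row) for expr in exprs) for row in relation]

# Equivalent in spirit to: B = foreach A generate user, id;
B = foreach_generate(A, lambda r: r[0], lambda r: r[1])

print(B)  # [('alice', 1), ('bob', 2)]
```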
Relation and Field
Pig Latin is a dataflow language. Each processing step results in a new data set, or relation.
A = load 'NYSE_dividends' as (exchange, symbol, date, dividends);
-- A is a relation; exchange, symbol, date, and dividends are all fields
Case Sensitivity
Keywords in Pig Latin a ...
Pig: Data Model
- Blog category:
- Pig
Data types
Nulls
In Pig, a null data element means the value is unknown, which is completely different from the concept of null in C, Java, Python, etc.
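The "unknown" reading of null can be sketched in Python (not Pig), using None to stand in for a Pig null. The helpers and the sample rows are hypothetical; the sketch assumes the usual Pig behaviour that arithmetic and comparisons involving null yield null, and that a filter keeps only records whose condition is true.

```python
def add(a, b):
    """Arithmetic with an unknown operand yields an unknown result."""
    return None if a is None or b is None else a + b

def greater_than(a, b):
    """A comparison with an unknown operand is itself unknown (None), not False."""
    return None if a is None or b is None else a > b

def less_or_equal(a, b):
    return None if a is None or b is None else a <= b

def filter_rows(rows, predicate):
    """Keep only the rows where the predicate is definitely true."""
    return [row for row in rows if predicate(row) is True]

# Hypothetical (symbol, dividend) rows; the second dividend is unknown.
dividends = [("AAA", 0.19), ("BBB", None), ("CCC", 0.5)]

high = filter_rows(dividends, lambda r: greater_than(r[1], 0.2))
low = filter_rows(dividends, lambda r: less_or_equal(r[1], 0.2))

print(high)  # [('CCC', 0.5)]
print(low)   # [('AAA', 0.19)]
```

The row with the unknown dividend appears in neither result, even though the two conditions are opposites; that is the practical difference from treating null as an ordinary value.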
Schemas
dividends = load 'NYSE_dividends' as (exchange:chararray, symbol:chararray, date:chararray, dividend:float);
dividends ...
Pig: Grunt Usage
- Blog category:
- Pig
Grunt is Pig’s interactive shell.
Start
#pig -x local //interact with the local file system
#pig //interact with the Hadoop cluster
Exit
grunt>quit;
or
CTRL+D
Note: Grunt provides command-line history and editing, as well as Tab completion. It does not provide file ...
Pig: Basic Usage
- Blog category:
- Pig
Running Locally
#pig -x local average_dividend.pig
Running on a Hadoop Cluster
#pig -e fs -mkdir /user/username //username is the user who runs pig
#pig -e fs -copyFromLocal NYSE_dividends NYSE_dividends //put the test data into the /user/username dir
#pig average_dividend.pig
#pig -e cat averag ...
Database
hive>show databases;
hive>show databases like 'h.*';
hive>describe database mydb;
hive>describe database extended mydb;
hive>create database mydb;
hive>create database if not exists mydb;
hive>create database mydb location '/my/preferred/directory';
hive> ...
HiveConf Java class for the current Hive configuration options
Metastore Conf
All the metadata for Hive tables and partitions is stored in the Hive Metastore.
There are three ways to set up a metastore server, each using a different Hive configuration:
Embedded Metastore
An embedded metastore is ma ...
Install Hive
1. download hive-0.12.0.bin.tar.gz
2. #tar -xzvf hive-0.12.0.bin.tar.gz
3. add the bin dir to PATH in ~/.bashrc
4. #source ~/.bashrc
The directory structure of hive-0.12.0.bin looks like the following:
lib/ : contains JARs, each of which implements a particular subset of Hive's functionali ...
ENV: ubuntu 12.04/13.01 hadoop2.x.0 hue3.5.0 pig0.12.0 hive0.12.0 sqoop1.99.3 oozie4.0.0 hbase0.98.0
-
Prepare env
-----------------
#sudo apt-get update
#sudo apt-get install libxml2-dev
#sudo apt-get install libxslt-dev
#sudo apt-get install libsasl2-dev
#sudo a ...
Primitive types:
TINYINT SMALLINT INT BIGINT BOOLEAN FLOAT DOUBLE STRING TIMESTAMP BINARY
Collection Data Types:
Example:
CREATE TABLE employees (name STRING,salary FLOAT,subordinates ARRAY<STRING>,deductions ...
Pig: Install and Rebuild
- Blog category:
- Pig
ENV: Hadoop2.3.0 pig0.12
Hadoop is running and Pig's Grunt shell works well,
but when loading data and dumping it to the screen:
#actor = load '/test/actor' using PigStorage(',') as (id, name, addr, time);
#dump actor;
the error is:
Backend error message during job submission----------------------------------- ...
Sqoop2 Install
1. Install the server
Download the tarball from the official website
#tar -xzvf sqoop-1.99.3-bin-hadoop200.tar.gz
Assume that the server and client are installed on the same host: 192.168.122.1
Configure the server-related configuration files in the dir
/path/to/sqoop-1.99.3-bin-hadoop ...