Article list
Running Solr on HDFS
- Blog category: Solr
https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS
http://www.quora.com/How-does-LinkedIns-recommendation-system-work
I gave this talk earlier this week at Hadoop World (http://www.hadoopworld.com/sessi...), a conference that is evangelizing Hadoop by way of highlighting how people across the industry are solving big business challenges by lever ...
http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
The Log: What every software engineer should know about real-time data's unifying abstraction
Jay Kreps
Principal Staff Engineer
Posted on 12/16/2013
...
Install ant
#wget http://apache.tradebit.com/pub//ant/binaries/apache-ant-1.9.4-bin.zip
#unzip apache-ant-1.9.4-bin.zip
#mv apache-ant-1.9.4/ /opt/ant
#ln -s /opt/ant/bin/ant /usr/bin/ant
#vi /etc/profile.d/ant.sh
#!/bin/bash
ANT_HOME=/opt/ant
PATH=$ANT_HOME/bin:$PATH
export PATH ANT_HOME
ex ...
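A profile script like the one above is sourced on every login, so it can help to prepend the Ant directory only when it is not already on the PATH. The following is a minimal sketch of that idea, not part of the original post; the helper name `prepend_path` is hypothetical.

```shell
# Hypothetical helper: prepend a directory to a PATH-style string
# unless that directory is already one of its entries.
prepend_path() {
    dir=$1
    path=$2
    case ":$path:" in
        *":$dir:"*)
            # Already present: return the string unchanged.
            printf '%s\n' "$path"
            ;;
        *)
            # Absent: put the directory first (no trailing colon if path is empty).
            printf '%s\n' "$dir${path:+:$path}"
            ;;
    esac
}

ANT_HOME=/opt/ant
PATH=$(prepend_path "$ANT_HOME/bin" "$PATH")
export PATH ANT_HOME
```

Sourcing this more than once leaves a single `/opt/ant/bin` entry on the PATH.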
#tar -xzf jdk-7u51-linux-x64.tar.gz -C /opt/
#ln -s /opt/jdk1.7.0_51/bin/java /sbin/java
#echo "export JAVA_HOME=/opt/jdk1.7.0_51" > /etc/profile.d/java_env.sh
#echo "export JRE_HOME=/opt/jdk1.7.0_51/jre" >> /etc/profile.d/java_env.sh
#echo "export CLASSPATH=.:\ ...
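The `echo` lines above repeat the JDK directory in each export. As a rough sketch only, the same file could be generated from a single argument. The function name `write_java_env` is hypothetical, and the CLASSPATH line is left out because the excerpt truncates it.

```shell
# Hypothetical helper: write JAVA_HOME/JRE_HOME exports for one JDK directory.
write_java_env() {
    jdk_dir=$1
    out_file=$2
    {
        printf 'export JAVA_HOME=%s\n' "$jdk_dir"
        printf 'export JRE_HOME=%s/jre\n' "$jdk_dir"
    } > "$out_file"
}

# Example: write to a scratch file rather than /etc/profile.d (which needs root).
env_file=$(mktemp)
write_java_env /opt/jdk1.7.0_51 "$env_file"
cat "$env_file"
```

Pointing the second argument at /etc/profile.d/java_env.sh would mirror the commands in the post.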
Case Study: Automatic Reduce Parallelism
Motivation
Distributed data processing is dynamic by nature and it is extremely difficult to statically determine optimal concurrency and data movement methods a priori. More information is available during runtime, like data samples and sizes, which may he ...
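To make the idea of using runtime data sizes concrete, here is a simplified sketch of deriving a reducer count from the observed data volume. It is an illustration under assumed inputs, not Tez's actual algorithm; the function name and all three parameters are hypothetical.

```shell
# Hypothetical estimate: one reducer per bytes_per_task of observed data,
# rounded up, never below 1 and never above the configured maximum.
estimate_reducers() {
    total_bytes=$1
    bytes_per_task=$2
    max_tasks=$3

    # Ceiling division using integer arithmetic.
    n=$(( (total_bytes + bytes_per_task - 1) / bytes_per_task ))

    if [ "$n" -lt 1 ]; then
        n=1
    fi
    if [ "$n" -gt "$max_tasks" ]; then
        n=$max_tasks
    fi
    printf '%s\n' "$n"
}

# Example: 1000 bytes observed, 100 bytes per task, at most 50 tasks.
estimate_reducers 1000 100 50
```

A static plan would have to guess the data size up front; with the measured size, the count shrinks automatically when the upstream stage produces little data.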
The previous couple of blog posts covered Tez concepts and APIs. This post gives some details on what is required to write a custom Input / Processor / Output, along with examples of existing I/P/Os provided by the Tez runtime library.
Tez Task
A Tez task is constituted of all the Inputs on its incoming edg ...
What is Apache Tez?
Apache Tez generalizes the MapReduce paradigm to execute a complex DAG (directed acyclic graph) of tasks. It also represents the next logical step for Hadoop 2, following the introduction of YARN and its more general-purpose resource management framework.
While MapReduce has ...
Overview
Apache Tez models data processing as a dataflow graph: vertices represent the processing of data, and edges represent the movement of data between processing steps. The user logic that analyses and modifies the data therefore sits in the vertices. Edges determine the consumer of ...
Apache Tez models data processing as a dataflow graph: vertices represent the processing of data, and edges represent the movement of data between processing steps. The user logic that analyses and modifies the data sits in the vertices. Edges determine the consumer of the data, ...
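The vertex-and-edge model can be illustrated without any Tez code. The sketch below describes a small dataflow graph as a list of edges and uses the POSIX `tsort` utility to print an order in which every producer runs before its consumer. The vertex names are hypothetical and this is not the Tez API.

```shell
# Hypothetical three-vertex dataflow: each line is one edge, "producer consumer".
dag_edges() {
    printf '%s\n' \
        'tokenizer summer' \
        'summer sorter'
}

# tsort prints the vertices so that every producer precedes its consumers.
execution_order() {
    dag_edges | tsort
}

execution_order
```

Because this graph is a simple chain, there is only one valid order: tokenizer, then summer, then sorter.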
Build from source code under Ubuntu 12.04
1. download
#wget http://mirrors.hust.edu.cn/apache/ambari/ambari-1.6.1/ambari-1.6.1.tar.gz
#tar -xzvf ambari-1.6.1.tar.gz
#cd ambari-1.6.1
2. prepare env
see: https://cwiki.apache.org/confluence/display/AMBARI/Ambari+Development
not: https://cwiki. ...
1. download source code
#svn checkout https://svn.apache.org/repos/asf/hama/trunk hama-trunk
2. build
#mvn -Declipse.workspace="/home/zhaohj/workspace/" eclipse:configure-workspace
#mvn clean install -Phadoop2 -Dhadoop.version=2.3.0
#mvn eclipse:eclipse
Note: use java 1.7. IF jav ...
http://hadoopecosystemtable.github.io/
http://blog.andreamostosi.name/big-data/
https://github.com/youngwookim/awesome-hadoop
https://wiki.apache.org/solr/OpenNLP
Build from source code
1. download from http://tez.apache.org/install.html
To get the latest code instead, use this command:
#git clone https://git-wip-us.apache.org/repos/asf/tez.git
#tar xvf apache-tez-0.5.1-src.tar.gz
#cd apache-tez-0.5.1-src
#mvn package -Dhadoop.version= ...