Article list

Running Solr on HDFS

    Blog category:
  • Solr
https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS  
http://www.quora.com/How-does-LinkedIns-recommendation-system-work   I gave this talk earlier this week at Hadoop World (http://www.hadoopworld.com/sessi...), a conference that is evangelizing Hadoop by way of highlighting how people across the industry are solving big business challenges by lever ...
http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying   The Log: What every software engineer should know about real-time data's unifying abstraction Jay Kreps Principal Staff Engineer Posted on 12/16/2013 ...
Install ant
    #wget http://apache.tradebit.com/pub//ant/binaries/apache-ant-1.9.4-bin.zip
    #unzip apache-ant-1.9.4-bin.zip
    #mv apache-ant-1.9.4/ /opt/ant
    #ln -s /opt/ant/bin/ant /usr/bin/ant
    #vi /etc/profile.d/ant.sh
    #!/bin/bash
    ANT_HOME=/opt/ant
    PATH=$ANT_HOME/bin:$PATH
    export PATH ANT_HOME
    ex ...
#tar -xzf jdk-7u51-linux-x64.tar.gz -C /opt/
#ln -s /opt/jdk1.7.0_51/bin/java /sbin/java
#echo "export JAVA_HOME=/opt/jdk1.7.0_51" > /etc/profile.d/java_env.sh
#echo "export JRE_HOME=/opt/jdk1.7.0_51/jre" >> /etc/profile.d/java_env.sh
#echo "export CLASSPATH=.:\ ...
Case Study: Automatic Reduce Parallelism. Motivation: Distributed data processing is dynamic by nature, and it is extremely difficult to statically determine optimal concurrency and data movement methods a priori. More information is available during runtime, like data samples and sizes, which may he ...
The previous couple of blogs covered Tez concepts and APIs. This one gives some details on what is required to write a custom Input / Processor / Output, along with examples of existing I/P/Os provided by the Tez runtime library. Tez Task: A Tez task is constituted of all the Inputs on its incoming edg ...
What is Apache Tez? Apache Tez generalizes the MapReduce paradigm to execute a complex DAG (directed acyclic graph) of tasks. It also represents the next logical step for Hadoop 2 and the introduction of YARN and its more general-purpose resource management framework. While MapReduce has ...
Overview: Apache Tez models data processing as a dataflow graph, with the vertices in the graph representing processing of data and edges representing movement of data between the processing. Thus user logic, which analyses and modifies the data, sits in the vertices. Edges determine the consumer of ...
Apache Tez models data processing as a dataflow graph, with the vertices in the graph representing processing of data and edges representing movement of data between the processing. The user logic, which analyses and modifies the data, sits in the vertices. Edges determine the consumer of the data, ...
Build from source code under Ubuntu 12.04
1. Download
    #wget http://mirrors.hust.edu.cn/apache/ambari/ambari-1.6.1/ambari-1.6.1.tar.gz
    #tar -xvzf ambari-1.6.1.tar.gz
    #cd ambari-1.6.1
2. Prepare env
    see: https://cwiki.apache.org/confluence/display/AMBARI/Ambari+Development
    not: https://cwiki. ...
1. Download source code
    #svn checkout https://svn.apache.org/repos/asf/hama/trunk hama-trunk
2. Build
    #mvn -Declipse.workspace="/home/zhaohj/workspace/" eclipse:configure-workspace
    #mvn clean install -Phadoop2 -Dhadoop.version=2.3.0
    #mvn eclipse:eclipse
Note: use Java 1.7.  IF jav ...
http://hadoopecosystemtable.github.io/ http://blog.andreamostosi.name/big-data/ https://github.com/youngwookim/awesome-hadoop
https://wiki.apache.org/solr/OpenNLP
Build from source code
1. Download from http://tez.apache.org/install.html
   If you want to get the latest code, use this command:
    #git clone https://git-wip-us.apache.org/repos/asf/tez.git
    #tar xvf apache-tez-0.5.1-src.tar.gz
    #cd apache-tez-0.5.1-src
    #mvn package  -Dhadoop.version= ...