Article List
15/12/09 16:47:52 INFO yarn.ExecutorRunnable: Setting up executor with environment: Map(CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark__.jar<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/*<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/lib/*<CPS>$HADOOP_ ...
  Running on a YARN cluster is straightforward: 1. set HADOOP_CONF_DIR (you can use the command export HADOOP_CONF_DIR=xx, or add it to spark-env.sh); 2. spark-submit --master yarn --class org.apache.spark.examples.JavaWordCount --verbose --deploy-mode client ~/spark/spark-1.4.1-bin-hadoop2.4/ ...
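The two steps in that excerpt can be sketched as shell commands (the HADOOP_CONF_DIR path and the input argument here are illustrative; the jar path and class name follow the excerpt):

```shell
# 1. point Spark at the Hadoop/YARN configuration directory
export HADOOP_CONF_DIR=/etc/hadoop/conf   # or put this line in conf/spark-env.sh

# 2. submit the example job to YARN in client deploy mode
spark-submit \
  --master yarn \
  --deploy-mode client \
  --class org.apache.spark.examples.JavaWordCount \
  --verbose \
  ~/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-examples-1.4.1-hadoop2.4.0.jar \
  hdfs:///tmp/input.txt   # illustrative input path
```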
  Yep, you can submit an app to a Spark standalone cluster with the spark-submit command, e.g. spark-submit --master spark://gzsw-02:7077 --class org.apache.spark.examples.JavaWordCount --verbose --deploy-mode client ~/spark/spark-1.4.1-bin-hadoop2.4/lib/spark-examples-1.4.1-hadoop2.4.0.jar spark/spark-1.4.1-bin-hadoo ...
  Per-partition versions of map() and foreach(), ref: Learning Spark
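In Spark these are mapPartitions() and foreachPartition(); the point is that setup cost (e.g. opening a connection) is paid once per partition rather than once per element. A minimal pure-Scala sketch of the iterator-based shape (partitions are simulated here with grouped(); in Spark each iterator would be one RDD partition):

```scala
object PerPartitionDemo {
  // stand-in for expensive per-partition setup, e.g. opening a DB connection
  def setup(): Int => Int = { x => x * 2 }

  // in Spark this shape is: rdd.mapPartitions { iter => val f = setup(); iter.map(f) }
  def doublePerPartition(data: List[Int], partitionSize: Int): List[Int] =
    data.grouped(partitionSize).flatMap { part =>
      val f = setup()        // paid once per "partition", not once per element
      part.iterator.map(f)   // then the whole partition flows through it
    }.toList

  def main(args: Array[String]): Unit =
    println(doublePerPartition((1 to 10).toList, 5))  // List(2, 4, ..., 20)
}
```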
http://www.baidu.com/p/hejuncheng1018?from=wenku chapter 9, http://www.tuicool.com/articles/3mMz6b chapter 11, chapter 12, chapter 14, chapter 20 (this answer is not all correct :( ), chapter 17

hadoop-compression

http://blog.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/ (namely: making Hadoop support splittable LZO compression) Very basic question about Hadoop and compressed input files Hadoop gzip input file using only one mapper Why can't hadoop split up a large text file and then compress t ...
all figures below are from 'Learning Spark',
answers for the book: outline; required 1; required 2 https://zhidao.baidu.com/question/1671038094187205107.html; optional 2.1; optional 2-3; required 3; required 4; required 5 -- teacher's book
  In ZooKeeper, under certain I/O pressure the client will try to reconnect to the quorum. After that, the quorum peer returns a new session timeout (aka negotiatedSessionTimeout) to the client, and the client then recomputes the real connTimeout and readTimeout from the response. The negotiatedSessionTime ...
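The recomputation that excerpt describes can be sketched as follows. The 2/3 read-timeout and per-host connect-timeout split follows the shape of the ZooKeeper client's ClientCnxn; this is a sketch of the arithmetic, not the actual class:

```scala
object ZkTimeouts {
  // After the quorum returns a negotiatedSessionTimeout, the client derives
  // its real timeouts from it (shape as in ZooKeeper's ClientCnxn):
  //   readTimeout    = negotiated * 2 / 3
  //   connectTimeout = negotiated / number-of-quorum-hosts
  def recompute(negotiatedSessionTimeoutMs: Int, quorumHosts: Int): (Int, Int) = {
    val readTimeout    = negotiatedSessionTimeoutMs * 2 / 3
    val connectTimeout = negotiatedSessionTimeoutMs / quorumHosts
    (connectTimeout, readTimeout)
  }

  def main(args: Array[String]): Unit =
    // e.g. a 30s negotiated timeout against a 3-node quorum
    println(recompute(30000, 3))  // (10000, 20000)
}
```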
  After a heavy time cost (primarily downloading a huge number of jars), the first example from the book 'Learning Spark' ran through. The source code is very simple: /** * Illustrates flatMap + countByValue for wordcount. */ package com.oreilly.learningsparkexamples.scala import org.apache. ...
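The core of that example is a flatMap over the input lines followed by countByValue. The same logic on a plain Scala collection looks like this (on an RDD the List would be replaced by sc.textFile(...), and countByValue would be the RDD action of that name):

```scala
object WordCountSketch {
  // flatMap + countByValue for wordcount, as in the Learning Spark example;
  // locally, countByValue is just groupBy + size
  def countWords(lines: List[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))              // split every line into words
      .groupBy(identity)                  // bucket identical words together
      .map { case (w, ws) => (w, ws.size) }

  def main(args: Array[String]): Unit =
    println(countWords(List("hello spark", "hello world")))
}
```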
    As you know, HBase's data logs (aka the WAL) roll after certain intervals to speed up restoring data that is occasionally lost. And of course, both log rolling and flushing the memstore block all writes (but not reads), so decreasing the log rolling frequency can optimize cluster performance.   1. case: during the h ...
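The roll interval is controlled by hbase.regionserver.logroll.period in hbase-site.xml (default 3600000 ms, i.e. one hour). A sketch of raising it to reduce roll frequency; the 2-hour value is illustrative and should be weighed against recovery time:

```xml
<!-- hbase-site.xml: roll the WAL every 2 hours instead of the 1-hour default -->
<property>
  <name>hbase.regionserver.logroll.period</name>
  <value>7200000</value> <!-- milliseconds -->
</property>
```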
Abstract: Spark can be compiled with Maven, sbt, or IntelliJ IDEA. ref: Spark 1.0.0 source compilation and deployment package generation (Spark1.0.0 源码编译和部署包生成)     Also, if you want to load the Spark project into Eclipse, it is necessary to make an 'eclipse project' first by one of the solutions below: 1. mvn eclipse:eclipse [optional] 2. ./sbt/sbt clean compile packa ...
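The Maven route above can be sketched as shell commands, run from the Spark source root (a sketch; exact flags vary by Spark version and desired Hadoop profile):

```shell
# build Spark itself with Maven (sbt works too)
mvn -DskipTests clean package

# generate .project/.classpath so the source tree imports as an 'eclipse project'
mvn eclipse:eclipse
```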
    The flow of distributing a Scala project using the Scala sbt (Simple Build Tool) plugin and the sbt-assembly plugin, through the steps 'create scala project', 'download dependent jars', 'publish scala project'. scala eclipse sbt (Simple Build Tool) application development
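A minimal build.sbt for that flow might look like the following; the project name and versions are illustrative, and sbt-assembly would be wired in via project/plugins.sbt:

```scala
// build.sbt -- minimal sketch for an sbt-assembly-packaged Spark app
name := "my-scala-app"          // illustrative project name
version := "0.1"
scalaVersion := "2.10.4"        // the Scala line used by Spark 1.4.x

// 'download dependent jars': sbt resolves these coordinates for you
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.1" % "provided"

// project/plugins.sbt would add (version illustrative):
// addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.0")
```

With this in place, `sbt assembly` produces a single fat jar suitable for spark-submit.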
  Below you will see different IP selection for the commands 'ping' and 'telnet' in Linux:   server1@myhost18:~$ telnet host1-26 60020 Trying 192.168.1.126... Connected to host1-26. Escape character is '^]'. quit ... org.apache.hadoop.ipc.RPC$VersionMismatch: Server IPC version 3 cannot communica ...
  These days I have been learning the data warehouse framework Hive, mainly from the ebook 'Programming Hive' [1], which covers the topic in about 23 chapters of detail ;)   So below is the outline of this topic:   1. overview 2. architecture 3. features 4. Hive vs Pig, Hive vs HBase 5. use cases   1. overview   ...
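As a taste of the overview item, a minimal HiveQL session: create a delimited table, load data, and aggregate. The table name and input path are illustrative:

```sql
-- create a managed table over tab-separated data
CREATE TABLE IF NOT EXISTS logs (host STRING, bytes BIGINT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- load a local file into it (path is illustrative)
LOAD DATA LOCAL INPATH '/tmp/access.tsv' INTO TABLE logs;

-- the query Hive compiles into MapReduce jobs under the hood
SELECT host, SUM(bytes) AS total
FROM logs
GROUP BY host;
```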