1. flow
   1.1 shuffle abstract
   1.2 shuffle flow
   1.3 sort flow in shuffle
   1.4 data structures in memory
2. core code paths
// SortShuffleWriter
override def write(records: Iterator[Product2[K, V]]): Unit = { // how is this result collected by partition? via the index file
  // 1. sort ...
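The comment above hints at the core idea of a sort-based shuffle: records are ordered by partition id, written contiguously, and an index file records where each partition's byte range begins. Below is a minimal plain-Scala sketch of that bookkeeping, with hypothetical names and strings standing in for serialized records; it is not Spark's actual implementation.

```scala
object ShuffleIndexSketch {
  // Given (partitionId, record) pairs, order them by partition and compute
  // the cumulative offsets an index file would store: offsets(i) is where
  // partition i's data begins; offsets(numPartitions) is the total length.
  def indexOffsets(records: Seq[(Int, String)], numPartitions: Int): Array[Long] = {
    val sorted = records.sortBy(_._1)            // sort-based shuffle: order by partition id
    val lengths = Array.fill(numPartitions)(0L)  // bytes written per partition
    for ((pid, rec) <- sorted) lengths(pid) += rec.length.toLong
    lengths.scanLeft(0L)(_ + _)                  // cumulative offsets: one per partition, plus total
  }

  def main(args: Array[String]): Unit = {
    val offsets = indexOffsets(Seq((1, "bb"), (0, "a"), (1, "c")), numPartitions = 3)
    println(offsets.mkString(","))               // 0,1,4,4
  }
}
```

A reducer fetching partition i would then read the byte range [offsets(i), offsets(i + 1)) from the single data file.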
In this section we will examine how Spark collects data from the previous stage into the next stage (the result task).
Figure: after the ShuffleMapTask computation finishes (i.e. post-processing). Note: the last method, 'reviveOffers()', is redundant in this mode, since step 13 will set up the next stage (result task ...
Now we will dive into Spark internals using the simple example below (word count; later articles will reference this example by default):
sparkConf.setMaster("local[2]") // local[*] by default
// leib-confs: output all the dependency logs
sparkConf.set("spark.logLineage", "tru ...
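For reference, the word-count logic this example computes can be sketched in plain Scala with no Spark APIs; the collection operations below mirror the usual `flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)` RDD pipeline on a local sequence (the input lines are made-up placeholders):

```scala
object WordCountSketch {
  // Mirrors the RDD word-count pipeline on a local collection:
  // split lines into words, then count occurrences per word.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))  // flatMap: one line -> many words
      .filter(_.nonEmpty)
      .groupBy(identity)         // group identical words (reduceByKey analogue)
      .map { case (w, ws) => (w, ws.size) }

  def main(args: Array[String]): Unit = {
    val counts = wordCount(Seq("a b a", "b c"))
    println(counts.toSeq.sorted.mkString(","))  // (a,2),(b,2),(c,1)
  }
}
```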
Similar to other open-source projects, Spark ships several shell scripts, listed below:
sbin (server-side scripts):
- start-all.sh: starts all the Spark daemons (i.e. runs start-master.sh and start-slaves.sh)
- start-master.sh: starts Spark's master process; delegates to "spark-daemon.sh ...
To check whether two objects are equal, you can use == or its negation != (this works for all objects, not just primitive types).
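A small plain-Scala illustration of the point above: == delegates to equals (value equality) for all objects, while eq compares references.

```scala
object EqualitySketch {
  def main(args: Array[String]): Unit = {
    val a = new String("spark")
    val b = new String("spark")
    assert(a == b)        // == delegates to equals: same contents, so true
    assert(a != "hadoop") // != is the negation of ==
    assert(!(a eq b))     // eq compares references: two distinct objects, so false
    println("all equality checks passed")
  }
}
```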
With both the environment variable 'SPARK_PRINT_LAUNCH_COMMAND' and the --verbose flag enabled, spark-submit.sh outputs the Spark launch command in much more detail:
hadoop@GZsw04:~/spark/spark-1.4.1-bin-hadoop2.4$ spark-submit --master yarn --verbose --class org.apache.spark.examples.JavaWordCount lib/spark ...
ref: converting a Scala object to a Class; forced type casting in Scala
[spark-src] 1-overview
What is it?
"Apache Spark™ is a fast and general engine for large-scale data processing. ... Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk," as stated on the Apache Spark site.
Regardless of whether that claim really holds, I think certain key concepts/components to ...
Based on:
- spark-1.4.1
- hadoop-2.5.2
Proceeding from simplicity to complexity and following the working flow, we take these steps:
1.[spark-src] spark overview
2.[spark-src] core
from basic demos to a dive into Spark internals. This section involves many components, so it is much more detail ...
cheers
env:
- hbase 0.94.26
- zookeeper 3.4.3
---------------
1. downed node
This morning we found a regionserver (host-34) down in our monitoring, so we dived into the HBase logs and found the following on this host:
2016-02-29 00:50:36,799 INFO [regionserver60020-SendThread(host-04:2181)] ClientCnxn.java:108 ...
spark stream lineage
ref:
Spark Streaming:大规模流式数据处理的新贵 (Spark Streaming: the rising star of large-scale stream processing)
cheers
While running some simple SQL-related test programs, this exception occurred, which seems weird.
(Spark 1.3.1 was used due to project needs.)
scala.reflect.internal.MissingRequirementError: class org.apache.spark.sql.catalyst.ScalaReflection in JavaMirror
at scala.reflect.internal. ...
I want to export a table to JSON-format files, but after googling I found no ready-made solution. I know Pig is used to do SQL-like MapReduce stuff, and Hive is a data warehouse that can be built on HBase, but I can't find a solution/workaround to do that either (maybe I missed something).
So I considered using MapReduce to figure ...
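The core of such a MapReduce job would be turning each table row into one JSON line. Below is a minimal plain-Scala sketch of that conversion, with a hypothetical row representation (row key plus a columnFamily:qualifier -> value map); a real job would read Result objects from HBase's TableInputFormat, and this sketch skips JSON string escaping.

```scala
object RowToJsonSketch {
  // Convert one row, modeled as (rowKey, "cf:qualifier" -> value),
  // into a single JSON line, as a mapper over an HBase table might emit.
  // Note: no escaping of quotes/backslashes in values; a sketch only.
  def toJson(rowKey: String, cells: Map[String, String]): String = {
    val fields = cells.toSeq.sortBy(_._1)                 // stable column order
      .map { case (col, v) => s""""$col":"$v"""" }
      .mkString(",")
    s"""{"row":"$rowKey",$fields}"""
  }

  def main(args: Array[String]): Unit = {
    println(toJson("r1", Map("cf:a" -> "1", "cf:b" -> "x")))
    // {"row":"r1","cf:a":"1","cf:b":"x"}
  }
}
```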