Article List
Refers to version 0.20.6. Yes: since HBase in this mode uses the local FS for access, there is no need to start Hadoop (in any mode). Simply do this: start-hbase.sh. Then check whether it succeeded: a) jps — HMaster (the HBase master server), HQuorumPeer (the ZooKeeper instan ...
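Besides jps, the standalone master can be probed from code; a minimal sketch against the 0.20-era client API (the class name CheckStandalone is mine, not the post's):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class CheckStandalone {
        public static void main(String[] args) throws Exception {
            HBaseConfiguration conf = new HBaseConfiguration(); // reads hbase-site.xml from the classpath
            HBaseAdmin admin = new HBaseAdmin(conf);            // throws MasterNotRunningException if the master is down
            System.out.println("master running: " + admin.isMasterRunning());
        }
    }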
some hdfs features of hadoop-0.20.2. High fault tolerance: a. When a block is replicated, not every replica has to succeed for the write to count as successful; a threshold controls this, and reaching it is enough. Afterwards the NN checks for blocks that do not meet the required replication num and handles them. (I believe this is only one of the mechanisms.) b. Skipping mode gives more flexible control. High reliability: a. A redundancy mechanism is used, with replicas distributed across different racks. b. In many cases an exception does not cause an immediate exit; instead a retry mechanism is used (because exceptions usually come from RPC communication), likewise with a bounded retry count. Self-correction: a. 使用推 ...
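The write-success threshold described in the first point maps onto real 0.20 config keys; a minimal sketch, with illustrative values:

    import org.apache.hadoop.conf.Configuration;

    public class ReplicationThreshold {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.setInt("dfs.replication", 3);     // desired replicas per block
            conf.setInt("dfs.replication.min", 1); // a write counts as successful once this many replicas exist
            System.out.println(conf.getInt("dfs.replication.min", 1));
        }
    }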
When Hadoop runs as a (pseudo-)cluster it is difficult to debug the code. I have tried some tricks to do it: a. Use the JPDA debug platform. Add these options to HADOOP_OPTS after starting and before running a job: HADOOP_OPTS="$HADOOP_OPTS -agentlib:jdwp=transport=dt_socket,address=8001,server= ...
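The same idea can be applied to the child task JVMs rather than the daemons, via the 0.20 property mapred.child.java.opts; a sketch, where the completed JDWP string is a typical value I am assuming, not taken from the truncated post:

    import org.apache.hadoop.mapred.JobConf;

    public class DebuggableJob {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // make each child task JVM wait for a debugger on port 8001 (assumed settings)
            conf.set("mapred.child.java.opts",
                "-agentlib:jdwp=transport=dt_socket,address=8001,server=y,suspend=y");
        }
    }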
Here is my summary from reading the sources. Considering my ability and the complexity of Hadoop, and since I have not read all the sources, there will be some illogical statements in it; so if anything feels a little off, tell me your ideas :)   1. Concepts. A Map (Mapper class) is a single map task; one InputS ...
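For the "one Mapper per map task" concept, a minimal sketch of the 0.20-era (org.apache.hadoop.mapred) Mapper; WordMapper and the word-splitting logic are my illustration, not the post's:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class WordMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> out, Reporter reporter)
                throws IOException {
            for (String w : value.toString().split("\\s+")) {
                out.collect(new Text(w), ONE); // one output record per word
            }
        }
    }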
It is used to schedule jobs and tasks as well. There are several schedulers: JobQueueTaskScheduler (the default, a FIFO algorithm); FairScheduler; CapacityScheduler; ...   Here is the simple flow of JobQueueTaskScheduler in MapReduce:
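Which scheduler the JobTracker loads is controlled by a single config key; a minimal sketch of swapping in the FairScheduler (assuming its contrib jar is on the JobTracker classpath):

    import org.apache.hadoop.conf.Configuration;

    public class SchedulerConf {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // default is org.apache.hadoop.mapred.JobQueueTaskScheduler (FIFO)
            conf.set("mapred.jobtracker.taskScheduler",
                     "org.apache.hadoop.mapred.FairScheduler");
        }
    }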
"split" which is a logical concept relatives to a "block" whis is real store unit. when a client submit a job to JT,it will compute the splits by file,than the TT will generate InputSplit to map task. so splits are used for spawn mappers ,if you use FileINputformat and se ...
When reading a file from HDFS, the client gets a DistributedFileSystem to communicate with the NameNode, and the DFS creates a DFSClient, which creates a DFSInputStream that is encapsulated in an FSDataInputStream. Of course, the input stream gets a LocatedBlocks object which contains 10 blocks and their addresses by default a ...
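The client-side view of that chain, as a minimal sketch (the path and buffer size are mine):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReadPath {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf); // DistributedFileSystem when fs.default.name points at HDFS
            FSDataInputStream in = fs.open(new Path("/user/demo.txt")); // wraps a DFSInputStream
            byte[] buf = new byte[4096];
            int n = in.read(buf);                 // block locations were fetched from the NameNode
            in.close();
            System.out.println("read " + n + " bytes");
        }
    }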
Data-structure transformation of a file going from the client to HDFS
I chose to learn the sources following the workflow from FS to MR, so the steps are these: a. FS (IO) b. MR c. IPC   Outline of the API:   Simple flow of Hadoop:
As I reviewed the book, it said: a. Its goal is to let each user fairly share the cluster resources. b. Jobs are placed in pools, and each user has their own pool. c. You can set the priority of a pool, so ... d. This scheduler supports preemption.   First, I thought: if I run a client to submit ...
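Per-job pool assignment in the 0.20 FairScheduler is a single property; a minimal sketch (the pool name "research" is hypothetical):

    import org.apache.hadoop.mapred.JobConf;

    public class PoolConf {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // by default the FairScheduler pools jobs per user; this overrides the pool name
            conf.set("mapred.fairscheduler.pool", "research"); // hypothetical pool name
        }
    }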
It is tricky to understand the shuffle principle in Hadoop; let us learn it together. If you find something wrong in these descriptions, please tell me why :) Here is a 4-maps + 3-reducers example.   1. Why are there three flows in the illustration (as marked by the numbered parts)? When I checked the property " ...
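Two of the 0.20 properties that commonly come up when tracing those shuffle flows, sketched here with their default values (the exact property the truncated post checks is unknown, so these are assumptions for illustration):

    import org.apache.hadoop.mapred.JobConf;

    public class ShuffleConf {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            conf.setInt("io.sort.mb", 100);                  // map-side in-memory sort buffer (MB)
            conf.setInt("mapred.reduce.parallel.copies", 5); // reduce-side threads fetching map output
        }
    }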
1. Simple workflow of running a job. 2. MR handled by a streaming or pipes process, i.e. programs outside of Java. 3. Reporting job or task status from the TT to the client through the JT.   ============ TaskTracker flow:
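Point 3's status path is exactly what JobClient polls; a minimal sketch of submitting and watching a job with the 0.20 API (input/output setup is omitted for brevity, and the job name is mine):

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RunningJob;

    public class SubmitAndWatch {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf();
            conf.setJobName("demo");
            RunningJob job = JobClient.runJob(conf); // blocks; progress flows TT -> JT -> client
            System.out.println("map progress: " + job.mapProgress());
        }
    }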
references: http://home.lupaworld.com/home-space-uid-94908-do-blog-id-32301.html http://sucre.iteye.com/blog/587673   Linux has many different shells; we usually use bash (the Bourne-Again Shell) for shell programming, because bash is not only free (libre) but also easy to use. #!/bin/sh — the #! symbol tells the system which program executes the script (it must be on the first line of the file). All variables consist of strings and do not need to be declared. chmod +x filename # make it executable, to avoid confusion. echo ...