Article list
refer to the 0.20.6 version
Yes: since HBase will use the local fs for storage here, there is no need to start Hadoop first (in whatever mode it would run).
Simply do it like this:
start-hbase.sh
Check whether it succeeded:
a) jps
HMaster //the hbase master server
HQuorumPeer //a zookeeper instan ...
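Besides jps, you can also probe from a client. A minimal sketch against the 0.20-era HBase API (newer releases renamed these classes), assuming hbase-site.xml is on the classpath:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class HBaseUpCheck {
    public static void main(String[] args) throws Exception {
        HBaseConfiguration conf = new HBaseConfiguration(); // reads hbase-site.xml
        // the constructor throws MasterNotRunningException if HMaster is unreachable
        HBaseAdmin admin = new HBaseAdmin(conf);
        System.out.println("master running: " + admin.isMasterRunning());
    }
}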
some hdfs features of hadoop-0.20.2
High fault tolerance
a. When a block is replicated, not every replica has to succeed for the write to count as successful; there is a threshold, and reaching it is enough. The NN then checks for blocks that do not meet the required replication num and handles them (see the sketch after this list; surely this is only one of the mechanisms).
b. Skipping mode gives more flexible control.
High reliability
a. A redundancy mechanism is used, with replicas distributed across different racks.
b. In many cases an exception does not cause an immediate exit; instead a retry mechanism is used (since exceptions usually come from rpc communication), again with a bounded retry count.
Self-correction
a. uses ...
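Returning to point (a) under fault tolerance, a client-side illustration of the threshold idea (a minimal sketch; the path and factor are made up, and dfs.replication.min is the 0.20-era property holding the threshold):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // ask for 3 replicas; a write already counts as successful once
        // dfs.replication.min (default 1) replicas are on disk, and the NN
        // schedules the remaining copies in the background
        fs.setReplication(new Path("/user/test/data.txt"), (short) 3);
    }
}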
When run in a cluster (or pseudo mode) it is difficult to debug the code. I have tried some tricks to do it:
a. use the JPDA debug platform.
Add these options to the Hadoop opts after start-up and before running a job:
HADOOP_OPTS="$HADOOP_OPTS -agentlib:jdwp=transport=dt_socket,address=8001,server= ...
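For reference, a complete option string of this shape (standard JVM JDWP syntax, not Hadoop-specific) is -agentlib:jdwp=transport=dt_socket,address=8001,server=y,suspend=y: server=y makes the daemon's JVM listen on port 8001, and suspend=y holds it until a debugger attaches (an IDE remote-debug session, or jdb -attach localhost:8001).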
Here is my summary written while reading the sources. Considering my ability and the complexity of Hadoop, and since I have not read all the sources, there will be some illogical statements in it; so if you feel a little uncomfortable with them, tell me your ideas :)
1. Concepts
A Map (Mapper class) is a single map task; an InputS ...
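To pin the terminology down, a minimal sketch with the old (org.apache.hadoop.mapred) API; the class and output key are made up:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// one instance of this class runs inside each map task; a map task
// consumes exactly one InputSplit, record by record
public class LineCounterMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, IntWritable> out, Reporter reporter)
            throws IOException {
        out.collect(new Text("lines"), ONE); // emit one count per input record
    }
}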
sources study-part 5-hdfs - advanced features - blocks allocation policy
- Blog category:
- hadoop sources reading
TODO
sources study-part 4-mapreduce - advanced features - spill,merge and sort
- Blog category:
- hadoop sources reading
TODO
It is also used to schedule jobs and tasks. There are several implementations (how to pick one is sketched right after this list):
JobQueueTaskScheduler (the default; a FIFO algorithm);
FairScheduler;
CapacityScheduler;
...
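The 0.20-era JobTracker picks its scheduler from the mapred.jobtracker.taskScheduler property in mapred-site.xml; for example, to select the FairScheduler (whose jar must also be on the JobTracker classpath):

<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>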
Here is the simple flow of JobQueueTaskScheduler in MapReduce:
"split" which is a logical
concept relatives to a "block" whis is real store unit.
when a client submit a job to JT,it will compute the splits by file,than the TT will generate InputSplit to map task.
so splits are used for spawn mappers
,if you use FileINputformat and se ...
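A small old-API sketch of where splits come from (the input path is made up; with FileInputFormat a split normally corresponds to one HDFS block):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

public class SplitDump {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SplitDump.class);
        conf.setInputFormat(TextInputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path("/user/test/input"));
        // this is what the job client does at submit time
        InputSplit[] splits = conf.getInputFormat().getSplits(conf, 1);
        for (InputSplit s : splits) {
            System.out.println(s); // each split spawns one map task
        }
    }
}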
When reading a file from HDFS, the client gets a DistributedFileSystem to communicate with the NameNode, and the DFS creates a DFSClient which creates a DFSInputStream that is encapsulated in an FSDataInputStream.
Of course, the input stream gets a LocatedBlocks which by default contains 10 blocks and their addresses a ...
data structure transform: file to hdfs, from the client
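The user-facing side of that chain, as a minimal sketch (the URI and path are made up):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ReadFromHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // FileSystem.get returns a DistributedFileSystem for hdfs:// URIs
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), conf);
        // open() goes DistributedFileSystem -> DFSClient -> DFSInputStream,
        // handed back to us wrapped in an FSDataInputStream
        FSDataInputStream in = fs.open(new Path("/user/test/file.txt"));
        try {
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}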
I chose to study the sources following the workflow from the FS to MR, so the steps are:
a.FS(IO)
b.MR
c.IPC
outline of API:
simple flow of hadoop:
As I reviewed the book, it said:
a. its goal is to let each user fairly share the cluster resources
b. jobs are placed in pools, and each user has their own pool
c. you can set the priority of a pool
d. this scheduler supports preemption
First, I thought, if I run a client to submit ...
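A sketch of how a job lands in a pool (the property names are the 0.20 contrib FairScheduler ones; the pool name is made up). If the JobTracker sets mapred.fairscheduler.poolnameproperty to pool.name (the default is user.name, which is what gives each user their own pool), a job can opt into a pool explicitly:

import org.apache.hadoop.mapred.JobConf;

public class SubmitToPool {
    public static void main(String[] args) {
        JobConf conf = new JobConf(SubmitToPool.class);
        // read by the FairScheduler via mapred.fairscheduler.poolnameproperty
        conf.set("pool.name", "research");
        // ... set input/output/mapper as usual, then JobClient.runJob(conf)
    }
}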
It is a trick to understand the shuffle principle in Hadoop; let us learn it together.
If you find something wrong in this description, please tell me why :)
Here is a 4 maps + 3 reducers example:
1.
Why are there three flows in the illustration (as marked by the numbered parts)?
When I check the property ...
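The three-way fan-out follows from partitioning: each map divides its output into numReduceTasks buckets, one per reducer. A minimal old-API sketch that mirrors the default HashPartitioner (the class name is made up):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

// every map task calls this for each output record and writes the record
// into one of numPartitions (= number of reducers, 3 here) buckets;
// that per-reducer bucketing is exactly the fan-out seen in the figure
public class ModPartitioner implements Partitioner<Text, IntWritable> {
    public void configure(JobConf job) { }
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

Enable it per job with conf.setPartitionerClass(ModPartitioner.class).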
1. simple workflow of running a job
2. MR handled by a streaming or pipes process, i.e. by programs outside of Java
3. job or task status reported from the TT to the client through the JT
============
TaskTracker flow:
references:
http://home.lupaworld.com/home-space-uid-94908-do-blog-id-32301.html
http://sucre.iteye.com/blog/587673
There are many different shells in Linux. We usually use bash (the Bourne Again Shell) for shell programming, because bash is not only free but also easy to use.
#!/bin/sh — the #! tells the system which program executes the script (it must be on the first line of the file).
All variables consist of strings, and they need no declaration.
chmod +x filename # make the script executable
to avoid confusion:
echo ...