
Hadoop knowledge notes

Hadoop technical forum: http://bbs.hadoopor.com/index.php
Hadoop 0.20.0 + Eclipse environment setup: http://bbs.hadoopor.com/thread-43-1-1.html
Another very good Hadoop 0.20.0 + Eclipse setup guide, written by someone in Taiwan: http://trac.nchc.org.tw/cloud/wiki/waue/2009/0617 It explains how to package the code into a jar file.
Note the Makefile in that guide: command lines such as "jar -cvf ${JarFile} -C bin/ ." and "hadoop jar ${JarFile} ${MainFunc} input output" must start with a tab, not spaces. I commented out the lines under the help target by prefixing them with "#", because I do not know how to use them yet.


1. How a client uploads data to a Hadoop cluster: http://bbs.hadoopor.com/thread-362-1-1.html
Use the DFSClient tool: the client machine does not need a full Hadoop deployment to upload data; having the DFSClient tool available is enough.
bin/hadoop fs -put accesses HDFS in exactly this "remote" DFSClient fashion (it works locally as well).
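As a rough Java sketch of the same idea (my own example, not from the thread), a client can use the FileSystem API to push a file into HDFS; the namenode address hdfs://namenode:9000 and both paths below are placeholders:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPut {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Only the namenode address is needed; the client box does not need a full Hadoop install.
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);
    // Roughly what "bin/hadoop fs -put /local/data.txt /user/hadoop/data.txt" does.
    fs.copyFromLocalFile(new Path("/local/data.txt"), new Path("/user/hadoop/data.txt"));
    fs.close();
  }
}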

2. Hadoop support for MySQL: http://bbs.hadoopor.com/thread-132-1-2.html
lance (274105045) 09:48:43
It seems 0.20 provides input and output formats for databases.
hadoopor (784027584) 09:48:50
But to run jobs in parallel you cannot use the default scheduler; Facebook's FairScheduler supports scheduling jobs in parallel. ?????
Spork (47986766) 09:49:16
Not just "seems" -- it is really there; for now the databases with good support are the open-source ones, such as MySQL.
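A minimal sketch of what that 0.20 DB support looks like with the old mapred API (org.apache.hadoop.mapred.lib.db); the connection string, table name and field names are made up for illustration, and the MySQL JDBC driver jar must be on the classpath:

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.db.DBConfiguration;
import org.apache.hadoop.mapred.lib.db.DBOutputFormat;

public class MysqlOutputSetup {
  // Configure an existing job to write its reduce output into a MySQL table.
  public static void configure(JobConf conf) {
    DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
        "jdbc:mysql://dbhost:3306/stats", "user", "password");
    conf.setOutputFormat(DBOutputFormat.class);
    // The job's output key class must implement DBWritable.
    DBOutputFormat.setOutput(conf, "word_counts", "word", "count");
  }
}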

3. An introduction to SequenceFile: http://bbs.hadoopor.com/thread-144-1-1.html
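As a quick illustration (mine, not from the thread): a SequenceFile stores binary key/value pairs; the sketch below writes a few Text/IntWritable records to an arbitrary example path and reads them back:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SeqFileDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/tmp/demo.seq");  // example path
    SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, path, Text.class, IntWritable.class);
    for (int i = 0; i < 3; i++) {
      writer.append(new Text("key-" + i), new IntWritable(i));
    }
    writer.close();
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    Text key = new Text();
    IntWritable value = new IntWritable();
    while (reader.next(key, value)) {       // records come back in the order written
      System.out.println(key + " = " + value);
    }
    reader.close();
  }
}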

4. JobTracker.JobInProgress: http://bbs.hadoopor.com/thread-212-1-1.html Used to monitor the scheduling of a single Job. A Job is broken into N Tasks, which are assigned to TaskTracker nodes in the cluster; the TaskTracker nodes then execute those Tasks.

========== Found by searching Nabble Hadoop ==========
1. Hadoop 0.17 schedules jobs FIFO; if it doesn't, that is a bug.
http://old.nabble.com/Hadoop-job-scheduling-issue-td19659938.html#a19659938

2. Can jobs be configured to be sequential? That is, jobs in Group1 execute first and jobs in Group2 execute later; the Group2 jobs depend on the Group1 jobs, while the jobs within Group1 or Group2 are independent of each other.
http://old.nabble.com/Can-jobs-be-configured-to-be-sequential-td20043257.html#a20043257
I recommend that you look at http://cascading.org as
an abstraction layer for managing these kinds of workflows. We've
found it quite useful.
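A sketch of the Group1/Group2 dependency using Hadoop's own JobControl (org.apache.hadoop.mapred.jobcontrol) rather than Cascading; the JobConf arguments jobA1, jobA2 and jobB1 stand in for real job configurations:

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;

public class GroupedJobs {
  public static void run(JobConf jobA1, JobConf jobA2, JobConf jobB1) throws Exception {
    Job a1 = new Job(jobA1);             // Group1 jobs are independent of each other
    Job a2 = new Job(jobA2);
    Job b1 = new Job(jobB1);             // Group2 job must wait for all of Group1
    b1.addDependingJob(a1);
    b1.addDependingJob(a2);

    JobControl control = new JobControl("group1-then-group2");
    control.addJob(a1);
    control.addJob(a2);
    control.addJob(b1);

    Thread runner = new Thread(control); // JobControl is a Runnable that submits jobs as they become ready
    runner.start();
    while (!control.allFinished()) {
      Thread.sleep(5000);
    }
    control.stop();
  }
}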

3. Sequence of streaming jobs: if you are using sh or bash, the variable $? holds the exit status of the last command executed.

hadoop jar streaming.jar ...
if [ $? -ne 0 ]; then
    # report the failure on stderr and stop
    echo "My job failed" 1>&2
    exit 1
fi

Caution: $? holds the exit status of the very last command executed. It is easy to run another command before the test and end up testing the wrong command's exit status.
http://old.nabble.com/Sequence-of-Streaming-Jobs-td23336043.html#a23351848

4.mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum
mapred.map.multithreadedrunner.threads
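For reference (my reading, not from a thread): the first two properties are per-node slot limits that a TaskTracker reads from mapred-site.xml, while the third is a per-job knob for the multithreaded map runner. A hedged sketch of the per-job part with the old API, using an example thread count:

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.MultithreadedMapRunner;

public class MultithreadedMapSetup {
  public static void configure(JobConf conf) {
    // Run each map task with a pool of threads calling the Mapper.
    conf.setMapRunnerClass(MultithreadedMapRunner.class);
    conf.setInt("mapred.map.multithreadedrunner.threads", 10);  // example value
    // mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum
    // are cluster-side settings and cannot be changed per job here.
  }
}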

5.http://old.nabble.com/Linking-2-MarReduce-jobs-together--td18756178.html#a18756178
Is it possible to use the output of job 1's reduce phase as the input to job 2?
Well, your data has to live somewhere between the two jobs... so I'd say yes: put it in HBase or HDFS and reuse it from there.
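A sketch of the HDFS route with the old API (my example): job 1 writes to an intermediate directory that job 2 then reads; the paths and the two JobConf objects are placeholders:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class TwoStageRunner {
  public static void run(JobConf job1, JobConf job2) throws Exception {
    Path intermediate = new Path("/tmp/stage1-output");   // example intermediate directory
    FileOutputFormat.setOutputPath(job1, intermediate);
    JobClient.runJob(job1);                               // blocks; throws an exception if job 1 fails

    FileInputFormat.setInputPaths(job2, intermediate);    // job 1's reduce output feeds job 2
    FileOutputFormat.setOutputPath(job2, new Path("/tmp/stage2-output"));
    JobClient.runJob(job2);
  }
}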

6. The book <Pro Hadoop>, chapter 8, covers this topic.

7.http://old.nabble.com/Customizing-machines-to-use-for-different-jobs-td23864519.html#a23864519
Customizing machines to use for different jobs:
Unfortunately there is no built-in way of doing this.  You'd have to
instantiate two entirely separate Hadoop clusters to accomplish what you're
trying to do, which isn't an uncommon thing to do.

I'm not sure why you're hoping to have this behavior, but the fair share
scheduler might be helpful to you.  It lets you essentially divvy up your
cluster into queues, where each queue has its own "chunk" of the cluster.
When resources are available outside of the "chunk," then jobs can span into
other queues' space.

Cloudera's Distribution for Hadoop (<http://www.cloudera.com/hadoop>)
includes the fair share scheduler.  I recommend using our distribution,
otherwise here is the fair share JIRA:

<http://issues.apache.org/jira/browse/HADOOP-3746>
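With the fair scheduler installed, jobs are usually steered into pools through a job property. The sketch below assumes the cluster sets mapred.fairscheduler.poolnameproperty to pool.name, which is a common convention but by no means universal, and the pool name "research" is made up:

import org.apache.hadoop.mapred.JobConf;

public class PoolAssignment {
  public static void usePool(JobConf conf) {
    // Assumes mapred-site.xml maps mapred.fairscheduler.poolnameproperty to "pool.name".
    conf.set("pool.name", "research");  // hypothetical pool for illustration
  }
}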

8.http://old.nabble.com/How-to-run-many-jobs-at-the-same-time--td23151917.html#a23151917
How to run many jobs at the same time? Answer: a JobControl example.
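Besides JobControl (sketched under item 2 above), another pattern I find plausible here is to submit jobs asynchronously with JobClient.submitJob and then poll them; whether they actually run at the same time still depends on the scheduler and free slots. A rough sketch:

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class ParallelSubmit {
  public static void runAll(List<JobConf> confs) throws Exception {
    List<RunningJob> running = new ArrayList<RunningJob>();
    for (JobConf conf : confs) {
      // submitJob returns immediately, so every job is queued before any finishes.
      running.add(new JobClient(conf).submitJob(conf));
    }
    for (RunningJob job : running) {
      job.waitForCompletion();           // block until this job is done
      if (!job.isSuccessful()) {
        throw new RuntimeException("Job failed: " + job.getJobName());
      }
    }
  }
}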

9.http://issues.apache.org/jira/browse/HADOOP-5170
Set max map/reduce tasks on a per-job basis, either per-node or cluster-wide

Once the tasktracker starts, the maximum number of tasks per node cannot be changed. In my case, I've solved this by stopping and starting mapred (stop-mapred.sh, start-mapred.sh) between jobs.
There is a JIRA, so this may change in the future: HADOOP-5170 (http://issues.apache.org/jira/browse/HADOOP-5170)
It may already have been fixed.

10.Oozie, Hadoop Workflow System
https://issues.apache.org/jira/browse/HADOOP-5303

11.http://kdpeterson.net/blog/2009/11/hadoop-workflow-tools-survey.html
Hadoop Workflow Tools Survey
Very clear on job scheduling.
A video: http://developer.yahoo.net/blogs/theater/archives/2009/08/hadoop_summit_workflow_oozie.html

12.http://wiki.dspace.org/index.php/Creating_and_Applying_Patches_in_Eclipse
Creating and Applying Patches in Eclipse
http://www.ibm.com/developerworks/cn/opensource/os-eclipse-galileopatch/

13.JobControl:http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapred/jobcontrol/JobControl.html

14. http://kdpeterson.net/blog/2009/11/hadoop-workflow-tools-survey.html
By default, Hadoop uses a FIFO scheduler, but there are two more advanced schedulers which are widely used. The Capacity Scheduler focuses on guaranteeing that each group of users gets access to its guaranteed number of slots while making unused capacity available to others; the Fair Scheduler divides the cluster into pools and aims to give every job a fair share of slots over time.