Hadoop remote debugging

sunwinner

浏览: 204295 次
性别:
来自: 上海

最近访客更多访客>>

luojianbing

yanghuangsanguo

jahentao

baichoufei90sina

博主相关

博客

微博

相册

留言

关于我

文章分类

社区版块

存档分类

博客分类：

Hadoop

When a task fails and there is not enough information logged to diagnose the error, you may want to resort to running a debugger for that task. This is hard to arrange when running the job on a cluster, as you don’t know which node is going to process which part of the input, so you can’t set up your debugger ahead of the failure. However, there are a few other options available:

Reproduce the failure locally

Often the failing task fails consistently on a particular input. You can try to repro- duce the problem locally by downloading the file that the task is failing on and running the job locally, possibly using a debugger such as Java’s VisualVM.

Use JVM debugging options

A common cause of failure is a Java out of memory error in the task JVM. You can setmapred.child.java.optstoinclude-XX:-HeapDumpOnOutOfMemoryError -XX:Heap DumpPath=/path/to/dumps. This setting produces a heap dump that can be examined afterward with tools such as jhat or the Eclipse Memory Analyzer. Note that the JVM options should be added to the existing memory settings specified by mapred.child.java.opts.

Use task profiling

Java profilers give a lot of insight into the JVM, and Hadoop provides a mechanism to profile a subset of the tasks in a job.

Use IsolationRunner

Older versions of Hadoop provided a special task runner called IsolationRunner that could rerun failed tasks in situ on the cluster. Unfortunately, it is no longer available in recent versions, but you can track its replacement at https://issues .apache.org/jira/browse/MAPREDUCE-2637.

In some cases it’s useful to keep the intermediate files for a failed task attempt for later inspection, particularly if supplementary dump or profile files are created in the task’s working directory. You can set keep.failed.task.files to true to keep a failed task’s files.

You can keep the intermediate files for successful tasks, too, which may be handy if you want to examine a task that isn’t failing. In this case, set the property keep.task.files.pattern to a regular expression that matches the IDs of the tasks you want to keep.

To examine the intermediate files, log into the node that the task failed on and look for the directory for that task attempt. It will be under one of the local MapReduce direc- tories, as set by the mapred.local.dir property. If this property is a comma-separated list of directories (to spread load across the physical disks on a machine), you may need to look in all of the directories before you find the directory for that particular task attempt. The task attempt directory is in the following location:

mapred.local.dir/taskTracker/jobcache/job-ID/task-attempt-ID

分享到：

JDK 1.6.0_29 Mac OS X profiling issue | 记录一下Hadoop在Mac OS Mountain Lion上安 ...

2013-06-26 09:07
浏览 799
评论(0)
分类:企业架构
查看更多

发表评论

您还没有登录,请您登录后再发表评论

最近访客更多访客>>

博主相关

文章分类

社区版块

存档分类

最新评论