About the MapReduce "Too Many fetch failures.Failing the attempt" error

hf200012

Blog categories: hadoop, mapreduce

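The first phase after a reduce task starts is the shuffle, in which it fetches data from the map side. Any single fetch can fail because of a connect timeout, a read timeout, a checksum error, and so on. The reduce task therefore keeps a counter for each map, recording how many times fetching that map's output has failed. When the count reaches a certain threshold, the reduce task notifies the MRAppMaster that fetching from that map has failed too many times, and the corresponding log message is printed.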
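The per-map bookkeeping described above can be sketched roughly as follows. This is an illustration only, not the Hadoop shuffle code: the class `FetchFailureTracker`, its methods, the threshold value of 3, and the map attempt id string are all hypothetical names and values chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of reduce-side, per-map fetch-failure counting. */
final class FetchFailureTracker {
    // Assumed value for illustration; the article does not state the reduce-side threshold.
    private final int reportThreshold;
    // One counter per map attempt whose output this reducer tries to fetch.
    private final Map<String, Integer> failuresPerMap = new HashMap<>();

    FetchFailureTracker(int reportThreshold) {
        this.reportThreshold = reportThreshold;
    }

    /**
     * Records one failed fetch (connect timeout, read timeout, checksum error, ...)
     * and returns true once the count for that map has reached the threshold,
     * i.e. when the failure should be reported to the application master.
     */
    boolean recordFailure(String mapAttemptId) {
        int count = failuresPerMap.merge(mapAttemptId, 1, Integer::sum);
        return count >= reportThreshold;
    }

    /** Number of failed fetches recorded so far for the given map attempt. */
    int failures(String mapAttemptId) {
        return failuresPerMap.getOrDefault(mapAttemptId, 0);
    }

    public static void main(String[] args) {
        FetchFailureTracker tracker = new FetchFailureTracker(3);
        String mapId = "attempt_m_000001_0"; // made-up id for the demo
        for (int i = 1; i <= 3; i++) {
            boolean report = tracker.recordFailure(mapId);
            System.out.println("failure " + i + ", report to AM: " + report);
        }
    }
}
```

Keeping the counter keyed by map attempt means that one unreachable map output does not affect the failure counts of the others.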

The threshold is evaluated as follows:

org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.java

      float failureRate = runningReduceTasks == 0 ? 1.0f :
          (float) fetchFailures / runningReduceTasks;
      // declare faulty if fetch-failures >= max-allowed-failures
      boolean isMapFaulty =
          (failureRate >= MAX_ALLOWED_FETCH_FAILURES_FRACTION);
      if (fetchFailures >= MAX_FETCH_FAILURES_NOTIFICATIONS && isMapFaulty) {
        LOG.info("Too many fetch-failures for output of task attempt: " +
            mapId + " ... raising fetch failure to map");
        job.eventHandler.handle(new TaskAttemptEvent(mapId,
            TaskAttemptEventType.TA_TOO_MANY_FETCH_FAILURE));
        job.fetchFailuresMapping.remove(mapId);
      }

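The default notification threshold is 3, and the ratio of fetch failures to running reduce tasks must also reach 0.5: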

  // The maximum fraction of fetch failures allowed for a map
  private static final double MAX_ALLOWED_FETCH_FAILURES_FRACTION = 0.5;
  // Maximum no. of fetch-failure notifications after which map task is failed
  private static final int MAX_FETCH_FAILURES_NOTIFICATIONS = 3;
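To see how the two constants interact, the condition can be pulled out into a standalone method and tried with a few inputs. This is a minimal sketch: the class name `FetchFailureCheck` and the method `shouldFailMap` are hypothetical, and only the constants and the comparison logic are taken from the JobImpl snippet quoted above.

```java
/** Hypothetical standalone version of the check quoted from JobImpl. */
final class FetchFailureCheck {
    // Values copied from the constants quoted in the article.
    static final double MAX_ALLOWED_FETCH_FAILURES_FRACTION = 0.5;
    static final int MAX_FETCH_FAILURES_NOTIFICATIONS = 3;

    /**
     * Returns true when a map attempt would be failed: enough fetch-failure
     * notifications have arrived AND they are large relative to the number
     * of running reduce tasks.
     */
    static boolean shouldFailMap(int fetchFailures, int runningReduceTasks) {
        // With no running reducers the rate is treated as 1.0, as in the quoted code.
        float failureRate = runningReduceTasks == 0 ? 1.0f
            : (float) fetchFailures / runningReduceTasks;
        boolean isMapFaulty = failureRate >= MAX_ALLOWED_FETCH_FAILURES_FRACTION;
        return fetchFailures >= MAX_FETCH_FAILURES_NOTIFICATIONS && isMapFaulty;
    }

    public static void main(String[] args) {
        // Made-up inputs: (fetchFailures, runningReduceTasks)
        int[][] cases = { {3, 4}, {3, 10}, {2, 2}, {3, 0} };
        for (int[] c : cases) {
            System.out.println(c[0] + " failures, " + c[1] + " running reduces -> "
                + shouldFailMap(c[0], c[1]));
        }
    }
}
```

Both conditions have to hold. Three notifications against ten running reduces give a rate of 0.3, which stays below the fraction, so the map is not failed. Two notifications against two running reduces give a rate of 1.0 but fall short of the notification count, so the map is not failed either.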

The final diagnostic message is produced in the org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.TooManyFetchFailureTransition class:

 private static class TooManyFetchFailureTransition implements
      SingleArcTransition<TaskAttemptImpl, TaskAttemptEvent> {
    @SuppressWarnings("unchecked")
    @Override
    public void transition(TaskAttemptImpl taskAttempt, TaskAttemptEvent event) {
      //add to diagnostic
      taskAttempt.addDiagnosticInfo("Too Many fetch failures.Failing the attempt");
      //set the finish time
      taskAttempt.setFinishTime();
      
      if (taskAttempt.getLaunchTime() != 0) {
        taskAttempt.eventHandler
            .handle(createJobCounterUpdateEventTAFailed(taskAttempt));
        TaskAttemptUnsuccessfulCompletionEvent tauce =
            createTaskAttemptUnsuccessfulCompletionEvent(taskAttempt,
                TaskAttemptState.FAILED);
        taskAttempt.eventHandler.handle(new JobHistoryEvent(
            taskAttempt.attemptId.getTaskId().getJobId(), tauce));
      } else {
        LOG.debug("Not generating HistoryFinish event since start event not " +
            "generated for taskAttempt: " + taskAttempt.getID());
      }
      taskAttempt.eventHandler.handle(new TaskTAttemptEvent(
          taskAttempt.attemptId, TaskEventType.T_ATTEMPT_FAILED));
    }
  }

2012-12-05 11:50