
Summary of problems encountered while deploying a YARN cluster

 
Versions: Hadoop 2.3.0, Hive 0.11.0
 
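1. The Application Master page cannot be opened
 
    Clicking the ApplicationMaster link returns an HTTP 500 error with java.net.ConnectException.
    Cause: the ResourceManager web UI on port 50030 was bound to the address 0.0.0.0. As a result, the ApplicationMaster link could not be resolved to a real host.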
 
Fix: in yarn-site.xml, set the web UI address to a real hostname:
    <property>
        <description>The address of the RM web application.</description>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>xxxxxxxxxx:50030</value>
    </property>
    This is bug 1811 in 2.3.0. It is fixed in 2.4.0.
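The fix comes down to making sure the address names a real host. The small Java check below is a hedged illustration: it flags a host:port value whose host is empty or 0.0.0.0 before that value goes into yarn-site.xml. It does not read Hadoop's configuration. The class name and the example hostname are hypothetical.

```java
// Minimal sketch: flag a host:port value whose host part is empty or the IPv4
// wildcard 0.0.0.0, the situation behind the broken ApplicationMaster link.
// Values are passed in as plain strings; nothing is read from Hadoop.
final class WebappAddressCheck {

    // Host portion of "host:port", i.e. everything before the last ':'.
    static String hostOf(String hostPort) {
        String s = hostPort.trim();
        int colon = s.lastIndexOf(':');
        return colon >= 0 ? s.substring(0, colon) : s;
    }

    // True when the address binds to every interface instead of a named host.
    static boolean isWildcard(String hostPort) {
        String host = hostOf(hostPort);
        return host.isEmpty() || host.equals("0.0.0.0");
    }

    public static void main(String[] args) {
        // "rm-host.example.com" is a hypothetical hostname.
        String[] values = {"0.0.0.0:50030", "rm-host.example.com:50030"};
        for (String v : values) {
            System.out.println("yarn.resourcemanager.webapp.address=" + v
                    + (isWildcard(v) ? " -> wildcard, use a real hostname" : " -> ok"));
        }
    }
}
```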
 
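2. The History UI and container logs cannot be opened
     Clicking the Tracking URL (History) fails.
       Cause: the JobHistory server was not running.
      
  Fix:
     Use xxxxxxxxxx as the history server. yarn.log-aggregation-enable goes in yarn-site.xml. The two mapreduce.jobhistory.* properties go in mapred-site.xml.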
   
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>xxxxxxxxxx:10020</value>
    </property>
 
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>xxxxxxxxxx:19888</value>
    </property>
 
  Then start the history server on that host:
  sbin/mr-jobhistory-daemon.sh start historyserver
Related link: http://www.iteblog.com/archives/936
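The Tracking URL only works once the history server is up. The sketch below builds the history page URL for an application. It rests on two assumptions. First, MR2 job IDs reuse the numeric part of the YARN application ID, as the IDs in item 5 show (application_1400048775904_0006 and job_1400048775904_0006). Second, the JobHistory web UI serves job pages under /jobhistory/job/<job id>. The class name and the host are hypothetical.

```java
// Minimal sketch. Assumptions: an MR2 job ID is the application ID with the
// "application_" prefix replaced by "job_", and the JobHistory web UI serves
// job pages under /jobhistory/job/<job id>.
final class HistoryUrl {

    static String jobIdFor(String applicationId) {
        String prefix = "application_";
        if (!applicationId.startsWith(prefix)) {
            throw new IllegalArgumentException("not an application ID: " + applicationId);
        }
        return "job_" + applicationId.substring(prefix.length());
    }

    // historyWebapp is the value of mapreduce.jobhistory.webapp.address (host:port).
    static String jobPageUrl(String historyWebapp, String applicationId) {
        return "http://" + historyWebapp + "/jobhistory/job/" + jobIdFor(applicationId);
    }

    public static void main(String[] args) {
        // "history-host" is a hypothetical hostname.
        System.out.println(jobPageUrl("history-host:19888", "application_1400048775904_0006"));
    }
}
```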

3. Tuning the YARN platform
 
Number of virtual CPU cores each NodeManager offers to containers:
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>23</value>
    </property>
Memory each NodeManager offers to containers:
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>61440</value>
        <description>the amount of memory on the NodeManager in MB</description>
    </property>
Maximum memory a single container may be allocated:
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>49152</value>
    </property>
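The back-of-the-envelope Java sketch below shows how these three values interact by counting how many containers of one size fit on a node. It assumes the scheduler accounts for both memory and vcores, for example the CapacityScheduler with DominantResourceCalculator. With the default memory-only calculator, only the memory term matters. The 2048 MB / 1 vcore container size is hypothetical.

```java
// Back-of-the-envelope sketch for the NodeManager settings above.
// Assumption: the scheduler counts both memory and vcores; with a memory-only
// resource calculator, only the memory term limits the container count.
final class NodeCapacity {

    // Containers of one size that fit on a node: the scarcer resource decides.
    static int containersPerNode(int nodeMemoryMb, int nodeVcores,
                                 int containerMemoryMb, int containerVcores) {
        int byMemory = nodeMemoryMb / containerMemoryMb;
        int byCpu = nodeVcores / containerVcores;
        return Math.min(byMemory, byCpu);
    }

    // A maximum allocation larger than a node's memory could never be placed on it.
    static boolean maxAllocationFitsNode(int maxAllocationMb, int nodeMemoryMb) {
        return maxAllocationMb <= nodeMemoryMb;
    }

    public static void main(String[] args) {
        int nodeMemoryMb = 61440; // yarn.nodemanager.resource.memory-mb
        int nodeVcores = 23;      // yarn.nodemanager.resource.cpu-vcores
        int maxAllocMb = 49152;   // yarn.scheduler.maximum-allocation-mb
        // Hypothetical container size: 2048 MB and 1 vcore.
        System.out.println("containers per node: "
                + containersPerNode(nodeMemoryMb, nodeVcores, 2048, 1));
        System.out.println("max allocation fits a node: "
                + maxAllocationFitsNode(maxAllocMb, nodeMemoryMb));
    }
}
```

With these settings, memory alone would allow 30 such containers, so the 23 vcores are the binding limit in this model.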

4. Running a job fails with: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected
The job was compiled against Hadoop 1, where org.apache.hadoop.mapreduce.Counter is a class. In Hadoop 2 it is an interface. Change the pom to the Hadoop 2 artifacts and rebuild with mvn install:
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>2.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.mrunit</groupId>
        <artifactId>mrunit</artifactId>
        <version>1.0.0</version>
        <classifier>hadoop2</classifier>
        <scope>test</scope>
    </dependency>
Also switch the JDK to 1.7.
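To confirm which Hadoop API a job actually sees at runtime, you can ask the JVM whether Counter is an interface. This is a hedged diagnostic sketch, and the class name is hypothetical. On a Hadoop 2 classpath it should report "interface" for org.apache.hadoop.mapreduce.Counter.

```java
// Minimal diagnostic sketch for "Found interface ..., but class was expected":
// report whether a type on the classpath is an interface, a class, or missing.
// org.apache.hadoop.mapreduce.Counter is a class in Hadoop 1 and an interface in Hadoop 2.
final class ApiFlavor {

    static String kindOf(String className) {
        try {
            // initialize=false: only load the type, do not run static initializers
            Class<?> c = Class.forName(className, false, ApiFlavor.class.getClassLoader());
            return c.isInterface() ? "interface" : "class";
        } catch (ClassNotFoundException e) {
            return "missing";
        }
    }

    public static void main(String[] args) {
        String name = "org.apache.hadoop.mapreduce.Counter";
        System.out.println(name + " is " + kindOf(name));
    }
}
```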



5. A job fails with an out-of-memory error during shuffle (Java heap space)
2014-05-14 16:44:22,010 FATAL [IPC Server handler 4 on 44508] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1400048775904_0006_r_000004_0 - exited : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#3
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
    at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
    at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.<init>(InMemoryMapOutput.java:63)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:297)
    at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:287)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:411)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:341)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)

Source: <http://xxxxxxxxxx:19888/jobhistory/logs/ST-L09-05-back-tj-yarn15:8034/container_1400048775904_0006_01_000001/job_1400048775904_0006/hadoop/syslog/?start=0>
 
 
Fix:
Lower mapreduce.reduce.shuffle.memory.limit.percent from its default of 0.25 to 0.10.
 

Reference:
http://www.sqlparty.com/yarn%E5%9C%A8shuffle%E9%98%B6%E6%AE%B5%E5%86%85%E5%AD%98%E4%B8%8D%E8%B6%B3%E9%97%AE%E9%A2%98error-in-shuffle-in-fetcher/
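A little arithmetic shows why lowering the percentage helps. The sketch below is a simplified model based on our reading of MergeManagerImpl in Hadoop 2.x, so treat it as an assumption. In that model, the in-memory shuffle budget is the reducer heap times mapreduce.reduce.shuffle.input.buffer.percent (default 0.70). A single map output stays in memory only if it is no larger than that budget times mapreduce.reduce.shuffle.memory.limit.percent. The 1000 MB heap and the class name are hypothetical.

```java
// Simplified model (assumption, based on MergeManagerImpl in Hadoop 2.x):
//   budget      = heap * mapreduce.reduce.shuffle.input.buffer.percent
//   max segment = budget * mapreduce.reduce.shuffle.memory.limit.percent
// Map outputs larger than "max segment" are shuffled to disk instead of memory.
final class ShuffleBudget {

    static long inMemoryBudget(long heap, double inputBufferPercent) {
        return (long) (heap * inputBufferPercent);
    }

    static long maxInMemorySegment(long heap, double inputBufferPercent,
                                   double memoryLimitPercent) {
        return (long) (inMemoryBudget(heap, inputBufferPercent) * memoryLimitPercent);
    }

    public static void main(String[] args) {
        long heapMb = 1000; // hypothetical reducer heap, in MB
        System.out.println("budget: " + inMemoryBudget(heapMb, 0.70) + " MB");
        System.out.println("max segment at 0.25: " + maxInMemorySegment(heapMb, 0.70, 0.25) + " MB");
        System.out.println("max segment at 0.10: " + maxInMemorySegment(heapMb, 0.70, 0.10) + " MB");
    }
}
```

Each fetcher thread (fetcher#3 in the trace) can hold one such segment, and the number of fetchers is set by mapreduce.reduce.shuffle.parallelcopies. In this model, shrinking the per-segment cap from 175 MB to 70 MB therefore leaves much more headroom in the reducer heap.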

6. Found in the middle of a reduce task's log:

2014-05-14 17:51:21,835 WARN [Readahead Thread #2] org.apache.hadoop.io.ReadaheadPool: Failed readahead on ifile
EINVAL: Invalid argument
    at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method)
    at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:263)
    at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:142)
    at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Source: <http://xxxxxxxxxx:8042/node/containerlogs/container_1400060792764_0001_01_000726/hadoop/syslog/?start=-4096>
 
Note: the error has not recurred. No fix yet.
 

7. Hive job error:

java.lang.InstantiationException: org.antlr.runtime.CommonToken
Continuing ...
java.lang.RuntimeException: failed to evaluate: <unbound>=Class.new();
 
Reference: https://issues.apache.org/jira/browse/HIVE-4222

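8. A Hive job runs out of memory when Hive automatically converts a join into a map join. Fix: turn the automatic conversion off. hive.auto.convert.join defaults to false before Hive 0.11 and to true from 0.11 on.
Add this to the job script: set hive.auto.convert.join=false;
or set it to false in hive-site.xml.
Error log (the local task that builds the hash table):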
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2014-05-15 02:40:58     Starting to launch local task to process map join;      maximum memory = 1011351552
2014-05-15 02:41:00     Processing rows:        200000  Hashtable size: 199999  Memory usage:   110092544       rate:   0.109
2014-05-15 02:41:01     Processing rows:        300000  Hashtable size: 299999  Memory usage:   229345424       rate:   0.227
2014-05-15 02:41:01     Processing rows:        400000  Hashtable size: 399999  Memory usage:   170296368       rate:   0.168
2014-05-15 02:41:01     Processing rows:        500000  Hashtable size: 499999  Memory usage:   285961568       rate:   0.283
2014-05-15 02:41:02     Processing rows:        600000  Hashtable size: 599999  Memory usage:   408727616       rate:   0.404
2014-05-15 02:41:02     Processing rows:        700000  Hashtable size: 699999  Memory usage:   333867920       rate:   0.33
2014-05-15 02:41:02     Processing rows:        800000  Hashtable size: 799999  Memory usage:   459541208       rate:   0.454
2014-05-15 02:41:03     Processing rows:        900000  Hashtable size: 899999  Memory usage:   391524456       rate:   0.387
2014-05-15 02:41:03     Processing rows:        1000000 Hashtable size: 999999  Memory usage:   514140152       rate:   0.508
2014-05-15 02:41:03     Processing rows:        1029052 Hashtable size: 1029052 Memory usage:   546126888       rate:   0.54
2014-05-15 02:41:03     Dump the hashtable into file: file:/tmp/hadoop/hive_2014-05-15_14-40-53_413_3806680380261480764/-local-10002/HashTable-Stage-4/MapJoin-mapfile01--.hashtable
2014-05-15 02:41:06     Upload 1 File to: file:/tmp/hadoop/hive_2014-05-15_14-40-53_413_3806680380261480764/-local-10002/HashTable-Stage-4/MapJoin-mapfile01--.hashtable File size: 68300588
2014-05-15 02:41:06     End of local task; Time Taken: 8.301 sec.
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
Mapred Local Task Succeeded . Convert the Join into MapJoin
Launching Job 2 out of 2

Error log from the map task that loads the hash table:
2014-05-15 13:52:54,007 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
    at java.io.ObjectInputStream$HandleTable.grow(ObjectInputStream.java:3465)
    at java.io.ObjectInputStream$HandleTable.assign(ObjectInputStream.java:3271)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1789)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at java.util.HashMap.readObject(HashMap.java:1183)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.hadoop.hive.ql.exec.persistence.HashMapWrapper.initilizePersistentHash(HashMapWrapper.java:128)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:194)
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:212)
    at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1377)
    at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1381)

Source: <http://xxxxxxxxxx:19888/jobhistory/logs/ST-L09-10-back-tj-yarn21:8034/container_1400064445468_0013_01_000002/attempt_1400064445468_0013_m_000000_0/hadoop/syslog/?start=0>
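The local-task log already hints at the problem. In these lines, rate is memory usage divided by the local task's maximum memory: 110092544 / 1011351552 ≈ 0.109, which matches the first line. The hash table used about 546 MB in memory but was dumped to a 68 MB file. Every map task then deserializes that file again, as the HashMapWrapper frames in the trace show. The sketch below pulls those numbers out of such a log. The regular expression is fitted to the lines above, and the class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch: parse the "Processing rows" progress lines of the Hive local
// map-join task (format taken from the log above) and recompute the usage rate.
final class MapJoinProgress {

    private static final Pattern LINE = Pattern.compile(
            "Processing rows:\\s+(\\d+)\\s+Hashtable size:\\s+(\\d+)"
            + "\\s+Memory usage:\\s+(\\d+)\\s+rate:\\s+([0-9.]+)");

    // Returns {rows, memoryUsageBytes}, or null if the line is not a progress line.
    static long[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new long[] {Long.parseLong(m.group(1)), Long.parseLong(m.group(3))};
    }

    // Share of the local task's maximum memory used by the hash table.
    static double rate(long usageBytes, long maxBytes) {
        return (double) usageBytes / maxBytes;
    }

    public static void main(String[] args) {
        String line = "2014-05-15 02:41:03     Processing rows:        1029052 "
                + "Hashtable size: 1029052 Memory usage:   546126888       rate:   0.54";
        long[] p = parse(line);
        long max = 1011351552L; // "maximum memory" reported at the start of the local task
        System.out.printf("rows=%d usage=%d rate=%.2f%n", p[0], p[1], rate(p[1], max));
    }
}
```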
 
 
 

9. At runtime Hive reports "failed to evaluate: <unbound>=Class.new();". Fix: upgrade to 0.13.0.
References: https://issues.apache.org/jira/browse/HIVE-4222
https://issues.apache.org/jira/browse/HIVE-3739
 
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
OK
Time taken: 2.28 seconds
java.lang.InstantiationException: org.antlr.runtime.CommonToken
Continuing ...
java.lang.RuntimeException: failed to evaluate: <unbound>=Class.new();
Continuing ...
java.lang.InstantiationException: org.antlr.runtime.CommonToken
Continuing ...
java.lang.RuntimeException: failed to evaluate: <unbound>=Class.new();
Continuing ...
java.lang.InstantiationException: org.antlr.runtime.CommonToken
Continuing ...
java.lang.RuntimeException: failed to evaluate: <unbound>=Class.new();
Continuing ...
java.lang.InstantiationException: org.antlr.runtime.CommonToken
Continuing ...
java.lang.RuntimeException: failed to evaluate: <unbound>=Class.new();
Continuing ...
java.lang.InstantiationException: org.antlr.runtime.CommonToken
Continuing ...

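Upgrading should fix this. For reasons I have not found, though, after upgrading to 0.12.0 and 0.13.0 every job failed immediately with FileNotFoundException: HIVE_PLANxxxxxxxxx (see item 11). It is probably a problem in my configuration. No fix yet.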



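10. Creating a Hive table or database fails with: Couldnt obtain a new sequence (unique id) : You have an error in your SQL syntax
Fix: the Hive metastore database was named yarn-hive. MySQL does not allow a hyphen in an unquoted identifier, because it reads the hyphen as a minus sign. The SQL generated with the unquoted name therefore fails to parse. Renaming the database to a name without a hyphen solved the problem.
Error log: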
FAILED: Error in metadata: MetaException(message:javax.jdo.JDOException: Couldnt obtain a new sequence (unique id) : You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '-hive.`SEQUENCE_TABLE` WHERE `SEQUENCE_NAME`='org.apache.hadoop.hive.metastore.m' at line 1
        at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:596)
        at org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:732)
        at org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:752)
        at org.apache.hadoop.hive.metastore.ObjectStore.createTable(ObjectStore.java:643)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
        at com.sun.proxy.$Proxy14.createTable(Unknown Source)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1070)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1103)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
        at com.sun.proxy.$Proxy15.create_table_with_environment_context(Unknown Source)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:466)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:455)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:74)
        at com.sun.proxy.$Proxy16.createTable(Unknown Source)
        at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:597)
        at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3777)
        at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:256)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:144)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1362)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1146)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:952)
        at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:338)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
        at shark.SharkCliDriver$.main(SharkCliDriver.scala:235)
        at shark.SharkCliDriver.main(SharkCliDriver.scala)
NestedThrowablesStackTrace:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '-hive.`SEQUENCE_TABLE` WHERE `SEQUENCE_NAME`='org.apache.hadoop.hive.metastore.m' at line 1
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at com.mysql.jdbc.Util.handleNewInstance(Util.java:406)
        at com.mysql.jdbc.Util.getInstance(Util.java:381)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1030)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3558)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3490)
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1959)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2109)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2648)
        at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2077)
        at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2228)
        at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
        at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
        at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeQuery(ParamLoggingPreparedStatement.java:381)
        at org.datanucleus.store.rdbms.SQLController.executeStatementQuery(SQLController.java:504)
        at org.datanucleus.store.rdbms.valuegenerator.SequenceTable.getNextVal(SequenceTable.java:197)
        at org.datanucleus.store.rdbms.valuegenerator.TableGenerator.reserveBlock(TableGenerator.java:190)
        at org.datanucleus.store.valuegenerator.AbstractGenerator.reserveBlock(AbstractGenerator.java:305)
        at org.datanucleus.store.rdbms.valuegenerator.AbstractRDBMSGenerator.obtainGenerationBlock(AbstractRDBMSGenerator.java:170)
        at org.datanucleus.store.valuegenerator.AbstractGenerator.obtainGenerationBlock(AbstractGenerator.java:197)
        at org.datanucleus.store.valuegenerator.AbstractGenerator.next(AbstractGenerator.java:105)
        at org.datanucleus.store.rdbms.RDBMSStoreManager.getStrategyValueForGenerator(RDBMSStoreManager.java:2019)
        at org.datanucleus.store.AbstractStoreManager.getStrategyValue(AbstractStoreManager.java:1385)
        at org.datanucleus.ExecutionContextImpl.newObjectId(ExecutionContextImpl.java:3727)
        at org.datanucleus.state.JDOStateManager.setIdentity(JDOStateManager.java:2574)
        at org.datanucleus.state.JDOStateManager.initialiseForPersistentNew(JDOStateManager.java:526)
        at org.datanucleus.state.ObjectProviderFactoryImpl.newForPersistentNew(ObjectProviderFactoryImpl.java:202)
        at org.datanucleus.ExecutionContextImpl.newObjectProviderForPersistentNew(ExecutionContextImpl.java:1326)
        at org.datanucleus.ExecutionContextImpl.persistObjectInternal(ExecutionContextImpl.java:2123)
        at org.datanucleus.ExecutionContextImpl.persistObjectWork(ExecutionContextImpl.java:1972)
        at org.datanucleus.ExecutionContextImpl.persistObject(ExecutionContextImpl.java:1820)
        at org.datanucleus.ExecutionContextThreadedImpl.persistObject(ExecutionContextThreadedImpl.java:217)
        at org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:727)
        at org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:752)
        at org.apache.hadoop.hive.metastore.ObjectStore.createTable(ObjectStore.java:643)
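A quick pre-flight check can catch this before the metastore is created. The sketch extracts the database name from a MySQL JDBC URL. It then tests that name against a conservative ASCII subset of MySQL's unquoted-identifier characters: letters, digits, $ and _. MySQL also rejects unquoted names made only of digits. The URL and the class name are hypothetical.

```java
import java.util.regex.Pattern;

// Minimal sketch: pull the database name out of a MySQL JDBC URL (as used for
// javax.jdo.option.ConnectionURL) and check it is usable as an unquoted identifier.
final class MetastoreDbName {

    // Conservative ASCII subset of MySQL's unquoted identifier characters.
    private static final Pattern UNQUOTED = Pattern.compile("[0-9A-Za-z$_]+");

    // Database part of "jdbc:mysql://host:port/db?params"; "" if there is none.
    static String databaseOf(String jdbcUrl) {
        int hostStart = jdbcUrl.indexOf("//");
        if (hostStart < 0) {
            return "";
        }
        String rest = jdbcUrl.substring(hostStart + 2);
        int slash = rest.indexOf('/');
        if (slash < 0) {
            return "";
        }
        String db = rest.substring(slash + 1);
        int query = db.indexOf('?');
        return query >= 0 ? db.substring(0, query) : db;
    }

    // Allowed characters only, and not made up solely of digits.
    static boolean safeUnquoted(String name) {
        return UNQUOTED.matcher(name).matches() && !name.matches("[0-9]+");
    }

    public static void main(String[] args) {
        // Hypothetical metastore URL.
        String url = "jdbc:mysql://meta-host:3306/yarn-hive?createDatabaseIfNotExist=true";
        String db = databaseOf(url);
        System.out.println(db + (safeUnquoted(db) ? " is fine" : " needs renaming, e.g. yarn_hive"));
    }
}
```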


11. After installing Hive 0.12 and 0.13, running a job fails with: FileNotFoundException: HIVE_PLAN
Fix: unknown so far. It may be a Hive bug, or a configuration mistake somewhere.

Error log:

2014-05-16 10:27:07,896 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: Paths:/user/hive/warehouse/game_predata.db/game_login_log/dt=0000-00-00/000000_0:201326592+60792998,/user/hive/warehouse/game_predata.db/game_login_log/dt=0000-00-00/000001_0_copy_1:201326592+58503492,/user/hive/warehouse/game_predata.db/game_login_log/dt=0000-00-00/000001_0_copy_2:67108864+67108864,/user/hive/warehouse/game_predata.db/game_login_log/dt=0000-00-00/000001_0_copy_2:134217728+67108864,/user/hive/warehouse/game_predata.db/game_login_log/dt=0000-00-00/000002_0_copy_1:67108864+67108864InputFormatClass: org.apache.hadoop.mapred.TextInputFormat

2014-05-16 10:27:07,954 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: java.io.FileNotFoundException: HIVE_PLAN14c8af69-0156-4633-9273-6a812eb91a4c (No such file or directory)
    at org.apache.hadoop.hive.ql.exec.Utilities.getMapRedWork(Utilities.java:230)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:381)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:374)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:540)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:168)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:409)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.io.FileNotFoundException: HIVE_PLAN14c8af69-0156-4633-9273-6a812eb91a4c (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:146)
    at java.io.FileInputStream.<init>(FileInputStream.java:101)
    at org.apache.hadoop.hive.ql.exec.Utilities.getMapRedWork(Utilities.java:221)
    ... 12 more

2014-05-16 10:27:07,957 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task

Source: <http://sxxxxxxxxxx:19888/jobhistory/logs/ST-L10-10-back-tj-yarn10:8034/container_1400136017046_0026_01_000030/attempt_1400136017046_0026_m_000000_0/hadoop>
 

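12. java.lang.OutOfMemoryError: GC overhead limit exceeded
Analysis: this error type was added in JDK 6 and acts as a safeguard. HotSpot throws it when the JVM spends nearly all its time in garbage collection (by default more than 98%) while freeing very little heap (less than 2%). One workaround is to turn the check off with the JVM option -XX:-UseGCOverheadLimit. This does not lower memory use, so the task may instead run very slowly or fail later with "Java heap space".
Where to add it: a new mapred.child.java.opts entry in mapred-site.xml, with the value -XX:-UseGCOverheadLimit. In Hadoop 2, mapreduce.map.java.opts and mapreduce.reduce.java.opts take precedence over mapred.child.java.opts when they are set.
Source: <http://www.cnblogs.com/niocai/archive/2012/07/31/2616252.html>
See also item 14.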
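To see whether a task is heading toward this error, it helps to measure how much of its time goes to garbage collection. This sketch uses the standard java.lang.management beans to compute the share of JVM uptime spent in GC since start-up. It is cruder than HotSpot's own check, which looks at recent collections only. The class name is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Minimal sketch: share of JVM uptime spent in garbage collection since start-up.
// Only a rough indicator; HotSpot's overhead limit is based on recent collections.
final class GcOverhead {

    // Share of uptime spent in GC, capped at 1.0; 0.0 if uptime is unknown.
    static double gcFraction(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return Math.min(1.0, (double) gcMillis / uptimeMillis);
    }

    // Sum of accumulated collection time over all collectors of this JVM.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when a collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gc = totalGcMillis();
        System.out.printf("GC: %d ms of %d ms uptime (%.1f%%)%n",
                gc, uptime, 100 * gcFraction(gc, uptime));
    }
}
```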
 

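13. Hive: for efficiency, Hive 0.10.0 answers simple queries without MapReduce. A simple query here means a plain select with no count, sum, group by and so on. Hive reads the HDFS files directly and filters them. Not launching an MR job makes such queries much faster, but the user experience suffers: on large data you may wait a long time without seeing any output.

This is easy to change. hive-site.xml has a parameter called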

hive.fetch.task.conversion

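Set it to more and simple queries skip MapReduce. Set it to minimal and only SELECT *, filters on partition columns, and LIMIT skip MapReduce. Other simple selects, for example with a column list or a filter on an ordinary column, then run as MapReduce jobs.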


Source: <http://slaytanic.blog.51cto.com/2057708/1170431>
See also item 14.
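The documented behaviour of the two values can be summarised as a small decision function. It illustrates the rules in the Hive configuration documentation for 0.10 to 0.13; it is not Hive's planner code. The complexQuery flag stands for any aggregation, DISTINCT, join, lateral view or subquery. All names are hypothetical.

```java
// Simplified model of hive.fetch.task.conversion (Hive 0.10-0.13 documentation),
// not Hive's actual planner:
//   minimal: only SELECT *, filters on partition columns, and LIMIT use a fetch task
//   more:    any column list, any filter, and LIMIT use a fetch task
// Aggregations, DISTINCT, joins, lateral views and subqueries always need MapReduce.
final class FetchConversion {

    static boolean runsAsFetchTask(String setting, boolean selectStar,
                                   boolean filtersOnPartitionColumnsOnly,
                                   boolean complexQuery) {
        if (complexQuery) {
            return false;
        }
        switch (setting) {
            case "more":
                return true;
            case "minimal":
                return selectStar && filtersOnPartitionColumnsOnly;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical query: SELECT uid, name FROM t WHERE level > 10 LIMIT 100
        System.out.println("minimal: " + runsAsFetchTask("minimal", false, false, false));
        System.out.println("more:    " + runsAsFetchTask("more", false, false, false));
    }
}
```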

14. Running an MR job fails with:

Error log:
Container [pid=30486,containerID=container_1400229396615_0011_01_000012] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 1.7 GB of 2.1 GB virtual memory used. Killing container. Dump of the process-tree for container_1400229396615_0011_01_000012 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 30501 30486 30486 30486 (java) 3924 322 1720471552 262096 /opt/jdk1.7.0_55/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx1024m -XX:-UseGCOverheadLimit -Djava.io.tmpdir=/home/nodemanager/local/usercache/hadoop/appcache/application_1400229396615_0011/container_1400229396615_0011_01_000012/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/home/hadoop/logs/nodemanager/logs/application_1400229396615_0011/container_1400229396615_0011_01_000012 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 30.30.30.39 47925 attempt_1400229396615_0011_m_000000_0 12 |- 30486 12812 30486 30486 (bash) 0 0 108642304 302 /bin/bash -c /opt/jdk1.7.0_55/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx1024m -XX:-UseGCOverheadLimit -Djava.io.tmpdir=/home/nodemanager/local/usercache/hadoop/appcache/application_1400229396615_0011/container_1400229396615_0011_01_000012/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/home/hadoop/logs/nodemanager/logs/application_1400229396615_0011/container_1400229396615_0011_01_000012 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 30.30.30.39 47925 attempt_1400229396615_0011_m_000000_0 12 1>/home/hadoop/logs/nodemanager/logs/application_1400229396615_0011/container_1400229396615_0011_01_000012/stdout 2>/home/hadoop/logs/nodemanager/logs/application_1400229396615_0011/container_1400229396615_0011_01_000012/stderr 
Container killed on request. Exit code is 143 Container exited with a non-zero exit code 143


Source: <http://xxxxxxxxxx:50030/proxy/application_1400229396615_0011/mapreduce/attempts/job_1400229396615_0011/m/FAILED>
 
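Fix:
 
The parameters below set the memory for MapReduce tasks. They can be set per job when a job needs it; here they are set once for the whole cluster. If containers are being killed, raise them. Keep the -Xmx in the *.java.opts settings below the matching *.memory.mb. The container limit applies to the whole JVM process, not just the heap, so the -Xmx1024m heap in a 1 GB container in the log above leaves no room for anything else.
mapreduce.map.memory.mb         maximum memory of a map task container
mapreduce.map.java.opts         -Xmx1024M    JVM options for map tasks
mapreduce.reduce.memory.mb      maximum memory of a reduce task container
mapreduce.reduce.java.opts      -Xmx2560M    JVM options for reduce tasks
mapreduce.task.io.sort.mb       512          Higher memory-limit while sorting data for efficiency.
 
Taken from: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html#Configuring_the_Hadoop_Daemons_in_Non-Secure_Mode
Turning off the memory checks:
I really could not work out why some tasks use only a little over 200 MB of physical memory while their virtual memory shoots up to 2.7 GB. I suspected the memory-check process was misbehaving. Some of my jobs also genuinely need a lot of memory. To keep things moving, I turned the checks off, which made all the memory errors go away at once. With the checks off, the NodeManager no longer kills containers that exceed their allocation. A narrower option for the virtual-memory case is to raise yarn.nodemanager.vmem-pmem-ratio (default 2.1).
yarn.nodemanager.pmem-check-enabled    false
yarn.nodemanager.vmem-check-enabled    false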
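The numbers in the log above can be reproduced with a few lines of arithmetic. The sketch assumes the virtual-memory limit is the container size times yarn.nodemanager.vmem-pmem-ratio (default 2.1), which gives the 2.1 GB in the log. The 0.8 heap share is a common rule of thumb, not a Hadoop setting. The class name is hypothetical.

```java
// Worked numbers for the container kill in item 14.
// Assumptions: virtual-memory limit = container MB * yarn.nodemanager.vmem-pmem-ratio;
// the 0.8 heap share is only a rule of thumb, not a Hadoop setting.
final class ContainerMemory {

    static double vmemLimitMb(int containerMb, double vmemPmemRatio) {
        return containerMb * vmemPmemRatio;
    }

    // The heap must be smaller than the container: thread stacks, class metadata
    // and native buffers also count against the physical-memory limit.
    static boolean heapLeavesHeadroom(int containerMb, int xmxMb) {
        return xmxMb < containerMb;
    }

    static int suggestedXmxMb(int containerMb, double heapShare) {
        return (int) (containerMb * heapShare);
    }

    public static void main(String[] args) {
        int containerMb = 1024; // "1 GB physical memory" in the log
        int xmxMb = 1024;       // -Xmx1024m on the child JVM command line
        System.out.printf("vmem limit: %.1f MB%n", vmemLimitMb(containerMb, 2.1));
        System.out.println("heap leaves headroom: " + heapLeavesHeadroom(containerMb, xmxMb));
        System.out.println("suggested -Xmx: " + suggestedXmxMb(containerMb, 0.8) + "m");
    }
}
```

2150.4 MB is the "2.1 GB virtual memory" limit reported in the log. A 1024 MB heap in a 1024 MB container leaves no headroom, which matches the "1.0 GB of 1 GB physical memory used" kill.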
 

15. Adjustments to the YARN web UI:

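1. On the cluster page, the start time and finish time of applications are shown in UTC. Change them to UTC+8, i.e. Beijing time. The change below formats the timestamp in the browser's local time zone, which is UTC+8 for browsers set to Beijing time.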

Edit webapps/static/yarn.dt.plugins.js inside ./share/hadoop/yarn/hadoop-yarn-common-2.3.0.jar,
 
or, in the source package: /hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/yarn.dt.plugins.js

Add this code:
Date.prototype.Format = function (fmt) { //author: meizz
    var o = {
        "M+": this.getMonth() + 1, // month
        "d+": this.getDate(), // day
        "h+": this.getHours(), // hour (0-23)
        "m+": this.getMinutes(), // minute
        "s+": this.getSeconds(), // second
        "q+": Math.floor((this.getMonth() + 3) / 3), // quarter
        "S": this.getMilliseconds() // millisecond
    };
    if (/(y+)/.test(fmt)) fmt = fmt.replace(RegExp.$1, (this.getFullYear() + "").substr(4 - RegExp.$1.length));
    for (var k in o)
        if (new RegExp("(" + k + ")").test(fmt)) fmt = fmt.replace(RegExp.$1, (RegExp.$1.length == 1) ? (o[k]) : (("00" + o[k]).substr(("" + o[k]).length)));
    return fmt;
};

 
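Then change renderHadoopDate as follows. For other types, such as sorting, it returns the raw value:
function renderHadoopDate(data, type, full) {
    if (type === 'display' || type === 'filter') {
        if (data === '0') {
            return "N/A";
        }
        // format in the browser's local time zone instead of UTC
        return new Date(parseInt(data, 10)).Format("yyyy-MM-dd hh:mm:ss");
    }
    return data;
}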
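If the timestamps are also needed outside the browser, the same conversion can be done on the server side. This is a minimal Java sketch that needs Java 8+ for java.time, which is newer than the JDK 1.7 used on this cluster. It formats an epoch-millisecond value in a chosen time zone; the example value is the cluster timestamp in application_1400048775904_0006, i.e. the ResourceManager start time. The class name is hypothetical.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Server-side counterpart of the JavaScript change above (requires Java 8+).
final class HadoopDate {

    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Epoch milliseconds -> "yyyy-MM-dd HH:mm:ss" in the given zone; 0 means "not set".
    static String format(long epochMillis, String zoneId) {
        if (epochMillis == 0) {
            return "N/A"; // same rule as renderHadoopDate for '0'
        }
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneId.of(zoneId)).format(FORMAT);
    }

    public static void main(String[] args) {
        long ts = 1400048775904L; // cluster timestamp from application_1400048775904_0006
        System.out.println("UTC:     " + format(ts, "UTC"));
        System.out.println("Beijing: " + format(ts, "Asia/Shanghai"));
    }
}
```

For that timestamp it prints 2014-05-14 06:26:15 for UTC and 2014-05-14 14:26:15 for Asia/Shanghai.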


 

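16. MR1 jobs that use DistributedCache fail after migrating to MR2. My code told the cache files apart by their names. In MR2, however, a distributed file keeps only its base name, for example: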
application_xxxxxxx/container_14xxxx/part-m-00000
application_xxxxxxx/container_14xxxx/part-m-00001
application_xxxxxxx/container_14xxxx/00000_0



Fix: give each cache file a symlink named <parent directory name>_<file name> through the #fragment of the cache URI:
DistributedCache.addCacheFile(new URI(path.toString() + "#"+ path.getParent().getName() + "_" + path.getName()),
configuration);


The cached files then appear under names that still carry their directory.
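The naming scheme can be tried out without a cluster. This sketch only builds the URI with its #fragment, using java.net.URI. In Hadoop 2 the result would be passed to Job.addCacheFile, since DistributedCache is deprecated there. The paths and the class name are hypothetical.

```java
import java.net.URI;

// Minimal sketch of the link-name scheme above, without Hadoop on the classpath:
// "<original URI>#<parent>_<name>" gives same-named files from different
// directories distinct link names in the task's working directory.
final class CacheLinkName {

    static URI withLinkName(String path) {
        String[] parts = URI.create(path).getPath().split("/");
        String name = parts[parts.length - 1];
        String parent = parts.length >= 2 ? parts[parts.length - 2] : "";
        return URI.create(path + "#" + parent + "_" + name);
    }

    public static void main(String[] args) {
        // Hypothetical HDFS paths with the same file name in two directories.
        String[] paths = {
            "hdfs://namenode:8020/data/users/part-m-00000",
            "hdfs://namenode:8020/data/games/part-m-00000"
        };
        for (String p : paths) {
            System.out.println(p + " -> link " + withLinkName(p).getFragment());
        }
    }
}
```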

 

 

 
