执行Hive语句运行MapReduce程序时突然出现这样的异常:
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:set mapred.reduce.tasks=<number>
Starting Job = job_1510149586421_0024, Tracking URL = http://shizhan:8088/proxy/application_1510149586421_0024/
Kill Command = /root/apps/hadoop-2.6.4/bin/hadoop job -Dmapred.job.tracker=ignorethis -kill job_1510149586421_0024
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-11-08 22:55:28,184 Stage-1 map = 0%, reduce = 0%
2017-11-08 22:55:37,507 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.4 sec
2017-11-08 22:55:38,539 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.4 sec
2017-11-08 22:55:39,564 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.4 sec
2017-11-08 22:55:40,588 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.4 sec
2017-11-08 22:55:41,615 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.4 sec
2017-11-08 22:55:42,644 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.4 sec
2017-11-08 22:55:43,679 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.4 sec
2017-11-08 22:55:44,709 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.4 sec
2017-11-08 22:55:45,737 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.4 sec
2017-11-08 22:55:46,770 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.4 sec
2017-11-08 22:55:47,794 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.4 sec
2017-11-08 22:55:48,823 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.4 sec
2017-11-08 22:55:49,855 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 1.4 sec
MapReduce Total cumulative CPU time: 1 seconds 400 msec
Ended Job = job_1510149586421_0024 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1510149586421_0024_m_000000 (and more) from job job_1510149586421_0024
Exception in thread "Thread-414" java.lang.IllegalArgumentException: Does not contain a valid host:port authority: ignorethisat org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:212)at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164)at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153)at org.apache.hadoop.hive.ql.exec.JobTrackerURLResolver.getURL(JobTrackerURLResolver.java:42)at org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:209)at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:92)at java.lang.Thread.run(Thread.java:744)
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 1.4 sec HDFS Read: 1125 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 1 seconds 400 msec
hive> date -s '2017-11-8 22:58:00'> ;
FAILED: Parse Error: line 1:0 cannot recognize input near 'date' '-' 's'
平时怎么操作都没问题,今天突然就抛出这样的异常,也有点让人摸不着头脑。重点是执行某些简单的查询语句时没有任何问题,一旦查询语句复杂点,就会抛出这样的异常。根据抛出的异常信息发现不了什么,于是就去查看hadoop打印的log日志,发现了问题的原因。
关键日志如下:
2017-11-08 22:54:52,543 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1510149586421_0023
2017-11-08 22:54:52,545 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1510149586421_0023 failed 2 times due to Error launching appattempt_1510149586421_0023_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1510181693928 found 1510153492530
Note: System times on machines may be out of sync. Check system time and time zones.at sun.reflect.GeneratedConstructorAccessor47.newInstance(Unknown Source)at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)at java.lang.reflect.Constructor.newInstance(Constructor.java:526)at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:123)at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:251)at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)at java.lang.Thread.run(Thread.java:744)
. Failing the application.
2017-11-08 22:54:52,545 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1510149586421_0023 State change from FINAL_SAVING to FAILED
2017-11-08 22:54:52,545 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1510149586421_0023 failed 2 times due to Error launching appattempt_1510149586421_0023_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
This token is expired. current time is 1510181693928 found 1510153492530
Note: System times on machines may be out of sync. Check system time and time zones.at sun.reflect.GeneratedConstructorAccessor47.newInstance(Unknown Source)at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)at java.lang.reflect.Constructor.newInstance(Constructor.java:526)at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:123)at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:251)at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)at java.lang.Thread.run(Thread.java:744)
. Failing the application. APPID=application_1510149586421_0023
2017-11-08 22:54:52,545 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary:
可以看到这样的一行:Note: System times on machines may be out of sync. Check system time and time zones.
原来是节点间的时间不同步。。。
这样也能解释下为什么有些简单的语句可以执行,有些复杂的却执行不了。
原因就是当我执行简单语句时,maptask和reducetask都在同一个节点上执行,不会有时间不同步问题;而当我执行复杂的语句时,也许有一个或多个reducetask,如果是多个的话,那么多个reducetask分布在不同的节点上,时间就会不同步,最终写出数据到文件时就会有时间差异,所以才会报错。如果仍是一个,那这一个reducetask肯定在另一个节点上,时间不同步,才会抛出异常。(比如我reduce的时间要早于map的时间)
但是从最初的异常信息来看,我个人认为是在最终将reduce的结果输出到文件的这个阶段出现了异常(reduce阶段已经达到100%)。当然,如果想确认在那一块出现了问题,还得去查看源码。