
Notes on YARN 2.2.0 Installation Problems

 

1. mapreduce.shuffle set in yarn.nodemanager.aux-services is invalid.The valid service name should only contain a-zA-Z0-9_ and can not start with numbers


Solution:

Add the following to the yarn-site.xml configuration file:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
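
If the shuffle service still fails to start after this change, Hadoop 2.2.0 also accepts a companion property that names the handler class explicitly (org.apache.hadoop.mapred.ShuffleHandler is the stock implementation). A minimal sketch; the post's own setup did not need it:

<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>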

Restart and the error goes away.

This error comes down to yarn.nodemanager.aux-services being misconfigured or not configured at all. In fact, when the parameter is unset it should fall back to a default value; that it does not is a small bug in the current release:

https://issues.apache.org/jira/i#browse/YARN-1289



2. Datanode denied communication with namenode: DatanodeRegistration(0.0.0.0, storageID=DS-445663431-127.0.0.1-50010-1394867858930, infoPort=50075, ipcPort=50020, storageInfo=lv=-47;cid=CID-982d36dc-3def-47fa-8cf7-9f2f19089eaa;nsid=1165019335;c=0)


Some searching online suggests two possible causes:

1. The clusterIDs do not match: the namenode's cid differs from the datanode's cid. This happens because formatting the namenode does not reformat the datanode, so the datanode keeps the cid it had before the format. The fix is to delete the datanode's dfs.datanode.data.dir directory and the tmp directory, then run start-dfs.sh again (see the sketch after this list).
2. Even after removing the iptables rules, the "Datanode denied communication with namenode: DatanodeRegistration" error persists. Per http://stackoverflow.com/questions/17082789/cdh4-3exception-from-the-logs-after-start-dfs-sh-datanode-and-namenode-star, writing each cluster host and its IP into the /etc/hosts file solves the problem.
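
For cause 1, a quick way to confirm the mismatch before deleting anything is to compare the clusterID recorded in the two VERSION files. A minimal sketch, assuming storage directories /data/hadoop/name and /data/hadoop/data (substitute your own dfs.namenode.name.dir and dfs.datanode.data.dir):

grep clusterID /data/hadoop/name/current/VERSION   # namenode's clusterID
grep clusterID /data/hadoop/data/current/VERSION   # datanode's clusterID

# If the two IDs differ, stop HDFS, clear the datanode storage, and restart:
stop-dfs.sh
rm -rf /data/hadoop/data/* /tmp/hadoop-$(whoami)/*   # /tmp/hadoop-<user> is the default hadoop.tmp.dir
start-dfs.sh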


In my case the datanode data directory and tmp are cleared every time, so the first cause was ruled out. I didn't fully understand the second one, but my guess was that name resolution via the /etc/hosts file was the problem: my HDFS configuration files all pointed at the local machine, yet used the IP address rather than localhost. On that guess, I changed every IP address in the configuration files to localhost, and the problem went away.
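
For concreteness, the change amounts to something like this in core-site.xml (a sketch; the address 192.168.1.10 and port 9000 are made up for illustration, and fs.defaultFS is the 2.x name of the property):

<property>
  <name>fs.defaultFS</name>
  <!-- was: <value>hdfs://192.168.1.10:9000</value> -->
  <value>hdfs://localhost:9000</value>
</property>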


References: http://wang-2011-ying.iteye.com/blog/1996654

http://grokbase.com/t/cloudera/scm-users/135y4jn3dw/datanode-denied-connection-with-namenode


3. 2014-03-15 16:15:10,307 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to login
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to login
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:631)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:872)
Caused by: java.net.UnknownHostException: localhost.localdomain: localhost.localdomain
at java.net.InetAddress.getLocalHost(InetAddress.java:1425)
at org.apache.hadoop.security.SecurityUtil.getLocalHostName(SecurityUtil.java:227)
at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:247)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.doSecureLogin(ResourceManager.java:685)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:629)
... 2 more
2014-03-15 16:15:10,308 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager metrics system...
2014-03-15 16:15:10,308 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system stopped.
2014-03-15 16:15:10,308 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system shutdown complete.
2014-03-15 16:15:10,309 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to login
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:631)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:872)
Caused by: java.net.UnknownHostException: localhost.localdomain: localhost.localdomain
at java.net.InetAddress.getLocalHost(InetAddress.java:1425)
at org.apache.hadoop.security.SecurityUtil.getLocalHostName(SecurityUtil.java:227)
at org.apache.hadoop.security.SecurityUtil.login(SecurityUtil.java:247)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.doSecureLogin(ResourceManager.java:685)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:629)


This error arose because, for problem 2, I had only changed the HDFS configuration file (hdfs-site.xml) and not the /etc/hosts file.

No way around it, so /etc/hosts had to change too:

# Do not remove the following line, or various programs
# that require network functionality will fail.
#127.0.0.1 localhost.localdomain localhost
#::1 localhost6.localdomain6 localhost6
127.0.0.1 localhost


Note that changing this alone is not enough; /etc/sysconfig/network must be modified as well.

Running the hostname command shows the current hostname.

In other words, when Hadoop formats HDFS, the hostname it picks up via the hostname command is localhost.localdomain. When it then looks that name up in the /etc/hosts file, no matching entry is found, so localhost.localdomain cannot be mapped to any IP address, hence the error. Modify the /etc/sysconfig/network file:

NETWORKING=yes
NETWORKING_IPV6=yes
#HOSTNAME=localhost.localdomain
HOSTNAME=localhost
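
To pick up the new name without rebooting and confirm it now resolves, something like the following works (standard Linux commands, not ones from the original post; setting the hostname needs root):

hostname localhost        # set the running hostname immediately
hostname                  # should now print: localhost
getent hosts localhost    # should map to 127.0.0.1 via /etc/hosts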


Re-run the MapReduce job:

hadoop jar hadoop-mapreduce-examples-2.2.0.jar pi 200 1000

OK.

Reference: http://blog.csdn.net/shirdrn/article/details/6562292
