尘事随缘

HBase failure analysis: RegionServer suddenly goes down, and goes down again right after restart

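Recently, the RegionServer in our test environment kept going down unexpectedly, and restarting the node did not help. Having no better option, I spent a long time reading the logs and found the following: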
2015-02-13 05:40:04,325 WARN  [regionserver60020] zookeeper.RecoverableZooKeeper: Node /hbase/rs/slave2,60020,1423777199540 already deleted, retry=false
2015-02-13 05:40:04,325 WARN  [regionserver60020] regionserver.HRegionServer: Failed deleting my ephemeral node
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/rs/slave2,60020,1423777199540
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:179)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1273)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1262)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1342)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1054)
	at java.lang.Thread.run(Thread.java:745)
2015-02-13 05:40:04,329 INFO  [regionserver60020-EventThread] zookeeper.ClientCnxn: EventThread shut down
2015-02-13 05:40:04,329 INFO  [regionserver60020] zookeeper.ZooKeeper: Session: 0x14b7113ebc50012 closed
2015-02-13 05:40:04,329 INFO  [regionserver60020] regionserver.HRegionServer: stopping server null; zookeeper connection closed.
2015-02-13 05:40:04,330 INFO  [regionserver60020] regionserver.HRegionServer: regionserver60020 exiting


After a long search the problem was still unresolved, and I had no leads...

I had a cup of tea, kept scrolling up through the log, and suddenly found the lifeline:
2015-02-13 05:40:04,294 FATAL [regionserver60020] regionserver.HRegionServer: Master rejected startup because clock is out of sync
org.apache.hadoop.hbase.ClockOutOfSyncException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server slave2,60020,1423777199540 has been rejected; Reported time is too far out of sync with master.  Time difference of 71419ms > max allowed of 30000ms
	at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:345)
	at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:238)
	at org.apache.hadoop.hbase.master.HMaster.regionServerStartup(HMaster.java:1294)
	at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:7910)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
	at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:74)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)


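Problem found: the clocks on the Master server and the RegionServer were out of sync because no time synchronization service was installed. The RegionServer's clock differed from the Master's by 71419 ms, more than the 30000 ms allowed, so the Master rejected its startup. The ZooKeeper NoNodeException I found first is logged after this FATAL entry (05:40:04,325 vs. 05:40:04,294), so it is a side effect of the RegionServer shutting down, not the cause.
Manually set the RegionServer's clock with `date -s <time>`, then restart the RegionServer, and the problem is solved.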
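To make the Master's rule concrete, here is a minimal Java sketch of the comparison implied by the FATAL log line: take the absolute difference between the Master's clock and the time reported by the RegionServer, and reject the startup when it is strictly greater than the limit. The class and method names (`ClockSkewCheck`, `skewMillis`, `isRejected`, `rejectionMessage`) are hypothetical; this is a model of the log message, not HBase's actual source. In HBase the 30000 ms limit is controlled by the `hbase.master.maxclockskew` property.

```java
// Hypothetical, simplified model of the Master-side clock-skew check.
// NOT HBase source code; it only reproduces the rule stated in the FATAL log line.
final class ClockSkewCheck {

    // Limit reported in the log ("max allowed of 30000ms").
    static final long DEFAULT_MAX_SKEW_MS = 30_000L;

    // Absolute clock difference in milliseconds; direction does not matter.
    static long skewMillis(long masterTimeMs, long regionServerTimeMs) {
        return Math.abs(masterTimeMs - regionServerTimeMs);
    }

    // True when the skew is strictly greater than the limit, i.e. startup is refused.
    static boolean isRejected(long masterTimeMs, long regionServerTimeMs, long maxSkewMs) {
        return skewMillis(masterTimeMs, regionServerTimeMs) > maxSkewMs;
    }

    // Builds a message shaped like the one in the log, or returns null if accepted.
    static String rejectionMessage(String serverName, long masterTimeMs,
                                   long regionServerTimeMs, long maxSkewMs) {
        long skew = skewMillis(masterTimeMs, regionServerTimeMs);
        if (skew <= maxSkewMs) {
            return null;
        }
        return "Server " + serverName + " has been rejected; Reported time is too far out of sync with master.  "
            + "Time difference of " + skew + "ms > max allowed of " + maxSkewMs + "ms";
    }

    public static void main(String[] args) {
        long master = System.currentTimeMillis();
        // The log does not say which clock was ahead; either sign gives the same skew.
        long regionServer = master - 71_419L;
        System.out.println(rejectionMessage("slave2,60020,1423777199540",
                master, regionServer, DEFAULT_MAX_SKEW_MS));
        // After correcting the clock (e.g. with `date -s`), a small skew is accepted.
        System.out.println(isRejected(master, master - 500L, DEFAULT_MAX_SKEW_MS));
    }
}
```

Running `main` prints the same rejection text as the log for the 71419 ms case, followed by `false` for a 500 ms skew. Manually resetting the clock only fixes the current drift; a sync service keeps the skew under the limit permanently.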

Next step: install an NTP service in the test environment as well.