Zookeeper的会话状态变迁图:
Connection Loss:
CONNECTION_LOSS意味着客户端和服务器端的连接断开,比如,客户端创建一个Zookeeper实例,开始客户端和服务端的会话,然后进行一系列的操作。如果客户端挂了,网络出现异常或者服务器端挂了,都会导致
客户端和服务器端的连接断开。连接断开时,如果客户端进程正常工作,它将收到一个Disconnected事件,收到此事件,客户端不能假设是服务器端挂了还是网络出现问题,同样,服务器如果仍然正常工作,也不能假设是客户端挂了还是出现了网络问题。
在Connection Loss的情况下,客户端不能假设它之前发出的请求是否已经成功执行,例如客户端发起一个创建一个znode的操作,这个请求的处理可能存在如下几种情况
1. 请求发送到服务器端,服务端执行完,返回的过程中,连接断开
2. 请求尚未发送到服务器端,因此请求压根没有执行
3. 请求发送到服务器端,请求在执行过程中,服务器端挂了
第三种情况是一种极限的情况,对于一致性要求很高的场景,这个请求已经执行的部分操作应该全部失败,服务器端的状态应该是请求未执行前的状态,Zookeeper的读写操作都是原子操作,因此可以保证不会部分读取和部分写入的情况,这就保证了数据一致性。
客户端与服务器断开链接后,客户端不能确定是网络链接问题还是Zookeeper服务器挂了,因此,客户端在受到COLLECTION_LOSS事件后,
1.客户端不需要重新创建一个Zookeeper会话,客户端在Zookeeper Client Library的帮助下会持续处于CONNECTING状态,不会出现会话超时的情况(虽然会话超时时间在客户端创建Zookeeper时指定,但是Zookeeper Client Libarary不会检测会话超时)
2.客户端需要检测上次上次操作的执行情况,比如通过检查znode是否存在以判断znode是否创建成功,检查znode的数据以判断znode是否更新成功
3.在1中提到,在服务器不可用的情况,客户端在Zookeeper Client Library的帮助下会持续处于CONNECTING状态,当Zookeeper服务器恢复可用的情况下,Zookeeper尝试于Zookeeper服务器恢复链接,加入在session超时之前,恢复链接,那么对于客户端来说,会话恢复,包括已经注册的watcher,客户端会受到一个SyncConnection事件;如果超时,那么客户端会收到一个Session Expired事件。
假如session的超时时间是10s,而session持续3s的时候链接断开,那么当链接恢复时,在7s内完成就会恢复会话,如果超过7s,那么客户端会收到session expired事件(这个计算方式是否正确??)
2. How should I handle the CONNECTION_LOSS error?
CONNECTION_LOSS means the link between the client and server was broken. It doesn't necessarily mean that the request failed. If you are doing a create request and the link was broken after the request reached the server and before the response was returned, the create request will succeed. If the link was broken before the packet went onto the wire, the create request failed. Unfortunately, there is no way for the client library to know, so it returns CONNECTION_LOSS. The programmer must figure out if the request succeeded or needs to be retried. Usually this is done in an application specific way. Examples of success detection include checking for the presence of a file to be created or checking the value of a znode to be modified.
When a client (session) becomes partitioned from the ZK serving cluster it will begin searching the list of servers that were specified during session creation. Eventually, when connectivity between the client and at least one of the servers is re-established, the session will either again transition to the "connected" state (if reconnected within the session timeout value) or it will transition to the "expired" state (if reconnected after the session timeout). The ZK client library will handle reconnect for you automatically. In particular we have heuristics built into the client library to handle things like "herd effect", etc... Only create a new session when you are notified of session expiration (mandatory).
Session Expired
客户端收到会话超时的事件后,表明这个Zookeeper对象已经不可再使用,需要重新初始化一个
客户端长时间不做增删改查znode操作,客户端并没有收到会话超时,原因是客户端会定时的向Zookeeper服务器端发送心跳包,以保持会话有效
客户端链接到一个Zookeeper集群中,如果它链接的server挂了,Zookeeper Client Library会自动将它与其它server链接,只要会话还没有超时,那么session会保持到原来的状态,包括已经在该session上注册的watcher
How should I handle SESSION_EXPIRED?
SESSION_EXPIRED automatically closes the ZooKeeper handle. In a correctly operating cluster, you should never see SESSION_EXPIRED. It means that the client was partitioned off from the ZooKeeper service for more the the session timeout and ZooKeeper decided that the client died. Because the ZooKeeper service is ground truth, the client should consider itself dead and go into recovery. If the client is only reading state from ZooKeeper, recovery means just reconnecting. In more complex applications, recovery means recreating ephemeral nodes, vying for leadership roles, and reconstructing published state.
Library writers should be conscious of the severity of the expired state and not try to recover from it. Instead libraries should return a fatal error. Even if the library is simply reading from ZooKeeper, the user of the library may also be doing other things with ZooKeeper that requires more complex recovery.
Session expiration is managed by the ZooKeeper cluster itself, not by the client. When the ZK client establishes a session with the cluster it provides a "timeout" value. This value is used by the cluster to determine when the client's session expires. Expirations happens when the cluster does not hear from the client within the specified session timeout period (i.e. no heartbeat). At session expiration the cluster will delete any/all ephemeral nodes owned by that session and immediately notify any/all connected clients of the change (anyone watching those znodes). At this point the client of the expired session is still disconnected from the cluster, it will not be notified of the session expiration until/unless it is able to re-establish a connection to the cluster. The client will stay in disconnected state until the TCP connection is re-established with the cluster, at which point the watcher of the expired session will receive the "session expired" notification.
Example state transitions for an expired session as seen by the expired session's watcher:
- 'connected' : session is established and client is communicating with cluster (client/server communication is operating properly)
- .... client is partitioned from the cluster
- 'disconnected' : client has lost connectivity with the cluster
- .... time elapses, after 'timeout' period the cluster expires the session, nothing is seen by client as it is disconnected from cluster
- .... time elapses, the client regains network level connectivity with the cluster
- 'expired' : eventually the client reconnects to the cluster, it is then notified of the expiration
参考
1. Zookeeper FAQ
http://wiki.apache.org/hadoop/ZooKeeper/FAQ#1
2. Zookeeper关于超时和链接断开的邮件组讨论
http://markmail.org/message/p5j7rvy5zf5qjsje#query:+page:1+mid:s4d7dnsxulv5yieh+state:results
3.关于Session Timeout和Connection Lost的Blog
http://www.ngdata.com/so-you-want-to-be-a-zookeeper/
- 大小: 49.1 KB
分享到:
相关推荐
zookeeper学习笔记
ZooKeeper会话超时以及重连机制
Zookeeper学习笔记
自己整理的ZooKeeper学习笔记,适合刚刚接触ZooKeeper的人学习
java ZooKeeper学习笔记\ZooKeeper原理、运用
ZooKeeper是一种为分布式应用所设计的高可用、高性能且一致的开源协调服务,...有了这些数据结构和原语还不够,因为我们的ZooKeeper是工作在一个分布式的环境下,我们的服务是通过消息以网络的形式发送给我们的分布式应
本文适合但不限于软件开发人员阅读。本文档能够使阅读者对zookeeper有一个宏观且全面的了解,内容主要包含zookeeper架构、数据模型、读写及工作原理、典型应用场景、指令汇总等,
尚硅谷2021 zookeeper 笔记
zookeeper客户端连接工具,亲测有效
zookeeper笔记
hadoop,hbase,zookeeper安装笔记hadoop,hbase,zookeeper安装笔记hadoop,hbase,zookeeper安装笔记
资源名称:zookeeper笔记和搭建 资源太大,传百度网盘了,链接在附件中,有需要的同学自取。
zookeeper笔记
zookeeper笔记.pdf
zookeeper连接工具,可视化数据查看
zookeeper连接工具zktools.rar
如果在创建znode时Flag设置为EPHEMERAL,那么当创建这个znode的节点和Zookeeper失去连接后,这个znode将不再存在在Zookeeper里,Zookeeper使用Watcher察觉事件信息。当客户端接收到事件信息,比如连接超时、节点数据...
适合初学入门,知识巩固。涵盖安装配置、命令操作、Java API操作、事件监听、分布式锁、集群搭建等知识
今天小编就为大家分享一篇关于Zookeeper连接超时问题与拒绝连接的解决方案,小编觉得内容挺不错的,现在分享给大家,具有很好的参考价值,需要的朋友一起跟随小编来看看吧