0. Preface
This post is a record of day-to-day work, part of my "a worn pen beats a good memory" notes series.
"In front of the source code, there are no secrets." (by 侯捷, Hou Jie)
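A recent C++ project of mine used ZooKeeper for service discovery, and along the way I was troubled by a SessionTimeOut problem: I had clearly set the timeout through the zookeeper c client, yet it had no effect.
Please forgive my unfamiliarity with zookeeper at the start. By analyzing the source I eventually learned that the final SessionTimeOut is decided by negotiation between client and server, rather than by a configured value simply taking effect.
The analysis of the session timeout is recorded here, based on zookeeper 3.4.8.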
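1. ZooKeeper client SessionTimeOut
The project uses the C client and creates the zk session with zookeeper_init. Calling zookeeper_init starts the process of connecting to the ZK ensemble. The recv_timeout passed here is the session timeout the client would like to have, in milliseconds.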
ZOOAPI zhandle_t *zookeeper_init(const char *host, watcher_fn fn, int recv_timeout, const clientid_t *clientid, void *context, int flags);
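Once the connection succeeds, the client starts the handshake. As the code shows, the recv_timeout set earlier is sent to the server as part of the handshake request.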
static int prime_connection(zhandle_t *zh)
{
    int rc;
    /*this is the size of buffer to serialize req into*/
    char buffer_req[HANDSHAKE_REQ_SIZE];
    int len = sizeof(buffer_req);
    int hlen = 0;
    struct connect_req req;
    req.protocolVersion = 0;
    req.sessionId = zh->seen_rw_server_before ? zh->client_id.client_id : 0;
    req.passwd_len = sizeof(req.passwd);
    memcpy(req.passwd, zh->client_id.passwd, sizeof(zh->client_id.passwd));
    req.timeOut = zh->recv_timeout;   /* <- the requested timeOut is set here */
    req.lastZxidSeen = zh->last_zxid;
    req.readOnly = zh->allow_read_only;
    hlen = htonl(len);
    /* We are running fast and loose here, but this string should fit in the initial buffer! */
    rc=zookeeper_send(zh->fd, &hlen, sizeof(len));
    serialize_prime_connect(&req, buffer_req);
    rc=rc<0 ? rc : zookeeper_send(zh->fd, buffer_req, len);
    if (rc<0) {
        return handle_socket_error_msg(zh, __LINE__, ZCONNECTIONLOSS,
                "failed to send a handshake packet: %s", strerror(errno));
    }
    ……
Next, the logic that handles the handshake response:
static int check_events(zhandle_t *zh, int events)
{
    if (zh->fd == -1)
        return ZINVALIDSTATE;
    ……
    ……
    ……
    deserialize_prime_response(&zh->primer_storage, zh->primer_buffer.buffer);
    /* We are processing the primer_buffer, so we need to finish
     * the connection handshake */
    oldid = zh->client_id.client_id;
    newid = zh->primer_storage.sessionId;
    if (oldid != 0 && oldid != newid) {
        zh->state = ZOO_EXPIRED_SESSION_STATE;
        errno = ESTALE;
        return handle_socket_error_msg(zh,__LINE__,ZSESSIONEXPIRED,
                "sessionId=%#llx has expired.",oldid);
    } else {
        zh->recv_timeout = zh->primer_storage.timeOut;   /* overwritten with the timeout from the response */
        zh->client_id.client_id = newid;
    }
    ……
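At this point it is clear that the client's final SessionTimeOut is the value the server sends back in the handshake response, which is not necessarily the value that was set initially.
2. ZooKeeper Server SessionTimeOut
2.1 Negotiating the SessionTimeOut reported by the client
Here is how the server handles the handshake.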
public void processConnectRequest(ServerCnxn cnxn, ByteBuffer incomingBuffer) throws IOException {
    BinaryInputArchive bia = BinaryInputArchive.getArchive(new ByteBufferInputStream(incomingBuffer));
    ConnectRequest connReq = new ConnectRequest();
    connReq.deserialize(bia, "connect");
    ………………
    // Negotiate using the timeout reported by the client and the server's own bounds:
    // if the reported timeout is below minSessionTimeout, use minSessionTimeout;
    // if the reported timeout is above maxSessionTimeout, use maxSessionTimeout;
    // if it lies between the two, use the reported value.
    int sessionTimeout = connReq.getTimeOut();
    byte passwd[] = connReq.getPasswd();
    int minSessionTimeout = getMinSessionTimeout();
    if (sessionTimeout < minSessionTimeout) {
        sessionTimeout = minSessionTimeout;
    }
    int maxSessionTimeout = getMaxSessionTimeout();
    if (sessionTimeout > maxSessionTimeout) {
        sessionTimeout = maxSessionTimeout;
    }
    cnxn.setSessionTimeout(sessionTimeout);
    ………………
}
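In one sentence: the timeout the client asks for must lie between the lower and upper bounds set on the server; if it falls outside that range, the bound it crossed is used instead.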
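To make the rule concrete, here is a minimal C sketch of the same clamping. The function name negotiate_session_timeout is my own invention, not part of ZooKeeper, and the assert on the bounds is an extra assumption that the server code shown above does not make.

```c
#include <assert.h>

/*
 * Hypothetical helper that mirrors the clamping in processConnectRequest.
 * All values are in milliseconds.
 */
int negotiate_session_timeout(int requested_ms, int min_ms, int max_ms)
{
    /* Assumption for this sketch only: the server bounds are consistent. */
    assert(min_ms <= max_ms);

    int timeout_ms = requested_ms;
    if (timeout_ms < min_ms) {
        timeout_ms = min_ms;   /* too small: raised to the lower bound */
    }
    if (timeout_ms > max_ms) {
        timeout_ms = max_ms;   /* too large: lowered to the upper bound */
    }
    return timeout_ms;
}
```

With bounds of 4000 and 40000 milliseconds, a request of 1000 comes back as 4000, a request of 30000 is kept unchanged, and a request of 100000 comes back as 40000.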
2.2 How the server determines MinSessionTimeOut and MaxSessionTimeOut
Next, look at where tickTime and the two bounds are declared.
public static final int DEFAULT_TICK_TIME = 3000;
protected int tickTime = DEFAULT_TICK_TIME;
/** value of -1 indicates unset, use default */
protected int minSessionTimeout = -1;
/** value of -1 indicates unset, use default */
protected int maxSessionTimeout = -1;
protected SessionTracker sessionTracker;
tickTime defaults to 3000 milliseconds, and minSessionTimeout and maxSessionTimeout both default to -1, meaning "unset".
public int getTickTime() {
    return tickTime;
}

public void setTickTime(int tickTime) {
    LOG.info("tickTime set to " + tickTime);
    this.tickTime = tickTime;
}

public int getMinSessionTimeout() {
    // If minSessionTimeout is unset (-1), fall back to 2 * tickTime.
    return minSessionTimeout == -1 ? tickTime * 2 : minSessionTimeout;
}

public void setMinSessionTimeout(int min) {
    LOG.info("minSessionTimeout set to " + min);
    this.minSessionTimeout = min;
}

public int getMaxSessionTimeout() {
    // If maxSessionTimeout is unset (-1), fall back to 20 * tickTime.
    return maxSessionTimeout == -1 ? tickTime * 20 : maxSessionTimeout;
}

public void setMaxSessionTimeout(int max) {
    LOG.info("maxSessionTimeout set to " + max);
    this.maxSessionTimeout = max;
}
So when minSessionTimeOut and maxSessionTimeOut are left unset, they are derived from tickTime: 2 times and 20 times tickTime respectively. The analysis continues below.
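The fallback rule in the two getters can be restated as a small C sketch. The function names and the SESSION_TIMEOUT_UNSET macro are hypothetical names of mine; only the -1 sentinel and the factors 2 and 20 come from the server source quoted above. The assert on tickTime is an extra assumption of this sketch.

```c
#include <assert.h>

/* The server uses -1 to mean "not configured" (macro name is hypothetical). */
#define SESSION_TIMEOUT_UNSET (-1)

/* Lower bound in milliseconds: 2 * tickTime unless configured explicitly. */
int effective_min_session_timeout(int tick_time_ms, int configured_min_ms)
{
    assert(tick_time_ms > 0);   /* assumption for this sketch only */
    return configured_min_ms == SESSION_TIMEOUT_UNSET
               ? tick_time_ms * 2
               : configured_min_ms;
}

/* Upper bound in milliseconds: 20 * tickTime unless configured explicitly. */
int effective_max_session_timeout(int tick_time_ms, int configured_max_ms)
{
    assert(tick_time_ms > 0);   /* assumption for this sketch only */
    return configured_max_ms == SESSION_TIMEOUT_UNSET
               ? tick_time_ms * 20
               : configured_max_ms;
}
```

With the built-in tickTime of 3000 and nothing configured, the bounds come out as 6000 and 60000 milliseconds. An explicitly configured bound is returned as-is, whatever tickTime is.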
public ZooKeeperServer(FileTxnSnapLog txnLogFactory, int tickTime,
        int minSessionTimeout, int maxSessionTimeout,
        DataTreeBuilder treeBuilder, ZKDatabase zkDb) {
    serverStats = new ServerStats(this);
    this.txnLogFactory = txnLogFactory;
    this.zkDb = zkDb;
    this.tickTime = tickTime;
    this.minSessionTimeout = minSessionTimeout;
    this.maxSessionTimeout = maxSessionTimeout;
    LOG.info("Created server with tickTime " + tickTime
            + " minSessionTimeout " + getMinSessionTimeout()
            + " maxSessionTimeout " + getMaxSessionTimeout()
            + " datadir " + txnLogFactory.getDataDir()
            + " snapdir " + txnLogFactory.getSnapDir());
}
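tickTime, minSessionTimeOut and maxSessionTimeOut are passed in through this constructor. There is also a no-argument constructor, along with setters and getters for these parameters.
Continuing the analysis: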
public void runFromConfig(ServerConfig config) throws IOException {
    LOG.info("Starting server");
    FileTxnSnapLog txnLog = null;
    try {
        // Note that this thread isn't going to be doing anything else,
        // so rather than spawning another thread, we will just call
        // run() in this thread.
        // create a file logger url from the command line args
        ZooKeeperServer zkServer = new ZooKeeperServer();
        txnLog = new FileTxnSnapLog(new File(config.dataLogDir), new File(
                config.dataDir));
        zkServer.setTxnLogFactory(txnLog);
        // Besides their defaults, the parameters actually used at runtime
        // can be set through the configuration file.
        zkServer.setTickTime(config.tickTime);
        zkServer.setMinSessionTimeout(config.minSessionTimeout);
        zkServer.setMaxSessionTimeout(config.maxSessionTimeout);
        cnxnFactory = ServerCnxnFactory.createFactory();
        cnxnFactory.configure(config.getClientPortAddress(),
                config.getMaxClientCnxns());
        cnxnFactory.startup(zkServer);
        cnxnFactory.join();
        if (zkServer.isRunning()) {
            zkServer.shutdown();
        }
    } catch (InterruptedException e) {
        // warn, but generally this is ok
        LOG.warn("Server interrupted", e);
    } finally {
        if (txnLog != null) {
            txnLog.close();
        }
    }
}
Now the problem is clear: the SessionTimeOut bounds can be changed through configuration. The default configuration file sets only tickTime, as shown here.
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/tmp/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
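For comparison, this is a sketch of what an explicit configuration might look like. minSessionTimeout and maxSessionTimeout are the server's configuration keys for the two bounds, in milliseconds; the values below are made up for illustration and are not a recommendation.

```
# The number of milliseconds of each tick
tickTime=2000
# Hypothetical bounds: accept client-requested session timeouts
# between 4 seconds and 120 seconds
minSessionTimeout=4000
maxSessionTimeout=120000
```

With these settings, a client that passes a recv_timeout of 60000 to zookeeper_init would keep 60 seconds, instead of being capped at the default upper bound of 20 * tickTime = 40 seconds.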
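3. Summary
From the source analysis, SessionTimeOut is negotiated as follows:
Case 1: the configuration file sets maxSessionTimeOut and minSessionTimeOut
The final SessionTimeOut must lie in the range from minSessionTimeOut to maxSessionTimeOut. If the requested value falls outside the range, the bound it crossed (upper or lower) is used.
Case 2: the configuration file does not set maxSessionTimeOut and minSessionTimeOut
If maxSessionTimeOut is not configured, it is set to 20 * tickTime.
If minSessionTimeOut is not configured, it is set to 2 * tickTime.
That is, with the default configuration file, where tickTime is 2 seconds, the valid SessionTimeOut range is 4 to 40 seconds.
If tickTime is not configured either, it defaults to 3 seconds, which makes the range 6 to 60 seconds.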
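When you hit a problem like this, going to the source code is the surest approach: it deepens understanding and makes the lesson stick.
That is all for this post. If anything is lacking, corrections are welcome; leave a comment or contact fobcrackgp@163.com.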
This is an original article by 笃行. Permanent link:
https://www.ifobnn.com/zookeepersessiontimeout.html