Implementing Memcached multi get with NIO

maoyidao

Blog categories: Technology, High-Performance Communication Architecture

java NIO Memcached

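Memcached (MC for short) is widely used across the Internet industry and is one of the most basic pieces of a web architecture. Yet MC's mget (fetching multiple values in one call) has long been a hard problem: our requirement is that mget performance stay as close as possible to that of an ordinary single Memcached get. The pseudocode below shows how to implement mget with performance close to a single-value get, followed by a discussion of some problems this design ran into in a real environment.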

Scenario

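Before getting into the topic, consider one question: why do we need MC mget at all? Doesn't Redis already implement rich data structures such as list, hashset, hashtable and zset? The answer starts from our own use case. After logging in, a user updates their own status and also fetches the statuses of the people they follow. Updating one's own status is a single MC set. The list of followed users can be read from Redis, where the key is the user's uid and the value is the list of followed users. Fetching the status of one followed user is a single MC get by that user's uid, with O(1) time complexity. The naive approach is a for loop in the program that gets each followed user's status from MC in turn, which makes the whole fetch O(n). Once the follow list grows to 2000 entries, with each MC get averaging 2~5 ms, this linear loop takes 4 to 10 s (2000 × 2~5 ms), which is completely unacceptable. How can this be solved?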
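To make the arithmetic concrete, here is a minimal sketch. The class and method names are hypothetical, and it assumes the gets are issued strictly one after another with a fixed latency per get.

```java
// Hypothetical helper: estimates the cost of fetching statuses one by one.
final class SequentialGetCost {
    // Total time in milliseconds when `keys` gets run back to back,
    // each taking `perGetMillis` milliseconds.
    static long totalMillis(int keys, long perGetMillis) {
        return (long) keys * perGetMillis;
    }
}
```

With 2000 followed users, `SequentialGetCost.totalMillis(2000, 2)` and `SequentialGetCost.totalMillis(2000, 5)` bound the sequential cost between 4000 ms and 10000 ms.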

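Implementing mget with NIO: executing MC gets concurrently

danga.memcached 2.0.1 already uses NIO to implement mget, but its implementation has some problems; see http://blog.csdn.net/e_wsq/article/details/7876801. The mget pseudocode is as follows: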

private final class Conn {
	public SocketChannel channel;
	public ByteBuffer outgoing;
	// A list of ByteBuffers stores the content read back from MC
	public List<ByteBuffer> incoming = new ArrayList<ByteBuffer>();

	public Conn(Selector selector) throws IOException {
		channel = getSock().getChannel();
		channel.configureBlocking( false );
		channel.register( selector, SelectionKey.OP_WRITE | SelectionKey.OP_READ, this );

		// request (built elsewhere, not shown) is the command line to send
		outgoing = ByteBuffer.wrap( request.append( "\r\n" ).toString().getBytes() );
	}

	public boolean isFinished() {
		// returns true once the data read so far ends with "END\r\n"
	}

	// Returns the last buffer if it still has room, otherwise appends a new one
	public ByteBuffer getBuffer() {
		int last = incoming.size()-1;
		if ( last >= 0 && incoming.get( last ).hasRemaining() ) {
			return incoming.get( last );
		}
		else {
			ByteBuffer newBuf = ByteBuffer.allocate( 8192 );
			incoming.add( newBuf );
			return newBuf;
		}
	}
}

public Object getMulti() throws Exception {
	selector = Selector.open();
	Conn conn = new Conn(selector);

	try {
		long startTime = System.currentTimeMillis();
		long timeRemaining = timeout;
		while ( timeRemaining > 0 ) {
			int n = selector.select( timeout );
			if ( n > 0 ) {
				Iterator<SelectionKey> it = selector.selectedKeys().iterator();
				while ( it.hasNext() ) {
					SelectionKey key = it.next();
					it.remove();

					if ( key.isReadable() )
						readResponse( key );
					else if ( key.isWritable() )
						writeRequest( key );
				}
			}
			else {
				// error...
			}

			timeRemaining = timeout - (System.currentTimeMillis() - startTime);
		}
	}
	finally {
		selector.close();
	}
}

public void writeRequest( SelectionKey key ) throws IOException {
	ByteBuffer buf = ((Conn) key.attachment()).outgoing;
	SocketChannel sc = (SocketChannel)key.channel();

	if ( buf.hasRemaining() ) {
		sc.write( buf );
	}

	if ( !buf.hasRemaining() ) {
	   // switching to read mode for server
		key.interestOps( SelectionKey.OP_READ );
	}
}

public void readResponse( SelectionKey key ) throws IOException {
	Conn conn = (Conn)key.attachment();
	ByteBuffer buf = conn.getBuffer();
	int count = conn.channel.read( buf );
	if ( count > 0 ) {
		if ( log.isDebugEnabled() )
			log.debug( "read  " + count + " from " + conn.channel.socket().getInetAddress() );

		if ( conn.isFinished() ) {
			...
			return;
		}
	}
}

The pseudocode mainly shows the NIO-related logic. The benefit of a concurrent mget is obvious, but this code has several equally obvious pitfalls.
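The isFinished() placeholder in the pseudocode is left as a comment. One possible way to fill it in is sketched below, assuming the memcached text protocol, in which the reply to a get ends with the line "END\r\n". The class name EndMarker and the helper filled() are hypothetical, and the buffers are assumed to be in write mode, so position() is the number of bytes received so far.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper: checks whether the bytes received so far end with "END\r\n".
final class EndMarker {
    private static final byte[] END = "END\r\n".getBytes(StandardCharsets.US_ASCII);

    // Walks backwards from the newest byte, crossing buffer boundaries,
    // and compares against the marker from its last byte to its first.
    static boolean isFinished(List<ByteBuffer> incoming) {
        int idx = END.length - 1;
        for (int i = incoming.size() - 1; i >= 0 && idx >= 0; i--) {
            ByteBuffer b = incoming.get(i);
            for (int p = b.position() - 1; p >= 0 && idx >= 0; p--, idx--) {
                if (b.get(p) != END[idx]) {
                    return false;
                }
            }
        }
        // idx < 0 means every marker byte was matched
        return idx < 0;
    }

    // Test helper: a buffer exactly filled with the given text, left in write mode.
    static ByteBuffer filled(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return ByteBuffer.allocate(bytes.length).put(bytes);
    }
}
```

A suffix check like this is only a heuristic: a stored value whose data happens to end in "END" can produce a false positive if a read stops right after that value's trailing "\r\n". A stricter parser would track the byte count announced in each VALUE header.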

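Pitfalls in the mget pseudocode

1. The "Too many open files" pitfall

Does every getMulti call really execute Selector.open()? On Linux, Selector.open() opens a pipe, that is, a pair of file descriptors (see http://blog.csdn.net/haoel/article/details/2224055). When the subsequent I/O is slow, the selector cannot be closed promptly, so a large number of pipes pile up and eventually cause the "Too many open files" error. The usual NIO pattern is to keep a single global selector; after a new channel is registered, calling selector.wakeup() is enough.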
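The "single global selector plus wakeup()" pattern can be sketched roughly as follows. This is only an illustration under assumptions, not the danga client's code: SharedSelectorLoop, Handler, Pending and demo() are hypothetical names, and a java.nio.channels.Pipe stands in for the Memcached socket so the example runs without a server.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: one selector opened once and shared by all requests.
final class SharedSelectorLoop implements Runnable {
    interface Handler { void onReady(SelectionKey key) throws IOException; }

    private static final class Pending {
        final SelectableChannel channel; final int ops; final Handler handler;
        Pending(SelectableChannel c, int o, Handler h) { channel = c; ops = o; handler = h; }
    }

    private final Selector selector;
    private final Queue<Pending> pending = new ConcurrentLinkedQueue<>();
    private volatile boolean running = true;

    SharedSelectorLoop() throws IOException { selector = Selector.open(); }

    // Callable from any thread: queue the channel, then wake the loop thread up.
    void register(SelectableChannel ch, int ops, Handler h) {
        pending.add(new Pending(ch, ops, h));
        selector.wakeup();
    }

    void shutdown() { running = false; selector.wakeup(); }

    @Override public void run() {
        try {
            while (running) {
                selector.select(1000L);
                // Registration happens on the loop thread, after select() returns.
                for (Pending p; (p = pending.poll()) != null; ) {
                    p.channel.configureBlocking(false);
                    p.channel.register(selector, p.ops, p.handler);
                }
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (key.isValid()) ((Handler) key.attachment()).onReady(key);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            try { selector.close(); } catch (IOException ignored) { }
        }
    }

    // Demo: registers a pipe's read end, writes 5 bytes, returns the bytes read.
    static int demo() {
        try {
            SharedSelectorLoop loop = new SharedSelectorLoop();
            Thread t = new Thread(loop, "shared-selector");
            t.start();
            Pipe pipe = Pipe.open();
            AtomicInteger bytesRead = new AtomicInteger();
            CountDownLatch done = new CountDownLatch(1);
            loop.register(pipe.source(), SelectionKey.OP_READ, key -> {
                int n = ((Pipe.SourceChannel) key.channel()).read(ByteBuffer.allocate(16));
                if (n > 0) { bytesRead.addAndGet(n); done.countDown(); }
            });
            pipe.sink().write(ByteBuffer.wrap(new byte[] {1, 2, 3, 4, 5}));
            done.await(5, TimeUnit.SECONDS);
            loop.shutdown();
            t.join(5000);
            pipe.sink().close();
            pipe.source().close();
            return bytesRead.get();
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Queuing the registration and performing it on the loop thread avoids calling register() from another thread while select() is in progress, which is the reason wakeup() is needed at all. However many getMulti calls are in flight, only one selector (and its file descriptors) exists.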

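2. The infinite-loop pitfall

Java 6 NIO has two well-known bugs: http://bugs.sun.com/view_bug.do?bug_id=6693490 and http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6403933. In short, a selector should return from select() in only two cases: a network event occurred, or the timeout expired. Yet the selector sometimes returns without having selected any SelectionKey, which is a Java 6 NIO bug. The mget pseudocode above has no handling for this and can easily end up spinning in a loop. We can borrow MINA's workaround; the pseudocode is as follows: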

long t0 = System.currentTimeMillis();
int selected = select(1000L);
long t1 = System.currentTimeMillis();
long delta = (t1 - t0);

if ((selected == 0) && !wakeupCalled.get() && (delta < 100)) {
    // Last chance : the select() may have been
    // interrupted because we have had an closed channel.
    if (isBrokenConnection()) {
        LOG.warn("Broken connection");

        // we can reselect immediately
        // set back the flag to false
        wakeupCalled.getAndSet(false);

        continue;
    } else {
        LOG.warn("Create a new selector. Selected is 0, delta = " + (t1 - t0));
        // Ok, we are hit by the nasty epoll
        // spinning.
        // Basically, there is a race condition
        // which causes a closing file descriptor not to be
        // considered as available as a selected channel, but
        // it stopped the select. The next time we will
        // call select(), it will exit immediately for the same
        // reason, and do so forever, consuming 100%
        // CPU.
        // We have to destroy the selector, and
        // register all the socket on a new one.
        registerNewSelector();
    }

    // Set back the flag to false
    wakeupCalled.getAndSet(false);

    // and continue the loop
    continue;
}

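This code is very clear: if select() returns 0, wakeup() was not called, the connection is not broken, and less than 100 ms have elapsed, it concludes that the Java NIO bug has been triggered. The fix is to rebuild the selector. Another example worth studying is Jetty's approach: http://wiki.eclipse.org/Jetty/Feature/JVM_NIO_Bug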
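The MINA excerpt calls registerNewSelector() without showing its body. A minimal sketch of what such a rebuild step might look like follows. It is an assumption for illustration, not MINA's actual code: SelectorRebuilder, rebuild() and demo() are hypothetical names, and the method is assumed to run on the selector's own thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical sketch: move every registration to a fresh selector.
final class SelectorRebuilder {
    static Selector rebuild(Selector old) throws IOException {
        Selector fresh = Selector.open();
        for (SelectionKey key : old.keys()) {
            if (!key.isValid()) {
                continue; // skip keys that were already cancelled
            }
            SelectableChannel channel = key.channel();
            int ops = key.interestOps();
            Object attachment = key.attachment();
            // Re-register with the same interest set and attachment,
            // then drop the registration on the spinning selector.
            channel.register(fresh, ops, attachment);
            key.cancel();
        }
        old.close();
        return fresh;
    }

    // Demo: one pipe registered on the old selector ends up on the new one.
    static boolean demo() {
        try {
            Pipe pipe = Pipe.open();
            pipe.source().configureBlocking(false);
            Selector old = Selector.open();
            Object attachment = "conn-1";
            pipe.source().register(old, SelectionKey.OP_READ, attachment);

            Selector fresh = rebuild(old);
            SelectionKey moved = pipe.source().keyFor(fresh);
            boolean ok = !old.isOpen()
                    && moved != null
                    && moved.interestOps() == SelectionKey.OP_READ
                    && moved.attachment() == attachment
                    && fresh.keys().size() == 1;

            fresh.close();
            pipe.source().close();
            pipe.sink().close();
            return ok;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The caller would then replace its selector reference with the returned one and go back to the select loop, as the "continue" in the MINA excerpt does.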

2012-11-30 23:58