Today let's take an overall look at the main responsibilities of the NameNode class.
First, its class-level comment:
```java
/**********************************************************
 * NameNode serves as both directory namespace manager and
 * "inode table" for the Hadoop DFS. There is a single NameNode
 * running in any DFS deployment. (Well, except when there
 * is a second backup/failover NameNode.)
 *
 * The NameNode controls two critical tables:
 * 1) filename->blocksequence (namespace)
 * 2) block->machinelist ("inodes")
 *
 * The first table is stored on disk and is very precious.
 * The second table is rebuilt every time the NameNode comes
 * up.
 *
 * 'NameNode' refers to both this class as well as the 'NameNode server'.
 * The 'FSNamesystem' class actually performs most of the filesystem
 * management. The majority of the 'NameNode' class itself is concerned
 * with exposing the IPC interface and the http server to the outside world,
 * plus some configuration management.
 *
 * NameNode implements the ClientProtocol interface, which allows
 * clients to ask for DFS services. ClientProtocol is not
 * designed for direct use by authors of DFS client code. End-users
 * should instead use the org.apache.nutch.hadoop.fs.FileSystem class.
 *
 * NameNode also implements the DatanodeProtocol interface, used by
 * DataNode programs that actually store DFS data blocks. These
 * methods are invoked repeatedly and automatically by all the
 * DataNodes in a DFS deployment.
 *
 * NameNode also implements the NamenodeProtocol interface, used by
 * secondary namenodes or rebalancing processes to get partial namenode's
 * state, for example partial blocksMap etc.
 **********************************************************/
```
The comment says the NameNode maintains two kinds of information. The first is the mapping from files to blocks (the namespace), which is embodied by the INodeFile class covered in an earlier post. The second is the mapping from blocks to machines, embodied by DatanodeDescriptor and BlockInfo. A later post will analyze how lookups, inserts and updates are performed against these three model classes.
The file-to-block mapping is persisted, and the NameNode must reload it on restart. The block-to-datanode mapping is different: only the node that actually holds the data is authoritative, so to keep this mapping fresh each DataNode reports it to the NameNode. The report is piggybacked on the heartbeat (details when we analyze the DataNode). When the NameNode restarts it enters safe mode; DataNodes connect and report every block on their disks, and once a threshold fraction of blocks has been reported, the NameNode considers the block-to-datanode information complete and leaves safe mode.
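The two-table design and the safe-mode threshold described above can be sketched in a toy model. This is NOT Hadoop code; the class and field names are illustrative only:

```java
import java.util.*;

// Toy model of the NameNode's two tables: the persisted filename -> blocks
// map, and the block -> datanodes map rebuilt from block reports, with a
// safe-mode threshold on top.
public class NameNodeTablesSketch {
    // Table 1: filename -> ordered block ids (this one is persisted).
    final Map<String, List<Long>> namespace = new HashMap<>();
    // Table 2: block id -> datanodes holding a replica (rebuilt on restart).
    final Map<Long, Set<String>> blockLocations = new HashMap<>();

    int expectedBlocks;  // blocks known from the loaded namespace
    int reportedBlocks;  // blocks with at least one reported replica

    void loadNamespace(String file, List<Long> blockIds) {
        namespace.put(file, blockIds);
        expectedBlocks += blockIds.size();
    }

    // A datanode reports all blocks on its disks (piggybacked on heartbeats).
    void blockReport(String datanode, List<Long> blockIds) {
        for (long b : blockIds) {
            Set<String> holders =
                blockLocations.computeIfAbsent(b, k -> new HashSet<>());
            if (holders.isEmpty()) reportedBlocks++;
            holders.add(datanode);
        }
    }

    // Stay in safe mode until a threshold fraction of blocks is accounted for.
    boolean inSafeMode(double threshold) {
        return reportedBlocks < expectedBlocks * threshold;
    }
}
```

The point of the split is visible here: losing `blockLocations` costs nothing, since the next round of reports rebuilds it, while losing `namespace` would lose the filesystem itself.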
The real work of the NameNode server is delegated to FSNamesystem, a class of more than 4,700 lines; NameNode itself only takes part in the IPC and Jetty web plumbing, so the bulk of the NameNode's functionality actually lives in FSNamesystem.
Because of its central importance, the NameNode is the hub connecting clients, DataNodes and the SecondaryNameNode. In design terms, this means the NameNode class implements three interfaces, one per kind of peer: clients, DataNodes and the SecondaryNameNode. When you decide the NameNode needs another responsibility, you define a new interface and have NameNode implement it.
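That design idea can be sketched in miniature. The interface and method names below are illustrative stand-ins, not Hadoop's:

```java
// One interface per kind of peer; the hub class implements them all.
// Adding a new responsibility just means implementing one more interface.
interface ClientFacing     { String create(String path); }
interface DatanodeFacing   { String register(String datanodeId); }
interface CheckpointFacing { String rollEditLog(); }

public class HubSketch implements ClientFacing, DatanodeFacing, CheckpointFacing {
    public String create(String path) { return "created " + path; }
    public String register(String id) { return "registered " + id; }
    public String rollEditLog()       { return "edits rolled"; }

    public static void main(String[] args) {
        HubSketch nn = new HubSketch();
        // Each peer only sees the interface it needs.
        ClientFacing forClients = nn;
        DatanodeFacing forDatanodes = nn;
        System.out.println(forClients.create("/tmp/f"));
        System.out.println(forDatanodes.register("dn-1"));
    }
}
```

Each caller is handed only the narrow view it needs, which is exactly how ClientProtocol, DatanodeProtocol and NamenodeProtocol partition the NameNode's surface.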
First, ClientProtocol. Its comment:
```java
/**********************************************************************
 * ClientProtocol is used by user code via
 * {@link org.apache.hadoop.hdfs.DistributedFileSystem} class to communicate
 * with the NameNode. User code can manipulate the directory namespace,
 * as well as open/close file streams, etc.
 *
 **********************************************************************/
```
For example, a client goes through FileSystem to upload and delete files. Two of the most frequently used methods in this interface are worth a look.
Creating a file:
```java
/**
 * Create a new file entry in the namespace.
 * <p>
 * This will create an empty file specified by the source path.
 * The path should reflect a full path originated at the root.
 * The name-node does not have a notion of "current" directory for a client.
 * <p>
 * Once created, the file is visible and available for read to other clients.
 * Although, other clients cannot {@link #delete(String)}, re-create or
 * {@link #rename(String, String)} it until the file is completed
 * or explicitly as a result of lease expiration.
 * <p>
 * Blocks have a maximum size. Clients that intend to
 * create multi-block files must also use {@link #addBlock(String, String)}.
 *
 * @param src path of the file being created.
 * @param masked masked permission.
 * @param clientName name of the current client.
 * @param overwrite indicates whether the file should be
 * overwritten if it already exists.
 * @param replication block replication factor.
 * @param blockSize maximum block size.
 *
 * @throws AccessControlException if permission to create file is
 * denied by the system. As usually on the client side the exception will
 * be wrapped into {@link org.apache.hadoop.ipc.RemoteException}.
 * @throws QuotaExceededException if the file creation violates
 * any quota restriction
 * @throws IOException if other errors occur.
 */
public void create(String src,
                   FsPermission masked,
                   String clientName,
                   boolean overwrite,
                   short replication,
                   long blockSize
                   ) throws IOException;
```
And deleting a file, which backs the fs commands -rm/-rmr:
```java
/**
 * Delete the given file or directory from the file system.
 * <p>
 * same as delete but provides a way to avoid accidentally
 * deleting non empty directories programmatically.
 * @param src existing name
 * @param recursive if true deletes a non empty directory recursively,
 * else throws an exception.
 * @return true only if the existing file or directory was actually removed
 * from the file system.
 */
public boolean delete(String src, boolean recursive) throws IOException;
```
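The contract in that javadoc can be modeled in a few lines. This is a toy sketch, NOT Hadoop code: paths are plain strings, and an unchecked exception stands in for the IOException the real call throws:

```java
import java.util.*;

// Toy model of delete(src, recursive): a non-recursive delete of a
// non-empty directory fails; a recursive delete removes the subtree.
public class DeleteSemanticsSketch {
    final Set<String> paths = new TreeSet<>();

    private boolean isNonEmptyDir(String src) {
        for (String p : paths)
            if (p.startsWith(src + "/")) return true;
        return false;
    }

    boolean delete(String src, boolean recursive) {
        if (!paths.contains(src)) return false;  // nothing was removed
        if (!recursive && isNonEmptyDir(src))
            throw new IllegalStateException(src + " is non empty");
        paths.removeIf(p -> p.equals(src) || p.startsWith(src + "/"));
        return true;
    }
}
```

The `recursive` flag exists precisely so that a program cannot wipe out a directory tree by accident; the non-recursive path fails loudly instead of silently doing more than asked.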
Next, DatanodeProtocol. Here is the relevant fragment of the NameNode class comment, followed by the protocol's own comment:
```java
/* NameNode also implements the DatanodeProtocol interface, used by
 * DataNode programs that actually store DFS data blocks. These
 * methods are invoked repeatedly and automatically by all the
 * DataNodes in a DFS deployment.
 **********************************************************************/

/**********************************************************************
 * Protocol that a DFS datanode uses to communicate with the NameNode.
 * It's used to upload current load information and block reports.
 *
 * The only way a NameNode can communicate with a DataNode is by
 * returning values from these functions.
 *
 **********************************************************************/
```
Looking at a few commonly used methods makes the intent clear. Also note the return values: they are the NameNode's reply to the DataNode. More on that when we analyze the DataNode.
```java
/** Register Datanode.
 *
 * @see org.apache.hadoop.hdfs.server.datanode.DataNode#dnRegistration
 * @see org.apache.hadoop.hdfs.server.namenode.FSNamesystem#registerDatanode(DatanodeRegistration)
 *
 * @return updated {@link org.apache.hadoop.hdfs.server.protocol.DatanodeRegistration}, which contains
 * new storageID if the datanode did not have one and
 * registration ID for further communication.
 */
public DatanodeRegistration register(DatanodeRegistration registration)
    throws IOException;
```
This is called when a DataNode starts up.
```java
/** sendHeartbeat() tells the NameNode that the DataNode is still
 * alive and well. Includes some status info, too.
 * It also gives the NameNode a chance to return
 * an array of "DatanodeCommand" objects.
 * A DatanodeCommand tells the DataNode to invalidate local block(s),
 * or to copy them to other DataNodes, etc.
 */
public DatanodeCommand[] sendHeartbeat(DatanodeRegistration registration,
                                       long capacity,
                                       long dfsUsed, long remaining,
                                       int xmitsInProgress,
                                       int xceiverCount) throws IOException;
```
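The "commands ride back on the heartbeat reply" idea is worth a small sketch, since it is the NameNode's only channel to the DataNode. This is a toy model with illustrative names, not Hadoop code:

```java
import java.util.*;

// Toy model: the namenode queues work per datanode, and the only delivery
// mechanism is the return value of that datanode's next heartbeat.
public class HeartbeatSketch {
    // Pending commands queued per datanode, drained on its next heartbeat.
    final Map<String, List<String>> pending = new HashMap<>();

    void queueCommand(String datanode, String command) {
        pending.computeIfAbsent(datanode, k -> new ArrayList<>()).add(command);
    }

    // The heartbeat reply carries the queued commands (or nothing).
    String[] sendHeartbeat(String datanode) {
        List<String> cmds = pending.remove(datanode);
        return cmds == null ? new String[0] : cmds.toArray(new String[0]);
    }
}
```

This keeps the NameNode passive: it never opens connections to DataNodes, so a DataNode that dies simply stops picking up its work.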
```java
/**
 * blockReport() tells the NameNode about all the locally-stored blocks.
 * The NameNode returns an array of Blocks that have become obsolete
 * and should be deleted. This function is meant to upload *all*
 * the locally-stored blocks. It's invoked upon startup and then
 * infrequently afterwards.
 * @param registration
 * @param blocks - the block list as an array of longs.
 * Each block is represented as 2 longs.
 * This is done instead of Block[] to reduce memory used by block reports.
 *
 * @return - the next command for DN to process.
 * @throws IOException
 */
public DatanodeCommand blockReport(DatanodeRegistration registration,
                                   long[] blocks) throws IOException;
```
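The flat long[] encoding mentioned in that javadoc can be sketched as follows. The choice of which two fields go into the two longs is an assumption here (id and length); the point is the shape of the encoding, not the exact fields:

```java
// Sketch of a block report wire format: each block becomes two longs,
// which is far cheaper to ship and hold than an array of Block objects.
public class BlockListCodecSketch {
    // blocks[i] = {blockId, blockLength}
    static long[] encode(long[][] blocks) {
        long[] out = new long[blocks.length * 2];
        for (int i = 0; i < blocks.length; i++) {
            out[2 * i] = blocks[i][0];      // block id
            out[2 * i + 1] = blocks[i][1];  // block length
        }
        return out;
    }

    static long[][] decode(long[] longs) {
        long[][] out = new long[longs.length / 2][2];
        for (int i = 0; i < out.length; i++) {
            out[i][0] = longs[2 * i];
            out[i][1] = longs[2 * i + 1];
        }
        return out;
    }
}
```

With millions of blocks per report, avoiding one object header per block is a real saving on both ends of the RPC.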
Finally, the NamenodeProtocol interface. Again the NameNode class comment fragment, then the protocol's own comment:
```java
/* NameNode also implements the NamenodeProtocol interface, used by
 * secondary namenodes or rebalancing processes to get partial namenode's
 * state, for example partial blocksMap etc.
 */

/*****************************************************************************
 * Protocol that a secondary NameNode uses to communicate with the NameNode.
 * It's used to get part of the name node state
 *****************************************************************************/
```
This interface serves two purposes: interaction with the SecondaryNameNode, and support for the balancer. Let's look at each.
The balancer looks at how many blocks each DataNode holds; when one machine holds far more than the others, some of its blocks should be moved to a machine holding fewer. Before doing that it needs supporting data, which the NameNode supplies through this method:
```java
/** Get a list of blocks belonged to <code>datanode</code>
 * whose total size is equal to <code>size</code>
 * @param datanode a data node
 * @param size requested size
 * @return a list of blocks & their locations
 * @throws RemoteException if size is less than or equal to 0 or
 * datanode does not exist
 */
public BlocksWithLocations getBlocks(DatanodeInfo datanode, long size)
    throws IOException;
```
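The selection rule implied by that javadoc can be sketched simply: walk the datanode's blocks and stop once the accumulated size reaches the requested total. This is a toy model of the contract, not the actual implementation:

```java
import java.util.*;

// Sketch: pick blocks from one datanode until their total size
// reaches the requested amount; reject non-positive requests.
public class GetBlocksSketch {
    // Each block is {id, length}; selection stops at the size budget.
    static List<long[]> getBlocks(List<long[]> datanodeBlocks, long size) {
        if (size <= 0)
            throw new IllegalArgumentException("size must be > 0");
        List<long[]> picked = new ArrayList<>();
        long total = 0;
        for (long[] block : datanodeBlocks) {
            if (total >= size) break;
            picked.add(block);
            total += block[1];
        }
        return picked;
    }
}
```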
The concrete call path will be analyzed in detail when we get to the balancer.
The SecondaryNameNode-related methods are:
```java
/**
 * Closes the current edit log and opens a new one. The
 * call fails if the file system is in SafeMode.
 * @throws IOException
 * @return a unique token to identify this transaction.
 */
public CheckpointSignature rollEditLog() throws IOException;

/**
 * Rolls the fsImage log. It removes the old fsImage, copies the
 * new image to fsImage, removes the old edits and renames edits.new
 * to edits. The call fails if any of the four files are missing.
 * @throws IOException
 */
public void rollFsImage() throws IOException;
```
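The file shuffling those two javadocs describe can be walked through with a toy model of the storage directory. This is a sketch of the sequence only; file contents and the merge step are stand-ins, not Hadoop code:

```java
import java.util.*;

// Toy model of the checkpoint roll: rollEditLog opens edits.new,
// rollFsImage installs the merged image and renames edits.new to edits.
public class CheckpointSketch {
    final Map<String, String> dir = new HashMap<>();

    CheckpointSketch() {
        dir.put("fsimage", "old image");
        dir.put("edits", "edits since last checkpoint");
    }

    // Close the current edit log; new operations accumulate in edits.new
    // while the checkpoint is built elsewhere from fsimage + edits.
    void rollEditLog() {
        dir.put("edits.new", "");
    }

    // The merged image replaces the old fsimage, the old edits are
    // dropped, and edits.new takes over as edits.
    void rollFsImage(String mergedImage) {
        dir.put("fsimage", mergedImage);
        String newEdits = dir.remove("edits.new");
        dir.put("edits", newEdits);
    }
}
```

The ordering matters: edits.new catches writes made during the checkpoint, so no operation is lost between the two calls.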
The concrete checkpoint process involving the backup (secondary) NameNode really deserves a flow diagram to explain; I'll cover it separately.