Open the bin/hadoop script and you will see that it loads a config file:
either libexec/hadoop-config.sh or bin/hadoop-config.sh.
The former is loaded if it exists; otherwise the latter is loaded.
At the end you will see that HADOOP_HOME is the same as HADOOP_PREFIX:
export HADOOP_HOME=${HADOOP_PREFIX}
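The config-file resolution at the top of bin/hadoop can be sketched like this (the install path below is a hypothetical example; the real script derives it from the location of bin/hadoop itself):

```shell
# Sketch of the config-file resolution in bin/hadoop (Hadoop 1.x layout).
# /opt/hadoop is a made-up install path, just for illustration.
HADOOP_PREFIX="/opt/hadoop"
if [ -e "$HADOOP_PREFIX/libexec/hadoop-config.sh" ]; then
  CONFIG_FILE="$HADOOP_PREFIX/libexec/hadoop-config.sh"
else
  CONFIG_FILE="$HADOOP_PREFIX/bin/hadoop-config.sh"
fi
echo "would source: $CONFIG_FILE"
# the real script then sources it: . "$CONFIG_FILE"
```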
OK, now let's take a glance at the shell startup flow of distributed mode:
namenode format -> start-dfs -> start-mapred
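In concrete commands (the script names come from the Hadoop 1.x bin/ directory), the flow above is the following three steps; here we only print the plan instead of executing it, since running them needs a configured cluster:

```shell
# The three steps of the distributed-mode startup flow, as commands.
steps="hadoop namenode -format
start-dfs.sh
start-mapred.sh"
printf '%s\n' "$steps"
```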
Step 1: namenode format
The appropriate command is "hadoop namenode -format", and the related entry class is:
org.apache.hadoop.hdfs.server.namenode.NameNode
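Inside bin/hadoop, the first CLI argument selects the class to run; a simplified sketch of that dispatch (only two of the many subcommands are shown here):

```shell
# Simplified sketch of the subcommand-to-class dispatch in bin/hadoop
# (Hadoop 1.x); the real script handles many more subcommands and ends
# by exec'ing the JVM on the chosen class.
COMMAND="namenode"           # i.e. the first CLI argument
case "$COMMAND" in
  namenode) CLASS=org.apache.hadoop.hdfs.server.namenode.NameNode ;;
  datanode) CLASS=org.apache.hadoop.hdfs.server.datanode.DataNode ;;
  *)        CLASS="$COMMAND" ;;
esac
echo "$CLASS"
# real script: exec "$JAVA" ... $CLASS "$@"  (so -format reaches NameNode.main)
```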
Well, what is the NameNode (NN) responsible for? Description copied from the code:
* NameNode serves as both directory namespace manager and
* "inode table" for the Hadoop DFS. There is a single NameNode
* running in any DFS deployment. (Well, except when there
* is a second backup/failover NameNode.)
*
* The NameNode controls two critical tables:
* 1) filename->blocksequence (namespace)
* 2) block->machinelist ("inodes")
*
* The first table is stored on disk and is very precious.
* The second table is rebuilt every time the NameNode comes
* up.
*
* 'NameNode' refers to both this class as well as the 'NameNode server'.
* The 'FSNamesystem' class actually performs most of the filesystem
* management. The majority of the 'NameNode' class itself is concerned
* with exposing the IPC interface and the http server to the outside world,
* plus some configuration management.
*
* NameNode implements the ClientProtocol interface, which allows
* clients to ask for DFS services. ClientProtocol is not
* designed for direct use by authors of DFS client code. End-users
* should instead use the org.apache.nutch.hadoop.fs.FileSystem class.
*
* NameNode also implements the DatanodeProtocol interface, used by
* DataNode programs that actually store DFS data blocks. These
* methods are invoked repeatedly and automatically by all the
* DataNodes in a DFS deployment.
*
* NameNode also implements the NamenodeProtocol interface, used by
* secondary namenodes or rebalancing processes to get partial namenode's
* state, for example partial blocksMap etc.
The files created by formatting are listed here:
hadoop@leibnitz-laptop:/cc$ ll data/hadoop/hadoop-1.0.1/cluster-hadoop/mapred/local/
hadoop@leibnitz-laptop:/cc$ ll data/hadoop/hadoop-1.0.1/cluster-hadoop/dfs/name/current/
-rw-r--r-- 1 hadoop hadoop 4 2012-05-01 15:41 edits
-rw-r--r-- 1 hadoop hadoop 2474 2012-05-01 15:41 fsimage
-rw-r--r-- 1 hadoop hadoop 8 2012-05-01 15:41 fstime
-rw-r--r-- 1 hadoop hadoop 100 2012-05-01 15:41 VERSION
hadoop@leibnitz-laptop:/cc$ ll data/hadoop/hadoop-1.0.1/cluster-hadoop/dfs/name/image/
-rw-r--r-- 1 hadoop hadoop 157 2012-05-01 15:41 fsimage
OK, let's see what these files keep.
edits: FSEditLog maintains a log of the namespace modifications (analogous to transactional logs).
(These files belong to the FSImage listed below.)
fsimage: FSImage handles checkpointing and logging of the namespace edits.
fstime: keeps the time of the last checkpoint.
VERSION: the VERSION file contains the following fields:
- node type
- layout version
- namespaceID
- fs state creation time
- other fields specific for this node type
The version file is always written last during storage directory updates. The existence of the version file indicates that all other files have been successfully written in the storage directory, the storage is valid and does not need to be recovered.
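For example, a namenode's VERSION file looks roughly like this (the keys are the ones Hadoop 1.x actually writes, but the ID, timestamp, and layout-version values below are made up for illustration):

```
#Tue May 01 15:41:00 CST 2012
namespaceID=1977495570
cTime=0
storageType=NAME_NODE
layoutVersion=-32
```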
A directory named 'previous.checkpoint' will appear under certain conditions; the code comments describe it:
* previous.checkpoint is a directory, which holds the previous
* (before the last save) state of the storage directory .
* The directory is created as a reference only, it does not play role
* in state recovery procedures, and is recycled automatically,
* but it may be useful for manual recovery of a stale state of the system.
Its content looks like this:
hadoop@leibnitz-laptop:/cc$ ll data/hadoop/hadoop-1.0.1/cluster-hadoop/dfs/name/previous.checkpoint/
-rw-r--r-- 1 hadoop hadoop 293 2012-04-25 02:26 edits
-rw-r--r-- 1 hadoop hadoop 2934 2012-04-25 02:26 fsimage
-rw-r--r-- 1 hadoop hadoop 8 2012-04-25 02:26 fstime
-rw-r--r-- 1 hadoop hadoop 100 2012-04-25 02:26 VERSION
Also, I found an important class named "Lease", described as follows:
A Lease governs all the locks held by a single client.
* For each client there's a corresponding lease, whose
* timestamp is updated when the client periodically
* checks in. If the client dies and allows its lease to
* expire, all the corresponding locks can be released.