Single Node Setup
Purpose
This document describes how to set up and configure a single-node Hadoop installation so that you can quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).
Prerequisites
Supported Platforms
- GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
- Win32 is supported as a development platform. Distributed operation has not been well tested on Win32, so it is not supported as a production platform.
Required Software
Required software for Linux and Windows includes:
- Java™ 1.6.x, preferably from Sun, must be installed.
- ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.
Additional requirements for Windows include:
- Cygwin - Required for shell support in addition to the required software above.
Installing Software
If your cluster doesn't have the requisite software you will need to install it.
For example on Ubuntu Linux:
$ sudo apt-get install ssh
$ sudo apt-get install rsync
On Windows, if you did not install the required software when you installed Cygwin, start the Cygwin installer and select the packages:
- openssh (in the Net category)
Download
To get a Hadoop distribution, download a recent stable release from one of the Apache Download Mirrors.
Prepare to Start the Hadoop Cluster
Unpack the downloaded Hadoop distribution. In the distribution, edit the file conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your Java installation.
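As a sketch, the relevant line in conf/hadoop-env.sh might look like the following; the Java path shown is an assumption and will vary with your operating system and Java package:

```shell
# conf/hadoop-env.sh
# Point Hadoop at the root of the Java installation.
# The path below is only an example; adjust it to wherever
# Java 1.6.x is actually installed on your machine.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
```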
Try the following command:
$ bin/hadoop
This will display the usage documentation for the hadoop script.
Now you are ready to start your Hadoop cluster in one of the three supported modes:
- Local (Standalone) Mode
- Pseudo-Distributed Mode
- Fully-Distributed Mode
Standalone Operation
By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.
The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
$ cat output/*
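The pattern 'dfs[a-z.]+' matches the literal prefix dfs followed by one or more lowercase letters or dots, so it picks out property names such as dfs.replication from the copied configuration files. The same extended regular expression can be tried with plain grep on a few sample lines (the sample property names here are made up for illustration):

```shell
# Feed some hypothetical property names through the same
# extended regular expression used in the Hadoop grep example.
# Only the lines starting with "dfs" survive:
printf 'dfs.replication\nmapred.job.tracker\ndfs.name.dir\n' \
  | grep -E -o 'dfs[a-z.]+'
# prints:
#   dfs.replication
#   dfs.name.dir
```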
Pseudo-Distributed Operation
Hadoop can also be run on a single node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.
Configuration
Use the following:
conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
Setup passphraseless ssh
Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Execution
Format a new distributed filesystem:
$ bin/hadoop namenode -format
Start the Hadoop daemons:
$ bin/start-all.sh
The Hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).
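If you want the daemon logs written somewhere other than the default, HADOOP_LOG_DIR can be exported in conf/hadoop-env.sh before the daemons are started. The directory below is only an example path, not a requirement:

```shell
# conf/hadoop-env.sh
# Redirect daemon logs away from the default ${HADOOP_HOME}/logs.
# /var/log/hadoop is an example; use any directory the Hadoop
# user can write to.
export HADOOP_LOG_DIR=/var/log/hadoop
```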
Browse the web interface for the NameNode and the JobTracker; by default they are available at:
- NameNode - http://localhost:50070/
- JobTracker - http://localhost:50030/
Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input
Run some of the examples provided:
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*
or
View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*
When you're done, stop the daemons with:
$ bin/stop-all.sh