
Hadoop Single-Node Installation


Single Node Setup

Purpose

This document describes how to set up and configure a single-node Hadoop installation so that you can quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).

Prerequisites

Supported Platforms

  • GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
  • Win32 is supported as a development platform. Distributed operation has not been well tested on Win32, so it is not supported as a production platform.

Required Software

Required software for Linux and Windows includes:

  1. Java™ 1.6.x, preferably from Sun, must be installed.
  2. ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.

Additional requirements for Windows include:

  1. Cygwin - Required for shell support in addition to the required software above.

Installing Software

If your cluster doesn't have the requisite software you will need to install it.

For example on Ubuntu Linux:

$ sudo apt-get install ssh 
$ sudo apt-get install rsync
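Before going further, it can help to confirm that the required tools are actually on the PATH. A minimal check (the tool names are taken from the prerequisites above; `sshd` is deliberately omitted since it usually lives outside the PATH):

```shell
# Quick sanity check: report which of the required tools are on PATH.
missing=0
for tool in java ssh rsync; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found: $tool"
  else
    echo "MISSING: $tool"
    missing=1
  fi
done
```

If anything is reported as missing, install it with your package manager as shown above before continuing.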

On Windows, if you did not install the required software when you installed Cygwin, start the Cygwin installer and select the packages:

  • openssh - the Net category

Download

To get a Hadoop distribution, download a recent stable release from one of the Apache Download Mirrors.

Prepare to Start the Hadoop Cluster

Unpack the downloaded Hadoop distribution. In the distribution, edit the file conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your Java installation.
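For example, the relevant line in conf/hadoop-env.sh might look like the following. The JDK path shown is only an assumption for illustration; substitute the root of your own Java installation:

```shell
# conf/hadoop-env.sh -- the JDK path below is an example only;
# point JAVA_HOME at the root of your actual Java installation.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
```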

Try the following command:
$ bin/hadoop 
This will display the usage documentation for the hadoop script.

Now you are ready to start your Hadoop cluster in one of the three supported modes:

  • Local (Standalone) Mode
  • Pseudo-Distributed Mode
  • Fully-Distributed Mode

Standalone Operation

By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.

The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory. 
$ mkdir input 
$ cp conf/*.xml input 
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+' 
$ cat output/*
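The 'dfs[a-z.]+' argument is an extended regular expression. Its behavior can be previewed with ordinary grep on made-up input; the sample property names below are illustrative, not taken from the conf files:

```shell
# Preview what the example job's pattern extracts, using plain grep -oE
# on a few made-up property names. Only the lines containing "dfs"
# followed by lowercase letters or dots produce a match.
printf 'dfs.replication\nmapred.job.tracker\ndfs.name.dir\n' \
  | grep -oE 'dfs[a-z.]+'
# prints:
#   dfs.replication
#   dfs.name.dir
```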

Pseudo-Distributed Operation

Hadoop can also be run on a single-node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.

Configuration

Use the following: 

conf/core-site.xml:

<configuration>
     <property>
         <name>fs.default.name</name>
         <value>hdfs://localhost:9000</value>
     </property>
</configuration>


conf/hdfs-site.xml:

<configuration>
     <property>
         <name>dfs.replication</name>
         <value>1</value>
     </property>
</configuration>


conf/mapred-site.xml:

<configuration>
     <property>
         <name>mapred.job.tracker</name>
         <value>localhost:9001</value>
     </property>
</configuration>

Set up passphraseless ssh

Now check that you can ssh to localhost without a passphrase:
$ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa 
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
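On newer OpenSSH releases, DSA keys are often disabled by default, in which case the same steps work with an RSA key instead. The sketch below runs against a throwaway directory so it can be tried safely; for the real setup, replace "$demo" with "$HOME/.ssh":

```shell
# Same key-generation steps as above, but with an RSA key, exercised
# against a scratch directory instead of ~/.ssh.
demo=$(mktemp -d)
ssh-keygen -t rsa -P '' -f "$demo/id_rsa" -q
cat "$demo/id_rsa.pub" >> "$demo/authorized_keys"
# sshd rejects keys when authorized_keys is group/world-writable:
chmod 600 "$demo/authorized_keys"
```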

Execution

Format a new distributed filesystem:
$ bin/hadoop namenode -format

Start the hadoop daemons:
$ bin/start-all.sh

The hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).

Browse the web interface for the NameNode and the JobTracker; by default they are available at:

  • NameNode - http://localhost:50070/
  • JobTracker - http://localhost:50030/

Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input

Run some of the examples provided:
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

Examine the output files:

Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output 
$ cat output/*

or

View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*

When you're done, stop the daemons with:
$ bin/stop-all.sh
