
Hadoop

 
Getting started: installing Hadoop

Download Hadoop
http://apache.01link.hk/hadoop/common/stable/

Cygwin installation
http://v-lad.org/Tutorials/Hadoop/03%20-%20Prerequistes.html

A very reliable Hadoop installation walkthrough
http://mysolvedproblem.blogspot.hk/2012/05/installing-hadoop-on-ubuntu-linux-on.html

1. Installing Sun JDK 1.6: Installing JDK is a required step to install Hadoop. You can follow the steps in my previous post.


2. Adding a dedicated Hadoop system user: You will need a dedicated user for the Hadoop system you are installing. To create a new user "hduser" in a group called "hadoop", run the following commands in your terminal:

$sudo addgroup hadoop
$sudo adduser --ingroup hadoop hduser

3. Configuring SSH: Michael's blog assumes that an SSH server is already installed. If you have not installed one, run the following command in your terminal. It installs the SSH server on your machine, listening on port 22 by default.


$sudo apt-get install openssh-server

We install SSH because Hadoop requires access to localhost (for a single-node cluster) or communicates with remote nodes (for a multi-node cluster). After this step, generate an SSH key for hduser (and for any other users who will administer Hadoop). Switch to hduser first, then run ssh-keygen:

$su - hduser
$ssh-keygen -t rsa -P ""
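
The commands above generate the key but do not show how it becomes authorized for logins to localhost. A minimal sketch of that step, assuming the default key location ~/.ssh/id_rsa.pub; the function name authorize_key is made up for illustration and is not part of OpenSSH:

```shell
# Append a public key to an authorized_keys file unless it is already listed.
# authorize_key is a hypothetical helper name used only for this sketch.
authorize_key() {
    pub_key_file="$1"
    auth_file="$2"
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    # -q quiet, -x whole-line match, -F fixed strings, -f read patterns from file
    if ! grep -qxF -f "$pub_key_file" "$auth_file"; then
        cat "$pub_key_file" >> "$auth_file"
    fi
    # Keep the file private to its owner
    chmod 600 "$auth_file"
}
```

Run as hduser, a call such as authorize_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys" should let the ssh localhost check pass without a password prompt. Calling it twice does not duplicate the entry.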

To confirm that the SSH setup works, open a new terminal and start an SSH session as hduser with the following command:

$ssh localhost



Optional
4. Disable IPv6: On Ubuntu, Hadoop configuration options that use the 0.0.0.0 address can end up bound to IPv6 addresses, so you may want to disable IP version 6. Run the following command from an account with root privileges:
$sudo gedit /etc/sysctl.conf
This command opens sysctl.conf in a text editor. Copy the following lines to the end of the file:


#disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1


Save the file and close it. If you get an error saying you don't have permission, make sure you ran the previous command with root privileges.

These changes normally take effect after a reboot. Alternatively, you can run the following command to reload the settings:

$sudo sysctl -p

To make sure that IPv6 is disabled, run the following command:

$cat /proc/sys/net/ipv6/conf/all/disable_ipv6

The printed value should be 1, which means that IPv6 is disabled.
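
The check above reads only the "all" entry. A small sketch that reports the flag for all three entries set in sysctl.conf; check_ipv6_disabled is a hypothetical helper name, and the optional argument exists only so the function can be pointed at a different directory:

```shell
# Report the disable_ipv6 flag for the three entries set in sysctl.conf.
# check_ipv6_disabled is a hypothetical helper; conf_root defaults to the /proc path.
check_ipv6_disabled() {
    conf_root="${1:-/proc/sys/net/ipv6/conf}"
    rc=0
    for iface in all default lo; do
        flag_file="$conf_root/$iface/disable_ipv6"
        if [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]; then
            echo "$iface: disabled"
        else
            echo "$iface: NOT disabled"
            rc=1
        fi
    done
    # Non-zero exit status if any entry is still enabled or unreadable
    return "$rc"
}
```

Calling check_ipv6_disabled with no argument inspects the live kernel settings.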



Installing Hadoop


Now we can download Hadoop and begin the installation. Go to Apache Downloads and download Hadoop version 0.20.2. To avoid permission problems, download the tar file into the hduser home directory, for example /home/hduser.



Then extract the tar file and rename the extracted folder to 'hadoop'. Open a new terminal and run the following commands:


$ cd /home/hduser
$ sudo tar xzf hadoop-0.20.2.tar.gz
$ sudo mv hadoop-0.20.2 hadoop


Please note that if another Hadoop admin user (e.g. hduser2) needs access, you can hand ownership of the hadoop folder to that user and the hadoop group. Run the following command from /home/hduser:


sudo chown -R hduser2:hadoop hadoop


Update $HOME/.bashrc


You will need to update the .bashrc file for hduser (and for every user who administers Hadoop). When you are logged in as a different user, open the file as root:


$sudo gedit /home/hduser/.bashrc


Then add the following configuration at the end of the .bashrc file:




# Set Hadoop-related environment variables
export HADOOP_HOME=/home/hduser/hadoop


# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-6-sun


# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"


# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat "$1" | lzop -dc | head -1000 | less
}


# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin


Hadoop Configuration

Now we need to configure the Hadoop framework on the Ubuntu machine. The configuration files below are the ones to edit. To learn more about Hadoop configuration, see the Hadoop documentation.


hadoop-env.sh


We only need to update the JAVA_HOME variable in this file. Open it in a text editor with the following command:


$sudo gedit /home/hduser/hadoop/conf/hadoop-env.sh


Then you will need to change the following line


# export JAVA_HOME=/usr/lib/j2sdk1.5-sun


To


export JAVA_HOME=/usr/lib/jvm/java-6-sun


Note: if you get the error "Error: JAVA_HOME is not set" while starting the services, you probably forgot to uncomment the previous line (just remove the #).




core-site.xml
First, we need to create a temp directory for the Hadoop framework. If you need this environment for testing or a quick prototype (e.g. developing simple Hadoop programs for personal tests), I suggest creating this folder under the /home/hduser/ directory. Otherwise, create it in a shared location (like /usr/local), though you may face permission issues there. To avoid exceptions caused by permissions (such as java.io.IOException), I created the tmp folder under hduser's home directory.


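To create this folder and make hduser its owner (sudo mkdir alone leaves it owned by root), type the following commands:


$ sudo mkdir  /home/hduser/tmp
$ sudo chown hduser:hadoop /home/hduser/tmp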


Please note that if you want another admin user (e.g. hduser2 in the hadoop group) to use this folder, you should give that user read and write permission on it with the following commands. chown makes hduser2 the owner; mode 755 leaves the other members of the hadoop group with read and execute access only, so use 775 if they also need to write:


$ sudo chown hduser2:hadoop /home/hduser/tmp
$ sudo chmod 755 /home/hduser/tmp

Now we can open hadoop/conf/core-site.xml in a text editor to edit the hadoop.tmp.dir entry:


$sudo gedit /home/hduser/hadoop/conf/core-site.xml


Then add the following configurations between <configuration> .. </configuration> xml elements:


<!-- In: conf/core-site.xml -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/tmp</value>
  <description>A base for other temporary directories.</description>
</property>


<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
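
For reference, a sketch of how the two properties above sit inside the <configuration> element of a complete file. write_core_site is a hypothetical helper name, and the function overwrites any existing core-site.xml in the target directory, so treat it as an illustration rather than a required step:

```shell
# Write a complete core-site.xml containing the two properties from this section.
# write_core_site is a hypothetical helper; it overwrites the target file.
write_core_site() {
    conf_dir="$1"   # Hadoop conf directory, e.g. /home/hduser/hadoop/conf
    tmp_dir="$2"    # value for hadoop.tmp.dir, e.g. /home/hduser/tmp
    cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$tmp_dir</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>
EOF
}
```

With the paths used in this walkthrough, the call would be write_core_site /home/hduser/hadoop/conf /home/hduser/tmp.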



mapred-site.xml
Open hadoop/conf/mapred-site.xml in a text editor and add the following configuration values (as with core-site.xml):


<!-- In: conf/mapred-site.xml -->
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>



hdfs-site.xml
Open hadoop/conf/hdfs-site.xml using a text editor and add the following configurations:


<!-- In: conf/hdfs-site.xml -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>



Formatting NameNode


You need to format the NameNode of your HDFS. Do not do this while the system is running: formatting erases all data currently in HDFS. It is usually done once, at first installation.
Run the following command:


$/home/hduser/hadoop/bin/hadoop namenode -format


Starting Hadoop Cluster


Navigate to the hadoop/bin directory and run the ./start-all.sh script.


The JDK ships a handy tool called jps. You can use it to check that all the services are up.
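
A sketch of that check, assuming the five daemons that a single-node Hadoop 0.20 setup normally starts (NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker). missing_daemons is a hypothetical helper name; it reads jps output on standard input so it can be tried without a running cluster:

```shell
# Read `jps` output on stdin and print every expected Hadoop daemon that is absent.
# missing_daemons is a hypothetical helper name used only for this sketch.
missing_daemons() {
    jps_output="$(cat)"
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # jps prints "<pid> <class name>"; -w keeps NameNode from matching
        # inside SecondaryNameNode
        if ! printf '%s\n' "$jps_output" | grep -qw "$daemon"; then
            echo "$daemon"
        fi
    done
}
```

Used as jps | missing_daemons, empty output suggests that all five services are up; any name printed is a daemon whose log under hadoop/logs is worth checking.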


Running an Example (Pi Example)
There are many built-in examples. We can run the Pi estimator example with the following command (3 is the number of map tasks and 10 is the number of samples per map):


hduser@ubuntu:~/hadoop/bin$ hadoop jar ../hadoop-0.20.2-examples.jar pi 3 10


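If you get an "Incompatible namespaceIDs" exception, you can do the following:


1.  Stop all the services (by calling ./stop-all.sh).
2.  Delete the contents of the DataNode data directory under hadoop.tmp.dir (with the settings above, /home/hduser/tmp/dfs/data/*). This erases the HDFS blocks stored on this node.
3.  Start all the services.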
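
Before deleting anything, it can help to confirm the mismatch. A sketch, assuming the Hadoop 0.20/1.x storage layout in which each storage directory holds a current/VERSION file with a namespaceID=... line; namespace_id and same_namespace are hypothetical helper names:

```shell
# Print the namespaceID recorded in a storage directory's current/VERSION file.
# Both function names are hypothetical helpers for this sketch.
namespace_id() {
    sed -n 's/^namespaceID=//p' "$1/current/VERSION"
}

# Succeed only when the NameNode and DataNode directories record the same ID.
same_namespace() {
    [ "$(namespace_id "$1")" = "$(namespace_id "$2")" ]
}
```

With the configuration in this walkthrough the two directories would be /home/hduser/tmp/dfs/name and /home/hduser/tmp/dfs/data, so same_namespace /home/hduser/tmp/dfs/name /home/hduser/tmp/dfs/data || echo "namespaceID mismatch" reports whether the exception applies.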

Installation complete


Notes:
------ Hadoop namenode
/hadoop/dfs/name is in an inconsistent state.

http://blog.csdn.net/limiteewaltwo/article/details/8565523


conf/core-site.xml
<property> 
   <name>hadoop.tmp.dir</name> 
   <value>/home/yangys/hadoop-1.1.2/temp</value>
</property>

Adjust the <value> to suit your own environment; do not use the default.

Restart Hadoop

bin/stop-all.sh 
bin/hadoop namenode -format 
bin/start-all.sh 
Problem solved. The NameNode can now start.


----- install hadoop

http://blog.csdn.net/shirdrn/article/details/5781776



---- org.apache.hadoop.security.AccessControlException: Permission denied: user=xxj, .

error:org.apache.oozie.action.ActionExecutorException: JA002: org.apache.hadoop.security.AccessControlException: Permission denied: user=xxj, access=WRITE, inode="user":hadoop:supergroup:rwxr-xr-x

Solution: add this entry to conf/hdfs-site.xml

<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
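Note: the property is dfs.permissions. Setting it to false turns off HDFS permission checking altogether.
Alternatively, if a permission problem appears, you can check the log messages and then change the permissions of the affected directory.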
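
The error above shows user xxj being denied write access under the /user directory, which is owned by hadoop. A sketch of the narrower fix, giving that user a home directory in HDFS instead of disabling permission checks. hdfs_home_commands is a hypothetical helper name; it only prints the commands, which would then be run as the HDFS superuser (the account that started the NameNode):

```shell
# Print the commands that create an HDFS home directory for a user.
# hdfs_home_commands is a hypothetical helper; it does not run anything itself.
hdfs_home_commands() {
    user="$1"
    echo "hadoop fs -mkdir /user/$user"
    echo "hadoop fs -chown $user /user/$user"
}
```

For the user in the error message, hdfs_home_commands xxj prints the mkdir and chown commands for /user/xxj.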

After carefully examining the directory structure under /tmp/hadoop-rsync, I found the key to the problem:
[hadoop-user1@oser-624 hadoop-0.20.203.0]$ bin/hadoop fs -ls /tmp/hadoop-hadoop-user1/mapred/staging
Found 2 items
drwx------  - hadoop-user1    supergroup          0 2011-10-19 18:18 /tmp/hadoop-hadoop-user1/mapred/staging/hadoop-user1
drwx------  - hadoop-user2 supergroup          0 2011-10-27 18:38 /tmp/hadoop-hadoop-user1/mapred/staging/hadoop-user2
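  It turns out that jobs submitted by different users are separated by user name under /tmp/hadoop-hadoop-user1/mapred/staging/. The error was caused by the earlier change, which used the -R option to modify the permissions of everything under /tmp/hadoop-rsync directly. Run the following commands to fix the permissions: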
[hadoop-user1@oser-624 hadoop-0.20.203.0]$ bin/hadoop fs -chmod 777  /tmp/hadoop-hadoop-user1/mapred/
[hadoop-user1@oser-624 hadoop-0.20.203.0]$ bin/hadoop fs -chmod 777  /tmp/hadoop-hadoop-user1/mapred/staging
[hadoop-user1@oser-624 hadoop-0.20.203.0]$ bin/hadoop fs -chmod 777  /tmp/hadoop-hadoop-user1/
[hadoop-user1@oser-624 hadoop-0.20.203.0]$ bin/hadoop fs -chmod 777  /tmp
  Hive queries then ran normally.

    Recently I ran into this problem again, but the fix above did not resolve it, so I checked the NameNode log:

2011-11-29 15:57:09,921 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9000, call mkdirs(/opt/data/hive-zhaoxiuxiang/hive_2011-11-29_15-57-08_094_4199830510252920639, rwxr-xr-x) from 192.168.1.187:18457: error: org.apache.hadoop.security.AccessControlException: Permission denied: user=zhaoxiuxiang, access=WRITE, inode="data":rsync:supergroup:rwxr-xr-x

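The directory with the permission error turned out to be /opt/data. Changing its permissions to 777 resolved the error.

Note: each user's mapred execution directory must have 700 (rwx------) permissions.

References: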
http://www.linuxidc.com/Linux/2012-12/76703.htm
http://coderbase64.iteye.com/blog/2077697