
Installing and running Hadoop (pseudo-distributed mode) on a CentOS virtual machine

 

1. First confirm whether you can ssh to localhost without entering a password:

$ ssh localhost


 

If ssh to localhost still asks for a password, generate a key pair:
[root@localhost ~]# ssh-keygen -t rsa       (note: -keygen has no space before it)
then just press Enter at every prompt.
The session log:

[root@localhost ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
a8:7a:3e:f6:92:85:b8:c7:be:d9:0e:45:9c:d1:36:3b root@localhost.localdomain
[root@localhost ~]#
[root@localhost ~]# cd ..
[root@localhost /]# cd root
[root@localhost ~]# ls
anaconda-ks.cfg  Desktop  install.log  install.log.syslog
[root@localhost ~]# cd .ssh
[root@localhost .ssh]# cat id_rsa.pub > authorized_keys
[root@localhost .ssh]#
[root@localhost .ssh]# ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is 41:c8:d4:e4:60:71:6f:6a:33:6a:25:27:62:9b:e3:90.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Last login: Tue Jun 21 22:40:31 2011
[root@localhost ~]#
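If sshd still prompts for a password after this, the usual culprit is file permissions: sshd ignores an authorized_keys file that is group- or world-writable. A minimal sketch of appending the key with the expected permissions (shown against a scratch directory and a stand-in key so it is safe to run anywhere; on the real machine the directory is /root/.ssh):

```shell
# Sketch: append the public key and tighten permissions the way sshd expects.
# A scratch directory stands in for /root/.ssh; the key below is a fake stand-in.
SSH_DIR="$(mktemp -d)/.ssh"
mkdir -p "$SSH_DIR"
echo "ssh-rsa AAAAB3Nza...fake root@localhost" > "$SSH_DIR/id_rsa.pub"
cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"   # >> keeps any existing keys
chmod 700 "$SSH_DIR"                                       # sshd rejects looser modes
chmod 600 "$SSH_DIR/authorized_keys"
ls -ld "$SSH_DIR" "$SSH_DIR/authorized_keys"
```

Note that `>>` (append) is safer than the `>` used in the transcript above if authorized_keys already holds keys from other hosts.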


  


2. Unpack Hadoop
Create a dedicated hadoop user, then unpack the tarball:

[root@localhost hadoop]# tar zxvf hadoop-0.20.2.tar.gz
......
......
......
hadoop-0.20.203.0/src/contrib/ec2/bin/image/create-hadoop-image-remote
hadoop-0.20.203.0/src/contrib/ec2/bin/image/ec2-run-user-data
hadoop-0.20.203.0/src/contrib/ec2/bin/launch-hadoop-cluster
hadoop-0.20.203.0/src/contrib/ec2/bin/launch-hadoop-master
hadoop-0.20.203.0/src/contrib/ec2/bin/launch-hadoop-slaves
hadoop-0.20.203.0/src/contrib/ec2/bin/list-hadoop-clusters
hadoop-0.20.203.0/src/contrib/ec2/bin/terminate-hadoop-cluster
[root@localhost hadoop]#


 
3. Install JDK 1.6 and set the Hadoop home
Add the following to your shell profile (e.g. /etc/profile or ~/.bashrc), then re-source it:

# set java environment
export JAVA_HOME=/home/yqf/jdk/jdk1.6.0_13
export HADOOP_HOME=/home/hadoop/hadoop-0.20.2
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib/:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin



 

 

4. Configure and start Hadoop
Edit the Hadoop configuration
Go into the conf/ directory under the Hadoop install directory and edit the following files:

####################################
[root@localhost conf]# vi hadoop-env.sh

# set java environment
export JAVA_HOME=/home/yqf/jdk/jdk1.6.0_13    (use your own JAVA_HOME)

#####################################
[root@localhost conf]# vi core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode:9000/</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/hadooptmp</value>
  </property>
</configuration>

#######################################
[root@localhost conf]# vi hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

#########################################
[root@localhost conf]# vi mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>namenode:9001</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/usr/local/hadoop/mapred/local</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/tmp/hadoop/mapred/system</value>
  </property>
</configuration>

#########################################
[root@localhost conf]# vi masters

#localhost
namenode

#########################################
[root@localhost conf]# vi slaves

#localhost
datanode01
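The configuration above refers to the hostnames namenode and datanode01 (in fs.default.name, mapred.job.tracker, masters, and slaves). For a single-machine pseudo-distributed setup these names must resolve to this machine, e.g. via /etc/hosts. A sketch, where 192.168.1.100 is a hypothetical address for this VM (substitute your own):

```
# /etc/hosts -- map the hostnames used in the Hadoop config to this machine
127.0.0.1       localhost localhost.localdomain
192.168.1.100   namenode datanode01
```

If these names do not resolve, the daemons will fail to start or to find each other.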


 

 


Start Hadoop

 

[root@localhost bin]# hadoop namenode -format
11/06/23 00:43:54 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = localhost.localdomain/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.203.0
STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203 -r 1099333; compiled by 'oom' on Wed May 4 07:57:50 PDT 2011
************************************************************/
11/06/23 00:43:55 INFO util.GSet: VM type       = 32-bit
11/06/23 00:43:55 INFO util.GSet: 2% max memory = 19.33375 MB
11/06/23 00:43:55 INFO util.GSet: capacity      = 2^22 = 4194304 entries
11/06/23 00:43:55 INFO util.GSet: recommended=4194304, actual=4194304
11/06/23 00:43:56 INFO namenode.FSNamesystem: fsOwner=root
11/06/23 00:43:56 INFO namenode.FSNamesystem: supergroup=supergroup
11/06/23 00:43:56 INFO namenode.FSNamesystem: isPermissionEnabled=true
11/06/23 00:43:56 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
11/06/23 00:43:56 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
11/06/23 00:43:56 INFO namenode.NameNode: Caching file names occuring more than 10 times
11/06/23 00:43:57 INFO common.Storage: Image file of size 110 saved in 0 seconds.
11/06/23 00:43:57 INFO common.Storage: Storage directory /usr/local/hadoop/hdfs/name has been successfully formatted.
11/06/23 00:43:57 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1
************************************************************/
[root@localhost bin]#

###########################################

[root@localhost bin]# ./start-all.sh
starting namenode, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-namenode-localhost.localdomain.out
datanode01: starting datanode, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-datanode-localhost.localdomain.out
namenode: starting secondarynamenode, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-secondarynamenode-localhost.localdomain.out
starting jobtracker, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-jobtracker-localhost.localdomain.out
datanode01: starting tasktracker, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-tasktracker-localhost.localdomain.out
[root@localhost bin]# jps
11971 TaskTracker
11807 SecondaryNameNode
11599 NameNode
12022 Jps
11710 DataNode
11877 JobTracker


 

 

5. Test Hadoop with the bundled example
Step 1: prepare the input data

 

 

 

In the current directory (e.g. the Hadoop install directory), create a folder named input, and inside it create two files, file01 and file02, with the following contents.
file01 contains:

 

Hello World Bye World


 
file02 contains:

Hello Hadoop Goodbye Hadoop
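These two files can be created from the shell as follows (a sketch; in the real setup, run it from the Hadoop install directory so the relative path matches the upload command in the next step):

```shell
# Create the local input folder and the two sample files
mkdir -p input
echo "Hello World Bye World"       > input/file01
echo "Hello Hadoop Goodbye Hadoop" > input/file02
cat input/file01 input/file02
```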


 

 

Step 2: upload the input folder to the distributed file system

 

cd to the Hadoop install directory and run:

 

bin/hadoop fs -put input input01


  


This uploads the input folder into the Hadoop file system, where it now appears as input01; you can list it with:

bin/hadoop fs -ls


 


Step 3: run the Hadoop MapReduce example

 

 

Run:

 

bin/hadoop jar hadoop-*-examples.jar wordcount input01 output2


  

The job log:

 

[root@localhost hadoop-0.20.2]# bin/hadoop jar hadoop-*-examples.jar wordcount input01 output2
12/11/14 22:51:51 INFO input.FileInputFormat: Total input paths to process : 4
12/11/14 22:51:52 INFO mapred.JobClient: Running job: job_201211141815_0003
12/11/14 22:51:53 INFO mapred.JobClient:  map 0% reduce 0%
12/11/14 22:53:03 INFO mapred.JobClient:  map 50% reduce 0%
12/11/14 22:53:07 INFO mapred.JobClient:  map 75% reduce 0%
12/11/14 22:53:12 INFO mapred.JobClient:  map 100% reduce 0%
12/11/14 22:53:17 INFO mapred.JobClient:  map 100% reduce 25%
12/11/14 22:53:31 INFO mapred.JobClient:  map 100% reduce 100%
12/11/14 22:53:34 INFO mapred.JobClient: Job complete: job_201211141815_0003
12/11/14 22:53:34 INFO mapred.JobClient: Counters: 17
12/11/14 22:53:34 INFO mapred.JobClient:   Job Counters
12/11/14 22:53:34 INFO mapred.JobClient:     Launched reduce tasks=1
12/11/14 22:53:34 INFO mapred.JobClient:     Launched map tasks=4
12/11/14 22:53:34 INFO mapred.JobClient:     Data-local map tasks=2
12/11/14 22:53:34 INFO mapred.JobClient:   FileSystemCounters
12/11/14 22:53:34 INFO mapred.JobClient:     FILE_BYTES_READ=79
12/11/14 22:53:34 INFO mapred.JobClient:     HDFS_BYTES_READ=55
12/11/14 22:53:34 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=304
12/11/14 22:53:34 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=41
12/11/14 22:53:34 INFO mapred.JobClient:   Map-Reduce Framework
12/11/14 22:53:34 INFO mapred.JobClient:     Reduce input groups=5
12/11/14 22:53:34 INFO mapred.JobClient:     Combine output records=6
12/11/14 22:53:34 INFO mapred.JobClient:     Map input records=2
12/11/14 22:53:34 INFO mapred.JobClient:     Reduce shuffle bytes=97
12/11/14 22:53:34 INFO mapred.JobClient:     Reduce output records=5
12/11/14 22:53:34 INFO mapred.JobClient:     Spilled Records=12
12/11/14 22:53:34 INFO mapred.JobClient:     Map output bytes=82
12/11/14 22:53:34 INFO mapred.JobClient:     Combine input records=8
12/11/14 22:53:34 INFO mapred.JobClient:     Map output records=8
12/11/14 22:53:34 INFO mapred.JobClient:     Reduce input records=6


 

Listing the file system again shows the new output2 directory.
[root@localhost hadoop-0.20.2]# bin/hadoop fs -ls
Found 2 items
drwxr-xr-x   - root supergroup          0 2012-11-14 22:41 /user/root/input01
drwxr-xr-x   - root supergroup          0 2012-11-14 22:53 /user/root/output2

 

 


View the contents of output2/:
[root@localhost hadoop-0.20.2]# bin/hadoop fs -cat output2/*
Bye     1
Goodbye 1
Hadoop  2
Hello   2
World   2

 

 

 

 

As expected, wordcount counts the number of occurrences of each word in the input.
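The same tallies can be reproduced locally with standard shell tools, which is a handy sanity check of the job's output (this is only an illustration, not part of the Hadoop run):

```shell
# Local equivalent of wordcount over the two sample lines:
# split the input on whitespace, then count each distinct word
printf '%s\n' "Hello World Bye World" "Hello Hadoop Goodbye Hadoop" \
  | tr -s ' ' '\n' | sort | uniq -c
# prints (counts left-padded by uniq -c):
#   1 Bye
#   1 Goodbye
#   2 Hadoop
#   2 Hello
#   2 World
```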

 

 

 
