1 Video tutorial covering the whole process: http://v.youku.com/v_show/id_XMzc5MzM1NDQw.html
Download link: http://pan.baidu.com/share/link?shareid=211927&uk=1678594189
2 Cygwin download site: http://www.cygwin.com
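3 Vim settings for Cygwin: http://blog.163.com/xjx_user/blog/static/21493137720130104037220/
Note: ".vimrc" goes in your home directory. First run cd ~ to switch to your home directory, then run vi .vimrc and enter the settings.
Screenshot:
After opening a .c file, the editor looks like this: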
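4 Run ssh-host-config under Cygwin. (SSH, the Secure Shell protocol, encrypts data before sending it; plain FTP, POP, and Telnet are not encrypted.) Reference links:
http://blog.sina.com.cn/s/blog_62adf3670101c0bw.html
http://www.cnblogs.com/xjx-user/archive/2013/01/09/2852201.html
To log in over SSH, run: ssh localhost. After that you can use the who command.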
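5 Installing the gcc toolchain on Cygwin: http://www.cnblogs.com/xjx-user/archive/2013/01/09/2852204.html
Note: it is usually best to do the download and the installation as two separate passes; otherwise errors are likely. An error may be reported even when the download completed in full.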
6 Hadoop download: http://www.apache.org/dist/hadoop/core/
7 Configuring the Hadoop plugin in Eclipse:
http://www.cnblogs.com/xjx-user/archive/2013/01/09/2852205.html
8 On Windows 7, connecting Eclipse to Hadoop fails with a permission error; the file that needs to be modified is hadoop-core-1.0.4.jar
Link: http://download.csdn.net/download/snow_eagle_howard/4842134
Free download link: http://pan.baidu.com/share/link?shareid=211924&uk=1678594189
9 Starting Hadoop: go to the Hadoop directory and run ./start-all.sh. After that you can run ./hadoop dfsadmin -report from the bin directory.
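10 WordCount code: http://www.cnblogs.com/xjx-user/archive/2013/01/09/2852205.html
11 My WordCount run output:
Note: before running, start Hadoop under Cygwin, make sure the Cygwin service is running, and make sure SSH works. If a previous run already produced output (the output/1 directory already exists), delete that directory first.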
13/01/09 01:26:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/01/09 01:26:13 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/01/09 01:26:13 INFO input.FileInputFormat: Total input paths to process : 5
13/01/09 01:26:14 WARN snappy.LoadSnappy: Snappy native library not loaded
13/01/09 01:26:14 INFO mapred.JobClient: Running job: job_local_0001
13/01/09 01:26:14 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/01/09 01:26:14 INFO mapred.MapTask: io.sort.mb = 100
13/01/09 01:26:14 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/09 01:26:14 INFO mapred.MapTask: record buffer = 262144/327680
13/01/09 01:26:14 INFO mapred.MapTask: Starting flush of map output
13/01/09 01:26:14 INFO mapred.MapTask: Finished spill 0
13/01/09 01:26:14 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
13/01/09 01:26:15 INFO mapred.JobClient: map 0% reduce 0%
13/01/09 01:26:17 INFO mapred.LocalJobRunner:
13/01/09 01:26:17 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
13/01/09 01:26:17 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/01/09 01:26:17 INFO mapred.MapTask: io.sort.mb = 100
13/01/09 01:26:17 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/09 01:26:17 INFO mapred.MapTask: record buffer = 262144/327680
13/01/09 01:26:17 INFO mapred.MapTask: Starting flush of map output
13/01/09 01:26:17 INFO mapred.MapTask: Finished spill 0
13/01/09 01:26:17 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
13/01/09 01:26:18 INFO mapred.JobClient: map 100% reduce 0%
13/01/09 01:26:20 INFO mapred.LocalJobRunner:
13/01/09 01:26:20 INFO mapred.Task: Task 'attempt_local_0001_m_000001_0' done.
13/01/09 01:26:20 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/01/09 01:26:20 INFO mapred.MapTask: io.sort.mb = 100
13/01/09 01:26:20 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/09 01:26:20 INFO mapred.MapTask: record buffer = 262144/327680
13/01/09 01:26:20 INFO mapred.MapTask: Starting flush of map output
13/01/09 01:26:20 INFO mapred.MapTask: Finished spill 0
13/01/09 01:26:20 INFO mapred.Task: Task:attempt_local_0001_m_000002_0 is done. And is in the process of commiting
13/01/09 01:26:23 INFO mapred.LocalJobRunner:
13/01/09 01:26:23 INFO mapred.Task: Task 'attempt_local_0001_m_000002_0' done.
13/01/09 01:26:23 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/01/09 01:26:23 INFO mapred.MapTask: io.sort.mb = 100
13/01/09 01:26:23 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/09 01:26:23 INFO mapred.MapTask: record buffer = 262144/327680
13/01/09 01:26:23 INFO mapred.MapTask: Starting flush of map output
13/01/09 01:26:23 INFO mapred.MapTask: Finished spill 0
13/01/09 01:26:23 INFO mapred.Task: Task:attempt_local_0001_m_000003_0 is done. And is in the process of commiting
13/01/09 01:26:26 INFO mapred.LocalJobRunner:
13/01/09 01:26:26 INFO mapred.Task: Task 'attempt_local_0001_m_000003_0' done.
13/01/09 01:26:26 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/01/09 01:26:26 INFO mapred.MapTask: io.sort.mb = 100
13/01/09 01:26:26 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/09 01:26:26 INFO mapred.MapTask: record buffer = 262144/327680
13/01/09 01:26:26 INFO mapred.MapTask: Starting flush of map output
13/01/09 01:26:26 INFO mapred.MapTask: Finished spill 0
13/01/09 01:26:26 INFO mapred.Task: Task:attempt_local_0001_m_000004_0 is done. And is in the process of commiting
13/01/09 01:26:29 INFO mapred.LocalJobRunner:
13/01/09 01:26:29 INFO mapred.Task: Task 'attempt_local_0001_m_000004_0' done.
13/01/09 01:26:29 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/01/09 01:26:29 INFO mapred.LocalJobRunner:
13/01/09 01:26:29 INFO mapred.Merger: Merging 5 sorted segments
13/01/09 01:26:29 INFO mapred.Merger: Down to the last merge-pass, with 5 segments left of total size: 2065 bytes
13/01/09 01:26:29 INFO mapred.LocalJobRunner:
13/01/09 01:26:29 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
13/01/09 01:26:29 INFO mapred.LocalJobRunner:
13/01/09 01:26:29 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
13/01/09 01:26:29 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to /mapreduce/wordcount/output/1
13/01/09 01:26:32 INFO mapred.LocalJobRunner: reduce > reduce
13/01/09 01:26:32 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
13/01/09 01:26:33 INFO mapred.JobClient: map 100% reduce 100%
13/01/09 01:26:33 INFO mapred.JobClient: Job complete: job_local_0001
13/01/09 01:26:33 INFO mapred.JobClient: Counters: 19
13/01/09 01:26:33 INFO mapred.JobClient: File Output Format Counters
13/01/09 01:26:33 INFO mapred.JobClient: Bytes Written=1485
13/01/09 01:26:33 INFO mapred.JobClient: FileSystemCounters
13/01/09 01:26:33 INFO mapred.JobClient: FILE_BYTES_READ=6117827
13/01/09 01:26:33 INFO mapred.JobClient: HDFS_BYTES_READ=4960
13/01/09 01:26:33 INFO mapred.JobClient: FILE_BYTES_WRITTEN=6423845
13/01/09 01:26:33 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1485
13/01/09 01:26:33 INFO mapred.JobClient: File Input Format Counters
13/01/09 01:26:33 INFO mapred.JobClient: Bytes Read=1036
13/01/09 01:26:33 INFO mapred.JobClient: Map-Reduce Framework
13/01/09 01:26:33 INFO mapred.JobClient: Map output materialized bytes=2085
13/01/09 01:26:33 INFO mapred.JobClient: Map input records=15
13/01/09 01:26:33 INFO mapred.JobClient: Reduce shuffle bytes=0
13/01/09 01:26:33 INFO mapred.JobClient: Spilled Records=216
13/01/09 01:26:33 INFO mapred.JobClient: Map output bytes=1835
13/01/09 01:26:33 INFO mapred.JobClient: Total committed heap usage (bytes)=986734592
13/01/09 01:26:33 INFO mapred.JobClient: SPLIT_RAW_BYTES=605
13/01/09 01:26:33 INFO mapred.JobClient: Combine input records=0
13/01/09 01:26:33 INFO mapred.JobClient: Reduce input records=108
13/01/09 01:26:33 INFO mapred.JobClient: Reduce input groups=87
13/01/09 01:26:33 INFO mapred.JobClient: Combine output records=0
13/01/09 01:26:33 INFO mapred.JobClient: Reduce output records=87
13/01/09 01:26:33 INFO mapred.JobClient: Map output records=108
12 Operating on files in HDFS from a program
Code:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every whitespace-separated token in a line.
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts collected for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Expects two arguments: the input path and the output path.
        if (args.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
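To make the job's logic easier to follow, here is a minimal plain-Java sketch of what TokenizerMapper and IntSumReducer compute together, run entirely in memory with no Hadoop classes. The class name WordCountSketch, the method count, and the sample lines are all hypothetical and made up for illustration; the real job reads its input from HDFS and lets the framework group the keys.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java illustration only: hypothetical names, no Hadoop dependency.
final class WordCountSketch {

    // Splits each line on whitespace (as TokenizerMapper does) and adds 1
    // per token to that word's running total (the sum IntSumReducer produces).
    static Map<String, Integer> count(Iterable<String> lines) {
        // TreeMap keeps the words sorted, which makes the output easy to read.
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample input standing in for the files stored in HDFS.
        Map<String, Integer> counts = count(Arrays.asList("hello hadoop", "hello cygwin"));
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

With the sample input above, each word is printed with its count, separated by a tab: cygwin and hadoop once each, hello twice.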
Run result:
13 Reading and writing a SequenceFile. Only writing is implemented here (reading and writing a MapFile works in a similar way):
Code:
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileWriteDemo {

    private static final String[] DATA = {
        "one,two,buckle my shoe",
        "Three,four,shut the door",
        "Five,six,pick up sticks",
        "Seven,eight,lay them straight",
        "Nine,ten,a big fat hen"
    };

    public static void main(String[] args) throws Exception {
        // args[0] is the URI of the SequenceFile to create.
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        Path path = new Path(uri);

        IntWritable key = new IntWritable();
        Text value = new Text();
        SequenceFile.Writer writer = null;
        try {
            writer = SequenceFile.createWriter(fs, conf, path, key.getClass(), value.getClass());
            // Write 100 records with keys counting down from 100 and values cycling through DATA.
            for (int i = 0; i < 100; i++) {
                key.set(100 - i);
                value.set(DATA[i % DATA.length]);
                // Print the writer's current position before each append.
                System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key, value);
                writer.append(key, value);
            }
        } finally {
            IOUtils.closeStream(writer);
        }
    }
}
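Result of running it in Eclipse:
Afterwards, view the file with a read command from Cygwin (it can also be read programmatically). Note that it is a SequenceFile, so opening it directly in Notepad on Windows shows garbled text.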
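The garbled text appears because a SequenceFile is a binary format: its header begins with the three ASCII bytes "SEQ" followed by a version byte. As a rough illustration, the sketch below checks whether a file starts with those bytes. It assumes the file has first been copied out of HDFS to the local disk, and the class name SeqMagicCheck and method hasSeqMagic are hypothetical; it only inspects the header and does not decode any records.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration; uses only the Java standard library.
final class SeqMagicCheck {

    // True when the given bytes start with the ASCII letters 'S', 'E', 'Q'.
    static boolean hasSeqMagic(byte[] header) {
        return header != null
                && header.length >= 3
                && header[0] == 'S'
                && header[1] == 'E'
                && header[2] == 'Q';
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: SeqMagicCheck <local-file>");
            System.exit(2);
        }
        byte[] header = new byte[3];
        int filled = 0;
        // Assumption: args[0] is a local copy of the file, not an HDFS URI.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (filled < header.length) {
                int n = in.read(header, filled, header.length - filled);
                if (n < 0) {
                    break;
                }
                filled += n;
            }
        }
        boolean seq = filled == header.length && hasSeqMagic(header);
        System.out.println(seq ? "Looks like a SequenceFile" : "Not a SequenceFile header");
    }
}
```

A check like this only confirms the file type; reading the actual keys and values still requires Hadoop's own SequenceFile reader.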
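Hadoop web user interfaces:
JobTracker: (http://jobtracker-host:50030), for tracking the progress of jobs and viewing job statistics and logs; on a local setup: http://localhost:50030/
NameNode: (http://namenode-host:50070), for viewing basic NameNode status, the contents of HDFS, and the NameNode logs; on a local setup: http://localhost:50070/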