With the Hadoop cluster installed (see the earlier Hadoop installation notes), the natural next step is to run a Map/Reduce job.
package com.hadoop;

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxValue {

    static class MaxValueMapper extends
            Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            // Split each fixed-width line into the key to group by
            // (characters 0-3) and the value to aggregate (characters 5-7).
            String theKey = line.substring(0, 4);
            int theValue = Integer.parseInt(line.substring(5, 8));
            context.write(new Text(theKey), new IntWritable(theValue));
        }
    }

    static class MaxValueReducer extends
            Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int maxValue = Integer.MIN_VALUE;
            // Find the maximum value for this key.
            for (IntWritable value : values) {
                maxValue = Math.max(maxValue, value.get());
            }
            context.write(key, new IntWritable(maxValue));
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MaxValue <input path> <output path>");
            System.exit(-1);
        }
        Job job = new Job();
        job.setJarByClass(MaxValue.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(MaxValueMapper.class);
        job.setReducerClass(MaxValueReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
The code is straightforward. Package it into a jar; here it is named first.jar.
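For example, the class can be compiled and packaged like this (a sketch only: the Hadoop core jar name and location vary by version, and $HADOOP_HOME is assumed to point at the Hadoop install):
mkdir classes
javac -classpath $HADOOP_HOME/hadoop-*-core.jar -d classes com/hadoop/MaxValue.java
jar cf first.jar -C classes .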
Next, write a small program that randomly generates a batch of data to upload to Hadoop. To keep the processing simple, the data follows a fixed-width format like this:
2000 111
2012 333
2012 444
2000 222
A large batch of such data is saved as temp1.txt (a minimal generator sketch follows below).
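A minimal sketch of such a generator (the class name GenData, the record count, and the key/value ranges are illustrative assumptions, not from the original post; any 4-digit-key, 3-digit-value data works):

import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Random;

// Hypothetical standalone generator: writes 1000 lines of "YYYY VVV"
// test data in the fixed-width format MaxValueMapper expects
// (a 4-digit key, one space, a 3-digit value).
public class GenData {
    public static void main(String[] args) throws IOException {
        Random rand = new Random();
        PrintWriter out = new PrintWriter(new FileWriter("temp1.txt"));
        try {
            for (int i = 0; i < 1000; i++) {
                int year = 2010 + rand.nextInt(3);   // keys 2010..2012, as in the sample run
                int value = 100 + rand.nextInt(900); // values 100..999, always three digits
                out.println(year + " " + value);
            }
        } finally {
            out.close();
        }
    }
}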
Upload it:
hadoop dfs -put temp1.txt /user/hadoop/input/
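If the target directory does not exist yet, create it first and then retry the put:
hadoop dfs -mkdir /user/hadoop/input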
List the files in HDFS:
hadoop dfs -ls
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2012-04-06 13:40 /user/hadoop/input
drwxr-xr-x - hadoop supergroup 0 2012-04-06 11:30 /user/hadoop/output
To list the files under a particular HDFS directory:
hadoop dfs -ls /user/hadoop/input
Found 2 items
-rw-r--r-- 2 hadoop supergroup 2043 2012-02-29 18:18 /user/hadoop/input/slaves.sh
-rw-r--r-- 2 hadoop supergroup 10000 2012-04-06 13:39 /user/hadoop/input/temp1.txt
The uploaded temp1.txt file shows up there.
Then run:
hadoop jar first.jar com.hadoop.MaxValue /user/hadoop/input/temp1.txt output1
first.jar: the name of the jar
com.hadoop.MaxValue: the fully qualified class name of the job's main class
/user/hadoop/input/temp1.txt: the input path passed to main
output1: the output path passed to main; being relative, it resolves against the HDFS home directory, so the results end up in /user/hadoop/output1
The console output looks like this:
12/04/06 13:41:43 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/04/06 13:41:43 INFO input.FileInputFormat: Total input paths to process : 1
12/04/06 13:41:44 INFO mapred.JobClient: Running job: job_201203121856_0005
12/04/06 13:41:45 INFO mapred.JobClient: map 0% reduce 0%
12/04/06 13:41:58 INFO mapred.JobClient: map 100% reduce 0%
12/04/06 13:42:10 INFO mapred.JobClient: map 100% reduce 100%
12/04/06 13:42:15 INFO mapred.JobClient: Job complete: job_201203121856_0005
12/04/06 13:42:15 INFO mapred.JobClient: Counters: 25
12/04/06 13:42:15 INFO mapred.JobClient: Job Counters
12/04/06 13:42:15 INFO mapred.JobClient: Launched reduce tasks=1
12/04/06 13:42:15 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=12195
12/04/06 13:42:15 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/04/06 13:42:15 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/04/06 13:42:15 INFO mapred.JobClient: Launched map tasks=1
12/04/06 13:42:15 INFO mapred.JobClient: Data-local map tasks=1
12/04/06 13:42:15 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=10085
12/04/06 13:42:15 INFO mapred.JobClient: File Output Format Counters
12/04/06 13:42:15 INFO mapred.JobClient: Bytes Written=27
12/04/06 13:42:15 INFO mapred.JobClient: FileSystemCounters
12/04/06 13:42:15 INFO mapred.JobClient: FILE_BYTES_READ=11006
12/04/06 13:42:15 INFO mapred.JobClient: HDFS_BYTES_READ=10111
12/04/06 13:42:15 INFO mapred.JobClient: FILE_BYTES_WRITTEN=63937
12/04/06 13:42:15 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=27
12/04/06 13:42:15 INFO mapred.JobClient: File Input Format Counters
12/04/06 13:42:15 INFO mapred.JobClient: Bytes Read=10000
12/04/06 13:42:15 INFO mapred.JobClient: Map-Reduce Framework
12/04/06 13:42:15 INFO mapred.JobClient: Reduce input groups=3
12/04/06 13:42:15 INFO mapred.JobClient: Map output materialized bytes=11006
12/04/06 13:42:15 INFO mapred.JobClient: Combine output records=0
12/04/06 13:42:15 INFO mapred.JobClient: Map input records=1000
12/04/06 13:42:15 INFO mapred.JobClient: Reduce shuffle bytes=11006
12/04/06 13:42:15 INFO mapred.JobClient: Reduce output records=3
12/04/06 13:42:15 INFO mapred.JobClient: Spilled Records=2000
12/04/06 13:42:15 INFO mapred.JobClient: Map output bytes=9000
12/04/06 13:42:15 INFO mapred.JobClient: Combine input records=0
12/04/06 13:42:15 INFO mapred.JobClient: Map output records=1000
12/04/06 13:42:15 INFO mapred.JobClient: SPLIT_RAW_BYTES=111
12/04/06 13:42:15 INFO mapred.JobClient: Reduce input records=1000
The job's progress can also be watched in the JobTracker web console at http://node1:50030/jobtracker.jsp.
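The same information is available from the command line; for example, using the job id printed in the log above:
hadoop job -status job_201203121856_0005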
Once the job finishes, list the output directory:
hadoop dfs -ls /user/hadoop/output1
Found 3 items
-rw-r--r-- 2 hadoop supergroup 0 2012-04-06 13:42 /user/hadoop/output1/_SUCCESS
drwxr-xr-x - hadoop supergroup 0 2012-04-06 13:41 /user/hadoop/output1/_logs
-rw-r--r-- 2 hadoop supergroup 27 2012-04-06 13:42 /user/hadoop/output1/part-r-00000
The result file has been generated. View the final result:
hadoop dfs -cat /user/hadoop/output1/part-r-00000
2009 999
2010 129
2011 177
The raw data came from the generator; a 2009 999 record was then added by hand as a known maximum, so the 2009 999 line in the output confirms the job computed the result correctly.
For more examples, see the official Hadoop Map/Reduce tutorial at http://hadoop.apache.org/common/docs/r0.20.2/cn/mapred_tutorial.html