package rock.lee.wordcount;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyWordCount {

    /**
     * @author Rock Lee
     *
     * @Description
     * LongWritable: input key type (byte offset of the line)
     * Text: input value type (the line itself)
     * Text: output key type (a word)
     * IntWritable: output value type (the count 1)
     */
    static class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        // Reused for every emitted key to avoid allocating a new Text per call
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Read the content of the current line
            String lineValue = value.toString();
            // Split the line on the default delimiters: space, \t, \n, \r, \f
            StringTokenizer stzer = new StringTokenizer(lineValue);
            while (stzer.hasMoreTokens()) {
                // Take the next token and use it as the output key
                word.set(stzer.nextToken());
                // Emit key --> value, i.e. (word, 1)
                context.write(word, ONE);
            }
        }
    }

    /**
     * @author Rock Lee
     *
     * @Description
     * Sums all counts received for one word and emits (word, total).
     */
    static class MyReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("Usage: MyWordCount <input path> <output path>");
            System.exit(2);
        }
        // Load the configuration
        Configuration configuration = new Configuration();
        // Create the job and set its name.
        // This constructor matches the Hadoop 1.x API used here; on Hadoop 2.x
        // and later it is deprecated in favor of Job.getInstance(conf, name).
        Job job = new Job(configuration, "WC");
        // Set the class used to locate the job jar
        job.setJarByClass(MyWordCount.class);
        // Set the Mapper and Reducer classes
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReduce.class);
        // Set the input and output paths
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Set the key/value types of the job output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Submit the job, wait for it to finish, and print progress on the client
        boolean success = job.waitForCompletion(true);
        System.exit(success ? 0 : 1);
    }
}
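To check the logic without a cluster, the map, shuffle, and reduce steps of the job above can be mimicked in plain Java. This is only a sketch under assumptions: `LocalWordCount` and its method names are hypothetical, no Hadoop classes are involved, and the sample lines are made up. The real framework additionally takes care of input splitting, partitioning, and sorting across machines.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical helper, not part of the original post or of Hadoop.
class LocalWordCount {

    // Map step: emit one (word, 1) pair per whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            out.add(new AbstractMap.SimpleEntry<>(tokens.nextToken(), 1));
        }
        return out;
    }

    // Shuffle step: group the emitted values by key, with keys in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce step: sum all values that belong to one key.
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three steps in sequence over all input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample input
        List<String> lines = Arrays.asList("hello hadoop", "hello world");
        System.out.println(wordCount(lines)); // prints {hadoop=1, hello=2, world=1}
    }
}
```

Keeping the three steps as separate methods mirrors the division between `MyMapper`, the framework's shuffle, and `MyReduce`, so each part can be tested on its own.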
Run wc.jar
[root@centos data]# hadoop jar wc.jar /opt/wc/input/ /opt/wc/output
Warning: $HADOOP_HOME is deprecated.
15/06/11 04:29:10 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/06/11 04:29:10 INFO input.FileInputFormat: Total input paths to process : 2
15/06/11 04:29:10 INFO util.NativeCodeLoader: Loaded the native-hadoop library
15/06/11 04:29:10 WARN snappy.LoadSnappy: Snappy native library not loaded
15/06/11 04:29:10 INFO mapred.JobClient: Running job: job_201506110402_0006
15/06/11 04:29:11 INFO mapred.JobClient:  map 0% reduce 0%
15/06/11 04:29:32 INFO mapred.JobClient:  map 50% reduce 0%
15/06/11 04:29:42 INFO mapred.JobClient:  map 100% reduce 0%
15/06/11 04:30:05 INFO mapred.JobClient:  map 100% reduce 100%
15/06/11 04:30:05 INFO mapred.JobClient: Job complete: job_201506110402_0006
15/06/11 04:30:05 INFO mapred.JobClient: Counters: 29
15/06/11 04:30:05 INFO mapred.JobClient:   Job Counters
15/06/11 04:30:05 INFO mapred.JobClient:     Launched reduce tasks=1
15/06/11 04:30:05 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=40074
15/06/11 04:30:05 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
15/06/11 04:30:05 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
15/06/11 04:30:05 INFO mapred.JobClient:     Launched map tasks=2
15/06/11 04:30:05 INFO mapred.JobClient:     Data-local map tasks=2
15/06/11 04:30:05 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=21707
15/06/11 04:30:05 INFO mapred.JobClient:   File Output Format Counters
15/06/11 04:30:05 INFO mapred.JobClient:     Bytes Written=30
15/06/11 04:30:05 INFO mapred.JobClient:   FileSystemCounters
15/06/11 04:30:05 INFO mapred.JobClient:     FILE_BYTES_READ=96
15/06/11 04:30:05 INFO mapred.JobClient:     HDFS_BYTES_READ=260
15/06/11 04:30:05 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=160215
15/06/11 04:30:05 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=30
15/06/11 04:30:05 INFO mapred.JobClient:   File Input Format Counters
15/06/11 04:30:05 INFO mapred.JobClient:     Bytes Read=44
15/06/11 04:30:05 INFO mapred.JobClient:   Map-Reduce Framework
15/06/11 04:30:05 INFO mapred.JobClient:     Map output materialized bytes=102
15/06/11 04:30:05 INFO mapred.JobClient:     Map input records=4
15/06/11 04:30:05 INFO mapred.JobClient:     Reduce shuffle bytes=102
15/06/11 04:30:05 INFO mapred.JobClient:     Spilled Records=16
15/06/11 04:30:05 INFO mapred.JobClient:     Map output bytes=74
15/06/11 04:30:05 INFO mapred.JobClient:     CPU time spent (ms)=820
15/06/11 04:30:05 INFO mapred.JobClient:     Total committed heap usage (bytes)=413466624
15/06/11 04:30:05 INFO mapred.JobClient:     Combine input records=0
15/06/11 04:30:05 INFO mapred.JobClient:     SPLIT_RAW_BYTES=216
15/06/11 04:30:05 INFO mapred.JobClient:     Reduce input records=8
15/06/11 04:30:05 INFO mapred.JobClient:     Reduce input groups=4
15/06/11 04:30:05 INFO mapred.JobClient:     Combine output records=0
15/06/11 04:30:05 INFO mapred.JobClient:     Physical memory (bytes) snapshot=313032704
15/06/11 04:30:05 INFO mapred.JobClient:     Reduce output records=4
15/06/11 04:30:05 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1127878656
15/06/11 04:30:05 INFO mapred.JobClient:     Map output records=8
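The counters `Combine input records=0` and `Combine output records=0` indicate that this run used no combiner, so all 8 map output records reached the reducer (`Reduce input records=8`). Because summing is associative and commutative, a reducer like this one can usually also be registered as a combiner with `job.setCombinerClass(...)`. The plain-Java sketch below is not Hadoop code; it only illustrates the idea of pre-aggregating each map task's output before the shuffle. `CombinerSketch`, its methods, and the two sample splits are hypothetical (the post does not show the real input files), and in Hadoop itself the framework may apply a combiner zero, one, or several times per map task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical illustration, not part of the original post or of Hadoop.
class CombinerSketch {

    // One simulated map task: tokenize every line of its split into words.
    static List<String> mapTask(List<String> split) {
        List<String> words = new ArrayList<>();
        for (String line : split) {
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                words.add(tokens.nextToken());
            }
        }
        return words;
    }

    // Combiner: sum the counts per word inside a single map task's output.
    static Map<String, Integer> combine(List<String> words) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String w : words) {
            partial.merge(w, 1, Integer::sum);
        }
        return partial;
    }

    // Reducer: merge the partial sums coming from all map tasks.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            for (Map.Entry<String, Integer> e : partial.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up input: two splits, one per simulated map task.
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("hello world", "hello hadoop"),
                Arrays.asList("hello mapreduce", "hadoop world"));

        int withoutCombiner = 0;
        int withCombiner = 0;
        List<Map<String, Integer>> partials = new ArrayList<>();
        for (List<String> split : splits) {
            List<String> words = mapTask(split);
            Map<String, Integer> partial = combine(words);
            withoutCombiner += words.size();   // every (word, 1) pair is shuffled
            withCombiner += partial.size();    // one record per distinct word per task
            partials.add(partial);
        }
        System.out.println("records shuffled without combiner: " + withoutCombiner); // 8
        System.out.println("records shuffled with combiner: " + withCombiner);       // 7
        System.out.println(reduce(partials)); // {hadoop=2, hello=3, mapreduce=1, world=2}
    }
}
```

With such a small input the saving is minor (7 records instead of 8); the benefit grows when the same words repeat many times within one map task.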
Reposted from: http://mvplee.iteye.com/blog/2218989