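Anyone who has studied Hadoop knows its classic example: counting how many times each word occurs in a set of documents, i.e. the WordCount example. Here we implement WordCount with the Executor framework and threads that return a value (Callable and Future).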
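Before the full listing, the "thread that returns a value" mechanism can be sketched on its own. This is a minimal illustration, not the article's code: the class name CallableSketch and the method countTokens are hypothetical, and the input is a list of in-memory strings rather than files. Each Callable is submitted to a fixed thread pool, and Future.get() collects its return value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CallableSketch {

    // Counts the whitespace-separated tokens of each line on a pool thread
    // and returns the per-line counts in input order.
    public static List<Integer> countTokens(List<String> lines) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (final String line : lines) {
                // submit() runs the Callable asynchronously and hands back a Future
                futures.add(pool.submit(new Callable<Integer>() {
                    @Override
                    public Integer call() {
                        String trimmed = line.trim();
                        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
                    }
                }));
            }
            List<Integer> counts = new ArrayList<>();
            for (Future<Integer> future : futures) {
                counts.add(future.get()); // blocks until that task has finished
            }
            return counts;
        } finally {
            pool.shutdown(); // stop accepting tasks so the JVM can exit
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countTokens(Arrays.asList("a b c", "hello world"))); // prints [3, 2]
    }
}
```

The results come back in submission order because the futures are read in the order they were added, regardless of which task finishes first.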
The core code follows:
WordCountMapper.java
package com.tongtongxue.wordcount;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.concurrent.Callable;

// Counts the words in files[start] .. files[end - 1] and returns word -> count.
public class WordCountMapper implements Callable<Map<String, Long>> {
    private int start;
    private int end;
    private File[] files;

    public WordCountMapper() {
    }

    public WordCountMapper(File[] files, int start, int end) {
        this.files = files;
        this.start = start;
        this.end = end;
    }

    @Override
    public Map<String, Long> call() throws Exception {
        BufferedReader reader = null;
        Map<String, Long> result = new HashMap<String, Long>();
        String line = null;
        for (int i = start; i < end; i++) {
            File file = files[i];
            try {
                reader = new BufferedReader(new InputStreamReader(new FileInputStream(file), "utf-8"));
                while ((line = reader.readLine()) != null) {
                    // Split the line on whitespace and count each token
                    StringTokenizer tokenizer = new StringTokenizer(line);
                    while (tokenizer.hasMoreTokens()) {
                        String word = tokenizer.nextToken();
                        if (result.containsKey(word)) {
                            result.put(word, result.get(word) + 1L);
                        } else {
                            result.put(word, 1L);
                        }
                    }
                }
            } finally {
                // Close the current file even if reading failed
                if (reader != null) {
                    reader.close();
                }
            }
        }
        return result;
    }
}
WordCount.java
package com.tongtongxue.wordcount;

import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class WordCount {
    private ExecutorService executorService;
    private int threadNum;
    private List<Future<Map<String, Long>>> tasks = new ArrayList<Future<Map<String, Long>>>();
    private File[] txtFiles;

    public WordCount() {
        // Use the number of CPUs as the number of threads
        threadNum = Runtime.getRuntime().availableProcessors();
        executorService = Executors.newFixedThreadPool(threadNum);
    }

    public WordCount(int threadNum) {
        this.threadNum = threadNum;
        executorService = Executors.newFixedThreadPool(threadNum);
    }

    public void count(String dirPath) throws Exception {
        File dir = new File(dirPath);
        // Keep only the .txt files in the directory
        txtFiles = dir.listFiles(new FileFilter() {
            @Override
            public boolean accept(File file) {
                String fileName = file.getName();
                return fileName.endsWith(".txt") || fileName.endsWith(".TXT");
            }
        });
        // listFiles returns null when dirPath is not a readable directory
        if (txtFiles == null) {
            throw new IllegalArgumentException("Not a readable directory: " + dirPath);
        }
        tasks.clear(); // do not merge results left over from an earlier call
        int size = txtFiles.length;
        // Assumed partitioning (the loop header is garbled in the source):
        // contiguous chunks of ceil(size / threadNum) files, one chunk per task.
        int chunk = (size + threadNum - 1) / threadNum;
        for (int start = 0; start < size; start += chunk) {
            int end = start + chunk;
            if (end > size) {
                end = size;
            }
            WordCountMapper mapper = new WordCountMapper(txtFiles, start, end);
            // submit() throws RejectedExecutionException once the pool is shut down,
            // so showResult() never waits on a task that was not started
            tasks.add(executorService.submit(mapper));
        }
        showResult();
    }

    public void close() {
        executorService.shutdown();
    }

    public void showResult() throws Exception {
        Map<String, Long> map = new HashMap<String, Long>();
        for (Future<Map<String, Long>> task : tasks) {
            // get() blocks until the mapper has finished
            Map<String, Long> result = task.get();
            for (Entry<String, Long> entry : result.entrySet()) {
                String word = entry.getKey();
                Long num = entry.getValue();
                if (map.containsKey(word)) {
                    map.put(word, map.get(word) + num);
                } else {
                    map.put(word, num);
                }
            }
        }
        System.out.println(map.size());
        for (Entry<String, Long> entry : map.entrySet()) {
            System.out.println(entry.getKey() + " ------> " + entry.getValue());
        }
    }
}
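The merge step in showResult, which adds each mapper's partial counts into one map, can be written more compactly on Java 8 or later with Map.merge. The following is only a sketch under that assumption: the class name MergeSketch and the method mergeCounts are hypothetical, and it works on in-memory partial maps instead of Futures.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MergeSketch {

    // Sums the per-word counts of several partial maps into one total map.
    public static Map<String, Long> mergeCounts(List<Map<String, Long>> partials) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> partial : partials) {
            for (Map.Entry<String, Long> entry : partial.entrySet()) {
                // Store the count if the word is new, otherwise add it to the existing count
                total.merge(entry.getKey(), entry.getValue(), Long::sum);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> a = new HashMap<>();
        a.put("hadoop", 2L);
        a.put("map", 1L);
        Map<String, Long> b = new HashMap<>();
        b.put("hadoop", 3L);
        b.put("reduce", 1L);
        Map<String, Long> total = mergeCounts(Arrays.asList(a, b));
        System.out.println(total.get("hadoop")); // prints 5
    }
}
```

Map.merge replaces the containsKey / put branch with a single call, which keeps the reduce step short without changing its result.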
When reposting, please link to this article: http://www.tongtongxue.com/archives/1141.html