[Experiment] Hadoop example: online user analysis

GQM

Blog category: hadoop

A simple business scenario and example, adapted from the WordCount example.

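Business scenario:
Each user generates online events, which are recorded in event logs. The goal is to find the users who were online during a given period and count how many events each of them produced.
Note: assume the fields of each event log line are comma-separated and the 5th field is the user ID.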
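Because the user ID is identified purely by position, one pitfall is worth showing: `java.util.StringTokenizer` treats consecutive commas as a single delimiter, so an empty field shifts every later field one position to the left, whereas `String.split(",", -1)` keeps empty fields. The stdlib-only sketch below compares the two, assuming the log layout described above (5th field = user ID); the class name `UserIdField` and the sample lines are hypothetical.

```java
import java.util.StringTokenizer;

// Hypothetical helper comparing two ways of reading the 5th comma-separated field
class UserIdField {

    static final int USER_FIELD_INDEX = 4; // 5th field, zero-based

    // split(",", -1) keeps empty fields, so positions never shift
    static String bySplit(String line) {
        String[] fields = line.split(",", -1);
        return fields.length > USER_FIELD_INDEX ? fields[USER_FIELD_INDEX] : null;
    }

    // StringTokenizer skips empty tokens, so ",," counts as one delimiter
    static String byTokenizer(String line) {
        StringTokenizer itr = new StringTokenizer(line, ",");
        for (int index = 0; itr.hasMoreTokens(); index++) {
            String token = itr.nextToken();
            if (index == USER_FIELD_INDEX) {
                return token;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Made-up sample lines; the second one has an empty 3rd field
        String full = "2013-08-30 15:00:00,login,web,sh,u42,ok";
        String withEmpty = "2013-08-30 15:00:00,login,,sh,u42,ok";
        System.out.println(bySplit(full) + " " + byTokenizer(full));           // u42 u42
        System.out.println(bySplit(withEmpty) + " " + byTokenizer(withEmpty)); // u42 ok
    }
}
```

If the logs never contain empty fields, both approaches agree; `split(",", -1)` simply removes the dependence on that assumption.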

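import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (userId, 1) for every event line
public class ActiveUserMapper extends Mapper<Object, Text, Text, IntWritable> {

	// The user ID is the 5th comma-separated field (zero-based index 4)
	private static final int USER_FIELD_INDEX = 4;

	private final static IntWritable one = new IntWritable(1);
	private Text user = new Text();

	@Override
	protected void map(Object key, Text value, Context context)
			throws IOException, InterruptedException {
		// split(",", -1) keeps empty fields, so field positions never shift;
		// StringTokenizer would skip an empty field (e.g. "a,,c") and pick the wrong column
		String[] fields = value.toString().split(",", -1);
		// Skip malformed lines with fewer than 5 fields or an empty user ID
		if (fields.length > USER_FIELD_INDEX && !fields[USER_FIELD_INDEX].isEmpty()) {
			user.set(fields[USER_FIELD_INDEX]);
			context.write(user, one);
		}
	}
}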

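import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sums the event counts per user; also registered as the combiner
public class ActiveUserReducer extends
		Reducer<Text, IntWritable, Text, IntWritable> {

	private IntWritable events = new IntWritable();

	@Override
	protected void reduce(Text key, Iterable<IntWritable> values,
			Context context) throws IOException, InterruptedException {
		int sum = 0;
		// Values are 1s from the mapper or partial sums from the combiner
		for (IntWritable val : values) {
			sum += val.get();
		}
		events.set(sum);
		// Output line: userId <TAB> number of events
		context.write(key, events);
	}
}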

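import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ActiveUserMRDriver extends Configured implements Tool {

	@Override
	public int run(String[] args) throws Exception {
		if (args.length != 2) {
			System.err.printf("Usage: %s [generic options] <in> <out>%n", getClass().getName());
			ToolRunner.printGenericCommandUsage(System.err);
			return -1;
		}
		// Use the Configuration prepared by ToolRunner so that generic options
		// passed on the command line (e.g. -D key=value) are not lost;
		// the line below still overrides fs.default.name
		Configuration conf = getConf();
		// NameNode address of the author's cluster; fs.default.name is the
		// Hadoop 1.x key (renamed fs.defaultFS in Hadoop 2.x)
		conf.set("fs.default.name", "hdfs://node04vm01:9000");

		// new Job(...) is deprecated in Hadoop 2.x in favor of Job.getInstance(conf, name)
		Job job = new Job(conf, "active user analysis");
		job.setJarByClass(ActiveUserMRDriver.class);
		job.setMapperClass(ActiveUserMapper.class);
		// Summing is associative and commutative, so the reducer can also serve as the combiner
		job.setCombinerClass(ActiveUserReducer.class);
		job.setReducerClass(ActiveUserReducer.class);

		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);

		FileInputFormat.setInputPaths(job, new Path(args[0]));
		FileOutputFormat.setOutputPath(job, new Path(args[1]));

		return job.waitForCompletion(true) ? 0 : 1;
	}

	public static void main(String[] args) throws Exception {
		int exitCode = ToolRunner.run(new ActiveUserMRDriver(), args);
		System.exit(exitCode);
	}
}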
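The driver registers ActiveUserReducer as both the combiner and the reducer. That is safe here because addition is associative and commutative and the combiner's input and output types are identical (Text, IntWritable), so adding up per-task partial sums gives the same totals as adding up all the 1s at once. The Hadoop-free, in-memory sketch below illustrates that argument; the class name `CombinerCheck`, the two "map tasks" and the user IDs are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation: each map task counts its own records (map + combine),
// then the reduce step adds up the partial counts per user.
class CombinerCheck {

    // Map + combine for one task: emit (user, 1) and sum locally per user
    static Map<String, Integer> combine(List<String> userIds) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String user : userIds) {
            partial.merge(user, 1, Integer::sum);
        }
        return partial;
    }

    // Reduce: add up the partial sums that arrive from all tasks
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((user, count) -> total.merge(user, count, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> task1 = Arrays.asList("u1", "u2", "u1");
        List<String> task2 = Arrays.asList("u2", "u3", "u1");
        Map<String, Integer> withCombiner = reduce(Arrays.asList(combine(task1), combine(task2)));
        // Counting all six records in a single pass gives the same totals
        Map<String, Integer> direct = combine(Arrays.asList("u1", "u2", "u1", "u2", "u3", "u1"));
        System.out.println(withCombiner);                // {u1=3, u2=2, u3=1}
        System.out.println(withCombiner.equals(direct)); // true
    }
}
```

The same reasoning shows where reuse would break: a reducer that computed, say, an average could not be reused as a combiner, because an average of partial averages is generally not the overall average.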

Job report (excerpt):
13/08/30 15:25:50 INFO mapred.JobClient: Job complete: job_local206120026_0001
13/08/30 15:25:50 INFO mapred.JobClient: Counters: 22
13/08/30 15:25:50 INFO mapred.JobClient:   File Output Format Counters
13/08/30 15:25:50 INFO mapred.JobClient:     Bytes Written=40450120
13/08/30 15:25:50 INFO mapred.JobClient:   FileSystemCounters
13/08/30 15:25:50 INFO mapred.JobClient:     FILE_BYTES_READ=907603353
13/08/30 15:25:50 INFO mapred.JobClient:     HDFS_BYTES_READ=4244630128
13/08/30 15:25:50 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1520436699
13/08/30 15:25:50 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=40450120
13/08/30 15:25:50 INFO mapred.JobClient:   File Input Format Counters
13/08/30 15:25:50 INFO mapred.JobClient:     Bytes Read=612273464
13/08/30 15:25:50 INFO mapred.JobClient:   Map-Reduce Framework
13/08/30 15:25:50 INFO mapred.JobClient:     Reduce input groups=2886293
13/08/30 15:25:50 INFO mapred.JobClient:     Map output materialized bytes=103629708
13/08/30 15:25:50 INFO mapred.JobClient:     Combine output records=12122417
13/08/30 15:25:50 INFO mapred.JobClient:     Map input records=8895828
13/08/30 15:25:50 INFO mapred.JobClient:     Reduce shuffle bytes=0
13/08/30 15:25:50 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
13/08/30 15:25:50 INFO mapred.JobClient:     Reduce output records=2886293
13/08/30 15:25:50 INFO mapred.JobClient:     Spilled Records=17879555
13/08/30 15:25:50 INFO mapred.JobClient:     Map output bytes=126802892
13/08/30 15:25:50 INFO mapred.JobClient:     CPU time spent (ms)=0
13/08/30 15:25:50 INFO mapred.JobClient:     Total committed heap usage (bytes)=8510898176
13/08/30 15:25:50 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
13/08/30 15:25:50 INFO mapred.JobClient:     Combine input records=15261107
13/08/30 15:25:50 INFO mapred.JobClient:     Map output records=8895828
13/08/30 15:25:50 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1340
13/08/30 15:25:50 INFO mapred.JobClient:     Reduce input records=5757138
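A few figures can be read off these counters. Every one of the 8,895,828 input lines produced exactly one map output record, and the 2,886,293 reduce output records correspond to 2,886,293 distinct users. Combine input records (15,261,107) exceeds map output records, most likely because the combiner can run more than once over the same data (on each spill and again when spill files are merged). The sketch below only does arithmetic on values copied from the log above; the class name `JobReportMath` and its method names are hypothetical, not Hadoop counter names.

```java
// Hypothetical helper: derived metrics computed from the counters in the job report
class JobReportMath {

    static final long MAP_OUTPUT_RECORDS = 8_895_828L;    // one record per event line
    static final long REDUCE_INPUT_RECORDS = 5_757_138L;  // records left after combining
    static final long REDUCE_OUTPUT_RECORDS = 2_886_293L; // one record per distinct user

    // Average number of events per active user
    static double eventsPerUser() {
        return (double) MAP_OUTPUT_RECORDS / REDUCE_OUTPUT_RECORDS;
    }

    // Fraction of map output records eliminated by the combiner before the reduce phase
    static double combinerSavings() {
        return 1.0 - (double) REDUCE_INPUT_RECORDS / MAP_OUTPUT_RECORDS;
    }

    public static void main(String[] args) {
        System.out.printf("distinct users: %d%n", REDUCE_OUTPUT_RECORDS);
        System.out.printf("events per user: %.2f%n", eventsPerUser());
        System.out.printf("combiner savings: %.1f%%%n", 100 * combinerSavings());
    }
}
```

Running `main` prints about 3.08 events per user and a 35.3% reduction in the number of records reaching the reducer.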

2013-08-30 15:54