MapReduce Case Study: Inner Join

bigSeven


Blog category: Distributed

Tags: mapreduce, self-join, hadoop, Demo


Input data (one child/parent pair per line; the first line is a header that the mapper skips):

child parent
Tom Lucy
Tom Jack
Jone Lucy
Jone Jack
Lucy Mary
Lucy Ben
Jack Alice
Jack Jesse
Terry Alice
Terry Jesse
Philip Terry
Philip Alma
Mark Terry
Mark Alma

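Output (TextOutputFormat separates key and value with a tab; Hadoop does not guarantee the order of values within a key group, so pairs produced from the same key may appear in a different order):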

grandChild	grandParent
Tom	Alice
Tom	Jesse
Jone	Alice
Jone	Jesse
Tom	Mary
Tom	Ben
Jone	Mary
Jone	Ben
Philip	Alice
Philip	Jesse
Mark	Alice
Mark	Jesse
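To see what each reduce() call receives, the map and shuffle phases can be imitated in plain Java without a cluster. This is a minimal sketch and is not part of the original job. The class name MapPhaseSketch and the method mapAndGroup are hypothetical. A TreeMap stands in for Hadoop's shuffle, which groups values by key and delivers keys in sorted order. Hadoop does not fix the order of values inside a group, so each list below shows only one possible order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hadoop-free sketch of OwnerJoin's map + shuffle phases (hypothetical helper, not the original code).
class MapPhaseSketch {

    // Same tagging as the job: "1-child-parent" keyed by parent,
    // "2-child-parent" keyed by child.
    static Map<String, List<String>> mapAndGroup(List<String> lines) {
        // TreeMap imitates the shuffle: values grouped per key, keys sorted
        Map<String, List<String>> groups = new TreeMap<String, List<String>>();
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s+");
            if (tokens.length < 2 || tokens[0].equals("child")) {
                continue; // skip blank lines and the "child parent" header
            }
            String child = tokens[0];
            String parent = tokens[1];
            emit(groups, parent, "1-" + child + "-" + parent); // child of the key
            emit(groups, child, "2-" + child + "-" + parent);  // parent of the key
        }
        return groups;
    }

    // Appends a value to the key's group, creating the group on first use
    static void emit(Map<String, List<String>> groups, String key, String value) {
        List<String> values = groups.get(key);
        if (values == null) {
            values = new ArrayList<String>();
            groups.put(key, values);
        }
        values.add(value);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "child parent", "Tom Lucy", "Tom Jack", "Jone Lucy", "Jone Jack",
            "Lucy Mary", "Lucy Ben", "Jack Alice", "Jack Jesse", "Terry Alice",
            "Terry Jesse", "Philip Terry", "Philip Alma", "Mark Terry", "Mark Alma");
        for (Map.Entry<String, List<String>> e : mapAndGroup(input).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Running it prints one line per key. The Jack group, for example, is `Jack -> [1-Tom-Jack, 1-Jone-Jack, 2-Jack-Alice, 2-Jack-Jesse]`. Its two flag-1 records carry his children and its two flag-2 records carry his parents. The reducer turns exactly this overlap into grandchild/grandparent pairs.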

===================================JAVA CODE ======================

package gq;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Class Description: self-join test class that finds each grandchild's grandparents.
 * Author: gaoqi
 * Date: 2015-06-05 14:03:08
 */
public class OwnerJoin {

    public static class Map extends Mapper<Object, Text, Text, Text> {

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer stk = new StringTokenizer(value.toString());
            // Skip blank or malformed lines (a fixed String[2] could overflow or leave nulls)
            if (stk.countTokens() < 2) {
                return;
            }
            String childname = stk.nextToken();
            String parentname = stk.nextToken();
            // Skip the header line "child parent"
            if (childname.equals("child")) {
                return;
            }
            // Left table (flag 1): keyed by the parent, so the key's children are grandchild candidates
            context.write(new Text(parentname), new Text("1-" + childname + "-" + parentname));
            // Right table (flag 2): keyed by the child, so the key's parents are grandparent candidates
            context.write(new Text(childname), new Text("2-" + childname + "-" + parentname));
        }
    }

    public static class Reduce extends Reducer<Text, Text, Text, Text> {

        // Writes the header row once per reduce task; replaces the static TIME counter,
        // which is shared by the whole JVM rather than scoped to one task
        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            context.write(new Text("grandChild"), new Text("grandParent"));
        }

        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // Growable lists instead of String[10], which overflowed past 10 entries
            List<String> grandChildren = new ArrayList<String>(); // children of key (flag 1)
            List<String> grandParents = new ArrayList<String>();  // parents of key (flag 2)
            for (Text value : values) {
                // toString() copies the data; Hadoop reuses the same Text object on each iteration
                String[] ss = value.toString().split("-");
                String flag = ss[0];
                String childname = ss[1];
                String parentname = ss[2];
                if (flag.equals("1")) {
                    grandChildren.add(childname);
                } else if (flag.equals("2")) {
                    grandParents.add(parentname);
                }
            }
            // Pair every grandchild with every grandparent; nothing is written if either list is empty
            for (String grandChild : grandChildren) {
                for (String grandParent : grandParents) {
                    context.write(new Text(grandChild), new Text(grandParent));
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "OwnerJoin");
        job.setJarByClass(OwnerJoin.class);
        job.setMapperClass(Map.class);
        // Reduce cannot serve as a combiner: its output is no longer in the tagged "flag-child-parent" format
        //job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path("hdfs://h0:9000/user/tallqi/in/inputOwnerJoin"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://h0:9000/user/tallqi/in/outputOwnerJoin"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
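The core of the reducer is a Cartesian product over one key's values. The Hadoop-free sketch that follows isolates that step so it can be checked by hand. ReduceStepSketch and joinGroup are hypothetical names. The input is the Jack group from the sample data, and the sketch assumes the job's "flag-child-parent" encoding, which would break if a name itself contained "-".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hadoop-free sketch of OwnerJoin's reduce step for a single key group (hypothetical helper).
class ReduceStepSketch {

    // Returns "grandChild<TAB>grandParent" pairs for one key:
    // flag 1 values carry a child of the key, flag 2 values carry a parent of the key.
    static List<String> joinGroup(List<String> taggedValues) {
        List<String> grandChildren = new ArrayList<String>();
        List<String> grandParents = new ArrayList<String>();
        for (String record : taggedValues) {
            String[] ss = record.split("-");
            if (ss[0].equals("1")) {
                grandChildren.add(ss[1]);
            } else if (ss[0].equals("2")) {
                grandParents.add(ss[2]);
            }
        }
        // Cartesian product; empty when the key has no children or no parents
        List<String> pairs = new ArrayList<String>();
        for (String gc : grandChildren) {
            for (String gp : grandParents) {
                pairs.add(gc + "\t" + gp);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Values the reducer receives for key "Jack" with the sample input
        List<String> jack = Arrays.asList("1-Tom-Jack", "1-Jone-Jack", "2-Jack-Alice", "2-Jack-Jesse");
        for (String pair : joinGroup(jack)) {
            System.out.println(pair);
        }
        // Key "Alice" has only flag 1 values (no recorded parents), so nothing is joined
        System.out.println(joinGroup(Arrays.asList("1-Jack-Alice", "1-Terry-Alice")).isEmpty());
    }
}
```

It prints the four Jack pairs (Tom/Alice, Tom/Jesse, Jone/Alice, Jone/Jesse) followed by `true`. Alice never appears as a child, so she contributes no grandparent candidates. This is why keys like Alice produce no output in the job.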

2015-08-15 16:40