Add third party jars in a job

ylzhj02

Categories: Hue, oozie, Hadoop

When I submit a Java job (which launches some map/reduce jobs) in the Hue UI using the Oozie Editor, the third-party jars are not loaded correctly.

1. The only approach that worked at first was to build a fat jar that contains all the dependency classes from the third-party jars.

2. If I export my application jar from Eclipse with the option that places the third-party jars in the jar's lib directory, Hadoop still can't find these jars, although some posts claim that Hadoop extracts them automatically when the job runs.

The Oozie documentation says that this works with an uber jar once the related configuration is set. The original text is quoted below. I followed it, but it failed too.

------------

For Map-Reduce jobs (not including streaming or pipes), additional jar files can also be included via an uber jar. An uber jar is a jar file that contains additional jar files within a "lib" folder (see Workflow Functional Specification for more information). Submitting a workflow with an uber jar requires at least Hadoop 2.2.0 or 1.2.0. As such, using uber jars in a workflow is disabled by default. To enable this feature, use the oozie.action.mapreduce.uber.jar.enable property in the oozie-site.xml (and make sure to use a supported version of Hadoop).

<configuration>
    <property>
        <name>oozie.action.mapreduce.uber.jar.enable</name>
        <value>true</value>
    </property>
</configuration>
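
Before blaming Oozie or Hadoop, it can help to confirm that the exported jar really contains the third-party jars under lib/, as the quoted definition of an uber jar requires. The following is a minimal sketch using only the JDK; the class name UberJarInspector and its method are made up for this illustration and are not part of Hadoop or Oozie.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: lists the jar files nested under lib/ inside a jar.
public class UberJarInspector {

    // Returns the entry names that start with "lib/" and end with ".jar".
    public static List<String> listNestedJars(Path jarPath) throws IOException {
        List<String> nested = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                if (!entry.isDirectory() && name.startsWith("lib/") && name.endsWith(".jar")) {
                    nested.add(name);
                }
            }
        }
        return nested;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java UberJarInspector <path-to-jar>");
            return;
        }
        for (String name : listNestedJars(Paths.get(args[0]))) {
            System.out.println(name);
        }
    }
}
```

If the list comes back empty, the export did not produce the layout that the uber jar feature expects, and the problem lies in packaging rather than in the Oozie configuration.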

----------

3. I put the third-party jars in the Oozie deploy directory. Some of them work, but others don't.

4. When I design a job, I add the third-party jars through the file option, see below. But the result is the same as in 3.

--------------

There are multiple ways to add the jars to the lib path.
I would suggest you try one of the following

1) in your job.properties, add the path to your jar files (comma separated) to the *oozie.libpath* property
2) create a *lib* directory under your application's directory and put your jars under lib so oozie can pick them up.
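
As a rough illustration of option 1, a job.properties file might look like the sketch below. The host, port, and directory names are placeholders invented for this example, not values from my cluster. Only the property names (oozie.libpath, oozie.use.system.libpath, oozie.wf.application.path) are real Oozie settings.

```properties
# Placeholder NameNode address, adjust to your cluster
nameNode=hdfs://namenode.example.com:8020

# HDFS directory (or comma-separated directories) holding the third-party jars
oozie.libpath=${nameNode}/user/${user.name}/thirdparty-libs

# Also pick up the Oozie share lib
oozie.use.system.libpath=true

# Placeholder location of the workflow application
oozie.wf.application.path=${nameNode}/user/${user.name}/apps/my-workflow
```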

-------------------------------------------------

The same problem is described at http://qnalist.com/questions/4588838/how-to-manage-dependencies-for-m-r-job-which-is-executed-using-oozie-java-action

The solution is described at http://pangool.net/executing_examples.html

a. In the Java driver class that submits a map/reduce job, you should use

Job job = HadoopUtil.prepareJob(inputPath, outputPath, inputFormat,
				mapper, mapperKey, mapperValue, reducer, reducerKey,
				reducerValue, outputFormat, getConf());

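instead of the code below, which creates a fresh Configuration and therefore presumably loses the generic options (such as -libjars) that were parsed into getConf():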

// Path input = new Path(userContactNumberDir);
// String redisServer = getOption("redisServer");
// String zsetkey = getOption("zsetkey") + "_user";
//		
// Configuration conf = new Configuration();
// Job job = new Job(conf, "push topk users to reids zset");
// job.setJarByClass(TopKItemHotToRedisJob.class);
// job.setOutputKeyClass(NullWritable.class);
// job.setOutputValueClass(ItemPropertyWritable.class);
//
// job.setMapperClass(TopKItemHotToRedisJob.Map.class);
//
// job.setReducerClass(TopKItemHotToRedisJob.Reduce.class);
// job.setNumReduceTasks(1);
// job.setInputFormatClass(TextInputFormat.class);
// job.setOutputFormatClass(RedisZSetOutputFormat.class);
//
// RedisZSetOutputFormat.setRedisHosts(job,redisServer);
// RedisZSetOutputFormat.setRedisKey(job, zsetkey);
//
// FileInputFormat.addInputPath(job,input);
//
// job.getConfiguration().setInt(TopKItemHotToRedisJob.NUMK,
// Integer.parseInt(getOption("numKUser")));
//
// job.waitForCompletion(true);

The complete alternative code looks like this:

Path input = new Path(userContactNumberDir);
Path output = new Path(notUsedDir);
HadoopUtil.delete(getConf(), output);

Job getTopKUsersJob = prepareJob(input, output, TextInputFormat.class,
		TopKItemHotToRedisJob.Map.class, NullWritable.class,
		ItemPropertyWritable.class,
		TopKItemHotToRedisJob.Reduce.class, IntWritable.class,
		DoubleWritable.class, RedisZSetOutputFormat.class);
getTopKUsersJob.setJobName("Get topk user and write to redis");
getTopKUsersJob.setNumReduceTasks(1);

RedisZSetOutputFormat.setRedisHosts(getTopKUsersJob, getOption("redisServer"));
RedisZSetOutputFormat.setRedisKey(getTopKUsersJob, getOption("zsetkey") + "_user");
getTopKUsersJob.getConfiguration().setInt(TopKItemHotToRedisJob.NUMK,
		Integer.parseInt(getOption("numKUser")));

getTopKUsersJob.waitForCompletion(true);

b. When designing the Java action, you should add the following argument

-libjars hdfs://192.168.0.131:2014/user/inok/inok3/conf/thirdlib.jar

and add thirdlib.jar as a file.
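
For readers who edit the workflow definition by hand rather than through the Hue editor, step b corresponds roughly to a java action like the sketch below. The action name, the main class, and the transition targets are placeholders made up for this illustration. The ${jobTracker} and ${nameNode} parameters are assumed to be defined in job.properties. Only the -libjars argument and the jar path come from the setup above.

```xml
<action name="topk-to-redis">
    <java>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- Placeholder: replace with your own driver class -->
        <main-class>com.example.MyDriver</main-class>
        <!-- Passed to the driver, which must parse Hadoop generic options -->
        <arg>-libjars</arg>
        <arg>hdfs://192.168.0.131:2014/user/inok/inok3/conf/thirdlib.jar</arg>
        <!-- Makes the jar available to the launcher as well -->
        <file>/user/inok/inok3/conf/thirdlib.jar</file>
    </java>
    <ok to="end"/>
    <error to="fail"/>
</action>
```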

------