
Add third party jars in a job

 

When I submit a Java job (which launches some map/reduce jobs) from the Hue UI using the Oozie Editor, the third-party jars are not loaded correctly.

 

1. The only approach that worked for me was to build a fat jar that contains all the dependency classes from the third-party jars.



  

 

 

2. If I export my application jar from Eclipse with the option to include the third-party jars in the jar's lib directory, Hadoop still can't find these jars, although some posts claim that Hadoop extracts these jars automatically when the job runs.

The Oozie documentation says this should work once the uber jar setting is enabled. The original text is quoted below. I followed it, but it failed too.

------------

For Map-Reduce jobs (not including streaming or pipes), additional jar files can also be included via an uber jar. An uber jar is a jar file that contains additional jar files within a "lib" folder (see Workflow Functional Specification for more information). Submitting a workflow with an uber jar requires at least Hadoop 2.2.0 or 1.2.0. As such, using uber jars in a workflow is disabled by default. To enable this feature, use the oozie.action.mapreduce.uber.jar.enable property in the oozie-site.xml (and make sure to use a supported version of Hadoop).

<configuration>
    <property>
        <name>oozie.action.mapreduce.uber.jar.enable</name>
        <value>true</value>
    </property>
</configuration>

----------
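Before blaming the Oozie setting, it can help to confirm that the exported jar really has the layout the quoted passage describes, with the dependency jars as entries under a "lib" folder. The following is a minimal sketch using only the JDK; the class name UberJarInspector is hypothetical and is not part of Oozie or Hadoop.

```java
import java.io.IOException;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical diagnostic helper (not an Oozie or Hadoop class).
final class UberJarInspector {

    // Returns the sorted names of all entries under lib/ that end in .jar.
    static List<String> nestedLibJars(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                    .map(JarEntry::getName)
                    .filter(name -> name.startsWith("lib/") && name.endsWith(".jar"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java UberJarInspector path/to/application.jar
        for (String name : nestedLibJars(args[0])) {
            System.out.println(name);
        }
    }
}
```

If the list comes back empty, the export did not produce an uber jar in the sense of the quoted documentation, and enabling oozie.action.mapreduce.uber.jar.enable would not be expected to help.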

 

 

 

 

3. I put the third-party jars in the Oozie deployment directory. Some of them were picked up, but others were not.

 

 

4. When designing the job, I added the third-party jars in the Files option. The result was the same as in 3.



 

 

 

 

--------------

 

There are multiple ways to add the jars to the lib path.
I would suggest you try one of the following:

1) In your job.properties, add the path to your jar files (comma separated)
to the oozie.libpath property.
2) Create a lib directory under your application's directory and put your
jars under lib so Oozie can pick them up.
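To make suggestion 1 concrete, here is a minimal sketch that renders the two relevant job.properties lines. The property names oozie.wf.application.path and oozie.libpath are real Oozie properties; the class name JobPropertiesSketch and all paths are hypothetical placeholders, and I am assuming the oozie.libpath values are HDFS directories that contain the jars.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that renders job.properties lines; not part of Oozie.
final class JobPropertiesSketch {

    // Joins the library directories with commas, as suggestion 1 describes.
    static String render(String appPath, List<String> libDirs) {
        StringBuilder sb = new StringBuilder();
        sb.append("oozie.wf.application.path=").append(appPath).append('\n');
        if (!libDirs.isEmpty()) {
            sb.append("oozie.libpath=").append(String.join(",", libDirs)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder paths, chosen only for illustration.
        System.out.print(render(
                "/user/example/app",
                Arrays.asList("/user/example/libs/redis", "/user/example/libs/common")));
    }
}
```

For suggestion 2, no property is needed: the jars would simply sit in a lib directory directly under the directory named by oozie.wf.application.path.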

 -------------------------------------------------

The same problem is discussed at http://qnalist.com/questions/4588838/how-to-manage-dependencies-for-m-r-job-which-is-executed-using-oozie-java-action

 

The solution is described at http://pangool.net/executing_examples.html

 

a. In the Java driver class that submits the map/reduce job, you should use

 

Job job = HadoopUtil.prepareJob(inputPath, outputPath, inputFormat,
				mapper, mapperKey, mapperValue, reducer, reducerKey,
				reducerValue, outputFormat, getConf());
 

 

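instead of building the job from a fresh Configuration, as in the commented-out code below. Presumably this matters because the generic options (including -libjars) are parsed into the configuration returned by getConf(), and new Configuration() discards them.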

 

// Path input = new Path(userContactNumberDir);
// String redisServer = getOption("redisServer");
// String zsetkey = getOption("zsetkey") + "_user";
//		
// Configuration conf = new Configuration();
// Job job = new Job(conf, "push topk users to reids zset");
// job.setJarByClass(TopKItemHotToRedisJob.class);
// job.setOutputKeyClass(NullWritable.class);
// job.setOutputValueClass(ItemPropertyWritable.class);
//
// job.setMapperClass(TopKItemHotToRedisJob.Map.class);
//
// job.setReducerClass(TopKItemHotToRedisJob.Reduce.class);
// job.setNumReduceTasks(1);
// job.setInputFormatClass(TextInputFormat.class);
// job.setOutputFormatClass(RedisZSetOutputFormat.class);
//
// RedisZSetOutputFormat.setRedisHosts(job,redisServer);
// RedisZSetOutputFormat.setRedisKey(job, zsetkey);
//
// FileInputFormat.addInputPath(job,input);
//
// job.getConfiguration().setInt(TopKItemHotToRedisJob.NUMK,
// Integer.parseInt(getOption("numKUser")));
//
// job.waitForCompletion(true);
The complete replacement code looks like this:

 

 

Path input = new Path(userContactNumberDir);
Path output = new Path(notUsedDir);
HadoopUtil.delete(getConf(), output);
			
Job getTopKUsersJob = prepareJob(input, output, TextInputFormat.class,
        TopKItemHotToRedisJob.Map.class, NullWritable.class,
        ItemPropertyWritable.class,
        TopKItemHotToRedisJob.Reduce.class, IntWritable.class,
        DoubleWritable.class, RedisZSetOutputFormat.class);
getTopKUsersJob.setJobName("Get topk user and write to redis");
getTopKUsersJob.setNumReduceTasks(1);

RedisZSetOutputFormat.setRedisHosts(getTopKUsersJob, getOption("redisServer"));
RedisZSetOutputFormat.setRedisKey(getTopKUsersJob, getOption("zsetkey") + "_user");
getTopKUsersJob.getConfiguration().setInt(TopKItemHotToRedisJob.NUMK,
        Integer.parseInt(getOption("numKUser")));

getTopKUsersJob.waitForCompletion(true);
 

 

 

b. When designing the Java action, add the following argument:

-libjars  hdfs://192.168.0.131:2014/user/inok/inok3/conf/thirdlib.jar

and also add thirdlib.jar as a file.
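Two details of -libjars are easy to get wrong: several jars go into one comma-separated value, and Hadoop's generic options have to come before the application's own arguments (they are only interpreted when the driver is launched through ToolRunner or GenericOptionsParser). The sketch below assembles such an argument list. The class name LibJarsArgs and the application arguments shown are hypothetical; only the jar path comes from this post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for assembling Java action arguments; not a Hadoop or Oozie class.
final class LibJarsArgs {

    // Puts "-libjars <comma-separated jars>" in front of the application arguments.
    static List<String> withLibJars(List<String> jarUris, List<String> appArgs) {
        List<String> out = new ArrayList<>();
        if (!jarUris.isEmpty()) {
            out.add("-libjars");
            out.add(String.join(",", jarUris));
        }
        out.addAll(appArgs);
        return out;
    }

    public static void main(String[] args) {
        List<String> jars = Arrays.asList(
                "hdfs://192.168.0.131:2014/user/inok/inok3/conf/thirdlib.jar");
        // The application arguments below are placeholders for illustration only.
        List<String> appArgs = Arrays.asList("--redisServer", "redis.example.org");
        // Each printed line would be entered as one separate argument of the Java action.
        for (String arg : withLibJars(jars, appArgs)) {
            System.out.println(arg);
        }
    }
}
```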



 ------

However, I thought step b could be avoided by uploading thirdlib.jar to path/to/wf-deployment-dir/lib/ and setting oozie.wf.application.path accordingly.

But the oozie.wf.application.path actually used by the submitted job was a different one. This means that in Hue, oozie.wf.application.path can't be overwritten.

 

 

 

 

References

http://hadoopi.wordpress.com/2014/06/05/hadoop-add-third-party-libraries-to-mapreduce-job/

http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
