3.2.2 Map-Reduce Action
A map-reduce action can be configured to perform file system cleanup and directory creation before starting the map-reduce job. This capability enables Oozie to retry a Hadoop job after a transient failure (Hadoop checks that the job output directory does not exist and then creates it when the job starts, so a retry without first cleaning up the job output directory would fail).
The workflow job will wait until the Hadoop map/reduce job completes before continuing to the next action in the workflow execution path.
The counters of the Hadoop job and the job exit status (FAILED, KILLED or SUCCEEDED) must be available to the workflow job after the Hadoop job ends. This information can be used from within decision nodes and other action configurations.
The map-reduce action has to be configured with all the necessary Hadoop JobConf properties to run the Hadoop map/reduce job.
Hadoop JobConf properties can be specified in a JobConf XML file bundled with the workflow application or they can be indicated inline in the map-reduce action configuration.
The configuration properties are loaded in the following order: streaming, job-xml and configuration; later values override earlier values.
Streaming and inline property values can be parameterized (templatized) using EL expressions.
The Hadoop mapred.job.tracker and fs.default.name properties must not be present in the job-xml and inline configuration.
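For example, if the same property appears both in the job-xml file and in the inline configuration element, the inline value takes effect because configuration is loaded last. A minimal sketch (the property values are chosen only for illustration):
<!-- in the job-xml file -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>1</value>
</property>
<!-- in the inline configuration element of the map-reduce action -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>10</value>
</property>
<!-- the job runs with mapred.reduce.tasks=10, since configuration is loaded after job-xml -->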
3.2.2.1 Adding Files and Archives for the Job
The file and archive elements make files and archives available to map-reduce jobs. If the specified path is relative, the file or archive is assumed to be within the application directory, in the corresponding sub-path. If the path is absolute, the file or archive is expected at the given absolute path.
Files specified with the file element will be symbolic links in the home directory of the task.
If a file is a native library (an '.so' or a '.so.#' file), it will be symlinked as an '.so' file in the task running directory, and thus available to the task JVM.
To force a symlink for a file on the task running directory, use a '#' followed by the symlink name. For example 'mycat.sh#cat'.
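For instance, a file element that forces the symlink name cat for the script mycat.sh could be written as follows (the HDFS path is hypothetical):
<file>hdfs://foo:9000/user/tucu/bin/mycat.sh#cat</file>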
Refer to the Hadoop distributed cache documentation for more details on files and archives.
3.2.2.2 Streaming
Streaming information can be specified in the streaming element.
The mapper and reducer elements are used to specify the executable/script to be used as mapper and reducer.
User defined scripts must be bundled with the workflow application and they must be declared in the files element of the streaming configuration. If they are not declared in the files element of the configuration, it is assumed they will be available (and in the command PATH) on the Hadoop slave machines.
Some streaming jobs require files found on HDFS to be available to the mapper/reducer scripts. This is done using the file and archive elements described in the previous section.
The mapper/reducer can be overridden by the mapred.mapper.class or mapred.reducer.class properties in the job-xml file or the configuration element.
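An illustrative configuration fragment for such an override (the class names are hypothetical):
<configuration>
  <property>
    <name>mapred.mapper.class</name>
    <value>com.example.MyMapper</value>
  </property>
  <property>
    <name>mapred.reducer.class</name>
    <value>com.example.MyReducer</value>
  </property>
</configuration>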
3.2.2.3 Pipes
Pipes information can be specified in the pipes element.
A subset of the command line options available to the Hadoop Pipes Submitter can be specified via the elements map, reduce, inputformat, partitioner, writer and program.
The program element is used to specify the executable/script to be used.
The user-defined program must be bundled with the workflow application.
Some pipes jobs require files found on HDFS to be available to the mapper/reducer. This is done using the file and archive elements described in the previous section.
Pipes properties can be overridden by specifying them in the job-xml file or the configuration element.
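As a sketch of such an override, a standard Hadoop Pipes switch could be set in the inline configuration element (shown only for illustration):
<configuration>
  <property>
    <name>hadoop.pipes.java.recordreader</name>
    <value>true</value>
  </property>
</configuration>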
3.2.2.4 Syntax
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
...
<action name="[NODE-NAME]">
<map-reduce>
<job-tracker>[JOB-TRACKER]</job-tracker>
<name-node>[NAME-NODE]</name-node>
<prepare>
<delete path="[PATH]"/>
...
<mkdir path="[PATH]"/>
...
</prepare>
<streaming>
<mapper>[MAPPER-PROCESS]</mapper>
<reducer>[REDUCER-PROCESS]</reducer>
<record-reader>[RECORD-READER-CLASS]</record-reader>
<record-reader-mapping>[NAME=VALUE]</record-reader-mapping>
...
<env>[NAME=VALUE]</env>
...
</streaming>
<!-- Either streaming or pipes can be specified for an action, not both -->
<pipes>
<map>[MAPPER]</map>
<reduce>[REDUCER]</reduce>
<inputformat>[INPUTFORMAT]</inputformat>
<partitioner>[PARTITIONER]</partitioner>
<writer>[OUTPUTFORMAT]</writer>
<program>[EXECUTABLE]</program>
</pipes>
<job-xml>[JOB-XML-FILE]</job-xml>
<configuration>
<property>
<name>[PROPERTY-NAME]</name>
<value>[PROPERTY-VALUE]</value>
</property>
...
</configuration>
<file>[FILE-PATH]</file>
...
<archive>[FILE-PATH]</archive>
...
</map-reduce>
<ok to="[NODE-NAME]"/>
<error to="[NODE-NAME]"/>
</action>
...
</workflow-app>
The prepare element, if present, indicates a list of paths to delete before starting the job. This should be used exclusively for directory cleanup for the job to be executed. The delete operation will be performed in the fs.default.name filesystem.
The job-xml element, if present, must refer to a Hadoop JobConf job.xml file bundled in the workflow application. The job-xml element is optional and, if present, there can be only one.
The configuration element, if present, contains JobConf properties for the Hadoop job.
Properties specified in the configuration element override properties specified in the file specified in the job-xml element.
The file element, if present, must specify the target symbolic link for binaries by separating the original file and the target with a # (file#target-sym-link). This is not required for libraries.
The mapper and reducer processes for streaming jobs should specify the executable command with URL encoding, e.g. '%' should be replaced by '%25'.
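As a sketch of that encoding rule, suppose the intended mapper command is tr ',' '%' (replace commas with percent signs); inside the action the '%' is URL-encoded (the commands are only illustrative):
<streaming>
  <mapper>tr ',' '%25'</mapper>
  <reducer>/bin/cat</reducer>
</streaming>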
Example:
<workflow-app name="foo-wf" xmlns="uri:oozie:workflow:0.1">
...
<action name="myfirstHadoopJob">
<map-reduce>
<job-tracker>foo:9001</job-tracker>
<name-node>bar:9000</name-node>
<prepare>
<delete path="hdfs://foo:9000/usr/tucu/output-data"/>
</prepare>
<job-xml>/myfirstjob.xml</job-xml>
<configuration>
<property>
<name>mapred.input.dir</name>
<value>/usr/tucu/input-data</value>
</property>
<property>
<name>mapred.output.dir</name>
<value>/usr/tucu/output-data</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>${firstJobReducers}</value>
</property>
</configuration>
</map-reduce>
<ok to="myNextAction"/>
<error to="errorCleanup"/>
</action>
...
</workflow-app>
In the above example, the number of Reducers to be used by the Map/Reduce job has to be specified as a parameter of the workflow job configuration when creating the workflow job.
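One way to supply that parameter is in the configuration submitted when the workflow job is created; a sketch in Hadoop XML configuration format (the application path and the reducer count are illustrative, and a plain job.properties file with the same keys would work as well):
<configuration>
  <property>
    <name>oozie.wf.application.path</name>
    <value>hdfs://bar:9000/user/tucu/foo-wf</value>
  </property>
  <property>
    <name>firstJobReducers</name>
    <value>5</value>
  </property>
</configuration>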
Streaming Example:
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">
...
<action name="firstjob">
<map-reduce>
<job-tracker>foo:9001</job-tracker>
<name-node>bar:9000</name-node>
<prepare>
<delete path="${output}"/>
</prepare>
<streaming>
<mapper>/bin/bash testarchive/bin/mapper.sh testfile</mapper>
<reducer>/bin/bash testarchive/bin/reducer.sh</reducer>
</streaming>
<configuration>
<property>
<name>mapred.input.dir</name>
<value>${input}</value>
</property>
<property>
<name>mapred.output.dir</name>
<value>${output}</value>
</property>
<property>
<name>stream.num.map.output.key.fields</name>
<value>3</value>
</property>
</configuration>
<file>/users/blabla/testfile.sh#testfile</file>
<archive>/users/blabla/testarchive.jar#testarchive</archive>
</map-reduce>
<ok to="end"/>
<error to="kill"/>
</action>
...
</workflow-app>
Pipes Example:
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">
...
<action name="firstjob">
<map-reduce>
<job-tracker>foo:9001</job-tracker>
<name-node>bar:9000</name-node>
<prepare>
<delete path="${output}"/>
</prepare>
<pipes>
<program>bin/wordcount-simple#wordcount-simple</program>
</pipes>
<configuration>
<property>
<name>mapred.input.dir</name>
<value>${input}</value>
</property>
<property>
<name>mapred.output.dir</name>
<value>${output}</value>
</property>
</configuration>
<archive>/users/blabla/testarchive.jar#testarchive</archive>
</map-reduce>
<ok to="end"/>
<error to="kill"/>
</action>
...
</workflow-app>