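Oozie overview: what Oozie can do
Oozie format: how to write an Oozie job
Oozie execution: how to run an Oozie job
Oozie overview:
Oozie is a scheduler built on Hadoop. Scheduling flows are written in XML, and it can schedule MapReduce, Pig, Hive, shell, and Java (jar) jobs, among others.
Its main features are:
Workflow: runs the nodes of a flow in order, with support for fork (branch into several nodes) and join (merge several nodes back into one)
Coordinator: triggers workflows on a schedule
Bundle Job: groups multiple coordinators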
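As a rough illustration of fork and join in a workflow, the sketch below splits into two branches and merges them again. The node names (split, merge, branch-a, branch-b) and the /tmp/fork-join-sketch paths are made up for this example, and the fs actions only stand in for real work such as Pig or Hive steps.

```xml
<workflow-app xmlns="uri:oozie:workflow:0.2" name="fork-join-sketch">
    <start to="split"/>
    <!-- fork: start both branches in parallel -->
    <fork name="split">
        <path start="branch-a"/>
        <path start="branch-b"/>
    </fork>
    <action name="branch-a">
        <fs>
            <!-- placeholder work: create a directory (hypothetical path) -->
            <mkdir path="${nameNode}/tmp/fork-join-sketch/a"/>
        </fs>
        <ok to="merge"/>
        <error to="fail"/>
    </action>
    <action name="branch-b">
        <fs>
            <mkdir path="${nameNode}/tmp/fork-join-sketch/b"/>
        </fs>
        <ok to="merge"/>
        <error to="fail"/>
    </action>
    <!-- join: continue only after every forked branch has arrived here -->
    <join name="merge" to="end"/>
    <kill name="fail">
        <message>Branch failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Every path started by a fork should lead to the same join, otherwise the workflow cannot move past the merge point.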
Oozie format:
An Oozie job needs two files: job.properties and workflow.xml (or coordinator.xml / bundle.xml, depending on the job type)
I. job.properties defines the environment variables
Property | Example value | Description |
nameNode | hdfs://xxx5:8020 | HDFS address |
jobTracker | xxx5:8034 | JobTracker address |
queueName | default | queue used by the Oozie job |
examplesRoot | examples | global root directory |
oozie.use.system.libpath | true | whether to load the system share lib |
oozie.libpath | share/lib/user | user lib path |
oozie.wf.application.path | ${nameNode}/user/${user.name}/... | HDFS path of the Oozie workflow application |
Note: the application-path property depends on the job type:
workflow: oozie.wf.application.path
coordinator: oozie.coord.application.path
bundle: oozie.bundle.application.path
II. XML
1. workflow:
- <workflow-app xmlns="uri:oozie:workflow:0.2" name="wf-example1">
- <start to="pig-node"/>
- <action name="pig-node">
- <pig>
- <job-tracker>${jobTracker}</job-tracker>
- <name-node>${nameNode}</name-node>
- <prepare>
- <delete path="hdfs://xxx5/user/hadoop/appresult" />
- </prepare>
- <configuration>
- <property>
- <name>mapred.job.queue.name</name>
- <value>default</value>
- </property>
- <property>
- <name>mapred.compress.map.output</name>
- <value>true</value>
- </property>
- <property>
- <name>mapreduce.fileoutputcommitter.marksuccessfuljobs</name>
- <value>false</value>
- </property>
- </configuration>
- <script>test.pig</script>
- <param>filepath=${filepath}</param>
- </pig>
- <ok to="end"/>
- <error to="fail"/>
- </action>
- <kill name="fail">
- <message>
- Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]
- </message>
- </kill>
- <end name="end"/>
- </workflow-app>
2. coordinator
- <coordinator-app name="cron-coord" frequency="${coord:hours(6)}" start="${start}" end="${end}"
- timezone="UTC" xmlns="uri:oozie:coordinator:0.2">
- <action>
- <workflow>
- <app-path>${nameNode}/user/${coord:user()}/${examplesRoot}/wpath</app-path>
- <configuration>
- <property>
- <name>jobTracker</name>
- <value>${jobTracker}</value>
- </property>
- <property>
- <name>nameNode</name>
- <value>${nameNode}</value>
- </property>
- <property>
- <name>queueName</name>
- <value>${queueName}</value>
- </property>
- </configuration>
- </workflow>
- </action>
- </coordinator-app>
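Note: the coordinator above is set to UTC, which is 8 hours behind Beijing time, so subtract 8 hours from the intended Beijing-time run time when setting the schedule.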
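To make the 8-hour shift concrete, here is a minimal sketch that assumes the job should first fire at 08:00 Beijing time. The dates are invented for illustration; 2013-08-20 08:00 in Beijing (UTC+8) corresponds to 2013-08-20 00:00 UTC, so that is what goes into start.

```xml
<!-- Hypothetical dates: 08:00 Beijing time (UTC+8) is written as 00:00 UTC -->
<coordinator-app name="cron-coord" frequency="${coord:hours(6)}"
                 start="2013-08-20T00:00Z" end="2013-08-27T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.2">
    <action>
        <workflow>
            <app-path>${nameNode}/user/${coord:user()}/${examplesRoot}/wpath</app-path>
        </workflow>
    </action>
</coordinator-app>
```

With a 6-hour frequency, the later runs would then fall at 14:00, 20:00, and 02:00 Beijing time.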
Example of a coordinator passing a value to its workflow, with the timezone set to Asia/Shanghai:
- <coordinator-app name="gwk-hour-log-coord" frequency="${coord:hours(1)}" start="${hourStart}" end="${hourEnd}" timezone="Asia/Shanghai"
- xmlns="uri:oozie:coordinator:0.2">
- <action>
- <workflow>
- <app-path>${workflowHourLogAppUri}/gwk-workflow.xml</app-path>
- <configuration>
- <property>
- <name>yyyymmddhh</name>
- <value>${coord:formatTime(coord:dateOffset(coord:nominalTime(),-1,'HOUR'), 'yyyyMMddHH')}</value>
- </property>
- </configuration>
- </workflow>
- </action>
- </coordinator-app>
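On the receiving side, the workflow can refer to the passed property by name. The fragment below is only a sketch of how gwk-workflow.xml might consume yyyymmddhh; the action name hive-hour-log and the script name hour_log.q are hypothetical and do not come from the original job.

```xml
<!-- Sketch of an action inside the workflow started by the coordinator -->
<action name="hive-hour-log">
    <hive xmlns="uri:oozie:hive-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- hypothetical script name -->
        <script>hour_log.q</script>
        <!-- yyyymmddhh is the property set in the coordinator's <configuration> -->
        <param>yyyymmddhh=${yyyymmddhh}</param>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

Because the coordinator computes the value from coord:nominalTime(), each run receives the hour before its scheduled time rather than the time at which it actually started.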
3. bundle
- <bundle-app name='APPNAME' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns='uri:oozie:bundle:0.1'>
- <controls>
- <kick-off-time>${kickOffTime}</kick-off-time>
- </controls>
- <coordinator name='coordJobFromBundle1' >
- <app-path>${appPath}</app-path>
- <configuration>
- <property>
- <name>startTime1</name>
- <value>${START_TIME}</value>
- </property>
- <property>
- <name>endTime1</name>
- <value>${END_TIME}</value>
- </property>
- </configuration>
- </coordinator>
- <coordinator name='coordJobFromBundle2' >
- <app-path>${appPath2}</app-path>
- <configuration>
- <property>
- <name>startTime2</name>
- <value>${START_TIME2}</value>
- </property>
- <property>
- <name>endTime2</name>
- <value>${END_TIME2}</value>
- </property>
- </configuration>
- </coordinator>
- </bundle-app>
Oozie Hive action
- <action name="hive-app">
- <hive xmlns="uri:oozie:hive-action:0.2">
- <job-tracker>${jobTracker}</job-tracker>
- <name-node>${nameNode}</name-node>
- <job-xml>hive-site.xml</job-xml>
- <script>hivescript.q</script>
- <param>yyyymmdd=${yyyymmdd}</param>
- <param>yesterday=${yesterday}</param>
- <param>lastmonth=${lastmonth}</param>
- </hive>
- <ok to="result-stat-join"/>
- <error to="fail"/>
- </action>
Running Oozie
Start a job:
- oozie job -oozie http://xxx5:11000/oozie -config job.properties -run
Stop a job:
- oozie job -oozie http://localhost:8080/oozie -kill 14-20090525161321-oozie-joe
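Note: stopping a job sometimes fails with a permission error. In that case, set the proxy-user properties below (the hadoop.proxyuser.* entries belong in Hadoop's core-site.xml, the oozie.service.* entries in oozie-site.xml):
hadoop.proxyuser.oozie.groups *
hadoop.proxyuser.oozie.hosts *
oozie.service.ProxyUserService.proxyuser.hadoop.hosts *
oozie.service.ProxyUserService.proxyuser.hadoop.groups *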
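Written out as configuration XML, the proxy-user settings might look like the sketch below. It assumes the property names documented by Hadoop and Oozie (hadoop.proxyuser.USER.* and oozie.service.ProxyUserService.proxyuser.USER.*), that the Oozie server runs as the user oozie, and that jobs are submitted as the user hadoop; adjust the user names to your cluster.

```xml
<!-- Assumed placement: Hadoop's core-site.xml; lets the "oozie" user impersonate other users -->
<configuration>
    <property>
        <name>hadoop.proxyuser.oozie.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.oozie.groups</name>
        <value>*</value>
    </property>
</configuration>

<!-- Assumed placement: oozie-site.xml; lets the "hadoop" user act as a proxy user in Oozie -->
<configuration>
    <property>
        <name>oozie.service.ProxyUserService.proxyuser.hadoop.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>oozie.service.ProxyUserService.proxyuser.hadoop.groups</name>
        <value>*</value>
    </property>
</configuration>
```

The wildcard is the most permissive choice; a list of specific hosts or groups can be used instead where tighter control is wanted.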
http://blackproof.iteye.com/blog/1928122