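Oozie Workflows for Distributed Jobs: The Email Action

Introduction:

In today's big data landscape, new frameworks built around Spark and Hadoop keep appearing, and the sheer variety of compute frameworks and distributed jobs can be dizzying. You may have run into this problem: you have many distributed jobs that must run at a fixed time every day, but a home-grown scheduler is unstable and gives you no reliable notifications.

If so, what you are looking for is Oozie.

For the basics of Oozie, see here

Oozie is an open-source framework for scheduling distributed jobs. It supports many job types, such as MapReduce, Spark, Sqoop, Pig, and even shell scripts. You can schedule them in various ways and combine them into a workflow, whose nodes can run either sequentially or in parallel.

Once you have defined a series of jobs as a workflow, you can set up a coordinator to run that workflow on a schedule.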
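As a rough illustration of the coordinator mentioned above, here is a minimal sketch of a coordinator definition that runs a workflow once a day. The application name, the start and end dates, the `${nameNode}` property, and the workflow path are all assumptions made for this example, not values from the original post.

```xml
<!-- Hypothetical coordinator: name, dates and app-path are placeholders -->
<coordinator-app name="daily-sample-coord"
                 frequency="${coord:days(1)}"
                 start="2016-01-01T00:00Z" end="2016-12-31T00:00Z"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.1">
    <action>
        <workflow>
            <!-- HDFS directory that contains the workflow.xml to run -->
            <app-path>${nameNode}/user/${user.name}/apps/sample-wf</app-path>
        </workflow>
    </action>
</coordinator-app>
```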

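With all of that in place, one important piece is still missing: email notification. Whether a job succeeds or fails, Oozie can send an email about it. When the success message arrives each night, you can go to sleep with peace of mind.

This post therefore shows how to use email in Oozie.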

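Email Service

Email Action

In Oozie, every step of a workflow is modeled as an action, and email is one of these actions.

The email action sends a message from within Oozie. It must specify the recipient addresses (to), the subject, and the body. The recipient field accepts multiple addresses separated by commas.

The email action runs synchronously: it completes only after the email has been sent, and only then can the next action run.

All parameters of the email action can use EL expressions.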
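To make the use of EL expressions concrete, the following sketch fills the subject and body of an email action from workflow EL functions. The action and node names and the `${notifyEmail}` job property are hypothetical; `wf:name()`, `wf:id()`, `wf:lastErrorNode()` and `wf:errorMessage()` are standard workflow EL functions.

```xml
<!-- Hypothetical failure notification; ${notifyEmail} is an assumed job property -->
<action name="notify-failure">
    <email xmlns="uri:oozie:email-action:0.1">
        <to>${notifyEmail}</to>
        <subject>Workflow ${wf:name()} failed (${wf:id()})</subject>
        <body>Failed node: ${wf:lastErrorNode()}
Error message: ${wf:errorMessage(wf:lastErrorNode())}</body>
    </email>
    <ok to="fail"/>
    <error to="fail"/>
</action>
```

Because the action is synchronous, the transition to the `fail` node in this sketch happens only after the send attempt has finished.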

Syntax

<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    ...
    <action name="[NODE-NAME]">
        <email xmlns="uri:oozie:email-action:0.2">
            <to>[COMMA-SEPARATED-TO-ADDRESSES]</to>
            <cc>[COMMA-SEPARATED-CC-ADDRESSES]</cc> <!-- cc is optional -->
            <subject>[SUBJECT]</subject>
            <body>[BODY]</body>
            <content_type>[CONTENT-TYPE]</content_type> <!-- content_type is optional -->
            <attachment>[COMMA-SEPARATED-HDFS-FILE-PATHS]</attachment> <!-- attachment is optional -->
        </email>
        <ok to="[NODE-NAME]"/>
        <error to="[NODE-NAME]"/>
    </action>
    ...
</workflow-app>

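The to and cc elements specify who receives the email. Multiple addresses can be given, separated by commas. to is required; cc is optional.

The subject and body elements set the subject line and the body of the email. email-action:0.2 also supports text/html bodies through content_type; the default is plain text, "text/plain".

The attachment element attaches an HDFS file to the email, and multiple attachments can be given, separated by commas. A path that is not fully qualified is still treated as a file on HDFS. Local files cannot be attached.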
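The optional elements described above might be combined as in the sketch below, which sends an HTML body with one HDFS attachment using the 0.2 schema. The recipient address and the attachment path are placeholders invented for illustration. The HTML markup is entity-escaped so that it remains plain text inside the body element.

```xml
<!-- Hypothetical example: address and HDFS path are placeholders -->
<email xmlns="uri:oozie:email-action:0.2">
    <to>ops@example.com</to>
    <subject>Daily report for ${wf:id()}</subject>
    <!-- HTML tags are escaped because body is a plain string element -->
    <body>&lt;h1&gt;Daily report&lt;/h1&gt;&lt;p&gt;Workflow ${wf:id()} finished.&lt;/p&gt;</body>
    <content_type>text/html</content_type>
    <!-- Resolved as a file on HDFS, not on the local file system -->
    <attachment>/user/oozie/reports/daily.csv</attachment>
</email>
```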

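Configuration

The email action requires the SMTP server settings to be configured in oozie-site.xml. The values to configure are:

oozie.email.smtp.host

The address of the SMTP server. The default is localhost.

oozie.email.smtp.port

The port of the SMTP server. The default is 25.

oozie.email.from.address

The address the email is sent from. The default is oozie@localhost.

oozie.email.smtp.auth

Whether authentication is enabled. It is disabled by default.

oozie.email.smtp.username

The login user name when authentication is enabled. The default is empty.

oozie.email.smtp.password

The password for that user when authentication is enabled. The default is empty.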
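Put together, the settings listed above could look like this in oozie-site.xml. The host name, sender address, and credentials are placeholders for illustration only; substitute the values of your own mail server.

```xml
<!-- Hypothetical SMTP settings; every value below is a placeholder -->
<property>
    <name>oozie.email.smtp.host</name>
    <value>smtp.example.com</value>
</property>
<property>
    <name>oozie.email.smtp.port</name>
    <value>25</value>
</property>
<property>
    <name>oozie.email.from.address</name>
    <value>oozie@example.com</value>
</property>
<property>
    <name>oozie.email.smtp.auth</name>
    <value>true</value>
</property>
<property>
    <name>oozie.email.smtp.username</name>
    <value>oozie@example.com</value>
</property>
<property>
    <name>oozie.email.smtp.password</name>
    <value>change-me</value>
</property>
```

These properties go inside the existing configuration element of oozie-site.xml.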

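PS. On Linux you can locate the file with find . -name oozie-site.xml, which searches from the current directory. In our CDH installation the file is at ./etc/oozie/conf.dist/oozie-site.xml

Common Problems

Many people find that no email is sent. The first thing to check is that an SMTP service is running; you can test this with telnet localhost 25

Also, if you use a corporate mailbox, pay attention to the format of the sender address, which must match the corporate mail settings. In addition, the recipients must be addresses on the corporate mail system.

The configuration in Cloudera Manager is shown in the figure below:

Example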

<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">
    ...
    <action name="an-email">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>bob@initech.com,the.other.bob@initech.com</to>
            <cc>will@initech.com</cc>
            <subject>Email notifications for ${wf:id()}</subject>
            <body>The wf ${wf:id()} successfully completed.</body>
        </email>
        <ok to="myotherjob"/>
        <error to="errorcleanup"/>
    </action>
    ...
</workflow-app>

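In the example above, the email is sent to bob and the.other.bob and copied to will. It sets the subject and the body, both of which include the workflow id.

Appendix

For a better understanding of Oozie, the important Oozie configuration files are reproduced here.

oozie-site.xml configuration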
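The sample shows a single action. A complete workflow that sends one email on success and another on failure could be wired roughly as follows. This is a sketch under assumptions: the node names, the `${nameNode}` and `${notifyEmail}` job properties, and the placeholder fs action standing in for the real job are all invented for illustration.

```xml
<workflow-app name="notify-wf" xmlns="uri:oozie:workflow:0.1">
    <start to="main-job"/>

    <!-- Placeholder for the real job; replace with your own action -->
    <action name="main-job">
        <fs>
            <mkdir path="${nameNode}/tmp/notify-wf-demo"/>
        </fs>
        <ok to="email-success"/>
        <error to="email-failure"/>
    </action>

    <action name="email-success">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>${notifyEmail}</to>
            <subject>Workflow ${wf:id()} succeeded</subject>
            <body>The wf ${wf:id()} successfully completed.</body>
        </email>
        <!-- A failed notification should not fail a successful run -->
        <ok to="end"/>
        <error to="end"/>
    </action>

    <action name="email-failure">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>${notifyEmail}</to>
            <subject>Workflow ${wf:id()} failed</subject>
            <body>Failed node: ${wf:lastErrorNode()}</body>
        </email>
        <ok to="fail"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Workflow failed at ${wf:lastErrorNode()}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Routing both transitions of email-failure to the kill node keeps the workflow marked as failed whether or not the notification itself could be sent.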

<?xml version="1.0"?>
<configuration>
    <!-- oozie-default.xml holds the default configuration -->
    <property>
        <name>oozie.service.ProxyUserService.proxyuser.hue.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>oozie.service.ProxyUserService.proxyuser.hue.groups</name>
        <value>*</value>
    </property>
</configuration>

oozie-default.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements.  See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership.  The ASF licenses this file
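  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,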
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->
<configuration>

    <!-- ************************** VERY IMPORTANT  ************************** -->
    <!-- This file is in the Oozie configuration directory only for reference. -->
    <!-- It is not loaded by Oozie, Oozie uses its own private copy.           -->
    <!-- ************************** VERY IMPORTANT  ************************** -->

    <property>
        <name>oozie.output.compression.codec</name>
        <value>gz</value>
        <description>
            The name of the compression codec to use.
            where codec class implements the interface org.apache.oozie.compression.CompressionCodec.
            If oozie.compression.codecs is not specified, gz codec implementation is used by default.
        </description>
    </property>

    <property>
        <name>oozie.action.mapreduce.uber.jar.enable</name>
        <value>false</value>
        <description>
            ... which specify the oozie.mapreduce.uber.jar configuration property will fail.
        </description>
    </property>

    <property>
        <name>oozie.processing.timezone</name>
        <value>UTC</value>
        <description>
            ... is changed, note that GMT(+/-)#### timezones do not observe DST changes.
        </description>
    </property>

    <!-- Base Oozie URL: <SCHEME>://<HOST>:<PORT>/<CONTEXT> -->

    <property>
        <name>oozie.base.url</name>
        <value>http://localhost:8080/oozie</value>
        <description>
             Base Oozie URL.
        </description>
    </property>

    <!-- Services -->

    <property>
        <name>oozie.system.id</name>
        <value>oozie-${user.name}</value>
        <description>
            The Oozie system ID.
        </description>
    </property>

    <property>
        <name>oozie.systemmode</name>
        <value>NORMAL</value>
        <description>
            System mode for  Oozie at startup.
        </description>
    </property>

    <property>
        <name>oozie.delete.runtime.dir.on.shutdown</name>
        <value>true</value>
        <description>
            If the runtime directory should be kept after Oozie shutdowns down.
        </description>
    </property>

    <property>
        <name>oozie.services</name>
        <value>
            org.apache.oozie.service.SchedulerService,
            org.apache.oozie.service.InstrumentationService,
            org.apache.oozie.service.MemoryLocksService,
            org.apache.oozie.service.UUIDService,
            org.apache.oozie.service.ELService,
            org.apache.oozie.service.AuthorizationService,
            org.apache.oozie.service.UserGroupInformationService,
            org.apache.oozie.service.HadoopAccessorService,
            ...
        </value>
    </property>

    ...

    <property>
        ...
        <description>
            ...
            IMPORTANT: if the StoreServicePasswordService is active, it will reset this value with the value given in
                       the console.
        </description>
    </property>

    <property>
        <name>oozie.service.JPAService.pool.max.active.conn</name>
        <value>10</value>
        <description>
             Max number of connections.
        </description>
    </property>

   <!-- SchemaService -->

    <property>
        <name>oozie.service.SchemaService.wf.schemas</name>
        <value>
            oozie-workflow-0.1.xsd,oozie-workflow-0.2.xsd,oozie-workflow-0.2.5.xsd,oozie-workflow-0.3.xsd,oozie-workflow-0.4.xsd,
            oozie-workflow-0.4.5.xsd,oozie-workflow-0.5.xsd,
            shell-action-0.1.xsd,shell-action-0.2.xsd,shell-action-0.3.xsd,
            email-action-0.1.xsd,email-action-0.2.xsd,
            hive-action-0.2.xsd,hive-action-0.3.xsd,hive-action-0.4.xsd,hive-action-0.5.xsd,hive-action-0.6.xsd,
            sqoop-action-0.2.xsd,sqoop-action-0.3.xsd,sqoop-action-0.4.xsd,
            ssh-action-0.1.xsd,ssh-action-0.2.xsd,
            distcp-action-0.1.xsd,distcp-action-0.2.xsd,
            oozie-sla-0.1.xsd,oozie-sla-0.2.xsd,
            hive2-action-0.1.xsd, hive2-action-0.2.xsd,
            spark-action-0.1.xsd,spark-action-0.2.xsd
        </value>
        <description>
            List of schemas for workflows (separated by commas).
        </description>
    </property>

    <property>
        <name>oozie.service.SchemaService.wf.ext.schemas</name>
        <value> </value>
        <description>
            List of additional schemas for workflows (separated by commas).
        </description>
    </property>

    <property>
        <name>oozie.service.SchemaService.coord.schemas</name>
        ...
    </property>

    ...

    <property>
        ...
        <description>
             Base console URL for a workflow job.
        </description>
    </property>


    <!-- ActionService -->

    <property>
        <name>oozie.service.ActionService.executor.classes</name>
        <value>
            org.apache.oozie.action.decision.DecisionActionExecutor,
            org.apache.oozie.action.hadoop.JavaActionExecutor,
            org.apache.oozie.action.hadoop.FsActionExecutor,
            org.apache.oozie.action.hadoop.MapReduceActionExecutor,
            org.apache.oozie.action.hadoop.PigActionExecutor,
            org.apache.oozie.action.hadoop.HiveActionExecutor,
            org.apache.oozie.action.hadoop.ShellActionExecutor,
            org.apache.oozie.action.hadoop.SqoopActionExecutor,
            org.apache.oozie.action.hadoop.DistcpActionExecutor,
            org.apache.oozie.action.hadoop.Hive2ActionExecutor,
            org.apache.oozie.action.ssh.SshActionExecutor,
            org.apache.oozie.action.oozie.SubWorkflowActionExecutor,
            org.apache.oozie.action.email.EmailActionExecutor,
            org.apache.oozie.action.hadoop.SparkActionExecutor
        </value>
        <description>
            List of ActionExecutors classes (separated by commas).
            Only action types with associated executors can be used in workflows.
        </description>
    </property>

    <property>
        <name>oozie.service.ActionService.executor.ext.classes</name>
        <value> </value>
        <description>
            List of ActionExecutors extension classes (separated by commas). Only action types with associated
            executors can be used in workflows. This property is a convenience property to add extensions to the built
            in executors without having to include all the built in ones.
        </description>
    </property>

    <!-- ActionCheckerService -->

    <property>
        <name>oozie.service.ActionCheckerService.action.check.interval</name>
        ...
    </property>

    ...

    <property>
        ...
        <description>
            Comma separated AUTHORITY=SPARK_CONF_DIR, where AUTHORITY is the HOST:PORT of
            the ResourceManager of a YARN cluster. The wildcard '*' configuration is
            used when there is no exact match for an authority. The SPARK_CONF_DIR contains
            the relevant spark-defaults.conf properties file. If the path is relative is looked within
            the Oozie configuration directory; though the path can be absolute.  This is only used
            when the Spark master is set to either "yarn-client" or "yarn-cluster".
        </description>
    </property>

    <property>
        <name>oozie.service.SparkConfigurationService.spark.configurations.ignore.spark.yarn.jar</name>
        <value>true</value>
        <description>
            If true, Oozie will ignore the "spark.yarn.jar" property from any Spark configurations specified in
            oozie.service.SparkConfigurationService.spark.configurations.  If false, Oozie will not ignore it.  It is recommended
            to leave this as true because it can interfere with the jars in the Spark sharelib.
        </description>
    </property>

    <property>
        <name>oozie.email.attachment.enabled</name>
        <value>true</value>
        <description>
            This value determines whether to support email attachment of a file on HDFS.
            Set it false if there is any security concern.
        </description>
    </property>

    <property>
        <name>oozie.actions.default.name-node</name>
        <value> </value>
        <description>
            The default value to use for the &lt;name-node&gt; element in applicable action types.  This value will be used when
            neither the action itself nor the global section specifies a &lt;name-node&gt;.  As expected, it should be of the form
            "hdfs://HOST:PORT".
        </description>
    </property>

    <property>
        <name>oozie.actions.default.job-tracker</name>
        <value> </value>
        <description>
            The default value to use for the &lt;job-tracker&gt; element in applicable action types.  This value will be used when
            neither the action itself nor the global section specifies a &lt;job-tracker&gt;.  As expected, it should be of the form
            "HOST:PORT".
        </description>
    </property>

</configuration>
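
This article is reposted from xingoo's blog on cnblogs (博客园). Original post: Oozie Workflows for Distributed Jobs: The Email Action. To republish it, please contact the original author.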