.. _CreatingFlows:

Creating Flows
==============

This section covers how to create your Azkaban flows using Azkaban Flow 2.0.
Flow 1.0 will be deprecated in the future.


***************
Flow 2.0 Basics
***************

Getting started with Azkaban Flow 2.0 is simple and straightforward. Just follow the steps below:

Step 1:
######## 
Create a simple file called ``flow20.project``. Add ``azkaban-flow-version`` to indicate this is a Flow 2.0 Azkaban project:
::

  azkaban-flow-version: 2.0
 
Step 2:
######## 
Create another file called ``basic.flow``. Add a section called ``nodes``, which will contain all the jobs you want to run. You need to specify ``name`` and ``type`` for all the jobs. Most jobs will require the ``config`` section as well. We will talk more about it later. Below is a simple example of a command job.
::

  nodes:
    - name: jobA
      type: command
      config:
        command: echo "This is an echoed text."


Step 3:
########
Select the two files you have created, then right-click and compress them into a zip file called ``Archive.zip``. Alternatively, create a new directory containing the two files, ``cd`` into it, and compress with ``zip -r Archive.zip .`` as shown below. Please do not zip the enclosing directory itself.
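
For example, assuming the two files sit in a directory named ``myflow`` (the directory name is just for illustration):
::

  cd myflow                 # the directory containing flow20.project and basic.flow
  zip -r Archive.zip .      # zip the contents, not the enclosing directory
  unzip -l Archive.zip      # optional: confirm both files are listed at the top level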

Make sure you have already created a project on Azkaban (see :ref:`createProjects`).
You can then upload ``Archive.zip`` to your project through the Web UI (see :ref:`uploadProjects`).
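
As an alternative to the Web UI, Azkaban also exposes an AJAX API for uploading a project zip. A minimal sketch, assuming Azkaban is reachable at ``https://localhost:8443``, the project is named ``my-project``, and you already have a ``session.id`` from the login API:
::

  curl -k -i -H "Content-Type: multipart/mixed" -X POST \
    --form 'session.id=<your-session-id>' \
    --form 'ajax=upload' \
    --form 'file=@Archive.zip;type=application/zip' \
    --form 'project=my-project' \
    https://localhost:8443/manager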

Now you can click ``Execute Flow`` to test your first Flow 2.0 Azkaban project!

****************
Job Dependencies
****************
Jobs can have dependencies on each other.
Use the ``dependsOn`` section to list a job's parent jobs. In the example below, jobC starts to run only after jobA and jobB have completed successfully.
::

  nodes:
    - name: jobC
      type: noop
      # jobC depends on jobA and jobB
      dependsOn:
        - jobA
        - jobB

    - name: jobA
      type: command
      config:
        command: echo "This is an echoed text."

    - name: jobB
      type: command
      config:
        command: pwd

You can zip the updated ``basic.flow`` and ``flow20.project`` again and upload the new zip to Azkaban. Try executing the flow and see the difference.

**********
Job Config
**********
Azkaban supports many job types. You just need to specify the job type in ``type``; other job-related settings go in the ``config`` section as ``key: value`` pairs. Here is an example of a Pig job:
::

  nodes:
    - name: pigJob
      type: pig
      config:
        pig.script: sql/pig/script.pig

Write your own Pig script, include it in your project zip, and set ``pig.script`` in the ``config`` section to the script's path inside the project, as shown below.
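
For the Pig example above, one way to lay this out (the directory names simply mirror the ``sql/pig/script.pig`` path used in the example) is:
::

  # run from the project directory that holds flow20.project and basic.flow
  mkdir -p sql/pig
  cp /path/to/your/script.pig sql/pig/script.pig   # must match the pig.script value
  zip -r Archive.zip .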

***********
Flow Config
***********
Not only can you configure individual jobs, you can also configure flow-level parameters for the entire flow.
Simply add a ``config`` section at the beginning of the ``basic.flow`` file. For example:
::

  ---
  config:
    user.to.proxy: foo
    failure.emails: noreply@foo.com

  nodes:
    - name: jobA
      type: command
      config:
        command: echo "This is an echoed text."

When you execute the flow, the ``user.to.proxy`` and ``failure.emails`` flow parameters will apply to all jobs inside the flow.
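
Individual jobs can declare the same keys in their own ``config`` section. The sketch below assumes the usual Azkaban property precedence, where job-level values override flow-level ones (``jobB`` and the ``bar`` proxy user are hypothetical, added only for illustration):
::

  ---
  config:
    user.to.proxy: foo
    failure.emails: noreply@foo.com

  nodes:
    - name: jobA
      type: command
      config:
        command: echo "runs with the flow-level proxy user foo"

    - name: jobB
      type: command
      config:
        # assumed to override the flow-level user.to.proxy for this job only
        user.to.proxy: bar
        command: echo "runs as bar"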

**************
Embedded Flows
**************
Flows can contain other flows as nodes, just like job nodes. To create an embedded flow, set the node's ``type`` to ``flow``. For example:
::

  nodes:
    - name: embedded_flow
      type: flow
      config:
        prop: value
      nodes:
        - name: jobB 
          type: noop
          dependsOn:
            - jobA

        - name: jobA
          type: command
          config:
            command: pwd
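
An embedded flow is itself a node of the outer flow, so other nodes should be able to depend on it by name. A minimal sketch extending the example above (``jobC`` here is a hypothetical job added for illustration):
::

  nodes:
    - name: jobC
      type: noop
      # an embedded flow is a node, so it can appear in dependsOn like a job
      dependsOn:
        - embedded_flow

    - name: embedded_flow
      type: flow
      nodes:
        - name: jobA
          type: command
          config:
            command: pwd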

*****************
Download Examples
*****************
You can download these simple Flow 2.0 example project zips to start playing with Azkaban:

* `basicFlow20Project.zip <https://github.com/azkaban/azkaban/blob/master/az-examples/flow20-projects/basicFlow20Project.zip>`_

* `embeddedFlow20Project.zip <https://github.com/azkaban/azkaban/blob/master/az-examples/flow20-projects/embeddedFlow20Project.zip>`_