Azkaban Quickstart Manual

Introduction

Azkaban2 is a batch workflow job scheduler and the successor to the original Azkaban. A workflow often requires a set of jobs and processes to run in a particular order. Azkaban resolves the ordering through job dependencies and provides an easy-to-use web user interface to maintain and track your workflows.

Here are a few features:

  1. Web UI
  2. Easy workflow uploads
  3. Easy to set up job dependencies
  4. Schedule workflows
  5. Authentication/Authorization (permissions on jobs)
  6. Ability to kill and restart workflows
  7. Modular and pluggable
  8. Project workspaces
  9. Logging and auditing of workflows and jobs

Getting Started

Azkaban2 is fairly easy to set up, although it has more moving pieces than its predecessor. There are two servers and a database that need to be set up:

  1. MySQL instance - Azkaban uses MySQL to store projects and executions
  2. Azkaban Web Server - Azkaban Web Server is a Jetty server which acts as the controller as well as the web interface
  3. Azkaban Executor Server - Azkaban Executor Server executes submitted workflows

Download

There are three download packages for Azkaban: the web server, the executor server and the MySQL setup scripts.

The package download page is maintained on the GitHub wiki:

https://github.com/azkaban/azkaban2/wiki/Download-Packages

Setting up the DB

Currently, Azkaban2 only uses MySQL as its data store. Installation of MySQL DB is not covered in this guide.

  1. Download the azkaban-sql-script tar. Contained in this archive are table creation scripts.
  2. Run the scripts on the MySQL instance to create your tables.
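
As a rough illustration of step 2, the sketch below builds one mysql command line per table creation script. It is not part of Azkaban: mysql_commands is a hypothetical helper, the assumption that the extracted archive contains *.sql files that can be run in name order is unverified, and the database name and user are simply the example values used elsewhere in this guide.

```python
import glob
import os


def mysql_commands(script_dir, database="azkaban2", user="azkaban",
                   host="localhost", port=3306):
    """Return (argv, script_path) pairs, one per .sql file, in name order.

    Hypothetical helper. It assumes the scripts are plain *.sql files and
    that running them in name order is acceptable.
    """
    scripts = sorted(glob.glob(os.path.join(script_dir, "*.sql")))
    # "-p" with no value makes the mysql client prompt for the password.
    argv = ["mysql", "-h", host, "-P", str(port), "-u", user, "-p", database]
    return [(list(argv), path) for path in scripts]
```

Each pair could then be executed with subprocess.run(argv, stdin=open(path), check=True), which feeds the script to the mysql client on standard input.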

Getting the JDBC Connector jar

For various licensing reasons, Azkaban does not distribute the MySQL JDBC connector jar. You can download the jar from this link: http://www.mysql.com/downloads/connector/j/. This jar will be needed for both the web server and the executor server.

Setting up the Web Server

Download and Install

  1. Download the azkaban-web-server tar. Extract it into your azkaban web install directory.
  2. Copy the JDBC jar into the ./extlib directory. Azkaban automatically looks in this directory for external (non-distributed) jars.

In the conf dir, there should be several files:

  File               | Description
  -------------------|------------
  azkaban.properties | Used by Azkaban for runtime properties.
  global.properties  | Global static properties that are passed as shared properties to every workflow and job.
  azkaban-users.xml  | Used to add users and roles for authentication. This file is only read when the XmlUserManager is configured to use it.

Setting up SSL

Azkaban uses SSL socket connectors, which means a keystore must be available. You can follow the steps provided at this link (http://docs.codehaus.org/display/JETTY/How+to+configure+SSL) to create one. Once a keystore file has been created, Azkaban must be given its location and password. Within azkaban.properties, the following properties should be overridden.

   jetty.keystore=keystore
   jetty.password=password
   jetty.keypassword=password
   jetty.truststore=keystore
   jetty.trustpassword=password
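
Before starting the server, it can be useful to confirm that all five SSL properties are actually set. The following is a minimal sketch of such a check and is not part of Azkaban: parse_properties and missing_ssl_keys are hypothetical helpers, and the parser only understands plain key=value lines, not the full Java properties syntax (line continuations, ':' separators).

```python
# The five SSL-related keys listed above.
REQUIRED_SSL_KEYS = [
    "jetty.keystore",
    "jetty.password",
    "jetty.keypassword",
    "jetty.truststore",
    "jetty.trustpassword",
]


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and #/! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_ssl_keys(props):
    """Return the required SSL keys that are absent or empty."""
    return [key for key in REQUIRED_SSL_KEYS if not props.get(key)]
```

For example, missing_ssl_keys(parse_properties(open("conf/azkaban.properties").read())) returns an empty list when every key has a value.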

Setting up the UserManager

Azkaban uses the UserManager to provide authentication and user roles. By default, Azkaban includes and uses the XmlUserManager, which reads usernames, passwords and roles from azkaban-users.xml. The defaults are set in the azkaban.properties file:

   user.manager.class=azkaban.user.XmlUserManager
   user.manager.xml.file=conf/azkaban-users.xml

The following is an example of the contents of the azkaban-users.xml file.

<azkaban-users>
    <user username="azkaban" password="azkaban" roles="admin" groups="azkaban"/>
    <role name="admin" permissions="ADMIN"/>
</azkaban-users>
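
A users file can be checked for well-formedness and for roles that are referenced but never defined. The sketch below is only an illustration, not Azkaban code: undefined_roles is a hypothetical helper, and it assumes that the roles attribute holds a comma-separated list.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<azkaban-users>
    <user username="azkaban" password="azkaban" roles="admin" groups="azkaban"/>
    <role name="admin" permissions="ADMIN"/>
</azkaban-users>"""


def undefined_roles(xml_text):
    """Map each username to the roles it references that have no <role> entry.

    Raises xml.etree.ElementTree.ParseError if the XML is not well formed,
    for example when the closing tag does not match the opening tag.
    """
    root = ET.fromstring(xml_text)
    defined = {role.get("name") for role in root.findall("role")}
    problems = {}
    for user in root.findall("user"):
        # Assumption: multiple roles are separated by commas.
        roles = [r.strip() for r in user.get("roles", "").split(",") if r.strip()]
        missing = [r for r in roles if r not in defined]
        if missing:
            problems[user.get("username")] = missing
    return problems
```

Calling undefined_roles(SAMPLE) returns an empty dictionary, because the only role used, admin, is defined.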


It is possible to override the UserManager to use other methods of authentication (e.g. DB, JNDI, LDAP) by including your own implementation of the azkaban.user.UserManager interface and changing the user.manager.class property.

Setting up the DB

To point the Azkaban web client at the MySQL instance, configure the following properties in azkaban.properties.

   database.type=mysql
   mysql.port=3306
   mysql.host=localhost
   mysql.database=azkaban2
   mysql.user=azkaban
   mysql.password=azkaban
   mysql.numconnections=100


MySQL is currently the only data store type supported in Azkaban, so database.type should always be mysql.
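
To show how these properties relate to a database connection, the sketch below assembles them into a JDBC URL of the standard MySQL Connector/J form jdbc:mysql://host:port/database. This illustrates the mapping only; mysql_jdbc_url is a hypothetical helper, and whether Azkaban builds its URL in exactly this way is an assumption.

```python
def mysql_jdbc_url(props):
    """Build a MySQL JDBC URL from a dict of mysql.* properties.

    Falls back to localhost and 3306 when host or port are not given;
    mysql.database is required.
    """
    host = props.get("mysql.host", "localhost")
    port = int(props.get("mysql.port", "3306"))
    database = props["mysql.database"]
    return f"jdbc:mysql://{host}:{port}/{database}"


example = {
    "mysql.host": "localhost",
    "mysql.port": "3306",
    "mysql.database": "azkaban2",
}
print(mysql_jdbc_url(example))
```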

Executor Host and Port

The web client needs to know the executor server host and port. Use the following configuration settings in azkaban.properties. The executor.host setting can be omitted if the executor server runs on the same machine as the web client.

   executor.port=12321
   executor.host=<url>
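
Once both servers are configured, a simple TCP connection attempt can confirm that the web client's machine can reach the executor's host and port. This is a minimal sketch, not an Azkaban utility: executor_reachable is a hypothetical helper, and a successful connection only shows that something is listening on the port, not that it is a healthy executor server.

```python
import socket


def executor_reachable(host="localhost", port=12321, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

The defaults match the executor.host and executor.port defaults listed in this guide.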

Setting up the Web Client

Azkaban was built as a Jetty server. You can specify the port and the maximum number of request threads that Azkaban will use. Please refer to the Jetty Properties table in the Azkaban Web Server Settings section.

The color theme and name of the install can be customized as well, which is especially useful for telling multiple installed versions of Azkaban apart.

Installing Viewer Plugins

Previous versions of Azkaban contained an HDFS browser. In newer versions it is optional and can be added as a plugin. It is available on GitHub at azkaban/azkaban-plugins.

To install a viewer plugin, download and extract the plugin into the viewer plugin directory (set by viewer.plugin.dir, plugins/viewer by default).

More instructions to come.

Running Web Server

The bin directory should contain an azkaban-web-start.sh script. Use it to start the Azkaban Web Server. Use azkaban-web-shutdown.sh to shut Azkaban down.

Azkaban Web Server Settings

General Properties

  Property | Description | Default
  ---------|-------------|--------
  azkaban.name | The name of the Azkaban instance that will show up in the UI. Useful if you run more than one Azkaban instance. | Local
  azkaban.label | A label to describe the Azkaban instance. | My Local Azkaban
  azkaban.color | Hex value that allows you to set a style color for the Azkaban UI. | #FF3601 (red)
  web.resource.dir | Sets the directory for the UI’s CSS and JavaScript files. | src/web
  default.timezone | The timezone that will be displayed by Azkaban. | America/Los_Angeles
  user.manager.class | The user manager that is used to authenticate a user. The default is an XML user manager, but it can be overridden to support other authentication methods, such as JNDI. | azkaban.user.XmlUserManager
  mail.sender | The email address that Azkaban uses to send emails. |
  mail.host | The email server host machine. |
  mail.user | The email server user name. |
  mail.password | The email server password. |
  azkaban.should.proxy | Used by the HDFS browser. Set to true if using Hadoop 1.0+ with security turned on. Will soon be removed. | false
  proxy.keytab.location | Used by the HDFS browser. The location of the keytab file, needed when using Hadoop 1.0+ with security turned on. Will soon be removed. |
  proxy.user | The proxy user. |
  viewer.plugin.dir | Directory where viewer plugins will be installed. | plugins/viewer

Jetty Properties

  Property | Description | Default
  ---------|-------------|--------
  jetty.maxThreads | Max request threads | 25
  jetty.ssl.port | The SSL port | 8443
  jetty.keystore | The keystore file | keystore
  jetty.password | The Jetty password | password
  jetty.keypassword | The key password | password
  jetty.truststore | The trust store | keystore
  jetty.trustpassword | The trust password | password

MySQL Connection Properties

  Property | Description | Default
  ---------|-------------|--------
  database.type | The database type. Currently, the only database supported is mysql. | mysql
  mysql.port | The port of the MySQL DB | 3306
  mysql.host | The MySQL host | localhost
  mysql.database | The MySQL database | azkaban2
  mysql.user | The MySQL user | azkaban
  mysql.password | The MySQL password | password
  mysql.numconnections | The number of connections that the Azkaban web client can open to the database | 100

Executor Server Properties

  Property | Description | Default
  ---------|-------------|--------
  executor.port | The port for the Azkaban executor server | 12321
  executor.host | The host for the Azkaban executor server | localhost

Setting up the Executor Server

Download and Install

  1. Download the azkaban-executor-server tar. Extract it into your azkaban executor install directory.
  2. Copy the JDBC jar into the ./extlib directory. Azkaban automatically looks in this directory for external (non-distributed) jars.

In the conf dir, there should be several files:

  File               | Description
  -------------------|------------
  azkaban.properties | Used by Azkaban for runtime properties.
  global.properties  | Global static properties that are passed as shared properties to every workflow and job.

Much like the Web Server, both the port and the database need to be set up.

Setting up the DB

The Azkaban executor server must be set up to connect to the MySQL instance. Configure the following properties in azkaban.properties.

   database.type=mysql
   mysql.port=3306
   mysql.host=localhost
   mysql.database=azkaban2
   mysql.user=azkaban
   mysql.password=azkaban
   mysql.numconnections=100


MySQL is currently the only data store type supported in Azkaban, so database.type should always be mysql.

Executor Host and Port

The executor port will need to be set. Use the following configuration settings in azkaban.properties.

   executor.maxThreads=50
   executor.port=12321

Azkaban Executor Server Settings

MySQL Connection Properties

  Property | Description | Default
  ---------|-------------|--------
  database.type | The database type. Currently, the only database supported is mysql. | mysql
  mysql.port | The port of the MySQL DB | 3306
  mysql.host | The MySQL host | localhost
  mysql.database | The MySQL database | azkaban2
  mysql.user | The MySQL user | azkaban
  mysql.password | The MySQL password | password
  mysql.numconnections | The number of connections that the Azkaban executor server can open to the database | 100

Executor Server Properties

  Property | Description | Default
  ---------|-------------|--------
  executor.port | The port for the Azkaban executor server | 12321
  executor.maxThreads | The maximum number of flows that are accepted by the executor | 50
  azkaban.jobtype.plugin.dir | Directory where job type plugins will be installed | plugins/jobtypes

Installing JobType Plugins

The Azkaban executor can run command and javaprocess type jobs. Other job types can be added by dropping them into the jobtypes plugin directory. Common job types such as the ‘java’ and ‘pig’ types can be downloaded from GitHub at azkaban/azkaban-plugins.

Upgrading Azkaban

Running Azkaban as a web server and an executor server in separate processes makes it possible to roll out an upgrade without shutting down long-lived jobs.

To do this, install the newer executor server and change the executor port. You will also need to update the executor server port in the web client's configuration. After a restart, the web client should point to the new executor server.

Any running jobs in the old executor should be auto detected by the web client. When the old executor finishes running its flow, it should be safe to shut the executor server down.

Note that the scheduler runs in the web server. If you shut down the web server, any jobs scheduled to run while it is down may be skipped.
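
The port change described above amounts to editing one line in each azkaban.properties file. A minimal sketch of that edit follows. It is not an Azkaban tool: set_property is a hypothetical helper, the port value 12322 used in the example is arbitrary, and the function only handles plain key=value lines.

```python
import re


def set_property(text, key, value):
    """Return properties text with key set to value.

    Replaces the first existing 'key=...' line, or appends a new line
    if the key is not present.
    """
    pattern = re.compile(
        r"^[ \t]*" + re.escape(key) + r"[ \t]*=.*$", re.MULTILINE
    )
    new_line = f"{key}={value}"
    if pattern.search(text):
        # A function replacement avoids backslash handling in the value.
        return pattern.sub(lambda match: new_line, text, count=1)
    separator = "" if (not text or text.endswith("\n")) else "\n"
    return text + separator + new_line + "\n"


old = "executor.port=12321\nexecutor.host=localhost\n"
print(set_property(old, "executor.port", "12322"))
```

The same call would be applied to the new executor's configuration and to the web client's configuration before restarting the web client.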