Azkaban2 is a batch workflow job scheduler and the successor to the original Azkaban. A workflow often requires a set of jobs and processes to run in a particular order. Azkaban resolves that ordering through job dependencies and provides an easy-to-use web interface for maintaining and tracking your workflows.
Here are a few features:
Azkaban2 is fairly easy to set up, although it has more moving pieces than its predecessor. There are two servers and a database that need to be set up: the web server, the executor server, and a MySQL database.
There are three download packages for Azkaban: the web server, the executor server and the MySQL setup scripts.
The package download page is maintained on the GitHub wiki:
https://github.com/azkaban/azkaban2/wiki/Download-Packages
Currently, Azkaban2 only uses MySQL as its data store. Installation of MySQL DB is not covered in this guide.
For various licensing reasons, Azkaban does not distribute the MySQL JDBC connector jar. You can download the jar from this link: http://www.mysql.com/downloads/connector/j/. This jar will be needed for both the web server and the executor server.
In the conf dir, there should be several files:
File | Description |
azkaban.properties | Used by Azkaban for runtime properties |
global.properties | Global static properties that are passed as shared properties to every workflow and job. |
azkaban-users.xml | Used to add users and roles for authentication. This file is not used if the XmlUserManager is not set up to use it. |
Azkaban uses SSL socket connectors, which means a keystore must be available. You can follow the steps provided at this link (http://docs.codehaus.org/display/JETTY/How+to+configure+SSL) to create one. Once a keystore file has been created, Azkaban must be given its location and password. Within azkaban.properties, the following properties should be overridden:
jetty.keystore=keystore
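Based on the defaults in the Jetty Properties table of this guide, the keystore-related overrides in azkaban.properties might look like the sketch below. The values shown are only the documented defaults; the keystore path and all passwords are placeholders to replace with the ones chosen when the keystore was created.

```properties
# SSL settings for the Azkaban web server (values are the documented defaults).
# Replace the path and passwords with the ones used when creating the keystore.
jetty.ssl.port=8443
jetty.keystore=keystore
jetty.password=password
jetty.keypassword=password
jetty.truststore=keystore
jetty.trustpassword=password
```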
Azkaban uses the UserManager to provide authentication and user roles. By default, Azkaban includes and uses the XmlUserManager, which reads usernames, passwords, and roles from azkaban-users.xml, as can be seen in the azkaban.properties file.
The following is an example of the contents of the azkaban-users.xml file.
<azkaban-users>
It is possible to override the UserManager to use other methods of authentication (e.g., DB, JNDI, or LDAP) by including your own implementation of the azkaban.user.UserManager interface and changing the user.manager.class property.
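As a rough illustration, switching user managers comes down to one line in azkaban.properties. The class name com.example.LdapUserManager below is hypothetical; it stands in for your own implementation of azkaban.user.UserManager, which would presumably also need to be on the web server's classpath.

```properties
# Default: the bundled XML-backed user manager.
# user.manager.class=azkaban.user.XmlUserManager

# Hypothetical custom implementation of azkaban.user.UserManager.
user.manager.class=com.example.LdapUserManager
```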
To point the Azkaban web client at the MySQL instance, configure the following properties in azkaban.properties.
database.type=mysql
Currently, MySQL is the only data store type supported in Azkaban, so database.type should always be mysql.
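Using the defaults from the MySQL Connection Properties table in this guide, the database block in the web server's azkaban.properties could look like the following sketch. The host, database name, user, and password are just the documented defaults and are assumptions about your environment.

```properties
# MySQL connection settings (documented defaults; adjust to your instance).
database.type=mysql
mysql.port=3306
mysql.host=localhost
mysql.database=azkaban2
mysql.user=azkaban
mysql.password=password
mysql.numconnections=100
```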
The web client needs to know the executor server's host and port. Use the following configuration settings in azkaban.properties. The executor.host setting is not necessary if the executor server runs on the same host as the web client.
executor.port=12321
executor.host=<url>
Azkaban was built as a Jetty server. You can specify the ports and the number of connections that Azkaban will use. Refer to the Jetty Properties table under the Azkaban web server settings.
The color theme and name of the install can be customized as well, which is especially useful for telling multiple installed versions of Azkaban apart.
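The properties involved are listed in the General Properties table; a minimal sketch with the documented defaults is shown below. Any other name, label, or color you pick for a second instance is your own choice.

```properties
# Instance branding shown in the UI (documented defaults).
azkaban.name=Local
azkaban.label=My Local Azkaban
azkaban.color=#FF3601
```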
Previous versions of Azkaban contained an HDFS browser. It is now optional and can be added as a plugin, available from the azkaban/azkaban-plugins repository on GitHub.
To install a viewer plugin, download and extract it into the viewer plugin directory (set by viewer.plugin.dir, plugins/viewer by default).
More instructions to come.
The bin directory should contain an azkaban-web-start.sh script. Use it to start the Azkaban web server, and use azkaban-web-shutdown.sh to shut it down.
General Properties
Property | Description | Default |
azkaban.name | The name of the azkaban instance that will show up in the UI. Useful if you run more than one Azkaban instance. | Local |
azkaban.label | A label to describe the Azkaban instance. | My Local Azkaban |
azkaban.color | Hex value that allows you to set a style color for the Azkaban UI. | #FF3601 (red) |
web.resource.dir | Sets the directory for the UI’s CSS and JavaScript files | src/web |
default.timezone | The timezone that will be displayed by Azkaban. | America/Los_Angeles |
user.manager.class | The user manager that is used to authenticate a user. The default is an XML user manager, but it can be overridden to support other authentication methods, such as JNDI. | azkaban.user.XmlUserManager |
mail.sender | The email address that azkaban uses to send emails. | |
mail.host | The email server host machine | |
mail.user | The email server user name | |
mail.password | The email server password | |
azkaban.should.proxy | Used by the HDFS browser. Set to true if using Hadoop 1.0+ with security turned on. Will soon be removed. | false |
proxy.keytab.location | Used by the HDFS browser. The location of the keytab file, needed when using Hadoop 1.0+ with security turned on. Will soon be removed. | |
proxy.user | The proxy user | |
viewer.plugin.dir | Directory where viewer plugins will be installed | plugins/viewer |
Jetty Properties
Property | Description | Default |
jetty.maxThreads | Max request threads | 25 |
jetty.ssl.port | The ssl port | 8443 |
jetty.keystore | The keystore file | keystore |
jetty.password | The jetty password | password |
jetty.keypassword | The keypassword | password |
jetty.truststore | The trust store | keystore |
jetty.trustpassword | The trust password | password |
MySQL Connection Properties
Property | Description | Default |
database.type | The database type. Currently, the only database supported is mysql. | mysql |
mysql.port | The port to the mysql db | 3306 |
mysql.host | The mysql host | localhost |
mysql.database | The mysql database | azkaban2 |
mysql.user | The mysql user | azkaban |
mysql.password | The mysql password | password |
mysql.numconnections | The number of connections that Azkaban web client can open to the database | 100 |
Executor Server Properties
Property | Description | Default |
executor.port | The port for the azkaban executor server | 12321 |
executor.host | The host for azkaban executor server | localhost |
In the conf dir, there should be several files:
File | Description |
azkaban.properties | Used by Azkaban for runtime properties |
global.properties | Global static properties that are passed as shared properties to every workflow and job. |
Much like the web server, both the port and the database need to be set up.
The Azkaban executor server must be configured to connect to the MySQL instance. Configure the following properties in azkaban.properties.
database.type=mysql
Currently, MySQL is the only data store type supported in Azkaban, so database.type should always be mysql.
The executor port also needs to be set. Use the following configuration settings in azkaban.properties.
executor.maxThreads=50
executor.port=12321
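Putting the executor-side settings together, an executor server's azkaban.properties might look like the sketch below. All values are the defaults listed in the property tables of this guide; the assumption that the executor uses the same MySQL database as the web server follows from both servers being pointed at "the MySQL instance".

```properties
# Executor server settings (documented defaults).
executor.port=12321
executor.maxThreads=50
azkaban.jobtype.plugin.dir=plugins/jobtypes

# MySQL connection; presumably the same database the web server uses.
database.type=mysql
mysql.port=3306
mysql.host=localhost
mysql.database=azkaban2
mysql.user=azkaban
mysql.password=password
mysql.numconnections=100
```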
MySQL Connection Properties
Property | Description | Default |
database.type | The database type. Currently, the only database supported is mysql. | mysql |
mysql.port | The port to the mysql db | 3306 |
mysql.host | The mysql host | localhost |
mysql.database | The mysql database | azkaban2 |
mysql.user | The mysql user | azkaban |
mysql.password | The mysql password | password |
mysql.numconnections | The number of connections that the Azkaban executor server can open to the database | 100 |
Executor Server Properties
Property | Description | Default |
executor.port | The port for the azkaban executor server | 12321 |
executor.maxThreads | The maximum number of flows that are accepted by the executor | 50 |
azkaban.jobtype.plugin.dir | Directory where job type plugins are installed | plugins/jobtypes |
The Azkaban executor can run command and javaprocess type jobs. Other job types can be added by dropping them into the job type plugin directory. Common job types such as ‘java’ and ‘pig’ can be downloaded from the azkaban/azkaban-plugins repository on GitHub.
Upgrading Azkaban
Running Azkaban as separate web server and executor server processes makes it possible to perform a rolling upgrade without shutting down long-lived jobs.
To do this, install the newer executor server and change the executor port. You will also need to update the executor server port in the web client's configuration. After the web client is restarted, it should point to the new executor server.
Any jobs still running in the old executor should be detected automatically by the web client. Once the old executor finishes running its flows, it should be safe to shut it down.
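The port change in this procedure can be sketched as a pair of azkaban.properties edits. The port 12322 is an arbitrary example chosen for illustration, not a value from Azkaban's documentation; any free port that differs from the old executor's should do.

```properties
# --- New executor server: azkaban.properties in its conf dir ---
# 12322 is an arbitrary example; the old executor keeps running on 12321.
executor.port=12322

# --- Web server: azkaban.properties in its conf dir ---
# Point the web client at the new executor, then restart the web server.
executor.port=12322
```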
Note that the scheduler runs in the web server. If you shut down the web server, any jobs scheduled to run while it is down may be skipped.