Azkaban 2

Getting started

Azkaban needs three components to run.

  1. The Web UI, where users upload, schedule, and submit jobs to run.
  2. The Executor Server, which runs the jobs and may be remote.
  3. A MySQL database, which stores the uploaded projects and the state of executing jobs.

1. Set up the DB

Azkaban uses MySQL as its database. You will need to download the MySQL JDBC connector jar yourself, since Azkaban doesn’t distribute it: http://www.mysql.com/downloads/connector/j/

Once the MySQL database is set up, create the tables that Azkaban will use: extract the azkaban-sql-script archive and run the *.sql scripts against the database.

2. Set up the Web UI

Extract the azkaban-web-server archive to the install directory. Copy the MySQL JDBC connector jar to the ./extlib directory.

Azkaban uses SSL, so a keystore will need to be created. Follow these instructions to configure Jetty for SSL: http://docs.codehaus.org/display/JETTY/How+to+configure+SSL

The ./conf directory contains several settings files. Azkaban uses azkaban.properties for its general settings. The XmlUserManager uses azkaban-users.xml for authentication. The properties in global.properties are passed as shared properties to every workflow and job.

By default, authentication is handled by the azkaban.user.XmlUserManager class, which reads azkaban-users.xml. You can override the authentication method by implementing the azkaban.user.UserManager interface.
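As a minimal sketch of switching the authentication method, the user.manager.class property (listed under General Properties) could be pointed at a custom class. The class name com.example.auth.LdapUserManager is hypothetical and stands in for your own implementation of azkaban.user.UserManager.

```properties
# conf/azkaban.properties on the web server

# Documented default:
# user.manager.class=azkaban.user.XmlUserManager

# Hypothetical custom implementation of the azkaban.user.UserManager interface
user.manager.class=com.example.auth.LdapUserManager
```

The custom class has to be on the web server's classpath. Placing its jar in ./extlib next to the JDBC connector is an assumption here, not something this guide confirms.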

To use the XmlUserManager, just add a user entry to the XML file. Note that you’ll have to restart the server to pick up new users.

Example:

<azkaban-users>
  <user username="azkaban" password="azkaban" roles="admin" groups="azkaban"/>
  <role name="admin" permissions="ADMIN" />
</azkaban-users>

For Hadoop, you will want to start Azkaban with the environment variable HADOOP_HOME pointing to the Hadoop installation for your cluster.

The settings that must be set are the MySQL connection settings and the Azkaban executor server settings. The executor port configured here must be the same as the port the Executor Server itself is configured with.
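As a rough sketch, the required settings might look like this in the web server's conf/azkaban.properties. All values are the defaults listed in the property tables of this guide and should be replaced with the values for your own installation.

```properties
# conf/azkaban.properties on the web server (sketch using the documented defaults)

# MySQL connection
database.type=mysql
mysql.host=localhost
mysql.port=3306
mysql.database=azkaban2
mysql.user=azkaban
mysql.password=password
mysql.numconnections=100

# Executor server that this web server talks to.
# executor.port must match the port set on the executor server itself.
executor.host=localhost
executor.port=12321
```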

The following properties can be set.

General Properties

| Property | Description | Default |
|---|---|---|
| azkaban.name | The name of the Azkaban instance that will show up in the UI. Useful if you run more than one Azkaban instance. | Local |
| azkaban.label | A label to describe the Azkaban instance. | My Local Azkaban |
| azkaban.color | Hex value that sets a style color for the Azkaban UI. | #FF3601 (red) |
| web.resource.dir | The directory for the UI’s CSS and JavaScript files. | src/web |
| default.timezone | The timezone that will be displayed by Azkaban. | America/Los_Angeles |
| user.manager.class | The user manager used to authenticate a user. The default is an XML user manager, but it can be overridden to support other authentication methods, such as JNDI. | azkaban.user.XmlUserManager |
| mail.sender | The email address that Azkaban uses to send emails. | |
| mail.host | The email server host machine. | |
| mail.user | The email server user name. | |
| mail.password | The email server password. | |
| azkaban.should.proxy | Used by the HDFS browser. Set to true if using Hadoop 1.0+ with security turned on. Will soon be removed. | false |
| proxy.keytab.location | Used by the HDFS browser. The location of the keytab file to use when Hadoop 1.0+ security is turned on. Will soon be removed. | |
| proxy.user | The proxy user. | |
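For illustration, the general properties could be written in conf/azkaban.properties roughly as follows. The instance settings use the documented defaults. The mail values are hypothetical placeholders, since no defaults are listed for them.

```properties
# Instance identity shown in the UI (documented defaults)
azkaban.name=Local
azkaban.label=My Local Azkaban
azkaban.color=#FF3601
default.timezone=America/Los_Angeles

# Mail settings: all four values below are hypothetical placeholders
mail.sender=azkaban@example.com
mail.host=smtp.example.com
mail.user=azkaban
mail.password=changeme
```

Comments in a properties file must sit on their own line. A # that appears inside a value, as in azkaban.color, is read as part of the value.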

Jetty Properties

| Property | Description | Default |
|---|---|---|
| jetty.maxThreads | Max request threads | 25 |
| jetty.ssl.port | The SSL port | 8443 |
| jetty.keystore | The keystore file | keystore |
| jetty.password | The Jetty password | password |
| jetty.keypassword | The key password | password |
| jetty.truststore | The trust store | keystore |
| jetty.trustpassword | The trust password | password |
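A sketch of the Jetty section of conf/azkaban.properties, using the documented defaults, might look like this. In a real installation, the keystore path and passwords would be whatever you chose when creating the keystore.

```properties
# Jetty / SSL settings (documented defaults; replace the passwords in practice)
jetty.maxThreads=25
jetty.ssl.port=8443

# Keystore created while following the Jetty SSL instructions
jetty.keystore=keystore
jetty.password=password
jetty.keypassword=password

# Trust store (the default reuses the keystore file)
jetty.truststore=keystore
jetty.trustpassword=password
```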

MySQL Connection Properties

| Property | Description | Default |
|---|---|---|
| database.type | The database type. Currently, the only database supported is mysql. | mysql |
| mysql.port | The port of the MySQL database | 3306 |
| mysql.host | The MySQL host | localhost |
| mysql.database | The MySQL database | azkaban2 |
| mysql.user | The MySQL user | azkaban |
| mysql.password | The MySQL password | password |
| mysql.numconnections | The number of connections that the Azkaban web client can open to the database | 100 |

Executor Server Properties

| Property | Description | Default |
|---|---|---|
| executor.port | The port for the Azkaban executor server | 12321 |
| executor.host | The host for the Azkaban executor server | localhost |

There are two pieces of Azkaban that must be installed: the web client and the executor server.

3. Setting up the Executor Server

Download the azkaban executor tarball and extract it to the executor install directory.

Just as for the Web Server, you will need to copy the MySQL JDBC connector jar to the ./extlib directory. You will also need to edit conf/azkaban.properties to set the executor port and the MySQL connection settings.

You may also need to set the Azkaban proxy properties if you use Hadoop 1.0 security.
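As a hedged sketch, the executor server's conf/azkaban.properties might look like the following. It assumes the executor accepts the same property names as the web server tables in this guide, which the text implies but does not state outright. The keytab path and proxy user are hypothetical.

```properties
# conf/azkaban.properties on the executor server (sketch)

# Port this executor listens on; the web server's executor.port must match it
executor.port=12321

# The same MySQL database that the web server uses.
# If the executor runs on another machine, localhost must be replaced
# with the real database host.
database.type=mysql
mysql.host=localhost
mysql.port=3306
mysql.database=azkaban2
mysql.user=azkaban
mysql.password=password

# Only if Hadoop 1.0 security is turned on (hypothetical values)
# azkaban.should.proxy=true
# proxy.keytab.location=/path/to/azkaban.keytab
# proxy.user=azkaban
```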

4. Running everything

Start both the web server and the executor server to run Azkaban.

Upgrading Azkaban

Running Azkaban as a web server and an executor server in separate processes makes it possible to roll out an upgrade without shutting down long-lived jobs.

To do this, install the newer executor server and give it a different executor port. You will also need to update the executor port in the web client’s configuration. Once the web client is restarted, it should point to the new executor server.
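The port change could be sketched as follows. The port 12322 is a hypothetical free port chosen for illustration. The old executor keeps running on the previous port (12321 by default) until its flows finish.

```properties
# --- new executor server: conf/azkaban.properties ---
# Hypothetical new port, different from the old executor's port
executor.port=12322

# --- web server: conf/azkaban.properties ---
# Point the web client at the new executor, then restart the web client
executor.host=localhost
executor.port=12322
```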

Any jobs still running in the old executor should be detected automatically by the web client. Once the old executor finishes running its flows, it should be safe to shut that executor server down.

Note that the scheduler runs in the web server. If you shut down the web server, any jobs scheduled to run during that time may be skipped.