azkaban-aplcache

Changes

docs/getStarted.rst 332(+329 -3)

Details

docs/getStarted.rst 332(+329 -3)

diff --git a/docs/getStarted.rst b/docs/getStarted.rst
index 27d1627..5b2e285 100644
--- a/docs/getStarted.rst
+++ b/docs/getStarted.rst
@@ -67,7 +67,7 @@ Follow these steps to get started:
 2. Build Azkaban and create an installation package:
 ::
   cd azkaban; ./gradlew build installDist
-3. Start the solo server: 
+3. Start the solo server:
 ::
   cd azkaban-solo-server/build/install/azkaban-solo-server; bin/azkaban-solo-start.sh
 Azkaban solo server should be all set, by listening to ``8081`` port at default to accept incoming network request. So, open a web browser and check out ``http://localhost:8081/``
@@ -172,10 +172,10 @@ We suggest users to opt for **Mysql** as Azkaban database, because we build up a
 
 - Create the Azkaban Tables
 
-  Run individual table creation scripts from `latest table statements <https://github.com/azkaban/azkaban/tree/master/azkaban-db/src/main/sql>`_ on the MySQL instance to create your tables. 
+  Run individual table creation scripts from `latest table statements <https://github.com/azkaban/azkaban/tree/master/azkaban-db/src/main/sql>`_ on the MySQL instance to create your tables.
 
   Alternatively, run create-all-sql-<version>.sql generated by build process. The location is the file is at ``/Users/latang/LNKDRepos/azkaban/azkaban-db/build/distributions/azkaban-db-<version>``, after you build `azkaban-db` module by ::
- 
+
     cd azkaban-db; ../gradlew build installDist
 
 Installing Azkaban Executor Server
@@ -261,7 +261,333 @@ Then run ::
 Then, a multi-executor Azkaban instance is ready for use. Open a web browser and check out ``http://localhost:8081/``
 You are all set to login to Azkaban UI.
 
+**********************
+Set up Azkaban Plugins
+**********************
+
+Azkaban is designed to keep non-core functionality plugin-based, so
+that
+
+#. plugins can be selectively installed or upgraded in different
+   environments without changing the core Azkaban, and
+#. Azkaban can easily be extended to work with different systems.
+
+Right now, Azkaban allows for a number of different plugins. On the web
+server side, there are
+
+-  viewer plugins that enable custom web pages to add features to
+   Azkaban. Known implementations include the HDFS filesystem viewer
+   and Reportal.
+-  trigger plugins that enable custom triggering methods.
+-  user manager plugins that enable custom user authentication methods.
+   For instance, at LinkedIn we use LDAP-based user authentication.
+-  alerter plugins that enable different alerting methods to users, in
+   addition to email-based alerting.
+
+On the executor server side, there are
+
+-  pluggable job type executors on AzkabanExecutorServer, such as job
+   types for Hadoop ecosystem components.
+
+We recommend installing these plugins to get the most out of Azkaban.
+Below are instructions for installing these plugins to work with
+Azkaban.
+
+User Manager Plugins
+####################
+
+By default, Azkaban ships with the ``XMLUserManager`` class, which
+authenticates users based on an XML file located at
+``conf/azkaban-users.xml``.
+
+This is not secure and does not scale to many users. In a real
+production deployment, you should rely on your own user manager class
+that suits your needs, such as an LDAP-based one. The ``XMLUserManager``
+can still be used for special user accounts and for managing user roles.
+You can find examples of these two cases in the default
+``azkaban-users.xml`` file.
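+
+For reference, a minimal ``azkaban-users.xml`` in this format might look
+like the following; the exact contents shipped with your version may
+differ::
+
+   <azkaban-users>
+     <user username="azkaban" password="azkaban" roles="admin"/>
+     <role name="admin" permissions="ADMIN"/>
+   </azkaban-users>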
+
+To install your own user manager class, specify it in
+``Azkaban-web-server-install-dir/conf/azkaban.properties``:
+
+::
+
+   user.manager.class=MyUserManagerClass
+
+and put the jar containing that class in the ``plugins`` directory.
+
+Viewer Plugins
+##############
+
+HDFS Viewer Plugin
+**********************
+
+The HDFS Viewer Plugin should be installed in the AzkabanWebServer
+plugins directory, which is specified in AzkabanWebServer's config file,
+for example, in ``Azkaban-web-server-install-dir/conf/azkaban.properties``:
+::
+
+   viewer.plugins=hdfs
+
+This tells Azkaban to load the HDFS viewer plugin from
+``Azkaban-web-server-install-dir/plugins/viewer/hdfs``.
+
+Extract the ``azkaban-hdfs-viewer`` archive into the AzkabanWebServer
+``./plugins/viewer`` directory and rename the extracted directory to
+``hdfs``, as specified above.
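+
+For example, assuming the downloaded archive is named
+``azkaban-hdfs-viewer-<version>.tar.gz`` (the actual file name depends
+on the release you fetch), the steps look roughly like::
+
+   cd Azkaban-web-server-install-dir/plugins/viewer
+   tar -xvzf azkaban-hdfs-viewer-<version>.tar.gz
+   mv azkaban-hdfs-viewer-<version> hdfs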
+
+Depending on whether the Hadoop installation has security turned on:
+
+#. If the Hadoop installation does not have security turned on, the
+   default config is good enough. Simply restart ``AzkabanWebServer``
+   and start using the HDFS viewer.
+#. If the Hadoop installation does have security turned on, the
+   following configs should be set differently from their default
+   values, in the plugin's config file:
+
++-----------------------------------+----------------------------------------------------+
+| Parameter                         | Description                                        |
++===================================+====================================================+
+| ``azkaban.should.proxy``          | Whether Azkaban should proxy as another user to    |
+|                                   | view the HDFS filesystem, rather than as Azkaban   |
+|                                   | itself. Defaults to ``true``.                      |
++-----------------------------------+----------------------------------------------------+
+| ``hadoop.security.manager.class`` | The security manager to be used, which handles     |
+|                                   | talking to a secure Hadoop cluster. Defaults to    |
+|                                   | ``azkaban.security.HadoopSecurityManager_H_1_0``   |
+|                                   | (for Hadoop 1.x versions)                          |
++-----------------------------------+----------------------------------------------------+
+| ``proxy.user``                    | The Azkaban user configured with Kerberos and      |
+|                                   | Hadoop. Similar to how Oozie should be configured, |
+|                                   | for secure Hadoop installations.                   |
++-----------------------------------+----------------------------------------------------+
+| ``proxy.keytab.location``         | The location of the keytab file with which Azkaban |
+|                                   | can authenticate with Kerberos for the specified   |
+|                                   | ``proxy.user``.                                    |
++-----------------------------------+----------------------------------------------------+
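+
+As a sketch, on a Kerberos-enabled cluster the plugin's config file
+might contain entries along these lines; the security manager class,
+user and keytab path below are illustrative and depend on your Hadoop
+version and environment::
+
+   azkaban.should.proxy=true
+   hadoop.security.manager.class=azkaban.security.HadoopSecurityManager_H_2_0
+   proxy.user=azkaban
+   proxy.keytab.location=/path/to/azkaban.keytab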
+
+For more Hadoop security related information, see :ref:`hadoopsecuritymanager`.
+
+Job Type Plugins
+################
+
+Azkaban has a limited set of built-in job types to run local unix
+commands and simple java programs. In most cases, you will want to
+install additional job type plugins, for example, hadoopJava, Pig, Hive,
+VoldemortBuildAndPush, etc. Some of the common ones are included in the
+azkaban-jobtype archive. Here is how to install them:
+
+Job type plugins should be installed in AzkabanExecutorServer's
+plugins directory, and specified in AzkabanExecutorServer's config file.
+For example, in
+``Azkaban-exec-server-install-dir/conf/azkaban.properties``:
+
+::
+
+   azkaban.jobtype.plugin.dir=plugins/jobtypes
+
+This tells Azkaban to load all job types from
+``Azkaban-exec-server-install-dir/plugins/jobtypes``. Extract the
+archive into the AzkabanExecutorServer ``./plugins/`` directory and
+rename it to ``jobtypes``, as specified above.
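+
+For example, assuming the job type archive is named
+``azkaban-jobtype-<version>.tar.gz`` (the actual name depends on the
+release)::
+
+   cd Azkaban-exec-server-install-dir/plugins
+   tar -xvzf azkaban-jobtype-<version>.tar.gz
+   mv azkaban-jobtype-<version> jobtypes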
+
+The following settings are often needed when you run Hadoop jobs:
+
++------------------------------+--------------------------------------------------------------+
+| Parameter                    | Description                                                  |
++==============================+==============================================================+
+| ``hadoop.home``              | Your ``$HADOOP_HOME`` setting.                               |
++------------------------------+--------------------------------------------------------------+
+| ``jobtype.global.classpath`` | The cluster specific Hadoop resources, such as the           |
+|                              | hadoop-core jar and the Hadoop conf directory, e.g.          |
+|                              | ``${hadoop.home}/hadoop-core-1.0.4.jar,${hadoop.home}/conf`` |
++------------------------------+--------------------------------------------------------------+
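+
+For example, these can be set in the job type plugins' shared properties
+file (e.g. ``common.properties`` in the ``jobtypes`` directory); the
+paths below are placeholders for your own cluster layout::
+
+   hadoop.home=/usr/lib/hadoop
+   jobtype.global.classpath=${hadoop.home}/hadoop-core-1.0.4.jar,${hadoop.home}/conf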
+
+Depending on whether the Hadoop installation has security turned on:
+
+-  If the Hadoop installation does not have security turned on, you can
+   likely rely on the default settings.
+-  If the Hadoop installation does have Kerberos authentication turned
+   on, you need to fill out the following Hadoop settings:
+
++-----------------------------------+----------------------------------------------------+
+| Parameter                         | Description                                        |
++===================================+====================================================+
+| ``hadoop.security.manager.class`` | The security manager to be used, which handles     |
+|                                   | talking to a secure Hadoop cluster. Defaults to    |
+|                                   | ``azkaban.security.HadoopSecurityManager_H_1_0``   |
+|                                   | (for Hadoop 1.x versions)                          |
++-----------------------------------+----------------------------------------------------+
+| ``proxy.user``                    | The Azkaban user configured with Kerberos and      |
+|                                   | Hadoop. Similar to how Oozie should be configured, |
+|                                   | for secure Hadoop installations.                   |
++-----------------------------------+----------------------------------------------------+
+| ``proxy.keytab.location``         | The location of the keytab file with which Azkaban |
+|                                   | can authenticate with Kerberos for the specified   |
+|                                   | ``proxy.user``.                                    |
++-----------------------------------+----------------------------------------------------+
+
+For more Hadoop security related information, see :ref:`hadoopsecuritymanager`.
+
+Finally, start the executor, watch for error messages, and check the
+executor server log. For job type plugins, the executor should do
+minimal testing and let you know if they are properly installed.
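+
+For example, assuming the standard ``start-exec.sh`` script from the
+executor server install::
+
+   cd Azkaban-exec-server-install-dir
+   bin/start-exec.sh
+   # then review the executor server log for plugin loading errors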
+
+--------------
+
+******************
+Property Overrides
+******************
+
+An Azkaban job is specified with a set of key-value pairs we call
+properties. There are multiple sources deciding which properties will
+finally be part of the job execution. The following table lists all the
+sources of properties and their priorities. Note that if a property
+occurs in multiple sources, its value from the higher-priority source
+is used.
+
+The following properties are visible to the users. These are the same
+properties that are merged to form ``jobProps`` in
+``AbstractProcessJob.java``.
+
++------------------------------+----------------------------------------+-------------+
+| PropertySource               | Description                            | Priority    |
++==============================+========================================+=============+
+| ``global.properties`` in     | These are admin configured properties  | Lowest (0)  |
+| the ``conf`` directory       | during Azkaban setup. Global to all    |             |
+|                              | jobtypes.                              |             |
++------------------------------+----------------------------------------+-------------+
+| ``common.properties`` in     | These are admin configured properties  | 1           |
+| the ``jobtype`` directory    | during Azkaban setup. Global to all    |             |
+|                              | jobtypes.                              |             |
++------------------------------+----------------------------------------+-------------+
+| ``plugin.properties`` in the | These are admin configured properties  | 2           |
+| ``jobtype/<jobtype-name>``   | during Azkaban setup. Restricted to a  |             |
+| directory                    | specific jobtype.                      |             |
++------------------------------+----------------------------------------+-------------+
+| ``common.properties`` in the | These are user specified properties    | 3           |
+| project zip                  | that apply to all jobs in sibling or   |             |
+|                              | descendant directories.                |             |
++------------------------------+----------------------------------------+-------------+
+| Flow properties specified    | These are user specified properties.   | 4           |
+| while triggering flow        | They can be specified from the UI or   |             |
+| execution                    | an Ajax call but cannot be saved in    |             |
+|                              | the project zip.                       |             |
++------------------------------+----------------------------------------+-------------+
+| ``{job-name}.job`` job       | These are user specified properties in | Highest (5) |
+| specification                | the actual job file.                   |             |
++------------------------------+----------------------------------------+-------------+
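+
+As an illustration of the precedence above, suppose ``common.properties``
+in the project zip and a job file both define the same key (the job name
+and the key below are made up for this example); the job file's value
+wins because ``{job-name}.job`` has the highest priority::
+
+   # common.properties in the project zip (priority 3)
+   retries=1
+
+   # myjob.job (priority 5) -- this is the value the job actually sees
+   type=command
+   command=echo hello
+   retries=3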
+
+The following properties are not visible to the users. Depending on the
+jobtype implementation, these properties are used for constraining user
+jobs and properties. These are the same properties that are merged to
+form ``sysProps`` in ``AbstractProcessJob.java``.
+
++------------------------------+----------------------------------------+-------------+
+| PropertySource               | Description                            | Priority    |
++==============================+========================================+=============+
+| ``commonprivate.properties`` | These are admin configured properties  | Lowest (0)  |
+| in the ``jobtype`` directory | during Azkaban setup. Global to all    |             |
+|                              | jobtypes.                              |             |
++------------------------------+----------------------------------------+-------------+
+| ``private.properties`` in the| These are admin configured properties  | Highest (1) |
+| ``jobtype/{jobtype-name}``   | during Azkaban setup. Restricted to a  |             |
+| directory                    | specific jobtype.                      |             |
++------------------------------+----------------------------------------+-------------+
+
+``azkaban.properties`` is another type of properties, used only for
+controlling the Azkaban web server and exec server configuration. Note
+that ``jobProps``, ``sysProps`` and ``azkaban.properties`` are three
+different types of properties and are not merged in general (this
+depends on the jobtype implementation).
+
+--------------
+
+*********************
+Upgrading DB from 2.1
+*********************
+
+If installing Azkaban from scratch, you can ignore this section. This
+is only for those who are upgrading from 2.1 to 2.5.
+
+The ``update_2.1_to_3.0.sql`` script needs to be run to alter all the
+tables (see the example invocation below). This includes several table
+alterations and a new table.
+
+Here are the changes:
+
+-  Alter the project_properties table
+
+   -  Modify the 'name' column to be 255 characters
+
+-  Create the new triggers table
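+
+These statements can be applied with the ``mysql`` command-line client,
+for example (the database name and user below are placeholders for your
+own setup)::
+
+   mysql -u azkaban -p azkaban < update_2.1_to_3.0.sql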
+
+Importing Existing Schedules from 2.1
+#####################################
+
+In 3.0, the scheduling system is merged into the new triggering system.
+The schedule information will be persisted in the ``triggers`` table in
+the DB. We have a simple tool to import your existing schedules into
+this new table.
+
+After you download and install the web server, please run this command
+**once** from the web server install directory:
+
+::
+
+   $ bash bin/schedule2trigger.sh
+
+--------------
+
+***********************
+Upgrading DB from 2.7.0
+***********************
+
+If installing Azkaban from scratch, you can ignore this section. This
+is only for those who are upgrading from 2.7 to 3.0.
+
+The ``create.executors.sql``, ``update.active_executing_flows.3.0.sql``,
+``update.execution_flows.3.0.sql``, and ``create.executor_events.sql``
+scripts need to be run to alter all the tables (see the example
+invocation below). This includes several table alterations and two new
+tables.
+
+Here are the changes:
+
+-  Alter the active_executing_flows table
+
+   -  Delete the 'port' column
+   -  Delete the 'host' column
 
+-  Alter the execution_flows table
 
+   -  Add an 'executor_id' column
 
+-  Create the new executors table
+-  Create the new executor_events table
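+
+As with the 2.1 upgrade, these scripts can be applied with the ``mysql``
+command-line client, for example (database name and user are
+placeholders for your own setup)::
+
+   mysql -u azkaban -p azkaban < create.executors.sql
+   mysql -u azkaban -p azkaban < update.active_executing_flows.3.0.sql
+   mysql -u azkaban -p azkaban < update.execution_flows.3.0.sql
+   mysql -u azkaban -p azkaban < create.executor_events.sql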