azkaban-aplcache

validate flow trigger definition when parsing the flow trigger …

2/6/2018 5:05:25 PM

section in flow yaml file (#1630)

The following validations are performed on the flow trigger when parsing the flow yaml file:

If max wait min is specified, it must be >= 1; if it is not specified, the allowed max wait min (10 days) is applied.
The flow trigger schedule cannot be null.
The flow trigger schedule type must be cron.
The flow trigger schedule value must be a valid cron expression.
The flow trigger schedule section contains type and value only.
Dependency names must be unique at the flow trigger level.
Dependency configs (type + params) must be unique at the flow trigger level.
If max wait min >= the allowed max wait min, it is automatically set to the allowed value (10 days).
Dependency name and type are required.
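
A minimal sketch of how the max-wait and dependency rules above could be checked. This is not the code from the PR: the class, method and field names are hypothetical, a null value is assumed to mean "max wait min not specified", and cron validation is left out because it depends on the scheduler library.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names are illustrative, not Azkaban's real classes.
final class FlowTriggerValidationSketch {

  // 10 days expressed in minutes, the allowed maximum named in the commit message.
  static final long ALLOWED_MAX_WAIT_MINS = Duration.ofDays(10).toMinutes();

  // Illustrative holder for one dependency entry of a flow trigger.
  static final class Dependency {
    final String name;
    final String type;
    final Map<String, String> params;

    Dependency(String name, String type, Map<String, String> params) {
      this.name = name;
      this.type = type;
      this.params = params;
    }
  }

  // Returns the effective max wait in minutes; null is assumed to mean "not specified".
  static long resolveMaxWaitMins(Long specified) {
    if (specified == null) {
      return ALLOWED_MAX_WAIT_MINS;
    }
    if (specified < 1) {
      throw new IllegalArgumentException("max wait min must be >= 1");
    }
    // Values at or above the allowed maximum are capped to it.
    return Math.min(specified, ALLOWED_MAX_WAIT_MINS);
  }

  // Rejects missing name/type, duplicate names, and duplicate (type + params) configs.
  static void validateDependencies(List<Dependency> dependencies) {
    Set<String> names = new HashSet<>();
    Set<List<Object>> configs = new HashSet<>();
    for (Dependency d : dependencies) {
      if (d.name == null || d.name.isEmpty() || d.type == null || d.type.isEmpty()) {
        throw new IllegalArgumentException("dependency name and type are required");
      }
      if (!names.add(d.name)) {
        throw new IllegalArgumentException("duplicate dependency name: " + d.name);
      }
      if (!configs.add(Arrays.<Object>asList(d.type, d.params))) {
        throw new IllegalArgumentException("duplicate dependency config: " + d.name);
      }
    }
  }
}
```

Under these assumptions, `resolveMaxWaitMins(20000L)` is capped to the 10-day limit, and two dependencies with different names but the same type and params are rejected.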

Cheng Ren

Commit: 74b78a8

Tree: c8cda12

Parents: 809cc63

fix JDBC test failure reported in #1614 (#1628) This PR is …

2/6/2018 4:33:41 PM

mainly a refactor to fix the intermittent JDBC test failure reported in #1614. It looks like JdbcTriggerImplTest uses a deprecated class to construct the H2 test DB. The fix is to replace it with the one that all XXXDaoTest classes currently depend on.

Tested by ./gradlew :azkaban-common:test

Liang Tang

Commit: 809cc63

Tree: acc3024

Parents: 2c5f72c

flow trigger service (#1627) This PR added flow trigger service, …

2/6/2018 3:06:33 AM

a singleton class in the AZ web server that processes all flow trigger-related operations. Externally, it provides the following operations:

Create a trigger instance based on trigger definition.
Cancel a trigger instance.
Query running and historic trigger instances.
Recover incomplete trigger instances.
Internally, it:

maintains the list of running trigger instances in memory.
updates the status and start/end time of each trigger instance.
persists trigger instances to the DB.

FlowTriggerService will be used by the Quartz scheduler, our new AZ scheduler, to start triggers on schedule.

FlowTriggerService internally contains a single-threaded executor service that processes all trigger instance-related operations, which avoids complicated race conditions.
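
The single-threaded executor idea can be sketched as follows. This is an illustration of the pattern only, not FlowTriggerService itself: the class and method names are hypothetical, and the in-memory state is reduced to a set of instance ids.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of funnelling every state change through one worker thread.
final class SingleThreadedTriggerServiceSketch {

  // Every read and write of runningIds runs on this one thread, so no locks are needed.
  private final ExecutorService executor = Executors.newSingleThreadExecutor();
  private final Set<String> runningIds = new HashSet<>();

  // Creates a trigger instance id and records it as running.
  Future<String> startTrigger() {
    Callable<String> task = () -> {
      String id = UUID.randomUUID().toString();
      runningIds.add(id);
      return id;
    };
    return executor.submit(task);
  }

  // Removes the instance from the running set; the result is false if it was not running.
  Future<Boolean> cancelTrigger(String id) {
    Callable<Boolean> task = () -> runningIds.remove(id);
    return executor.submit(task);
  }

  Future<Integer> runningCount() {
    Callable<Integer> task = () -> runningIds.size();
    return executor.submit(task);
  }

  void shutdown() {
    executor.shutdown();
  }

  // Convenience helper: blocks for the result and wraps checked exceptions.
  static <T> T await(Future<T> future) {
    try {
      return future.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    }
  }
}
```

Because tasks are executed one at a time in submission order, a cancel submitted after a start always observes the started instance, without any explicit synchronization on the shared set.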

Cheng Ren

Commit: 2c5f72c

Tree: 5da7838

Parents: 9ed7aa8

Clarify the ELK instruction logging message (#1621) People …

1/31/2018 10:44:39 PM

always get confused when they read

>See logs at: https://blahblahXXX://

This PR adds another log message before it, telling users how to set the job ELK logger.

Liang Tang

Commit: 9ed7aa8

Tree: 37e1750

Parents: bb3607b

dependency type plugin manager (#1620) This PR consists …

1/31/2018 9:15:44 PM

of the plugin management module for flow trigger dependency type.

We implemented the data dependency interface and keep the interface code inside the main Azkaban open source repo, so that both internal and OSS developers can implement the interface for their own dependency type as a plugin.

The plugin management module loads plugin property files from the plugin directory and creates the corresponding DependencyCheck objects from the dependency plugin properties, either when plugin management is initialized or on request (when a plugin is deployed).

Cheng Ren

Commit: bb3607b

Tree: 598895f

Parents: 3867292

Trigger Instance and dependency instance status update processor …

1/29/2018 10:24:23 PM

3.41.0

(#1619)

This PR added two classes to handle status changes of trigger instances and dependency instances:
DependencyInstanceProcessor/TriggerInstanceProcessor, which handle the work that follows a status update. For example: execute a flow when a trigger instance succeeds, send an email when a trigger instance is cancelled, and persist the status update to the database.

Cheng Ren

Commit: 3867292

Tree: 536045a

Parents: eae1f9d

flow trigger JDBC loader (#1615) The PR added JDBC loader …

1/25/2018 10:34:38 PM

for flow trigger. The loader provides all DB-related operations that the flow trigger instance and dependency instance (#1611) need to interact with the dependency instance table (#1612).

The PR also added toString(), hashCode() and equals() to several existing classes for completeness.

Cheng Ren

Commit: eae1f9d

Tree: c184f42

Parents: 8c6872c

mysql table for dependency instance (#1612) added sql script …

1/23/2018 10:01:23 PM

to create a table that keeps the executions of all dependency instances. There will be no table specifically for trigger instances, since a trigger instance is nothing but a collection of dependency instances and can therefore be easily constructed from them. The table will be used to keep historic executions for users to query in the UI and to recover running triggers after a web server restart.

Cheng Ren

Commit: 8c6872c

Tree: bd1d65c

Parents: afb269d

flow trigger/dependency instance (#1611) PR added two core …

1/23/2018 9:18:32 PM

classes for flow trigger feature: TriggerInstance and DependencyInstance.

Conceptually, a trigger instance represents one execution of a flow trigger. It holds execution context such as a historically unique trigger execution id, execution status, start/end time, and a list of dependency instances.

Similarly, a dependency instance denotes one execution of a dependency performing an availability check on a particular data dependency. It holds execution context such as execution status and start/end time. All dependency instances within the same trigger instance share the trigger instance's execution id.

A trigger instance and its dependency instances are created at the scheduled trigger start time. The trigger instance waits until either all dependency instances are available, at which point a new flow instance is created, or the maximum allowed wait time is exceeded. Once successfully created, the trigger instance is given a UUID to identify itself, an initial status of running, and a list of dependency instances whose status is running as well.

A trigger instance can be cancelled manually by the user, by timeout when the max wait time is exceeded, or by an internal dependency failure.

A trigger/dependency instance has one of the following statuses: Running, Cancelling, Cancelled, Succeeded.
Since a trigger instance is nothing but a collection of dependency instances, it doesn't have variables for status, start time or end time. All of these can be quickly inferred from the status and start/end time of the dependency instances that belong to it, which spares us from maintaining extra variables.
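
As an illustration of inferring a trigger instance's state from its dependency instances, here is a minimal sketch. The commit message does not spell out the exact inference rule, so the rule below is an assumption, as are the class names, the treatment of an empty dependency list, and the use of the earliest dependency start time as the trigger start time.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.OptionalLong;
import java.util.Set;

// Hypothetical sketch; the precedence rule is one plausible choice, not confirmed by the source.
final class TriggerStatusSketch {

  enum Status { RUNNING, CANCELLING, CANCELLED, SUCCEEDED }

  // Trimmed-down dependency instance: status plus start time in epoch millis.
  static final class Dep {
    final Status status;
    final long startTime;

    Dep(Status status, long startTime) {
      this.status = status;
      this.startTime = startTime;
    }
  }

  static Status inferStatus(List<Dep> deps) {
    if (deps.isEmpty()) {
      return Status.SUCCEEDED; // assumption: nothing to wait for
    }
    Set<Status> seen = EnumSet.noneOf(Status.class);
    for (Dep d : deps) {
      seen.add(d.status);
    }
    if (seen.contains(Status.CANCELLING)) {
      return Status.CANCELLING;
    }
    if (seen.contains(Status.RUNNING)) {
      // A mix of running and cancelled peers means the cancellation is still in progress.
      return seen.contains(Status.CANCELLED) ? Status.CANCELLING : Status.RUNNING;
    }
    if (seen.contains(Status.CANCELLED)) {
      return Status.CANCELLED;
    }
    return Status.SUCCEEDED;
  }

  // Assumption: the trigger started when its earliest dependency instance started.
  static OptionalLong inferStartTime(List<Dep> deps) {
    return deps.stream().mapToLong(d -> d.startTime).min();
  }
}
```

The end time could be derived in the same way, for example as the latest dependency end time once no dependency is still running.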

Whenever a dependency instance is cancelled, for whatever reason, the cancellation cause is attached to it for a better user experience.

Valid cancellation causes include:
TIMEOUT, // cancellation is issued due to exceeding max wait time
MANUAL, // cancellation is issued by user
FAILURE, // cancellation is issued by internal dependency instance failure
CASCADING // cancelled by cascading failure(peer dependency is cancelled)
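
A small sketch of how these causes might be attached, in particular how FAILURE and CASCADING relate. The four cause names come from the list above; the NONE value, the map-based model (dependency name to cause, covering only dependencies that have not succeeded) and the method names are assumptions made for illustration.

```java
import java.util.Map;

// Hypothetical sketch; only the four cause names are taken from the commit message.
final class CancellationCauseSketch {

  // NONE is an assumed placeholder for "not cancelled".
  enum CancellationCause { NONE, TIMEOUT, MANUAL, FAILURE, CASCADING }

  // The failed dependency gets FAILURE; every peer that is not yet cancelled gets CASCADING.
  static void failDependency(Map<String, CancellationCause> causes, String failedName) {
    if (!causes.containsKey(failedName)) {
      throw new IllegalArgumentException("unknown dependency: " + failedName);
    }
    for (Map.Entry<String, CancellationCause> e : causes.entrySet()) {
      if (e.getValue() != CancellationCause.NONE) {
        continue; // keep the first recorded cause
      }
      e.setValue(e.getKey().equals(failedName)
          ? CancellationCause.FAILURE
          : CancellationCause.CASCADING);
    }
  }

  // TIMEOUT and MANUAL apply to every dependency that is not yet cancelled.
  static void cancelAll(Map<String, CancellationCause> causes, CancellationCause cause) {
    for (Map.Entry<String, CancellationCause> e : causes.entrySet()) {
      if (e.getValue() == CancellationCause.NONE) {
        e.setValue(cause);
      }
    }
  }
}
```

Keeping the first recorded cause means that a later timeout or manual cancel does not overwrite the reason a dependency was originally cancelled.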

Cheng Ren

Commit: afb269d

Tree: d7a19a0

Parents: e89e7ad

Add debug logging for upload project issue. (#1613)

1/23/2018 7:59:56 PM

jamiesjc

Commit: e89e7ad

Tree: dd612eb

Parents: e3aafc6