azkaban-aplcache

TriggerManager needs to persist updated trigger when updating …

8/10/2017 8:04:13 PM

trigger (#1328)

Previously, TriggerManager did not persist an updated schedule when the schedule was updated (e.g. when adding a new SLA rule to a schedule). The schedule was persisted only when it was triggered. So if we added an SLA rule to a schedule or rescheduled it, and then restarted the web server, these updates were lost.
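
The behavior change can be illustrated with a minimal write-through sketch. This is not the Azkaban implementation: `TriggerStore`, `InMemoryTriggerStore`, and `SketchTriggerManager` are hypothetical names, and a map stands in for the real database.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these types are Azkaban's real classes.
final class PersistOnUpdateSketch {

  /** Stand-in for whatever durable storage backs the triggers. */
  interface TriggerStore {
    void save(int triggerId, String definition);

    String load(int triggerId);
  }

  /** A map plays the role of the database so the sketch is self-contained. */
  static final class InMemoryTriggerStore implements TriggerStore {
    private final Map<Integer, String> rows = new HashMap<>();

    @Override
    public void save(int triggerId, String definition) {
      rows.put(triggerId, definition);
    }

    @Override
    public String load(int triggerId) {
      return rows.get(triggerId);
    }
  }

  static final class SketchTriggerManager {
    private final Map<Integer, String> cache = new HashMap<>();
    private final TriggerStore store;

    SketchTriggerManager(TriggerStore store) {
      this.store = store;
    }

    void updateTrigger(int triggerId, String definition) {
      cache.put(triggerId, definition);
      // Write through on every update instead of waiting for the trigger to fire,
      // so a restart between the update and the next firing does not lose it.
      store.save(triggerId, definition);
    }
  }

  public static void main(String[] args) {
    TriggerStore store = new InMemoryTriggerStore();
    new SketchTriggerManager(store).updateTrigger(1, "schedule with SLA rule");
    // A manager created after a "restart" would read this value back from the store.
    System.out.println(store.load(1));
  }
}
```

The point of the sketch is only the ordering: the store is written in the same call that changes the in-memory state.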

Cheng Ren

Commit: 2b7b009

Tree: a80d99e

Parents: f8b1a0e

apply save action plugin on TriggerManager (#1327)

8/10/2017 7:27:55 PM

Cheng Ren

Commit: f8b1a0e

Tree: 8a235d2

Parents: 0525185

New Status API for obtaining runtime information about Azkaban …

8/10/2017 4:52:15 PM

instance. (#1320)

Current features:
- Version
- Used memory
- Max memory: XMX value
- Database availability: SELECT 1 check
- Map of current executors

Example:
```
{
  "version": "3.33.0-11-g1ac90f03",
  "installationPath": "/full/path/to/azkaban-solo-server/lib/azkaban-web-server-3.33.0-11-g1ac90f03.jar",
  "usedMemory": 65144664,
  "xmx": 3817865216,
  "isDatabaseUp": true,
  "executorStatusMap": {
    "104": {
      "id": 104,
      "host": "abc.def.company.com",
      "port": 15462,
      "isActive": true
    }
  }
}
```
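
As a rough illustration of where the two memory fields could come from, the sketch below reads them from the standard `java.lang.Runtime` API. How the Status API actually computes `usedMemory` and `xmx` is an assumption here, and `MemoryStatusSketch` and `memoryStatus` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: the field names mirror the example response above,
// but the computation is an assumption, not Azkaban's code.
final class MemoryStatusSketch {

  static Map<String, Long> memoryStatus() {
    Runtime runtime = Runtime.getRuntime();
    Map<String, Long> status = new LinkedHashMap<>();
    // Used heap: the heap currently reserved by the JVM minus its free portion.
    status.put("usedMemory", runtime.totalMemory() - runtime.freeMemory());
    // maxMemory() reflects the -Xmx limit, or the JVM default when -Xmx is not set.
    status.put("xmx", runtime.maxMemory());
    return status;
  }

  public static void main(String[] args) {
    System.out.println(memoryStatus());
  }
}
```

Both values are in bytes, which matches the magnitude of the numbers in the example response.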

Suvodeep Pyne

Commit: 0525185

Tree: ec40120

Parents: cc25e39

make job retry log more clear (#1321) previously if a job retries, …

8/9/2017 9:40:47 PM

log looks like this:
Starting job 1 attempt 1 at 1502324648910

This makes people misinterpret it as the first attempt at running the job. This PR changes the message to:
Starting job 1 retry 1 at 1502324648910

Cheng Ren

Commit: cc25e39

Tree: e852f30

Parents: 85e04d6

Running SaveAction on ProcessJob (#1319) Working on a bug …

8/9/2017 5:13:59 PM

related to this class and I don't want to make a PR with both changes, so sending off a quick PR for just running the SaveAction command on the ProcessJob class first.

Charlie Summers

Commit: 85e04d6

Tree: 6b9eaad

Parents: 96ddff6

Refactoring Web Server and related code. (#1314) The motivation …

8/9/2017 1:01:32 AM

is to make it easy to add more routes to the web server. A lot of static code exists which makes it difficult to access dependent classes directly.

Refactor details:
- `prepareAndStartServer` and related methods were converted to instance methods.
- The `app` member, a static reference to `AzkabanWebServer`, is deprecated. It is not removed because it may be used by downstream code.
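
The shape of this refactor can be sketched as follows. `WebServerSketch`, `addRoute`, `routes`, and `isStarted` are hypothetical; only the `prepareAndStartServer` and `app` names come from the commit message, and the bodies here are placeholders rather than the real server logic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a static-to-instance refactor; not the real AzkabanWebServer.
final class WebServerSketch {

  /**
   * Static reference kept for downstream callers.
   *
   * @deprecated prefer passing the server instance explicitly.
   */
  @Deprecated
  static WebServerSketch app;

  private final List<String> routes = new ArrayList<>();
  private boolean started;

  // As an instance method, route registration can reach the server's
  // collaborators through fields instead of static lookups.
  void addRoute(String path) {
    routes.add(path);
  }

  // An instance method after the refactor, so it works on this server's own state.
  void prepareAndStartServer() {
    started = true; // placeholder for the real startup work
  }

  List<String> routes() {
    return routes;
  }

  boolean isStarted() {
    return started;
  }

  public static void main(String[] args) {
    WebServerSketch server = new WebServerSketch();
    app = server; // still populated so existing callers of the static field keep working
    server.addRoute("/status");
    server.prepareAndStartServer();
    System.out.println(server.routes() + " started=" + server.isStarted());
  }
}
```

Keeping the deprecated field populated while new code uses the instance is what allows the static reference to be retired later without breaking downstream code in the meantime.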

Suvodeep Pyne

Commit: 96ddff6

Tree: b7f2c92

Parents: c9cf64d

Extracted loadPluginCheckerAndActions method to PluginCheckerAndActionsLoader …

8/8/2017 11:20:23 PM

class (#1316)

No logic has changed

Suvodeep Pyne

Commit: c9cf64d

Tree: 4c48094

Parents: 452da49

Refactoring loadTriggerPlugins() method to `TriggerPluginLoader` …

8/8/2017 11:08:38 PM

class (#1317)

No Logic change introduced.

Suvodeep Pyne

Commit: 452da49

Tree: 733dd2d

Parents: 4c29af2

Refactoring the `ServerProvider` logic into separate methods …

8/8/2017 10:32:46 PM

(#1315)

Refactoring logic into separate smaller methods. The only logic change is a different logging statement compared to the previous one.

Suvodeep Pyne

Commit: 4c29af2

Tree: 7f034c2

Parents: 79f1f99

Fix for failing to kill a job due to a race condition (#1310) * …

8/8/2017 7:37:00 PM

Fix for failing to kill a job due to a race condition

Problem:

Kill a flow quickly after it starts.
The flow and the job will show as killed. However, the job actually runs
to completion.

Analysis:

Jobrunner thread runs:

azkaban.jobExecutor.utils.process.AzkabanProcess#run

A jetty thread processes the kill command:

azkaban.jobExecutor.ProcessJob#cancel
azkaban.jobExecutor.utils.process.AzkabanProcess#softKill
azkaban.jobExecutor.utils.process.AzkabanProcess#checkStarted
checkStarted throws an exception here because the job process has not
been created yet at this point.

This exception is caught and ignored.
Users are informed that the kill action is completed and reflected in
the job and flow pages.

Fix:

Synchronize the killing thread and the jobrunner thread.
The jetty thread will wait for the job process to be created if needed
before killing the job process.

If the jetty thread cancels the job before the job runner thread checks the killed flag, simply set the flag and allow the job runner thread to abort itself.

====

This is a fix built on the idea proposed in #1289
The previous fix #1253 for the same race condition is not complete.
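
The synchronization described under "Fix" can be sketched as below. This is an illustration of the idea only, not the code from the PR: `KillRaceSketch` and `SketchJob` are hypothetical, a boolean stands in for the OS process, and the choice of a `CountDownLatch` plus a guarded flag is an assumption about one way to implement it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the kill/run handshake; not the real ProcessJob or AzkabanProcess.
final class KillRaceSketch {

  static final class SketchJob {
    private final CountDownLatch processCreated = new CountDownLatch(1);
    private boolean killed;    // guarded by this
    private boolean launching; // guarded by this: the runner has passed the killed check
    private volatile boolean processAlive;

    /** Job runner thread. Returns false if the job aborted itself before launching. */
    boolean run() {
      synchronized (this) {
        if (killed) {
          return false; // the kill arrived first, so never create the process
        }
        launching = true;
      }
      processAlive = true; // stands in for creating the OS process
      processCreated.countDown();
      return true;
    }

    /** Kill-handling thread. Returns false only if the process did not appear in time. */
    boolean cancel(long timeout, TimeUnit unit) {
      synchronized (this) {
        killed = true;
        if (!launching) {
          return true; // the runner will see the flag and abort itself
        }
      }
      try {
        // The runner is already launching: wait for the process to exist before killing it.
        if (!processCreated.await(timeout, unit)) {
          return false;
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
      processAlive = false; // stands in for killing the OS process
      return true;
    }

    boolean isProcessAlive() {
      return processAlive;
    }
  }

  public static void main(String[] args) {
    SketchJob killedEarly = new SketchJob();
    killedEarly.cancel(1, TimeUnit.SECONDS);
    System.out.println("ran after early kill: " + killedEarly.run());

    SketchJob killedLate = new SketchJob();
    killedLate.run();
    killedLate.cancel(1, TimeUnit.SECONDS);
    System.out.println("alive after late kill: " + killedLate.isProcessAlive());
  }
}
```

Because the killed check and the `launching` flag are updated under the same lock, a kill request either lands before the check (and the runner aborts) or after it (and the killer waits for the process), so the window in which the kill was silently dropped no longer exists in this sketch.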

* Fix a typo in comments

HappyRay

Commit: 79f1f99

Tree: cdbb446

Parents: c64cc26