3/5/2019 11:36:59 PM
Move flowSetupTimer from CommonMetrics to ExecMetrics
* indentation
* code review comments: variable names swapped
* shorten wait
* resolve merge conflicts
* modify tests to use assertj
|
3/5/2019 3:47:15 PM
to parallelize test runs. However, in practice, enabling parallel tests actually makes the full Azkaban test run slower on some machines. There hasn't been any noticeable speed-up on Travis either. Hence, changing back to no parallel tests.
Instead of removing the setting entirely, I'm keeping it with value 1 and adding a comment to warn about it, so that it will be seen before anyone re-enables it in the future without proper validation.
./gradlew cleanTest test took 3m 42s on my machine. That seemed way too long. I had been running the tests of each module in IDEA, and the total time was nowhere near that. It led me to think that the command-line Gradle runner must be doing something suboptimal.
I knew that IDEA runs all tests sequentially, so I tried disabling parallelism entirely for the command-line test runner as well. It was ~4 times faster:
maxParallelForks = 1
-> ./gradlew cleanTest test 49s
I have 4 cores on my machine, so I also gave this a try, but it didn't help much:
maxParallelForks = 4
-> ./gradlew cleanTest test 2m 50s
To make sure it's not just random fluctuation I tested the "fix" once more:
maxParallelForks = 1
-> ./gradlew cleanTest test 1m 4s
And one more go with the original setting:
maxParallelForks = 12
-> ./gradlew cleanTest test 2m 0s
It could be that the forked execution was even slower when my machine had other heavy workloads going on (hence the big variation from 2m to almost 4m).
Maybe the tests interfere with each other. Maybe the additional forking is just expensive because every test needs to load some heavy classes / static members again. Whatever the reason is, there's no point in having parallelism on if it makes the total build time longer.
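The resulting build setting might look like this (a sketch; the exact task block depends on how the project's build files are organized):

```groovy
// build.gradle — applies to all Test tasks
tasks.withType(Test) {
    // WARNING: do not raise this without benchmarking first!
    // Parallel forks made the full test run ~2-4x SLOWER on some
    // machines (see the measurements above), possibly due to per-fork
    // JVM startup / class-loading cost or test interference.
    maxParallelForks = 1
}
```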
|
3/4/2019 11:48:42 PM
PR replaced the old project cache hit ratio metric implementation with a new one that only keeps track of the hit ratio of the last 100 cache accesses, instead of all accesses since the Azkaban executor started. This makes the metric more sensitive to project cache setting changes (e.g., when the cache size is changed on the fly), since the hit ratio is no longer diluted by too much old historical data.
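A hit ratio over the most recent N accesses can be sketched with a simple ring buffer. This is an illustration of the windowing idea, with hypothetical names, not the actual Azkaban implementation:

```java
/** Hit ratio over the most recent `capacity` cache accesses (sketch). */
public class SlidingHitRatio {
  private final boolean[] window; // true = hit, false = miss
  private int next = 0;           // next slot to overwrite
  private int count = 0;          // accesses recorded so far, up to capacity
  private int hits = 0;           // hits currently inside the window

  public SlidingHitRatio(int capacity) {
    this.window = new boolean[capacity];
  }

  public synchronized void record(boolean hit) {
    if (count == window.length && window[next]) {
      hits--;                     // evict the oldest access from the tally
    }
    window[next] = hit;
    if (hit) hits++;
    next = (next + 1) % window.length;
    if (count < window.length) count++;
  }

  public synchronized double ratio() {
    return count == 0 ? 0.0 : (double) hits / count;
  }
}
```

With a window of 100, a cache-size change takes at most 100 accesses to be fully reflected in the reported ratio.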
|
3/1/2019 9:20:17 PM
3.69.0
completes (#2133)
This PR makes deactivation process block until all flow preparation work finishes.
When deploying a new executor, the old running executor is deactivated before the new one is activated, and only one executor is allowed to delete/hard-link project directories, to avoid race conditions (see #2130). Making the deactivation process block until flow preparation work finishes guarantees that the old executor won't access FlowPreparer#setup after deactivation.
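One way to make deactivation wait for in-flight setup work is a read-write lock: each unit of flow preparation holds the read lock, and deactivation takes the write lock, which blocks until all readers are done. A sketch with hypothetical names, not the actual change:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ActiveFlag {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private volatile boolean active = true;

  /** Called before each flow-preparation unit of work. */
  public boolean tryStartSetup() {
    lock.readLock().lock();
    if (!active) {              // already deactivated: refuse new setup work
      lock.readLock().unlock();
      return false;
    }
    return true;                // caller must call finishSetup() when done
  }

  public void finishSetup() {
    lock.readLock().unlock();
  }

  /** Blocks until all in-flight setup work has finished. */
  public void deactivate() {
    lock.writeLock().lock();    // waits for all read-lock holders
    try {
      active = false;
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

Once deactivate() returns, no setup work is running and none can start, so the old executor can safely be torn down.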
|
3/1/2019 1:22:30 PM
Overrides table format (#2136)
|
2/28/2019 10:16:39 PM
(#2129)
* Implement "useExecutor" feature for new Dispatching Logic (Poll model)
When launching a new execution, an Azkaban admin can choose the executor for it by specifying a useExecutor parameter in the request.
New executions are inserted into the execution_flows table in the database. A newly added column in this table, "use_executor", holds the executor id passed as a parameter in the request. Active executors will poll executions with (use_executor == null or use_executor == pollingExecutorId). Inactive executors will only poll executions pinned to their own ids (use_executor == pollingExecutorId).
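The two polling predicates above can be sketched as SQL fragments (illustrative; the column name follows the description, everything else is hypothetical):

```java
/** Builds the WHERE fragment an executor uses when polling (sketch). */
public class PollQuery {
  public static String whereClause(boolean executorActive) {
    return executorActive
        // active executors also pick up unassigned executions
        ? "(use_executor IS NULL OR use_executor = ?)"
        // inactive executors only take executions pinned to them
        : "use_executor = ?";
  }
}
```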
|
2/27/2019 3:14:44 AM
to the project or not (#2131)
In this PR, changes are made to add an AJAX endpoint that reports whether the user has WRITE access to a given project. This API will be useful for **TuneIn**, which is a part of **Dr.Elephant**. Currently, TuneIn shows many details like suggested parameters for the job, the algorithm, etc. on the job page, and the user is able to modify these parameters. But since these properties are used by Azkaban (one of the workflow managers supported by Dr.Elephant), the user must be authorized to change them. For this purpose, Dr.Elephant will call this API with **session_id** (provided by Azkaban after successful authentication) and **project_name** as query params.
This API takes the Azkaban user's session_id and the project name as query params, and returns a Boolean indicating whether the user whose session_id was passed has WRITE access to the given project. So to learn whether a user has WRITE permission on a project, the client calling the API must have that user's session_id. This way, the API cannot be used by a client to expose other Azkaban users' access to a project.
There are some other existing APIs like `getPermission` and `fetchprojectusers` which provide the users and their permissions in a project, but these APIs don't cover a user who is not an owner/user of the project yet belongs to a group with WRITE or greater permissions on it. Such a user can effectively WRITE to the project, but that cannot be determined with the APIs mentioned above.
|
2/27/2019 3:07:23 AM
a pagination issue on the Job History page where it would show an additional empty page if the last page was full. For example:
- with page size 10 and 30 elements, it would show 4 pages, with the last one being empty.
Simplifies implementation by delegating common pagination functionality to https://github.com/josecebe/twbs-pagination jQuery pagination plugin instead of doing it manually.
With this change we are still loading the entire page every time a user interacts with the pagination controls. Ultimately we want to create an API endpoint that returns pages as data, so that we only need to update the view with the new data; but this won't be done now because we are planning to redesign the existing APIs soon.
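The off-by-one described above is characteristic of computing the page count as `total / pageSize + 1`, which yields an extra empty page whenever the last page is exactly full; a ceiling division fixes it. A sketch of the arithmetic, not the plugin's code:

```java
public class PageMath {
  /** Buggy variant: 30 elements / page size 10 -> 4 pages (last one empty). */
  static int pagesBuggy(int total, int pageSize) {
    return total / pageSize + 1;
  }

  /** Correct variant: ceiling division, with at least one page. */
  static int pages(int total, int pageSize) {
    return Math.max(1, (total + pageSize - 1) / pageSize);
  }
}
```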
|
2/26/2019 11:02:12 PM
message (#2111)
If there's an empty value like `"job.max.Xmx="` in some properties, upload fails with this error:
> Installation Failed.
> For input string: ""
This doesn't help much, because it doesn't even tell the name of the problematic property.
This PR improves it by returning a better error message, i.e. one that includes the property name.
Currently the user would need to grab the stack trace, get the Azkaban source code, and find the offending line to know which property caused the upload failure.
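The improvement amounts to catching the NumberFormatException where the property value is parsed and rethrowing with the property name attached. A sketch of the idea with hypothetical names:

```java
public class PropParser {
  /** Parses a numeric property, naming the property in any error. */
  static int parseIntProp(String name, String value) {
    try {
      return Integer.parseInt(value);
    } catch (NumberFormatException e) {
      // The bare parseInt error is just: For input string: ""
      // Wrapping it names the offending property for the user.
      throw new IllegalArgumentException(
          "Invalid value for property \"" + name + "\": \"" + value + "\"", e);
    }
  }
}
```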
|
2/26/2019 7:49:48 PM
metrics for submission and time in queue
Add the following new metrics:
- submit flow success, fail and skip
- queue wait time (time between when a flow is submitted, to when an executor starts executing)
- flow setup time (time to setup a flow, before executing).
The time that a flow spends in PREPARING state is queue wait time + flow setup time. These metrics
will help give more insight into how much time is spent in preparing state, and in which phases.
Flow submission is when a user requests a flow to be executed, or when a flow is scheduled to run.
Flow submission will add the flow to the queue. Flow dispatch is when the flow is assigned to an
executor; currently this time also includes the time to setup the flow.
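The queue wait and setup durations described above can be derived from three timestamps per execution (a sketch; the field names are hypothetical):

```java
public class FlowTimings {
  final long submitTimeMs;    // user/scheduler submitted the flow
  final long startSetupMs;    // executor picked it up, setup begins
  final long startExecMs;     // setup done, execution begins

  FlowTimings(long submit, long setup, long exec) {
    this.submitTimeMs = submit;
    this.startSetupMs = setup;
    this.startExecMs = exec;
  }

  long queueWaitMs() { return startSetupMs - submitTimeMs; }
  long setupMs()     { return startExecMs - startSetupMs; }
  /** Total time spent in PREPARING = queue wait + setup. */
  long preparingMs() { return startExecMs - submitTimeMs; }
}
```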
* FlowPreparer refactor (#2130)
This PR refactors FlowPreparer in various aspects:
1. Allow multi-threading project download for more concurrent flow preparation.
2. Synchronize on project cache clean-up / creating the execution directory by hard-linking from the project directory, to avoid complicated race conditions which could arise when multiple threads are deleting/hard-linking the same project. (Note: it doesn't prevent multiple executor processes from interfering with each other and triggering race conditions, so it's important to operationally make sure that only one executor process is setting up flow executions against the shared project directory.)
3. Move project cache cleaning logic to a separate class for better testability.
4. Move log4j to slf4j.
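Per-project synchronization of clean-up and hard-linking (point 2 above) can be sketched with a lock object per project key, so threads working on different projects don't block each other. Hypothetical names; not the actual refactor:

```java
import java.util.concurrent.ConcurrentHashMap;

public class ProjectLocks {
  private final ConcurrentHashMap<String, Object> locks =
      new ConcurrentHashMap<>();

  /** Runs `work` while holding the lock for this project only. */
  public void withProjectLock(String projectKey, Runnable work) {
    Object lock = locks.computeIfAbsent(projectKey, k -> new Object());
    synchronized (lock) {
      // e.g. delete a cached project dir, or hard-link it into an
      // execution dir — never both concurrently for the same project
      work.run();
    }
  }
}
```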
|