azkaban-developers


Remove unused method fetchActiveFlowByExecId (#1832)

6/29/2018 6:05:29 PM

Juho Autio

Commit: f8f47f8

Tree: 3c1792e

Parents: 4a847bb

Remove the single end node restriction for a flow. (#1821)

6/28/2018 2:02:01 PM

Jamie Sun

Commit: 4a847bb

Tree: d1dd1c7

Parents: 7efe67b

when trigger is still running, endtime should show "-" (#1814) Before …

6/22/2018 10:00:47 PM

endtime was showing current time when trigger instance is running.

Cheng Ren

Commit: 7efe67b

Tree: f457cc0

Parents: 21dc5e9

fix node level computation of a flow (#1794) * fix node level …

6/21/2018 5:24:38 PM

computation of a flow

level++ returns the level variable BEFORE it is incremented. This means
the level of a node is never incremented and is always zero.

Hence, we need to increment the value by one before passing it to the
function.

Fixes #1793.
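
The post-increment pitfall described above can be reproduced with a minimal sketch. The Node class, the method names, and the single-successor chain are hypothetical and only illustrate the bug; they are not Azkaban's actual flow classes.

```java
final class NodeLevelSketch {

  /** Hypothetical stand-in for a flow node with a single successor. */
  static final class Node {
    final String id;
    Node next;
    int level;

    Node(String id) {
      this.id = id;
    }
  }

  /** Buggy variant: level++ evaluates to the value BEFORE the increment. */
  static void setLevelsBuggy(Node node, int level) {
    node.level = level;
    if (node.next != null) {
      // The callee receives the old value, so every node in the chain gets the start level.
      setLevelsBuggy(node.next, level++);
    }
  }

  /** Fixed variant: pass the incremented value explicitly. */
  static void setLevelsFixed(Node node, int level) {
    node.level = level;
    if (node.next != null) {
      setLevelsFixed(node.next, level + 1);
    }
  }

  /** Builds a linear chain ids[0] -> ids[1] -> ... and returns its head. */
  static Node chain(String... ids) {
    Node head = new Node(ids[0]);
    Node tail = head;
    for (int i = 1; i < ids.length; i++) {
      tail.next = new Node(ids[i]);
      tail = tail.next;
    }
    return head;
  }
}
```

Starting from level 0 on a three-node chain, the buggy variant leaves all three nodes at level 0, while the fixed variant assigns 0, 1 and 2.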

* followup: add tests for node level computation

* followup: refactor test according to review comments

Sami Jaktholm

Commit: 21dc5e9

Tree: ec0fe34

Parents: 41a1e35

Make testCancelAfterJobProcessCreation more reliable (#1810) Issue: The …

6/20/2018 9:15:46 PM

test failed today on trunk

azkaban.jobExecutor.ProcessJobTest > testCancelAfterJobProcessCreation FAILED
    org.junit.ComparisonFailure: expected:<[fals]e> but was:<[tru]e>
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at azkaban.jobExecutor.ProcessJobTest.testCancelAfterJobProcessCreation(ProcessJobTest.java:228)

Cause:

I suspect the Travis CI machine was too busy to cancel the job before it
finished.

Solution:

Increase the running time of the job from 1 second to 5 seconds.
This will not increase the test running time when the test is successful
since the job will be canceled as soon as possible.
However it will increase the test running time if the cancel logic
doesn't interrupt the job as expected.

Also removed the printout of the stacktrace from the job. This information
doesn't deliver much value and only makes the log noisier.

HappyRay

Commit: 41a1e35

Tree: 3aa03c9

Parents: fb2e72e

adding text to indicate that columns are sortable (#1795) As …

6/20/2018 9:00:02 PM

mentioned in issue #1778, a tooltip on the table lets the user know that it can be sorted by clicking the header. This pull request adds the tooltip to the scheduling page and the history page, which are the most used.


This duplicates PR #1780 but comes from a different branch: I moved these changes from master to sort-text in my repo.

Ryan

Commit: fb2e72e

Tree: 168cf86

Parents: e9dfd50

address feedback from #1803 (#1812) see comments from #1803: …

6/20/2018 5:33:56 PM

3.48.0

change configurable key name
change default value for max cache size to 128GB, which is a more standard number.
change default value for stop cleanup threshold to 60 to allow more space to be freed.
change data type of stop cleanup threshold from double to int.
rename the parameter (projectDirMaxSizeInMB -> projectDirMaxSizeInMb) to align with the coding standard.

Cheng Ren

Commit: e9dfd50

Tree: 412df6e

Parents: f0e4a6d

Clean shared project directory when disk usage is too high (#1803) …

6/19/2018 6:16:35 PM

This PR implements LRU project purging to prevent project files from eating up disk space on the executor.

We will encapsulate the following logic inside the background cleanup thread:
if (disk space consumed by projects >= predefined threshold)
    List all project files and sort them by creation time in ascending order.
    Iterate over the project file list and delete each file whose project is not running, until the disk size of projects drops below a predefined lower threshold or there are no more projects to delete.
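
The cleanup logic above can be sketched as a pure selection function. This is a minimal illustration under stated assumptions, not the PR's implementation: ProjectDir, selectForDeletion, and both threshold parameters are hypothetical names, and sizes are passed in rather than read from disk.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

final class ProjectPurgeSketch {

  /** Hypothetical description of one project directory on the executor. */
  static final class ProjectDir {
    final String name;
    final long creationTimeMillis;
    final long sizeBytes;

    ProjectDir(String name, long creationTimeMillis, long sizeBytes) {
      this.name = name;
      this.creationTimeMillis = creationTimeMillis;
      this.sizeBytes = sizeBytes;
    }
  }

  /**
   * Returns the names of project dirs to delete, oldest first. Nothing is selected
   * unless total usage has reached startThresholdBytes; selection stops once usage
   * drops below stopThresholdBytes or no deletable project is left.
   */
  static List<String> selectForDeletion(List<ProjectDir> dirs, Set<String> runningProjects,
      long startThresholdBytes, long stopThresholdBytes) {
    long used = 0;
    for (ProjectDir dir : dirs) {
      used += dir.sizeBytes;
    }
    List<String> toDelete = new ArrayList<>();
    if (used < startThresholdBytes) {
      return toDelete;
    }
    // Sort a copy by creation time, ascending, so the oldest projects go first.
    List<ProjectDir> sorted = new ArrayList<>(dirs);
    sorted.sort(Comparator.comparingLong((ProjectDir dir) -> dir.creationTimeMillis));
    for (ProjectDir dir : sorted) {
      if (used < stopThresholdBytes) {
        break;
      }
      if (runningProjects.contains(dir.name)) {
        continue; // never purge a project that is still running
      }
      toDelete.add(dir.name);
      used -= dir.sizeBytes;
    }
    return toDelete;
  }
}
```

Keeping the selection separate from the actual file deletion makes the ordering and threshold behavior easy to unit test.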

Why use creation time rather than last access time:
Directory last access time is not maintained by most file systems, so we cannot rely on the file system API to get it. An alternative considered is to modify the project dir every time the code reads it, so that last modification time would be equivalent to last access time. The associated cons are code complexity and disk IO overhead. We will use creation time as the indicator of how old a project is. Although creation time might not be as indicative as last access time for determining the hotness of a project, we are still OK with using it, based on the assumption that the older a project is, the more likely it is not being used.

This PR also changes execution dir retention from 1 day to 2 hours. Since the execution dir is hard linked to the project dir, disk space is released only when no reference to the project dir exists. That's why we want to shorten the execution dir retention time, so that disk pressure can be alleviated sooner.
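
The hard-link behavior that motivates the shorter retention can be observed with a small sketch. The file names are hypothetical, and it assumes the temporary directory lives on a file system that supports hard links.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class HardLinkSketch {

  /**
   * Creates a file, hard links it, deletes the original name, and returns the
   * content read through the remaining link.
   */
  static String readThroughLinkAfterDeletingOriginal() {
    try {
      Path dir = Files.createTempDirectory("hardlink-sketch");
      Path projectFile = dir.resolve("project-file"); // stands in for a file in the project dir
      Path executionLink = dir.resolve("execution-link"); // stands in for the execution dir entry
      try {
        Files.write(projectFile, "payload".getBytes(StandardCharsets.UTF_8));
        Files.createLink(executionLink, projectFile);
        // Removing one name does not free the data while another hard link exists.
        Files.delete(projectFile);
        return new String(Files.readAllBytes(executionLink), StandardCharsets.UTF_8);
      } finally {
        Files.deleteIfExists(executionLink);
        Files.deleteIfExists(projectFile);
        Files.deleteIfExists(dir);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The data stays readable through the second name after the first is deleted, which is why purging a project dir frees no space until the execution dirs linked to it are removed as well.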

Project cleanup is executed by the active executor only, to ensure that only one thread performs deletion. Currently it relies on FlowRunnerManager#isExecutorActive to determine whether the executor itself is active. But isExecutorActive is reliable only when the Azkaban admin uses the activate API to activate the executor. If the admin manually updates the executors table in the database and calls the reloadExecutors API, the flag won't be set, which could cause an inactive executor to perform deletion. So an improvement is needed here to make sure isExecutorActive is accurate.

Cheng Ren

Commit: f0e4a6d

Tree: 4807f49

Parents: 484b3d7

Check duplicate names in DagBuilder before adding the nodeBuilder …

6/19/2018 4:49:46 PM

(#1808)

This is in response to a comment in #1759

HappyRay

Commit: 484b3d7

Tree: 37d87c7

Parents: bff791b

ExecutorManager refactor (#1804) 1. Simplify logic 2. …

6/15/2018 6:07:47 PM

Change activeExecutors from HashSet to ImmutableSet to guarantee thread safety.
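
A minimal sketch of the idea, with several assumptions: the commit uses Guava's ImmutableSet, while this sketch substitutes the JDK's unmodifiable wrapper around a private copy so it needs no extra dependency. The class name, the reload method, the String element type, and the volatile field are all hypothetical.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class ActiveExecutorsSketch {

  // Readers always see a complete snapshot; the reference is swapped, never mutated in place.
  private volatile Set<String> activeExecutors = Collections.emptySet();

  /** Replaces the whole set with a fresh unmodifiable copy. */
  void reload(Collection<String> freshlyLoaded) {
    activeExecutors = Collections.unmodifiableSet(new HashSet<>(freshlyLoaded));
  }

  /** Returns the current snapshot; callers cannot modify it. */
  Set<String> getActiveExecutors() {
    return activeExecutors;
  }
}
```

A thread that obtained a snapshot before a reload keeps iterating over the old, unchanged set, so it cannot hit a ConcurrentModificationException caused by the reload.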

Cheng Ren

Commit: bff791b

Tree: a24eefc

Parents: 5ef0b39