azkaban-developers

Test cleanup: FlowRunnerTestBase (#1154) Simplify and …

6/4/2017 5:16:59 PM

remove duplicate code

Juho Autio

Commit: 0aeced8

Tree: e33def2

Parents: e70459e

Fix JobRunner bug when job is cancelled early (#1148) This …

6/4/2017 5:01:01 PM

bug was revealed by JobRunnerTest#testDelayedExecutionCancelledJob:

If job was cancelled early, there was no initial "upload" for the ExecutableNode, just an eventual "update". I believe this would also fail when running against the DB, because there would be an UPDATE without a preceding INSERT.

* Added assert that caught the error before fix in JobRunner

Juho Autio

Commit: e70459e

Tree: 8882bde

Parents: 69c4140

Test stability: FlowRunnerTest2 (#1158) Calling ExecutableFlowBase#isFlowFinished …

6/4/2017 4:56:26 PM

from the test thread wasn't a good idea.

In rare cases it failed with this exception (see #1157 (comment)):

azkaban.execapp.FlowRunnerTest2 > testCancel FAILED
java.lang.NullPointerException
at azkaban.executor.ExecutableFlowBase.isFlowFinished(ExecutableFlowBase.java:398)
at azkaban.execapp.FlowRunnerTestBase.lambda$assertThreadShutDown$0(FlowRunnerTestBase.java:32)
at azkaban.execapp.FlowRunnerTestBase$$Lambda$9/1157486615.apply(Unknown Source)
at azkaban.execapp.FlowRunnerTestBase.waitFlowRunner(FlowRunnerTestBase.java:42)
at azkaban.execapp.FlowRunnerTestBase.assertThreadShutDown(FlowRunnerTestBase.java:31)
at azkaban.execapp.FlowRunnerTest2.testCancel(FlowRunnerTest2.java:729)
Extract from ExecutableFlowBase (this method is actually deleted in this PR): /** * Only returns true if the status of all finished nodes is true. */ public boolean isFlowFinished() { for (final String end : getEndNodes()) { final ExecutableNode node = getExecutableNode(end); // THIS IS WHERE THE NPE HAPPENED, so node was null if (!Status.isStatusFinished(node.getStatus())) { return false; } } return true; }
It must be that the NPE happened because this method was being called from the test main thread, and there is no synchronization on the "end nodes". So it was somehow possible that the test got a name for an end node but then couldn't get the ExecutableNode to match it.

I used isFlowFinished() (it existed previously, I didn't add it) in the test because it seemed like a handy way to check that, but seems that it didn't work reliably. And now I realized that this method is not being used at all otherwise, so it can be just removed.

Now checking flow completion in a thread-safe way.

Also delete unused methods in ExecutableFlowBase:

isFlowFinished
findNextJobsToRun

Juho Autio

Commit: 69c4140

Tree: e8c3f77

Parents: 02d36bc

Make ExecutableNode.status volatile (#1159) This gives …

6/4/2017 4:34:25 PM

guarantees especially for unit tests that need to check for statuses in a multi-threaded environment. But also FlowRunner thread vs. JobRunner threads benefit from this, although I haven't been able to prove that there would be any critical issues there.

Juho Autio

Commit: 02d36bc

Tree: 2f8af42

Parents: 3eb3ddb

Ignore two flaky FlowRunner related tests (#1156) * Ignore …

6/4/2017 3:21:28 PM

three flaky FlowRunner related tests

azkaban.execapp.FlowRunnerTest2.testPauseFailFinishAll
and
azkaban.execapp.FlowRunnerTest2#testRetryOnFailure

testCancelOnFailure

Issue: #1155

* ignore testCancelOnFailure too

see #1157

HappyRay

Commit: 3eb3ddb

Tree: 00a0a52

Parents: 76970d1

Simplify tests: usage of EventCollectorListener#checkEventExists …

6/4/2017 2:29:11 PM

(#1152)

Using hamcrest matcher `is` to compare arrays, because it gives an informative error message, like:

java.lang.AssertionError:
Expected: is [<FLOW_STARTED>, <FLOW_FINISHED>] but: was [<FLOW_STARTED>]

Juho Autio

Commit: 76970d1

Tree: e1a24ba

Parents: 9d038b6

Consolidate the duplicate code into one prepareProject method …

6/4/2017 2:05:25 PM

(#1151)

* Consolidate the duplicate code into one prepareProject method

Issue:

Three test classes in execapp have the same duplicate prepareProject
logic.

Fix:

Consolidated into one method in a shared test utility class.

With this change, the FlowRunnerTest2 has no intellij warnings left
except warnings about typos.

* Update Copyright notices

HappyRay

Commit: 9d038b6

Tree: 6a1b710

Parents: 6d9e0a8

Remove recently finished flows cache (#1093) * Remove recentlyFinishedFlows …

6/4/2017 1:25:57 PM

cache from web server

* Fix one test case failure

* Added unit test for fetching recentlyFinishedFlows

* Modified recently finished lifetime

* Some improvement on recently finished flows cache change.

* Quick fix

* Fix merge and comments

jamiesjc

Commit: 6d9e0a8

Tree: 26f9262

Parents: 1a564dc

Update codestyle setup guide in the README (#1149) * Point …

6/4/2017 11:44:03 AM

to the intellij's guide.
* Users should be able to use import code style features instead.
* Advised against turning on "Remove unused suppress warning
annotation" option in the save actions plugin since it seems to be
unreliable.

HappyRay

Commit: 1a564dc

Tree: 102e4d4

Parents: 1a2055f

Fix bug in MockExecutorLoader (#1116) Fix concurrency bug …

6/3/2017 7:40:55 PM

in MockExecutorLoader: it shouldn't modify the state of actual flows, so make a clone of the flows, too.

Also log job logs in MockExecutorLoader instead of discarding them, so that it's possible to debug failures in tests.

The problem was this (originally revealed when FlowRunnerTest still used this mock class instead of a Mockito mock):

MockExecutorLoader kept a handle to the actual ExecutableFlow instance
MockExecutorLoader.updateExecutableFlow set the job statuses based on the current flow status, which is not synchronized with the job runner threads
So this wrote an old job status over the newer one in ExecutableNode
Stacktrace for that:

job4: setStatus SUCCEEDED -> QUEUED
at azkaban.executor.ExecutableNode.setStatus(ExecutableNode.java:141)
at azkaban.executor.ExecutableNode.applyUpdateObject(ExecutableNode.java:398)
at azkaban.executor.ExecutableFlowBase.applyUpdateObject(ExecutableFlowBase.java:352)
at azkaban.executor.ExecutableFlowBase.applyUpdateObject(ExecutableFlowBase.java:369)
at azkaban.executor.MockExecutorLoader.updateExecutableFlow(MockExecutorLoader.java:120)
at azkaban.execapp.FlowRunner.updateFlow(FlowRunner.java:322)
at azkaban.execapp.FlowRunner.updateFlow(FlowRunner.java:316)
at azkaban.execapp.FlowRunner.progressGraph(FlowRunner.java:552)
at azkaban.execapp.FlowRunner.runFlow(FlowRunner.java:413)
at azkaban.execapp.FlowRunner.doRun(FlowRunner.java:238)
at azkaban.execapp.FlowRunner.run(FlowRunner.java:215)

Overlapping updates by 2 different threads could look something like this:

2017/05/24 12:20:54.695 +0300 INFO [JobRunner-job3-1] [ExecutableNode] job4: applyUpdateObject: RUNNING-> QUEUED
2017/05/24 12:20:55.700 +0300 INFO [JobRunner-job4-1] [JobRunner] job4: changeStatus: QUEUED -> SUCCEEDED
2017/05/24 12:20:55.701 +0300 INFO [FlowRunner-exec-1] [ExecutableNode] job4: applyUpdateObject: SUCCEEDED-> QUEUED

Background discussion at #1105.

Juho Autio

Commit: 1a2055f

Tree: 656742a

Parents: 555c84a