6/4/2017 5:16:59 PM
remove duplicate code
|
6/4/2017 5:01:01 PM
bug was revealed by JobRunnerTest#testDelayedExecutionCancelledJob:
If job was cancelled early, there was no initial "upload" for the ExecutableNode, just an eventual "update". I believe this would also fail when running against the DB, because there would be an UPDATE without a preceding INSERT.
* Added assert that caught the error before fix in JobRunner
|
6/4/2017 4:56:26 PM
from the test thread wasn't a good idea.
In rare cases it failed with this exception (see #1157 (comment)):
azkaban.execapp.FlowRunnerTest2 > testCancel FAILED
java.lang.NullPointerException
at azkaban.executor.ExecutableFlowBase.isFlowFinished(ExecutableFlowBase.java:398)
at azkaban.execapp.FlowRunnerTestBase.lambda$assertThreadShutDown$0(FlowRunnerTestBase.java:32)
at azkaban.execapp.FlowRunnerTestBase$$Lambda$9/1157486615.apply(Unknown Source)
at azkaban.execapp.FlowRunnerTestBase.waitFlowRunner(FlowRunnerTestBase.java:42)
at azkaban.execapp.FlowRunnerTestBase.assertThreadShutDown(FlowRunnerTestBase.java:31)
at azkaban.execapp.FlowRunnerTest2.testCancel(FlowRunnerTest2.java:729)
Extract from ExecutableFlowBase (this method is actually deleted in this PR): /** * Only returns true if the status of all finished nodes is true. */ public boolean isFlowFinished() { for (final String end : getEndNodes()) { final ExecutableNode node = getExecutableNode(end); // THIS IS WHERE THE NPE HAPPENED, so node was null if (!Status.isStatusFinished(node.getStatus())) { return false; } } return true; }
It must be that the NPE happened because this method was being called from the test main thread, and there is no synchronization on the "end nodes". So it was somehow possible that the test got a name for an end node but then couldn't get the ExecutableNode to match it.
I used isFlowFinished() (it existed previously, I didn't add it) in the test because it seemed like a handy way to check that, but seems that it didn't work reliably. And now I realized that this method is not being used at all otherwise, so it can be just removed.
Now checking flow completion in a thread-safe way.
Also delete unused methods in ExecutableFlowBase:
isFlowFinished
findNextJobsToRun
|
6/4/2017 4:34:25 PM
guarantees especially for unit tests that need to check for statuses in a multi-threaded environment. But also FlowRunner thread vs. JobRunner threads benefit from this, although I haven't been able to prove that there would be any critical issues there.
|
6/4/2017 3:21:28 PM
three flaky FlowRunner related tests
azkaban.execapp.FlowRunnerTest2.testPauseFailFinishAll
and
azkaban.execapp.FlowRunnerTest2#testRetryOnFailure
testCancelOnFailure
Issue: #1155
* ignore testCancelOnFailure too
see #1157
|
6/4/2017 2:29:11 PM
(#1152)
Using hamcrest matcher `is` to compare arrays, because it gives an informative error message, like:
java.lang.AssertionError:
Expected: is [<FLOW_STARTED>, <FLOW_FINISHED>] but: was [<FLOW_STARTED>]
|
6/4/2017 2:05:25 PM
(#1151)
* Consolidate the duplicate code into one prepareProject method
Issue:
Three test classes in execapp have the same duplicate prepareProject
logic.
Fix:
Consolidated into one method in a shared test utility class.
With this change, the FlowRunnerTest2 has no intellij warnings left
except warnings about typos.
* Update Copyright notices
|
6/4/2017 1:25:57 PM
cache from web server
* Fix one test case failure
* Added unit test for fetching recentlyFinishedFlows
* Modified recently finished lifetime
* Some improvement on recently finished flows cache change.
* Quick fix
* Fix merge and comments
|
6/4/2017 11:44:03 AM
to the intellij's guide.
* Users should be able to use import code style features instead.
* Advised against turning on "Remove unused suppress warning
annotation" option in the save actions plugin since it seems to be
unreliable.
|
6/3/2017 7:40:55 PM
in MockExecutorLoader: it shouldn't modify the state of actual flows, so make a clone of the flows, too.
Also log job logs in MockExecutorLoader instead of discarding them, so that it's possible to debug failures in tests.
The problem was this (originally revealed when FlowRunnerTest still used this mock class instead of a Mockito mock):
MockExecutorLoader kept a handle to the actual ExecutableFlow instance
MockExecutorLoader.updateExecutableFlow set the job statuses based on the current flow status, which is not synchronized with the job runner threads
So this wrote an old job status over the newer one in ExecutableNode
Stacktrace for that:
job4: setStatus SUCCEEDED -> QUEUED
at azkaban.executor.ExecutableNode.setStatus(ExecutableNode.java:141)
at azkaban.executor.ExecutableNode.applyUpdateObject(ExecutableNode.java:398)
at azkaban.executor.ExecutableFlowBase.applyUpdateObject(ExecutableFlowBase.java:352)
at azkaban.executor.ExecutableFlowBase.applyUpdateObject(ExecutableFlowBase.java:369)
at azkaban.executor.MockExecutorLoader.updateExecutableFlow(MockExecutorLoader.java:120)
at azkaban.execapp.FlowRunner.updateFlow(FlowRunner.java:322)
at azkaban.execapp.FlowRunner.updateFlow(FlowRunner.java:316)
at azkaban.execapp.FlowRunner.progressGraph(FlowRunner.java:552)
at azkaban.execapp.FlowRunner.runFlow(FlowRunner.java:413)
at azkaban.execapp.FlowRunner.doRun(FlowRunner.java:238)
at azkaban.execapp.FlowRunner.run(FlowRunner.java:215)
Overlapping updates by 2 different threads could look something like this:
2017/05/24 12:20:54.695 +0300 INFO [JobRunner-job3-1] [ExecutableNode] job4: applyUpdateObject: RUNNING-> QUEUED
2017/05/24 12:20:55.700 +0300 INFO [JobRunner-job4-1] [JobRunner] job4: changeStatus: QUEUED -> SUCCEEDED
2017/05/24 12:20:55.701 +0300 INFO [FlowRunner-exec-1] [ExecutableNode] job4: applyUpdateObject: SUCCEEDED-> QUEUED
Background discussion at #1105.
|