6/5/2017 3:32:52 PM
Fixing DefaultCharset warnings in azkaban-common
* Used #1045 to base changes off of
|
|
|
6/4/2017 6:03:28 PM
previous test jobs so that .succeedJob() is called on the retried job instance.
So now we can remove all @Ignores.
How does this help?
When a job run is started, the created job type instance gets added into InteractiveTestJob's map. If a job is retried, another instance is created, but for InteractiveTestJob the key=jobname is the same.
My commit message says:
Clear previous test jobs so that .succeedJob() is called on the retried job instance.
So occasionally it happened that InteractiveTestJob.getTestJob("jobb:innerJobB").succeedJob was targeted at the original failed instance. After that the retry attempt instance was added into InteractiveTestJob's map, but succeedJob was never called on it, so it was left in RUNNING state.
|
6/4/2017 5:16:59 PM
remove duplicate code
|
6/4/2017 5:01:01 PM
bug was revealed by JobRunnerTest#testDelayedExecutionCancelledJob:
If job was cancelled early, there was no initial "upload" for the ExecutableNode, just an eventual "update". I believe this would also fail when running against the DB, because there would be an UPDATE without a preceding INSERT.
* Added assert that caught the error before fix in JobRunner
|
6/4/2017 4:56:26 PM
from the test thread wasn't a good idea.
In rare cases it failed with this exception (see #1157 (comment)):
azkaban.execapp.FlowRunnerTest2 > testCancel FAILED
java.lang.NullPointerException
at azkaban.executor.ExecutableFlowBase.isFlowFinished(ExecutableFlowBase.java:398)
at azkaban.execapp.FlowRunnerTestBase.lambda$assertThreadShutDown$0(FlowRunnerTestBase.java:32)
at azkaban.execapp.FlowRunnerTestBase$$Lambda$9/1157486615.apply(Unknown Source)
at azkaban.execapp.FlowRunnerTestBase.waitFlowRunner(FlowRunnerTestBase.java:42)
at azkaban.execapp.FlowRunnerTestBase.assertThreadShutDown(FlowRunnerTestBase.java:31)
at azkaban.execapp.FlowRunnerTest2.testCancel(FlowRunnerTest2.java:729)
Extract from ExecutableFlowBase (this method is actually deleted in this PR): /** * Only returns true if the status of all finished nodes is true. */ public boolean isFlowFinished() { for (final String end : getEndNodes()) { final ExecutableNode node = getExecutableNode(end); // THIS IS WHERE THE NPE HAPPENED, so node was null if (!Status.isStatusFinished(node.getStatus())) { return false; } } return true; }
It must be that the NPE happened because this method was being called from the test main thread, and there is no synchronization on the "end nodes". So it was somehow possible that the test got a name for an end node but then couldn't get the ExecutableNode to match it.
I used isFlowFinished() (it existed previously, I didn't add it) in the test because it seemed like a handy way to check that, but seems that it didn't work reliably. And now I realized that this method is not being used at all otherwise, so it can be just removed.
Now checking flow completion in a thread-safe way.
Also delete unused methods in ExecutableFlowBase:
isFlowFinished
findNextJobsToRun
|
6/4/2017 4:34:25 PM
guarantees especially for unit tests that need to check for statuses in a multi-threaded environment. But also FlowRunner thread vs. JobRunner threads benefit from this, although I haven't been able to prove that there would be any critical issues there.
|
6/4/2017 3:21:28 PM
three flaky FlowRunner related tests
azkaban.execapp.FlowRunnerTest2.testPauseFailFinishAll
and
azkaban.execapp.FlowRunnerTest2#testRetryOnFailure
testCancelOnFailure
Issue: #1155
* ignore testCancelOnFailure too
see #1157
|
6/4/2017 2:29:11 PM
(#1152)
Using hamcrest matcher `is` to compare arrays, because it gives an informative error message, like:
java.lang.AssertionError:
Expected: is [<FLOW_STARTED>, <FLOW_FINISHED>] but: was [<FLOW_STARTED>]
|