6/5/2017 5:23:50 PM
trace in error logging.
|
6/5/2017 3:40:41 PM
End Date feature to AZ schedules
Today, when we specify a flow schedule in Azkaban, the flow will run infinitely without termination as per the schedule. As #721 proposes, should schedule have an end date, the flow will have more flexibilities to be managed. In this PR, I implement the end date feature to allow users to specify the expiration date given a schedule. UI has not been changed yet in this patch (will not be implemented in near future), but people will be able to call API to leverage the new feature.
The API command sample:
`curl -k -d ajax=scheduleCronFlow -d projectName=wtwt -d flow=azkaban -d endSchedTime=14XXXXXXXXXX --data-urlencode cronExpression="0 * * ? * 6" -b "azkaban.browser.session.id=XXXXXXXXXXXXXX" http://localhost:8081/schedule`
I reuse BasicTimeChecker, as part of Expire Condition, in order to not break Old Azkaban flows. Also, Added one parameter `endSchedTime` to schedule API. If the users don't specify this parameter, Azkaban would use the default value for `01/01/2050`, and the schedule will be almost running infinitely. Besides, I add Unit tests as much as possible to guarantee that this change will not break out.
|
6/5/2017 3:32:52 PM
Fixing DefaultCharset warnings in azkaban-common
* Used #1045 to base changes off of
|
|
|
6/4/2017 6:03:28 PM
previous test jobs so that .succeedJob() is called on the retried job instance.
So now we can remove all @Ignores.
How does this help?
When a job run is started, the created job type instance gets added into InteractiveTestJob's map. If a job is retried, another instance is created, but for InteractiveTestJob the key=jobname is the same.
My commit message says:
Clear previous test jobs so that .succeedJob() is called on the retried job instance.
So occasionally it happened that InteractiveTestJob.getTestJob("jobb:innerJobB").succeedJob was targeted at the original failed instance. After that the retry attempt instance was added into InteractiveTestJob's map, but succeedJob was never called on it, so it was left in RUNNING state.
|
6/4/2017 5:16:59 PM
remove duplicate code
|
6/4/2017 5:01:01 PM
bug was revealed by JobRunnerTest#testDelayedExecutionCancelledJob:
If job was cancelled early, there was no initial "upload" for the ExecutableNode, just an eventual "update". I believe this would also fail when running against the DB, because there would be an UPDATE without a preceding INSERT.
* Added assert that caught the error before fix in JobRunner
|
6/4/2017 4:56:26 PM
from the test thread wasn't a good idea.
In rare cases it failed with this exception (see #1157 (comment)):
azkaban.execapp.FlowRunnerTest2 > testCancel FAILED
java.lang.NullPointerException
at azkaban.executor.ExecutableFlowBase.isFlowFinished(ExecutableFlowBase.java:398)
at azkaban.execapp.FlowRunnerTestBase.lambda$assertThreadShutDown$0(FlowRunnerTestBase.java:32)
at azkaban.execapp.FlowRunnerTestBase$$Lambda$9/1157486615.apply(Unknown Source)
at azkaban.execapp.FlowRunnerTestBase.waitFlowRunner(FlowRunnerTestBase.java:42)
at azkaban.execapp.FlowRunnerTestBase.assertThreadShutDown(FlowRunnerTestBase.java:31)
at azkaban.execapp.FlowRunnerTest2.testCancel(FlowRunnerTest2.java:729)
Extract from ExecutableFlowBase (this method is actually deleted in this PR): /** * Only returns true if the status of all finished nodes is true. */ public boolean isFlowFinished() { for (final String end : getEndNodes()) { final ExecutableNode node = getExecutableNode(end); // THIS IS WHERE THE NPE HAPPENED, so node was null if (!Status.isStatusFinished(node.getStatus())) { return false; } } return true; }
It must be that the NPE happened because this method was being called from the test main thread, and there is no synchronization on the "end nodes". So it was somehow possible that the test got a name for an end node but then couldn't get the ExecutableNode to match it.
I used isFlowFinished() (it existed previously, I didn't add it) in the test because it seemed like a handy way to check that, but seems that it didn't work reliably. And now I realized that this method is not being used at all otherwise, so it can be just removed.
Now checking flow completion in a thread-safe way.
Also delete unused methods in ExecutableFlowBase:
isFlowFinished
findNextJobsToRun
|
6/4/2017 4:34:25 PM
guarantees especially for unit tests that need to check for statuses in a multi-threaded environment. But also FlowRunner thread vs. JobRunner threads benefit from this, although I haven't been able to prove that there would be any critical issues there.
|