11/14/2018 1:26:42 AM
actions" plugin -- format changes only.
2. separate the different token prefetching procedures into individual methods (done with IntelliJ's Extract Method refactoring: https://www.jetbrains.com/help/idea/extract-method.html).
Follow-up -- to be done in next PR:
1. add more logging to each token prefetching method so that we know which service's prefetching is stuck.
2. see whether removing "synchronized" from doPrefetch (HadoopSecurityManager_H_2_0#doPrefetch) is feasible.
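The follow-up logging could look roughly like this. A minimal Java sketch: the method and service names are hypothetical stand-ins for the extracted prefetch methods, not Azkaban's actual code:

```java
import java.util.logging.Logger;

// Hypothetical sketch: each token prefetch lives in its own extracted method,
// wrapped with before/after log lines so a stuck service is identifiable.
public class TokenPrefetchSketch {
  private static final Logger logger = Logger.getLogger("TokenPrefetchSketch");
  int prefetched = 0; // counts completed prefetches, for illustration only

  // doPrefetch is synchronized today (HadoopSecurityManager_H_2_0#doPrefetch);
  // the follow-up evaluates whether that lock can be dropped.
  public synchronized void doPrefetch() {
    prefetchToken("HDFS", this::prefetchHdfsToken);
    prefetchToken("JobHistoryServer", this::prefetchJhsToken);
    prefetchToken("Hive metastore", this::prefetchMetastoreToken);
  }

  private void prefetchToken(String service, Runnable fetch) {
    logger.info("Prefetching token from " + service);
    fetch.run();
    prefetched++;
    logger.info("Done prefetching token from " + service);
  }

  private void prefetchHdfsToken() { /* fetch and cache the HDFS token */ }
  private void prefetchJhsToken() { /* fetch the JobHistoryServer token */ }
  private void prefetchMetastoreToken() { /* fetch the metastore token */ }
}
```

With log lines on both sides of each fetch, a hang shows up as a "Prefetching token from X" line with no matching "Done" line for that service.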
|
11/13/2018 10:15:19 PM
could happen when multiple azkaban executor processes are running:
It's possible for two azkaban executor processes to perform deletion in the same azkaban project dir even when one executor is inactive, e.g., when a flow is run on an arbitrary executor via the useExecutor label.
If so, the in-memory list of azkaban project dirs (installedProjects) kept by each executor process will be out of sync with what's on disk.
Another case is a race condition where one executor process deletes a project dir while another executor process is creating an execution dir based on it.
This PR
removes installedProjects from the executor. So every time a project needs to be downloaded, every project dir is scanned and the total disk usage is summed to decide whether purging is needed. This can take tens of seconds when the number of project dirs is >= 5000, but only a few seconds with a warm inode cache.
makes project dir cleanup (deleteProjectDirsIfNecessary) synchronized, since the method is a check-then-act process that is vulnerable to race conditions when multiple threads delete concurrently. An alternative is to synchronize on an interned string of project id + project version (https://stackoverflow.com/questions/133988/synchronizing-on-string-objects-in-java), but that is not elegant, as the linked post points out. Synchronizing at the object level is acceptable given that flow setup is a low-frequency operation in most cases (<= 5 ops/min in our production environment).
when a project dir is created, a metadata file recording its file count is also created. Its purpose is to address the race condition where one executor process deletes a project dir while another executor process is creating an execution dir based on it. A sanity check on the file count will be conducted against the created execution dir: if the execution dir's file count does not match the base project dir's, the flow setup fails and the azkaban web server dispatches it again.
Note that even with this fix, race conditions remain. E.g., when two executor processes call ProjectCacheDirCleaner#deleteProjectDirsIfNecessary, one might delete a dir that the other is loading.
A potential long term fix: #2020
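The synchronized, scan-based cleanup described above could be sketched roughly as follows. This is a minimal illustration with hypothetical names and a simple least-recently-modified policy, not Azkaban's actual implementation:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: with no in-memory installedProjects list, each call
// scans the project cache dir, sums disk usage, and deletes the oldest
// project dirs until usage falls under the configured cap.
public class ProjectCachePurgeSketch {
  static long dirSize(Path dir) throws IOException {
    try (Stream<Path> s = Files.walk(dir)) {
      return s.filter(Files::isRegularFile).mapToLong(p -> p.toFile().length()).sum();
    }
  }

  // Synchronized: the check-then-act sequence below must not interleave
  // across threads within a single executor process.
  static synchronized void deleteProjectDirsIfNecessary(Path cacheDir, long maxBytes)
      throws IOException {
    List<Path> projects;
    try (Stream<Path> s = Files.list(cacheDir)) {
      projects = s.filter(Files::isDirectory)
          .sorted(Comparator.comparingLong((Path p) -> p.toFile().lastModified()))
          .collect(Collectors.toList());
    }
    long total = 0;
    for (Path p : projects) total += dirSize(p);
    for (Path p : projects) {
      if (total <= maxBytes) break;
      long size = dirSize(p);
      deleteRecursively(p);
      total -= size;
    }
  }

  static void deleteRecursively(Path dir) throws IOException {
    try (Stream<Path> s = Files.walk(dir)) {
      s.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
    }
  }
}
```

The synchronized keyword only protects threads within one process, which is why the cross-process races described above can still occur.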
Follow-up
add the file count sanity check mentioned above.
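The planned file-count sanity check could be sketched as follows. Class and method names here are hypothetical, and the metadata format (a single count in a text file) is an assumption for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical sketch: when a project dir is installed, a metadata file
// recording its file count is written; before a flow runs, the execution dir
// copied from the project dir is checked against that count, and a mismatch
// fails the setup so the web server can dispatch the flow again.
public class FileCountSanityCheck {
  static long countFiles(Path dir) throws IOException {
    try (Stream<Path> s = Files.walk(dir)) {
      return s.filter(Files::isRegularFile).count();
    }
  }

  static void writeFileCountMetadata(Path projectDir, Path metadataFile) throws IOException {
    Files.writeString(metadataFile, Long.toString(countFiles(projectDir)));
  }

  // Returns true if the execution dir has the same file count as recorded
  // for its base project dir; callers fail the flow setup on false.
  static boolean check(Path executionDir, Path metadataFile) throws IOException {
    long expected = Long.parseLong(Files.readString(metadataFile).trim());
    return countFiles(executionDir) == expected;
  }
}
```

A mismatch indicates the base project dir was (partially) deleted mid-copy, which is exactly the cross-process race the check is meant to catch.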
|
11/13/2018 3:52:43 PM
(#2019)
* New ‘Expand/Collapse all Flows’ options on Flow, Flow Execution and Schedule/Execute views
Prevents users from having to unfold deeply nested flows by clicking one by one.
This improvement will allow users to browse flows and enable or disable job executions more easily.
* Fix according to review comments
* Remaining fixes according to review comments
|
11/9/2018 10:05:21 PM
3.61.0
(#2022)
|
11/6/2018 7:29:42 PM
execution if executor doesn't exist
* Fix according to review comments
* Swap missing executor warn messages according to review comment
|
11/5/2018 5:21:16 PM
commit shows the problem:
When trying to execute a cleaned project version, an exception is thrown that says that hash code comparison failed.
2nd commit improves the error message in the case of trying to read a cleaned (deleted) version. It also fails faster (it no longer tries to generate a hash code from 0 chunks).
Move project.version.retention to Constants.java.
|
11/2/2018 6:03:38 PM
a flow is scheduled with concurrentOption=skip, it's perfectly normal that triggering of a schedule is skipped. This PR changes such ERROR lines in the server log to INFO level.
On a general level, in my opinion, the ERROR level should only be used for platform errors, i.e., when Azkaban fails to do something it promises to be able to do. If this rule holds, it becomes easier to monitor that Azkaban is working correctly by checking that the server logs contain no errors.
Before:
2018/10/23 13:41:14.337 +0300 INFO [ExecuteFlowAction] Invoking flow test-project.test-flow
2018/10/23 13:41:14.338 +0300 ERROR [TriggerManager] Failed to do action Execute flow test-flow from project test-project for Trigger Id: 0, Description: Trigger from triggerLoader with trigger condition of ThresholdChecker.eval() and expire condition of EndTimeCheck_1.eval(), Execute flow test-flow from project test-project
java.lang.RuntimeException: azkaban.executor.ExecutorManagerException: Flow is already running. Skipping execution.
at azkaban.trigger.builtin.ExecuteFlowAction.doAction(ExecuteFlowAction.java:232)
at azkaban.trigger.TriggerManager$TriggerScannerThread.onTriggerTrigger(TriggerManager.java:363)
at azkaban.trigger.TriggerManager$TriggerScannerThread.checkAllTriggers(TriggerManager.java:343)
at azkaban.trigger.TriggerManager$TriggerScannerThread.run(TriggerManager.java:297)
Caused by: azkaban.executor.ExecutorManagerException: Flow is already running. Skipping execution.
at azkaban.trigger.builtin.ExecuteFlowAction.doAction(ExecuteFlowAction.java:229)
... 3 more
After:
2018/10/23 13:41:51.778 +0300 INFO [ExecuteFlowAction] Invoking flow test-project.test-flow
2018/10/23 13:41:51.779 +0300 INFO [TriggerManager] Skipped action [Execute flow test-flow from project test-project] for [Trigger Id: 0, Description: Trigger from triggerLoader with trigger condition of ThresholdChecker.eval() and expire condition of EndTimeCheck_1.eval(), Execute flow test-flow from project test-project] because: Flow is already running. Skipping execution.
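The level change amounts to catching the expected skip and logging it at INFO with just the cause's message, rather than letting it surface as an ERROR with a stack trace. A minimal sketch, with hypothetical names and simplified signatures rather than Azkaban's actual TriggerManager code:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: a skipped trigger action is an expected outcome under
// concurrentOption=skip, so it is logged at INFO instead of ERROR.
public class TriggerActionSketch {
  private static final Logger logger = Logger.getLogger("TriggerManager");

  static String runAction(Runnable action, String actionDesc, String triggerDesc) {
    try {
      action.run();
      return "ok";
    } catch (RuntimeException e) {
      // Before: logger.log(Level.SEVERE, "Failed to do action " + ..., e);
      String msg = "Skipped action [" + actionDesc + "] for [" + triggerDesc
          + "] because: " + e.getMessage();
      logger.log(Level.INFO, msg);
      return msg;
    }
  }
}
```

Note the INFO line keeps the trigger id and description, so the information content of the "After" log above matches the "Before" log minus the stack trace.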
|
10/31/2018 9:25:46 PM
(#2012)
To avoid a possible race condition between two threads (FlowRunnerManager#Cleaner and FlowPreparer#setup), and to make the project directory cleanup logic simpler, the method that cleans up projects of old versions is removed. Project dir deletion is now handled in a single thread in FlowPreparer#setup.
a race condition example:
1. Old executor (initially active) starts deleting the project of the old version.
2. Old executor is set to inactive; new executor (initially inactive) is set to active and starts loading the project dirs into its in-memory list.
3. New executor loads project dir 1.1 into memory.
4. Old executor deletes project dir 1.1.
5. Old executor's deletion completes; new executor's loading completes.
Then the new executor's in-memory active project list could contain a project dir already deleted by the old executor.
|
10/31/2018 12:43:52 AM
the assumptions
|
10/31/2018 12:41:52 AM
but not both (#2006)
We ran into a bug in the production environment: users' jars could not override the default Pig libraries. It turned out that AZ always adds the Pig additional jars when the pig job type defines them under plugin.properties. The fix is straightforward: if the user defines the property, we use it directly and do not include the system default settings.
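The "user's setting or plugin default, but not both" resolution could be sketched as below. The class and method names are hypothetical, and the `pig.additional.jars` key is used illustratively:

```java
import java.util.Properties;

// Hypothetical sketch: the user's pig.additional.jars wins outright when set;
// the plugin.properties default is used only when the user did not define it.
// The two are never merged, so user jars are not shadowed by defaults.
public class PigJarsSketch {
  static final String PIG_ADDITIONAL_JARS = "pig.additional.jars";

  static String resolveAdditionalJars(Properties jobProps, Properties pluginProps) {
    String userJars = jobProps.getProperty(PIG_ADDITIONAL_JARS);
    if (userJars != null && !userJars.isEmpty()) {
      return userJars; // user-defined jars; system defaults are dropped
    }
    return pluginProps.getProperty(PIG_ADDITIONAL_JARS, "");
  }
}
```

Before the fix, the plugin-level jars were effectively always appended, which is why user jars could not take precedence on the classpath.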
|