10/19/2018 8:03:49 PM
(#1976)
|
10/18/2018 8:13:39 PM
by one class that's a singleton itself, so no harm done for now. But this is how it should be, in case it gets injected into more than one instance later.
|
10/18/2018 7:40:06 PM
mode
This is to simplify the code.
This requires users to migrate to azkaban.use.multiple.executors=true if they're not already using it.
After this change Azkaban will refuse to start if the property is missing or set to false (a rough sketch of this check follows the bullets below).
* Fix wrong comment placement & other minor fixes
* Added TODO to eventually delete checkMultiExecutorMode
* Clean up usage of executor port constants
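A minimal sketch of the fail-fast behavior described above, using plain java.util.Properties for illustration; the class and method names here are assumptions, not the actual Azkaban code:

```java
import java.util.Properties;

// Sketch only: illustrates the fail-fast startup check described above.
// Class and method names are assumptions, not the actual Azkaban code.
public class MultiExecutorModeCheck {

  static final String USE_MULTIPLE_EXECUTORS = "azkaban.use.multiple.executors";

  // Throws if the property is missing or not "true", so startup is refused.
  public static void validate(final Properties props) {
    if (!Boolean.parseBoolean(props.getProperty(USE_MULTIPLE_EXECUTORS))) {
      throw new IllegalStateException(
          "azkaban.use.multiple.executors must be set to true; "
              + "single-executor mode is no longer supported.");
    }
  }
}
```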
|
10/18/2018 5:15:15 PM
(#1975)
* Create a unit test for RunningExecutionsUpdater
* Move update request code to ExecutorApiGateway
Better scope for RunningExecutionsUpdater & cleaner unit tests
* Added test updateExecutionsSucceeded() & used constant for "error" (see the test sketch below)
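A hypothetical shape of such a test, just to illustrate the approach; the real RunningExecutionsUpdater and ExecutorApiGateway APIs may differ, so stand-in types are included to keep the sketch self-contained:

```java
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;

import org.junit.Test;

public class RunningExecutionsUpdaterSketchTest {

  // Minimal stand-in types so this sketch compiles on its own;
  // the real Azkaban classes have richer APIs.
  interface ExecutorApiGateway {
    void updateExecutions();
  }

  static class RunningExecutionsUpdater {
    private final ExecutorApiGateway gateway;

    RunningExecutionsUpdater(final ExecutorApiGateway gateway) {
      this.gateway = gateway;
    }

    void updateExecutions() {
      // The updater delegates the actual update request to the gateway.
      this.gateway.updateExecutions();
    }
  }

  @Test
  public void updateExecutionsSucceeded() {
    final ExecutorApiGateway gateway = mock(ExecutorApiGateway.class);
    final RunningExecutionsUpdater updater = new RunningExecutionsUpdater(gateway);

    updater.updateExecutions();

    // Mocking the gateway keeps this a pure unit test of the updater.
    verify(gateway).updateExecutions();
  }
}
```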
|
|
10/16/2018 3:28:15 PM
attempts than the number of active executors.
The configuration key azkaban.maxDispatchingErrors is thus respected without an upper cap.
Normally azkaban-web shouldn't ever fail dispatching executions to executors as long as there are active executors available. This change allows configuring a limit that is in practice high enough to keep retrying forever, until at least one responsive executor appears.
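Roughly, the give-up condition then becomes a plain comparison against the configured value; this is a sketch only, not the actual ExecutorManager code, and the names are illustrative:

```java
// Sketch only: the give-up condition for dispatch retries, no longer capped
// by the number of active executors. Names are illustrative.
public class DispatchRetryPolicy {

  // Value of azkaban.maxDispatchingErrors.
  private final int maxDispatchingErrors;

  public DispatchRetryPolicy(final int maxDispatchingErrors) {
    this.maxDispatchingErrors = maxDispatchingErrors;
  }

  // Configuring a very large value effectively means
  // "retry until a responsive executor appears".
  public boolean shouldGiveUp(final int dispatchErrorCount) {
    return dispatchErrorCount > this.maxDispatchingErrors;
  }
}
```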
NOTE: The handling of dispatch errors should be improved to distinguish between retriable & non-retriable errors.
The dispatch call itself is simple: it contains the execution id and username. I can't imagine a case where the request would be syntactically invalid/incompatible with the executor.
However, if the executor returns a response with "error" in it, ExecutorManager currently fails the dispatch and keeps retrying until the give-up condition is met. It shouldn't be like this in all cases.
For example:
If the error is about "already running" (on that executor), ExecutorManager should treat that as a successful dispatch (even though this may never actually happen in practice).
Actually, for this case I think the executor shouldn't even return an error response.
If the error reason is that the execution is not found in the DB when the executor tries to load it (how could that happen, though?), ExecutorManager should just give up dispatching.
And so on.
Anyway, manually cleaning up the Azkaban DB of problematic executions like those mentioned above can be left to the admin, if the admin chooses to configure a non-default value for maxDispatchingErrors. This PR doesn't have to deal with handling different dispatch error cases. That can be handled later.
But this is how I'd plan to do it (a rough sketch follows the list below):
If response is connection error -> keep retrying
If response is received but status is not HTTP 200 OK -> keep retrying
If response is "already running on this executor" -> treat as a success (implement this so that executor doesn't return an error in the first place)
If response is received with any other error -> give up after receiving this kind of error from all active executors
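A rough sketch of how that classification could look; this is a plan, not existing Azkaban code, and all names here are hypothetical:

```java
// Hypothetical classification of dispatch outcomes, mirroring the plan above.
enum DispatchOutcome { SUCCESS, RETRY, GIVE_UP }

final class DispatchErrorClassifier {

  private DispatchErrorClassifier() {
  }

  static DispatchOutcome classify(final boolean connectionError, final Integer httpStatus,
      final String errorMessage) {
    if (connectionError) {
      return DispatchOutcome.RETRY;          // connection error -> keep retrying
    }
    if (httpStatus == null || httpStatus != 200) {
      return DispatchOutcome.RETRY;          // non-200 response -> keep retrying
    }
    if (errorMessage == null) {
      return DispatchOutcome.SUCCESS;        // no error in the response -> dispatched
    }
    if (errorMessage.contains("already running")) {
      // Ideally the executor wouldn't return an error for this at all.
      return DispatchOutcome.SUCCESS;
    }
    // Any other error: give up once all active executors have returned it
    // (the per-executor bookkeeping is left out of this sketch).
    return DispatchOutcome.GIVE_UP;
  }
}
```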
Additional note on retrying after dispatch failure with "is already running":
I'm afraid that this can happen with the current azkaban code (remains to be verified though):
azkaban-web tries to dispatch execution 123 to executor 1
azkaban-web crashes / is killed
executor 1 has started execution 123 and it's running
azkaban-web starts again
azkaban-web fetches the queued executions from the DB
azkaban-web tries to dispatch execution 123 to the assigned executor 1
executor 1 returns an error
azkaban-web rolls back executor assignment of execution 123
azkaban-web dispatches execution 123 to executor 2
execution 123 is running on both executors 1 & 2 at the same time
The fix is also simple: don't return an error if the execution is already running on the executor (a rough executor-side sketch follows below).
However, even that fix is not bullet-proof. Azkaban-web could also fail to receive the response of a dispatch call because of a connection error, for example, and would then automatically try the next executor. One option would be to have some cooldown period after a failed dispatch attempt and to check from the DB whether the execution is running before trying to dispatch to another executor? Seems hard to get this right without adding proper locking though. Whew, I'm happy to realize that in our setup we typically have only 1 active executor at a time.
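For reference, the executor-side part of that fix could look roughly like this; sketch only, the real executor servlet and its response format differ, and the response strings here are made up:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch only: answer a duplicate dispatch of an already-running execution with
// success instead of an error, so azkaban-web doesn't re-dispatch it elsewhere.
// The real executor servlet and response format differ; this is illustrative.
final class ExecuteRequestHandlerSketch {

  private final Set<Integer> runningExecIds = new HashSet<>();

  synchronized String handleExecute(final int execId) {
    if (this.runningExecIds.contains(execId)) {
      // Already running here: report success rather than an error.
      return "{\"status\":\"success\",\"execid\":" + execId + "}";
    }
    this.runningExecIds.add(execId);
    // ... start the flow runner for execId here ...
    return "{\"status\":\"success\",\"execid\":" + execId + "}";
  }
}
```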
|
10/16/2018 3:22:56 PM
remove redundant initial field values & make them final
- apply save actions plugin
|
|
10/15/2018 8:51:33 PM
1. Variable support for the reportal presto jobtype. 2. Support for presto queries with a trailing semicolon.
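The trailing-semicolon part could be handled with a simple strip before submitting the query; this is a sketch under the assumption that the jobtype passes the raw query string through, and the class and method names are made up:

```java
final class PrestoQueryUtil {

  private PrestoQueryUtil() {
  }

  // Sketch only: strip a trailing semicolon before handing the query to Presto,
  // since a trailing ';' can cause the submitted statement to be rejected.
  static String stripTrailingSemicolon(final String query) {
    final String trimmed = query.trim();
    return trimmed.endsWith(";")
        ? trimmed.substring(0, trimmed.length() - 1).trim()
        : trimmed;
  }
}
```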
|
10/15/2018 4:49:55 PM
(#1974)
|