7/28/2017 1:38:48 PM
execution cleanup (#1292)
Setting the setgid bit on a directory causes all items created within it to inherit the directory's group. As a result, all items users create in their /execution/<exec_id> directory will automatically belong to the azkaban group. This allows the azkaban cleanup thread to properly remove user-generated files and directories.
The setgid bit is platform-specific, so there is no Java standard library API for setting it. The solution I propose spawns a subprocess that runs the chmod command. I don't think this is very clean, but I haven't been able to find a better way. Does anybody have any ideas?
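A minimal sketch of the subprocess approach is below. It assumes a POSIX system with `chmod` on the PATH. The class and method names (`SetGidSketch`, `setGroupIdBit`) are hypothetical and are not the code in this patch. The sketch also shows why the standard library doesn't help: `java.nio.file.attribute.PosixFilePermission` only models the nine read/write/execute bits.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;

public class SetGidSketch {

  // PosixFilePermission only has constants for owner/group/others read,
  // write and execute (nine in total); setuid, setgid and sticky are absent.
  static int posixPermissionCount() {
    return PosixFilePermission.values().length;
  }

  // Hypothetical helper: runs "chmod g+s <dir>" in a subprocess so that
  // entries created inside dir inherit dir's group. Returns true when chmod
  // exits with status 0.
  static boolean setGroupIdBit(Path dir) throws IOException, InterruptedException {
    ProcessBuilder pb =
        new ProcessBuilder("chmod", "g+s", dir.toAbsolutePath().toString());
    pb.inheritIO(); // let chmod's error messages reach our stdout/stderr
    Process process = pb.start();
    return process.waitFor() == 0;
  }

  public static void main(String[] args) throws Exception {
    Path dir = Files.createTempDirectory("executions");
    System.out.println("rwx permission constants: " + posixPermissionCount());
    System.out.println("setgid applied: " + setGroupIdBit(dir));
  }
}
```

The bit only affects entries created after it is set, so it has to be applied when the directory is first created. On Linux, new subdirectories also inherit the setgid bit, so the group propagates down the tree.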
Another option would be to create this executionDirectory as part of our build process, but that isn't how we do things right now. (I verified this by running only the deploy step of the holdem4 Jenkins job and confirming that no /executions directory is created until the system starts.)
Note that this change is not covered by tests because it is difficult to deal with both filesystems and subprocesses in a test environment. If disk space usage increases in the future, this change may have regressed. To confirm that the change works in production on particularly large clusters (where the problem has been seen), I'll keep an eye on it once it's released.
|
7/27/2017 10:46:17 PM
would reduce the per-session memory overhead to some extent, so that we can further increase the size of the session cache.
|
7/27/2017 8:26:53 PM
Add metrics for sending email success/failure.
* Guicify Emailer class and refactor some test cases.
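One way to count email outcomes is sketched below. It assumes the mailer exposes a single send call that either returns or throws. `EmailMetricsSketch`, `MailSender`, and the counter names are hypothetical, and the real patch may report these numbers through Azkaban's own metrics classes instead. The sender is passed in through the constructor, which is the same shape Guice constructor injection uses and is also what makes the class easy to test.

```java
import java.util.concurrent.atomic.AtomicLong;

public class EmailMetricsSketch {

  // Hypothetical transport abstraction; a real mailer would talk to SMTP.
  interface MailSender {
    void send(String to, String subject, String body) throws Exception;
  }

  private final MailSender sender;
  private final AtomicLong sendSuccessCount = new AtomicLong();
  private final AtomicLong sendFailureCount = new AtomicLong();

  // The dependency arrives through the constructor, so a DI container
  // (or a test) can supply a different MailSender.
  EmailMetricsSketch(MailSender sender) {
    this.sender = sender;
  }

  // Sends one email and records the outcome; returns false on failure
  // instead of propagating the exception.
  boolean sendEmail(String to, String subject, String body) {
    try {
      sender.send(to, subject, body);
      sendSuccessCount.incrementAndGet();
      return true;
    } catch (Exception e) {
      sendFailureCount.incrementAndGet();
      return false;
    }
  }

  long getSendSuccessCount() { return sendSuccessCount.get(); }

  long getSendFailureCount() { return sendFailureCount.get(); }

  public static void main(String[] args) {
    // A fake sender that rejects addresses without '@'.
    EmailMetricsSketch mailer = new EmailMetricsSketch((to, subject, body) -> {
      if (!to.contains("@")) {
        throw new IllegalArgumentException("bad address: " + to);
      }
    });
    mailer.sendEmail("ops@example.com", "flow finished", "ok");
    mailer.sendEmail("not-an-address", "flow failed", "oops");
    // prints "success=1 failure=1"
    System.out.println("success=" + mailer.getSendSuccessCount()
        + " failure=" + mailer.getSendFailureCount());
  }
}
```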
|
7/27/2017 6:49:31 PM
in DB. (#1288)
|
7/25/2017 6:40:06 PM
to classes' annotations
This patch refactors how Guice is used, mainly moving all singleton
bindings into the respective classes as singleton annotations. The
corresponding tests are added as well. More context is in #1285.
|
7/25/2017 12:31:14 AM
type for killing a job
The action kills a job and retries it according to that job's retry configuration.
Previously, only killing a flow was allowed when an SLA was missed, even if the SLA was set at the job level.
The new action kicks in when a user sets an SLA rule on a job with the kill action, and the job is killed when it misses the SLA. There's no UI change.
Tested manually with the following flows:
jobA(retry num: 2)->jobB(retry num: 2)
SLA rule: if job A doesn't succeed in 1 min, kill the job
SLA rule: if job B doesn't succeed in 1 min, kill the job
jobA->jobB, jobA->jobC, jobB->jobD; jobB's retry number is set to 2.
SLA rule: if job B doesn't succeed in 1 min, kill it.
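The kill-then-retry behaviour can be sketched as a small state machine. The sketch assumes that each SLA miss kills the running attempt and consumes one retry until the job's retry budget is used up. `SlaKillJobSketch`, `JobRetryState`, and `onSlaMissed` are hypothetical names, not Azkaban's SLA classes. Under this model, a job with a retry number of 2 (as for jobA and jobB in the test flows) is attempted at most three times before it fails.

```java
public class SlaKillJobSketch {

  enum Outcome { RETRY, FAILED }

  // Hypothetical per-job state: configured retries and how many were used.
  static final class JobRetryState {
    final String jobId;
    final int maxRetries;
    int retriesUsed = 0;
    int attempts = 1; // the first attempt is already running

    JobRetryState(String jobId, int maxRetries) {
      this.jobId = jobId;
      this.maxRetries = maxRetries;
    }
  }

  // Called when a job-level SLA rule with the kill action fires: the current
  // attempt is killed (not modeled here), then the job is retried only if
  // retries remain.
  static Outcome onSlaMissed(JobRetryState job) {
    if (job.retriesUsed < job.maxRetries) {
      job.retriesUsed++;
      job.attempts++;
      return Outcome.RETRY;
    }
    return Outcome.FAILED;
  }

  public static void main(String[] args) {
    JobRetryState jobB = new JobRetryState("jobB", 2);
    Outcome outcome;
    do {
      outcome = onSlaMissed(jobB); // every attempt misses its SLA in this run
      System.out.println(jobB.jobId + ": " + outcome);
    } while (outcome == Outcome.RETRY);
    System.out.println("attempts=" + jobB.attempts); // prints "attempts=3"
  }
}
```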
|
7/24/2017 2:22:05 PM
(#1279)
|
7/23/2017 6:28:08 PM
is intended to prevent the bug that occurred in #1283. This patch simply annotates ProjectManager as a singleton, so when developers Guicify other classes that rely on ProjectManager in the future, they will not need to worry about introducing this kind of bug.
|
7/20/2017 5:52:33 PM
bug was found in a local solo server built from the July 19th master
code. Our integration test added a trigger that runs every minute.
Strangely, this trigger never ran. We debugged the code and found that
the TriggerScannerThread was launched twice, and newly added triggers
were put into the thread that should not exist, so they never ran. I
noticed that TriggerManager was not bound as a Singleton in the Guice
configuration. After I made this change, the thread no longer launches
twice and the bug disappeared.
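The failure mode can be reproduced without Guice. The sketch below is a plain-Java analogy, not Guice's API. An unscoped binding behaves like a factory that constructs a new object on every lookup, while a singleton binding behaves like a memoized factory. The hypothetical `FakeTriggerManager` does its "scanner" work in its constructor, so two lookups through the unscoped factory start two scanners. A counter stands in for launching a thread.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class SingletonScopeSketch {

  // Counts how many "scanner threads" have been started.
  static final AtomicInteger scannersStarted = new AtomicInteger();

  // Hypothetical stand-in for TriggerManager: its constructor has a side
  // effect, like launching TriggerScannerThread.
  static final class FakeTriggerManager {
    FakeTriggerManager() {
      scannersStarted.incrementAndGet();
    }
  }

  // Unscoped: every get() builds a new instance.
  static <T> Supplier<T> unscoped(Supplier<T> factory) {
    return factory;
  }

  // Singleton: the first get() builds the instance; later calls reuse it.
  static <T> Supplier<T> singleton(Supplier<T> factory) {
    return new Supplier<T>() {
      private T instance;

      @Override
      public synchronized T get() {
        if (instance == null) {
          instance = factory.get();
        }
        return instance;
      }
    };
  }

  // Two hypothetical components each ask the provider for the manager.
  static int scannersAfterTwoLookups(Supplier<FakeTriggerManager> provider) {
    scannersStarted.set(0);
    provider.get();
    provider.get();
    return scannersStarted.get();
  }

  public static void main(String[] args) {
    // prints "unscoped: 2"
    System.out.println("unscoped: "
        + scannersAfterTwoLookups(unscoped(FakeTriggerManager::new)));
    // prints "singleton: 1"
    System.out.println("singleton: "
        + scannersAfterTwoLookups(singleton(FakeTriggerManager::new)));
  }
}
```

In Guice itself, the fix is to give the class singleton scope. You can do this either with the `@Singleton` annotation on the class or with `bind(TriggerManager.class).in(Scopes.SINGLETON)` in a module.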
|