3/7/2017 2:46:16 PM
3.16.0
Code Refactor and adding key metrics
This commit adds and fixes the following metrics:
* Flow failure rate
* Project zip upload duration log
* Correct the NumRunningFlows value in the web server
* Add Jetty thread pool metrics as a web server performance monitor
* Measure DB connection time as a metric for DB connection pool performance
|
3/2/2017 3:35:10 AM
1. Users can set their own alias in the property file. It will be used as the hostname in getHost(). If it is not configured, the canonical hostname is used as a fallback. 2. Put the property name "executor.host" as a constant in ServerProperties as part of code refactoring.
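
The alias-with-fallback lookup described above could look roughly like the sketch below. Only the property name "executor.host" and the getHost() name come from this entry; the HostResolver class and the use of java.util.Properties are assumptions for illustration, not the actual Azkaban code.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Properties;

// Hypothetical helper; not the real Azkaban class.
final class HostResolver {
  // Property name taken from the entry above.
  static final String EXECUTOR_HOST = "executor.host";

  // Returns the configured alias, or the canonical hostname if no alias is set.
  static String getHost(Properties props) {
    String alias = props.getProperty(EXECUTOR_HOST);
    if (alias != null && !alias.trim().isEmpty()) {
      return alias.trim();
    }
    try {
      return InetAddress.getLocalHost().getCanonicalHostName();
    } catch (UnknownHostException e) {
      // Assumption: surface lookup failures as unchecked exceptions.
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(EXECUTOR_HOST, "exec-alias-01");
    System.out.println(getHost(props)); // alias wins when configured
  }
}
```

Treating a blank value as "not configured" is a design choice of this sketch; the entry does not say how the real code handles it.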
|
2/27/2017 3:09:29 PM
app metrics. (#913)
* Construct a new app metrics structure to collect and send app metrics.
This commit creates two singleton metrics classes, CommonMetrics and WebMetrics, which are in charge of collecting and reporting metrics such as user log fetch latency and the number of REST GET calls on the web side. Corresponding tests are also added. This commit kicks off the following metrics:
* The rate of REST GET calls in the web server
* The rate of REST POST calls in the web server
* The rate of DB connections in both web and executor
* User log fetch latency
More metrics can be added easily now that the app metrics structure is in place.
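
As a rough illustration of the singleton pattern described above, the sketch below keeps two of the listed metrics in plain JDK counters. It is not the actual CommonMetrics or WebMetrics code: the AppMetricsSketch name and all method names are hypothetical, and it tracks raw counts and a mean latency rather than true rates.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a singleton metrics class; enum gives one instance per JVM.
enum AppMetricsSketch {
  INSTANCE;

  private final LongAdder dbConnections = new LongAdder();
  private final LongAdder logFetchCount = new LongAdder();
  private final LongAdder logFetchNanos = new LongAdder();

  // Call once per DB connection obtained.
  void markDbConnection() {
    dbConnections.increment();
  }

  // Call once per user log fetch with the elapsed time in nanoseconds.
  void recordLogFetch(long elapsedNanos) {
    logFetchCount.increment();
    logFetchNanos.add(elapsedNanos);
  }

  long dbConnectionCount() {
    return dbConnections.sum();
  }

  // Mean fetch latency in milliseconds, or 0.0 if nothing was recorded.
  double meanLogFetchMillis() {
    long n = logFetchCount.sum();
    return n == 0 ? 0.0 : logFetchNanos.sum() / 1_000_000.0 / n;
  }
}
```

A caller would write `AppMetricsSketch.INSTANCE.markDbConnection()` at the point where a connection is obtained; adding another metric means adding one more counter and accessor.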
|
2/14/2017 1:54:33 PM
(#905)
Previously, the log contained only the exception message because the code called log.error(object) instead of log.error(message, throwable).
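
The difference can be reproduced with a small, self-contained sketch. Note that it uses java.util.logging from the JDK rather than the log.error(...) API named above, so the calls differ; the LogThrowableSketch class and the sample messages are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;
import java.util.logging.StreamHandler;

// Hypothetical demo class: captures what a logger writes in each style.
final class LogThrowableSketch {
  static String capture(boolean passThrowable) {
    Logger log = Logger.getAnonymousLogger();
    log.setUseParentHandlers(false); // keep output out of the console
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    StreamHandler handler = new StreamHandler(out, new SimpleFormatter());
    log.addHandler(handler);

    Exception e = new IllegalStateException("db unavailable");
    if (passThrowable) {
      // Message plus throwable: the formatter appends the stack trace.
      log.log(Level.SEVERE, "Failed to fetch logs", e);
    } else {
      // Message only: the exception type and stack trace are lost.
      log.log(Level.SEVERE, e.getMessage());
    }
    handler.flush();
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(capture(false));
    System.out.println(capture(true));
  }
}
```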
|
2/9/2017 5:28:03 AM
executor is reading deleted project dirs. (#894)
Since an executor is not aware of the flows running on other executors, it's possible for one executor to delete a project that others are running. Previously we used a symlink in the execution dir. When multiple executors are running, the symlink is invalidated, and the execution thus fails, once the source project dir is deleted by another executor. In the case where two executors are downloading the same project, the logic still works, since the execution dirs are linked to one project dir and no deletion takes place for new projects. Please note that due to the restrictions of hard links, the project dir and the execution dir have to be on the same partition.
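
The note above implies that hard links replace the symlinks. A minimal sketch of why that helps is shown below: a hard-linked file stays readable after the original path is deleted. The HardLinkSketch class, the directory names, and the file contents are hypothetical, and the sketch links a single file rather than a whole project dir.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo: a hard link outlives deletion of the path it was created from.
final class HardLinkSketch {
  static String readAfterSourceDeleted() {
    try {
      // Both dirs live under one temp dir, so they share a partition.
      Path base = Files.createTempDirectory("hardlink-sketch");
      Path projectDir = Files.createDirectory(base.resolve("project"));
      Path execDir = Files.createDirectory(base.resolve("execution"));

      Path source = projectDir.resolve("job.properties");
      Files.write(source, "type=command".getBytes(StandardCharsets.UTF_8));

      // createLink(link, existing) makes a hard link, not a symlink.
      Path link = execDir.resolve("job.properties");
      Files.createLink(link, source);

      // Simulate another executor deleting the project file.
      Files.delete(source);

      // The data is still reachable through the hard link.
      return new String(Files.readAllBytes(link), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(readAfterSourceDeleted());
  }
}
```

With Files.createSymbolicLink in place of Files.createLink, the final read would fail, because the symlink would point at a path that no longer exists.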
|
2/3/2017 9:22:03 PM
3.15.0
up by next execution. (#892)
When a flow is started manually, the initial enabled/disabled state of each job should come from the last (and only) schedule of the flow if any modifications to job states were made. The current behavior is that the initial state is the state at the time the flow was uploaded. However, when a job state is changed in a manual execution, the state is not persisted.
|
2/3/2017 9:08:56 PM
azkaban/azkaban#869
The current code seems to trigger incorrect SLA alerts and spam users. Azkaban seems to send an alert every 2 min when the job SLA is 60 min. Revert the change until we can investigate the root cause.
|
2/3/2017 8:52:19 PM
Syncing ensures that the messages are in order. However, the syncing also drastically slows down the transaction rate of these messages. It is a blocking call and affects the service significantly. This reverts the change, which was added in #852.
|
2/1/2017 11:22:35 PM
log.error statements in a bunch of places in StatsServlet
|
2/1/2017 10:56:15 PM
project files. (#891)
Previously, the initial value of the active flag in FlowRunnerManager was set to true by mistake.
|