3/26/2019 2:33:03 AM
3.70.0
reasons for FlowRunnerYamlTest failing intermittently.
|
3/24/2019 2:59:07 AM
javaprocess job with empty or non-existing classpath property
- Fail the job with a helpful message if there's nothing to add as the -cp arg (instead of trying to run a java command that's doomed to fail)
- If the classpath property exists but has an empty value, scan the folder for .jar files (as was already the case when the classpath property doesn't exist)
- Improve the error message when ProcessJob setup fails: print the top-level exception message first, not its cause
- For some reason the getClassPathParam() method wasn't being used. It is now used again, with a small modification.
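A minimal sketch of the resolution order described above, using hypothetical helper and variable names (this is not the actual JavaProcessJob code):

    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;

    final class ClasspathResolverSketch {
      // Build the -cp value: prefer the configured classpath entries; if the
      // property is missing or empty, fall back to scanning the working
      // directory for .jar files; if still nothing, fail with a clear message.
      static String resolveClasspath(List<String> configuredEntries, File workingDir) {
        List<String> entries = new ArrayList<>();
        if (configuredEntries != null) {
          for (String entry : configuredEntries) {
            if (entry != null && !entry.trim().isEmpty()) {
              entries.add(entry.trim());
            }
          }
        }
        if (entries.isEmpty()) {
          // classpath property missing or empty: scan the folder for .jar files
          File[] jars = workingDir.listFiles((dir, name) -> name.endsWith(".jar"));
          if (jars != null) {
            for (File jar : jars) {
              entries.add(jar.getName());
            }
          }
        }
        if (entries.isEmpty()) {
          // Fail fast with a helpful message instead of launching a java
          // command that is doomed to fail.
          throw new IllegalStateException(
              "No classpath entries configured and no .jar files found in " + workingDir);
        }
        return String.join(File.pathSeparator, entries);
      }
    }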
|
3/22/2019 6:31:24 PM
(#2150)
Propagate flow and job properties to event metadata
|
3/20/2019 7:01:24 PM
for new Dispatching Logic (#2140)
* Implement "flowPriority" (at the time of polling) feature for new dispatching logic
"flowPriority" is one of the execution options of a flow, and it only takes effect if set by an Azkaban admin. By default all flows have a priority of 5, but if a higher number is specified from the UI that flow will be dispatched for execution first.
With the new Dispatching Logic, setting a high "flowPriority" means that flow will be polled first by executors. Note that this means "polled first", not "executed first": on the executor server there is a queue that holds executions once the maximum number of threads available/set up to run flows is reached, and this queue is NOT a priority queue.
To implement this feature for the new Dispatching Logic, a new column "flow_priority" is added to the "execution_flows" table to speed up access to this info (currently flowPriority is stored in the "flow_data" longblob column). When new executions are inserted into the "execution_flows" table, the user-specified or default flow priority is inserted as well. Later, executors poll executions sorted by flow_priority DESC, submit_time ASC, exec_id ASC (see the sketch after this list).
* Address review comments
* Set flow_priority column to TINYINT type.
* Add unit tests
* Set submitTime and status attributes outside uploadExecutableFlow method
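A minimal sketch of the polling order described above, expressed as a plain JDBC query. The table and column names come from this change; the status filter and the surrounding code are illustrative assumptions, not the actual DAO code.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    final class ExecutionPollingSketch {
      // Return the id of the next execution to pick up: highest flow_priority
      // first, then earliest submit_time, then lowest exec_id.
      static Integer pollNextExecutionId(Connection conn) throws SQLException {
        String sql = "SELECT exec_id FROM execution_flows "
            + "WHERE status = ? "
            + "ORDER BY flow_priority DESC, submit_time ASC, exec_id ASC "
            + "LIMIT 1";
        try (PreparedStatement stmt = conn.prepareStatement(sql)) {
          stmt.setInt(1, 20); // hypothetical numeric code for the queued/ready status
          try (ResultSet rs = stmt.executeQuery()) {
            return rs.next() ? rs.getInt("exec_id") : null;
          }
        }
      }
    }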
|
3/19/2019 7:45:09 PM
(#2144)
We should leverage the metrics data types available in https://metrics.dropwizard.io/3.1.0/getting-started/ as much as possible.
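For context, a small example of the Dropwizard metric types from that guide; the metric names used here are illustrative, not Azkaban's actual metric names.

    import com.codahale.metrics.Meter;
    import com.codahale.metrics.MetricRegistry;
    import com.codahale.metrics.Timer;

    public class DropwizardMetricsExample {
      public static void main(String[] args) {
        MetricRegistry registry = new MetricRegistry();

        // Meter: tracks the rate of events over time (1/5/15-minute rates).
        Meter submissions = registry.meter("flow-submissions");
        submissions.mark();

        // Timer: tracks both the rate of an operation and the distribution
        // of its duration.
        Timer setupTimer = registry.timer("flow-setup-timer");
        Timer.Context context = setupTimer.time();
        try {
          // ... the work being timed goes here ...
        } finally {
          context.stop();
        }
      }
    }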
|
3/18/2019 9:41:51 PM
project cache size is an absolute value indicating how large the project cache can be, in MB. This PR makes it a percentage-based value, i.e. a percentage of the total size of the disk partition that the project cache lives on. So when the disk partition is manually resized, this property won't need to be adjusted.
This PR also:
1. adds more detailed logging
2. uses a Set instead of a List to keep the project directories to delete, to guarantee no duplication.
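A minimal sketch of the percentage-based limit described above, assuming hypothetical names (not the actual Azkaban configuration keys or classes):

    import java.io.File;

    final class ProjectCacheSizeSketch {
      // Translate a percentage of the disk partition into an absolute byte
      // limit, so the setting keeps working when the partition is resized.
      static long cacheLimitBytes(File projectCacheDir, double percentOfPartition) {
        long partitionTotalBytes = projectCacheDir.getTotalSpace();
        return (long) (partitionTotalBytes * (percentOfPartition / 100.0));
      }
    }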
|
3/10/2019 10:54:49 PM
format (#2143)
|
3/5/2019 11:36:59 PM
Move flowSetupTimer from CommonMetrics to ExecMetrics
* indentation
* code review comments: variable names swapped
* shorten wait
* resolve merge conflicts
* modify tests to use assertj
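For reference, the AssertJ assertion style the tests were moved to; the asserted value below is a made-up placeholder, not the actual test code.

    import static org.assertj.core.api.Assertions.assertThat;

    public class AssertjStyleSketch {
      public void flowSetupTimerIsUpdated() {
        long timerCount = 1L; // placeholder for something like the flow setup timer's count
        assertThat(timerCount).isEqualTo(1L);
      }
    }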
|
3/5/2019 3:47:15 PM
to parallelize test runs. However, in practice enabling parallel tests actually makes the full Azkaban test run slower on some machines. There hasn't been any noticeable speed-up on Travis. Hence, changing back to no parallel tests.
Instead of removing the setting entirely, it is kept with a value of 1 and a warning comment is added, so that the warning will be seen before anyone re-enables parallelism in the future without proper validation.
./gradlew cleanTest test took 3m 42s on my machine. That seemed way too long. I had been running the tests of each module in IDEA and the total time was nowhere near that, which led me to think that the command-line Gradle runner must be doing something suboptimal.
I knew that IDEA runs all tests sequentially, so I tried disabling parallelism entirely for the command-line test runner as well. It was ~4 times faster:
maxParallelForks = 1
-> ./gradlew cleanTest test 49s
I have 4 cores on my machine so I also gave this a try, but not much help:
maxParallelForks = 4
-> ./gradlew cleanTest test 2m 50s
To make sure it's not just random fluctuation I tested the "fix" once more:
maxParallelForks = 1
-> ./gradlew cleanTest test 1m 4s
And one more go with the original setting:
maxParallelForks = 12
-> ./gradlew cleanTest test 2m 0s
It could be that the forked execution was even slower when my machine had other heavy workloads going on (hence the big variation from 2m to almost 4m).
Maybe the tests interfere with each other, or maybe the additional forking is just expensive because every test fork needs to load some heavy classes / static members again. Whatever the reason is, there's no point in having parallelism on if it makes the total build time longer.
|