3/26/2019 10:11:38 PM
#2113.
Quote from @ypadron-in
This change exposed unit test failures when tests are run from the macOS console with the ./gradlew cleanTest test command. We will revert it temporarily until we fix the tests.
Failing tests: createErrorEmail, createFirstErrorMessage, createSuccessEmail and testDispatchMultipleRetries
|
3/26/2019 2:33:03 AM
reasons for FlowRunnerYamlTest failing intermittently.
|
3/1/2019 9:20:17 PM
completes (#2133)
This PR makes the deactivation process block until all flow preparation work finishes.
When deploying a new executor, the old running executor is deactivated before the new one is activated, and only one executor is allowed to delete or hard-link project directories, to avoid a race condition (see #2130). Making the deactivation process block until flow preparation work finishes guarantees that the old executor won't access FlowPreparer#setup after deactivation.
|
1/22/2019 5:21:44 PM
AZNewDispatchingLogic - Executor failure detection
* Fix typo
* Address comments.
|
1/16/2019 3:42:55 PM
caused the jobs to hang.
Commits to revert:
-Remove unused code (661351f). This is the commit that's causing the issue. It eliminated code that read from the standard output of a subprocess but did nothing with it. That caused the subprocess’ standard output to not be consumed, so when the pipe was full the process blocked trying to write to standard output.
-Log kill command failures when killing jobs (734f01b)
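The blocking described above can be avoided by always draining the child's output, even when the content is thrown away. The following is a minimal sketch under that assumption; `StdoutDrainer` and its methods are hypothetical names, not the code that was removed or restored by these commits.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper (not from the Azkaban code base): keeps a child's output pipe empty. */
final class StdoutDrainer {

  /** Reads the stream to EOF on a daemon thread and discards the bytes. */
  static Thread drain(final InputStream in) {
    final Thread t = new Thread(() -> {
      final byte[] buf = new byte[8192];
      try {
        while (in.read(buf) != -1) {
          // Intentionally discard; the only goal is that the writer never blocks.
        }
      } catch (final IOException e) {
        // Stream closed, e.g. because the process was destroyed; nothing to do.
      }
    }, "stdout-drainer");
    t.setDaemon(true);
    t.start();
    return t;
  }

  /** Runs a command, drains its merged stdout/stderr, and returns the exit code. */
  static int runAndDrain(final String... command) throws IOException, InterruptedException {
    final Process p = new ProcessBuilder(command).redirectErrorStream(true).start();
    final Thread drainer = drain(p.getInputStream());
    final int exit = p.waitFor();
    drainer.join();
    return exit;
  }
}
```

On Java 9 and later, `ProcessBuilder.Redirect.DISCARD` is another way to get the same effect without a reader thread.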
|
1/7/2019 5:44:13 PM
to make it more visible to azkaban users, as we have heard from internal users that they didn't know about some of the useful azkaban features (like auto-retry on job failure) until they had done a fair amount of research.
|
12/12/2018 4:05:38 PM
Fix url in Job History page’s link to Job page
The parameter ‘flow’ in the url to Job page /manager?project=projectName&flow=flowId&job=jobId needs to be parsed following this pattern: flowRootName[,embeddedFlowName:embeddedFlowPath]*
* Flow link on Job History page refers to immediate flow instead of root flow
* Unit test for parsing of immediate flow id
* Fix according to review comments
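As an illustration of the pattern flowRootName[,embeddedFlowName:embeddedFlowPath]*, here is a minimal parsing sketch. The class `FlowParam` is hypothetical, and it assumes that each embedded entry is split at its first colon (so the path part may itself contain colons); the actual servlet code may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical parser for the 'flow' url parameter; not the actual Azkaban implementation. */
final class FlowParam {
  final String rootName;
  /** Each entry is {embeddedFlowName, embeddedFlowPath}, in the order given in the url. */
  final List<String[]> embedded;

  private FlowParam(final String rootName, final List<String[]> embedded) {
    this.rootName = rootName;
    this.embedded = embedded;
  }

  static FlowParam parse(final String flowParam) {
    final String[] parts = flowParam.split(",");
    final List<String[]> embedded = new ArrayList<>();
    for (int i = 1; i < parts.length; i++) {
      // Assumption: the first ':' separates the name from the path.
      final int colon = parts[i].indexOf(':');
      if (colon < 0) {
        throw new IllegalArgumentException("Expected name:path but got: " + parts[i]);
      }
      embedded.add(new String[] {parts[i].substring(0, colon), parts[i].substring(colon + 1)});
    }
    return new FlowParam(parts[0], Collections.unmodifiableList(embedded));
  }
}
```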
|
|
12/4/2018 3:59:25 PM
add cache hit ratio metrics for project dir
* in progress
* address comment
|
11/9/2018 10:05:21 PM
(#2022)
|
|
10/25/2018 11:40:03 PM
in shutting down mode when its shutdown API is called. Executor will wait for all the ongoing flows to complete then shut itself down. Web server talks to executor periodically to fetch the status of each flow in its running flow cache. This PR accommodates a grace period between completion of all flows and killing the executor process, in order to allow web server to finalize flow status updating.
|
9/24/2018 3:58:10 PM
(#1963)
This reverts commit b70ee97.
Reportal has to work with internal version of presto JDBC instead of OSS version.
|
9/19/2018 10:17:33 PM
jdbc is required by presto reportal jobtype.
|
9/4/2018 3:20:11 PM
introduced a bug: execution history cannot be displayed in the executions tab of any project.
The issue is that javascript cannot find ${size}. The fix is simple: add "size" to the context so that javascript is able to locate it.
|
8/31/2018 5:56:04 PM
intention of the original test is to verify that if the condition is null, it takes the default value all_success. Modify the test case to make it clear and stable.
|
8/30/2018 7:43:58 PM
jobtype for reportal. User is able to run presto query through reportal UI with a shared presto account defined in jobtype plugin config. Presto account password could be encrypted and kept in the config file, and it requires a key path file which could be defined in jobtype plugin config to decrypt.
|
8/27/2018 2:23:16 PM
workflow - validate conditions
* Address comments
|
8/22/2018 2:28:51 PM
(#1933)
Flow trigger classes will be serialized and persisted in quartz tables and deserialized back.
If serialVersionUID is not defined explicitly, the JVM generates a default id at runtime which depends on the compiler implementation, and this can result in unexpected InvalidClassExceptions during deserialization if a new deployment has a different serialVersionUID from the old deployment whose objects have been persisted in the db.
Why choose those numbers(-1330280892166841227L ...)?
To be backward compatible since those are the serialVersionUID of first deployment of flow trigger classes and already have been persisted in database.
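A minimal sketch of the idea, assuming a hypothetical `TriggerSketch` class in place of the real flow trigger classes: the id is pinned in the source, so objects serialized by one build can still be read by a later build of the same class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical stand-in for a flow trigger class; not the real Azkaban type. */
final class TriggerSketch implements Serializable {
  // Pinned explicitly so that recompiling or editing the class does not change the id.
  private static final long serialVersionUID = -1330280892166841227L;

  final String name;

  TriggerSketch(final String name) {
    this.name = name;
  }

  /** Serializes the object the way a persistence layer would before storing it. */
  static byte[] toBytes(final TriggerSketch t) throws IOException {
    final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(t);
    }
    return bytes.toByteArray();
  }

  /** Deserializes; fails with InvalidClassException if the stored id differs from the class's id. */
  static TriggerSketch fromBytes(final byte[] data) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return (TriggerSketch) in.readObject();
    }
  }
}
```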
|
|
8/9/2018 1:36:42 PM
(#1904)
Hadoop/YARN has support for application tags, which MR/Spark can include in its job submission. This PR will add some azkaban metadata into MR config so MR can include them in the YARN application tags when it submits to YARN.
|
8/8/2018 7:12:40 PM
size file if it doesn't exist (#1911)
the method updateDirSize is already verified by FlowPreparerTest#testSetupProject, so unit test is not needed.
|
7/19/2018 2:14:40 PM
(#1861)
This PR proposes a new toolset to do Azkaban documentation. We use Sphinx and ReStructuredText to write down docs, and publish it to readTheDocs. README is included to educate users how to develop the documentation.
The Getting Started section is rewritten and brought up to date.
|
7/12/2018 6:02:38 PM
has a conflict with our internal gradle system. So replacing it with a different one.
|
7/10/2018 2:55:17 PM
trigger (#1813)
Upon project uploading, azkaban should be able to schedule the associated flow trigger. This is already achieved when project upload is done from ajax API call(#1631). This PR enables scheduling flow trigger when uploading project from restli endpoint.
|
7/9/2018 7:14:20 PM
Add a test to run a v2 flow file with the new DAG engine
This is a step towards building an integration test for the new DAG engine.
Refactored the DagBuilder API to make it easier to use.
Instead of using a DagBuilder class to link nodes, use the name of the
nodes directly.
This facility can also be used in the future to build the tools to run
flows locally for testing purposes.
* Fix a copy and paste error.
Should check non null of the dagProcessor parameter.
|
6/20/2018 5:33:56 PM
change configurable key name
change default value for max cache size to 128GB which is a more standard number.
change default value for stop cleanup threshold to 60 to allow more space to be freed.
change data type of stop cleanup threshold from double to int.
rename the parameter (projectDirMaxSizeInMB -> projectDirMaxSizeInMb) to align with coding standard.
|
6/5/2018 8:46:11 PM
(#1790)
This change moves VM files to resources. Tested in staging cluster.
|
6/1/2018 8:38:04 PM
vulnerability.
* Fix typo.
* Address comments.
* Address comments.
|
5/17/2018 5:07:37 PM
to discover the node corresponding to a subdag when
the subdag starts to run.
The find node by name method is no longer needed for now.
|
4/7/2018 12:36:16 AM
should be able to use a custom job log format such as GMT timestamps
|
4/6/2018 5:38:02 PM
back azkaban/azkaban-plugins#278. People are still calling out for this feature. Will make it configurable in a follow-up PR, so that reportal has the flexibility to enable or disable this feature.
|
3/28/2018 10:07:06 PM
was changed to use POST for all requests, but I missed that azkaban-web also makes calls to /serverStatistics, /jmx & /stats via the gateway, not just /executor.
The problem wasn't seen in manual tests with AzkabanSingleServer because it doesn't use multi-executor mode.
|
3/23/2018 4:18:43 PM
to using POST with form params instead of GET with URL params, so that the number of execution ids passed can be longer.
Also deleting some unused code.
Tested manually that it works like this:
run AzkabanSingleServer in IDEA with debugger
start a flow via the UI (http://localhost:8081/)
set breakpoint to check that requests come to azkaban.execapp.ExecutorServlet and are handled successfully
set breakpoint to check that ExecutorManager can get the execution updates
(initial PR & discussion in #1655)
Builds on #1707 to validate the fix.
|
3/2/2018 9:08:36 PM
the name consistent with the old names to minimize changes in other places that still reference the old names.
|
2/27/2018 5:51:37 PM
(#1667)
changed error message from "max wait min must be longer than X min"
to "max wait min must be at least X min(s)"
|
2/12/2018 4:38:53 PM
instance servlet and associated template.
The page consists of currently running trigger instances and recently finished ones.
|
1/29/2018 10:24:23 PM
(#1619)
This PR added two classes to handle status change of trigger instance and dependency instance:
DependencyInstanceProcessor/TriggerInstanceProcessor, handling the work that follows a status update. E.g., execute a flow when a trigger instance becomes successful, send an email when a trigger instance is cancelled, persist the status update into the database.
|
1/10/2018 6:04:48 PM
(#1590)
A couple of production issues warned us that we should enforce the MAX number of concurrent executions given a flow executable. In case people accidentally schedule flows per minute or endlessly submit flows from the client side, this PR proposes to implement a simple quota to prevent it happening.
Unit Test is added.
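A simple quota of this kind can be sketched as a per-flow counter. The class and method names below are hypothetical and the limit is passed in by the caller; this is only an illustration of the idea, not the code added by the PR.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-flow quota; not the actual Azkaban implementation. */
final class FlowConcurrencyQuota {
  private final int maxConcurrentPerFlow;
  private final Map<String, Integer> running = new HashMap<>();

  FlowConcurrencyQuota(final int maxConcurrentPerFlow) {
    this.maxConcurrentPerFlow = maxConcurrentPerFlow;
  }

  /** Returns true and counts the execution if the flow is still under its limit. */
  synchronized boolean tryStart(final String flowId) {
    final int current = this.running.getOrDefault(flowId, 0);
    if (current >= this.maxConcurrentPerFlow) {
      return false;
    }
    this.running.put(flowId, current + 1);
    return true;
  }

  /** Releases one slot; the entry is dropped when no execution of the flow remains. */
  synchronized void finish(final String flowId) {
    this.running.computeIfPresent(flowId, (id, n) -> n > 1 ? n - 1 : null);
  }
}
```

A submission path would call `tryStart` before dispatching and reject the request when it returns false, then call `finish` once the execution ends.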
|
1/2/2018 9:19:17 PM
investigation needed for better fix. (#1596)
|
|
12/18/2017 7:12:17 PM
hard linking (#1583)
Linux restricts the length of shell arguments, so the command that hard-links a project dir containing many subdirs and files is likely to exceed the limit, causing flow setup failure. The fix is to replace that logic with the Java native file API.
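Hard-linking a directory tree with the Java file API can be sketched as follows. `HardLinkTree` is a hypothetical name, and the sketch assumes source and target live on the same file system (a requirement for hard links); it is not the code from the PR.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical helper: mirrors a directory tree using hard links instead of a shell command. */
final class HardLinkTree {

  static void linkTree(final Path source, final Path target) throws IOException {
    // Files.walk visits a directory before its contents, so parents exist before children.
    try (Stream<Path> paths = Files.walk(source)) {
      for (final Path src : (Iterable<Path>) paths::iterator) {
        final Path dst = target.resolve(source.relativize(src));
        if (Files.isDirectory(src)) {
          Files.createDirectories(dst);
        } else {
          // First argument is the new link, second is the existing file.
          Files.createLink(dst, src);
        }
      }
    }
  }
}
```

Because each link is created by its own API call, the number of files no longer affects any command-line length.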
|
11/15/2017 8:59:58 PM
provider (#1556)
* commit 1
* commit 2
* remove redundant code
This PR is a follow-up of #1552 . We create a job setting "azkaban.job.enable.ssl". When it is enabled, we use Java reflection to initialize the credential provider object, and run the method.
|
11/7/2017 5:30:49 PM
Interface in order to let custom credential provider register user's secret keys.
One sample use case: Today we don't have a secure way to pass on credentials into Hadoop. This interface allows Azkaban to fetch user's credential from external system (like certificate authority), and add it into Hadoop Job Context. Then submitted hadoop job will be able to use it.
Today we only have one method, but could have more in future.
This PR only proposes an interface, and the next step will be to use this interface in HadoopSecurityManager_H_2_0 class.
|
10/13/2017 12:22:29 PM
tests (#1536)
|
9/29/2017 6:49:41 PM
not run at all, but it has been marked as killed, the right thing to do is set the status to KILLED. Otherwise the flow would get stuck because this job would send the JOB_FINISHED event with the unfinished status: KILLING.
|
9/26/2017 6:57:59 PM
hadoop gradle dependencies to compileOnly (#1499)
* build: change hadoop gradle dependencies to compileOnly instead of including them as runtime dependencies
* Fix tests broken due to moving to compileOnly for hadoop dependencies.
Fix was to use testCompile to include hadoop dependencies.
Also fixed transitive dependencies from gobblinKafka dependency
* Fix azkaban-solo-server by including hadoop dependencies in build.gradle (#1503)
* Fix cherry-pick by removing unnecessary gobblin-kafka dependency
|
9/22/2017 8:42:25 PM
in build.gradle (#1503)
|
|
8/22/2017 6:07:05 PM
plugin adds an errorprone configuration that automatically uses the latest release of error-prone. You can override it to use a specific version "
see https://github.com/tbroyer/gradle-errorprone-plugin
It appears that by latest version it will use the last locally cached
version. This caused inconsistency in builds.
## Fix:
Specify a fixed version
Will update to the latest version after the internal Artifactory
is updated with the latest version.
## Test:
Deleted the gradle cache and ran the build again and verified in the
console output that the specified version was downloaded.
|
8/22/2017 5:25:23 PM
PR is a follow-up of #1345. Last but not least, I'm moving assignExecutor and UnAssignExecutor to a new DAO file.
The reason I don't create a new test file:
Not sure how many resources an in-memory h2 database consumes. We could use only one test class (in-memory h2) to do testing.
|
|
8/1/2017 3:14:21 PM
commit d1b836a.
Conflicts:
azkaban-common/src/main/java/azkaban/executor/Status.java
azkaban-exec-server/src/main/java/azkaban/execapp/JobRunner.java
From @jamiesjc: When we kill the flow immediately after it starts in our integration test, it couldn't be killed due to some race condition. Users will see on the execution page that the job is in KILLING status but it actually never gets killed. And users cannot click kill button again during the KILLING period.
At the time when we kill, the process might not have started yet or the jobRunner has not yet been added to the activeJobRunners.
You can check more details in the PR description: #1289. We are still working on the fix.
The intention is to reintroduce this commit once the underlying bug is fixed.
|
7/10/2017 7:30:10 PM
(#1235)" (#1269)
The PR #1235 (commit 68e507208941659026ac15a1b242364b25f5fa31) introduces a bug which breaks Azkaban's capability to handle multiple HCAT servers. More specifically #1235 introduces a factory class for the Hive Client and caches the `HiveConf` during construction. When fetching delegation tokens from other HCAT servers, it accidentally uses the cached configuration instead of creating a new HiveConf specific to the target HCAT server. Hence it ends up fetching tokens from `HIVE_HOME` configured HCAT server instead of the target HCAT server leading to SASL errors.
Reverting the change. Confirmed that `HadoopSecurityManager_H_2_0` is identical to the version prior to #1235
|
6/22/2017 4:12:52 PM
reverts PR #1225 (commit e1f6d3942b29d917aa2afdd94cd2f7ce51a6f9e3)
This plugin breaks the internal build.
|
6/13/2017 8:34:04 PM
authorizing prior to any HDFS call, the HDFS storage code does not seem to work without an existing Kerberos session. The problem is that the `FileSystem.get()` API needs to be called after a successful UGI login. Currently that is not the case. The Hadoop code caches the logged in user at the time of creating the File System object and this is not reflected in the stack traces making it difficult to debug.
Fix: ensure UGI auth prior to creation of file system object.
|
6/12/2017 5:01:58 PM
reverts commit ffd9ebd.
The problem is not in the "fails fast" strategy per se, but there's another bug that would cause a job in a subflow to fail instead of running it, when two subflows contain jobs of the same name. It's because current DB record key for executable node is (execution id, job id, attempt). So the problem only happens when two subflows contain two jobs with the same name.
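The key collision can be illustrated with a small model. The class below is hypothetical and stands in for the DB table: it treats (execution id, job id, attempt) as the primary key and reports a rejected insert when the key already exists.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of the executable node table keyed by (execution id, job id, attempt). */
final class NodeKeySketch {
  private final Set<List<Object>> primaryKeys = new HashSet<>();

  /** Returns false when a row with the same key already exists, like a rejected insert. */
  boolean insert(final int execId, final String jobId, final int attempt) {
    return this.primaryKeys.add(Arrays.<Object>asList(execId, jobId, attempt));
  }
}
```

With this key, a job named "cleanup" in one subflow and a job named "cleanup" in another subflow of the same execution map to the same row, because the key carries nothing that identifies the enclosing subflow.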
|
6/6/2017 10:45:12 AM
[MutableConstantField] Constant field declarations should use
the immutable type (such as ImmutableList) instead of the general
collection interface type (such as List)
private static final Set<String> MEM_KEYS = ImmutableSet
^
(see http://errorprone.info/bugpattern/MutableConstantField)
|
|
6/1/2017 3:20:21 PM
(#1139)
In this patch, I added two methods, setTriggerCondition and setExpireCondition, to the Trigger class. The reason is that one of our products relying on the Azkaban library needs to modify a Trigger object by inserting a newly constructed ExpireCondition.
|
5/30/2017 2:12:18 PM
(#1108)
* fix AZKABAN_OPTS handling in azkaban-{web,executor}-start.sh
If the value contains spaces, the comparison causes an error
to be logged. If the reference to AZKABAN_OPTS is wrapped in
quotes, spaces are handled properly.
|
|
5/15/2017 6:36:48 PM
improve logging
|
5/8/2017 6:18:21 PM
finis… (#1062)
* Fix bugs in resubmitting flow and fetching flow logs after flow finishes.
After removing the runningFlows cache from the web server, running flow info is fetched from the DB directly by ExecutorManager. This includes the flow updateTime, which is updated by FlowRunnerManager when the flow finishes. Currently the UpdaterThread in ExecutorManager sends update requests to the executor to get updated flow info. Since updateTime is now synced between web server and executor (both read and write it through the DB instead of a cache), the previous logic of comparing updateTime should be changed.
TODO: updaterThread in executorManager should be removed in the future to simplify the logic. handleAjaxUpdateRequest() should be deprecated as well.
|
5/3/2017 8:40:33 PM
String is more generic and gives flexibility to the storage implementation to structure their own keys.
For example, the storage layer may use a JSON string as an identifier. This is not currently a use case, but going forward it would probably be better to not impose an unnecessary restriction on a top level API.
Tested separately on dev clusters.
|
5/2/2017 7:36:33 PM
(#1049)
This is required by the internal build system and it needs to be fixed
prior to clean up of this code
|
4/18/2017 1:48:17 PM
three errors.
"installedVersions.remove(versionKey);"
The type of the value doesn't match the actual type defined in the list.
Upon close inspection, it turns out that if the entry were
correctly removed, it would cause a logical error: since the
for loop accesses the list by index, the list should not be modified
within the same loop.
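Both pitfalls can be reproduced in a few lines. The sketch below uses hypothetical names and plain string and number lists rather than the real installedVersions structure: `removeWithWrongType` shows a remove call that compiles but never matches, `buggyDrop` shows the skipped element when removing inside an indexed loop, and `drop` shows one safe alternative.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration; the real code operates on a different collection. */
final class RemoveWhileIterating {

  /** List.remove(Object) with a value of the wrong type compiles but removes nothing. */
  static boolean removeWithWrongType(final List<Long> ids) {
    final Object key = Integer.valueOf(1); // an Integer never equals a Long
    return ids.remove(key);
  }

  /** Buggy: removing at index i shifts the next element into slot i, which the loop then skips. */
  static List<String> buggyDrop(final List<String> versions, final String prefix) {
    final List<String> result = new ArrayList<>(versions);
    for (int i = 0; i < result.size(); i++) {
      if (result.get(i).startsWith(prefix)) {
        result.remove(i);
      }
    }
    return result;
  }

  /** Safe: removeIf handles the iteration and the removal together. */
  static List<String> drop(final List<String> versions, final String prefix) {
    final List<String> result = new ArrayList<>(versions);
    result.removeIf(v -> v.startsWith(prefix));
    return result;
  }
}
```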
|
4/10/2017 5:36:04 PM
#975 reported a bug that the name of moment-timezone js is not
right, and gradle can not fetch it to distribution package.
Tried a ligradle build, and figured out that the latest moment-timezone js 0.5.13 is not compatible with 0.5.5, since the name of the built js file is not the same.
We use `^` to specify compatible js libraries (see details in https://docs.npmjs.com/misc/semver). In this change, we disable it in order to secure the repo.
|
4/4/2017 7:59:23 PM
(#955)
If a flow runs for longer than 10 days (configurable), kill it.
|
|
3/7/2017 2:46:16 PM
Code Refactor and adding key metrics
This commit adds and fixes the following metrics:
* Flow failure rate
* project zip upload duration log
* correct NumRunningFlows in web server
* add Jetty thread pool metrics as a web server performance monitor
* add DB connection time as a metric for DB connection pool performance
|
2/3/2017 9:22:03 PM
up by next execution. (#892)
When manually starting a flow, the initial disabled/enabled state of each job should come from the last (and only) schedule of the flow, if any modifications to job states were made. The current behavior is that the initial state is the state at the time the flow was uploaded. However, when a job state is changed in a manual execution, the state is not persisted.
|
|
1/4/2017 10:36:55 PM
jobtypes. (#866)
|
12/5/2016 6:51:29 PM
under utils that will tie out and err to log4j. This
class does this in the main functions of AzkabanWebServer and
AzkabanExecServer, so that all out going messages will be caught by
log4j.
|
12/1/2016 6:40:01 PM
current LI build infrastructure generates version field in a lazy fashion. More specifically, the version value is not immediately available during the root project configuration phase. While this works seamlessly in the open source build, it generates incorrect version number in the jar manifest. Solution was to simply make this configuration lazy using afterEvaluate.
|
11/30/2016 12:28:26 AM
is allowed to perform deletion of projects
This is the prerequisite for making the project directory shared among multiple deployments
|
11/29/2016 7:45:14 PM
(#820)"
This reverts commit 6207561922380f4dbed5f09e7b390e3a39d6779a.
|
11/28/2016 3:01:16 PM
error output to a log file correctly (#833)
Previously the error output was still sent to the stdout.
|
11/16/2016 10:31:41 PM
a chunk
|
|
11/15/2016 4:34:18 PM
the deleted files are being written into Azkaban jars. These
do not belong here, and can cause hidden configurations to be used.
Consequently, azkaban-solo-server will not work as it relies on these
files, so solo-server was given its own conf directory.
Also, h2 was added as a runtime dependency for solo-server, which
happened to be missing for the past little while.
|
11/6/2016 8:57:38 PM
web server (#758)
- Add method in WebUtils class to parse real client ip given HTTP headers map and client remote address
- Add logic into abstract login servlet to correctly validate client IP vs IP stored for a session
- Add logic in REST API to do the same
- Unit tests for the above
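A minimal sketch of such a parsing method is shown below. The class name, the method name, and the choice of the `X-Forwarded-For` header are assumptions made for illustration; the source only says that the real client ip is derived from the HTTP headers map and the remote address.

```java
import java.util.Map;

/** Hypothetical sketch; the header name and method shape are assumptions, not the WebUtils code. */
final class ClientIpSketch {
  static final String FORWARDED_FOR = "X-Forwarded-For"; // assumed header set by a proxy

  /** Returns the first address listed in the forwarding header, else the socket's remote address. */
  static String realClientIp(final Map<String, String> headers, final String remoteAddr) {
    for (final Map.Entry<String, String> e : headers.entrySet()) {
      // Header names are case-insensitive in HTTP.
      if (FORWARDED_FOR.equalsIgnoreCase(e.getKey()) && e.getValue() != null) {
        final String value = e.getValue();
        final int comma = value.indexOf(',');
        final String first = (comma < 0 ? value : value.substring(0, comma)).trim();
        if (!first.isEmpty()) {
          return first;
        }
      }
    }
    return remoteAddr;
  }
}
```

A forwarding header can be set by any client, so relying on it is only reasonable when the server sits behind a proxy that overwrites it.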
|
10/31/2016 10:40:05 PM
(#786)
* Warn users with a pop-up window if the cron expression string is not correct
* handle both 6 and 7 fields cases
* replace error code by Constant String
* add unit test
* update version number
|
|
10/18/2016 1:53:54 AM
in the distribution (#772)
Since our internal build system uses a custom build directory, the distribution created was missing some files. Changing to $buildDir removes such issues.
|
10/10/2016 7:03:56 PM
(#762)
Token toString() and getIdentifier() may reveal sensitive information that should not be visible in logs, if the logs are to be publicly visible. Hence, removing logging statements that include this information.
|
9/26/2016 5:42:53 PM
(#751)
|
|
7/29/2016 1:16:31 PM
versioning and refactor of build scripts
Added plugin group: 'com.cinnober.gradle', name: 'semver-git', version: '2.2.0' to enable semantic versioning of azkaban artifacts.
Refactor: Moved all subproject build code to respective projects.
* Build versioning and refactor of build scripts
Incorporating review comments
- restored .gitignore filter on build
- switching semver plugin to 2.2.1 which works with Java 7
|
|
9/21/2015 9:39:59 PM
up individual krb5cc files for each individual job. prevents c…
|
|
10/20/2014 3:54:40 PM
EscapeTool to be able to escape certain data elements to prevent XSS #339
|
8/1/2014 5:42:20 PM
NPE when accessing non-existent flow - HADOOP-6388
|
|
4/30/2014 2:41:09 PM
expression
|
4/22/2014 12:35:53 AM
accessor to the AzkabanWebServer for plugins
|
2/25/2014 7:19:17 PM
release-2.5
Conflicts:
build.xml
|
2/18/2014 10:48:30 PM
for better hadoop and hive jar/conf classpath resolution
|
2/6/2014 2:50:17 AM
#143 and 146- Cancel related bugs
|
2/5/2014 3:58:11 AM
schedule buttons on Flow Summary
|
2/4/2014 3:41:59 AM
which version of azkaban is built.
|
1/29/2014 1:21:58 AM
don't exist, pull from the build.properties.
|