10/10/2017 8:19:51 PM
a problem where a queued job would disappear from the job list view when its state changed to QUEUED.
Currently, you can access job logs for current and failed attempts in the Job List tab of an execution.
If a job has failed and has retries, it goes into the QUEUED state while sleeping before the next attempt.
If you want to see why any of the previous attempts failed, you can right-click the grey attempt bar.
To see how long the job is going to sleep, you can check the job log.
Without this fix, you can't do either of those while the job is sleeping, because the job is hidden whenever its state is QUEUED.
Also, if a job is in the QUEUED state but hasn't started running yet (i.e. it is blocked by a pipeline), the way to find out why it's queued is to check the job log. If the job log can't be reached via the Job List tab, that is not easily accomplished.
|
10/10/2017 5:05:27 PM
This PR proposes bringing Quartz to Azkaban. Quartz Scheduler provides three kinds of job stores: RAMJobStore, JDBCJobStore, and TerracottaJobStore. To persist job state outside of application memory so that it survives downtime, we recommend the JDBCJobStore. Users can specify the mode in the Quartz properties.
This patch creates the Quartz server and manages the Quartz API (e.g., start, pause, shut down) in the QuartzScheduler class. In unit tests, we use an in-memory H2 database to test the JDBC-backed Quartz directly.
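For reference, a minimal quartz.properties sketch that selects the JDBC job store backed by an in-memory H2 database might look like this (the instance name and data-source settings are illustrative, not Azkaban's actual defaults):
```
# Illustrative quartz.properties sketch; values are not Azkaban's real defaults.
org.quartz.scheduler.instanceName = AzkabanQuartzScheduler
org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool
org.quartz.threadPool.threadCount = 3
# Persist job state in a database instead of RAM.
org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.StdJDBCDelegate
org.quartz.jobStore.dataSource = quartzDS
org.quartz.dataSource.quartzDS.driver = org.h2.Driver
org.quartz.dataSource.quartzDS.URL = jdbc:h2:mem:quartz;DB_CLOSE_DELAY=-1
org.quartz.dataSource.quartzDS.user = sa
org.quartz.dataSource.quartzDS.password =
```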
|
10/6/2017 3:21:51 PM
1. Add flow_id to the key set of the execution_jobs table.
2. Drop two unnecessary indexes.
Previously, the primary key of execution_jobs was (execution_id, job_id, attempt).
This assumed that no two jobs share the same name within a flow. That no longer holds once embedded flows come into the picture, since they allow a user to embed a smaller flow in multiple places inside a bigger flow.
With the new key set, the two indexes become subsets of the key, so we can drop them.
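A sketch of the corresponding migration, assuming a MySQL-style schema (the column order of the new key is illustrative):
```
-- Illustrative sketch; the actual Azkaban migration script may differ.
ALTER TABLE execution_jobs
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (execution_id, job_id, flow_id, attempt);
```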
|
10/5/2017 6:43:18 PM
compileOnly, which caused the web-server and exec-server to fail to initialize Guice if users didn't put their own Hadoop jars on the classpath:
```
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.getDeclaredMethods(Class.java:1975)
at com.google.inject.internal.ProviderMethodsModule.getProviderMethods(ProviderMethodsModule.java:132)
at com.google.inject.internal.ProviderMethodsModule.configure(ProviderMethodsModule.java:123)
at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:340)
at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:349)
at com.google.inject.spi.Elements.getElements(Elements.java:110)
at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:138)
at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:104)
at com.google.inject.Guice.createInjector(Guice.java:99)
at com.google.inject.Guice.createInjector(Guice.java:73)
at com.google.inject.Guice.createInjector(Guice.java:62)
at azkaban.soloserver.AzkabanSingleServer.main(AzkabanSingleServer.java:75)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 14 more
```
This patch creates a new HadoopModule that holds the Hadoop-related bindings. We only install this module when Hadoop-related injection is needed; today only HDFS_Storage needs it.
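Conceptually, the conditional install looks like the following sketch (class and flag names are illustrative, not the exact Azkaban code):
```
import com.google.inject.AbstractModule;
import com.google.inject.Guice;
import com.google.inject.Injector;
import com.google.inject.Module;
import java.util.ArrayList;
import java.util.List;

public final class InjectorSetup {

  // Holds all Hadoop-related bindings, so org.apache.hadoop classes are only
  // loaded when this module is actually installed.
  static final class HadoopModule extends AbstractModule {
    @Override
    protected void configure() {
      // e.g. bind org.apache.hadoop.conf.Configuration providers here.
    }
  }

  public static Injector createInjector(final boolean hdfsStorageEnabled) {
    final List<Module> modules = new ArrayList<>();
    // ... add the core server modules here ...
    if (hdfsStorageEnabled) {
      // Guice only scans this module's methods now, so Hadoop jars are
      // required on the classpath only in this case.
      modules.add(new HadoopModule());
    }
    return Guice.createInjector(modules);
  }
}
```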
|
10/5/2017 2:41:54 PM
flows (#1511)
The broken graph attached to bug #1508 happens because animatePolylineEdge fails: it assumes that each edge always has as many edge.oldpoints as newPoints, and this assumption sometimes breaks.
newPoints is populated by moveNodeEdges: it iterates through all the edge.guides (the segments composing the lines that connect the graph nodes) and adds the starting and ending points. This makes sense because the invariant edge.guides.length == edge.points.length - 2 should always hold.
The bug happens because this invariant is broken: in layoutGraph we sometimes add a new segment to edge.guides but forget to update the points accordingly.
This patch fixes the problem: when a new segment is added to an edge (see the code after the // Add gap comment), we also add an extra point, thereby ensuring the invariant holds and allowing animatePolylineEdge to work properly.
|
10/5/2017 2:41:04 PM
were ignored, but are now enabled:
* testFailedRun
* testCancelRun
* testDelayedExecutionJob
|
10/5/2017 12:19:40 PM
(#1524)
This is the platform part of the earlier PR #1332, which was reverted.
Summary:
This change adds support for reporting events of interest from Azkaban.
Users should provide an implementation of the AzkabanEventReporter interface.
To begin with, the following events are reported:
1. FLOW_STARTED
2. FLOW_FINISHED
3. JOB_STARTED
4. JOB_FINISHED
In the future, this can easily be extended to report other events.
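As an illustration, a user-supplied reporter might look like the sketch below; the method signature shown is an assumption, so check the actual AzkabanEventReporter interface in the Azkaban source:
```
import java.util.Map;

// Hypothetical implementation; the real AzkabanEventReporter interface may
// declare a different method signature.
public class EventReporterImpl implements AzkabanEventReporter {
  @Override
  public boolean report(final EventType eventType, final Map<String, String> metadata) {
    // Ship the event to an external system; here we simply log it.
    System.out.println("event=" + eventType + ", metadata=" + metadata);
    return true; // true = the event was reported successfully
  }
}
```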
Configuration changes:
Note: All changes must be applied to the executor server or the solo-server if in solo mode.
// Property is used to enable/disable event reporting with the default being false.
azkaban.event.reporting.enabled=false
// Implementations of the reporter to be specified using this property.
azkaban.event.reporting.class=com.foo.EventReporterImpl
// Kafka topic name for the default implementation.
azkaban.event.reporting.kafka.topic=TestTopicName
// Kafka broker list for the default implementation.
azkaban.event.reporting.kafka.brokers=hostname.com:port_num
// Schema registry server for the default Kafka implementation.
azkaban.event.reporting.kafka.schema.registry.url=schemaRegistryUrl.com:port/schema
|
10/4/2017 2:37:31 PM
ajax action name 'loadFlow' for backward compatibility.
This resolves issue #1506.
|
10/2/2017 9:01:10 PM
to Metastores in HA mode (#1491)
The logic Azkaban uses to get the delegation token from 'other_hcat_locations' tries to connect to every hcat server listed in that field and obtains a delegation token from each of them. This could cause delegation-token confusion. We therefore introduce a new property field called 'other_hcat_clusters', which can also be set in the 'properties' field of the workflow file. This field requires users to group the hcat servers into clusters, in the form 'cluster1hcat01,cluster1hcat02;cluster2hcat01,cluster2hcat02'. In our implementation, this string is split by semicolon, and each group (cluster) is traversed until we obtain a delegation token from one of the machines in that cluster.
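For example, a workflow's properties might include (host names are hypothetical):
```
other_hcat_clusters=cluster1hcat01,cluster1hcat02;cluster2hcat01,cluster2hcat02
```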
|
9/29/2017 6:49:41 PM
3.36.0
not run at all but has been marked as killed, the right thing to do is to set its status to KILLED. Otherwise the flow would get stuck, because this job would send the JOB_FINISHED event with the unfinished status KILLING.
|