10/10/2017 8:19:51 PM
a problem where a queued job would disappear from the job list view when its state changed to QUEUED.
Currently, you can access job logs for current and failed attempts in the Job List tab of an execution.
If a job has failed and has retries, it goes into the QUEUED state while sleeping before the next attempt.
If you want to see why any of the previous attempts failed, you can right-click the grey attempt bar.
To see how long the job is going to sleep, you can check the job log.
Without this fix, you can't do either of those while the job is sleeping, because the job is hidden whenever its state is QUEUED.
Also, if a job is in the QUEUED state but hasn't started running yet (i.e. it is blocked by a pipeline), the way to find out why it's queued is to check the job log. If the job log can't be reached via the Job List tab, that is not easily accomplished.
|
10/10/2017 5:05:27 PM
This PR proposes bringing Quartz to Azkaban. Quartz Scheduler provides three kinds of job stores: RAMJobStore, JDBCJobStore, and TerracottaJobStore. To persist job state outside of application memory so that it survives downtime, we recommend the JDBCJobStore. Users can specify the mode in the Quartz properties.
This patch creates the Quartz server and manages the Quartz API (e.g., start, pause, shut down) in the QuartzScheduler class. In unit tests, we use an in-memory H2 database to test the JDBC-backed Quartz directly.
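For reference, a minimal quartz.properties sketch that selects the JDBC job store backed by an in-memory H2 database might look like this (the instance name and data-source settings are illustrative, not Azkaban's actual defaults):
```
# Illustrative quartz.properties sketch; values are not Azkaban's real defaults.
org.quartz.scheduler.instanceName = AzkabanQuartzScheduler
org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool
org.quartz.threadPool.threadCount = 3
# Persist job state in a database instead of RAM.
org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.StdJDBCDelegate
org.quartz.jobStore.dataSource = quartzDS
org.quartz.dataSource.quartzDS.driver = org.h2.Driver
org.quartz.dataSource.quartzDS.URL = jdbc:h2:mem:quartz;DB_CLOSE_DELAY=-1
org.quartz.dataSource.quartzDS.user = sa
org.quartz.dataSource.quartzDS.password =
```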
|
10/6/2017 3:21:51 PM
1. Add flow_id to the key set of the execution_jobs table.
2. Drop two unnecessary indexes.
Previously, the primary key of execution_jobs was (execution_id, job_id, attempt).
This assumed that no two jobs share the same name within a flow. That no longer holds once embedded flows come into the picture, since they allow a user to embed a smaller flow in multiple places inside a bigger flow.
With the new key set, the two indexes become subsets of the key, so we can drop them.
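A sketch of the corresponding migration, assuming a MySQL-style schema (the column order of the new key is illustrative):
```
-- Illustrative sketch; the actual Azkaban migration script may differ.
ALTER TABLE execution_jobs
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (execution_id, job_id, flow_id, attempt);
```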
|
10/5/2017 6:43:18 PM
compileOnly, which caused the web-server and exec-server to fail to initialize Guice if users didn't put their own Hadoop jars on the classpath:
```
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.getDeclaredMethods(Class.java:1975)
at com.google.inject.internal.ProviderMethodsModule.getProviderMethods(ProviderMethodsModule.java:132)
at com.google.inject.internal.ProviderMethodsModule.configure(ProviderMethodsModule.java:123)
at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:340)
at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:349)
at com.google.inject.spi.Elements.getElements(Elements.java:110)
at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:138)
at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:104)
at com.google.inject.Guice.createInjector(Guice.java:99)
at com.google.inject.Guice.createInjector(Guice.java:73)
at com.google.inject.Guice.createInjector(Guice.java:62)
at azkaban.soloserver.AzkabanSingleServer.main(AzkabanSingleServer.java:75)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 14 more
```
This patch creates a new HadoopModule that holds the Hadoop-related bindings. We only install this module when Hadoop-related injection is needed; today only HDFS_Storage needs it.
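Conceptually, the conditional install looks like the following sketch (class and flag names are illustrative, not the exact Azkaban code):
```
import com.google.inject.AbstractModule;
import com.google.inject.Guice;
import com.google.inject.Injector;
import com.google.inject.Module;
import java.util.ArrayList;
import java.util.List;

public final class InjectorSetup {

  // Holds all Hadoop-related bindings, so org.apache.hadoop classes are only
  // loaded when this module is actually installed.
  static final class HadoopModule extends AbstractModule {
    @Override
    protected void configure() {
      // e.g. bind org.apache.hadoop.conf.Configuration providers here.
    }
  }

  public static Injector createInjector(final boolean hdfsStorageEnabled) {
    final List<Module> modules = new ArrayList<>();
    // ... add the core server modules here ...
    if (hdfsStorageEnabled) {
      // Guice only scans this module's methods now, so Hadoop jars are
      // required on the classpath only in this case.
      modules.add(new HadoopModule());
    }
    return Guice.createInjector(modules);
  }
}
```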
|
10/5/2017 2:41:54 PM
flows (#1511)
The broken graph attached to bug #1508 happens because animatePolylineEdge fails: it assumes that each edge always has as many edge.oldpoints as newPoints, and this assumption sometimes breaks.
newPoints is populated by moveNodeEdges: it iterates through all the edge.guides (the segments composing the lines that connect the graph nodes) and adds the starting and ending points. This makes sense because the invariant edge.guides.length == edge.points.length - 2 should always hold.
The bug happens because this invariant is broken: in layoutGraph we sometimes add a new segment to edge.guides but forget to update the points accordingly.
This patch fixes the problem: when a new segment is added to an edge (see the code after the // Add gap comment), we also add an extra point, thereby ensuring the invariant holds and allowing animatePolylineEdge to work properly.
|
10/5/2017 2:41:04 PM
were ignored, but are now enabled:
* testFailedRun
* testCancelRun
* testDelayedExecutionJob
|
10/5/2017 12:19:40 PM
(#1524)
This is the platform part of the earlier PR #1332, which was reverted.
Summary:
This change adds support for reporting events of interest from Azkaban.
Users should provide an implementation of the AzkabanEventReporter interface.
To begin with, the following events are reported:
1. FLOW_STARTED
2. FLOW_FINISHED
3. JOB_STARTED
4. JOB_FINISHED
In the future, this can easily be extended to report other events.
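As an illustration, a user-supplied reporter might look like the sketch below; the method signature shown is an assumption, so check the actual AzkabanEventReporter interface in the Azkaban source:
```
import java.util.Map;

// Hypothetical implementation; the real AzkabanEventReporter interface may
// declare a different method signature.
public class EventReporterImpl implements AzkabanEventReporter {
  @Override
  public boolean report(final EventType eventType, final Map<String, String> metadata) {
    // Ship the event to an external system; here we simply log it.
    System.out.println("event=" + eventType + ", metadata=" + metadata);
    return true; // true = the event was reported successfully
  }
}
```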
Configuration changes:
Note: All changes must be applied to the executor server or the solo-server if in solo mode.
// Property is used to enable/disable event reporting with the default being false.
azkaban.event.reporting.enabled=false
// Implementations of the reporter to be specified using this property.
azkaban.event.reporting.class=com.foo.EventReporterImpl
// Kafka topic name for the default implementation.
azkaban.event.reporting.kafka.topic=TestTopicName
// Kafka broker list for the default implementation.
azkaban.event.reporting.kafka.brokers=hostname.com:port_num
// Schema registry server for the default Kafka implementation.
azkaban.event.reporting.kafka.schema.registry.url=schemaRegistryUrl.com:port/schema
|
10/4/2017 2:37:31 PM
ajax action name 'loadFlow' for backward compatibility.
This resolves issue #1506.
|
10/2/2017 9:01:10 PM
to Metastores in HA mode (#1491)
The logic Azkaban uses to get the delegation token from 'other_hcat_locations' tries to connect to every hcat server listed in that field and obtains a delegation token from each of them. This could cause delegation-token confusion. We therefore introduce a new property field called 'other_hcat_clusters', which can also be set in the 'properties' field of the workflow file. This field requires users to group the hcat servers into clusters, in the form 'cluster1hcat01,cluster1hcat02;cluster2hcat01,cluster2hcat02'. In our implementation, this string is split by semicolon, and each group (cluster) is traversed until we obtain a delegation token from one of the machines in that cluster.
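For example, a workflow's properties might include (host names are hypothetical):
```
other_hcat_clusters=cluster1hcat01,cluster1hcat02;cluster2hcat01,cluster2hcat02
```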
|
9/29/2017 6:49:41 PM
3.36.0
not run at all but has been marked as killed, the right thing to do is to set its status to KILLED. Otherwise the flow would get stuck, because this job would send the JOB_FINISHED event with the unfinished status KILLING.
|