Azkaban

Oozie

Job Definition: Both support workflows defined as a DAG.

Azkaban: A job is a Java properties file (*.job). There is no notion of a self-contained workflow: a job can depend on any other job in the system, and each job has to specify its dependencies in its job file. Cyclic dependencies are checked across the job files, and each job file must have a unique name.

Oozie: Jobs are referred to as "actions", and the workflow is defined in an XML file. The XML defines the starting point of the workflow. There are special actions, fork and join, and a sub-workflow can be defined in another XML file. The XML is validated.
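The cyclic-dependency check described for Azkaban can be illustrated with a short sketch. This is not Azkaban's actual implementation; it only assumes that each job file lists its upstream jobs on a `dependencies=` line, and the function names `parse_dependencies` and `find_cycle` are hypothetical.

```python
def parse_dependencies(job_text):
    """Return the job names listed on the `dependencies=` line of a .job file."""
    for line in job_text.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "dependencies":
            return [name.strip() for name in value.split(",") if name.strip()]
    return []


def find_cycle(jobs):
    """Return one dependency cycle as a list of job names, or None for a DAG.

    `jobs` maps a (unique) job name to the text of its .job file.
    """
    graph = {name: parse_dependencies(text) for name, text in jobs.items()}
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {name: WHITE for name in graph}
    path = []

    def visit(name):
        state[name] = GREY
        path.append(name)
        for dep in graph[name]:
            if state.get(dep, WHITE) == GREY:
                # dep is already on the current path, so we closed a loop
                return path[path.index(dep):] + [dep]
            if dep in graph and state[dep] == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[name] = BLACK
        return None

    for name in graph:
        if state[name] == WHITE:
            cycle = visit(name)
            if cycle:
                return cycle
    return None
```

Because job names are unique, a plain dictionary keyed by job name is enough to represent the whole dependency graph.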
Job Submission

Azkaban: Create a zip or tarball of the job files; it can include the jar files needed to launch a job. Another possible solution is to copy the zip into the desired directory and ask Azkaban to reload.

Oozie: A command-line program is used for submitting jobs. The XML and jars should be placed in HDFS, with the required jars in 'lib'.
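The Azkaban packaging step can be scripted. The sketch below is only one way to do it, assuming the job files and jars live under a single directory; the function name `package_jobs` and the choice to include only `.job` and `.jar` files are assumptions for illustration.

```python
import zipfile
from pathlib import Path


def package_jobs(job_dir, archive_path):
    """Zip every .job and .jar file under job_dir, keeping relative paths.

    Returns the archive member names that were written.
    """
    job_dir = Path(job_dir)
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(job_dir.rglob("*")):
            if path.is_file() and path.suffix in {".job", ".jar"}:
                relative = path.relative_to(job_dir).as_posix()
                archive.write(path, relative)
                added.append(relative)
    return added
```

The resulting zip is what would then be uploaded, or copied into the directory Azkaban reloads from.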
Running a job

Azkaban: Send an HTTP POST request via curl, or choose the job in the UI and click Run Now. Azkaban launches the workflow by choosing the last nodes/jobs in the DAG.

Oozie: The workflow is run through the Oozie client. Specifying the full path in HDFS is important. Try to keep versions of the different workflows (advantage: re-submitting a job would not fail the previous job). The launching job is specified in the XML.
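To make "choosing the last nodes/jobs in the DAG" concrete, here is a small sketch. It is not Azkaban code: `terminal_jobs` and `run_order` are hypothetical helpers, and the dependency map is assumed to be acyclic. A last node is a job that no other job depends on; launching it pulls in everything upstream.

```python
def terminal_jobs(dependencies):
    """Return the jobs that no other job depends on (the last nodes of the DAG)."""
    depended_on = {dep for deps in dependencies.values() for dep in deps}
    return sorted(name for name in dependencies if name not in depended_on)


def run_order(dependencies, target):
    """List the jobs needed for `target`, with dependencies before dependents."""
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in dependencies.get(name, []):
            visit(dep)
        order.append(name)  # appended only after all of its dependencies

    visit(target)
    return order


# Hypothetical flow: two final jobs share the same upstream chain.
flow = {"load": [], "clean": ["load"], "report": ["clean"], "export": ["clean"]}
print(terminal_jobs(flow))
print(run_order(flow, "report"))
```

By contrast, Oozie needs no such search, because the workflow XML names its starting point explicitly.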
Azkaban runs the driver program as a child process of the Azkaban process. The number of simultaneous jobs can be limited.

DoS might happen when resources are occupied.
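The idea of running drivers as child processes while capping how many run at once can be sketched as follows. This is an illustration under stated assumptions, not how Azkaban is implemented; `run_drivers` and `max_parallel` are hypothetical names.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_drivers(commands, max_parallel):
    """Run each command as a child process, at most max_parallel at a time.

    Returns the exit codes in the same order as `commands`.
    """
    def run(command):
        # Blocks one worker thread until the child process exits.
        return subprocess.run(command).returncode

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run, commands))
```

The cap matters because every driver consumes memory and CPU on the scheduler host; without it, enough concurrent drivers could exhaust the machine.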

Oozie runs the driver program as a map task.

- The scheduler information is wrong, and it might happen that drivers occupy these map slots.
- If a map task fails, we have to drill down into the Hadoop job and access its log to see what happened.
- Oozie itself performs fewer tasks, so DoS is unlikely to happen.

Pic Courtesy: devianart.com