Workflow Engines Overview

Everyone's computing needs are different, so we have ensured that quacc is interoperable with a variety of modern workflow management tools. There are 300+ workflow management tools out there, so we can't possibly support them all. Instead, we have focused on a select few that adopt a similar decorator-based approach to defining workflows and that have substantial support for HPC systems.
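To make the "decorator-based approach" concrete, here is a minimal sketch that uses only the Python standard library. The `job` decorator and the `LazyCall` class are hypothetical names invented for this illustration; they are not quacc's API or that of any engine listed below. The sketch only shows the general pattern, in which decorating a function makes each call record work to be done instead of running it immediately.

```python
from functools import wraps


class LazyCall:
    """A deferred function call; nothing runs until .result() is called."""

    def __init__(self, func, args, kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def result(self):
        # Resolve any upstream LazyCall arguments first, then run this call.
        args = [a.result() if isinstance(a, LazyCall) else a for a in self.args]
        kwargs = {
            k: v.result() if isinstance(v, LazyCall) else v
            for k, v in self.kwargs.items()
        }
        return self.func(*args, **kwargs)


def job(func):
    """Hypothetical decorator: calling the function records the call instead of running it."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        return LazyCall(func, args, kwargs)

    return wrapper


@job
def add(a, b):
    return a + b


@job
def mult(a, b):
    return a * b


# Building the workflow only wires the calls together; no arithmetic happens yet.
workflow = mult(add(1, 2), 3)

# A real workflow engine would decide where and when this runs; here it runs locally.
print(workflow.result())  # prints 9
```

Because the science code (`add`, `mult`) stays a plain Python function, the same function can in principle be handed to a different engine by swapping the decorator, which is the kind of interoperability this page describes.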

Summary

Recommendations

Not sure which to choose? In general, we recommend starting with Parsl for most HPC users. For a more feature-rich workflow orchestration platform, we recommend trying Prefect or Covalent depending on your needs. Some additional opinions on the matter:

  • Covalent: You want a visual dashboard and are prioritizing the use of distributed compute resources, especially cloud compute.
  • Dask: You are already familiar with the Dask ecosystem and are happy to stick with it.
  • Parsl: You want to run many workflows as fast as possible on one or more job scheduler-based HPC machines.
  • Prefect: You want a visual dashboard with a robust workflow management platform and are familiar with the basic concepts of workflow orchestration.
  • Redun: You are running calculations on AWS.
  • Jobflow: You are affiliated with the Materials Project or are already using Jobflow and/or FireWorks.

Covalent is a user-friendly workflow management solution from the company Agnostiq.

Pros:

  • Excellent visual dashboard for job monitoring
  • Easy to use in distributed, heterogeneous compute environments
  • Excellent documentation
  • Automatic and simple database integration
  • The compute nodes do not need to be able to connect to the internet

Cons:

  • Requires a centralized server to run continuously to manage the workflows, unless you use Covalent Cloud
  • Support for job scheduler HPC environments is available but not as robust or performant as other solutions
  • High-security HPC environments may be difficult to access via SSH with the centralized server approach
  • Not as widely used as other workflow management solutions

Dask is a popular parallel computing library for Python. We use Dask Delayed for lazy function execution, Dask Distributed for distributed compute, and (optionally) Dask-Jobqueue for orchestrating the execution on HPC machines.

Pros:

  • Extremely popular
  • Has native support for running on HPC resources
  • It does not involve a centralized server or network connectivity
  • Supports adaptive scaling of compute resources
  • The dashboard to monitor resource usage is very intuitive

Cons:

  • If the Dask cluster dies, there is no mechanism to gracefully recover the workflow history
  • Monitoring job progress is more challenging and less detailed than other solutions
  • The documentation, while comprehensive, can be difficult to follow given the various Dask components
  • Calculations cannot be submitted remotely or across disparate compute resources

Parsl is a workflow management solution out of Argonne National Laboratory, the University of Chicago, and the University of Illinois. It is well-adapted for running on virtually any HPC environment with a job scheduler.

Pros:

  • Extremely configurable and deployable for virtually any HPC environment
  • Quite simple to define the workflows and run them from a Jupyter Notebook
  • Thorough documentation and active user community across academia
  • Well-suited for pilot jobs and advanced queuing schemes
  • Does not rely on maintaining a centralized server

Cons:

  • The number of different terms can be slightly overwhelming to those less familiar with HPC
  • Monitoring job progress is more challenging and less detailed than other solutions
  • Debugging failed workflows can be difficult
  • The pilot job model is a new concept to many HPC users and takes some time to understand
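For readers new to the pilot job model mentioned above, the general idea can be sketched without any workflow engine: rather than submitting one scheduler job per task, a single allocation starts a pool of workers that process many tasks in turn. The sketch below is a loose analogy built on Python's standard library. `run_calculation`, `pilot_job`, and the worker count are hypothetical, and this is not how Parsl itself is implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def run_calculation(x):
    # Hypothetical stand-in for one small calculation.
    return x * x


def pilot_job(tasks, n_workers=4):
    # Stand-in for a single scheduler allocation: the workers live for the
    # whole allocation and process many tasks, so individual tasks never
    # wait in the scheduler queue on their own.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map returns results in the same order as the input tasks.
        return list(pool.map(run_calculation, tasks))


results = pilot_job(range(10))
print(results)
```

In a real deployment the "pool" would be one or more compute nodes requested from the job scheduler, and the workflow engine would feed tasks to it until the allocation ends.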

Prefect is a workflow orchestration tool that is popular in the data engineering community. It has an excellent dashboard for monitoring workflows.

Pros:

  • Quite popular among the data engineering community
  • Excellent web-based dashboard for monitoring workflow progress
  • The free version of Prefect Cloud is reasonably generous
  • Can use advanced queuing schemes to manage workflow execution
  • New features are being added regularly and rapidly

Cons:

  • For those who are less HPC-savvy, some of the concepts can be quite technical
  • If using Prefect Cloud, the compute nodes must have a network connection
  • The dashboard, while useful for monitoring successes and failures, is not ideal for analyzing results
  • The software is geared more towards data engineering than scientific computing, and that is reflected in the features and documentation

Redun is a flexible workflow management program developed by Insitro.

Pros:

  • Extremely simple syntax for defining workflows
  • Has strong support for task/result caching
  • Useful CLI-based monitoring system
  • Very strong AWS support

Cons:

  • Currently lacks support for typical HPC job schedulers
  • No user-friendly GUI for job monitoring
  • Does not have a particularly active user community
  • Not updated frequently

Jobflow is developed and maintained by the Materials Project team at Lawrence Berkeley National Laboratory and serves as a seamless interface to FireWorks or Jobflow Remote for dispatching and monitoring compute jobs.

Warning

Jobflow is not yet compatible with the @flow or @subflow decorators used in many quacc recipes and so should only be used if necessary. See this issue to track the progress of this enhancement.

Pros:

  • Native support for a variety of databases
  • Directly compatible with Atomate2
  • Designed with materials science workflows in mind
  • Actively supported by the Materials Project team

Cons:

  • Not fully compatible with all the features of quacc
  • Parsing the output of a workflow is not as intuitive as other solutions
  • Defining dynamic workflows with Jobflow's Response object can be more complex than other solutions
  • FireWorks is not the most user-friendly tool, and Jobflow Remote is still in active development