Workflow Engines Overview¶
Everyone's computing needs are different, so we have ensured that quacc is interoperable with a variety of modern workflow management tools. There are 300+ workflow management tools out there, so we can't possibly support them all. Instead, we have focused on a select few that adopt a similar decorator-based approach to defining workflows and that offer substantial support for HPC systems.
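To make this concrete, here is a minimal sketch of the decorator-based pattern quacc uses, with toy add/mult functions (the function names are illustrative, not part of quacc); how the flow is ultimately dispatched depends on which workflow engine is selected in the quacc settings:

```python
from quacc import flow, job

# Each @job marks an individual compute task (illustrative toy functions).
@job
def add(a, b):
    return a + b

@job
def mult(a, b):
    return a * b

# A @flow strings jobs together; the selected workflow engine (if any)
# determines how and where the jobs actually execute.
@flow
def workflow(a, b, c):
    return mult(add(a, b), c)

# With no workflow engine configured, this runs immediately and returns 9.
result = workflow(1, 2, 3)
```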
Summary¶
Recommendations
Not sure which to choose? Generally, we recommend Jobflow for users of academic HPC systems with job schedulers and Prefect otherwise. Some additional opinions on each option are provided below.
Dask¶
Dask is a popular parallel computing library for Python. We use Dask Delayed for lazy function execution, Dask Distributed for distributed compute, and (optionally) Dask-Jobqueue for orchestrating the execution on HPC machines.
Pros:
- Extremely popular
- Has native support for running on HPC resources
- Does not require a centralized server or network connectivity
- Supports adaptive scaling of compute resources
- The dashboard to monitor resource usage is very intuitive
Cons:
- If the Dask cluster dies, there is no mechanism to gracefully recover the workflow history
- Monitoring job progress is more challenging and less detailed than other solutions
- The documentation, while comprehensive, can be difficult to follow given the various Dask components
- Calculations cannot be submitted remotely or across disparate compute resources
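For reference, here is a minimal sketch of the Dask Delayed pattern quacc relies on, using illustrative add/mult functions and a local Dask Distributed cluster (an HPC deployment would typically use a Dask-Jobqueue cluster such as SLURMCluster instead):

```python
from dask import delayed
from dask.distributed import Client

# Start a local cluster; on an HPC machine, a Dask-Jobqueue cluster
# (e.g., SLURMCluster) would typically be used here instead.
client = Client()

@delayed
def add(a, b):
    return a + b

@delayed
def mult(a, b):
    return a * b

# Build the task graph lazily, then execute it on the cluster.
result = mult(add(1, 2), 3).compute()
print(result)  # 9
```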
Parsl¶
Parsl is a workflow management solution out of Argonne National Laboratory, the University of Chicago, and the University of Illinois. It is well-adapted for running on virtually any HPC environment with a job scheduler.
Pros:
- Extremely configurable and deployable for virtually any HPC environment
- Quite simple to define the workflows and run them from a Jupyter Notebook
- Thorough documentation and active user community across academia
- Well-suited for pilot jobs and advanced queuing schemes
- Does not rely on maintaining a centralized server
Cons:
- The number of different terms can be slightly overwhelming to those less familiar with HPC
- Monitoring job progress is more challenging and less detailed than other solutions
- Debugging failed workflows can be difficult
- The pilot job model is a new concept for many HPC users and takes some time to understand
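For orientation, here is a hedged sketch of what a Parsl configuration and app might look like on a Slurm machine; the partition, account, and walltime values are placeholders that would need to be adapted to your own system:

```python
import parsl
from parsl import python_app
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.launchers import SrunLauncher
from parsl.providers import SlurmProvider

# Illustrative pilot-job configuration: one Slurm block hosting a
# high-throughput executor. All scheduler options below are placeholders.
config = Config(
    executors=[
        HighThroughputExecutor(
            label="htex",
            provider=SlurmProvider(
                partition="debug",      # placeholder
                account="my_account",   # placeholder
                nodes_per_block=1,
                walltime="00:30:00",
                launcher=SrunLauncher(),
            ),
        )
    ]
)
parsl.load(config)

@python_app
def add(a, b):
    return a + b

print(add(1, 2).result())  # 3
```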
Prefect¶
Prefect is a workflow orchestration tool that is popular in the data engineering community. It has an excellent dashboard for monitoring workflows.
Pros:
- Quite popular among the data engineering community
- Excellent web-based dashboard for monitoring workflow progress
- The free version of Prefect Cloud is reasonably generous
- Can use advanced queueing schemes to manage workflow execution
- New features are being added regularly and rapidly
Cons:
- For those who are less HPC-savvy, some of the concepts can be quite technical
- If using Prefect Cloud, the compute nodes must have a network connection
- The dashboard, while useful for monitoring successes and failures, is not ideal for analyzing results
- The software is geared more towards data engineering than scientific computing, and that is reflected in the features and documentation
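To illustrate the general flavor, below is a minimal sketch of a Prefect flow built from tasks; the add/mult functions are purely illustrative, and runs only appear in the dashboard if a Prefect server or Prefect Cloud is configured:

```python
from prefect import flow, task

# Illustrative tasks; each task run is tracked by Prefect.
@task
def add(a, b):
    return a + b

@task
def mult(a, b):
    return a * b

# The flow orchestrates the tasks and is what shows up in the dashboard.
@flow
def workflow(a, b, c):
    return mult(add(a, b), c)

if __name__ == "__main__":
    print(workflow(1, 2, 3))  # 9
```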
Ray¶
Ray is a high-performance distributed computing framework originally developed at UC Berkeley. Within quacc, we utilize Ray Core for parallel, distributed task execution, allowing workflows to scale seamlessly from a single laptop to a multi-node cluster.
Pros:
- Widely Adopted & Supported: Extremely popular with a massive community and active development, backed heavily by the broader ML/AI ecosystem.
- Lightweight & Intuitive: Employs a simple `@ray.remote` decorator to easily parallelize Python functions and manage data objects.
- Scalability: Natively handles both single-node and multi-node distributed execution with remarkably low task-scheduling overhead.
- Observability: Features a robust, built-in web dashboard for real-time monitoring of task progress, resource utilization, and system logs.
- HPC Compatibility: Can be deployed on supercomputers and clusters with provided HPC deployment guides.
Cons:
- HPC Integration: Unlike Parsl or Dask-Jobqueue, Ray does not have native wrappers for traditional HPC job schedulers (e.g., Slurm, PBS). Managing node startup (`ray start`) generally requires manual configuration within batch scripts.
- Workflow History: While the built-in dashboard provides excellent live monitoring, retaining a persistent history of completed workflows requires setting up additional tooling or infrastructure.
- Remote Orchestration: Submitting calculations from a local machine to a remote cluster, or coordinating tasks across disjoint compute resources, requires extra networking setup (such as configuring Ray Client).
- Ecosystem Focus: The broader Ray ecosystem (Tune, Serve, Train) is primarily designed for machine learning workloads, meaning some of the extended features and documentation are less relevant to high-throughput chemistry workflows.
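As a rough illustration of the `@ray.remote` pattern noted above, here is a minimal sketch with toy add/mult functions (illustrative names only):

```python
import ray

# Connects to an existing cluster if one is available (e.g., started via
# `ray start` inside a batch script); otherwise launches a local instance.
ray.init()

@ray.remote
def add(a, b):
    return a + b

@ray.remote
def mult(a, b):
    return a * b

# Object references can be passed between remote tasks to build a graph.
result_ref = mult.remote(add.remote(1, 2), 3)
print(ray.get(result_ref))  # 9
```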
Redun¶
Redun is a flexible workflow management program developed by Insitro.
Pros:
- Extremely simple syntax for defining workflows
- Has strong support for task/result caching
- Useful CLI-based monitoring system
- Very strong AWS support
Cons:
- Currently lacks support for typical HPC job schedulers
- No user-friendly GUI for job monitoring
- Does not have a particularly active user community
- Not updated frequently
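For a feel of the syntax, here is a minimal sketch of a redun workflow with illustrative add/mult tasks, run through an in-process Scheduler rather than the redun CLI:

```python
from redun import Scheduler, task

# Illustrative tasks; redun caches results, so re-running the same
# expression with unchanged inputs reuses cached values.
@task()
def add(a, b):
    return a + b

@task()
def mult(a, b):
    return a * b

@task()
def workflow(a, b, c):
    return mult(add(a, b), c)

if __name__ == "__main__":
    scheduler = Scheduler()
    print(scheduler.run(workflow(1, 2, 3)))  # 9
```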
Jobflow¶
Jobflow is developed and maintained by the Materials Project team at Lawrence Berkeley National Laboratory and serves as a seamless interface to FireWorks or Jobflow Remote for dispatching and monitoring compute jobs.
Pros:
- Native support for a variety of databases
- Directly compatible with Atomate2
- Designed with materials science workflows in mind
- Actively supported by the Materials Project team
Cons:
- Requires the use of a database like MongoDB, which may not be widely accessible
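To give a sense of the interface, below is a minimal sketch of a Jobflow flow with illustrative add/mult jobs, executed locally; in practice, the flow would typically be dispatched through FireWorks or Jobflow Remote and the results stored in a database:

```python
from jobflow import Flow, job
from jobflow.managers.local import run_locally

# Illustrative jobs; outputs are passed between jobs as references.
@job
def add(a, b):
    return a + b

@job
def mult(a, b):
    return a * b

add_job = add(1, 2)
mult_job = mult(add_job.output, 3)
flow = Flow([add_job, mult_job])

# Run the flow in the current process (no FireWorks/Jobflow Remote needed).
responses = run_locally(flow)
```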