In this seminar, we will learn the basics of workflow automation and orchestration framework.
We will also play with some workflow automation frameworks, such as:
- prefect
- airflow
A workflow automation and orchestration framework coordinates multiple tasks(often interdependent) so that complex
processes run reliably, observably, and efficiently without human intervention.
- Automation: Executes pre-defined tasks automatically (Execution.)
- Orchestration: Ensures multiple automated components run in the right order, with state tracking, error handling, and data passing between them. (Coordination)
| Component | Description | Prefect Analogy | Airflow Analogy |
|---|---|---|---|
| Task | The smallest executable unit (a function, script, or command). | @task decorator | PythonOperator, BashOperator, or any Airflow operator. |
| Flow / DAG | A sequence of tasks, rules, and dependencies designed to achieve a specific outcome. It can be represented as a directed acyclic graph. | @flow decorator | DAG object — defines Directed Acyclic Graph structure. |
| Task Dependency | Orders and relationships between tasks (upstream/downstream). | Python call order (task_a() → task_b()). Prefect infers dependencies automatically. | Explicitly defined using >> or << operators (task_a >> task_b). |
| Scheduler | Decides when to run flows (e.g., cron-like, event triggered). | Prefect deployments & schedules | Airflow Scheduler with schedule_interval (cron, timedelta, etc.). |
| Executor / Worker | Runs tasks on compute resources (local, Docker, cluster). | Prefect worker(pulls jobs from a Work Pool and executes flows.) | Executor — runs tasks via Celery, LocalExecutor, or KubernetesExecutor. |
| State Management | Tracks states of each task(e.g. running, success, retry, fail, cancel) | Prefect Orion state engine | askInstance and DagRun states (success, failed, up_for_retry, etc.). |
| Observer / Logger | Records logs, metrics, and events for debugging. | Prefect UI & logging | Airflow Web UI (http://localhost:8080) — DAG-based view. |
| Configuration Store | Holds environment variables, credentials, or secrets. | Prefect Blocks | Connections / Variables in Airflow (stored in metadata DB or environment). |
| API / Backend | Central coordination hub for flows and tasks. | Prefect API Server | Airflow Webserver + Scheduler + Metadata DB (tightly coupled). |
| Dimension |Automation |Orchestration| |Scope |Single task or script |Multiple tasks with dependencies| |Goal |Reduce manual execution |Ensure end-to-end process consistency| |Example | Run a Spark job daily | Run extract → transform → validate → load, in sequence|
Every task has a state (e.g., Pending, Running, Completed, Failed).
Orchestration Framework tracks transitions between states, which allows:
- Retry logic
- Conditional branching (if success → do next step)
The Orchestration Framework
- ensures tasks dependency defined in a dag:
extract → transform → validate → load → notify - tracks inputs and outputs, making debugging easier.
The Orchestration Frameworks manage when to run workflows:
- Time-based: cron-like (daily, weekly)
- Event-based: file arrival, API signal, database change
- Manual trigger: via CLI or API
Built-in resilience:
- Retry failed tasks up to N times
- Pause or continue depending on conditions
- Mark run as
FailedorRecovered
Orchestration provides visibility into:
- Execution time per task
- Failure reasons
- Logs and stdout
- Resource usage
In Prefect, this is visible in the Orion dashboard and stored in the SQLite/PostgreSQL backend.
Flows can take parameters to control behavior:
@flow
def etl(dataset: str, date: str):
...
Enables reusing the same logic for multiple datasets.
The orchestrator separates workflow logic from runtime.
With the same workflow code, you can:
- Run locally for testing.
- Move to a cluster (Kubernetes, Dask, Ray).
┌──────────────────────┐
│ User / Developer │
└─────────┬────────────┘
│
▼
┌─────────────────┐
│ Flow Definition │ ← Python, YAML, or JSON
└────────┬────────┘
│
▼
┌─────────────────┐
│ Scheduler / API │ ← Prefect Server / Cloud
└────────┬────────┘
│
┌──────────┴──────────┐
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ Worker Node A │ │ Worker Node B │ ← Executes tasks
└─────────────────┘ └─────────────────┘
│
▼
┌─────────────────┐
│ Logging & UI │ ← Observability
└─────────────────┘
| Category | Benefit |
|---|---|
| Reliability | Automated recovery and retries |
| Reproducibility | All runs tracked and logged |
| Scalability | Parallel task execution |
| Auditability | Historical logs and metadata stored |
| Maintainability | Modular and reusable workflow components |
| Security | Credential isolation through secrets and blocks |