
7. Run multiple tasks for a given project

At the moment, all models in the project, as well as their tests, run in a single task. This is fine for small projects with a limited number of models and dependencies. However, as your project grows, you will want to split the models over different tasks to:

  • Simplify rerunning failed tasks
  • Speed up runs by increasing parallelism
  • Get a better overview in Airflow of which model failed
  • Separate test tasks from model runs

For this purpose, dbt supports selectors, which allow you to specify which models a given command runs. This way, you can choose which models to run in each task. For example:

# Only run models in a given directory
dbt run --select models/example

# Only run models that have the nightly tag
dbt run --select tag:nightly
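To make the two selector forms above concrete, here is a toy model of how they pick models from a project. It is a simplified sketch, not dbt's implementation: the `MODELS` inventory, its file paths and tags, and the `select` function are all hypothetical, and real dbt selectors support much more (graph operators, unions, intersections).

```python
from pathlib import PurePosixPath

# Hypothetical project inventory: model name -> (file path, set of tags)
MODELS = {
    "customer_orders": ("models/example/customer_orders.sql", {"nightly"}),
    "customers": ("models/staging/customers.sql", set()),
}


def select(selector: str) -> list[str]:
    """Toy version of two dbt selector forms: `tag:<name>` and a directory path."""
    if selector.startswith("tag:"):
        tag = selector[len("tag:"):]
        return sorted(name for name, (_, tags) in MODELS.items() if tag in tags)
    # Otherwise treat the selector as a directory and keep models located under it
    root = PurePosixPath(selector)
    return sorted(
        name for name, (path, _) in MODELS.items()
        if root in PurePosixPath(path).parents
    )


print(select("models/example"))  # ['customer_orders']
print(select("tag:nightly"))     # ['customer_orders']
```

Each task in Airflow then only touches the models its selector matches, which is what makes splitting a project over several tasks possible.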

7.1 Create a second model and run it as a separate task

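Let's create a second model, customers, that copies a few columns from the raw_customers source table. Save the following code as customers.sql in your models directory (dbt derives the model name, which you use with --select, from the file name):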

{{ config(materialized='external', location='s3://<conveyor_demo_XYZ>/model/customers.parquet') }}

with customers as (
select
id as customer_id,
first_name,
last_name
from {{ source('external_source', 'raw_customers') }}
)

SELECT * FROM customers
important

Do not forget to replace the bucket in the location property on line 1 with your actual bucket.

Update the dags/$PROJECT_NAME.py file with a second ConveyorContainerOperatorV2 task, and make sure that each task selects only one model, as follows:

ConveyorContainerOperatorV2(
dag=dag,
task_id="task1",
arguments=["build", "--target", "dev", "--select", "customer_orders"],
...
)

ConveyorContainerOperatorV2(
dag=dag,
task_id="task2",
arguments=["build", "--target", "dev", "--select", "customers"],
...
)
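With more than a handful of models, writing one operator per model by hand gets repetitive. A minimal sketch, assuming you keep a list of model names in the DAG file: the hypothetical `task_specs` helper produces one `(task_id, arguments)` pair per model, using the same `build --target dev --select` arguments as the two tasks above.

```python
# Hypothetical list of models that should each get their own task
MODELS = ["customer_orders", "customers"]


def task_specs(models, target="dev"):
    """Return one (task_id, arguments) pair per model, mirroring the tasks above."""
    return [
        (f"task{i}", ["build", "--target", target, "--select", model])
        for i, model in enumerate(models, start=1)
    ]


for task_id, arguments in task_specs(MODELS):
    print(task_id, arguments)
```

In the DAG file you would pass each pair to `ConveyorContainerOperatorV2(dag=dag, task_id=task_id, arguments=arguments, ...)`, keeping the remaining arguments you already use. Since no dependencies are declared between the generated tasks, Airflow is free to run them in parallel.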

We now have two tasks that each build (run and test) one model. They can run in parallel since they do not depend on each other.

note

When using dbt with DuckDB, it is best to put all models that depend on each other (e.g. models linked through the ref function) in one task, because the DuckDB database state is not shared between tasks. Models for distinct use cases that only read data from external sources can safely be split into separate tasks.
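One way to apply this rule is to treat the ref() relations as a graph and give each connected group of models its own task. The sketch below does this with a depth-first search over an undirected version of the graph; the `REFS` dictionary and the `stg_orders` / `stg_payments` models are hypothetical and only serve as illustration.

```python
# Hypothetical ref() graph: model -> models it references via {{ ref(...) }}
REFS = {
    "stg_orders": [],
    "stg_payments": [],
    "customer_orders": ["stg_orders", "stg_payments"],
    "customers": [],
}


def group_into_tasks(refs):
    """Group models into connected components of the ref graph; each group is one task."""
    # Undirected adjacency, so upstream and downstream models end up in the same group
    adjacency = {model: set() for model in refs}
    for model, parents in refs.items():
        for parent in parents:
            adjacency.setdefault(parent, set())
            adjacency[model].add(parent)
            adjacency[parent].add(model)

    groups, seen = [], set()
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first search collecting every model connected to `start`
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        groups.append(sorted(component))
    return groups


print(group_into_tasks(REFS))
# [['customer_orders', 'stg_orders', 'stg_payments'], ['customers']]
```

Each resulting group can then be passed to a single task, since dbt accepts several space-separated model names after `--select`.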

7.2 (Alternative) Try out the ConveyorDbtTaskFactory

The ConveyorDbtTaskFactory goes one step further and creates one run task and one test task for every model in your project.

The ConveyorDbtTaskFactory is not very helpful when using DuckDB, because it separates model runs and tests into different tasks. This means you would need to persist the database state across tasks, which is currently very difficult.
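To see why this clashes with DuckDB, the sketch below models the shape of such a per-model split: a `dbt run` task and a `dbt test` task per model, with the test task depending on the run task. This is a hypothetical illustration of the task layout only, not the ConveyorDbtTaskFactory API; the `per_model_tasks` function and the task id naming are assumptions.

```python
def per_model_tasks(models):
    """Hypothetical per-model split: one `dbt run` and one `dbt test` task per model."""
    tasks, edges = {}, []
    for model in models:
        run_id, test_id = f"{model}_run", f"{model}_test"
        tasks[run_id] = ["run", "--select", model]
        tasks[test_id] = ["test", "--select", model]
        # The test task only starts after the run task has finished
        edges.append((run_id, test_id))
    return tasks, edges


tasks, edges = per_model_tasks(["customer_orders", "customers"])
for upstream, downstream in edges:
    print(upstream, "->", downstream)
```

Every `run -> test` edge crosses a task boundary, so the test task only sees what the run task has persisted outside of its own DuckDB database.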

info

If you want more information on how to use the ConveyorDbtTaskFactory, take a look at the ConveyorDbtTaskFactory how-to guide.

7.3 Redeploy your code

Now, rebuild the project and deploy it to your environment:

conveyor build
conveyor deploy --env $ENVIRONMENT_NAME --wait

7.4 Re-run the tasks

The initial deployment of your project ran all models in a single task. We will now instruct Airflow to re-run the DAG with the updated code, which runs the two models in separate tasks.

In the Conveyor UI, navigate to your environment and open Airflow. Open your project's DAG and re-trigger it by clicking the > button and selecting Trigger DAG. You should see both tasks succeed.