Using Airflow sensors on Conveyor

Introduction

In Apache Airflow, Sensors are a specialized type of operator designed to monitor the state of external systems or specific conditions within a workflow. They effectively "pause" the execution of a task until a certain criterion is met, such as the availability of a file, the completion of another task, or the arrival of a specific time. Sensors play a crucial role in orchestrating complex workflows, ensuring that downstream tasks are only triggered when the necessary dependencies are satisfied, thereby enhancing the reliability and efficiency of data pipelines.

As indicated by their name, sensors achieve this behavior by periodically running their condition check. Most Airflow sensors support two different strategies to execute their condition checks: poke and reschedule. These modes are supported for the sensors that come with Airflow by default (such as the ExternalTaskSensor), as well as for the sensors specific to Conveyor, such as the ConveyorContainerSensor and ConveyorExternalTaskSensor.

Poke Mode

In this mode, the sensor continuously "pokes" the external system or checks the condition at regular intervals defined by the poke_interval. The sensor remains active for the entire duration of the wait, occupying resources such as a worker slot, until the condition is met or the sensor times out. While reliable, this approach can be resource-intensive, especially in large-scale workflows with many sensors.
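
As an illustration, below is a minimal sketch of a sensor running in poke mode. It uses the standard ExternalTaskSensor; the DAG name, upstream DAG, and task identifiers are placeholders, and the interval and timeout values are only examples.

```python
from datetime import datetime

from airflow import DAG
from airflow.sensors.external_task import ExternalTaskSensor

with DAG(
    dag_id="poke_mode_example",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
) as dag:
    wait_for_upstream = ExternalTaskSensor(
        task_id="wait_for_upstream",
        external_dag_id="upstream_dag",  # hypothetical upstream DAG
        external_task_id="load_data",    # hypothetical upstream task
        mode="poke",                     # keep the worker slot for the whole wait
        poke_interval=30,                # re-check the condition every 30 seconds
        timeout=60 * 60,                 # fail the sensor after one hour of waiting
    )
```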

Reschedule Mode

This mode is designed to be more resource-efficient. Instead of continuously checking, the sensor checks the condition and then suspends itself, freeing up the worker slot. After a defined period (also controlled through the poke_interval), the sensor resumes and checks the condition again. This reduces resource usage, particularly in environments with long wait times between checks.
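
The same sensor in reschedule mode differs only in the mode argument; a sketch reusing the hypothetical identifiers from the example above:

```python
# Inside the same DAG context as the poke-mode example above.
wait_for_upstream = ExternalTaskSensor(
    task_id="wait_for_upstream",
    external_dag_id="upstream_dag",  # hypothetical upstream DAG
    external_task_id="load_data",    # hypothetical upstream task
    mode="reschedule",               # release the worker slot between checks
    poke_interval=5 * 60,            # resume and re-check every 5 minutes
    timeout=6 * 60 * 60,             # fail the sensor after 6 hours of waiting
)
```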

Running on Conveyor

By default, all workloads in Conveyor (including Airflow sensors) will be run on spot instances to optimize costs. This comes with the caveat that these workloads should be fault-tolerant. Given that sensors are usually intended to run more than once, this is typically not an issue - provided that the Airflow tasks are properly configured.

For sensors using the reschedule mode, Conveyor will automatically detect spot interrupts, and recover from them by simply scheduling the sensor again (this means that one attempt of the sensor check is effectively skipped if it gets spot interrupted).

For sensors using the poke mode, Conveyor cannot recover from spot interrupts automatically, as this sensor type runs continuously. In this case, the user should configure the retries and retry_delay parameters with appropriate values. If no retries are configured on a sensor running in poke mode, a spot interrupt will cause the sensor to end up in the failed state.
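
For example, a poke-mode sensor might be configured with retries so that a spot interrupt leads to a new attempt instead of a failed task. The values below are illustrative, not prescriptive, and the identifiers are placeholders:

```python
from datetime import timedelta

from airflow.sensors.external_task import ExternalTaskSensor

wait_for_upstream = ExternalTaskSensor(
    task_id="wait_for_upstream",
    external_dag_id="upstream_dag",    # hypothetical upstream DAG
    external_task_id="load_data",      # hypothetical upstream task
    mode="poke",
    poke_interval=30,
    timeout=60 * 60,
    retries=3,                         # restart the sensor if it gets spot interrupted
    retry_delay=timedelta(minutes=5),  # wait 5 minutes before starting the next attempt
)
```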

Recommendations

In general, we recommend running sensor tasks in reschedule mode with a poke_interval of at least 60 seconds (preferably more) to keep the system resource-efficient. If your workflow requires lower latency than one check per 60 seconds, we recommend switching to poke mode instead, as rescheduling more often than once per minute is wasteful as well. In that case, users should take care to properly configure the retry settings described above.
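
When a project contains many sensors that need the same retry configuration, one way to apply it consistently is through the DAG's default_args. A minimal sketch, with a hypothetical DAG name and illustrative values:

```python
from datetime import datetime, timedelta

from airflow import DAG

with DAG(
    dag_id="sensor_defaults_example",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    default_args={
        "retries": 3,                         # applied to every task in this DAG
        "retry_delay": timedelta(minutes=5),  # unless overridden per task
    },
) as dag:
    ...  # sensors defined here inherit the retry configuration above
```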