Common issues
Table of Contents
- When building my project, I get the error "No space left on device"
- When building my project, I get a "DAG validation failed error"
- I want to use a Docker build option with the Conveyor CLI
- Logs don't show up in the Conveyor UI
- CLI login fails, but logging into the Conveyor UI works
- Conveyor CLI cannot connect to the Docker daemon
- Command or entrypoint error detected
- My job failed due to a spot interrupt
When building my project, I get the error "No space left on device"
For example, you might see an error similar to the one below:
> conveyor build
...
> INFO:root:{'stream': '\x1b[91mCould not install packages due to an EnvironmentError: [Errno 28] No space left on device\n\n\x1b[0m'}
This means Docker has no disk space left to create your image. You can use the following command to clean up the Docker images built by Conveyor:
conveyor cleanup
If you still do not have enough disk space, you can also execute the following command to clean up all Docker images.
docker system prune -a
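To check how much disk space Docker is currently using per resource type (images, containers, volumes, and build cache) before cleaning up, you can run:
docker system df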
When building my project, I get a "DAG validation failed error"
This can look similar to the error below:
> conveyor build
Failed creating the container: Error response from daemon:
invalid mount config for type "bind": bind source path does not exist: /tmp/validateDags
This typically means that your container runtime (Docker, Podman) cannot access the folder that Conveyor is trying to mount. Please make sure that the correct paths are visible/mounted for your container runtime.
By default, Conveyor will attempt to mount temporary files in the temporary directory of your OS. If you provide the environment variable CI=true, Conveyor will create temporary folders inside your working directory instead. This is mainly useful in CI environments, where no temporary directory exists.
However, some distributions of Docker only allow accessing folders in the home directory (notably the version distributed by Snapcraft). In this case, setting the CI=true environment variable can help you work around this limitation. Alternatively, you might want to install Docker directly from Docker's own repositories for more traditional behavior.
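For example, assuming you only want to enable this behavior for a single build, you can set the variable inline:
CI=true conveyor build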
I want to use a Docker build option with the Conveyor CLI
The Conveyor CLI allows directly passing options supported by your chosen container engine via the --build-opts argument. This argument takes a single space-separated string as its value, so you can provide the argument list to Conveyor just as you would normally supply it to Docker or Podman.
To create a Conveyor build without using the image cache, you can supply the --no-cache option as follows:
> conveyor build --build-opts="--no-cache"
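Because the value is one space-separated string, you can combine multiple options in a single build. For example, assuming your container engine also supports Docker's --pull flag for always fetching newer base images:
> conveyor build --build-opts="--no-cache --pull"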
Logs don't show up in the Conveyor UI
This might happen if you are using print statements for debugging in Python. Print statements are only flushed after a certain time or once the buffer is full. When your application fails during the first few seconds after starting up, this flush may never take place. To fix this, it is better to use the Python logging framework. So if you have code like this:
print("I want to debug my code")
Please change it to:
import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.debug("I want to debug my code")
The logging.basicConfig statement should only be executed once. You can set the level there to suppress some logs.
Using the logging library has many advantages over plain print statements. One of them is that log records are flushed immediately, so they show up quickly in both Airflow and the Conveyor UI.
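If you do want to keep an occasional print statement, Python's built-in print accepts a flush argument that writes the buffer immediately:
print("I want to debug my code", flush=True)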
CLI login fails, but logging into the Conveyor UI works
If you are using the Brave browser, and you have Brave shields enabled, requests to localhost are blocked. Unfortunately, this functionality is currently needed for the CLI login to work. More info on the root cause can be found here. Until a proper fix is supported by Brave, you can work around the issue by either using Firefox or Chrome, or by disabling Brave shields.
Conveyor CLI cannot connect to the Docker daemon
When running a Conveyor command, I get the following error:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
This means that Conveyor is unable to use Docker, which can be caused by:
- Docker is not installed on your local machine. Take a look at the details on how to install Docker for your operating system.
- The Docker daemon is not running. Verify whether you can run docker pull ubuntu. If this does not work either, please start the Docker daemon.
- You are running on macOS and the Docker socket is located at ~/.docker/run/docker.sock instead of /var/run/docker.sock. Look here in order to fix the issue; a possible workaround is also shown below.
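Assuming the macOS socket location is the problem and your tooling honors the DOCKER_HOST environment variable (as the Docker CLI does), pointing it at the actual socket may resolve the error:
export DOCKER_HOST=unix://$HOME/.docker/run/docker.sock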
Command or entrypoint error detected
This failure can happen when the container cannot start correctly because the command used to start it fails. Every container is launched with either the default entrypoint or the command specified by the user.
The entrypoint can be specified in the Dockerfile:
FROM python
...
ENTRYPOINT my_executable
If you specify the entrypoint but no command in the DAG, the specified executable will be executed when the container is launched. If this entrypoint does not exist, the container will fail to run. You can test this out by first building your container, and then running it:
docker build . -t testimage
docker run -it testimage
The run will fail with the same error as reported in Conveyor.
To fix this, make sure the executable name is correct and that it is available on the $PATH.
You can override the entrypoint in your DAG by specifying the cmds argument:
from conveyor.operators import ConveyorContainerOperatorV2
ConveyorContainerOperatorV2(
task_id="a-task",
cmds=["python4", "myapp.py"],
)
To test this locally, you can run the following commands:
docker build . -t testimage
docker run -it testimage python4 myapp.py
Note that the run includes the same cmds arguments that were specified in the ConveyorContainerOperatorV2. In this way, you can test your argument configuration locally and verify that your fix works.
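One caveat when reproducing this with plain Docker: if your Dockerfile defines an ENTRYPOINT, arguments after the image name in docker run are appended to that entrypoint rather than replacing it, whereas cmds replaces the entrypoint on Kubernetes. To mimic the cmds behavior exactly, override the entrypoint explicitly:
docker run -it --entrypoint python4 testimage myapp.py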
My job failed due to a spot interrupt
This means that your job was running on a spot instance and was interrupted because AWS reclaimed the instance to fulfill another customer's request. This can happen for multiple components in Conveyor, such as:
- The Airflow worker running on a Kubernetes instance
- The Python/dbt/Spark containers running on a Kubernetes instance
In Airflow, you can see the following error message in the Airflow logs:
[2024-03-12, 00:00:00 UTC] {taskinstance.py:2480} ERROR - Received SIGTERM. Terminating subprocesses.
[2024-03-12, 00:00:00 UTC] {conveyor_container_hook_v2.py:222} INFO - Cleaning up the container application
[2024-03-12, 00:00:00 UTC] {__init__.py:51} INFO - Received spot node interrupt for node with id i-0cd3c61c6e42a1234 and name ip-10-0-000-001.eu-west-1.compute.internal
[2024-03-12, 00:00:00 UTC] {__init__.py:14} ERROR - The airflow executor is shutting down because of a spot interrupt.
For Conveyor task executions, this is shown in the status field of the task in the Conveyor UI.
If this happens, you can either:
- Rerun the job, which will start a new spot instance and run the job again. As long as your operation is idempotent and there are no strict deadlines for the job, this is the most cost-effective approach (a sketch for automating this with retries follows below).
- Use on-demand instances, which are more expensive but are never interrupted by AWS. This is the best approach if either performance or reliability is critical.
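Because spot interrupts are transient, the rerun can also be automated with standard Airflow retries. Below is a minimal sketch, assuming ConveyorContainerOperatorV2 accepts the usual Airflow BaseOperator arguments:
from conveyor.operators import ConveyorContainerOperatorV2

ConveyorContainerOperatorV2(
    task_id="a-task",
    cmds=["python3", "myapp.py"],  # illustrative command, as in the example above
    # Assumption: retries behaves as on any Airflow operator and reschedules
    # the task (on a fresh instance) after a failure such as a spot interrupt.
    retries=2,
)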