
Common issues


When building my project, I get the error "No space left on device"

The error typically looks similar to the one below:

> conveyor build
...
> INFO:root:{'stream': '\x1b[91mCould not install packages due to an EnvironmentError: [Errno 28] No space left on device\n\n\x1b[0m'}

This means Docker has no disk space left to create your image. You can use the following command to clean up the Docker images built by Conveyor:

conveyor cleanup

If you still do not have enough disk space, you can also execute the following command to clean up all Docker images.

docker system prune -a
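
To check how much disk space Docker is currently using, and to verify that the cleanup actually freed space, you can run:

docker system df

This reports the space used by images, containers, local volumes, and the build cache.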

When building my project, I get a "DAG validation failed" error

The error can look similar to the ones below:


> conveyor build
...
> Failed starting the Airflow DAG test container: Failed creating the container: Error response from daemon: invalid mount config for type "bind": bind source path does not exist: <some_path>
> Failed creating the container: Error response from daemon: invalid mount config for type "bind": bind source path does not exist: <some_path>

This typically means that your container runtime (Podman, Docker) cannot access the folder that we want to mount. Depending on where the build is executed, we use the following folders for temporary files:

  • CI (identified by environment variable CI=true): /<current_workdir>/tmp/...
  • non-CI: /<os_tmp_dir>/...

Make sure that the correct paths are visible/mounted for your container runtime.
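
As a quick sanity check, you can verify that the directory from the error message exists and can be bind mounted by your container runtime. The <some_path> placeholder below stands for the path reported in your error:

ls <some_path>
docker run --rm -v <some_path>:/mnt alpine ls /mnt

If the docker run command fails with the same "bind source path does not exist" error, the directory is not visible to the container runtime. On macOS or Windows you may additionally need to add the directory to the file sharing settings of Docker Desktop or to the mounts of your Podman machine.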

Logs don't show up in the Conveyor UI

This might happen if you are using print statements for debugging in Python. Print output is only flushed after a certain time or once the buffer is full. If your application fails within the first few seconds after starting, that flush may never happen and the output is lost. To fix this, use the Python logging framework instead. So if you have code like this:

print("I want to debug my code")

Please change it to:

import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.info("I want to debug my code")

The logging.basicConfig call should only be executed once, typically when your application starts. You can set the level there to suppress some logs.

Using the logging library has many advantages over plain print statements. One of them is that log messages are flushed immediately and therefore show up quickly in both Airflow and the Conveyor UI.
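
For example, raising the level to WARNING suppresses DEBUG and INFO messages, while everything that is logged still gets flushed right away:

import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.WARNING)

logging.info("This message is suppressed by the WARNING level")
logging.warning("This message is printed and flushed immediately")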

CLI login fails, but logging into the Conveyor UI works

If you are using the Brave browser with Brave Shields enabled, requests to localhost are blocked. The CLI login currently relies on such requests to localhost. More info on the root cause can be found here. Until Brave ships a proper fix, you can work around the issue by either using Firefox/Chrome or by disabling Brave Shields.

Conveyor CLI cannot connect to the Docker daemon

When running a Conveyor command, I get the following error:

Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

This means that Conveyor is unable to use Docker, which can be caused by:

  • Docker is not installed on your local machine. Take a look here for more details on how to install Docker on your operating system.
  • The Docker daemon is not running. Verify whether you can run docker pull ubuntu; if that does not work either, start the Docker daemon.
  • You are running on macOS and the Docker socket is located at ~/.docker/run/docker.sock instead of /var/run/docker.sock. Look here to fix the issue, or try the workaround sketched after this list.
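
A common workaround for the macOS socket location, assuming your tooling respects the standard DOCKER_HOST variable (most Docker clients do), is to point it at the Docker Desktop socket, or to symlink the socket to the default location:

export DOCKER_HOST="unix://$HOME/.docker/run/docker.sock"
sudo ln -s "$HOME/.docker/run/docker.sock" /var/run/docker.sock

Only one of the two is needed; the symlink approach also helps tools that ignore DOCKER_HOST, but it may not survive a reboot.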

Command or entrypoint error detected

This failure can happen when the container cannot be started correctly because the command used to launch it fails. Every container is launched with its default entrypoint or with the command specified by the user.

The entrypoint can be specified in the Dockerfile:

FROM python
...
ENTRYPOINT my_executable

If you specify the entrypoint but no command in the DAG, the specified executable will be executed when the container is launched. If this entrypoint does not exist, the container will fail to run. You can test this out by first building your container, and then running it:

docker build . -t testimage
docker run -it testimage

The run will fail with the same issue as reported in Conveyor. To fix this, make sure the executable name is correct and that it is available on the $PATH.

You can override the entrypoint in your DAG by specifying the cmds variable:

from conveyor.operators import ConveyorContainerOperatorV2

ConveyorContainerOperatorV2(
    task_id="a-task",
    cmds=["python3", "myapp.py"],
)

To test this locally, you can run the following commands:

docker build . -t testimage
docker run -it testimage python3 myapp.py

Note that the docker run command includes the same cmds we specified in the ConveyorContainerOperatorV2. This way you can test the command locally and verify that your fix works.

My job failed due to a spot interrupt

This means that your job was running on a spot instance and was interrupted because AWS reclaimed the instance to fulfil a request from another customer. This can happen for multiple components in Conveyor, such as:

  • The Airflow worker running on a Kubernetes instance
  • The Python/dbt/Spark containers running on a Kubernetes instance

In Airflow, you will see the following error message in the task logs:

[2024-03-12, 00:00:00 UTC] {taskinstance.py:2480} ERROR - Received SIGTERM. Terminating subprocesses.
[2024-03-12, 00:00:00 UTC] {conveyor_container_hook_v2.py:222} INFO - Cleaning up the container application
[2024-03-12, 00:00:00 UTC] {__init__.py:51} INFO - Received spot node interrupt for node with id i-0cd3c61c6e42a1234 and name ip-10-0-000-001.eu-west-1.compute.internal
[2024-03-12, 00:00:00 UTC] {__init__.py:14} ERROR - The airflow executor is shutting down because of a spot interrupt.

For Conveyor task executions, this is shown in the status field of the task in the Conveyor UI.

If this happens, you can either:

  • Rerun the job, which will start a new spot instance and run the job again. As long as your operation is idempotent and there are no strict deadlines for the job, this is the most cost-effective approach. You can also automate the rerun, as sketched after this list.
  • Use on-demand instances, which are more expensive but are never interrupted by AWS. This is the best approach if either performance or reliability is critical.
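
If your job is idempotent, you can let Airflow perform the rerun automatically instead of restarting it by hand. retries and retry_delay are standard Airflow operator arguments; assuming ConveyorContainerOperatorV2 accepts them like other Airflow operators, a minimal sketch looks like this:

from datetime import timedelta

from conveyor.operators import ConveyorContainerOperatorV2

ConveyorContainerOperatorV2(
    task_id="a-task",
    cmds=["python3", "myapp.py"],
    # Standard Airflow retry settings: schedule the task again (on a new instance)
    # up to two times, waiting five minutes between attempts.
    retries=2,
    retry_delay=timedelta(minutes=5),
)

A retried task is scheduled from scratch, so it simply lands on a new (spot) instance, which matches the manual rerun described above.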