Common issues
Table of Contents
- When building my project, I get the error "No space left on device"
- When building my project, I get a "DAG validation failed error"
- I want to use a Docker build option with the Conveyor CLI
- Logs don't show up in the Conveyor UI
- CLI login fails, but logging into the Conveyor UI works
- Conveyor CLI cannot connect to the Docker daemon
- Command or entrypoint error detected
- My job failed due to a spot interrupt
When building my project, I get the error "No space left on device"
For example, you might see an error similar to the one below:
> conveyor build
...
> INFO:root:{'stream': '\x1b[91mCould not install packages due to an EnvironmentError: [Errno 28] No space left on device\n\n\x1b[0m'}
This means Docker has no disk space left to create your image. You can use the following command to clean up the Docker images built by Conveyor:
conveyor cleanup
If you still do not have enough disk space, you can also execute the following command to clean up all Docker images.
docker system prune -a
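To check how much disk space Docker is currently using per resource type (images, containers, volumes, and build cache) before cleaning up, you can run:
docker system df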
When building my project, I get a "DAG validation failed error"
This can look similar to the error below:
> conveyor build
Failed creating the container: Error response from daemon:
invalid mount config for type "bind": bind source path does not exist: /tmp/validateDags
This typically means that your container runtime (Docker, Podman) cannot access the folder that Conveyor is trying to mount. Please make sure that the correct paths are visible/mounted for your container runtime.
By default, Conveyor will attempt to mount temporary files in the temporary directory of your OS. If you provide the environment variable CI=true, Conveyor will create temporary folders inside your working directory instead. This is mainly useful in CI environments, where no temporary directory exists.
However, some distributions of Docker only allow accessing folders in the home directory (notably the version distributed by Snapcraft). In this case, setting the CI=true environment variable can help you work around this limitation. Alternatively, you might want to install Docker directly from Docker's own repositories for more traditional behavior.
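For example, assuming you only want to enable this behavior for a single build, you can set the variable inline:
CI=true conveyor build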
I want to use a Docker build option with the Conveyor CLI
The Conveyor CLI allows directly passing options supported by your chosen container engine via the --build-opts argument. This argument takes a single space-separated string as its value, so you can provide the argument list to Conveyor just as you would normally supply it to Docker or Podman.
To create a Conveyor build without using the image cache, you can supply the --no-cache option as follows:
> conveyor build --build-opts="--no-cache"
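Because the value is one space-separated string, you can combine multiple options in a single build. For example, assuming your container engine also supports Docker's --pull flag for always fetching newer base images:
> conveyor build --build-opts="--no-cache --pull"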
Logs don't show up in the Conveyor UI
This might happen if you are using print statements for debugging in Python. Print statements are only flushed after a certain time or once the buffer is full. When your application fails during the first few seconds after starting up, this flush may never take place. To fix this, it is better to use the Python logging framework. So if you have code like this:
print("I want to debug my code")
Please change it to:
import logging
import sys
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.debug("I want to debug my code")
The logging.basicConfig statement should only be executed once. You can set the level there to suppress some logs.
Using the logging library has many advantages over plain print statements. One of them is that log records are flushed immediately, so they show up quickly in both Airflow and the Conveyor UI.
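If you do want to keep an occasional print statement, Python's built-in print accepts a flush argument that writes the buffer immediately:
print("I want to debug my code", flush=True)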
CLI login fails, but logging into the Conveyor UI works
If you are using the Brave browser, and you have Brave shields enabled, requests to localhost are blocked. Unfortunately, this functionality is currently needed for the CLI login to work. More info on the root cause can be found here. Until a proper fix is supported by Brave, you can work around the issue by either using Firefox or Chrome, or by disabling Brave shields.
Conveyor CLI cannot connect to the Docker daemon
When running a Conveyor command, I get the following error:
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
This means that Conveyor is unable to use Docker, which can be caused by:
- Docker is not installed on your local machine. Take a look at the details on how to install Docker for your operating system.
- The Docker daemon is not running. Verify whether you can run docker pull ubuntu. If this does not work either, please start the Docker daemon.
- You are running on macOS and the Docker socket is located at ~/.docker/run/docker.sock instead of /var/run/docker.sock. Look here in order to fix the issue; a possible workaround is also shown below.
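Assuming the macOS socket location is the problem and your tooling honors the DOCKER_HOST environment variable (as the Docker CLI does), pointing it at the actual socket may resolve the error:
export DOCKER_HOST=unix://$HOME/.docker/run/docker.sock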
Command or entrypoint error detected
This failure can happen when the container cannot start correctly because the command used to start it fails. Every container is launched with either the default entrypoint or the command specified by the user.
The entrypoint can be specified in the Dockerfile:
FROM python
...
ENTRYPOINT my_executable
If you specify the entrypoint but no command in the DAG, the specified executable will be executed when the container is launched. If this entrypoint does not exist, the container will fail to run. You can test this out by first building your container, and then running it:
docker build . -t testimage
docker run -it testimage
The run will fail with the same error as reported in Conveyor.
To fix this, make sure the executable name is correct and that it is available on the $PATH.
You can override the entrypoint in your DAG by specifying the cmds argument:
from conveyor.operators import ConveyorContainerOperatorV2
ConveyorContainerOperatorV2(
task_id="a-task",
cmds=["python4", "myapp.py"],
)
To test this locally, you can run the following commands:
docker build . -t testimage
docker run -it testimage python4 myapp.py
Note that the run includes the same cmds arguments that were specified in the ConveyorContainerOperatorV2. In this way, you can test your argument configuration locally and verify that your fix works.
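One caveat when reproducing this with plain Docker: if your Dockerfile defines an ENTRYPOINT, arguments after the image name in docker run are appended to that entrypoint rather than replacing it, whereas cmds replaces the entrypoint on Kubernetes. To mimic the cmds behavior exactly, override the entrypoint explicitly:
docker run -it --entrypoint python4 testimage myapp.py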
My job failed due to a spot interrupt
This means that your job was running on a spot instance and was interrupted because AWS reclaimed the instance to fulfill another customer's request. This can happen for multiple components in Conveyor, such as:
- The Airflow worker running on a Kubernetes instance
- The Python/dbt/Spark containers running on a Kubernetes instance
In Airflow, you can see the following error message in the Airflow logs:
[2024-03-12, 00:00:00 UTC] {taskinstance.py:2480} ERROR - Received SIGTERM. Terminating subprocesses.
[2024-03-12, 00:00:00 UTC] {conveyor_container_hook_v2.py:222} INFO - Cleaning up the container application
[2024-03-12, 00:00:00 UTC] {__init__.py:51} INFO - Received spot node interrupt for node with id i-0cd3c61c6e42a1234 and name ip-10-0-000-001.eu-west-1.compute.internal
[2024-03-12, 00:00:00 UTC] {__init__.py:14} ERROR - The airflow executor is shutting down because of a spot interrupt.
For Conveyor task executions, this is shown in the status field of the task in the Conveyor UI.
If this happens, you can either:
- Rerun the job, which will start a new spot instance and run the job again. As long as your operation is idempotent and there are no strict deadlines for the job, this is the most cost-effective approach (a sketch for automating this with retries follows below).
- Use on-demand instances, which are more expensive but are never interrupted by AWS. This is the best approach if either performance or reliability is critical.
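Because spot interrupts are transient, the rerun can also be automated with standard Airflow retries. Below is a minimal sketch, assuming ConveyorContainerOperatorV2 accepts the usual Airflow BaseOperator arguments:
from conveyor.operators import ConveyorContainerOperatorV2

ConveyorContainerOperatorV2(
    task_id="a-task",
    cmds=["python3", "myapp.py"],  # illustrative command, as in the example above
    # Assumption: retries behaves as on any Airflow operator and reschedules
    # the task (on a fresh instance) after a failure such as a spot interrupt.
    retries=2,
)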