dbt-duckdb issues

Issue accessing AWS resources due to 403 error

If you are getting Forbidden errors when running dbt-duckdb against S3 or Glue, this may be caused by an issue in the duckdb-aws extension, as described here. You can identify the issue by errors like the following:

Runtime Error in model orders (models/test/orders.sql)
HTTP Error: Unable to connect to URL "https://bucket.s3.amazonaws.com/path-to.parquet": 403 (Forbidden)


Runtime Error in model copy-glue (models/test/copy-glue.sql)
HTTP Error: HTTP GET error on '/?encoding-type=url&list-type=2&prefix=pyspark3%2F' (HTTP 403)
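
Before applying the workaround, it can help to confirm that the ambient credentials themselves are valid, which points at the extension's credential resolution rather than your AWS setup. A minimal check using boto3 (an illustrative snippet, not part of the workaround):

import boto3

# If this call succeeds, the credentials from the default provider chain
# are valid, and the 403s are likely the duckdb-aws extension issue above.
print(boto3.client('sts').get_caller_identity()['Arn'])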

The current workaround is to explicitly set the AWS environment variables before running the dbt command. Note: this only works if the job completes in under an hour, since the temporary credentials expire after that.

Create a file called expose_credentials.py with the following content:

import boto3


def main():
    # Resolve credentials from the default provider chain
    # (env variables, instance profile, assumed role, etc.).
    credentials = boto3.Session().get_credentials()

    # Print the credentials as shell export statements so they
    # can be eval'd by the entrypoint script below.
    print('export AWS_ACCESS_KEY_ID={}'.format(credentials.access_key))
    print('export AWS_SECRET_ACCESS_KEY={}'.format(credentials.secret_key))
    print('export AWS_SESSION_TOKEN={}'.format(credentials.token))


if __name__ == '__main__':
    main()
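
You can sanity-check the script by running python3 expose_credentials.py on its own: it should print three export statements, which the entrypoint below loads into the shell environment via eval.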

Change the entrypoint of your dbt image to run a script called entrypoint.sh with the following content:

#!/bin/bash

echo "getting credentials and exposing them as env variables"
eval "$(python3 expose_credentials.py)"

echo "AWS_ACCESS_KEY_ID has value"
echo "$AWS_ACCESS_KEY_ID"

echo "running dbt command with arguments: $@"

# exec replaces the shell so dbt receives container signals directly
exec dbt "$@"
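
This works because the script prints plain export statements: evaluating them puts the credentials into the entrypoint shell's environment, where dbt and the duckdb-aws extension can pick them up.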

To change the entrypoint, add the following lines to your Dockerfile, making sure both files end up in the container's working directory (entrypoint.sh invokes expose_credentials.py by relative path):

COPY expose_credentials.py entrypoint.sh ./
ENTRYPOINT ["/bin/bash", "entrypoint.sh"]
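
The image is then invoked with the dbt subcommand as its arguments, for example docker run my-dbt-image run --select orders (the image name and selector here are placeholders for your own).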