dbt-duckdb issues
Issue accessing AWS resources due to 403 error
If you are getting Forbidden errors when running dbt-duckdb against S3 or Glue, this might be caused by an issue in the duckdb-aws extension, as described here. The issue can be identified by the following errors:
Runtime Error in model orders (models/test/orders.sql)
HTTP Error: Unable to connect to URL "https://bucket.s3.amazonaws.com/path-to.parquet": 403 (Forbidden)
Runtime Error in model copy-glue (models/test/copy-glue.sql)
HTTP Error: HTTP GET error on '/?encoding-type=url&list-type=2&prefix=pyspark3%2F' (HTTP 403)
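To confirm that the problem is the extension's credential handling rather than your IAM permissions, you can check that the same identity works outside DuckDB. A minimal sketch, assuming the AWS CLI is installed; the bucket and key are placeholders for your own:
# Show which identity the default credential chain resolves to
aws sts get-caller-identity
# If this listing succeeds while the dbt-duckdb run returns 403,
# the permissions are fine and the extension is the likely culprit
aws s3 ls s3://bucket/path-to.parquet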
The current workaround is to explicitly set the AWS environment variables before running the dbt command. Note: this will only work if the job takes less than an hour to complete, as otherwise the temporary credentials will expire.
Create a file called expose_credentials.py with the following content:
import boto3

def main():
    # Resolve credentials once from the default boto3 provider chain
    # (env variables, shared config, instance/task role, etc.).
    credentials = boto3.Session().get_credentials().get_frozen_credentials()
    # Print shell export statements so a caller can eval this output.
    print('export AWS_ACCESS_KEY_ID={}'.format(credentials.access_key))
    print('export AWS_SECRET_ACCESS_KEY={}'.format(credentials.secret_key))
    print('export AWS_SESSION_TOKEN={}'.format(credentials.token))

if __name__ == '__main__':
    main()
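You can verify the script locally before baking it into an image. A quick check, assuming valid credentials are already available to boto3 in your shell:
# Apply the printed export statements to the current shell
eval "$(python3 expose_credentials.py)"
# The three AWS_* variables should now be set
env | grep AWS_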
Change the entrypoint of your dbt image to run a script called entrypoint.sh with the following content:
#!/bin/bash
echo "getting credentials and exposing them as env variables"
# Evaluate the export statements printed by the helper script
eval "$(python3 expose_credentials.py)"
echo "AWS_ACCESS_KEY_ID has value"
echo "$AWS_ACCESS_KEY_ID"
echo "running dbt command with arguments: $@"
# Forward all container arguments to dbt
dbt "$@"
To change the entrypoint, add the following line to your Dockerfile and make sure you copy both files into your container:
ENTRYPOINT ["/bin/bash", "entrypoint.sh"]
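Putting it together, a minimal check that the image wiring works; the image name dbt-duckdb is a placeholder, and this assumes your Dockerfile COPYs both expose_credentials.py and entrypoint.sh into the working directory:
docker build -t dbt-duckdb .
# Arguments after the image name are passed through to dbt by the entrypoint
docker run --rm dbt-duckdb run --select orders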