AWS integration issues

I cannot access the files written to an S3 bucket in another account

When you write an object to an S3 bucket owned by another account, the Conveyor account remains the owner of that object, so the bucket owner cannot access it by default. Take the following steps to resolve this:

  1. Set the S3 Object Ownership setting of the destination bucket to: Bucket owner preferred
  2. When you put files on S3, pass the ACL bucket-owner-full-control
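Step 1 can also be applied programmatically. Below is a hedged sketch assuming boto3 with an existing S3 client named s3; the bucket name is illustrative, and the relevant API is put_bucket_ownership_controls:

```python
# Sketch: the request payload for setting "Bucket owner preferred"
# on the destination bucket. The client `s3` and bucket name are
# illustrative assumptions, not part of the original instructions.
ownership_controls = {
    "Rules": [{"ObjectOwnership": "BucketOwnerPreferred"}]
}

# In practice you would call:
# s3.put_bucket_ownership_controls(
#     Bucket="my-destination-bucket",
#     OwnershipControls=ownership_controls,
# )
```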

boto3

s3.put_object(
    Bucket=bucketname,
    Key=filename,
    Body=content_bytes,
    ACL="bucket-owner-full-control",
)
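If you upload with boto3's managed transfer helper (upload_file) instead of put_object, the same canned ACL is passed through ExtraArgs. A minimal sketch, assuming an existing client s3 and illustrative bucket/file names:

```python
# Sketch: the same ACL via boto3's managed transfer API.
# The client `s3` and all names below are illustrative assumptions.
extra_args = {"ACL": "bucket-owner-full-control"}

# In practice:
# s3.upload_file(
#     Filename="local/data.parquet",
#     Bucket="my-destination-bucket",
#     Key="data.parquet",
#     ExtraArgs=extra_args,
# )
```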

Spark template

private val defaultConfiguration: Map[String, String] = Map(
  "fs.s3.impl" -> "org.apache.hadoop.fs.s3a.S3AFileSystem",
  "fs.s3a.canned.acl" -> "BucketOwnerFullControl",
  "spark.serializer" -> "org.apache.spark.serializer.KryoSerializer",
  "spark.sql.sources.partitionOverwriteMode" -> "dynamic"
)

PySpark template

spark_builder.config("spark.sql.sources.partitionOverwriteMode", "dynamic")
spark_builder.config("fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
spark_builder.config("fs.s3a.canned.acl", "BucketOwnerFullControl")
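As a hedged sketch of assembling these settings: in stock Spark (outside a template that applies configs directly to the Hadoop configuration), Hadoop filesystem keys set on the SparkSession builder typically need the spark.hadoop. prefix to reach the Hadoop configuration. The app name below is illustrative:

```python
# Sketch: the PySpark settings above collected into one config dict.
# The "spark.hadoop." prefix is needed when fs.* keys are set on the
# session builder in stock Spark; your template may differ.
s3_write_conf = {
    "spark.sql.sources.partitionOverwriteMode": "dynamic",
    "spark.hadoop.fs.s3.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "spark.hadoop.fs.s3a.canned.acl": "BucketOwnerFullControl",
}

# In practice (assuming pyspark is installed):
# from pyspark.sql import SparkSession
# builder = SparkSession.builder.appName("s3-cross-account-write")
# for key, value in s3_write_conf.items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
```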