Data access
I cannot access the files written to an S3 bucket in another account
When you write an object to an S3 bucket owned by another account, the Conveyor account remains the owner of that object, so the bucket-owning account cannot read it by default. Take the following steps to resolve this:
- On the destination bucket, set the S3 Object Ownership setting to: Bucket owner preferred
- When you put files on S3, pass the canned ACL `bucket-owner-full-control`
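The first step can also be applied programmatically by the bucket owner. A minimal sketch of the payload for boto3's `put_bucket_ownership_controls` call (the bucket name below is a placeholder, not from this document):

```python
# Payload for the S3 Object Ownership setting; "BucketOwnerPreferred" means
# objects uploaded with the bucket-owner-full-control ACL become owned by
# the bucket owner rather than the writer.
ownership_controls = {
    "Rules": [{"ObjectOwnership": "BucketOwnerPreferred"}]
}

# Applying it requires the bucket owner's credentials (hypothetical bucket name):
# import boto3
# s3 = boto3.client("s3")
# s3.put_bucket_ownership_controls(
#     Bucket="my-destination-bucket",
#     OwnershipControls=ownership_controls,
# )
```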
boto3

```python
import boto3

s3 = boto3.client("s3")
s3.put_object(
    Bucket=bucketname,
    Key=filename,
    Body=content_bytes,
    ACL="bucket-owner-full-control",
)
```
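To confirm the fix worked, the object's ACL can be inspected with boto3's `get_object_acl`. The sketch below is an assumption of ours, not part of the original docs: the helper function is hypothetical, and the response dict is a hand-written example of the call's shape rather than live output:

```python
def has_full_control(acl_response, owner_id):
    """Return True if owner_id holds FULL_CONTROL in a get_object_acl response."""
    return any(
        grant["Permission"] == "FULL_CONTROL"
        and grant["Grantee"].get("ID") == owner_id
        for grant in acl_response["Grants"]
    )

# Hand-written example of the response shape (not live output):
example_acl = {
    "Owner": {"ID": "bucket-owner-canonical-id"},
    "Grants": [
        {
            "Grantee": {"Type": "CanonicalUser", "ID": "bucket-owner-canonical-id"},
            "Permission": "FULL_CONTROL",
        }
    ],
}
# In practice: acl = s3.get_object_acl(Bucket=bucketname, Key=filename)
```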
Spark template

```scala
private val defaultConfiguration: Map[String, String] = Map(
  "fs.s3.impl" -> "org.apache.hadoop.fs.s3a.S3AFileSystem",
  "fs.s3a.canned.acl" -> "BucketOwnerFullControl",
  "spark.serializer" -> "org.apache.spark.serializer.KryoSerializer",
  "spark.sql.sources.partitionOverwriteMode" -> "dynamic"
)
```
PySpark template

```python
spark_builder.config("spark.sql.sources.partitionOverwriteMode", "dynamic")
spark_builder.config("fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
spark_builder.config("fs.s3a.canned.acl", "BucketOwnerFullControl")
```