Configure fine-grained S3 access for Spark / PySpark jobs

Description

It is a best practice to create IAM roles with least-privilege permissions. For Spark jobs, you do this by restricting the S3 paths your job has access to. The difficulty is that Spark needs a certain minimum set of permissions on those paths in order to succeed.

How to do it

The driver and executors of your Spark job need access to the parent directory of the path they write their data to.

For example, if you want to write to the prefix s3://my-bucket/my-project/dataset-1, you need to grant the job write access to s3://my-bucket/my-project/*.
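As an illustration, here is a minimal sketch of attaching such a policy to the job's role with boto3. The role name my-spark-job-role and the policy name are hypothetical, and the action list (ListBucket plus Get/Put/DeleteObject) is a common baseline for S3A-backed Spark jobs rather than an exhaustive specification; adjust it to what your job actually does.

```python
import json

import boto3  # assumes boto3 is installed and AWS credentials are configured

# Hypothetical role name -- replace with the role your Spark job assumes.
ROLE_NAME = "my-spark-job-role"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Listing is granted on the bucket itself, restricted to the project prefix.
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": ["arn:aws:s3:::my-bucket"],
            "Condition": {"StringLike": {"s3:prefix": ["my-project/*"]}},
        },
        {
            # Object-level actions are granted on the parent prefix (my-project/*),
            # not only on dataset-1, following the guidance above.
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": ["arn:aws:s3:::my-bucket/my-project/*"],
        },
    ],
}

iam = boto3.client("iam")
iam.put_role_policy(
    RoleName=ROLE_NAME,
    PolicyName="spark-s3-least-privilege",
    PolicyDocument=json.dumps(policy),
)
```

With a policy along these lines attached to the role, a write such as df.write.parquet("s3://my-bucket/my-project/dataset-1") can succeed while the job remains locked out of other prefixes in the bucket.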