
Using Azure IAM identities with Conveyor

This document describes how to set up Azure service principals and managed identities for use with Conveyor. You can specify the principal in the azure_application_client_id field of Airflow V2 operators, IDEs, streaming jobs, etc. This field allows your job to use an Azure service principal to communicate with services such as Azure Blob Storage, Purview, and Azure SQL.

Conveyor uses Azure workload identity federation with AKS; more information can be found in the Azure workload identity documentation. Workload identity replaces AAD Pod Identity as the way to give containers running in AKS an Azure identity.
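Workload identity federation works by having Azure AD accept tokens issued by the AKS cluster's OIDC issuer, which is the value the Terraform examples on this page pass in as var.oidc_issuer_url. If you have read access to the cluster resource, a minimal sketch to look that URL up could be the following. The variable names aks_cluster_name and aks_resource_group_name are hypothetical, and your Conveyor administrator may simply provide the issuer URL instead.

```hcl
# Hypothetical inputs: name and resource group of the AKS cluster Conveyor runs on.
variable "aks_cluster_name" {
  type = string
}

variable "aks_resource_group_name" {
  type = string
}

# Read the existing cluster; its OIDC issuer must be enabled for workload identity.
data "azurerm_kubernetes_cluster" "conveyor" {
  name                = var.aks_cluster_name
  resource_group_name = var.aks_resource_group_name
}

# Issuer URL to use as oidc_issuer_url in the federated identity credentials.
output "oidc_issuer_url" {
  value = data.azurerm_kubernetes_cluster.conveyor.oidc_issuer_url
}
```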

Supporting workload identity in your Docker images

To authenticate your application, include the MSAL library for the programming language used in your Docker image. For (Py)Spark, the Conveyor base images already include the correct library; for other images, you need to add it yourself.

The full list of MSAL libraries per programming language can be found in the Microsoft identity platform documentation.

Alternatively, you can use the Azure SDK for your programming language of choice; its identity library also supports workload identity.

Configure federated identity credentials for Azure service principals

Configuring the service principal

In order for AKS to request tokens on behalf of your Azure service principal, you must create a federated identity credential between:

  • The Azure AD Service Principal
  • Your project running in a specific Kubernetes namespace

This way you create a trust relationship between your project in a specific Kubernetes namespace and the Azure Service Principal. When using Terraform, the federated identity credential can be created as follows:

resource "azuread_application" "projectA" {
  display_name = "conveyor-projectA"
}

resource "azuread_application_federated_identity_credential" "projectA_env_dev" {
  application_id = azuread_application.projectA.id
  display_name   = "kubernetes-federated-identity-${var.project_name}"
  audiences      = ["api://AzureADTokenExchange"]
  issuer         = var.oidc_issuer_url
  subject        = "system:serviceaccount:${var.environment}:${var.project_name}"
}

resource "azuread_service_principal" "projectA" {
  client_id                    = azuread_application.projectA.client_id
  app_role_assignment_required = false
}
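The client ID of this application is the value you enter in the azure_application_client_id field mentioned at the top of this page. A small sketch that has Terraform print it after terraform apply; the output name is hypothetical:

```hcl
# Client (application) ID of the app registration; use it as azure_application_client_id.
output "projectA_client_id" {
  value = azuread_application.projectA.client_id
}
```

You can then read the value with terraform output projectA_client_id.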

When using the Azure Portal to create the federated identity credentials, use the following steps:

  • Go to your app registrations in Azure Active Directory
  • Select the application you want AKS to use
  • Go to Certificates & Secrets in the left menu
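  • Select the Federated Credentials tab and click 'Add credential'
  • Choose the 'Kubernetes accessing Azure resources' scenario and fill in the cluster issuer URL, the namespace, and the service account name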
Important

The service account name, which is the last part of the subject, must be the same as the Conveyor project name. The subject has the following form: system:serviceaccount:k8s_namespace:project_name
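Because the namespace is part of the subject, every Conveyor environment the project is deployed to needs its own federated identity credential; the Terraform example earlier covers a single environment. A sketch that creates one credential per environment with for_each, assuming a hypothetical conveyor_environments variable that holds your environment (namespace) names. The environment is included in the display name because credential names must be unique within one application. Use it instead of the single-environment resource, not next to it.

```hcl
# Hypothetical input: the Conveyor environments (Kubernetes namespaces) of this project.
variable "conveyor_environments" {
  type = set(string)
}

# One federated identity credential per environment/namespace.
resource "azuread_application_federated_identity_credential" "projectA" {
  for_each = var.conveyor_environments

  application_id = azuread_application.projectA.id
  # Credential names must be unique per application, so include the environment.
  display_name = "kubernetes-federated-identity-${var.project_name}-${each.key}"
  audiences    = ["api://AzureADTokenExchange"]
  issuer       = var.oidc_issuer_url
  # Namespace = environment; service account name = Conveyor project name.
  subject = "system:serviceaccount:${each.key}:${var.project_name}"
}
```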

Give your Service Principal access to Azure resources

The last step is to give your service principal access to the necessary resources, such as storage containers, Azure SQL, and Purview. For example, the following Terraform code creates a storage account with a private container and gives your service principal read and write access to its blob data:

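resource "azurerm_storage_account" "samples" {
  name                     = "samples"
  resource_group_name      = var.resource_group_name
  location                 = var.resource_group_location
  account_tier             = "Standard"
  account_replication_type = "LRS"
  is_hns_enabled           = true
  # Renamed from allow_blob_public_access in azurerm provider v3.0
  allow_nested_items_to_be_public = false
  # Disable account keys so requests must be authorized with Azure AD.
  # With keys disabled, the azurerm provider needs storage_use_azuread = true to manage containers.
  shared_access_key_enabled = false

  network_rules {
    default_action             = "Deny"
    bypass                     = ["AzureServices"]
    virtual_network_subnet_ids = [var.aks_subnet_id]
  }
}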

resource "azurerm_storage_container" "projectA" {
  # Container names may only contain lowercase letters, numbers and hyphens
  name                  = "projecta"
  storage_account_name  = azurerm_storage_account.samples.name
  container_access_type = "private"
}

resource "azurerm_role_assignment" "blob_storage_contributor" {
  # Account-level scope; reference the account instead of building its ID by hand
  scope                = azurerm_storage_account.samples.id
  role_definition_name = "Storage Blob Data Contributor"
  # Role assignments expect the service principal's object ID
  principal_id = azuread_service_principal.projectA.object_id
}
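At account scope, Storage Blob Data Contributor grants read, write and delete access to every container in the storage account. If a project only needs to read its own container, a narrower sketch assigns Storage Blob Data Reader at container scope. The scope path follows the standard Azure Resource Manager format for blob containers; the resource name is illustrative.

```hcl
# Read-only access, limited to the projectA container instead of the whole account.
resource "azurerm_role_assignment" "blob_container_read_access" {
  # Container scope: <storage account ID>/blobServices/default/containers/<container name>
  scope                = "${azurerm_storage_account.samples.id}/blobServices/default/containers/${azurerm_storage_container.projectA.name}"
  role_definition_name = "Storage Blob Data Reader"
  principal_id         = azuread_service_principal.projectA.object_id
}
```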

Configure federated identity credentials for Azure user managed identities

Configuring the user managed identity

In order for AKS to request tokens on behalf of your user managed identity, you must create a federated identity credential between:

  • The user managed identity
  • Your project running in a specific Kubernetes namespace

This way you create a trust relationship between your project in a specific Kubernetes namespace and the user managed identity. When using Terraform, the federated identity credential looks as follows:

resource "azurerm_user_assigned_identity" "projectA" {
  name                = "conveyor-projectA"
  resource_group_name = var.resource_group_name
  location            = var.resource_group_location
}

resource "azurerm_federated_identity_credential" "projectA_env_dev" {
  name                = "kubernetes-federated-identity-${var.project_name}"
  resource_group_name = var.resource_group_name
  audience            = ["api://AzureADTokenExchange"]
  issuer              = var.oidc_issuer_url
  parent_id           = azurerm_user_assigned_identity.projectA.id
  subject             = "system:serviceaccount:${var.environment}:${var.project_name}"
}
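A managed identity has two different IDs, and it is easy to mix them up. A minimal sketch that exposes both as Terraform outputs, assuming (as for service principals) that the client ID is what you enter in the azure_application_client_id field; the output names are hypothetical:

```hcl
# Client ID: identifies the identity when a job requests tokens.
output "projectA_identity_client_id" {
  value = azurerm_user_assigned_identity.projectA.client_id
}

# Principal (object) ID: what azurerm_role_assignment expects as principal_id.
output "projectA_identity_principal_id" {
  value = azurerm_user_assigned_identity.projectA.principal_id
}
```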

Give your user managed identity access to Azure resources

The last step is to give your user managed identity access to the necessary resources, such as storage containers, Azure SQL, and Purview. To give your user managed identity read and write access to the blob data in a storage account, you can use the following Terraform code:

resource "azurerm_role_assignment" "identity_blob_storage_contributor" {
  scope                = "/subscriptions/${var.subscription}/resourceGroups/${var.resource_group_name}/providers/Microsoft.Storage/storageAccounts/${var.samples_storage_account_name}"
  role_definition_name = "Storage Blob Data Contributor"
  # Role assignments need the identity's principal (object) ID, not its resource ID
  principal_id = azurerm_user_assigned_identity.projectA.principal_id
}