Configure logging in Spark jobs

Description

This how-to guide describes how to use logging in Spark jobs. For applications running on Kubernetes, write all logs you care about to stdout or stderr instead of to a file. Logs written to stdout and stderr are collected by the Kubernetes logging infrastructure, where you can access them.

If you are using PySpark, you can use the standard Python logging module for your Python code. If you want to change the logging configuration for Spark itself, see the first two sections of this how-to guide.
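For PySpark code, a minimal sketch of sending records from the standard logging module to stdout could look like the following. The function name configure_logging, the logger name greeter_job, and the message format are illustrative assumptions, not something Conveyor prescribes.

```python
import logging
import sys


def configure_logging(stream=sys.stdout, level=logging.INFO):
    """Route Python log records to a stream (stdout by default) instead of a file."""
    logging.basicConfig(
        stream=stream,
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace handlers installed earlier (requires Python 3.8+)
    )
    # "greeter_job" is a hypothetical logger name used for illustration.
    return logging.getLogger("greeter_job")


logger = configure_logging()
logger.info("Hello from greeter job")
```

Because the handler is attached to the root logger, records from any module-level logger in your job end up on the same stream.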

In order to write logs for your application, you must satisfy the following requirements:

  • Add the necessary logging libraries to your Docker image
  • Add the necessary logging configuration
  • Write log events using a logging framework

Logging libraries provided in Spark images

By default, the Conveyor Spark images package the SLF4J API, with Log4j2 as the logging implementation. The easiest way to get started with logging is therefore to use these two in your application.

Log configuration

We also package a driver and an executor Log4j2 configuration file that specify sensible defaults for Spark logging. The Log4j2 properties files are located at:

  • Driver: /opt/spark/log4j/log4j2.properties
  • Executor: /opt/spark/log4j/log4j2-executor.properties
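For orientation, a Log4j2 configuration in properties format might look like the sketch below. It is modelled on the console-only template that ships with recent Spark releases and is not the exact file packaged in the Conveyor images; the package name com.example.greeter is a hypothetical placeholder for your own application package.

```properties
# Send everything at INFO and above to the console appender
rootLogger.level = info
rootLogger.appenderRef.stdout.ref = console

# Console appender writing to stderr, which Kubernetes collects
appender.console.type = Console
appender.console.name = console
appender.console.target = SYSTEM_ERR
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n%ex

# More verbose logging for the application's own package (hypothetical name)
logger.app.name = com.example.greeter
logger.app.level = debug
```

Keeping a console appender as the only appender matches the advice to log to stdout or stderr rather than to a file.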

If you want to deviate from the default configuration, you can overwrite the existing files with your own log4j2.properties or log4j2-executor.properties. When running a Spark job in Conveyor, we pass both properties files as extraJavaOptions to the spark-submit command as follows, so that both files are picked up:

"spark.driver.extraJavaOptions": "-Dlog4j2.configurationFile=file:///opt/spark/log4j/log4j2.properties"
"spark.executor.extraJavaOptions": "-Dlog4j2.configurationFile=file:///opt/spark/log4j/log4j2-executor.properties"

Writing log events using Scala

If you are using Scala, the recommended logging framework is scala-logging. Once the previous requirements are satisfied, you can use the following code to write logs:

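import com.typesafe.scalalogging.Logger

object GreeterJob {
  // Named logger; scala-logging wraps SLF4J, so events go through the configured Log4j2 setup.
  private val logger = Logger("GreeterJob")

  // GreeterConfiguration is the job's own configuration type, defined elsewhere in the application.
  def run(config: GreeterConfiguration): Unit = {
    logger.info("Hello from greeter job")
  }
}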