Instances
Conveyor supports the following instance types for all jobs:
Instance type | CPU | Total Memory (AWS) | Total Memory (Azure) |
---|---|---|---|
mx.nano | 1* | 0.438 GB | 0.434 GB |
mx.micro | 1* | 0.875 GB | 0.868 GB |
mx.small | 1* | 1.75 GB | 1.736 GB |
mx.medium | 1 | 3.5 GB | 3.47 GB |
mx.large | 2 | 7 GB | 6.94 GB |
mx.xlarge | 4 | 14 GB | 13.89 GB |
mx.2xlarge | 8 | 29 GB | 30.65 GB |
mx.4xlarge | 16 | 59 GB | 64.16 GB |
cx.nano | 1* | 0.219 GB | Not supported |
cx.micro | 1* | 0.438 GB | Not supported |
cx.small | 1* | 0.875 GB | Not supported |
cx.medium | 1 | 1.75 GB | Not supported |
cx.large | 2 | 3.5 GB | Not supported |
cx.xlarge | 4 | 7 GB | Not supported |
cx.2xlarge | 8 | 14 GB | Not supported |
cx.4xlarge | 16 | 29 GB | Not supported |
rx.xlarge | 4 | 28 GB | Not supported |
rx.2xlarge | 8 | 59 GB | Not supported |
rx.4xlarge | 16 | 120 GB | Not supported |
(*) These instance types are not guaranteed a full CPU but only a slice of one; if the cluster has spare capacity, they are allowed to burst up to a full CPU.
The numbers for AWS and Azure differ because nodes on both clouds run different DaemonSets and have different reservation requirements set by the provider. We aim to minimize the node overhead as much as possible while still obeying the minimum requirements of each cloud provider.
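You choose one of these sizes per job. The snippet below is a minimal sketch of how that typically looks in an Airflow DAG; it assumes the ContainerOperator discussed further down this page accepts an instance_type argument, and the import path and class name shown here are assumptions that may differ in your Conveyor version.

```python
# Minimal sketch: requesting a specific instance size for a containerized job.
# The import path and operator class name are assumptions; the instance_type
# values correspond to the table above.
from datetime import datetime

from airflow import DAG
from conveyor.operators import ConveyorContainerOperatorV2  # assumed import path

with DAG(dag_id="example_instance_type", start_date=datetime(2024, 1, 1), schedule=None) as dag:
    ConveyorContainerOperatorV2(
        task_id="small_job",
        instance_type="mx.small",  # 1 (burstable) CPU, 1.75 GB on AWS / 1.736 GB on Azure
    )
```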
Spark resources
When running Spark/PySpark applications, only part of the container's total memory is available to Spark itself. The details are described in the following tables:
AWS

Instance type | CPU | Total memory | Spark memory | PySpark memory |
---|---|---|---|---|
mx.micro | 1* | 0.875 GB | 0.8 GB | 0.6 GB |
mx.small | 1* | 1.75 GB | 1.6 GB | 1.25 GB |
mx.medium | 1 | 3.5 GB | 3.2 GB | 2.5 GB |
mx.large | 2 | 7 GB | 6.4 GB | 5 GB |
mx.xlarge | 4 | 14 GB | 12.7 GB | 10 GB |
mx.2xlarge | 8 | 29 GB | 26.7 GB | 21 GB |
mx.4xlarge | 16 | 59 GB | 54 GB | 42.4 GB |
cx.medium | 1 | 1.75 GB | 1.6 GB | 1.25 GB |
cx.large | 2 | 3.5 GB | 3.2 GB | 2.5 GB |
cx.xlarge | 4 | 7 GB | 6.4 GB | 5 GB |
cx.2xlarge | 8 | 14 GB | 12.7 GB | 10 GB |
cx.4xlarge | 16 | 29 GB | 26.7 GB | 21 GB |
rx.xlarge | 4 | 28 GB | 26 GB | 21 GB |
rx.2xlarge | 8 | 59 GB | 54 GB | 43 GB |
rx.4xlarge | 16 | 120 GB | 112 GB | 88 GB |

Azure

Instance type | CPU | Total memory | Spark memory | PySpark memory |
---|---|---|---|---|
mx.micro | 1* | 0.868 GB | 0.78 GB | 0.60 GB |
mx.small | 1* | 1.73 GB | 1.56 GB | 1.21 GB |
mx.medium | 1 | 3.47 GB | 3.12 GB | 2.43 GB |
mx.large | 2 | 6.94 GB | 6.25 GB | 4.86 GB |
mx.xlarge | 4 | 13.89 GB | 12.50 GB | 9.72 GB |
mx.2xlarge | 8 | 30.65 GB | 27.58 GB | 21.45 GB |
mx.4xlarge | 16 | 64.16 GB | 57.74 GB | 44.91 GB |
(*) These instance types don't get a guaranteed full CPU but only a slice of a full CPU. If the cluster has space for it, they are allowed to burst up to a full CPU.
As you can see from the tables, the executor memory available to your job differs depending on whether you run regular (Scala) Spark or PySpark. The difference comes from the spark.kubernetes.memoryOverheadFactor setting, documented in the Spark settings. It is configured to 0.1 for JVM jobs (Scala and Java Spark) and to 0.4 for non-JVM jobs (PySpark, SparkR). This factor sets aside a portion of the container memory for non-JVM usage such as off-heap memory allocations, system processes, Python, and R. Without it, your job would commonly fail with the error "Memory Overhead Exceeded".
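As a quick sanity check on the tables above, the Spark and PySpark memory columns can be approximated by dividing the total container memory by (1 + memoryOverheadFactor). The listed values are rounded and may include small additional reservations, so treat this as an approximation:

```python
# Rough reconstruction of the Spark/PySpark memory columns from the total
# container memory and the memory overhead factor described above.
def approx_spark_memory(total_gb: float, overhead_factor: float) -> float:
    """Memory left for the executor once the Kubernetes memory overhead is reserved."""
    return total_gb / (1 + overhead_factor)

# mx.medium on AWS has 3.5 GB of total memory:
print(round(approx_spark_memory(3.5, 0.1), 2))  # 3.18 -> listed as 3.2 GB for (Scala) Spark
print(round(approx_spark_memory(3.5, 0.4), 2))  # 2.5  -> listed as 2.5 GB for PySpark
```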
GPU instances
When installed on AWS, Conveyor supports using instances that come with a GPU. Currently, you can make use of the following instance types:
Instance type | CPU | Total Memory (AWS) | GPU |
---|---|---|---|
g4dn.xlarge | 4 | 16 GB | NVIDIA T4 |
GPU instances are not currently supported on Azure. If you would like to see this happen, please get in touch!
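Assuming GPU instances are requested the same way as the other sizes (by passing the instance type to your job), a request could look like the sketch below. This is illustrative only; the operator class and import path are assumptions (see the ContainerOperator discussed in the next section).

```python
# Illustrative sketch only: requesting the GPU instance type on AWS.
# The operator class name and import path are assumptions.
from conveyor.operators import ConveyorContainerOperatorV2  # assumed import path

gpu_task = ConveyorContainerOperatorV2(
    task_id="gpu_training_job",
    instance_type="g4dn.xlarge",  # 4 CPUs, 16 GB memory, NVIDIA T4 (see table above)
)
```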
Disk space allocation
When an application saves data to disk, it will by default consume disk space from the host it is running on. This disk space is shared across all jobs running on the same physical machine. Applications cannot read each other's files, but a particularly storage-hungry application can consume all available disk space, potentially causing issues for other jobs running on the same host.
Applications requesting a T-shirt size of mx.xlarge or greater are assigned the "full" instance: no other applications are deployed on that instance, so they do not suffer from the "noisy neighbor" problem. Applications running on smaller instance sizes receive a slice of a physical machine and share the available disk space (about 50 GB of allocatable space).
To avoid this issue, you can provision application-specific storage by specifying disk_size (and optionally disk_mount_path) when using the ContainerOperator. Spark applications can use the equivalent executor_disk_size when using the SparkSubmitOperator. This setting provisions additional storage for each executor, which Spark then uses automatically.
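The sketch below puts both options together. The disk_size, disk_mount_path, and executor_disk_size parameters are the ones referenced above; the operator class names, import path, and the unit of the size values are assumptions to be checked against the operator reference.

```python
# Sketch of provisioning application-specific storage; class names, import
# path, and the unit of the size values are assumptions.
from conveyor.operators import (  # assumed import path
    ConveyorContainerOperatorV2,
    ConveyorSparkSubmitOperatorV2,
)

# Container job with its own dedicated volume mounted at a custom path,
# instead of sharing the host's disk with other jobs.
ConveyorContainerOperatorV2(
    task_id="storage_heavy_job",
    instance_type="mx.large",
    disk_size=100,               # size of the dedicated volume (unit assumed to be GB)
    disk_mount_path="/scratch",  # optional: where the volume is mounted in the container
)

# Spark job where each executor gets its own volume, picked up by Spark
# automatically for scratch data.
ConveyorSparkSubmitOperatorV2(
    task_id="storage_heavy_spark_job",
    executor_disk_size=100,      # additional storage per executor (unit assumed to be GB)
)
```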