Instances
Conveyor supports the following instance types for all jobs:
Instance type | CPU | Total Memory (AWS) | Total Memory (Azure) |
---|---|---|---|
mx.nano | 1* | 0.438 GB | 0.434 GB |
mx.micro | 1* | 0.875 GB | 0.868 GB |
mx.small | 1* | 1.75 GB | 1.736 GB |
mx.medium | 1 | 3.5 GB | 3.47 GB |
mx.large | 2 | 7 GB | 6.94 GB |
mx.xlarge | 4 | 14 GB | 13.89 GB |
mx.2xlarge | 8 | 29 GB | 30.65 GB |
mx.4xlarge | 16 | 59 GB | 64.16 GB |
cx.nano | 1* | 0.219 GB | Not supported |
cx.micro | 1* | 0.438 GB | Not supported |
cx.small | 1* | 0.875 GB | Not supported |
cx.medium | 1 | 1.75 GB | Not supported |
cx.large | 2 | 3.5 GB | Not supported |
cx.xlarge | 4 | 7 GB | Not supported |
cx.2xlarge | 8 | 14 GB | Not supported |
cx.4xlarge | 16 | 29 GB | Not supported |
rx.xlarge | 4 | 28 GB | Not supported |
rx.2xlarge | 8 | 59 GB | Not supported |
rx.4xlarge | 16 | 120 GB | Not supported |
(*) These instance types are not guaranteed a full CPU but only a slice of one; if the cluster has spare capacity, they are allowed to burst up to a full CPU.
The numbers for AWS and Azure differ because nodes on both clouds run different DaemonSets and have different reservation requirements set by the provider. We aim to minimize the node overhead as much as possible while still obeying the minimum requirements of each cloud provider.
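You choose one of these sizes per job. The snippet below is a minimal sketch of how that typically looks in an Airflow DAG; it assumes the ContainerOperator discussed further down this page accepts an instance_type argument, and the import path and class name shown here are assumptions that may differ in your Conveyor version.

```python
# Minimal sketch: requesting a specific instance size for a containerized job.
# The import path and operator class name are assumptions; the instance_type
# values correspond to the table above.
from datetime import datetime

from airflow import DAG
from conveyor.operators import ConveyorContainerOperatorV2  # assumed import path

with DAG(dag_id="example_instance_type", start_date=datetime(2024, 1, 1), schedule=None) as dag:
    ConveyorContainerOperatorV2(
        task_id="small_job",
        instance_type="mx.small",  # 1 (burstable) CPU, 1.75 GB on AWS / 1.736 GB on Azure
    )
```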
Spark resources
When running Spark/PySpark applications, only part of the container's total memory is available to Spark itself. The details are described in the following tables:
AWS

Instance type | CPU | Total memory | Spark memory | PySpark memory |
---|---|---|---|---|
mx.micro | 1* | 0.875 GB | 0.8 GB | 0.6 GB |
mx.small | 1* | 1.75 GB | 1.6 GB | 1.25 GB |
mx.medium | 1 | 3.5 GB | 3.2 GB | 2.5 GB |
mx.large | 2 | 7 GB | 6.4 GB | 5 GB |
mx.xlarge | 4 | 14 GB | 12.7 GB | 10 GB |
mx.2xlarge | 8 | 29 GB | 26.7 GB | 21 GB |
mx.4xlarge | 16 | 59 GB | 54 GB | 42.4 GB |
cx.medium | 1 | 1.75 GB | 1.6 GB | 1.25 GB |
cx.large | 2 | 3.5 GB | 3.2 GB | 2.5 GB |
cx.xlarge | 4 | 7 GB | 6.4 GB | 5 GB |
cx.2xlarge | 8 | 14 GB | 12.7 GB | 10 GB |
cx.4xlarge | 16 | 29 GB | 26.7 GB | 21 GB |
rx.xlarge | 4 | 28 GB | 26 GB | 21 GB |
rx.2xlarge | 8 | 59 GB | 54 GB | 43 GB |
rx.4xlarge | 16 | 120 GB | 112 GB | 88 GB |

Azure

Instance type | CPU | Total memory | Spark memory | PySpark memory |
---|---|---|---|---|
mx.micro | 1* | 0.868 GB | 0.78 GB | 0.60 GB |
mx.small | 1* | 1.73 GB | 1.56 GB | 1.21 GB |
mx.medium | 1 | 3.47 GB | 3.12 GB | 2.43 GB |
mx.large | 2 | 6.94 GB | 6.25 GB | 4.86 GB |
mx.xlarge | 4 | 13.89 GB | 12.50 GB | 9.72 GB |
mx.2xlarge | 8 | 30.65 GB | 27.58 GB | 21.45 GB |
mx.4xlarge | 16 | 64.16 GB | 57.74 GB | 44.91 GB |
(*) These instance types don't get a guaranteed full CPU but only a slice of a full CPU. If the cluster has space for it, they are allowed to burst up to a full CPU.
As you can see from the tables, the executor memory available to your job differs depending on whether you run regular (Scala) Spark or PySpark. The difference comes from the spark.kubernetes.memoryOverheadFactor setting, documented in the Spark settings. It is configured to 0.1 for JVM jobs (Scala and Java Spark) and to 0.4 for non-JVM jobs (PySpark, SparkR). This factor sets aside a portion of the container memory for non-JVM usage such as off-heap memory allocations, system processes, Python, and R. Without it, your job would commonly fail with the error "Memory Overhead Exceeded".
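As a quick sanity check on the tables above, the Spark and PySpark memory columns can be approximated by dividing the total container memory by (1 + memoryOverheadFactor). The listed values are rounded and may include small additional reservations, so treat this as an approximation:

```python
# Rough reconstruction of the Spark/PySpark memory columns from the total
# container memory and the memory overhead factor described above.
def approx_spark_memory(total_gb: float, overhead_factor: float) -> float:
    """Memory left for the executor once the Kubernetes memory overhead is reserved."""
    return total_gb / (1 + overhead_factor)

# mx.medium on AWS has 3.5 GB of total memory:
print(round(approx_spark_memory(3.5, 0.1), 2))  # 3.18 -> listed as 3.2 GB for (Scala) Spark
print(round(approx_spark_memory(3.5, 0.4), 2))  # 2.5  -> listed as 2.5 GB for PySpark
```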
GPU instances
When installed on AWS, Conveyor supports using instances that come with a GPU. Currently, you can make use of the following instance types:
Instance type | CPU | Total Memory (AWS) | GPU |
---|---|---|---|
g4dn.xlarge | 4 | 16 GB | NVIDIA T4 |
GPU instances are not currently supported on Azure. If you would like to see this happen, please get in touch!
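Assuming GPU instances are requested the same way as the other sizes (by passing the instance type to your job), a request could look like the sketch below. This is illustrative only; the operator class and import path are assumptions (see the ContainerOperator discussed in the next section).

```python
# Illustrative sketch only: requesting the GPU instance type on AWS.
# The operator class name and import path are assumptions.
from conveyor.operators import ConveyorContainerOperatorV2  # assumed import path

gpu_task = ConveyorContainerOperatorV2(
    task_id="gpu_training_job",
    instance_type="g4dn.xlarge",  # 4 CPUs, 16 GB memory, NVIDIA T4 (see table above)
)
```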
Disk space allocation
When an application saves data to disk, it will by default consume disk space from the host it is running on. This disk space is shared across all jobs running on the same physical machine. Applications cannot read each other's files, but a particularly storage-hungry application can consume all available disk space, potentially causing issues for other jobs running on the same host.
Applications requesting a T-shirt size of mx.xlarge or greater are assigned the "full" instance: no other applications are deployed on that instance, so they do not suffer from the "noisy neighbor" problem. Applications running on smaller instance sizes receive a slice of a physical machine and share the available disk space (about 50 GB of allocatable space).
To avoid this issue, you can provision application-specific storage by specifying disk_size (and optionally disk_mount_path) when using the ContainerOperator. Spark applications can use the equivalent executor_disk_size when using the SparkSubmitOperator. This setting provisions additional storage for each executor, which Spark then uses automatically.
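The sketch below puts both options together. The disk_size, disk_mount_path, and executor_disk_size parameters are the ones referenced above; the operator class names, import path, and the unit of the size values are assumptions to be checked against the operator reference.

```python
# Sketch of provisioning application-specific storage; class names, import
# path, and the unit of the size values are assumptions.
from conveyor.operators import (  # assumed import path
    ConveyorContainerOperatorV2,
    ConveyorSparkSubmitOperatorV2,
)

# Container job with its own dedicated volume mounted at a custom path,
# instead of sharing the host's disk with other jobs.
ConveyorContainerOperatorV2(
    task_id="storage_heavy_job",
    instance_type="mx.large",
    disk_size=100,               # size of the dedicated volume (unit assumed to be GB)
    disk_mount_path="/scratch",  # optional: where the volume is mounted in the container
)

# Spark job where each executor gets its own volume, picked up by Spark
# automatically for scratch data.
ConveyorSparkSubmitOperatorV2(
    task_id="storage_heavy_spark_job",
    executor_disk_size=100,      # additional storage per executor (unit assumed to be GB)
)
```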