Storage architecture

MetalSoft supports multiple types of storage:

  1. iSCSI storage, which is used for diskless boot and/or for expanding the storage space of locally booted instances.
  2. Local storage as data-only disks (scrub disks), which is used for storing data only. This is useful for big-data scenarios. In this situation the disk controllers are configured in pass-through (HBA) mode.
  3. Local storage as OS and data volumes, which is used in certain situations where SAN booting is not desirable, such as “edge” scenarios. In this situation the disk controller (where present) is set to RAID mode in a RAID 1, 10, or 5 configuration depending on the number of disks (see the sketch after this list). This is only applicable to server_types with local drives and a RAID controller. NVMe drives are supported, but the OS will be installed on the first disk.
  4. Object storage is not used for OS drives but can be provisioned as application-specific data drives. Some administrators expose object storage as a FUSE drive, but MetalSoft discourages this due to stability and compatibility issues, as object storage systems are not typically POSIX compliant.
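The documentation above does not spell out the exact RAID layout rules, only that the level depends on the number of disks. The following minimal Python sketch shows one hypothetical mapping; the function name and the disk-count thresholds are assumptions for illustration, not MetalSoft's documented policy.

    def pick_raid_level(disk_count: int) -> str:
        """Hypothetical mapping from local disk count to RAID level.

        MetalSoft configures the controller in RAID 1, 10 or 5 depending on
        the number of disks; the thresholds used here are assumptions.
        """
        if disk_count < 2:
            raise ValueError("RAID requires at least two disks")
        if disk_count == 2:
            return "RAID1"      # simple mirror
        if disk_count == 3:
            return "RAID5"      # parity across three disks
        return "RAID10"         # striped mirrors for four or more disks

    print(pick_raid_level(4))   # RAID10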

MetalSoft uses a concept of storage ‘pools’: a pool of disk space out of which Drives are offered to ‘users’. The system has been designed to operate a large number of smaller pools in order to increase throughput and scalability. The pools (storages) do not share anything and can be of different types.

iSCSI storage

There are two types of Drives which have associated iSCSI LUNs:

  1. OS drives are created by cloning an existing template LUN using copy-on-write (COW) and then expanding the drive to the size requested by the user.
  2. Non-OS drives are created directly as copy-on-write (COW) volumes (see the sketch after this list).
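To illustrate the two creation paths, here is a minimal Python sketch that models a COW-capable backend in memory. The StoragePool class and its clone_template/create_volume methods are hypothetical stand-ins for whatever the real storage backend exposes; only the clone-then-expand versus create-directly distinction comes from the list above.

    from dataclasses import dataclass, field

    @dataclass
    class Volume:
        name: str
        size_gb: int
        parent: str | None = None   # set when the volume is a COW clone of a template

    @dataclass
    class StoragePool:
        """Hypothetical COW-capable pool; method names are illustrative."""
        name: str
        volumes: dict = field(default_factory=dict)

        def clone_template(self, template: str, name: str, size_gb: int) -> Volume:
            # OS drive: clone the template LUN (COW), then expand to the requested size.
            vol = Volume(name=name, size_gb=size_gb, parent=template)
            self.volumes[name] = vol
            return vol

        def create_volume(self, name: str, size_gb: int) -> Volume:
            # Non-OS drive: a thin (COW) volume created directly, with no template parent.
            vol = Volume(name=name, size_gb=size_gb)
            self.volumes[name] = vol
            return vol

    pool = StoragePool("pool-01")
    os_drive = pool.clone_template("ubuntu-22-04-template", "drive-101-os", size_gb=100)
    data_drive = pool.create_volume("drive-102-data", size_gb=500)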

The system supports multiple storage classes such as “SSD” or “NEAR-LINE”. At provisioning time the system chooses a storage pool with enough disk space and with the specified storage class. The system will not allocate space from a storage that is in “maintenance” mode.
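A minimal sketch of that selection step, assuming a simple in-memory list of pools; the field names and the tie-breaking rule (most free space first) are assumptions for illustration.

    from dataclasses import dataclass

    @dataclass
    class PoolInfo:
        name: str
        storage_class: str      # e.g. "SSD" or "NEAR-LINE"
        free_gb: int
        in_maintenance: bool

    def choose_pool(pools: list[PoolInfo], storage_class: str, size_gb: int) -> PoolInfo:
        """Pick a pool with the requested class and enough free space,
        skipping pools that are in maintenance mode."""
        candidates = [
            p for p in pools
            if p.storage_class == storage_class
            and p.free_gb >= size_gb
            and not p.in_maintenance
        ]
        if not candidates:
            raise RuntimeError("no eligible storage pool")
        # Assumed tie-breaker: prefer the pool with the most free space.
        return max(candidates, key=lambda p: p.free_gb)

    pools = [
        PoolInfo("ssd-01", "SSD", 2000, False),
        PoolInfo("ssd-02", "SSD", 500, True),    # skipped: maintenance mode
        PoolInfo("nl-01", "NEAR-LINE", 8000, False),
    ]
    print(choose_pool(pools, "SSD", 100).name)   # ssd-01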

A server can log in to multiple iSCSI targets at the same time. The system will attempt to allocate the Drives of the same drive array on different storage pools, and the Drives of the same server on the same storage pool, to ensure that clusters of servers survive a network or storage pool outage.
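The placement intent can be sketched as follows. This is purely illustrative: the function and the round-robin assignment are assumptions, but they show how drives of the same server end up on one pool while different servers of the same drive array are spread across pools.

    def place_drive_array(servers: list[str], pools: list[str]) -> dict[str, str]:
        """Sketch: each server's drives share one pool, while different servers
        in the same drive array are spread across pools (round-robin), so a
        single pool outage does not take down the whole cluster."""
        return {server: pools[i % len(pools)] for i, server in enumerate(servers)}

    placement = place_drive_array(["srv-1", "srv-2", "srv-3"], ["pool-a", "pool-b"])
    print(placement)   # {'srv-1': 'pool-a', 'srv-2': 'pool-b', 'srv-3': 'pool-a'}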

[Figure: storage architecture (../_images/storage_architecture_1.svg)]

If the drive is an OS drive, the system will first replicate the selected OS template from the repository or from another storage pool if it is not already available on the target storage. A warm-up mechanism, which copies the most frequently used templates to a new storage, exists to ensure fast provisioning times.
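A minimal sketch of that pre-provisioning check, assuming a simple mapping of pools to the templates they already hold; the function name, arguments, and data structure are illustrative, not MetalSoft's API.

    def ensure_template(pool_templates: dict[str, set[str]],
                        pool: str, template: str, other_pools: list[str]) -> None:
        """If the OS template is not already present on the target pool, copy it
        from another pool that has it, or from the repository as a fallback."""
        if template in pool_templates.setdefault(pool, set()):
            return                                   # already warmed up, nothing to do
        source = next((p for p in other_pools
                       if template in pool_templates.get(p, set())), "repository")
        print(f"replicating {template} from {source} to {pool}")
        pool_templates[pool].add(template)

    templates = {"pool-01": {"ubuntu-22-04"}, "pool-02": set()}
    ensure_template(templates, "pool-02", "ubuntu-22-04", ["pool-01"])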

Additional non-OS drives are mounted in the OS automatically by means of a protocol called iSNS, delivered by a specialized datacenter agent. Certain OS templates might not be configured with this service enabled.

Local storage

For OS drives the system provisions operating systems on a specific server’s local drives using templates that are separate from the iSCSI ones, via a cloning mechanism. At provisioning time the server is booted into a special operating system which executes the actual cloning over HTTP directly from the repository.
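The core of that cloning step can be sketched as streaming a raw image over HTTP straight onto the local disk. The repository URL, device path, and function are hypothetical; the real in-memory OS would also handle checksums, retries, and post-clone configuration.

    import shutil
    import urllib.request

    REPO_URL = "http://repo.example.local/templates/ubuntu-22-04.raw"  # hypothetical URL
    TARGET_DISK = "/dev/sda"                                           # first local disk

    def clone_image_to_disk(url: str, device: str, chunk_mb: int = 8) -> None:
        """Stream a raw OS image over HTTP directly onto a local block device."""
        with urllib.request.urlopen(url) as src, open(device, "wb") as dst:
            shutil.copyfileobj(src, dst, length=chunk_mb * 1024 * 1024)

    # clone_image_to_disk(REPO_URL, TARGET_DISK)   # would run inside the in-memory OS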

Non-OS drives are simply presented as independent block devices. They are NOT formatted or mounted by the system except where used by Kubernetes or other ‘applications’.

Secure erase of data

All data on all local drives is erased (securely, by writing zeros or by changing the encryption keys) at cleanup time, before the server is handed over to another user. This is why cleanups can take a long time (hours) if the server has many local drives.
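A minimal sketch of the zero-writing variant, assuming a plain block device path; it is illustrative only, and as noted above a real cleanup may instead rotate encryption keys or use a firmware-level secure erase.

    def zero_fill(device: str, size_bytes: int, chunk_mb: int = 16) -> None:
        """Overwrite a block device with zeros, chunk by chunk. This is why
        wiping many large local drives can take hours."""
        chunk = bytes(chunk_mb * 1024 * 1024)
        with open(device, "wb") as f:
            written = 0
            while written < size_bytes:
                n = f.write(chunk[: min(len(chunk), size_bytes - written)])
                written += n

    # zero_fill("/dev/sdb", 4 * 1024**4)   # a ~4 TB drive takes hours at HDD speeds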

Data on iSCSI volumes is not actively wiped but is non-recoverable due to the copy-on-write mechanism. No blocks containing previous data will be presented to users by the COW storage: blocks are allocated only upon write, and a read attempt on an unallocated block returns zeros.
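The same behaviour can be demonstrated with a sparse file, which, like a thin COW volume, allocates blocks only when they are written; reads of never-written regions come back as zeros. The snippet below is a local illustration, not MetalSoft code.

    import os
    import tempfile

    # A sparse file behaves like a thin/COW volume in this respect:
    # blocks are allocated only on write, and unwritten regions read as zeros.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        path = f.name
        f.truncate(1024 * 1024)      # 1 MiB logical size, no data blocks allocated yet
        f.seek(512 * 1024)
        f.write(b"data")             # allocate only the block holding this write

    with open(path, "rb") as f:
        assert f.read(4096) == b"\x00" * 4096   # never-written region reads as zeros
    os.remove(path)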