Rapid.Space has three design goals: low cost, high performance and ethics. These goals are achieved by operating with Free Software and re-certified Open Compute Project (OCP) servers hosted in energy-efficient data-centers, and by eliminating any form of hardware redundancy. Each OCP server has:
- a single 128GB SSD disk for base system image;
- a single 4TB SSD (SATA) disk for data;
- a single 10 Gbps network interface connected to a single switch per rack;
- a single power supply per server;
- 256 GB RAM;
- 20 cores (Xeon 2680v2).
Data-centers which host Rapid.Space servers may have only a single electrical power source and a single network access. No power generator is needed. Batteries supply power for only 2 minutes. Servers are designed to operate at outdoor temperatures of up to 35 °C in free-airflow data-centers with neither air-conditioning nor cooling.
All design principles that are usually taken for granted in data-centers or cloud computing systems have been questioned by Rapid.Space. Instead of relying on hardware redundancy, Rapid.Space expects any form of redundancy or resiliency to be implemented through software and multiple data-centers.
For example, network resiliency is implemented through the re6st software, which is capable of overcoming routing issues that often happen on the Internet. Application resiliency can be implemented either using SlapOS's resiliency technology or by relying on databases which are natively redundant over multiple data-centers.
No IPMI, No KVM, No Power Switches
One of the challenges faced by Rapid.Space was to reach performance similar to or better than that of bare metal servers provided by OVH, Online or Hetzner, while cutting costs as much as possible on the server management infrastructure. A Rapid.Space data-center should ideally operate without any form of IPMI management network, without virtual keyboards and monitors (KVM), without power switches, without routers, without ever having to reinstall servers, without an anti-DDoS system, and without any of the sophisticated hardware or systems that took other providers a decade to build.
Simplification and cost cutting goals led to the following decisions:
- the standard unit of management is the virtual machine: it eliminates the need for IPMI, KVM and power switches, which are replaced by a software equivalent (NoVNC);
- all networking is IPv6-based and self-routed by babel: it eliminates the need for routers or expensive IPv4 address blocks;
- the bootloader is Linuxboot with a read-only system image self-installed on a dedicated SSD: servers do not need to be reinstalled;
- network traffic is blocked by default: it eliminates the need for expensive DDOS mitigation or malware detection.
This approach sounds ideal in many respects: lower cost, less hardware and less staff to maintain the system. Networking and CPU performance are not degraded by virtualization thanks to virtio. However, this approach faces one hurdle: the risk of poor storage virtualization performance, especially in the case of database applications.
High performance disk I/O with qemu
Storage virtualization often leads to poor I/O performance. We have experienced how Storage Area Network (SAN) systems often lead to overall database performance 10 to 100 times lower than with a consumer-grade 200€ SSD because of latency or congestion on the Fibre Channel network. Similar issues can happen with network block devices for similar reasons. Whenever poor database performance is the consequence of high storage access latency, not much can be done besides moving to locally attached storage.
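The effect of latency on a database can be illustrated with a small calculation. This is only a sketch: it assumes a workload where each write must wait for the previous one to complete, and both latency values are hypothetical figures chosen for illustration, not measurements from Rapid.Space or from any SAN.

```python
def max_serial_ops_per_second(latency_seconds):
    """Upper bound on operations per second when each one waits for the previous."""
    return 1.0 / latency_seconds


# Hypothetical round-trip latencies (assumptions, not measured values).
LOCAL_SSD_LATENCY = 0.0001      # 0.1 ms for a locally attached SSD
CONGESTED_SAN_LATENCY = 0.005   # 5 ms for a congested storage network

local_rate = max_serial_ops_per_second(LOCAL_SSD_LATENCY)
san_rate = max_serial_ops_per_second(CONGESTED_SAN_LATENCY)
slowdown = local_rate / san_rate

print(f"local SSD: {local_rate:.0f} ops/s")
print(f"SAN:       {san_rate:.0f} ops/s")
print(f"slowdown:  {slowdown:.0f}x")
```

Under these assumed latencies the serial throughput differs by a factor of about 50, which falls inside the 10 to 100 times range mentioned above. No amount of CPU or bandwidth changes this bound; only lower latency does.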
In order to evaluate Rapid.Space performance, we measured a real ERP application developed by Nexedi, "ERP5 Billing Run", which produces about 300,000 invoices and posts them to the accounting ledger in about 8 hours. It is the biggest database application operated by Nexedi. It is currently hosted at OVH on high-end Xeon dedicated servers with locally attached SSD (NVME). All services of this application are automatically orchestrated over bare metal with SlapOS.
This "ERP5 Billing Run" application stresses both CPU and disk I/O. It is written in Python. The underlying database is MariaDB, which is used both for relational queries and to store BLOBs with NEO. A faster CPU and faster disk I/O lead to a shorter time to complete the test.
We then compared different scenarios:
- use of Rapid.Space OCP server and qemu with a qcow2 disk image on a locally attached storage (qemu qcow2);
- use of Rapid.Space OCP server and qemu with a disk partition of a locally attached storage (qemu /dev/sdb2);
- use of Rapid.Space OCP server and qemu with the complete device of a locally attached storage (qemu /dev/sdb);
- use of bare metal Rapid.Space OCP server on a locally attached storage (bare metal);
- use of an OVH OpenStack (dual C2-120 VM with 120 GB RAM each) with locally attached disk;
- use of an OVH dedicated server (single Big-HG server with 256 GB RAM).
In the case of OCP hardware, the qemu configuration was optimized for performance by using virtio, pass-through and cache parameters.
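The three qemu scenarios differ only in what is handed to the guest as its disk. The sketch below builds a possible qemu command line for each of them. It is not the configuration actually used by Rapid.Space: the cache mode (`cache=none`), the I/O mode (`aio=native`), the memory and CPU sizes and the qcow2 image path are all assumptions, and "pass-through" is interpreted here as giving the host block device directly to the guest as a raw virtio drive.

```python
import shlex


def qemu_command(disk_path, disk_format, memory_mb=4096, vcpus=4):
    """Build a qemu argv list that exposes one disk to the guest through virtio."""
    drive = ",".join([
        f"file={disk_path}",
        f"format={disk_format}",
        "if=virtio",      # paravirtualized block device
        "cache=none",     # assumed cache mode: bypass the host page cache
        "aio=native",     # assumed I/O mode: Linux native asynchronous I/O
    ])
    return [
        "qemu-system-x86_64",
        "-enable-kvm",
        "-cpu", "host",
        "-m", str(memory_mb),
        "-smp", str(vcpus),
        "-drive", drive,
    ]


# The image path is hypothetical; /dev/sdb2 and /dev/sdb come from the scenarios.
SCENARIOS = {
    "qemu qcow2": ("/srv/vm/disk.qcow2", "qcow2"),
    "qemu /dev/sdb2": ("/dev/sdb2", "raw"),
    "qemu /dev/sdb": ("/dev/sdb", "raw"),
}

for name, (path, fmt) in SCENARIOS.items():
    print(f"{name}: {shlex.join(qemu_command(path, fmt))}")
```

Only the `file=` and `format=` values change between scenarios, which makes it easier to attribute any difference in duration to the storage backend rather than to other settings.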
Disk performance benchmark of qemu with ERP5 Billing Run
| | OCP qemu qcow2 | OCP qemu /dev/sdb2 | OCP qemu /dev/sdb | OCP bare metal | OVH OpenStack | OVH Dedicated |
| --- | --- | --- | --- | --- | --- | --- |
| Duration | 85h | 85h | 12h55 | 9h51 | 16h22 | 8h27 |
| VM Manager | SlapOS | SlapOS | SlapOS | N/A | OpenStack | N/A |
| Orchestrator | SlapOS | SlapOS | SlapOS | SlapOS | SlapOS | SlapOS |
| Database | NEO + InnoDB | NEO + InnoDB | NEO + InnoDB | NEO + InnoDB | NEO + InnoDB | NEO + InnoDB |
| Pystone (CPU test) | 150,000 | 150,000 | 150,000 | 150,000 | 200,000 | 230,000 |
| vCore | 40 | 40 | 40 | 40 | 64 | 40 |
| Xeon Generation | v2 | v2 | v2 | v2 | N/A | v3 |
| Frequency | 2.8/3.6 GHz | 2.8/3.6 GHz | 2.8/3.6 GHz | 2.8/3.6 GHz | 3.1 GHz | 2.6/3.3 GHz |
| Storage | 4 TB SATA SSD | 4 TB SATA SSD | 4 TB SATA SSD | 4 TB SATA SSD | 4 TB High Speed | 4 TB NVME SSD |
| RAM | 256 GB | 256 GB | 256 GB | 256 GB | 2 x 120 GB | 256 GB |
| Monthly Price | N/A | N/A | 195€ | 195€ | 2261€ | 769€ |
| Redundancy | No | No | No | No | Yes | No |
Our conclusions are:
- it is possible to reach near bare metal disk performance with qemu by attaching a whole SSD device to qemu;
- attaching a partition or using a disk image in qemu reduces disk performance by an order of magnitude;
- bare metal execution of the database is about 30% faster than the best qemu configuration;
- OVH dedicated servers are faster than Rapid.Space but cost more;
- OVH OpenStack VMs are smaller and slower than Rapid.Space despite a faster CPU; they provide redundancy but cost much more;
- Rapid.Space costs much less but is about 30% slower than the fastest dedicated servers on the market.
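The ratios behind these conclusions can be recomputed from the durations and monthly prices reported in the benchmark table. The short script below is only a convenience for that arithmetic; the choice of OCP bare metal as the baseline and the helper name `to_minutes` are ours.

```python
def to_minutes(duration):
    """Convert a duration such as '12h55' or '85h' into minutes."""
    hours, _, minutes = duration.partition("h")
    return int(hours) * 60 + int(minutes or 0)


# Durations and prices copied from the benchmark table.
DURATIONS = {
    "OCP qemu qcow2": "85h",
    "OCP qemu /dev/sdb2": "85h",
    "OCP qemu /dev/sdb": "12h55",
    "OCP bare metal": "9h51",
    "OVH OpenStack": "16h22",
    "OVH Dedicated": "8h27",
}
MONTHLY_PRICE_EUR = {
    "OCP qemu /dev/sdb": 195,
    "OCP bare metal": 195,
    "OVH OpenStack": 2261,
    "OVH Dedicated": 769,
}

baseline = to_minutes(DURATIONS["OCP bare metal"])
relative_duration = {
    name: to_minutes(duration) / baseline
    for name, duration in DURATIONS.items()
}

for name, ratio in relative_duration.items():
    price = MONTHLY_PRICE_EUR.get(name)
    price_text = f"{price} EUR/month" if price is not None else "no listed price"
    print(f"{name}: {ratio:.2f}x bare metal duration, {price_text}")
```

With these figures, qemu on the whole device takes about 1.31 times as long as OCP bare metal, while the qcow2 and partition scenarios take more than 8 times as long.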
Based on these results, all Rapid.Space OCP servers are now configured with two SSD disks: one for the system image and one entirely dedicated to the qemu process. In the near future, Rapid.Space will add a third disk for bare metal deployment of databases.
Nexedi will keep relying on OVH because it is one of the best providers of high-performance dedicated servers in the world. Nexedi will use Rapid.Space VMs and bare metal to address use cases not covered by OVH or to cut costs on use cases that do not require the highest performance.
Overall, the "ERP5 Billing Run" test is consistent with the pedigree of Rapid.Space servers: very high-end servers that were built 3 years ago for large corporations and that, once re-certified and upgraded with a brand new SSD, are now about 30% slower than recent-generation high-end servers. Considering their cost, it is a good deal!
Further performance with SlapOS nano-containers, FusionIO and Free Software
Even with the best possible configuration, qemu disk performance is still 30% slower than bare metal. This is obviously much better than the 800% or worse slowdown we often experienced with SAN or some public cloud computing services. Yet, some applications may need even more disk performance. For those applications, Rapid.Space will provide dedicated FusionIO storage with each virtual machine. Rapid.Space customers will be able to deploy any SlapOS database profile (MariaDB, NEO, etc.) to benefit from the bare metal performance of locally attached FusionIO disks, while running their applications inside a qemu virtual machine.
We expect a performance boost of 200% to 1000% through this approach compared to best results with qemu.
We also expect to open SlapOS nano-container technology to Rapid.Space users, so that any GNU/Linux software (MariaDB, PostgreSQL, ERP5, NEO, Wendelin, Spark, etc.) can be deployed on bare metal next to the dedicated VM. If a Rapid.Space user needs a different database, a different version or a different configuration, they can extend the SlapOS software library.
This is the other beauty of Rapid.Space: all its source code is Free Software. Rapid.Space customers are free to contribute to it and improve it.