This blogpost discusses how the public cloud provides unique benefits to its users: a robust and compliant global infrastructure, extremely efficient server virtualisation engines and new state-of-the-art processors. All this translates into substantially lower costs and faster innovation. In a previous post, I discussed platform as a service (PaaS) offerings and how they confer a huge competitive advantage on the public cloud providers. All the advantages in this and the previous post are underpinned by the huge economies of scale that the public cloud providers have created.
Putting economies of scale in context
Let us first look at the orders of magnitude involved.
At the Amazon Web Services (AWS) re:Invent event in 2016, James Hamilton, AWS VP & Distinguished Engineer, stated:
AWS adds the capacity equivalent of a FORTUNE 500 Enterprise daily.
Four years later, in 2020, AWS grew its revenue by $10 billion over the previous year. AWS realised this revenue growth despite its aggressive drive to move customers from traditional computing based on virtual servers to cheaper serverless computation. Ethan Kaplan, Chief Product Officer of Fender Digital, discussed how guitar manufacturer Fender re-engineered its Fender Play service to serverless during re:Invent 2018:
Our AWS bill as of last month is 15% less than it was a year ago while we are serving 21 times the traffic
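The quoted figures imply a dramatic drop in cost per unit of traffic. A minimal sketch of the arithmetic, assuming the bill fell to 85% of its year-ago level while traffic grew 21-fold (the figures in the quote):

```python
# Back-of-the-envelope check of the Fender quote above.
# Assumption: "unit cost" is simply the bill divided by the traffic served.
def relative_unit_cost(bill_ratio: float, traffic_ratio: float) -> float:
    """Cost per unit of traffic, relative to the baseline period."""
    return bill_ratio / traffic_ratio

ratio = relative_unit_cost(0.85, 21.0)
print(f"Cost per unit of traffic: {ratio:.1%} of a year ago")  # roughly 4%
```

In other words, each unit of traffic costs around 96% less than a year earlier.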
Although the move to serverless is highly deflationary for revenues, AWS continues growing its revenue at circa 28% p.a. with an annual run rate of $51 bn as of Q4 2020. Imagine how much faster its share of global computing installed base is growing and what economies of scale such a vast infrastructure is creating.
The website Platformonomics aggregates and publishes the annual CAPEX of large technology companies. The three large hyperscale providers have CAPEX numbers that are difficult to wrap your head around. Although these figures include investment areas other than the public cloud, their size and growth rates give an indication of the scale of the hyperscalers.
What such scale buys you
This unprecedented scale has enabled each of the three hyperscale providers to achieve the following: (i) a global fault-tolerant data centre and networking infrastructure (ii) highly performant and secure server management tooling based on proprietary software and computer chips (iii) new server chips offering higher performance at lower cost and lower power consumption than what is available in the private cloud. These achievements require huge investments, commitment and long-term vision and as such are impossible to replicate in the private cloud.
In the following three sections, we describe each area and detail the corresponding benefits.
Global, fault tolerant infrastructure
The data centre and networking infrastructure of a cloud provider is the basic building block for all the services offered. It determines the physical security of your data, the blast radius of a disruption and what options are available to you to increase the availability of the services you wish to run. The benefits of running in the public cloud are enormous: best in class availability, security and quality all available as a service.
AWS has pioneered cloud computing and its infrastructure design principles form the template for the infrastructure of all hyperscale providers. Based on the principle that “everything breaks all the time” data centres are grouped in Availability Zones (AZs) and AZs are grouped into regions. This hierarchy maximises availability and minimises the blast radius of any disruption. The regions are connected with one another through a globe-spanning wide area network.
An AZ is a logical data centre and usually consists of several distinct physical data centres in close proximity. An AZ behaves as one data centre but mitigates availability risks by using more than one location. Microsoft Azure does not yet have AZs in all regions but has announced that all regions will get them by the end of 2021.
For higher availability, an organisation can run services across multiple AZs in the same region. These AZs lie within a 100 km (60 mile) range of one another, ensuring that latency remains within 1 ms. For organisations delivering services across countries and continents, services can also be deployed in multiple regions.
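Why multi-AZ deployment raises availability can be illustrated with a short calculation. This is a sketch under simplifying assumptions: AZ failures are treated as independent, and the 99.9% single-AZ figure is illustrative, not an AWS SLA.

```python
# Availability of a service deployed across n independent AZs:
# the service is down only if every AZ is down simultaneously.
def multi_az_availability(single_az: float, n_zones: int) -> float:
    """Probability that at least one of n independent AZs is up."""
    return 1 - (1 - single_az) ** n_zones

print(f"1 AZ : {multi_az_availability(0.999, 1):.9f}")
print(f"3 AZs: {multi_az_availability(0.999, 3):.9f}")
```

Three AZs at 99.9% each yield roughly "nine nines" under these assumptions, which is why running across AZs is the standard way to harden a service within a region.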
There are multiple benefits of having such a global infrastructure at your disposal:
- No investments required in housing, hosting, operating system or database licenses.
- No data centre lifecycle issues; no need to build, upgrade or decommission data centres, hardware or infrastructural software such as networking or virtualisation components.
- An AWS AZ is comparable to the state-of-the-art dual or triple data centre setups operated by large banks. This is the minimum level of robustness you get.
- Highly compliant with global security standards, including European regulations such as GDPR. Total control over where your data resides.
- Data at rest and in transit can be encrypted at no extra cost and with no impact on performance.
- Extremely low risk of having a disruption beyond a single AZ.
- Multi AZ services are available out of the box; disaster recovery is a tick in the box.
- Easy to provide low latency services anywhere in the world.
- All cross region data communication is encrypted (both AWS and Google Cloud Platform guarantee this at present).
- Significantly reduced carbon footprint due to best in class data centre efficiency.
- The energy used to power the data centres is increasingly green; all three hyperscale providers are making huge investments to become carbon neutral in the coming years.
High efficiency, security and greater choice of servers
When you look inside a data centre, all computation occurs within servers. How these servers are managed differs greatly between the public cloud and all other environments. The benefits are significantly higher efficiency (lower cost per workload), meaningfully improved security and more choice in the types of servers available.
For the last 20 years, most workloads have run on virtualised servers. Virtualisation provides many benefits: (i) it increases the capacity utilisation of hardware (ii) it limits the impact of the loss of a physical server (iii) it facilitates the management of your workloads.
Nevertheless, virtualisation engines (called hypervisors) also introduce their own limitations. A major disadvantage is that they degrade the raw performance of a server; a server running on a machine without virtualisation (bare metal) will be more performant than a virtual server with the same sizing.
There are four main hypervisors used in on-premises, colocation and private cloud environments: VMware, Hyper-V, Xen and KVM. Xen is the most efficient of these and Hyper-V the least efficient. The difference between Xen and Hyper-V in CPU use and disk throughput is between 20-40%, depending on the workload. AWS started its infrastructure using the Xen hypervisor. Despite the relative efficiency of Xen, the CTO of Amazon, Werner Vogels, said:
In the early days of EC2 (the AWS server offering), we used the Xen hypervisor, which is purely software-based, to protect the physical hardware and system firmware; virtualize the CPU, storage, and networking; and provide a rich set of management capabilities. But with this architecture, as much as 30% of the resources in an instance were allocated to the hypervisor and operational management for network, storage, and monitoring.
Thirty percent is significant, and this waste wasn’t providing direct value to our customers. It became clear to us that if we wanted to significantly improve performance, security, and agility for our customers, we had to migrate most of our hypervisor functionalities to dedicated hardware. That’s when we began our journey of designing the Nitro System in 2012.
The Nitro System supports key network, server, security, firmware patching, and monitoring functions freeing up the entire underlying server for customer use. This allows EC2 instances (AWS server instances) to have access to all cores – none need to be reserved for storage or network I/O. This both gives more resources over to our largest instance types for customer use – we don’t need to reserve resources for housekeeping, monitoring, security, network I/O, or storage. The Nitro System also makes possible the use of a very simple, lightweight hypervisor that is just about always quiescent and it allows us to securely support bare metal instance types.
How efficient is the Nitro hypervisor? Netflix provides an answer. The company is a massive consumer of server resources for its operations and as such is focussed on getting the most out of its servers. At the scale Netflix operates, every 1% increase in efficiency matters. Brendan Gregg, a senior performance engineer at Netflix, writes about the Nitro system (emphasis is mine):
I’ve been investigating the overhead of Nitro, and have so far found it to be miniscule, often less than 1% (it’s hard to measure). Nitro’s performance is near-metal.
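The overhead figures above translate directly into price per unit of useful compute. A minimal sketch, assuming a notional server with 100 units of raw capacity and applying the cited overheads (30% for the classic software hypervisor, roughly 1% for Nitro) as a flat deduction:

```python
# Price per unit of capacity actually available to customer workloads.
# The price and capacity figures are illustrative, not real AWS pricing.
def cost_per_useful_unit(price: float, raw_capacity: float, overhead: float) -> float:
    """Price divided by the capacity left after hypervisor overhead."""
    return price / (raw_capacity * (1 - overhead))

classic = cost_per_useful_unit(100.0, 100.0, 0.30)  # classic hypervisor
nitro = cost_per_useful_unit(100.0, 100.0, 0.01)    # Nitro, near-metal
print(f"classic: {classic:.2f} per unit, Nitro: {nitro:.2f} per unit")
```

At identical hardware cost, the near-metal hypervisor delivers roughly 40% more useful compute per unit of spend, which is where much of the Nitro cost advantage comes from.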
Nitro enables AWS to offer:
- The highest efficiency and hence the best price per actual unit of performance.
- Extremely safe servers (totally isolated from the hypervisor and the operators managing the hypervisor).
- The biggest range of server types and sizes.
- The lowest latency.
- The highest network bandwidth.
Groundbreaking efficiency and performance in new workloads (ML, ARM servers)
Central processing units (CPUs) are the server components responsible for general purpose computation. As such, they are the most important components inside a server; they greatly influence the speed of servers, the heat generation in the building and the energy consumption of the data centre. Last but not least, they influence the total cost of a data centre. In this section we discuss how the world of silicon chips is changing and how this can benefit you.
We begin by discussing some general trends in the design of silicon chips. In the past, computers contained a single CPU. Performance improvements were realised by increasing the frequency at which the CPU worked. If you doubled the frequency you did twice the amount of work in a given time. However, at some point this strategy hit a brick wall. Increasing the operating frequency of a chip beyond 3-5 GHz led to enormous heat generation.
For this reason chip designers switched to another strategy. They started adding more CPU cores to a single chip. The continuous advances in miniaturisation allowed them to do so. For a chip of the same size and the same power usage, the “transistor budget” keeps increasing. You can therefore cram more CPU cores in a single chip.
We are now entering a third era in chip design where the continuous addition of more CPU cores based on the x86 Intel standard is not the winning strategy. Industry leaders like AWS and Apple are following two new approaches:
- Building specialised chips: a CPU core is a Swiss Army knife of computation that can complete a wide variety of tasks. Specialised chips, however, are much faster and more energy efficient at completing specific tasks.
- Designing and building new general purpose CPUs based on different instruction sets and architecture principles with significantly improved performance and efficiency.
The era of specialised chips
Specialised chips are being introduced for many workloads across all types of devices, from mobile phones to data centre servers. The trend began with graphics processing units (GPUs) decades ago but has accelerated in recent years with the growth in smartphones. An iPhone contains a CPU, a GPU, an image signal processor (ISP), a digital signal processor (DSP), a neural processing unit (NPU), a video encoder/decoder and a secure enclave.
The same trend is now evident in the data centre. Google Cloud Platform announced the first Tensor Processing Unit (TPU) for running Machine Learning workloads in 2016. The chip is now in its fourth generation.
AWS announced the machine learning (ML) chip Inferentia, designed to run ML inference, in 2018. Today, every time a user speaks to Amazon Alexa the human speech is interpreted using the Inferentia chips. Since moving from GPUs to Inferentia chips for this task, Amazon has achieved 25% lower end-to-end latency while cutting operating costs by 30%.
In December 2020, AWS announced the launch of a new chip, Trainium, designed to optimise the training of ML models, the counterpart process to inference in machine learning. Similar performance and cost improvements have been promised for the servers that will launch in the first half of 2021.
There is no end in sight for this trend. Specialised chips are being developed offering both increased performance and reduced costs and energy consumption. Most of these possibilities are only available in the public cloud.
The end of the x86 era in general computing
As discussed earlier, the trend in recent years has been to add more cores and to make sure that cores never sit idle. To achieve high utilisation, more and more complex instructions have been built into the x86 instruction set – this is the dominant CPU instruction set that Intel and AMD use in their chips.
Another trick has been simultaneous multi-threading (SMT): assigning two instruction threads to each core. If one thread cannot execute in a given cycle because data has to be fetched from memory, the other thread can execute instead. Two threads ensure that the core sits idle less often, which can improve performance by around 30%.
Nevertheless, all this complexity has introduced performance variability and overhead. Even if you switch off SMT you cannot reclaim the transistors dedicated to this complexity.
In addition, SMT has introduced security vulnerabilities known as side-channel attacks. For this reason, AWS and Google Cloud Platform never run the workloads of two different customers on the same core. Based on Microsoft Azure documentation, it appears that customers have to disable SMT (and incur a 30% performance penalty) to eliminate this risk. You pay for 100, get 70 and then need to pay almost 50% more to get back up to 100.
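The "pay for 100, get 70" arithmetic is worth making explicit. A small sketch, assuming the round numbers used above (SMT disabled leaves 70 of a nominal 100 units of performance):

```python
# How much extra capacity must be bought to recover a performance target
# after losing the SMT uplift. Figures are the illustrative ones from the text.
def extra_capacity_needed(delivered: float, target: float = 100.0) -> float:
    """Fraction of additional capacity required to reach the target."""
    return target / delivered - 1

print(f"At 70 delivered: {extra_capacity_needed(70.0):.0%} more capacity needed")
```

Recovering the lost 30 units requires buying roughly 43% more capacity (100/70 minus 1), which is the "almost 50% more" in the paragraph above.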
Given all these challenges, AWS took a step back and designed a new CPU. The CPU is named Graviton and is based on the ARM instruction set. The AWS objective with the Graviton was to develop a processor that delivers maximum real world performance for modern cloud workloads. Real world performance means that the chip must deliver consistency and predictability. In order to achieve this, the Graviton:
- Packs in as many independent cores as possible, each with its own dedicated caches.
- Uses no simultaneous multi-threading (SMT).
The authoritative Anandtech review site wrote the following regarding the (second generation) Graviton2 performance:
…not only were Amazon and ARM able to deliver on all of their promises, but they’ve also hit it out of the park in terms of value against the incumbent x86 players.
In terms of value, the Graviton2 seemingly ends up with top grades and puts the competition to shame. This aspect not only will be due to the Graviton2’s performance and efficiency, but also due to the fact that suddenly Amazon is now vertically integrated for its EC2 hardware platforms. If you’re an EC2 customer today, and unless you’re tied to x86 for whatever reason, you’d be stupid not to switch over to Graviton2 instances once they become available, as the cost savings will be significant.
An aggregate of all workloads summed up together, which should hopefully end up in a representative figure for a wide variety of real-world use-cases, we do end up seeing the Graviton2 coming in 40% cheaper than the competing platforms, an outstanding figure.
I think it is appropriate to end this blogpost with James Hamilton introducing new server instances based on the Graviton2 ARM processor. In it he captures the essence of what economies of scale and industry leadership can bring to customers:
Nothing drives low cost economics better than volume. Nothing funds the R&D investment required to produce great server processors better than volume. Nothing builds devtool ecosystems faster than volume and it is volume that brings application developers. I started in this industry back in about 1986 working on mainframe Ada compilers, I subsequently spent years working on IBM DB2 hosted on UNIX super servers and then moved to Microsoft SQL Server hosted on x86 processors. I joined Amazon Web Services a bit more than eleven years ago. Each of these progressive changes was to a higher volume faster growing platform. Each product had more happy customers and produced far more revenue than the previous. Betting on volume has treated me well over the years and as a consequence more than a decade ago I got super interested in ARM cores and using ARM cores in servers.
After decades without competition in silicon chips (everyone used Intel), there is an enormous renaissance underway in the world of silicon. Selecting the right supplier offers you:
- Higher security
- 30-40% better performance (speed, latency, predictability) for the same cost
- Lower carbon footprint
Wish to learn more?
Public cloud environments offer significant benefits in comparison to private cloud or colocation infrastructures: lower cost, better security, higher performance and more choice. These benefits are further enhanced by the Platform as a Service (PaaS) offerings that are only available there. Taking your private cloud configuration and copying it into a hyperscale provider's cost calculator makes no sense. Nor does it make sense to compare identical configurations across two public cloud providers. Not all public cloud providers are created equal.
Contact us if you want to learn more or you are looking for support in making the right choice.