While testing features for CockroachDB 2.1, the Cockroach Labs team discovered that AWS offered 40% greater throughput than GCP. To understand the reason for this result, the team compared GCP and AWS on TPC-C performance (e.g., throughput and latency), CPU, network, I/O, and cost.
This led Cockroach Labs to release its 2018 Cloud Report, which aims to help customers choose a cloud provider by addressing the questions they most commonly face: Should they use Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure? How should they tune their workload for different offerings? Which platform is more reliable?
Note: They did not test Microsoft Azure due to bandwidth constraints but will do so in the near future.
The tests conducted
For GCP, the team chose the n1-standard-16 machine type with Intel Xeon Scalable Processors (Skylake) in the us-east region. For AWS, they chose the latest compute-optimized instance type, c5d.4xlarge, to match n1-standard-16, because both have 16 CPUs and SSDs.
#1 TPC-C Benchmarking test
The team tested workload performance using TPC-C. The results were surprising: CockroachDB 2.1 achieved 40% more throughput (tpmC) on TPC-C when tested on AWS using c5d.4xlarge than on GCP using n1-standard-16. They then ran TPC-C against some of the most popular AWS instance types.
Taking the testing a step further, they focused on the higher-performing c5 series with SSD, EBS-gp2, and EBS-io1 volume types. The AWS Nitro System present in the c5 and m5 series offers similar or superior performance when compared to a similar GCP instance. The results were clear: AWS wins on the TPC-C benchmark.
#2 CPU Experiment
The team chose stress-ng because, according to them, it offers more benchmarks and more flexible configuration than the sysbench benchmarking test.
After running the command stress-ng --metrics-brief --cpu 16 -t 1m five times on both AWS and GCP, they found that AWS offered 28% more throughput (~2,900) on stress-ng than GCP.
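As a rough illustration of what a 28% gap means, the sketch below back-computes the GCP figure implied by the numbers above. It assumes that ~2,900 is the AWS result and that "28% more" is measured relative to GCP. The helper name and the derived GCP value are illustrative and do not come from the report.

```python
def percent_more(a: float, b: float) -> float:
    """Return how much larger a is than b, as a percentage of b."""
    return (a / b - 1.0) * 100.0


# Approximate AWS stress-ng figure quoted in the text (assumed to be the AWS result).
aws_score = 2900.0
# "28% more throughput", assumed to be relative to the GCP result.
advantage = 0.28

# If AWS = GCP * (1 + advantage), then GCP = AWS / (1 + advantage).
implied_gcp_score = aws_score / (1.0 + advantage)

print(f"Implied GCP score: ~{implied_gcp_score:.0f}")
print(f"AWS advantage: {percent_more(aws_score, implied_gcp_score):.0f}%")
```

The same helper could be applied to the averages of repeated runs, which is how a five-run comparison like the one described here would typically be summarized.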
#3 Network throughput and latency test
The team describes its iPerf setup for this experiment in detail in a blog post. The tests were run four times each on AWS and GCP. The results once again showed that AWS was better than GCP.
GCP showed a fairly normal distribution of network throughput centered at ~5.6 GB/sec. Throughput ranges from 4.01 GB/sec to 6.67 GB/sec, which according to the team is “a somewhat unpredictable spread of network performance”, reinforced by the observed average variance for GCP of 0.487 GB/sec.
AWS offers significantly higher throughput, centered on 9.6 GB/sec, with a much tighter spread than GCP, between 9.60 GB/sec and 9.63 GB/sec. The network throughput variance for AWS is only 0.006 GB/sec, which means GCP network throughput is 81x more variable than that of AWS.
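The 81x figure follows directly from the two variance values quoted above. A minimal sketch of the arithmetic, using only numbers from the text (the variable names are illustrative):

```python
# Average variance of network throughput quoted in the text, in GB/sec.
gcp_variance = 0.487
aws_variance = 0.006

# How many times more variable GCP throughput is than AWS throughput.
variance_ratio = gcp_variance / aws_variance

# Width of the observed throughput range on each cloud, in GB/sec.
gcp_spread = 6.67 - 4.01
aws_spread = 9.63 - 9.60

print(f"Variance ratio (GCP / AWS): ~{variance_ratio:.0f}x")
print(f"GCP spread: {gcp_spread:.2f} GB/sec, AWS spread: {aws_spread:.2f} GB/sec")
```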
The network latency test showed that AWS also has tighter network latency than GCP, with values centered on an average latency of 0.057 ms. Overall, AWS offers significantly better network throughput and latency with none of the variability present in GCP.
#4 I/O Experiment
The team tested I/O, for both write and read performance, using a Sysbench configuration that simulates small writes with frequent syncs. This test measures throughput for a fixed number of threads, that is, the number of items concurrently writing to disk.
The write results showed that AWS consistently offers more write throughput at every thread count from 1 to 64, with a difference as high as 67x. AWS also offers better average and 95th percentile write latency across all thread tests.
For read latency, AWS leads up to 32 threads; at 32 and 64 threads, GCP and AWS split the results. For read throughput, GCP is marginally ahead at 32 and 64 threads, with latency similar to that of AWS.
The team also tested the no barrier (nobarrier) method of writing directly to disk without waiting for the write cache to be flushed. The results were the reverse of those in the experiments above: on GCP, no barrier speeds things up by 6x, while on AWS it yields only a 25% speedup compared with leaving it unset.
Since AWS outperformed GCP on the TPC-C benchmarks, the team wanted to compare the cost involved on both platforms. For both clouds, they assumed the following available discounts:
- On GCP: a three-year committed use price discount with local SSD in the central region.
- On AWS: a three-year standard contract paid up front.
They found that GCP is more expensive than AWS given the performance it showed in these tests: GCP costs 2.5 times more than AWS per tpmC.
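Cost per tpmC is simply the total cost of running the cluster divided by the throughput it sustains. The sketch below shows how such a price-performance ratio could be computed. The function name and all input numbers are placeholders chosen for illustration, not prices or throughput figures from the report.

```python
def cost_per_tpmc(total_cost_usd: float, tpmc: float) -> float:
    """Return the cost of one unit of TPC-C throughput (USD per tpmC)."""
    if tpmc <= 0:
        raise ValueError("tpmc must be positive")
    return total_cost_usd / tpmc


# Placeholder inputs, NOT figures from the report.
cloud_a = cost_per_tpmc(total_cost_usd=30_000.0, tpmc=15_000.0)
cloud_b = cost_per_tpmc(total_cost_usd=50_000.0, tpmc=10_000.0)

# A ratio above 1 means cloud B pays more for each unit of throughput.
price_performance_ratio = cloud_b / cloud_a

print(f"Cloud A: ${cloud_a:.2f} per tpmC")
print(f"Cloud B: ${cloud_b:.2f} per tpmC")
print(f"Cloud B costs {price_performance_ratio:.1f}x as much per tpmC")
```

Framing cost this way means a cloud can be more expensive per tpmC either because its prices are higher, because its throughput is lower, or both.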
In response to the report, Google Cloud developer advocate Seth Vargo posted a comment on Hacker News assuring users that Google's team would look into the tests and conduct its own benchmarking to give customers much-needed answers to the questions the report raises.
It would be interesting to see the results GCP comes up with in response to this report.
Head over to cockroachlabs.com for more insights on the tests conducted.