Atlassian recently announced the release of Escalator, its open source Kubernetes autoscaler project. The project aims to resolve autoscaling issues in which clusters did not scale up or down fast enough.
Atlassian explained the scale-up problem as follows: when clusters hit capacity, users had to wait a long time for additional Kubernetes workers to boot and take on the extra load. Many builds cannot tolerate extended delays and would fail.
The scale-down problem was that once loads had subsided, the autoscaler did not remove nodes fast enough. This matters little when the node count is small, but it becomes a problem when the count reaches the hundreds or more.
Escalator, written in Go, is the solution
To address these cluster scaling problems, Atlassian created Escalator, a batch- or job-optimized autoscaler for Kubernetes.
Escalator had two main goals:
- Provide preemptive scale-up with a buffer capacity feature to prevent users from experiencing the ‘cluster full’ situation.
- Support aggressive scale-down of machines when they are no longer required.
Atlassian also wanted Escalator to expose Prometheus metrics so that the Ops team could gauge how well the clusters were working.
With Escalator, users need not wait for EC2 instances to boot and join the cluster. It also saves money by letting operators pay only for the machines actually needed. According to Atlassian, it has saved the company thousands of dollars per day, based on the workloads it runs.
Escalator is now released as open source to the Kubernetes community, so teams outside Atlassian can use its features too. The company plans to extend the tool to its external Bitbucket Pipelines users and to explore ways to manage more service-based workloads.