In May 2019, DataStax hosted the Accelerate conference for Apache Cassandra™ inviting community members, DataStax customers, and other users to come together, discuss the latest developments around Cassandra, and find out more about the development of Cassandra. Nate McCall, Apache Cassandra Project Chair, presented the road to version 4.0 and what the community is focusing on for the future.
So, what does the future really hold for Cassandra? The project has been going for ten years already, so what has to be added?
First off, listening to Nate’s keynote, the approach to development has evolved. As part of the development approach around Cassandra, it’s important to understand who is committing updates to Cassandra. The number of organisations contributing to Cassandra has increased, while the companies involved in the Project Management Committee includes some of the biggest companies in the world.
The likes of Instagram, Facebook and Netflix have team members contributing and leading the development of Cassandra because it is essential to their businesses. For DataStax, we continue to support the growth and development of Cassandra as an open source project through our own code contributions, our development and training, and our drivers that are available for the community and for our customers alike.
Having said all this, there are still areas where Cassandra can improve as we get ready for 4.0. From a development standpoint, the big things to look forward to as mentioned in Nate’s keynote are:
An improved Repair model
For a distributed database, being able to carry on through any failure event is critical. After a failure, those nodes will have to be brought back online, and then catch up with the transactions that they missed. Making nodes consistent is a big task, covered by the Repair function. In Cassandra 4.0, the aim is to make Repair smarter. For example, Cassandra can preview the impact of a repair on a host to check that the operation will go through successfully, and specific pull requests for data can also be supported.
Alongside this, a new transient replication feature should reduce the cost and bandwidth overhead associated with repair. By replicating temporary copies of data to supplement full copies, the overall cluster should be able to achieve higher levels of availability but at the same time reduce the overall volume of storage required significantly. For companies running very large clusters, the cost savings achievable here could be massive.
A Messaging rewrite
Efficient messaging between nodes is essential when your database is distributed. Cassandra 4.0 will have a new messaging system in place based on Netty, an asynchronous event-driven network application framework. In practice, using Netty will improve performance of messaging between nodes within clusters and between clusters.
On top of this change, zero copy support will provide the ability to improve how quickly data can be streamed between nodes. Zero copy support achieves this by modifying the streaming path to add additional information into the streaming header, and then using ZeroCopy APIs to transfer bytes to and from the network and disk. This allows nodes to transfer large files faster.
Cassandra and Kubernetes support
Adding new messaging support and being able to transfer SSTables means that Cassandra can add more support for Kubernetes, and for Kubernetes to do interesting things around Cassandra too. One area that has been discussed is around dynamic cluster management, where the number of nodes and the volume of storage can be increased or decreased on demand.
Sidecars are additional functional tools designed to work alongside a main process. These sidecars fill a gap that is not part of the main application or service, and that should remain separate but linked. For Cassandra, running sidecars allows developers to add more functionality to their operations, such as creating events on an application.
Java 11 support
Diagnostic events and logging
This will make it easier for teams to use events for a range of things, from security requirements through to logging activities and triggering tools.
As part of the conference, there were two big trends that I took from the event. The first is – as Nate commented in his keynote – that there was a definite need for more community events that can bring together people who care about Cassandra and get them working together.
The second is that Apache Cassandra is essential to many companies today. Some of the world’s largest internet companies and most valuable brands out there rely on Cassandra in order to achieve what they do. They are contributors and committers to Cassandra, and they have to be sure that Cassandra is ready to meet their requirements.
For everyone using Cassandra, this means that versions have to be ready for use in production rather than having issues to be fixed. Things will get released when they are ready, rather than to meet a particular deadline. And the community will take the lead in ensuring that they are happy with any release.
Cassandra 4.0 is nearing release. It’ll be out when it is ready. Whether you are looking at getting involved with the project through contributions, developing drivers or through writing documentation, there is a warm welcome for everyone in the run up to what should be a great release.
I’m already looking forward to ApacheCon later this year!
Patrick McFadin is the vice president of developer relations at DataStax, where he leads a team devoted to making users of DataStax products successful. Previously, he was chief evangelist for Apache Cassandra and a consultant for DataStax, where he helped build some of the largest and most exciting deployments in production; a chief architect at Hobsons; and an Oracle DBA and developer for over 15 years.