Kubernetes is in a bit of a state about state

Kubernetes is “four to five years away” from being a stable distribution capable of running stateful apps, according to Redis Labs chief product officer Alvin Richards.

Enterprise IT applications typically need to access recorded data, process it, and write fresh data. This data is its state. Can Kubernetes run stateful applications out of the box? “No, absolutely not,” Richards told us in a phone interview. That’s where Kubernetes’ myriad extensions come in.

More on that later. But first, let’s remind ourselves what ‘stateful’ and ‘stateless’ mean.

Stateful and stateless containers

Kubernetes began its life orchestrating stateless containers, which provided a way to scale applications relatively easily. But they don’t record data and have no memory of previous sessions when they are instantiated. Like a simple calculator application, they have no state. Whenever a calculator is started up it sets itself at zero and waits for input. That input is lost when the calculator app is closed.

A spreadsheet, on the other hand, has state. In other words, you don’t need to enter information each time Excel is started up. The values in its cells are recorded, stored and loaded into memory when the spreadsheet app opens the saved spreadsheet file.
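The calculator and spreadsheet contrast can be put in a few lines of Python. This is only an illustrative sketch: the `Calculator` and `Spreadsheet` classes and the `sheet.json` file name are made up for the example and have nothing to do with Kubernetes itself.

```python
import json
import os
import tempfile


class Calculator:
    """Stateless: every new instance starts at zero and remembers nothing."""

    def __init__(self):
        self.total = 0

    def add(self, value):
        self.total += value
        return self.total


class Spreadsheet:
    """Stateful: cell values are written to a file and reloaded on start-up."""

    def __init__(self, path):
        self.path = path
        self.cells = {}
        if os.path.exists(path):
            with open(path) as f:
                self.cells = json.load(f)

    def set_cell(self, name, value):
        self.cells[name] = value
        with open(self.path, "w") as f:
            json.dump(self.cells, f)


def demo():
    # "Restarting" the calculator loses the earlier input.
    first = Calculator()
    first.add(5)
    restarted_calc = Calculator()

    # "Restarting" the spreadsheet reloads what was saved to disk.
    path = os.path.join(tempfile.mkdtemp(), "sheet.json")
    Spreadsheet(path).set_cell("A1", 42)
    restarted_sheet = Spreadsheet(path)

    return restarted_calc.total, restarted_sheet.cells
```

A stateless container behaves like `Calculator`: kill it, start another, and nothing is lost because nothing was kept. A stateful one behaves like `Spreadsheet`, and only works if the file it wrote is still there when it comes back.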

CSI (Container Storage Interface) is the starting point for adding state to containers orchestrated by Kubernetes. CSI enables third-party storage providers to create plugins adding a block or file storage system to a Kubernetes deployment without affecting the core Kubernetes code. Multiple data storage vendors have set up CSI plugins linking their external storage systems to containers, while others run the storage itself in containers. Each one is unique.
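In practice, a workload usually asks for CSI-backed storage through a PersistentVolumeClaim that names a StorageClass, and the StorageClass in turn points at a vendor's CSI driver. The sketch below builds such a claim as a plain Python dict and prints it as JSON. The field names follow the Kubernetes `v1` PersistentVolumeClaim schema; the claim name `db-data` and the storage class `vendor-csi-block` are placeholders we have assumed for illustration, not real products.

```python
import json


def build_pvc(name, storage_class, size):
    """Return a PersistentVolumeClaim manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteOnce: the volume is mounted read-write by a single node.
            "accessModes": ["ReadWriteOnce"],
            # Assumed class name; a real cluster lists its own storage classes.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical names: "db-data" and "vendor-csi-block" are placeholders.
pvc = build_pvc("db-data", "vendor-csi-block", "10Gi")
print(json.dumps(pvc, indent=2))
```

The claim only requests capacity. Which array, filesystem or storage container actually serves it depends entirely on the plugin behind the storage class, which is where each vendor's uniqueness comes in.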

Kubernetes Operators

So is CSI job-done for Kubernetes and storage? Not according to Richards, who says the software needs an operator.

Redis Labs software talks to an operator, a piece of interface code that automates Kubernetes operations. An operator uses the Kubernetes API to tell Kubernetes what to do. Effectively, Richards says, “Kubernetes is an API … to provision and automate infrastructure.” The Redis operator has been extended so it can use the API to create a database inside a cluster. It provides high availability with automated disaster recovery.
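Operators are commonly written as a control loop: compare the state the user asked for with the state the cluster reports, and act only on the difference. The sketch below shows that idea in miniature. It is not Redis Labs' operator code; `FakeCluster` is an in-memory stand-in we have assumed in place of the real Kubernetes API, and the database name `orders-db` is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCluster:
    """Stand-in for the Kubernetes API; holds replica counts in memory."""
    databases: dict = field(default_factory=dict)

    def get_replicas(self, name):
        return self.databases.get(name, 0)

    def set_replicas(self, name, count):
        self.databases[name] = count


def reconcile(cluster, name, desired_replicas):
    """Compare desired and observed state and act only on a difference."""
    observed = cluster.get_replicas(name)
    if observed == desired_replicas:
        return "in sync"
    cluster.set_replicas(name, desired_replicas)
    if observed == 0:
        return "created"
    return "scaled"


cluster = FakeCluster()
actions = [
    reconcile(cluster, "orders-db", 3),   # nothing exists yet
    reconcile(cluster, "orders-db", 3),   # already matches the request
]
cluster.set_replicas("orders-db", 1)      # simulate two replicas failing
actions.append(reconcile(cluster, "orders-db", 3))
```

Run repeatedly, a loop like this is what turns a one-off "create a database" request into ongoing care: when replicas disappear, the next pass notices the gap and restores them.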

A question for Redis, as a supplier of an open source, distributed, memory-cached NoSQL database, is what Kubernetes should do about such stateful services.

These, like the Redis NoSQL database, could run in the local Kubernetes cluster or in a managed cloud service accessed through a gateway. The operator needs to know this and there has to be secure node-to-node communication for software like Redis. Raw storage functionality is not enough for enterprise storage needs.

Multiplying distributions

According to Richards, there are many Kubernetes distributions, each adding its own bit of uniqueness to the pot, and this could lead to customer confusion. How will these disparate distributions get combined? Richards contrasts Linux and Docker: “Linux got to a point with a guiding light (Linus Torvalds) and powered ahead…. Docker did not.”

He thinks there are two very different Kubernetes communities: stateless and stateful. My impression is that he wants Kubernetes to take the Linus/Linux route, power ahead, and adopt stateful container functionality.

Incentives dichotomy

In our view, the baseline Kubernetes distribution expands functionality over time, absorbing the good additions from the various forks. Suppliers providing added paid-for services on top of the raw Kubernetes distribution have to accept this, and move their services up the stack or add new services. They don’t contribute their paid-for code to the Kubernetes open source project; that would destroy their ability to monetise their code. 

Other, more altruistic coders develop similar or equivalent functionality and contribute it to the project, eroding the basis for the paid-for code. Richards describes this as a dichotomy of incentives. For instance, he thinks that stateful app management, like encryption, should be just a basic part of the raw Kubernetes distribution, i.e., a feature and not a product.

Richards cites MongoDB as now having core, good-enough database functionality. In other words, much of the functionality originally in paid-for extensions has been assimilated into the core database.

Stating the not-so-obvious

So will a vibrant Kubernetes open source community follow suit? Progress has been made and you can run stateful services in Kubernetes. But, Richards says: “Can a wider audience/community be successful? Well the jury is out on that.” 

He thinks we’re maybe four to five years away from having a stable Kubernetes distribution capable of running stateful apps. But “the timeline will shorten as it becomes a major focus.”

Until then we have to endure the messy, altruistic and artisanal coding creativity that is the open source movement, and hope it collectively takes us to the right destination, where raw Kubernetes can run stateful containers out of the box.