Not everyone is happy with Kubernetes. Google open-sourced Kubernetes in 2014, building on lessons from its internal Borg orchestrator, and it has since become one of the most sought-after skills in the industry. In fact, Kubernetes, or K8s, has evolved into a whole ecosystem for managing microservices across cloud platforms.
However, when AI/ML workloads entered the picture, many companies found it challenging to keep using Kubernetes, as these workloads demand substantial compute resources.
The most recent example of a company leaving Kubernetes is Gitpod, which decided earlier this month to shift its focus to home-grown tools. Citing a history of “experiments, failures, and dead-ends” with cloud-native container orchestration, the company plans to transition its development platform to a custom-built solution called Flex.
According to a blog published by Christian Weichel, the co-founder and CTO of Gitpod, and Alejandro de Brito Fontes, staff engineer at the company, Kubernetes initially seemed like the obvious choice for Gitpod’s remote, standardised, and automated development environments.
Like many others, Gitpod had relied on Kubernetes since its founding in 2020, drawn by its scalability and robust ecosystem. Scaling up, however, surfaced significant issues with complexity, resource management, and state handling.
“Over the years, we experimented with many ideas involving SSDs, PVCs, eBPF, seccomp notify, TC and io_uring, shiftfs, FUSE and idmapped mounts, ranging from microVMs, kubevirt to vCluster,” reads the blog.
But Kubernetes often pushed the team to reverse-engineer solutions to handle unique challenges like scaling, secure arbitrary code execution, and stability for developers. While Kubernetes excelled at managing well-controlled application workloads, Gitpod found it ill-suited for unpredictable development environments. Managed services like GKE and EKS eased some pain points but introduced restrictions of their own that complicated operations.
“Managing Kubernetes at scale is complex,” the team noted. “Many teams looking to operate a cloud development environment underestimate the complexity of Kubernetes, which leads to significant support loads.”
In January 2024, Gitpod began developing Flex, which was launched in October. Built on Kubernetes-inspired principles like declarative APIs and control theory, Flex simplifies the architecture, prioritises zero-trust security, and addresses the specific needs of development environments.
A Common Sentiment?
In August, Kubernetes 1.31 (Elli) was released under the Cloud Native Computing Foundation (CNCF), the nonprofit that hosts the project. The update includes improvements to resource management and efficiency aimed at handling AI/ML applications on Kubernetes.
To understand this better, AIM spoke to Murli Thirumale, GM (cloud-native business unit), Portworx at Pure Storage, who said that generative AI is actually fuelling the need to keep using Kubernetes, and that the demand will only grow.
“Data scientists continually tweak and refine models based on the evolving training data and changing parameters. This frequent modification makes container environments particularly well-suited for handling the dynamic nature of these models,” Thirumale said.
The Gitpod team also said that Kubernetes continues to be a “fine choice” as long as you are running application workloads.
However, the team said that when it comes to development environments, K8s presents security and operational challenges. Gitpod says Flex can be deployed in under three minutes, with virtual demos planned to showcase its capabilities. While Kubernetes remains powerful, Gitpod’s journey underscores the need to differentiate between application workload and development environment requirements.
Ben Houston, founder and CTO of ThreeKit, an online visual commerce platform, illustrated another of Kubernetes’s challenges in his recent blog. Houston explained why he shifted from Kubernetes to Google Cloud Run. The primary reason he moved away from Kubernetes was its complexity and high cost, which outweighed its benefits for managing infrastructure at scale.
For Houston, Kubernetes required extensive provisioning, maintenance, and management, leading to significant DevOps overhead. Additionally, its slow autoscaling often resulted in over-provisioning and paying for unused resources.
In contrast, Google Cloud Run provided a simpler and cost-effective alternative with features like rapid autoscaling, pay-per-use pricing, and reduced management overhead.
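To make the contrast concrete, here is a minimal sketch of what Kubernetes asks of a team before a single container serves traffic (the service name and image below are hypothetical placeholders, not taken from Houston’s setup). Even a basic service involves a manifest like this, applied to a cluster someone must provision and maintain:

```yaml
# Minimal Kubernetes Deployment for a single web app.
# The cluster itself (nodes, networking, upgrades) is managed separately.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                # hypothetical service name
spec:
  replicas: 2                 # fixed capacity; autoscaling needs an HPA on top
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: gcr.io/my-project/my-app:latest   # hypothetical image
        ports:
        - containerPort: 8080
```

On Cloud Run, the rough equivalent is a single `gcloud run deploy my-app --image gcr.io/my-project/my-app:latest` command, with request-driven autoscaling, including scale-to-zero, handled by the platform.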
This shift allowed for easier deployments, lower costs, and greater scalability, making Cloud Run a better fit for agile projects focused on simplicity and efficiency. Not everyone is looking to leave, though. “Kubernetes has a steep learning curve, and certainly a lot of complexity, but when used appropriately for the right case, by God it’s glorious,” said a developer in a discussion on Hacker News.
The Kubernetes ecosystem is filled with unnecessary add-ons and tools marketed as “essential”. Features like sidecars and service meshes often introduce more complexity than utility, and organisations may adopt them without fully understanding their implications.
This creates a bloated and fragile infrastructure that is harder to maintain. Moreover, reliance on managed Kubernetes offerings from cloud providers can lead to vendor lock-in, reducing flexibility and driving up costs.
What Can Be Done?
But Kubernetes was never built with development environments in mind. Pablo Chico de Guzman, CTO of Okteto, a company that helps build Kubernetes-based applications, said on Reddit that while he agrees with Gitpod’s blog, his approach to development environments is slightly different.
Instead of running full IDEs in Kubernetes, Okteto uses a customised BuildKit service with file synchronisation to streamline builds and deployments, ensuring development closely mirrors production without compromising feedback loops. Each development environment operates in its own Kubernetes namespace, with optional dedicated cluster nodes for higher isolation, though few customers use this feature.
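The namespace-per-environment pattern Okteto describes can be sketched in plain Kubernetes terms. The names and quota values below are illustrative assumptions, not Okteto’s actual configuration: each developer’s environment gets its own namespace, with a resource quota bounding what it can consume from the shared cluster.

```yaml
# One namespace per development environment, with a quota bounding
# the CPU and memory that environment can claim from the cluster.
apiVersion: v1
kind: Namespace
metadata:
  name: dev-alice              # illustrative: one namespace per developer
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-alice-quota
  namespace: dev-alice
spec:
  hard:
    requests.cpu: "4"          # at most 4 CPUs requested in this namespace
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
```

The dedicated cluster nodes Okteto offers for higher isolation would typically sit on top of this, implemented with node selectors or taints and tolerations.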
Similar to what Thirumale from Pure Storage told AIM, Guzman said that Okteto’s custom Resource Manager predicts CPU and memory needs by analysing identical services across environments, optimising resource allocation to enhance cluster performance and developer experience.
Okteto also provides separate Kubernetes clusters for each company, offering tailored infrastructure configurations and supporting multi-cluster setups to address scalability challenges in large-scale environments.
In a discussion on Hacker News about the viability of Kubernetes, developers from different companies cited several reasons for Kubernetes being cumbersome. The main reasons remain the complex learning curve, maintenance inefficiency, vendor lock-in, overuse, and the mismatch between needs and scale.
The post Why Companies are Quitting Kubernetes appeared first on Analytics India Magazine.