Datadog puts a leash on Kubernetes cloud costs

Cloud monitoring firm Datadog has unveiled its Kubernetes Autoscaling service, a set of capabilities that intelligently automate resource optimization. The technology automatically scales customers’ Kubernetes environments based on real-time and historical utilization metrics.

Datadog claims it is the first observability platform vendor to enable customers to make changes to their Kubernetes environments directly from the platform. When deploying applications on Kubernetes, teams often over-provision resources to keep infrastructure capacity issues from affecting end users. This can lead to large amounts of wasted compute and higher cloud costs.

“Containers are a leading area of wasted spend because so many costs are associated with idle resources, but organizations also can’t risk degrading performance or not having enough resources to scale,” said Yrieix Garnier, VP of product at Datadog. “The key for businesses is to find a balance between control and automation. Datadog is the only enterprise-grade, unified platform that provides end-to-end observability, security and resource management at scale for any Kubernetes-driven organization.”

According to Datadog research, 83 percent of container costs are associated with idle resources. “For this reason, it’s critical organizations have a solution in place which can monitor resource usage and optimize infrastructure performance and computing costs, while ensuring applications remain performant with enough resources to scale,” the company said.

Datadog Kubernetes Autoscaling continuously monitors and automatically “rightsizes” Kubernetes resources. “This leads to significant cost savings for an organization’s cloud infrastructure, and helps to ensure optimal application performance for workloads, improved user experiences and better ROI on container assets,” said Datadog.
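
To make "rightsizing" concrete, here is a minimal Python sketch of one common way such a recommendation could be derived from historical utilization. It is not Datadog's algorithm, which the company has not detailed here. The function name `recommend_cpu_request`, the nearest-rank percentile and the headroom parameter are all illustrative assumptions.

```python
def recommend_cpu_request(samples_millicores, percentile=95, headroom_percent=15):
    """Suggest a CPU request (in millicores) from historical usage samples.

    Hypothetical illustration only: takes a high percentile of observed
    usage and adds a safety margin on top. Integer arithmetic is used
    throughout to avoid floating-point rounding surprises.
    """
    if not samples_millicores:
        raise ValueError("at least one usage sample is required")

    ordered = sorted(samples_millicores)
    # Nearest-rank percentile: smallest value covering `percentile` % of samples.
    rank = max(1, -(-percentile * len(ordered) // 100))  # ceiling division
    observed = ordered[rank - 1]

    # Add headroom so short bursts above the percentile do not cause throttling.
    return -(-observed * (100 + headroom_percent) // 100)  # ceiling division


# Example: a container that mostly uses ~120m CPU, with one spike to 400m.
usage = [100, 120, 110, 130, 400, 125, 115, 105, 118, 122]
print(recommend_cpu_request(usage, percentile=90))
```

Choosing a lower percentile trims more idle capacity but tolerates more bursts above the request. That trade-off is the cost-versus-risk balance the article describes.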

Customers can identify workloads and clusters with a large share of idle resources, then either apply a one-time fix through intelligent automation or let Datadog scale the workload automatically on an ongoing basis. These capabilities, we’re told, allow operators and their customers to strike the right balance between cost and user experience based on their risk profiles.
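
As a rough illustration of what identifying workloads with idle resources can mean in practice, the Python sketch below ranks workloads by the share of requested CPU they leave unused. The `WorkloadUsage` type, the field names and the 50 percent threshold are hypothetical and not taken from Datadog's product.

```python
from dataclasses import dataclass


@dataclass
class WorkloadUsage:
    """Hypothetical summary of one workload's CPU request and typical usage."""
    name: str
    requested_millicores: int
    used_millicores: int


def idle_fraction(workload):
    """Share of the requested CPU that sits unused (0.0 to 1.0)."""
    if workload.requested_millicores <= 0:
        return 0.0
    idle = max(0, workload.requested_millicores - workload.used_millicores)
    return idle / workload.requested_millicores


def flag_overprovisioned(workloads, threshold=0.5):
    """Return workloads whose idle share meets the threshold, most idle first."""
    flagged = [w for w in workloads if idle_fraction(w) >= threshold]
    return sorted(flagged, key=idle_fraction, reverse=True)


# Made-up example data, for illustration only.
fleet = [
    WorkloadUsage("checkout", 2000, 400),
    WorkloadUsage("search", 1000, 900),
    WorkloadUsage("batch", 4000, 1000),
]
for w in flag_overprovisioned(fleet):
    print(w.name, f"{idle_fraction(w):.0%} idle")
```

A list like this could feed either a one-time correction or a recurring autoscaling policy, depending on how much risk the operator is willing to accept.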

Teams can use a unified view and an “intuitive” UI that displays Kubernetes resource utilization and cost metrics, making it “simple” for any team member to understand and scale resources. Organizations gain full visibility into how rightsizing impacts their workload and cluster performance, backed by high-resolution trailing container metrics, so teams can take action based on this rich context, says the firm.

Earlier this month, Datadog introduced Data Jobs Monitoring, which lets teams detect “problematic” Spark and Databricks jobs anywhere in their data pipelines, remediate failed and long-running jobs faster, and optimize over-provisioned compute resources.