OpenTelemetry Is Ageing Like Fine Wine

With roots going back to 2016, OpenTelemetry set out to end the observability nightmare. Enterprises were drowning in custom integrations, where each data source demanded its own connector for metrics, traces, and logs.

Today, in an AI-first world, OpenTelemetry continues to affirm its original purpose: to provide enterprises with an open, standardised, and vendor-neutral framework for collecting machine data from numerous sources.
For example, Splunk, the leading contributor to the OpenTelemetry project, incorporates the framework extensively into its newest AI observability solutions. Its AI Agent Monitoring feature, available in its cloud platform, tracks LLM-based applications through performance, cost, and behaviour metrics, all built on OpenTelemetry’s vendor-neutral foundation.

Morgan McLean, director of product management at Splunk and a co-founder of OpenTelemetry, told AIM that OpenTelemetry is the only agent mechanism used in Splunk’s Observability Cloud platform.

Splunk originally used the Smart Agent from SignalFx (a company it acquired a few years ago), but in 2022 it switched entirely to OpenTelemetry as a standalone mechanism.
“It was a big strategic bet we made at the time, but it has paid off,” said McLean, adding, “We also switched all of the Kubernetes log instrumentation to the Splunk Platform. That all now uses OpenTelemetry.”

Even Splunk’s recently announced Database Monitoring solution follows the same pattern. The tool provides query-level insights such as wait times, CPU usage, memory consumption, and execution plans via OpenTelemetry instrumentation.

The trend extends well beyond Splunk. Companies like IBM, AWS, Dynatrace, InsightFinder and others provide OpenTelemetry-based AI observability solutions. Recently, AWS also launched a Generative AI Observability Preview, in which AI agents and frameworks can be auto-instrumented using the AWS Distro for OpenTelemetry (ADOT) SDK.

The recently announced Gemini CLI GitHub Actions from Google is integrated with OpenTelemetry, allowing users to stream logs and metrics to platforms such as Google Cloud Monitoring.

But platform adoption is only part of the story. The more important change is that AI frameworks themselves now generate OpenTelemetry-compliant data natively.

AI Frameworks Embrace Native Telemetry

This closes the loop: frameworks produce standardised telemetry data, and platforms like Splunk or AWS ingest and analyse it without custom adapters.

“One of the other things happening in OpenTelemetry due to AI is just a lot more semantic standardisation around AI. The most visible part of this is OpenLLMetry,” said McLean.

The OpenTelemetry community has recently established semantic conventions specifically for AI workloads, an effort most visible in OpenLLMetry. These conventions standardise how telemetry such as model inputs and outputs, token usage, and response metadata is recorded.

OpenTelemetry normalises these signals, enabling consistent comparison and analysis of AI workloads, regardless of whether the underlying model is served via OpenAI, Hugging Face, Anthropic, or a custom deployment.
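
To make that concrete, here is a minimal, hypothetical sketch of what the conventions standardise in practice: the same gen_ai.* attribute keys describe an LLM call whichever provider sits behind it. The tracer name, span name, model identifier, and token counts below are illustrative assumptions, not values from any vendor.

```python
from opentelemetry import trace

# Hypothetical instrumentation scope name, purely for illustration.
tracer = trace.get_tracer("example.genai.demo")

# One span per model call, described with GenAI semantic-convention attributes.
with tracer.start_as_current_span("chat example-model") as span:
    span.set_attribute("gen_ai.operation.name", "chat")
    span.set_attribute("gen_ai.system", "anthropic")             # or "openai", ...
    span.set_attribute("gen_ai.request.model", "example-model")  # placeholder model id
    # ... invoke the model here ...
    span.set_attribute("gen_ai.usage.input_tokens", 128)         # illustrative counts
    span.set_attribute("gen_ai.usage.output_tokens", 256)
```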

To support this further, instrumentation libraries are being developed and released. For example, there are ongoing efforts to provide plug-in instrumentation for the OpenAI Python API and other similar software development kits (SDKs). These libraries automatically capture telemetry data like prompts, responses, and token counts, without manual instrumentation from developers.
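
As a rough sketch of how such a plug-in library is wired up, the snippet below assumes the opentelemetry-sdk package and an OpenAI instrumentor package (here opentelemetry-instrumentation-openai-v2) are installed; exact package and module names vary between the community and OpenLLMetry distributions.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
from opentelemetry.instrumentation.openai_v2 import OpenAIInstrumentor

# Configure a tracer provider; ConsoleSpanExporter simply prints spans locally.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

# Patch the OpenAI client so every chat or completion call emits a span carrying
# model, latency and token-usage attributes, with no changes to application code.
OpenAIInstrumentor().instrument()
```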

And these semantics are what make it possible for AI frameworks to support OpenTelemetry data out of the box. As McLean notes, this means the community no longer has to build one-off integrations reactively as each new framework emerges.

Instead, frameworks like CrewAI, LangGraph, and PydanticAI emit OpenTelemetry data natively, aligning with OpenLLMetry semantics from the start.

This makes it easy for teams building AI or agentic applications on these frameworks to export telemetry to any platform that offers observability features.
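
A hedged sketch of that export path, assuming the opentelemetry-exporter-otlp-proto-http package and a backend that accepts OTLP over HTTP; the endpoint URL and service name are placeholders, not values from the article.

```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Identify the service; "agentic-app" is a placeholder name.
provider = TracerProvider(resource=Resource.create({"service.name": "agentic-app"}))

# Ship spans to an OTLP-compatible backend (Splunk, AWS, Google Cloud, ...).
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(endpoint="https://collector.example.com/v1/traces")
    )
)
trace.set_tracer_provider(provider)

# Frameworks that use the global tracer provider (CrewAI, LangGraph, PydanticAI, ...)
# will now ship their natively emitted spans to the configured platform.
```

In practice, many teams skip code-level setup entirely and point the SDK at a collector through the standard OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_SERVICE_NAME environment variables.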

“This further validates the vision and strategy that everyone in the OpenTelemetry community has had,” said McLean, reflecting on the motivation behind the project’s founding.

The Origins

Organisations needed to capture data from applications written in different languages, running on various operating systems, and using countless frameworks and libraries.

The challenge was both technical and economic. “Each individual integration is pretty straightforward, but there are so many of them. And even if you go and build them all, you need to maintain them perpetually,” he said.

Traditional platform monitoring vendors faced the same constraints. Even with teams of 40-100 engineers dedicated to building monitoring software and integrations, they could only support a limited set of languages.

“They would support Java and .NET, but if you went in and said I have an application written in Go, they’d say ‘Well, we can’t help you’,” added McLean.

OpenTelemetry solved this problem by creating a standardised, open-source framework that could be used to instrument, generate, collect, and export telemetry data. Companies like Google, Microsoft, Splunk and others came together to create, maintain and update this standard.

In 2019, the framework was accepted into the Cloud Native Computing Foundation (CNCF), which is also home to Kubernetes. OpenTelemetry is today the second most popular project in the CNCF, behind only Kubernetes.

What highlights the importance of OpenTelemetry in the AI era, however, is what happens after it has done its job of giving teams observability.

Observability Is AI’s Trust Foundation

Jeetu Patel, president and chief product officer at Cisco, explained why observability is the need of the hour.

“One of the big constraints of AI is a trust deficit. And if people don’t trust the system, they’re not going to use it. You can’t really trust something that you don’t see,” Patel told AIM, adding that visibility is essential across the stack, from the silicon to the agent.

“And you know how the GPU is performing, you know how the models are performing, you know the data going into the models, you know what the tokenomics look like,” Patel explained.

All of those things put together are what allow organisations to start reimagining and automating workflows, he concluded.

OpenTelemetry plays a key role in integrating data. As tools and platforms enhance their observability features, it’s still essential for companies to input the correct data into these systems to achieve meaningful insights. Poor data quality obscures outage root causes and system bottlenecks.

Although AI platforms and products largely deliver on their advertised capabilities, occasional downtime remains a challenge. In recent months, ChatGPT has experienced several bouts of downtime, and Google Cloud outages have disrupted dependent platforms.

These reliability challenges fuel continued innovation in observability. As platforms work toward zero downtime, observability companies find endless opportunities to build more sophisticated monitoring solutions.
