Feb 5, 2026

What 300,000 Unowned Datasets Taught Us About Semantics

Sriharsha Chintalapani

At Uber, we had over 300,000 datasets, and most of them were unowned. Engineers couldn't find the data they needed, and when they did, they had no way to know if they could trust it. Who owned this table? Where did the data come from? What did the numbers actually mean?

We built Databook to solve this problem. But as the platform scaled and we tried to model dashboards, metrics, ML features, and business terms alongside datasets, we ran into a wall. The system wasn't designed to capture relationships between different types of entities. That's when we learned something that has shaped everything we've built since.

Semantics is bigger than metrics. A metric definition tells you how to calculate revenue. But it doesn't tell you what a "customer" is, how customers relate to orders, which policies govern that data, or whether you can trust it. Metrics are important, but as the OSI initiative itself has made clear, they are only one piece of the broader semantic puzzle.

That lesson drove us to build OpenMetadata. And it's the perspective we're bringing to the industry's growing conversation about semantic standards.

This is also why industry efforts like the Open Semantic Interchange (OSI) matter, since they intentionally start with a shared, vendor-neutral foundation designed to evolve as the ecosystem builds on and extends it.

What a complete semantic foundation must include

There's a lot of energy right now around semantic layers. In practice, many early semantic efforts focus on consistent KPI definitions across dashboards and BI tools. That focus is a deliberate and practical starting point, and one the OSI community has been explicit about as it builds toward a broader, shared semantic foundation.

Complete semantic intelligence goes beyond calculations and measures. It includes the entities and relationships that define your business. What is a "customer"? How does a customer connect to an order, a contract, or a support ticket? These aren't just join paths in a data model. They're the conceptual structure that gives your data meaning.

Business glossaries capture the definitions, synonyms, and tribal knowledge that keep teams aligned. Lineage and provenance tell you where data came from, what transformations shaped it, and whether you can trust it. Governance and classification determine who can access what and which policies apply. And increasingly, AI agents need rich context to understand relationships and constraints when answering business questions.

A metric store tells you what to calculate. A semantic foundation tells you what things mean. Both matter, and together they unlock far more than either alone. This distinction matters more than ever as AI agents become primary consumers of enterprise data. Those agents need the full picture: not only a consistent definition of revenue, but also the knowledge graph that connects everything around it.

We've been building this for years

At Collate, we started early on semantic standards, building on existing open frameworks while the industry was still converging.

The lesson from Uber's Databook drove how we designed OpenMetadata. When metadata systems can't capture relationships across entity types, they break. So we built OpenMetadata as a knowledge graph from the start, capturing not just datasets and metrics but dashboards, ML features, business terms, owners, lineage, and quality signals. Everything lives in a unified graph where relationships matter as much as the data itself.
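To make the idea of a unified graph concrete, here is a minimal Python sketch. It is not OpenMetadata's schema or API: the `Entity` and `MetadataGraph` classes, the relation names (`derived_from`, `owned_by`, `described_by`), and the sample entities are all hypothetical, chosen only to illustrate why typed relationships across entity types are useful.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A node in the graph. `kind` is a hypothetical entity type label."""
    kind: str  # e.g. "table", "metric", "dashboard", "glossary_term", "user"
    name: str


class MetadataGraph:
    """Illustrative in-memory graph of typed entities and named relations."""

    def __init__(self):
        # source entity -> list of (relation name, target entity)
        self._edges = defaultdict(list)

    def relate(self, source, relation, target):
        self._edges[source].append((relation, target))

    def related(self, entity, relation):
        """Direct neighbours of `entity` reached through `relation`."""
        return [t for r, t in self._edges[entity] if r == relation]

    def upstream(self, entity):
        """Everything `entity` depends on, following 'derived_from' edges."""
        seen, stack = [], [entity]
        while stack:
            current = stack.pop()
            for parent in self.related(current, "derived_from"):
                if parent not in seen:
                    seen.append(parent)
                    stack.append(parent)
        return seen


# Hypothetical sample data spanning several entity types.
orders = Entity("table", "orders")
revenue = Entity("metric", "revenue")
dashboard = Entity("dashboard", "weekly_revenue")
customer_term = Entity("glossary_term", "Customer")
alice = Entity("user", "alice")

graph = MetadataGraph()
graph.relate(revenue, "derived_from", orders)
graph.relate(dashboard, "derived_from", revenue)
graph.relate(orders, "owned_by", alice)
graph.relate(orders, "described_by", customer_term)

# Lineage and ownership are answered from the same structure.
lineage = [e.name for e in graph.upstream(dashboard)]
owners = [e.name for e in graph.related(orders, "owned_by")]
print(lineage, owners)
```

Because lineage, ownership, and glossary links are all edges in one structure, a question like "what does this dashboard depend on, and who owns it?" becomes a graph walk rather than a lookup across separate systems.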

We also built it on open standards:

  • JSON Schema with over 700 strongly typed, well-documented schemas covering every entity from tables and dashboards to ML models and glossary terms

  • RDF ontologies powering a unified metadata graph for semantic interoperability and federated discovery

  • Open APIs designed for extensibility and integration across any tool in your stack

Our Semantic Intelligence Platform translates metadata and business context into meaning, enabling organizations to discover, govern, and put trusted data to work.

We've been doing this for years across thousands of deployments, and OpenMetadata is now the fastest-growing open source metadata project. Our community has helped shape these production standards. Now that the industry is aligning around shared semantic standards, initiatives like OSI provide the forum to evolve this work collaboratively across the ecosystem. That is why we're excited to contribute what we've learned.

Why we're joining OSI

The Open Semantic Interchange initiative represents that alignment. Metric drift is a real problem. When "revenue" means something different on every dashboard, trust erodes and decision-making slows. OSI is bringing together Snowflake, Salesforce, dbt Labs, and others to create a vendor-neutral standard for metric definitions, and that's valuable work.
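As a rough illustration of metric drift, the Python sketch below groups metric definitions by name and flags any metric whose definitions disagree across dashboards. The dashboard names, the expressions, and the `find_metric_drift` helper are hypothetical, and the whitespace-and-case normalization is deliberately crude. It is not the OSI format; a shared standard would presumably compare structured definitions rather than raw strings.

```python
from collections import defaultdict


def normalize(expression: str) -> str:
    """Crude normalization: lowercase and collapse whitespace."""
    return " ".join(expression.lower().split())


def find_metric_drift(definitions):
    """Return metrics that have more than one distinct definition.

    `definitions` is an iterable of (dashboard, metric, expression) tuples.
    """
    variants = defaultdict(set)
    for _dashboard, metric, expression in definitions:
        variants[metric].add(normalize(expression))
    return {m: sorted(v) for m, v in variants.items() if len(v) > 1}


# Hypothetical definitions collected from three dashboards.
definitions = [
    ("finance_weekly", "revenue", "SUM(amount)"),
    ("sales_overview", "revenue", "SUM(amount) - SUM(refunds)"),
    ("exec_summary", "revenue", "sum(amount)"),
    ("exec_summary", "active_customers", "COUNT(DISTINCT customer_id)"),
]

drift = find_metric_drift(definitions)
print(drift)
```

Here "revenue" is flagged because one dashboard subtracts refunds and the others do not, while the difference in letter case between the other two is treated as the same definition.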

We're joining OSI to contribute what we've learned.

The initiative's early focus on metrics is a practical and necessary step, and one that aligns with the stated roadmap toward richer semantic constructs. The working groups are already thinking beyond metrics, with efforts on Ontology Representation and Catalog Integration that signal the community recognizes the broader opportunity.

We bring:

  • A decade of building metadata systems at scale across Uber, Hortonworks, and now OpenMetadata

  • A schema-first, ontology-aligned architecture that's already running in production across thousands of deployments

  • A perspective that spans the full data lifecycle from discovery and quality to governance and AI

Metric consistency matters, and it opens the door to something bigger. We want to see glossaries, lineage, governance, and AI context flow as freely as metric definitions. That's true semantic interoperability, and we're excited to help the industry get there.

This is just the beginning

Twenty years ago, a semantic layer meant a BI tool that hid SQL from business users. Today, it means something far more ambitious: a foundation for consistent analytics, trusted governance, and AI that actually understands your business.

That's what we've been building at Collate. OpenMetadata was designed for this moment, and OSI is an important milestone along that path. Our vision is closely aligned with OSI's direction, extending beyond metric interchange toward a shared semantic foundation. We're building toward a world where a business analyst in finance, a data scientist in marketing, and an AI agent serving customers all work from the same semantic foundation. Same definitions. Same lineage. Same trust.

The semantic layer has grown up. And we're just getting started.

Learn more about the Open Semantic Interchange at the Snowflake blog.

Explore OpenMetadata at open-metadata.org and join the community.
Learn about Collate at getcollate.io.