The Proliferation Era
Between 2018 and 2023, the data ecosystem exploded. At its peak, a typical enterprise data stack included 15+ specialized tools: ingestion, transformation, orchestration, cataloging, observability, reverse ETL, and more.
This fragmentation was driven by venture capital. With cheap money, every point solution could raise rounds and grab a slice of data budgets. Integration was someone else's problem.
The Hangover
Reality set in around 2024. Data teams discovered that:
Integration burden is massive: Connecting 15 tools creates up to 105 potential integration points, because every pair of tools is a possible connection (see the arithmetic after this list). Each integration is a maintenance liability and a potential failure mode.
Talent is scarce: Hiring engineers who understand dbt, Airflow, Spark, Fivetran, Snowflake, and Looker is nearly impossible. The cognitive overhead of the modern stack exceeds what any single engineer can realistically master.
Costs compound: Each tool has its own pricing model, often based on data volume or compute. Overlapping functionality means paying multiple vendors to do similar things.
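The integration arithmetic above is worth spelling out. If each of n tools can, in principle, exchange data with any other, the number of potential point-to-point integrations is the number of unordered pairs:

    \[
    \binom{n}{2} = \frac{n(n-1)}{2}, \qquad \binom{15}{2} = \frac{15 \cdot 14}{2} = 105.
    \]

No team wires up every pair in practice, but the quadratic growth is why each additional tool feels disproportionately expensive to adopt.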
What Survives
We see consolidation around three anchors:
The Warehouse: Snowflake, Databricks, and BigQuery will absorb adjacent functionality. Transformation (dbt-style), orchestration, and light governance will become platform features. Most enterprises will standardize on one.
The Catalog: Data discovery and governance can't be fully absorbed by warehouses due to multi-cloud realities. Standalone catalogs with strong lineage and access control (Alation, Atlan) remain valuable.
The Observability Layer: Data quality and observability require independence from the systems they monitor. Tools that can watch across the entire stack (Monte Carlo, our portfolio company Nexus) will remain standalone.
What Gets Absorbed
Ingestion (Fivetran-style) will increasingly be bundled with warehouses. Basic transformation becomes a commodity. Reverse ETL was always a feature, not a category.
The Next Wave
Post-consolidation, we see innovation in:
AI-Native Data: Tools that assume LLMs as first-class citizens. Semantic layers that speak natural language. Automated data quality that learns from context.
Real-time Convergence: The batch/streaming divide is artificial. Systems that unify both paradigms (Kafka + warehouse fusion) will win.
Privacy Infrastructure: As regulations expand, privacy-preserving computation (differential privacy, secure enclaves) moves from research project to production requirement. A minimal sketch of the core idea follows this list.
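To make the differential privacy point concrete, here is a minimal sketch of the Laplace mechanism, the textbook building block: add noise calibrated to a query's sensitivity and a privacy budget epsilon. The function name and numbers are illustrative, not any vendor's production API.

    import numpy as np

    def dp_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
        """Release a count with epsilon-differential privacy via the Laplace mechanism.

        A counting query has sensitivity 1: adding or removing one person changes
        the answer by at most 1, so noise drawn from Laplace(0, sensitivity/epsilon)
        statistically masks any individual's presence in the data.
        """
        scale = sensitivity / epsilon
        return true_count + np.random.laplace(loc=0.0, scale=scale)

    # Smaller epsilon = stronger privacy = noisier answer.
    print(dp_count(true_count=1204, epsilon=1.0))  # e.g. ~1203
    print(dp_count(true_count=1204, epsilon=0.1))  # e.g. ~1190

The trade-off this exposes is exactly why the category is hard: privacy budget spent per query is a real operational constraint, which is what pushes the tooling from research code toward dedicated infrastructure.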
Portfolio Implications
We're positioned for this transition through Nexus (orchestration + observability) and Meridian (privacy-native analytics). We're actively looking for AI-native data tools and real-time infrastructure plays.
The data market is maturing. That's good for sustainable businesses and bad for hype-driven point solutions.