The Modern Data Stack is Consolidating. Here's What Comes Next.

After years of proliferation, the data ecosystem is consolidating. We examine which categories will survive and where the next wave of innovation will emerge.

GASJ Team · 12 min read

The Proliferation Era

Between 2018 and 2023, the data ecosystem exploded. At its peak, a typical enterprise data stack included 15+ specialized tools: ingestion, transformation, orchestration, cataloging, observability, reverse ETL, and more.

This fragmentation was driven by venture capital. With cheap money, every point solution could raise rounds and grab a slice of data budgets. Integration was someone else's problem.

The Hangover

Reality set in around 2024. Data teams discovered that:

Integration burden is massive: Connecting 15 tools creates 105 potential integration points. Each integration is a maintenance liability and potential failure mode.

Talent is scarce: Hiring engineers who understand dbt, Airflow, Spark, Fivetran, Snowflake, and Looker is nearly impossible. The cognitive overhead of the modern stack exceeds human capacity.

Costs compound: Each tool has its own pricing model, often based on data volume or compute. Overlapping functionality means paying multiple vendors to do similar things.
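The "105 potential integration points" figure above is just pairwise combinatorics: with n tools, every pair is a possible point-to-point integration, so the count grows quadratically. A quick sketch:

```python
from math import comb

def integration_points(n_tools: int) -> int:
    # Each unordered pair of tools is a potential point-to-point
    # integration: n * (n - 1) / 2, i.e. "n choose 2".
    return comb(n_tools, 2)

# 15 tools -> 105 potential integrations
# Consolidating to 3 anchor platforms -> only 3
print(integration_points(15))  # 105
print(integration_points(3))   # 3
```

This quadratic growth is why trimming a stack from 15 tools to a handful of anchors shrinks the maintenance surface far more than linearly.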

What Survives

We see consolidation around three anchors:

The Warehouse: Snowflake, Databricks, and BigQuery will absorb adjacent functionality. Transformation (dbt-style), orchestration, and light governance will become platform features. Most enterprises will standardize on one.

The Catalog: Data discovery and governance can't be fully absorbed by warehouses due to multi-cloud realities. Standalone catalogs with strong lineage and access control (Alation, Atlan) remain valuable.

The Observability Layer: Data quality and observability require independence from the systems they monitor. Tools that can watch across the entire stack (Monte Carlo, our portfolio company Nexus) will remain standalone.

What Gets Absorbed

Ingestion (Fivetran-style) will increasingly be bundled with warehouses. Basic transformation becomes a commodity. Reverse ETL was always a feature, not a category.

The Next Wave

Post-consolidation, we see innovation in:

AI-Native Data: Tools that assume LLMs as first-class citizens. Semantic layers that speak natural language. Automated data quality that learns from context.

Real-time Convergence: The batch/streaming divide is artificial. Systems that unify both paradigms (Kafka + warehouse fusion) will win.

Privacy Infrastructure: As regulations expand, privacy-preserving computation (differential privacy, secure enclaves) moves from research to production requirement.
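To make "privacy-preserving computation" concrete, here is a minimal sketch of the classic Laplace mechanism from differential privacy, which answers a counting query with calibrated noise so that no individual record can be confidently inferred. The function names are illustrative, not from any particular product:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Sample Laplace(0, scale) via inverse-CDF transform sampling.
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def dp_count(true_count: int, epsilon: float) -> float:
    # A counting query has sensitivity 1 (adding/removing one record
    # changes the count by at most 1), so Laplace(1/epsilon) noise
    # yields an epsilon-differentially-private answer.
    return true_count + laplace_noise(1.0 / epsilon)
```

Smaller epsilon means stronger privacy but noisier answers; the production challenge these vendors tackle is managing that trade-off (and the cumulative privacy budget) across thousands of queries.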

Portfolio Implications

We're positioned for this transition through Nexus (orchestration + observability) and Meridian (privacy-native analytics). We're actively looking for AI-native data tools and real-time infrastructure plays.

The data market is maturing. That's good for sustainable businesses and bad for hype-driven point solutions.
