After months of research and model development, our data science team has arrived at an insight that is reshaping how we think about startup prediction: the industry's fundamental approach to data is broken.
The Endogenous Bias Problem
When we started building predictive models for venture capital, we did what everyone does. We gathered data from Crunchbase, PitchBook, and CB Insights. We trained models on funding rounds, exits, and failures.
The results were… mediocre.
Not because our models were bad. Not because we lacked data. But because the data itself was fundamentally flawed.
The problem: endogenous bias.
When you train algorithms on performance data that wasn't captured at origin, you're not predicting outcomes. You're pattern-matching on signals that have already been shaped by the very outcomes you're trying to predict.
Let me explain with an example:
Imagine you’re trying to predict which startups will succeed. You train your model on data from Series B companies. The data looks clean. The patterns seem clear.
But here’s the catch: Series B companies have already been selected. They’ve already passed through multiple filters:
- They convinced an angel investor
- They survived to raise a seed round
- They impressed Series A investors
- They grew enough to merit a Series B
Your model isn’t learning to identify future winners. It’s learning to recognize companies that have already won their early battles. The selection bias is baked into every data point.
Same Data, Same Results
Here’s the uncomfortable truth about VC analytics: everyone uses the same data.
- Crunchbase: used by 80% of the industry
- PitchBook: the "premium" alternative
- CB Insights: for those who want charts
The data sources are commoditized. The features are commoditized. The insights are commoditized.
When everyone trains on the same data, everyone reaches the same conclusions. There’s no edge. No alpha. Just expensive confirmation of what everyone already knows.
The T0 Solution
We asked a different question: What if we captured data before the selection bias occurs?
This led us to T0: the moment of origin. Specifically, the original pitch deck a founder creates before they've talked to anyone.
Think about what a first pitch deck contains:
- Raw founder psychology
- Unfiltered market assumptions
- Genuine (not coached) financial thinking
- Authentic team dynamics
- Real competitive positioning
This data has never been systematically captured. No one has a database of original pitch decks with outcome tracking.
Until now.
Building the T0 Database
With PULSE, our AI-powered pitch deck analysis engine, we’re building something unprecedented: a database of startup DNA captured at the moment of creation.
Here’s what we’re extracting:
Quantitative Signals
- Financial projection patterns
- Market sizing approaches
- Growth rate assumptions
- Burn rate expectations
- Valuation anchors
Qualitative Signals
- Narrative structure
- Problem articulation clarity
- Solution presentation confidence
- Team positioning
- Competitive framing
Meta Signals
- Deck design sophistication
- Information organization
- Emphasis patterns
- What’s included vs. omitted
- Storytelling coherence
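This post does not publish PULSE's schema. As a hypothetical sketch of how the three signal groups above might be stored, the record below freezes the extracted features at capture time and leaves the outcome empty until it is observed later, so nothing learned after T0 can leak into the features. Every class and field name here (`T0Record`, `problem_clarity`, the 0-1 scoring scale, and so on) is an illustrative assumption, and only a few signals per group are shown.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class QuantitativeSignals:
    projected_growth_multiple: float  # e.g. 3.0 = deck projects 3x year-over-year revenue
    claimed_market_size_usd: float    # market sizing as stated in the deck
    monthly_burn_usd: float           # burn-rate expectation
    valuation_anchor_usd: Optional[float] = None  # only if the deck names a number


@dataclass(frozen=True)
class QualitativeSignals:
    # Hypothetical 0-1 scores from an extraction model or a human rater.
    problem_clarity: float
    solution_confidence: float
    competitive_framing: float


@dataclass(frozen=True)
class MetaSignals:
    slide_count: int
    design_score: float  # hypothetical 0-1 score
    sections_present: frozenset[str] = frozenset()


@dataclass
class T0Record:
    deck_id: str
    captured_on: date
    quantitative: QuantitativeSignals
    qualitative: QualitativeSignals
    meta: MetaSignals
    outcome: Optional[str] = None  # filled in later, e.g. "raised_series_a" or "shut_down"

    def missing_sections(self, expected: set[str]) -> set[str]:
        """What's omitted is itself a signal: expected sections the deck leaves out."""
        return set(expected) - self.meta.sections_present

    def feature_vector(self) -> list[float]:
        """Numeric features only; the outcome is deliberately excluded."""
        q, s, m = self.quantitative, self.qualitative, self.meta
        return [
            q.projected_growth_multiple,
            q.claimed_market_size_usd,
            q.monthly_burn_usd,
            s.problem_clarity,
            s.solution_confidence,
            s.competitive_framing,
            float(m.slide_count),
            m.design_score,
        ]


record = T0Record(
    deck_id="deck-001",
    captured_on=date(2024, 1, 15),
    quantitative=QuantitativeSignals(3.0, 5e9, 40_000.0),
    qualitative=QualitativeSignals(0.8, 0.6, 0.4),
    meta=MetaSignals(14, 0.7, frozenset({"problem", "solution", "team"})),
)
print(record.missing_sections({"problem", "solution", "team", "competition"}))  # {'competition'}
```

Keeping the signal groups immutable and the outcome as a separate, later-filled field is one simple way to make "captured at origin" a property of the data structure rather than a convention.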
The Model Advantage
With T0 data, our models learn from uncontaminated signals. We're not pattern-matching on companies that have already survived earlier funding filters. We're identifying the raw characteristics that correlate with future outcomes.
Early results are promising:
| Metric | Traditional Models | T0 Models |
|---|---|---|
| Prediction Accuracy | 62% | 79% |
| False Positive Rate | 34% | 18% |
| Signal-to-Noise | Low | High |
(Based on internal validation against 600+ startups with known outcomes)
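For reference, here is a minimal sketch of the standard definitions behind metrics like these, computed from a binary confusion matrix (1 = success, 0 = failure). It makes two interpretation points visible. First, "false positive rate" conventionally means FP / (FP + TN), the share of actual failures flagged as winners, while the share of predicted winners that fail, FP / (FP + TP), is the false discovery rate; the two can differ a lot. Second, because most startups fail, raw accuracy is easiest to read next to a majority-class baseline. The function and field names are hypothetical, and the labels in the usage example are toy data, not our validation set.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    false_positive_rate: float   # FP / (FP + TN): actual failures flagged as winners
    false_discovery_rate: float  # FP / (FP + TP): flagged winners that actually fail
    majority_baseline: float     # accuracy of always predicting the more common class


def evaluate(y_true: list[int], y_pred: list[int]) -> BinaryMetrics:
    """Labels are 1 = success, 0 = failure; both lists must be non-empty and equal length."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    n = len(y_true)
    positives = sum(y_true)
    return BinaryMetrics(
        accuracy=(tp + tn) / n,
        false_positive_rate=fp / (fp + tn) if fp + tn else 0.0,
        false_discovery_rate=fp / (fp + tp) if fp + tp else 0.0,
        majority_baseline=max(positives, n - positives) / n,
    )


# Toy data: 2 successes out of 10 companies.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
metrics = evaluate(y_true, y_pred)
# accuracy 0.8, false_positive_rate 0.125, false_discovery_rate 0.5, majority_baseline 0.8
```

In this toy case the model only ties the always-predict-failure baseline on accuracy, its false positive rate is 12.5%, and yet half of its predicted winners fail. Stating which false-positive definition and which baseline a comparison uses makes figures like those in the table easier to interpret.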
The Compounding Advantage
Here’s what makes this approach particularly powerful: it compounds.
Every pitch deck we analyze adds to our training data. Every outcome we track validates or refines our models. Every cycle makes our predictions sharper.
Traditional data providers are stuck. Their data is historical, static, and shared with everyone. Our data is proprietary, growing, and captured at the only moment that matters.
What This Means for VCs
If you’re an investor, T0 data changes your workflow:
- Earlier conviction: Identify promising startups before the crowd
- Better filtering: Reduce false positives with uncontaminated signals
- Deeper diligence: Understand founder psychology from day one
- Competitive edge: Access insights no one else has
The venture capital industry has been flying blind, using data that’s already been filtered by the outcomes we’re trying to predict.
T0 data is how we finally learn to see.
Our next post will explore WHISPER — how we transform T0 signals into actionable prediction scores. Subscribe to be notified.
Written by
Mariana Canet
Head of Data
Part of the Xylence team building the predictive intelligence layer for global capital.