Your Data Is Lying
The Multi-Million Dollar Difference Between Blocked and Booked

Every property manager I know obsesses over AirDNA data. They benchmark their performance, underwrite new acquisitions, and make pricing decisions based on scraped occupancy metrics.
Here's the problem: that data is systematically overstating your market's actual paid performance by 8-10 percentage points. And almost nobody is talking about it.
The Blocked-Night Problem
The core distortion is simple: "non-revenue nights" like owner stays and maintenance holds get miscounted as paid bookings.
According to Key Data's analysis of occupancy metrics, owner nights and hold nights in the U.S. each average approximately 10% of calendar nights annually. That means roughly 20% of calendar inventory generates no revenue.
Globally, AllTheRooms reports that hosts block their calendars around 20% of the time.
If scraped datasets misclassify even half of these non-revenue nights as booked, paid occupancy is overstated by 8-10 percentage points.
How Scraped Data Actually Works
When a scraper views a calendar, it sees dates as either "available" or "unavailable." The "unavailable" status covers:
- Paid bookings
- Owner stays
- Maintenance blocks
- Platform-imposed restrictions
As peer-reviewed research in PLOS One notes, this ambiguity "likely results in some overestimation of occupancies."
AirDNA explicitly acknowledges this limitation. Their enterprise documentation states they "can not perfectly distinguish between booked and blocked nights."
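To make the ambiguity concrete, here is a minimal sketch of how a binary calendar view collapses different causes into one signal. The `NightStatus` enum and `scraped_view` function are hypothetical names made up for illustration; they are not part of any provider's actual pipeline.

```python
from enum import Enum


class NightStatus(Enum):
    """True state of a night, known only to the host or PMS (hypothetical model)."""
    PAID_BOOKING = "paid_booking"
    OWNER_STAY = "owner_stay"
    MAINTENANCE_BLOCK = "maintenance_block"
    PLATFORM_RESTRICTION = "platform_restriction"
    AVAILABLE = "available"


def scraped_view(status: NightStatus) -> str:
    """What a calendar scraper can observe: available or unavailable, nothing more."""
    return "available" if status is NightStatus.AVAILABLE else "unavailable"


true_calendar = [
    NightStatus.PAID_BOOKING,
    NightStatus.OWNER_STAY,
    NightStatus.MAINTENANCE_BLOCK,
    NightStatus.PLATFORM_RESTRICTION,
    NightStatus.AVAILABLE,
]

observed = [scraped_view(status) for status in true_calendar]

# Distinct underlying causes that all look identical from the outside.
hidden_causes = {s for s in true_calendar if scraped_view(s) == "unavailable"}

print(observed)
print(f"{len(hidden_causes)} different causes share the 'unavailable' label")
```

Any model that tries to recover the true status from the observed list has to infer it from indirect signals, which is where the classification error comes from.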
The May 2024 Reclassification
This isn't theoretical. In May 2024, AirDNA retroactively reclassified "Advanced Notice" and "Preparation Time" days from booked to available, altering historical baselines back to 2022.
That means if you compared your 2023 performance to 2022 market data before May 2024, your baseline was wrong.
The Overestimation Math
Key Data provides a concrete example: a property with 44 guest nights, 12 owner nights, and 8 hold nights would show:
- Calendar Occupancy (scraped view): 64%
- Adjusted Paid Occupancy (direct view): 54%
That's a 10-point discrepancy on a single property. Scale that across a market, and your underwriting assumptions are fundamentally flawed.
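The arithmetic behind those two views can be sketched in a few lines. The example above does not state the length of the period, so this sketch assumes 100 available nights (which is what makes 64 occupied nights equal 64%), and it uses the Adjusted Paid Occupancy definition given later in this article. Under that assumption the adjusted figure works out to 55%, within a point of the 54% quoted; the original example may use a different period or rounding.

```python
def calendar_occupancy(guest: int, owner: int, hold: int, total: int) -> float:
    """Share of nights that look unavailable, regardless of who used them."""
    return (guest + owner + hold) / total


def adjusted_paid_occupancy(guest: int, owner: int, hold: int, total: int) -> float:
    """Guest nights divided by the nights that were actually sellable."""
    return guest / (total - owner - hold)


TOTAL_NIGHTS = 100  # assumption: the period length is not given in the example

cal = calendar_occupancy(guest=44, owner=12, hold=8, total=TOTAL_NIGHTS)
adj = adjusted_paid_occupancy(guest=44, owner=12, hold=8, total=TOTAL_NIGHTS)

print(f"Calendar occupancy:      {cal:.0%}")
print(f"Adjusted paid occupancy: {adj:.0%}")
print(f"Gap: {(cal - adj) * 100:.0f} points")
```

Swap in your own guest, owner, and hold counts from your PMS to see how far your calendar view drifts from what you actually sold.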
| Non-revenue nights | Misclassification rate | Occupancy overstatement |
|---|---|---|
| 10% (Conservative) | 50% | 5 points |
| 20% (Typical US) | 50% | 10 points |
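The table is a single multiplication, and a quick sketch makes it easy to plug in your own market's numbers. The function name is hypothetical, and the model assumes the overstatement is measured against total calendar nights.

```python
def occupancy_overstatement_points(non_revenue_share: float,
                                   misclassification_rate: float) -> float:
    """Percentage points of occupancy wrongly counted as booked.

    non_revenue_share: fraction of calendar nights that are owner or hold nights.
    misclassification_rate: fraction of those nights a scraper labels as booked.
    """
    return non_revenue_share * misclassification_rate * 100


scenarios = [("Conservative", 0.10), ("Typical US", 0.20)]

for label, share in scenarios:
    points = occupancy_overstatement_points(share, misclassification_rate=0.50)
    print(f"{label}: {points:.0f} points")
```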
Why It Varies by Market
The magnitude of bias varies significantly by market type.
Leisure and Seasonal Markets: Owner usage and seasonal maintenance are prevalent. Key Data reports that hold occupancy alone can reach nearly 16% in some periods. Expect higher bias; a correction of 10-20 percentage points may be necessary.
Urban and Professionalized Markets: Fewer owner stays, more standardized availability. Bias is likely lower but present. A correction of 5-10 percentage points is a reasonable baseline.
The Revenue Modeling Problem
Revenue estimates from scraped data often assume a booked night was sold at the "last available rate" observed before booking. According to AirDNA's own documentation, this fails to account for:
- Discounts: Weekly, monthly, or annual discounts are invisible to scrapers
- Off-Platform Bookings: Direct bookings may have different pricing
- Channel Blind Spots: Scrapers can't identify channel-specific pricing variations
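Here is a rough sketch of the discount blind spot. Every number in it is hypothetical (a 7-night stay, a $200 rate, a 15% weekly discount), and the functions are illustrative rather than any provider's actual revenue model.

```python
def scraped_revenue_estimate(nights: int, last_available_rate: float) -> float:
    """Revenue as a scraper would estimate it: nights times the last rate it saw."""
    return nights * last_available_rate


def actual_revenue(nights: int, nightly_rate: float, discount: float = 0.0) -> float:
    """Revenue the host actually collects after a length-of-stay discount."""
    return nights * nightly_rate * (1 - discount)


# Hypothetical booking: values chosen only for illustration.
nights = 7
rate = 200.0
weekly_discount = 0.15

estimate = scraped_revenue_estimate(nights, rate)
actual = actual_revenue(nights, rate, weekly_discount)
overstated_by = estimate - actual

print(f"Scraped estimate: ${estimate:,.2f}")
print(f"Actual revenue:   ${actual:,.2f}")
print(f"Overstated by:    ${overstated_by:,.2f} ({overstated_by / estimate:.0%})")
```

The same gap stacks on top of the occupancy bias, since a blocked night counted as booked is also assigned revenue it never earned.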
The Provider Landscape
| Provider | Source Type | Blocked vs Booked Handling | Limitations |
|---|---|---|---|
| AirDNA | Scraped | ML classification; admits imperfect distinction | 92% accuracy claim dates to 2015 training data |
| Key Data | Direct PMS + Scrape | Explicitly separates guest/owner/hold nights | ~700k properties via 65+ PMS integrations |
| AllTheRooms | Scraped | Tracks booked, blocked, vacant separately | Provides "Adjusted Occupancy" definitions |
What You Should Actually Do
For Revenue Management: Use Adjusted Paid Occupancy, defined as Guest Nights / (Total Nights - Owner Nights - Hold Nights), as your primary metric. Relying on Calendar Occupancy creates false signals of high demand.
For Underwriting: When using scraped data for new acquisitions, apply a "haircut" to both occupancy and ADR. As Key Data recommends, a conservative approach reduces scraped RevPAR by 10-15% unless validated otherwise.
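A minimal sketch of that haircut, assuming RevPAR is computed as ADR times occupancy. The $250 ADR and 64% occupancy inputs are hypothetical placeholders for whatever your scraped report shows.

```python
def revpar(adr: float, occupancy: float) -> float:
    """Revenue per available night: average daily rate times occupancy."""
    return adr * occupancy


def haircut_range(scraped_revpar: float,
                  low: float = 0.10,
                  high: float = 0.15) -> tuple[float, float]:
    """Return (pessimistic, optimistic) RevPAR after a 10-15% reduction."""
    return scraped_revpar * (1 - high), scraped_revpar * (1 - low)


# Hypothetical scraped inputs for a target property's comp set.
scraped = revpar(adr=250.0, occupancy=0.64)
pessimistic, optimistic = haircut_range(scraped)

print(f"Scraped RevPAR:  ${scraped:.2f}")
print(f"Underwrite with: ${pessimistic:.2f} to ${optimistic:.2f}")
```

If the deal only works at the scraped figure and not inside the haircut range, that is worth knowing before you sign.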
For Due Diligence: When evaluating data providers, ask specifically:
- "What is the precision of your booked-vs-blocked model for current data?"
- "How do you handle Preparation Time and Advanced Notice blocks?"
- "What percentage of your data comes from direct PMS integrations versus scraping?"
The Bottom Line
Scraped data captures the breadth of supply but systematically overstates paid demand. Direct-source data captures true demand but not the whole market.
The smart operators use both: direct-source data to anchor financial KPIs like RevPAR and Adjusted Paid Occupancy, while leveraging scraped data for broad supply analysis.
Without local calibration, you're making decisions based on numbers that are systematically too optimistic. And in a market where margins are already compressed, that's a luxury you can't afford.