Your Data Is Lying

Every property manager I know obsesses over AirDNA data. They benchmark their performance, underwrite new acquisitions, and make pricing decisions based on scraped occupancy metrics.

Here's the problem: that data is likely overstating your market's actual paid occupancy by roughly 8-10 percentage points. And almost nobody is talking about it.

The Blocked-Night Problem

The core distortion is simple: "non-revenue nights" like owner stays and maintenance holds get miscounted as paid bookings.

According to Key Data's analysis of occupancy metrics, in the U.S., owner and hold nights each average approximately 10% annually. That means roughly 20% of calendar inventory generates no revenue.

Globally, AllTheRooms reports that hosts block their calendars around 20% of the time.

If scraped datasets misclassify even half of that roughly 20% as booked, reported occupancy runs up to 10 percentage points above paid occupancy. That is where the 8-10 point overstatement comes from.
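The arithmetic behind that claim is a single multiplication. Here is a minimal sketch in Python; the function name is hypothetical, and the 50% misclassification rate is the "even half" assumption from the text, not a measured figure.

```python
def occupancy_overstatement(non_revenue_share: float,
                            misclassification_rate: float) -> float:
    """Percentage points by which occupancy is overstated.

    non_revenue_share: fraction of calendar nights that are owner or hold nights.
    misclassification_rate: fraction of those nights wrongly counted as booked.
    """
    return non_revenue_share * misclassification_rate * 100


# Assumed scenarios: 10% and 20% non-revenue nights, half of them misclassified.
for share in (0.10, 0.20):
    points = occupancy_overstatement(share, 0.50)
    print(f"{share:.0%} non-revenue nights -> {points:.0f} points of overstatement")
```

Swap in your own market's owner and hold shares to see how sensitive the result is to each input.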

How Scraped Data Actually Works

When a scraper views a calendar, it sees dates as either "available" or "unavailable." The "unavailable" status covers:

Paid bookings
Owner stays
Maintenance blocks
Platform-imposed restrictions

As peer-reviewed research in PLOS One notes, this ambiguity "likely results in some overestimation of occupancies."

AirDNA explicitly acknowledges this limitation. Their enterprise documentation states they "can not perfectly distinguish between booked and blocked nights."
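To make the ambiguity concrete, here is a small Python sketch of what a naive scraper can and cannot see. The status labels, function names, and the 10-night calendar are all made up for illustration. It does not represent how AirDNA or any other provider actually classifies nights.

```python
from enum import Enum


class NightStatus(Enum):
    """Ground-truth reason for a night's status (hypothetical labels)."""
    AVAILABLE = "available"
    PAID_BOOKING = "paid_booking"
    OWNER_STAY = "owner_stay"
    MAINTENANCE = "maintenance"
    PLATFORM_RESTRICTION = "platform_restriction"


def scraped_view(status: NightStatus) -> str:
    """A public calendar exposes only available vs. unavailable."""
    return "available" if status is NightStatus.AVAILABLE else "unavailable"


def naive_occupancy(calendar: list[NightStatus]) -> float:
    """Count every unavailable night as booked, as a naive scraper would."""
    unavailable = sum(scraped_view(s) == "unavailable" for s in calendar)
    return unavailable / len(calendar)


def paid_occupancy(calendar: list[NightStatus]) -> float:
    """Share of all nights that were actually paid bookings."""
    paid = sum(s is NightStatus.PAID_BOOKING for s in calendar)
    return paid / len(calendar)


# Illustrative 10-night calendar (made-up data).
calendar = (
    [NightStatus.PAID_BOOKING] * 5
    + [NightStatus.OWNER_STAY]
    + [NightStatus.MAINTENANCE]
    + [NightStatus.AVAILABLE] * 3
)

print(f"naive: {naive_occupancy(calendar):.0%}, paid: {paid_occupancy(calendar):.0%}")
```

The two functions read the same calendar and disagree only because the scraped view throws away the reason a night is unavailable.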

The May 2024 Reclassification

This isn't theoretical. In May 2024, AirDNA retroactively reclassified "Advanced Notice" and "Preparation Time" days from booked to available, altering historical baselines back to 2022.

That means if you benchmarked your 2023 performance against 2022 market data pulled before May 2024, you were measuring against a baseline that has since been revised.

The Overestimation Math

Key Data provides a concrete example: a property with 44 guest nights, 12 owner nights, and 8 hold nights over a 100-night window (the window implied by the calendar occupancy figure) would show:

Calendar Occupancy (scraped view): 64%
Adjusted Paid Occupancy (direct view): 55% (44 guest nights out of the 80 nights not reserved for owners or holds)

That's a roughly 9-point discrepancy on a single property. Scale that across a market, and your underwriting assumptions are fundamentally flawed.
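The two metrics can be reproduced in a few lines of Python. This is a sketch under two assumptions: a 100-night window (inferred from 64 occupied nights showing as 64%), and Adjusted Paid Occupancy defined as guest nights divided by the nights not reserved for owner or hold use. The function names are hypothetical. With those inputs the figures come out at 64% and 55%, within a point of the numbers quoted above.

```python
def calendar_occupancy(guest: int, owner: int, hold: int, total: int) -> float:
    """Scraped view: every unavailable night counts as occupied."""
    return (guest + owner + hold) / total


def adjusted_paid_occupancy(guest: int, owner: int, hold: int, total: int) -> float:
    """Direct view: guest nights over nights that were actually for sale."""
    bookable = total - owner - hold
    if bookable <= 0:
        raise ValueError("no bookable nights left after owner and hold nights")
    return guest / bookable


# Inputs from the example; the 100-night total is an inferred assumption.
guest, owner, hold, total = 44, 12, 8, 100

cal = calendar_occupancy(guest, owner, hold, total)
adj = adjusted_paid_occupancy(guest, owner, hold, total)
print(f"calendar: {cal:.0%}, adjusted paid: {adj:.0%}, gap: {(cal - adj) * 100:.0f} points")
```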

Non-revenue nights (share of calendar)	Share misclassified as booked	Occupancy overstatement
10% (Conservative)	50%	5 percentage points
20% (Typical US)	50%	10 percentage points

Why It Varies by Market

The magnitude of bias varies significantly by market type.

Leisure and Seasonal Markets: Owner usage and seasonal maintenance are prevalent. Key Data reports that hold occupancy alone can reach nearly 16% in some periods. Expect higher bias; a correction of 10-20 percentage points may be necessary.

Urban and Professionalized Markets: Fewer owner stays, more standardized availability. Bias is likely lower but present. A correction of 5-10 percentage points is a reasonable baseline.

The Revenue Modeling Problem

Revenue estimates from scraped data often assume a booked night was sold at the "last available rate" observed before booking. According to AirDNA's own documentation, this fails to account for:

Discounts: Weekly, monthly, or annual discounts are invisible to scrapers
Off-Platform Bookings: Direct bookings may have different pricing
Channel Blind Spots: Scrapers can't identify channel-specific pricing variations
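The discount effect is easy to quantify. The Python sketch below compares a "last available rate" estimate with what a discounted stay actually earns. The $200 rate, 7-night stay, and 15% weekly discount are made-up inputs, and the function names are hypothetical. It models only the discount case, not off-platform or channel-specific pricing.

```python
def scraped_revenue_estimate(nights: int, last_available_rate: float) -> float:
    """Estimate revenue as if every night sold at the last rate a scraper saw."""
    return nights * last_available_rate


def actual_revenue(nights: int, nightly_rate: float, discount: float = 0.0) -> float:
    """Revenue after a length-of-stay discount (0.15 means 15% off)."""
    return nights * nightly_rate * (1 - discount)


# Illustrative stay: 7 nights at a $200 listed rate with a 15% weekly discount.
estimate = scraped_revenue_estimate(7, 200.0)
actual = actual_revenue(7, 200.0, discount=0.15)
overstatement = (estimate - actual) / actual

print(f"estimated ${estimate:,.0f} vs actual ${actual:,.0f} "
      f"({overstatement:.1%} overstated)")
```

Even a single discounted week inflates the estimate by a double-digit percentage in this example, before any occupancy bias is considered.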

The Provider Landscape

Provider	Source Type	Blocked vs Booked Handling	Notes and Limitations
AirDNA	Scraped	ML classification; admits imperfect distinction	92% accuracy claim dates to 2015 training data
Key Data	Direct PMS + Scrape	Explicitly separates guest/owner/hold nights	~700k properties via 65+ PMS integrations
AllTheRooms	Scraped	Tracks booked, blocked, vacant separately	Provides "Adjusted Occupancy" definitions

What You Should Actually Do

For Revenue Management: Use Adjusted Paid Occupancy, calculated as Guest Nights / (Total Nights - Owner Nights - Hold Nights), as your primary metric. Relying on Calendar Occupancy creates false signals of high demand.

For Underwriting: When using scraped data for new acquisitions, apply a "haircut" to both occupancy and ADR. As Key Data recommends, a conservative approach reduces scraped RevPAR by 10-15% unless validated otherwise.
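As a rough sketch of what that haircut looks like in practice, the Python below applies the 10-15% range to a scraped RevPAR figure. The function names are hypothetical, and the 64% occupancy and $250 ADR are placeholder inputs rather than market data. RevPAR is computed the standard way, as occupancy times ADR.

```python
def revpar(occupancy: float, adr: float) -> float:
    """Revenue per available night: occupancy (0-1) times average daily rate."""
    return occupancy * adr


def underwriting_range(scraped_occupancy: float, scraped_adr: float,
                       low_haircut: float = 0.10,
                       high_haircut: float = 0.15) -> tuple[float, float]:
    """Return (conservative, optimistic) RevPAR after the haircut range."""
    base = revpar(scraped_occupancy, scraped_adr)
    return base * (1 - high_haircut), base * (1 - low_haircut)


# Placeholder scraped inputs: 64% occupancy at a $250 ADR.
conservative, optimistic = underwriting_range(0.64, 250.0)
print(f"underwrite RevPAR between ${conservative:.2f} and ${optimistic:.2f}")
```

If you have direct-source data for comparable properties, use it to replace the default haircut range with a locally calibrated one.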

For Due Diligence: When evaluating data providers, ask specifically:

"What is the precision of your booked-vs-blocked model for current data?"
"How do you handle Preparation Time and Advanced Notice blocks?"
"What percentage of your data comes from direct PMS integrations versus scraping?"

The Bottom Line

Scraped data captures the breadth of supply but systematically overstates paid demand. Direct-source data captures true demand but not the whole market.

The smart operators use both: direct-source data to anchor financial KPIs like RevPAR and Adjusted Paid Occupancy, while leveraging scraped data for broad supply analysis.

Without local calibration, you're making decisions based on numbers that are systematically too optimistic. And in a market where margins are already compressed, that's a luxury you can't afford.