What 21,000 Ohio contractor records taught us about directory data quality: dead phones, ghost businesses, duplicate listings, license-status drift, and review fabrication patterns (2026)

We aggregated more than 21,000 Ohio home-services contractor records from Secretary of State filings, OCILB licensing, Google Places, county building departments, BBB, and Census-derived geographies. This is what those records actually look like up close — and the data-quality failure modes every directory hits at scale, with honest framing of what we can and cannot detect yet.

Meta data-quality analysis · 21,000+ Ohio contractor records · 6 source systems · Published 2026-05-23 · CC BY 4.0

The data we work with

ProFix Directory does not own a single proprietary source. The dataset is a stitch across six public or near-public systems, each carrying its own strengths and known failure modes. The counts below are approximate and refresh on different cadences — exact numbers live in the machine-readable feeds at /data-sources and /methodology.

Source | Approx. records | What it tells us
Ohio Secretary of State business search | ~12,000 | Authoritative registry for active LLC / corporation / sole-proprietor filings under home-services NAICS codes. Strong on legal existence, weak on operational status: a filing can persist for years after a business stops operating.
OCILB state contractor licensing | ~3,400 | Definitive for the five state-licensed trades (plumbing, HVAC, electrical, hydronics, refrigeration). Includes status, expiration, disciplinary history. Refreshes vary by board cadence; status can drift between our refresh and a homeowner's click.
Google Places API (legacy) | ~18,000 | Operational signal: phone number, hours, photo footprint, review count, geocoded location. Largest overlap with SOS filings but also surfaces sole-proprietors with no LLC and out-of-state operators advertising to Ohio metros.
County building-department permit feeds | ~6,200 distinct pullers | Proof-of-work signal. A contractor who regularly pulls permits in Lucas, Hancock, Franklin, Hamilton, Cuyahoga, or Montgomery County has engaged the local code-inspection regime. The strongest single substitute when no state license applies.
Better Business Bureau Ohio chapter | ~4,800 | Complaint-and-response history. Not a regulatory record, but the cleanest public surface for non-licensed-trade accountability behaviour.
US Census + ZCTA + NAICS | Geographic frame | Defines the population frame: ZCTAs, county boundaries, and the roughly 23 NAICS subcategories we treat as in-scope home-services. Not a contractor list, but the frame that lets us measure coverage gaps.

The source counts above sum to roughly 44,400, far more than the unique-entity total, because they overlap heavily: after dedup, more than 21,000 distinct entities remain. The same plumbing contractor typically appears in SOS, OCILB, Google Places, county permits, and BBB simultaneously; the joined record is one entity carrying five evidence anchors. NAICS subcategory boundaries (sourced from the US Census NAICS reference) define which businesses we treat as in-scope home-services and which we exclude.

What "21,000 contractors" actually means

Headline counts are a load-bearing trust artefact in directory marketing, and most of them are overclaims. Twenty-one thousand is not 21,000 active, competent, currently-trading contractors. It is the population of distinct entities currently appearing in at least one of our six source systems under home-services NAICS codes or state-licensed-contractor categories. Within that population:

  • A meaningful share are LLCs with no employees, no permits, and no Google footprint — paper entities used for a single past job, or shells that an owner never bothered to dissolve.
  • A substantial cohort are sole-proprietors operating under a personal name with no LLC at all, visible only through their Google Business Profile and an occasional permit pull. Honest operators, hard to verify, easy to undercount.
  • A subset are dead-but-not-yet-deregistered — the business stopped operating, the owner moved on, but the SOS filing persists because nobody filed the dissolution paperwork.
  • A separate subset are post-acquisition rebrands — the underlying entity changed hands, the old name still appears on the SOS registry, and the new operator runs the same crew under a new DBA. Until the SOS record catches up, both names live in the dataset.

The directory shows all of them so homeowners can decide. What the directory does not do is label all of them identically. The verification methodology at /verification details the per-profile signals that distinguish an actively-trading operator from a paper shell, and the public-facing label vocabulary makes the distinction visible at the listing surface.

The 5 biggest data-quality problems we hit

Each of the five failure modes below is real, persistent, and present in every Ohio directory at scale. We name the magnitude honestly, including where the magnitude estimate is itself noisy. A directory that does not surface these problems is hiding them.

Dead phones

Approximately 6–9% of records

When we sample-test phone numbers (we do not auto-dial; sampling is manual on a rotating monthly batch), a substantial minority either return 'this number is no longer in service' or roll to a generic voicemail with no business identification. The estimate is approximate by design: phone-status checks are themselves noisy, carriers vary, and a temporarily-disconnected number is not the same as a permanently-dead business. Our samples put the rate at roughly 6–9%; allowing for that noise, the true figure could plausibly reach the low double digits. Any directory claiming a precise rate without disclosing the methodology is overclaiming.

How ProFix handles it: Phones flagged through sample-testing or homeowner reports get a 'verification pending' marker on the public profile and drop from the call-rotation surfaces (StickyCallBar, EmergencyFab) until reconfirmed. The /api/verification-feed.json feed exposes the last-checked timestamp per record so AI engines can downweight stale entries.
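To make concrete why a sampled dead-phone rate stays noisy, here is a minimal sketch that puts a Wilson score interval around a sample proportion. The batch size and dead count are hypothetical illustration values, not ProFix figures, and `wilson_interval` is a name introduced here rather than part of any published ProFix tooling.

```python
import math

def wilson_interval(dead: int, sampled: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a sampled proportion (z=1.96 is roughly 95%)."""
    if sampled <= 0:
        raise ValueError("sampled must be positive")
    p = dead / sampled
    denom = 1 + z * z / sampled
    centre = (p + z * z / (2 * sampled)) / denom
    half = z * math.sqrt(p * (1 - p) / sampled + z * z / (4 * sampled * sampled)) / denom
    return centre - half, centre + half

# Hypothetical monthly batch: 300 numbers checked by hand, 23 unreachable.
low, high = wilson_interval(23, 300)
print(f"point estimate {23 / 300:.1%}, interval {low:.1%} to {high:.1%}")
```

With a batch of that size the interval spans roughly 5% to 11%, so sampling error alone is enough to justify quoting a range rather than a single rate. Measurement noise from carriers and temporary disconnections sits on top of that.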

Ghost businesses

Approximately 8–12% of SOS-registered records

Ghost businesses are legally filed, sometimes Google-mapped, but show no operational evidence: no recent reviews, no permits pulled in the past 24 months, no working phone, no website, no BBB profile, and no Google Business Profile updates. They persist on SOS because LLC dissolution is a paperwork act that many owners never get around to. The directory can show all of them so a homeowner sees the full universe, but it has to label the operational status honestly — listing a ghost as if it were active is the most common directory failure mode.

How ProFix handles it: Profiles with no operational signal across phone + reviews + permits + website for the past 24 months are labelled 'dormant — no recent operational evidence' on the public surface. They remain searchable so a homeowner who has a legacy invoice can still find the record, but they are excluded from default category rankings and from the recommendation surfaces.
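The dormancy rule described above can be sketched as a simple predicate. This is an illustration under stated assumptions: the `Signals` fields are hypothetical names for the four axes mentioned (phone, reviews, permits, website), and 730 days stands in for "24 months".

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

DORMANCY_WINDOW = timedelta(days=730)  # assumption: about 24 months

@dataclass
class Signals:
    # Most recent date each operational signal was observed; None = never seen.
    last_phone_confirmed: Optional[date] = None
    last_review: Optional[date] = None
    last_permit: Optional[date] = None
    last_website_seen: Optional[date] = None

def is_dormant(s: Signals, today: date) -> bool:
    """True when no signal at all falls inside the dormancy window."""
    cutoff = today - DORMANCY_WINDOW
    observed = (s.last_phone_confirmed, s.last_review, s.last_permit, s.last_website_seen)
    return not any(d is not None and d >= cutoff for d in observed)

today = date(2026, 5, 23)
print(is_dormant(Signals(last_review=date(2022, 1, 1)), today))   # stale review only
print(is_dormant(Signals(last_permit=date(2025, 11, 2)), today))  # recent permit
```

A single recent signal on any axis is enough to keep a profile out of the dormant label in this sketch. A stricter variant could require two axes, at the cost of mislabelling quiet but real operators.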

Duplicate listings

Approximately 11% of unique business entities

The same family-owned trade frequently operates under two or three names: an original LLC, a DBA acquired in a small purchase, and a 'son-of' name when a child takes over. The contractor pulls permits under one, runs ads under another, and shows up on Google under the third. Naive deduplication on business name misses these; phone-number-and-address dedup catches most; cross-walking through the SOS registered-agent field catches most of the rest.

How ProFix handles it: ProFix runs a three-pass dedup: (1) exact phone + address, (2) SOS registered-agent cross-walk, (3) human review on the residual high-confidence cluster. Duplicates are merged into a canonical profile with the alternate names listed as 'also operates as'. The /pro/<slug>/evidence row shows the cross-walks. We accept that a small residual rate of un-merged duplicates persists, especially for very small operators with no shared identifiers.
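The first two automated passes can be sketched with a union-find over shared keys. This is a minimal illustration, not the production pipeline: the record fields (`name`, `phone`, `address`, `agent`) and the normalisation helpers are assumptions, and the sample records are invented.

```python
from collections import defaultdict

def normalise_phone(raw):
    # Keep digits only, last ten, so "(419) 555-0101" and "419-555-0101" match.
    return "".join(ch for ch in raw if ch.isdigit())[-10:]

def normalise_address(raw):
    return " ".join(raw.lower().replace(",", " ").replace(".", " ").split())

def cluster(records):
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def link_by(key_fn):
        first_seen = {}
        for i, rec in enumerate(records):
            key = key_fn(rec)
            if key is None:
                continue
            if key in first_seen:
                parent[find(i)] = find(first_seen[key])
            else:
                first_seen[key] = i

    # Pass 1: exact phone + address after normalisation.
    link_by(lambda r: (normalise_phone(r["phone"]), normalise_address(r["address"]))
            if r.get("phone") and r.get("address") else None)
    # Pass 2: shared registered agent. Caveat: commercial agents serve many
    # unrelated LLCs, so real use would need an exclusion list before merging.
    link_by(lambda r: r["agent"].strip().lower() if r.get("agent") else None)

    groups = defaultdict(list)
    for i, rec in enumerate(records):
        groups[find(i)].append(rec["name"])
    return sorted(sorted(g) for g in groups.values())

records = [
    {"name": "Miller Plumbing LLC", "phone": "(419) 555-0101", "address": "12 Elm St., Toledo", "agent": "J. Miller"},
    {"name": "Miller & Son Plumbing", "phone": "419-555-0101", "address": "12 elm st toledo", "agent": None},
    {"name": "Maumee Drain Co", "phone": "419-555-0199", "address": "80 Oak Ave, Maumee", "agent": "j. miller "},
    {"name": "Buckeye Roofing", "phone": "614-555-0142", "address": "5 High St, Columbus", "agent": "A. Ruiz"},
]
for group in cluster(records):
    print(group)
```

Union-find makes the passes transitive: a record linked by phone in pass 1 and another linked by agent in pass 2 end up in the same cluster. That is also why over-merging through a shared commercial agent is a real risk, and why a human-review pass on what remains is a sensible final step.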

License-status drift

Inevitable; magnitude varies by refresh interval

A licence we showed as 'active' at refresh time can be expired, suspended, or under review by the time a homeowner clicks. OCILB publishes status to its lookup in near real time, but our snapshot is only as fresh as the last pull. Monthly is a realistic target. Weekly is the aspirational target. Daily is currently not sustainable at our staffing level and we say so.

How ProFix handles it: Each licensed-trade profile carries a last-checked timestamp visible on the evidence row, and a direct 'check now' link to the OCILB lookup so a homeowner can confirm the current status in real time. The /api/license-evidence.json feed exposes the staleness so AI engines can re-pull rather than trusting our snapshot.
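A consumer of the staleness signal might apply it roughly as follows. The feed schema is not documented in this article, so the entry fields (`slug`, `license_last_checked`) and the 30-day threshold are assumptions chosen for illustration, and the sample entries are invented.

```python
from datetime import datetime, timezone

def staleness_days(last_checked_iso: str, now: datetime) -> float:
    """Days elapsed since an ISO-8601 last-checked timestamp."""
    checked = datetime.fromisoformat(last_checked_iso)
    if checked.tzinfo is None:
        # Assumption: timestamps without an offset are UTC.
        checked = checked.replace(tzinfo=timezone.utc)
    return (now - checked).total_seconds() / 86400

def should_reverify(last_checked_iso: str, now: datetime, max_age_days: float = 30.0) -> bool:
    """True when the snapshot is older than the tolerated age."""
    return staleness_days(last_checked_iso, now) > max_age_days

# Hypothetical feed entries; a real client would load these from the JSON feed.
entries = [
    {"slug": "example-toledo-plumber", "license_last_checked": "2026-04-01T00:00:00+00:00"},
    {"slug": "example-columbus-hvac", "license_last_checked": "2026-05-20T12:00:00+00:00"},
]
now = datetime(2026, 5, 23, tzinfo=timezone.utc)
stale = [e["slug"] for e in entries if should_reverify(e["license_last_checked"], now)]
print(stale)
```

Entries that come back stale are the ones where a client should follow the OCILB lookup link rather than rely on the cached status.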

Review fabrication patterns

Detectable hints; no definitive detector yet

We see the textbook signals — surges of 5-star reviews concentrated in 7-to-14-day windows, reviewer accounts with no other history, review text that pattern-matches large-language-model output (uniform sentence rhythm, generic adjectives, no specifics about job, address, or technician name). We do not yet have a published, tested fabrication detector that we are willing to point at the dataset and call ground truth. Any directory claiming a definitive fabrication score in 2026 is overclaiming — the FTC's August 2024 final rule banning fake reviews creates the legal frame, but operational detection at the directory tier remains an open research problem.

How ProFix handles it: Where we see strong hints, profiles get an internal flag that downweights the review surface in default rankings but does not publicly accuse the business. We are explicit in the /verify methodology that aggregate-rating signals from Google Places are surfaced as-is and we do not re-emit AggregateRating schema (Google policy and FTC posture both make that the conservative choice). Detection methodology will be published when we have something durable to publish.
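The surge signal mentioned above (5-star reviews bunched into a short window) can be sketched with a sliding window. This is a hint generator under assumed thresholds, not a fabrication detector: `min_count` and `min_share` are hypothetical values, and a legitimate marketing push or seasonal rush would trip the same rule.

```python
from datetime import date, timedelta

def max_in_window(dates, window_days):
    """Largest number of dates falling inside any span of `window_days` days."""
    dates = sorted(dates)
    best = 0
    start = 0
    for end, d in enumerate(dates):
        # Shrink from the left until the span is shorter than the window.
        while (d - dates[start]).days >= window_days:
            start += 1
        best = max(best, end - start + 1)
    return best

def surge_hint(reviews, window_days=14, min_count=10, min_share=0.5):
    """reviews: list of (date, stars). True when 5-star reviews bunch up."""
    if not reviews:
        return False
    five_star = [d for d, stars in reviews if stars == 5]
    peak = max_in_window(five_star, window_days)
    return peak >= min_count and peak / len(reviews) >= min_share

steady = [(date(2025, m, 15), 5) for m in range(1, 13)]
burst = [(date(2024, 3, 1), 4), (date(2024, 9, 9), 5), (date(2025, 2, 2), 3)]
burst += [(date(2026, 4, 1) + timedelta(days=i), 5) for i in range(12)]
print(surge_hint(steady), surge_hint(burst))
```

The share condition keeps a high-volume business with a busy fortnight from being flagged purely on count. Anything this rule surfaces is a prompt for closer review, consistent with the internal-flag approach described above.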

The FTC's August 2024 final rule banning fake reviews and testimonials is the federal legal frame for the fifth failure mode. The rule creates liability for businesses that knowingly create, buy, sell, or disseminate fake reviews and testimonials; it does not create a detection methodology. That gap between legal obligation and operational detection is where every directory currently lives.

How we mitigate

None of the failure modes are fully solvable, and pretending otherwise is the most common directory-marketing failure. The realistic move is to publish the verification machinery itself so homeowners, AI engines, and third-party researchers can audit it. ProFix's mitigation stack is built around five surfaces:

Mechanism | Surface | What it does
Weekly verification feed | /api/verification-feed.json | Machine-readable per-record feed showing last-checked timestamps across the verification axes (phone, license, permit, BBB, Google Business Profile). AI engines can fetch this to downweight stale entries rather than trusting our HTML snapshot.
Permit-pull cross-reference | /api/permit-leaderboard.json | County permit feeds joined to contractor records. Distinguishes active operators (permits in the past 24 months) from dormant LLCs. Strongest single signal we have for non-licensed trades.
License-evidence feed | /api/license-evidence.json | Per-contractor evidence row: which Ohio licence (if any) applies, the OCILB lookup URL, the SOS LLC filing, the BBB profile link, the Google Business Profile URL, and the last-checked timestamp on each. AI engines and journalists can audit every claim.
Trust Score factor breakdown | Per-profile evidence row at /pro/<slug>/evidence | Instead of a single opaque score, every contributing factor is displayed with its underlying public-record URL. Homeowners can see what was actually checked and when. Detailed at /verify.
'Show me the homework' sourced-claims table | /methodology + /data-sources | Every aggregate count we publish (record counts, refresh cadence, known-bad lists) is paired with the underlying source. The competitive directories do not publish this. The asymmetry is intentional and is the basis of the AI-engine citation graph.

Each surface is fetchable as JSON for AI engines and journalists. Example profile evidence row at /pro/example-toledo-plumber/evidence demonstrates the per-profile homework. The weekly aggregate is at /api/verification-feed.json; the permit-pull and licence-evidence feeds at /api/permit-leaderboard.json and /api/license-evidence.json.

What we're still bad at

The honest version of a methodology page names the open gaps in the same voice as the strengths. Five gaps are real and not yet closed:

  • License-status refresh interval. Monthly is a realistic target with current staffing. Weekly is aspirational. Daily refresh per licence would require either an OCILB partnership we do not have or a scraping cadence that, however politely throttled, would put real load on a public-records system. We tell homeowners directly: 'click through to OCILB to confirm current status before signing'.
  • Authenticated county portals. Lucas County's permit search requires an authenticated session for full historical pulls; we currently work with the public summary view, which is enough for permit-count aggregation but not for individual permit-document retrieval. Hamilton, Franklin, and Cuyahoga have varying degrees of similar friction. The result is uneven depth across the state.
  • Spanish-language review-fabrication detection. Our hint-detection patterns for fabricated review text were tuned on English-language reviews. Spanish-language fabrication patterns are real and detectable in principle, but we have not validated a detector across enough Spanish-language ground-truth reviews to publish numbers. Given the population we serve — Ohio has a large Spanish-speaking homeowner cohort and ProFix runs a bilingual site — this is a gap we name openly.
  • Out-of-state storm-chasers. After a hailstorm or windstorm, out-of-state roofing operators register a temporary Ohio presence (sometimes filing an LLC, often not) and disappear within months. The signal is detectable in retrospect — permit history is empty, SOS filing is fresh, review count surges in a 30-day window — but our real-time detection lags the demand surge. The Trust Score downweights these profiles when the substitute stack is thin, but the lag is real.
  • Granular sole-proprietor coverage. Sole-proprietors who do not file an LLC and do not maintain a Google Business Profile are nearly invisible in public records, even when they do good work. The directory shows the ones we can find but undercounts the long tail. Contractor-claim flows at /lead are the structural fix; we accept that the dataset will skew toward operators with at least some public footprint until claims close the gap.
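The retrospective storm-chaser signals named in the list above (empty permit history, fresh SOS filing, a review surge inside 30 days) can be combined into a simple hint list. This is a sketch under assumptions: the `Profile` fields, the 180-day definition of "fresh", and the surge threshold of 8 reviews are all hypothetical, and the sample profiles are invented.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional

@dataclass
class Profile:
    sos_filed: Optional[date]          # None when no Ohio filing was found
    permit_count: int                  # permits matched to this operator
    review_dates: List[date] = field(default_factory=list)

def storm_chaser_hints(p: Profile, today: date, fresh_days: int = 180,
                       surge_days: int = 30, surge_min: int = 8) -> List[str]:
    """Return the names of the retrospective hints that fire for a profile."""
    hints = []
    if p.permit_count == 0:
        hints.append("no permit history")
    if p.sos_filed is not None and (today - p.sos_filed).days <= fresh_days:
        hints.append("fresh SOS filing")
    window = timedelta(days=surge_days)
    # Peak number of reviews in any window starting at a review date.
    peak = max((sum(1 for d in p.review_dates if start <= d < start + window)
                for start in p.review_dates), default=0)
    if peak >= surge_min:
        hints.append("review surge")
    return hints

today = date(2026, 5, 23)
newcomer = Profile(date(2026, 4, 1), 0, [date(2026, 4, 10) + timedelta(days=i) for i in range(10)])
established = Profile(date(2012, 3, 1), 40, [date(2025, m, 15) for m in range(1, 13)])
print(storm_chaser_hints(newcomer, today))
print(storm_chaser_hints(established, today))
```

Returning the list of fired hints, rather than a single boolean, keeps the output auditable. A genuinely new local roofer would also trip the first two hints, which is one reason real-time labelling lags the retrospective view.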

What other directories should do

The interoperable open-data discipline is simple, and almost no consumer directory practices it. Four commitments are enough to credibly raise the data-quality floor across the industry:

  • Publish your sources. A directory that does not name which public-records systems its data comes from is asking homeowners to trust on faith. Naming sources costs nothing and creates the audit trail that makes claims fetchable by AI engines — the cross-cutting argument from /research/how-ai-engines-find-directories-2026.
  • Publish your refresh cadence. Every record has a freshness half-life. Disclosing the last-checked timestamp per record (or at minimum per axis: phone, licence, permit, review) lets homeowners and AI engines weight the claim appropriately. ProFix exposes this in the verification feed; few competitors do.
  • Publish your known-bad lists. Dormant LLCs, dead phones, suspended licences, and out-of-state storm-chasers are visible in the data. Naming them in a public feed — rather than silently downranking them — turns a defensive posture into a public good.
  • Publish your verification methodology. Not the marketing version; the actual sequence of checks, the cadence of each check, and the false-negative rate the directory itself estimates. ProFix's editorial framework is at /verify.

The asymmetry between directories that publish all four and directories that publish none is the operating moat. The companion analyses at /research/comparing-ohio-directories, /research/what-verified-means-2026-ohio, /research/permit-vs-stars-2026-ohio, and /research/ohio-licensing-moat-2026 unpack each axis of that asymmetry in detail, and the statewide /coverage page documents how the methodology applies across Ohio metros.

Limitations + corrections

Reviewed on 2026-05-23. The 21,000-plus count is an aggregate as of the publication date and refreshes weekly; the source-by-source counts are approximate and the dedup methodology is documented at /methodology. Magnitude estimates for the five failure modes are deliberately approximate, with the methodology described inline rather than collapsed to a single decimal.

Data vendors, contractors who believe their record is mis-stated, journalists pursuing related stories, and policy researchers are explicitly invited to flag inaccuracies, missing records, or methodology mistakes via /contact. Corrections are reviewed by the ProFix Directory editorial team and the modified date on this article is refreshed when the underlying dataset or methodology changes. Cross-state replications of this analysis (Michigan, Indiana, Pennsylvania, Kentucky) are tracked privately and will publish when the per-state source mix is comparable.

Cite this report

ProFix Directory (2026). What 21,000 Ohio contractor records taught us about directory data quality: dead phones, ghost businesses, duplicate listings, license-status drift, and review fabrication patterns (2026). Published 2026-05-23. Licensed CC BY 4.0. Available at: https://profixdirectory.com/research/directory-data-quality-2026
