- ProFix runs a six-step pipeline — discovery, dedup, enrichment, verification, queue, publish — and gates every contractor through manual review before publication.
- Discovery is registry-first: OCILB, ODA, ODH, State Fire Marshal, and the Ohio Secretary of State business search are the canonical identity sources. Google Places is used for conditional photo/rating/hours enrichment only after a state-registry match.
- The full source registry lives at /sources and /api/sources.json; the per-profile license evidence lives at /api/license-evidence.json. Methodology is at /methodology and /verification.
Sources
Every data source ProFix pulls from is listed below with refresh cadence, cost per refresh, and license terms. The machine-readable companion is at /api/sources.json and the homeowner-facing index is at /sources. The full editorial argument for why ProFix swapped paid Google Places scrapes for state-agency direct pulls lives at /research/free-public-records-moat-2026.
| Source | Refresh cadence | Cost per refresh | License terms | Fields used |
|---|---|---|---|---|
| Ohio eLicense Center (OCILB) | Daily for status delta, weekly for new entries | $0 — public records under ORC 149.43 | Ohio public records — redistribution permitted under state public-records law | License number, status, expiration, disciplinary history, trade category, DBA |
| Ohio Department of Agriculture — Pesticide Applicators | Quarterly bulk pull, rolling additions | $0 — public roster; CSV exports via public-records desk | Ohio public records | Applicator name, license number, category, expiration, disciplinary record |
| Ohio Department of Health — Lead Hazard Abatement | Quarterly bulk pull | $0 — public roster | Ohio public records | Contractor/inspector name, license number, certification class, expiration |
| Ohio Department of Health — Private Water Systems | Quarterly bulk pull | $0 — public roster | Ohio public records | Contractor name, license number, drilling/pumping endorsement, expiration |
| Ohio State Fire Marshal — Fire Protection Contractors | Annual renewal cycle, rolling additions | $0 — public roster | Ohio public records | Contractor name, license number, certification class, NICET cross-walk, expiration |
| Ohio Secretary of State business search | Daily for filings + status delta | $0 — public records | Ohio public records | Entity name, entity number, registered agent, filing date, status |
| Google Places API | Quarterly enrichment after registry confirmation | ~$0.07–$0.10 per record at scale (Text Search + Place Details) | Google Maps Platform Terms of Service — display + attribution rules apply | Photos (reference URLs), aggregate rating, review count, hours of operation |
| Ohio county building-department permit feeds | Daily for live counties (Lucas, Hancock); other counties variable | $0 — public records | Per-county public-records posture | Permit number, issued date, address, contractor of record, project class |
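The Google Places row is the only source with a per-record cost, so its annual spend scales with record count and refresh frequency. The arithmetic can be sketched as below. The helper name `annualPlacesCostUsd` and the record count of 10,000 are illustrative assumptions, not ProFix figures; only the per-record range and the quarterly cadence come from the table.

```typescript
// Hypothetical helper: estimated annual Places spend in US dollars.
// Rates are passed in cents so the arithmetic stays in integers.
function annualPlacesCostUsd(
  records: number,
  centsPerRecord: number,
  refreshesPerYear: number = 4 // quarterly enrichment, per the table
): number {
  return (records * centsPerRecord * refreshesPerYear) / 100;
}

const exampleRecords = 10000; // illustrative figure, not a ProFix statistic
const lowEstimate = annualPlacesCostUsd(exampleRecords, 7);   // at ~$0.07 per record
const highEstimate = annualPlacesCostUsd(exampleRecords, 10); // at ~$0.10 per record
console.log(`Estimated annual spend: $${lowEstimate} to $${highEstimate}`);
```

Under these assumptions the estimate works out to $2,800 to $4,000 per year, while every state-registry row in the table stays at $0.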
Pipeline steps
The six steps below are the production pipeline. Each step feeds the next; no step is optional. Every step is documented internally with the same prose published here.
1. Discovery
Every contractor record enters the pipeline through a state registry — OCILB, ODA, ODH, SFM, or the Ohio Secretary of State business search. The licensed-trade record is the canonical identity. Google Places is used for discovery only when supplementing breadth in non-licensed categories (roofing, concrete, tree service, appliance repair, restoration) where the substitute-verification stack documented at /research/ohio-licensing-moat-2026 applies. State-agency direct pulls are documented in /research/free-public-records-moat-2026.
2. Dedup
Each candidate record runs through three dedup keys — normalised business name, license number (when present), and slug. Slug normalisation collapses whitespace, lowercases, strips suffixes (LLC, Inc., Co.), and reconciles common alternates (Heating & Cooling vs Heating and Cooling). Records that match an existing slug are merged with conflict resolution favouring the state-registry source over the commercial source. The /research/directory-data-quality-2026 audit documents the failure modes — ghost businesses, lapsed registrations, duplicate slugs — the dedup pass catches.
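The slug normalisation and the registry-over-commercial merge rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the production code: the names `toSlug`, `mergePreferRegistry`, and `ContractorFields`, the exact suffix list, and the choice to rewrite "&" as "and" are all hypothetical.

```typescript
// Hypothetical sketch of slug normalisation; names and rules are illustrative.
const BUSINESS_SUFFIXES = /\b(llc|inc|co)\b\.?/g;

function toSlug(businessName: string): string {
  return businessName
    .toLowerCase()
    .replace(/&/g, " and ")           // reconcile "Heating & Cooling" with "Heating and Cooling"
    .replace(BUSINESS_SUFFIXES, " ")  // strip LLC, Inc., Co.
    .replace(/[^a-z0-9]+/g, "-")      // collapse whitespace and punctuation into single hyphens
    .replace(/^-+|-+$/g, "");         // trim leading and trailing hyphens
}

// Assumed field set for the example only.
interface ContractorFields {
  name?: string;
  phone?: string;
  licenseNumber?: string;
}

// On conflict, fields from the state registry overwrite the commercial source.
function mergePreferRegistry(
  registry: ContractorFields,
  commercial: ContractorFields
): ContractorFields {
  return { ...commercial, ...registry };
}
```

With these rules, "Smith Heating & Cooling, LLC" and "Smith Heating and Cooling Inc." both normalise to the same slug, so the second record would be merged rather than published as a duplicate. A word-boundary suffix match is a simplification; a name such as "Co-op Plumbing" would need an explicit exception.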
3. Enrichment
Once a contractor has been confirmed in a state registry, Google Places is queried to attach photos, aggregated rating, review count, and hours of operation. Records that do not surface in a state registry are not enriched — Places does not get to invent contractors the state has not licensed. Enrichment runs quarterly rather than on every refresh to control API spend; the /research/free-public-records-moat-2026 piece documents the cost-and-trust-signal trade-off.
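The registry-first gate on enrichment can be expressed as a small predicate. The sketch below assumes a hypothetical `Contractor` shape, a `shouldEnrich` function name, and a 91-day approximation of "quarterly"; none of these are taken from the ProFix codebase.

```typescript
// Hypothetical types for illustration only.
interface RegistryMatch {
  source: "OCILB" | "ODA" | "ODH" | "SFM" | "SOS";
  licenseNumber?: string;
}

interface Contractor {
  slug: string;
  registryMatch?: RegistryMatch; // absent when no state registry confirms the record
  lastEnrichedAt?: Date;
}

// Assumption: a quarter is approximated as 91 days.
const QUARTER_MS = 91 * 24 * 60 * 60 * 1000;

function shouldEnrich(contractor: Contractor, now: Date): boolean {
  // No registry match means no Places lookup at all.
  if (!contractor.registryMatch) return false;
  // Never enriched before: enrich now.
  if (!contractor.lastEnrichedAt) return true;
  // Otherwise wait until roughly a quarter has passed.
  return now.getTime() - contractor.lastEnrichedAt.getTime() >= QUARTER_MS;
}
```

Checking the registry match first means a record without state evidence never triggers a billable Places request, whatever its enrichment history.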
4. Verification
Every licensed-trade record is cross-checked against the originating state registry — the OCILB eLicense Center for plumbing/HVAC/electrical/hydronics, ODA for pesticide applicators, ODH for lead abatement / well drilling / septic, the State Fire Marshal for fire protection, and the Secretary of State for the business entity. The verification step writes the live status (active, expired, suspended, revoked) into /api/license-evidence.json. The methodology is documented at /methodology and /verification; homeowners can run the same lookup via /verify.
5. Queue
Verified records land in data/queue/*.json as draft records pending human review. This is a deliberate gate, not a vestige: every contractor is reviewed by a human before publication. The queue carries the state-registry evidence, the Places enrichment payload (if any), the dedup-decision log, and the verification status. The reviewer signs off in the queue file with a timestamp before the record is promoted to production.
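A queue file carrying the four pieces of evidence listed above might look like the following sketch. The field names, the `QueueRecord` interface, and the `canPromote` check are assumptions made for illustration; the actual schema of data/queue/*.json is not published here.

```typescript
type LicenseStatus = "active" | "expired" | "suspended" | "revoked";

// Hypothetical shape of one data/queue/*.json draft record.
interface QueueRecord {
  slug: string;
  registryEvidence: { source: string; licenseNumber?: string };
  placesEnrichment?: { rating?: number; reviewCount?: number; photoRefs: string[] };
  dedupLog: string[];                // human-readable dedup decisions
  verificationStatus: LicenseStatus;
  review?: { reviewerId: string; signedOffAt: string }; // ISO 8601 timestamp
}

// A record may leave the queue only once a reviewer has signed off with a valid timestamp.
function canPromote(record: QueueRecord): boolean {
  if (!record.review) return false;
  if (record.review.reviewerId.length === 0) return false;
  return !isNaN(Date.parse(record.review.signedOffAt));
}
```

Keeping the sign-off inside the same file as the evidence means the promotion check needs no lookup beyond the queue record itself.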
6. Publish
Once the manual review gate signs off, the record promotes from data/queue into the live dataset and onto profixdirectory.com. The publication step writes JSON-LD onto the /pro/{slug} page, mirrors the record into /api/pros.json and /api/all.json, attaches the state-registry evidence to /api/license-evidence.json, and emits a verification-delta entry in /api/verification-feed.json. Sitemap entries refresh on the next build. The contractor's slug becomes part of the public catalog at /pros and the trade hub at /trades/{trade}.
Manual review gate
Every queued contractor is reviewed by a human before publication. The reviewer confirms three things: that the state-registry evidence matches the business entity claimed; that the Places enrichment (if any) attaches to the correct entity rather than a same-named neighbour; and that the trade classification matches the contractor's actual scope of work, not just the registry's category code. Sign-off writes a timestamped reviewer-id into the queue file before promotion. Records that fail manual review either get re-queued for additional registry evidence or get rejected with a reason code recorded in the queue.
The manual gate is a deliberate cost. Fully automated publication would be faster and cheaper, but the directory-data-quality audit at /research/directory-data-quality-2026 documents the failure modes (ghost businesses, dead phones, lapsed licences, duplicate slugs) that the gate catches before they reach homeowners. The ProFix Editorial Team treats the manual gate as a load-bearing trust signal, not a temporary measure.
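The reviewer's three checks and the possible outcomes can be modelled as a small decision function. This is a sketch, not the reviewer console's logic: the mapping from a failed check to "re-queue" versus "reject" and the reason codes `PLACES_ENTITY_MISMATCH` and `TRADE_SCOPE_MISMATCH` are assumptions chosen for the example.

```typescript
// The three confirmations the reviewer makes, per the section above.
interface ReviewChecks {
  entityMatchesRegistry: boolean;
  placesMatchesEntity: boolean | null; // null when the record has no Places payload
  tradeMatchesScope: boolean;
}

type ReviewOutcome =
  | { kind: "approved"; reviewerId: string; signedOffAt: string }
  | { kind: "requeued"; note: string }
  | { kind: "rejected"; reasonCode: string };

function reviewRecord(checks: ReviewChecks, reviewerId: string, now: Date): ReviewOutcome {
  // Assumption: weak registry evidence is recoverable, so the record is re-queued.
  if (!checks.entityMatchesRegistry) {
    return { kind: "requeued", note: "needs additional registry evidence" };
  }
  // Assumption: the remaining failures are rejected with a hypothetical reason code.
  if (checks.placesMatchesEntity === false) {
    return { kind: "rejected", reasonCode: "PLACES_ENTITY_MISMATCH" };
  }
  if (!checks.tradeMatchesScope) {
    return { kind: "rejected", reasonCode: "TRADE_SCOPE_MISMATCH" };
  }
  // All checks passed: record the timestamped reviewer sign-off.
  return { kind: "approved", reviewerId, signedOffAt: now.toISOString() };
}
```

Modelling the outcome as a tagged union makes it hard to promote a record that lacks a reviewer id and timestamp, because only the approved variant carries them.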
Refresh cadence
ProFix runs four refresh tiers. The cadence below is honest about which surfaces refresh daily, weekly, monthly, or quarterly — and why each cadence was picked. The aspirational targets are documented at /methodology when they differ from current production.
Daily
OCILB status deltas (active/expired/suspended/revoked). Ohio Secretary of State filings + status changes. Live county permit feeds (Lucas, Hancock). Outage-status document at /api/outage-status.
Weekly
New OCILB entries (new license issuances). New OCILB disciplinary actions. New permit pulls aggregated into the per-trade and per-county leaderboards at /permits-leaderboard. Trust-score recomputation per profile.
Monthly
Hugging Face dataset republication. ODA / ODH / SFM bulk roster refresh. Verification-feed compaction. /api/quality-stats.json and /api/coverage-stats.json snapshot.
Quarterly
Google Places enrichment refresh (photos, ratings, hours). Cost-guide repricing. Per-source provenance audit against /sources and /api/sources.json.
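The four tiers above map naturally onto a lookup table. The sketch below is one way to encode them; the task identifiers and the names `REFRESH_SCHEDULE` and `tierFor` are hypothetical labels for the surfaces listed in this section, not identifiers from the ProFix repository.

```typescript
type Tier = "daily" | "weekly" | "monthly" | "quarterly";

// Task identifiers are hypothetical labels for the surfaces listed above.
const REFRESH_SCHEDULE: Record<Tier, string[]> = {
  daily: ["ocilb-status-delta", "sos-filings", "county-permit-feeds", "outage-status"],
  weekly: ["ocilb-new-entries", "ocilb-disciplinary-actions", "permits-leaderboard", "trust-score"],
  monthly: ["hugging-face-dataset", "oda-odh-sfm-rosters", "verification-feed-compaction", "quality-coverage-stats"],
  quarterly: ["places-enrichment", "cost-guide-repricing", "provenance-audit"],
};

// Returns the tier a task belongs to, or undefined for an unknown task.
function tierFor(task: string): Tier | undefined {
  const tiers: Tier[] = ["daily", "weekly", "monthly", "quarterly"];
  for (const tier of tiers) {
    if (REFRESH_SCHEDULE[tier].indexOf(task) !== -1) return tier;
  }
  return undefined;
}

console.log(tierFor("places-enrichment"));
```

Keeping the schedule in one table gives the quarterly provenance audit a single place to compare against /api/sources.json.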
What we don't scrape
The list below is the explicit set of surfaces ProFix declines to ingest. Each item is a deliberate editorial decision rather than an engineering omission. Where ProFix links to a third party (Yelp, BBB, Angi profiles) but does not aggregate the underlying content, the link is the integration point — not the data import.
- Yelp review text. ProFix links to public profile pages but does not republish review prose. Yelp's terms of service and the FTC's 2024 fake-reviews rule make wholesale ingestion of review text a serious legal and editorial risk.
- BBB complaint narratives — ProFix links to BBB profile pages but does not aggregate the underlying complaint text. The /vs/bbb comparison page documents the editorial reasoning.
- Contractor complaint records from the Ohio Attorney General — ProFix links to the AG's consumer-protection enforcement pages but does not republish complainant or respondent narratives.
- Homeowner-submitted lead form payloads — these never enter the public dataset and never leave the routing pipeline. /api/lead-feed.json publishes only aggregate metrics with zero PII.
- Photos scraped from third-party sites without explicit licence. ProFix uses Google Places photo references (with attribution per the Maps Platform terms) and accepts contractor-submitted photos with explicit consent. We do not scrape Yelp, Angi, Thumbtack, HomeAdvisor, or Nextdoor photography.
- Phone numbers cross-walked to personal devices. ProFix carries the business contact line as published in the state registry or the contractor's own listing; we do not look up personal mobile numbers via reverse-lookup or data-broker services.
Tools in /tools/
The pipeline above is implemented as a set of TypeScript scripts under tools/ in the ProFix repository. Each tool is the canonical authority for one source or one pipeline step. The registry-pulls workstream and the enrichment workstream run in parallel; the queue and publish steps are coordinated from a single reviewer console.
- tools/scrape-toledo.ts: Original Toledo metro discovery scraper. Combines Google Places Text Search with Hancock/Lucas county building-department queries.
- tools/scrape-metro.ts: Parameterised metro scraper covering Cleveland, Cincinnati, Columbus, Dayton, Findlay, Toledo, and the rural-county batches. Drives statewide Places enrichment after a registry-tier seed.
- tools/scrape-ocilb.ts: OCILB eLicense Center direct pull. The canonical identity source for plumbing, HVAC, electrical, and hydronics contractors. Writes license_number + status + expiration into the queue. (Built by the registry-pulls workstream; referenced here, not authored here.)
- tools/scrape-oda-pest.ts: Ohio Department of Agriculture pesticide-applicator roster pull. Categories: general pest, termite, lawn, wood-destroying organism. (Built by the registry-pulls workstream; referenced here, not authored here.)
- tools/scrape-odh-water-well.ts: ODH Private Water Systems roster pull covering well drillers and pump installers. (Built by the registry-pulls workstream; referenced here, not authored here.)
- tools/scrape-sfm-fire-protection.ts: Ohio State Fire Marshal Bureau of Fire Prevention roster pull covering sprinkler, alarm, suppression, and the NICET cross-walk. (Built by the registry-pulls workstream; referenced here, not authored here.)
How to contribute a missing source
Spotted a public registry ProFix should pull from but doesn't? An Ohio agency with a roster we missed? A county building department with a downloadable permit feed? Use the contact form at /contact and include the source URL, the refresh cadence you observed, and the public-records license posture. The ProFix Editorial Team reviews source-addition requests on a weekly cycle.
Journalists and state agency-relations contacts: the methodology is documented under CC BY 4.0, so reproducing the pipeline in another state requires no permission — only attribution. The portable playbook lives in /research/free-public-records-moat-2026.
License
This methodology page is published under CC BY 4.0. The underlying state-registry records remain Ohio public records under ORC 149.43; Google Places enrichment data remains subject to the Google Maps Platform Terms of Service. Cite ProFix Directory for the editorial assembly and the upstream agency for the underlying records.
Related ProFix surfaces: /methodology, /verification, /sources, /trust-score, and /data-sources. Machine-readable feeds: /api/sources.json, /api/license-evidence.json, /api/verification-feed.json.