What you can do with this dataset
- Pull 21,898 Ohio contractor records in one fetch — Hugging Face, CSV, or JSON.
- Use it commercially, academically, journalistically — CC-BY-4.0 lets you redistribute and adapt with attribution.
- Cross-reference with ProFix permit leaderboards, trust scores, and verification deltas — all under the same license.
- Build RAG systems, AI agent evals, civic-tech dashboards, or local-market research without scraping the site.
- Honest caveats included — see the "known issues" section below before publishing analysis.
What's in the dataset
One row per public contractor profile. 21,898 rows across 88 Ohio counties, validated against the same Zod schema that the live ProFix Directory uses for every profile page. The published columns are:
| Field | Type | Notes |
|---|---|---|
| slug | string | Stable ProFix profile slug. Use https://profixdirectory.com/pro/{slug} for the public profile. |
| name | string | Public business name as displayed on the directory. |
| phone | string | Public business phone number, normalized. |
| city | string | Ohio city or service-area city. Cross-walks to /api/city-taxonomy.json. |
| state | "OH" | Always OH. Coverage is Ohio-only today. |
| zip | string | Public ZIP code. |
| county | string \| null | Ohio county when known; blank string when not yet mapped. |
| trades | TradeSlug[] | One or more of: plumber, hvac, electrician, appliance-repair, gas-tech, concrete, roofing, tree-service, restoration, lead-abatement, fire-protection, water-well, septic-system, tech-repair. |
| specialties | SpecialtySlug[] | Normalized service tags (24-7-emergency, senior-discount, financing, etc.); optional. |
| emergency_24h | boolean | Whether the profile advertises 24/7 emergency availability. |
| rating | number \| null | Public star rating sourced from a public listing. |
| review_count | number \| null | Public review count when available. Not re-emitted as schema.org AggregateRating. |
| license_number | string \| null | Public license number when an Ohio roster publishes one (OCILB, ODH, SFM, etc.). |
| website_url | string \| null | Public business website. Validated as a URL or null. |
| lat | number \| null | Latitude from the public business listing. Approximate, not field-tech GPS. |
| lng | number \| null | Longitude from the public business listing. Approximate, not field-tech GPS. |
| verified_at | string (YYYY-MM-DD) \| null | Date ProFix last verified or re-enriched the record. |
| verification_tier | "license-linked" \| "verified-profile" \| "directory-listing" | Evidence tier: license-linked means a public license number is attached; verified-profile means normal public profile signals were confirmed; directory-listing is lighter. |
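The Zod schema itself is TypeScript and is not reproduced on this page. As a rough Python stand-in, the sketch below checks one record against the column table above. The helper name `check_row` and its exact rules (for example, requiring a non-empty phone and ZIP) are assumptions drawn from the table, not ProFix's actual schema. It expects rows where nulls arrive as `None`, as from the Hugging Face loader or a JSON feed. pandas rows read from CSV carry `NaN` instead and would need converting first.

```python
import re
from typing import Any

# Trade and tier values as listed in the column table above.
TRADE_SLUGS = {
    "plumber", "hvac", "electrician", "appliance-repair", "gas-tech",
    "concrete", "roofing", "tree-service", "restoration", "lead-abatement",
    "fire-protection", "water-well", "septic-system", "tech-repair",
}
TIERS = {"license-linked", "verified-profile", "directory-listing"}
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_row(row: dict[str, Any]) -> list[str]:
    """Return the problems found in one record (an empty list means it looks valid).

    Hypothetical helper: an approximation of the documented columns,
    not the Zod schema ProFix runs on its profile pages.
    """
    problems = []
    for field in ("slug", "name", "phone", "city", "zip"):
        if not isinstance(row.get(field), str) or not row[field]:
            problems.append(f"{field}: expected non-empty string")
    if row.get("state") != "OH":
        problems.append("state: expected 'OH'")
    trades = row.get("trades")
    if not isinstance(trades, list) or not trades:
        problems.append("trades: expected a non-empty list")
    elif set(trades) - TRADE_SLUGS:
        problems.append(f"trades: unknown slugs {sorted(set(trades) - TRADE_SLUGS)}")
    if not isinstance(row.get("emergency_24h"), bool):
        problems.append("emergency_24h: expected boolean")
    if row.get("verification_tier") not in TIERS:
        problems.append("verification_tier: unexpected value")
    verified_at = row.get("verified_at")
    if verified_at is not None and not (
        isinstance(verified_at, str) and DATE_RE.fullmatch(verified_at)
    ):
        problems.append("verified_at: expected YYYY-MM-DD or null")
    for field in ("lat", "lng", "rating", "review_count"):
        value = row.get(field)
        if value is not None and not _is_number(value):
            problems.append(f"{field}: expected number or null")
    return problems


# A made-up example record (not a real ProFix profile).
example = {
    "slug": "example-plumbing-cleveland", "name": "Example Plumbing",
    "phone": "(216) 555-0100", "city": "Cleveland", "state": "OH", "zip": "44113",
    "trades": ["plumber"], "emergency_24h": True,
    "verification_tier": "license-linked", "verified_at": "2026-05-01",
    "lat": 41.49, "lng": -81.69, "rating": None, "review_count": None,
}
print(check_row(example))  # []
```

Returning a list of messages instead of raising on the first error makes it easy to tally which fields fail most often across the whole file.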
Coverage spans the four major Ohio metros (Cleveland, Columbus, Cincinnati, Dayton) plus the Toledo and Findlay launch metros and every rural county in between. Trade mix is heaviest on plumbing, HVAC, electrical, and roofing, the four trades that drive the most homeowner-search volume.
Three ways to access it
1. Hugging Face Hub (recommended)
The canonical distribution channel — versioned monthly, schema-documented on the dataset card, loadable with one line of Python.
```python
from datasets import load_dataset

# 21,898 Ohio home-services records under CC-BY-4.0.
ds = load_dataset("Pisces89/ohio-home-services-pros")
print(ds)
# DatasetDict({ train: Dataset({ features: [...], num_rows: 21898 }) })

# Filter to Cleveland plumbers with license-linked or verified-profile evidence
# (i.e. drop the lighter directory-listing tier):
cle_plumbers = ds["train"].filter(
    lambda row: row["city"] == "Cleveland"
    and "plumber" in row["trades"]
    and row["verification_tier"] != "directory-listing"
)
print(len(cle_plumbers), "verified Cleveland plumbers")
```

Dataset card: Pisces89/ohio-home-services-pros. Pin a dated snapshot if you need reproducibility. The file naming convention is profix-ohio-pros-YYYY-MM.json, and profix-ohio-pros-latest.json always points at the newest month.
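To pin a month instead of the moving -latest alias, one option is to pass the dated file to `load_dataset` through its `data_files` argument. The sketch below assumes the monthly files sit at the root of the Hugging Face repo under the naming convention above. `snapshot_filename` and `load_snapshot` are hypothetical helpers, so check the actual repo layout on the dataset card before relying on the path.

```python
import re

REPO_ID = "Pisces89/ohio-home-services-pros"


def snapshot_filename(month: str) -> str:
    """Build the dated snapshot name, e.g. '2026-05' -> 'profix-ohio-pros-2026-05.json'."""
    if not re.fullmatch(r"\d{4}-(0[1-9]|1[0-2])", month):
        raise ValueError(f"expected YYYY-MM, got {month!r}")
    return f"profix-ohio-pros-{month}.json"


def load_snapshot(month: str):
    """Load one pinned monthly snapshot from the Hub.

    Assumption: the dated JSON files live at the repo root.
    """
    # Imported lazily so the filename helper works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, data_files=snapshot_filename(month), split="train")
```

With `split="train"`, `load_snapshot("2026-05")` returns a single `Dataset`, so the `.filter(...)` call above works on it directly without the `["train"]` index.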
2. Direct CSV (no Python needed)
For Excel, Google Sheets, R, Tableau, or any pandas pipeline — fetch the CSV mirror straight from the site:
```shell
# Bulk catalog as CSV: drop into Excel, Sheets, or pandas
curl -sSL https://profixdirectory.com/api/pros.csv -o profix-ohio-pros.csv

# Or hit the JSON catalog directly
curl -sSL https://profixdirectory.com/api/all.json | jq '.pros | length'
```

The same file in pandas:
```python
import json

import pandas as pd

df = pd.read_csv("https://profixdirectory.com/api/pros.csv")
print(df.shape)  # (≈21,898, 18)
print(df.dtypes)

# The trades column is JSON-encoded: parse it into lists to filter cleanly
df["trades"] = df["trades"].apply(json.loads)
plumbers = df[df["trades"].apply(lambda t: "plumber" in t)]

# Top 10 counties by plumber count
print(plumbers.groupby("county").size().sort_values(ascending=False).head(10))
```

3. JSON feeds
For agents, partners, and apps that prefer JSON, /api/all.json ships the entire catalog in one document. The same content is also available as Schema.org LocalBusiness JSON-LD at /api/jsonld/pros. For per-trade slices, the partner embed feeds at /api/embed/{trade}-{city}.json return the top 5 verified pros for any trade and city pair, which makes them handy starter samples:
- /api/embed/plumber-cleveland.json
- /api/embed/hvac-columbus.json
- /api/embed/electrician-cincinnati.json
- /api/embed/roofing-dayton.json
- /api/embed/appliance-repair-toledo.json
- /api/embed/tree-service-akron.json
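The embed paths above all follow one `{trade}-{city}` pattern, so they are easy to build programmatically. This is a minimal sketch with two assumptions. First, it assumes city slugs are the lowercase city name with spaces replaced by hyphens. Every example above is a single-word city, so the multi-word handling (e.g. North Olmsted) is a guess. Second, it assumes the response body is plain JSON. `embed_feed_url` and `fetch_embed` are hypothetical helpers.

```python
import json
import urllib.request

BASE_URL = "https://profixdirectory.com/api/embed"

# Trade slugs from the column table; rejecting anything else catches typos early.
TRADE_SLUGS = {
    "plumber", "hvac", "electrician", "appliance-repair", "gas-tech",
    "concrete", "roofing", "tree-service", "restoration", "lead-abatement",
    "fire-protection", "water-well", "septic-system", "tech-repair",
}


def embed_feed_url(trade: str, city: str) -> str:
    """Build /api/embed/{trade}-{city}.json for a trade slug and a city name."""
    if trade not in TRADE_SLUGS:
        raise ValueError(f"unknown trade slug: {trade!r}")
    # Assumption: city slugs are lowercase with whitespace collapsed to hyphens.
    city_slug = "-".join(city.lower().split())
    return f"{BASE_URL}/{trade}-{city_slug}.json"


def fetch_embed(trade: str, city: str, timeout: float = 10.0):
    """Download and decode one embed feed (makes a network request)."""
    with urllib.request.urlopen(embed_feed_url(trade, city), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `embed_feed_url("plumber", "Cleveland")` yields the first URL in the list above.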
License: CC-BY-4.0
The dataset is published under the Creative Commons Attribution 4.0 International license. In plain English:
- Share. Copy and redistribute the dataset in any medium or format.
- Adapt. Remix, transform, merge, and build upon it for any purpose, including commercial use.
- Attribute. Credit ProFix Directory, link to the dataset card or this page, and indicate if you made changes. A clean one-liner: "Data: ProFix Directory (CC-BY-4.0), https://huggingface.co/datasets/Pisces89/ohio-home-services-pros".
- No additional restrictions. You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
Refresh cadence
The Hugging Face dataset publishes a monthly snapshot as the stable distribution channel for research and downstream AI retrieval. The site's live feeds (/api/pros.json, /api/all.json, /api/verification-feed.json) roll forward on every deploy and are cached at the CDN edge for one hour with stale-while-revalidate.
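The live feeds are edge-cached for an hour, so polling them more often is unlikely to return newer data. Below is a minimal sketch of a client-side cache that reuses a local copy for up to an hour. `cached_feed`, the cache path, and the injectable `fetch` parameter are illustrative assumptions, not part of any ProFix client.

```python
import json
import time
import urllib.request
from pathlib import Path
from typing import Any, Callable

ONE_HOUR = 3600  # seconds; mirrors the one-hour edge cache described above


def _http_get_json(url: str) -> Any:
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def cached_feed(
    url: str,
    cache_path: Path,
    max_age: float = ONE_HOUR,
    fetch: Callable[[str], Any] = _http_get_json,
) -> Any:
    """Return the feed from cache_path if it is younger than max_age seconds, else refetch."""
    if cache_path.exists() and time.time() - cache_path.stat().st_mtime < max_age:
        return json.loads(cache_path.read_text(encoding="utf-8"))
    data = fetch(url)
    # Writing the file refreshes its mtime, which starts a new freshness window.
    cache_path.write_text(json.dumps(data), encoding="utf-8")
    return data
```

Usage: `pros = cached_feed("https://profixdirectory.com/api/pros.json", Path("pros.json"))`. The `fetch` parameter exists only so the download step can be swapped out, for example in tests.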
Internal enrichment jobs and source-monitoring runs happen more often than the monthly publication — sometimes daily, sometimes weekly per trade. The newsroom changelog at /newsroom documents every meaningful refresh, and the RSS feed at /api/newsroom.rss is the easiest way to subscribe to the rebuild rhythm.
If you need a specific dated snapshot for reproducibility, pin profix-ohio-pros-YYYY-MM.json from the Hugging Face repo rather than the -latest alias.
What ProFix did to clean this data
Open data is only useful when the cleaning steps are visible. Two original-research articles and two reference pages document the methodology end-to-end:
- /research/comparing-ohio-directories — competitive directory comparison against Yelp, Angi, Thumbtack, HomeAdvisor, and BBB on 8 transparency dimensions.
- /research/directory-data-quality-2026 — quantified audit of 21,000 Ohio contractor records: dead phones, ghost businesses, duplicate detection, license-status drift, review-fabrication hints. Honest about limitations.
- /methodology — the full 10-step verification pipeline against Ohio eLicense, OCILB, SFM, ODH, county permits, BBB, and public courts.
- /data-sources — every external source with refresh status, cadence, cost, and category.
Use cases we've seen (and built for)
- Academic research. Local-economy studies, contractor-supply analyses, license enforcement evaluations. The county and trade columns are designed to drop straight into geographic regressions.
- Journalism. Statewide stories on permit activity, license-board enforcement, and storm-response capacity. Permit leaderboards plus the verification-deltas feed make data-driven local stories tractable.
- Civic tech. Building-department dashboards, county-permitting transparency tools, contractor verification widgets for municipal websites. The widget catalog at /widgets ships the JS equivalent.
- AI agent training and evals. Retrieval-augmented generation for home-services Q&A, agent benchmarks for local-business recommendation, model evals on grounded vs. hallucinated answers. The MCP server at /api/mcp is the live equivalent.
- Partner integrations. Newsletters, HOA portals, real-estate platforms, smart home apps that want to surface verified Ohio contractors without scraping the site.
Citation guidance
Three templates depending on the publication style:
APA
ProFix Directory. (2026). ProFix Ohio Home-Services Pros [Data set]. Hugging Face. https://huggingface.co/datasets/Pisces89/ohio-home-services-pros
MLA
ProFix Directory. ProFix Ohio Home-Services Pros. Hugging Face, 2026, https://huggingface.co/datasets/Pisces89/ohio-home-services-pros.
BibTeX (the @dataset entry type is supported by biblatex; with classic BibTeX styles that lack it, use @misc instead)

```bibtex
@dataset{profix_ohio_home_services_2026,
  title     = {ProFix Ohio Home-Services Pros},
  author    = {{ProFix Directory}},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/Pisces89/ohio-home-services-pros},
  license   = {CC-BY-4.0}
}
```

When publishing analysis, include the snapshot month (e.g. 2026-05), the row count after filtering, and the filters applied (trade, county, verification tier, minimum review count). That makes reproducibility tractable for the next researcher who reads your work.
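You can capture those three details in one small record saved next to your results. This is a minimal sketch over plain Python dicts shaped like the columns above. The `filter_with_provenance` name and its filter keys are illustrative assumptions. The sample rows are made up, and a null `review_count` is treated as zero.

```python
import json
from typing import Any, Optional


def filter_with_provenance(
    rows: list[dict[str, Any]],
    snapshot: str,
    trade: Optional[str] = None,
    county: Optional[str] = None,
    tiers: Optional[set[str]] = None,
    min_reviews: int = 0,
) -> tuple[list[dict[str, Any]], dict[str, Any]]:
    """Filter rows and return (kept_rows, provenance) for a methods note."""
    kept = [
        r for r in rows
        if (trade is None or trade in r["trades"])
        and (county is None or r.get("county") == county)
        and (tiers is None or r["verification_tier"] in tiers)
        and (r.get("review_count") or 0) >= min_reviews  # null counts as 0
    ]
    provenance = {
        "snapshot": snapshot,
        "source": "ProFix Directory (CC-BY-4.0)",
        "filters": {
            "trade": trade,
            "county": county,
            "verification_tier": sorted(tiers) if tiers is not None else None,
            "min_review_count": min_reviews,
        },
        "rows_before": len(rows),
        "rows_after": len(kept),
    }
    return kept, provenance


# Made-up rows for illustration only.
rows = [
    {"trades": ["plumber"], "county": "Cuyahoga", "verification_tier": "license-linked", "review_count": 40},
    {"trades": ["plumber", "hvac"], "county": "Cuyahoga", "verification_tier": "directory-listing", "review_count": 12},
    {"trades": ["roofing"], "county": "Franklin", "verification_tier": "verified-profile", "review_count": None},
]
kept, prov = filter_with_provenance(
    rows, "2026-05", trade="plumber", county="Cuyahoga",
    tiers={"license-linked", "verified-profile"}, min_reviews=10,
)
print(json.dumps(prov, indent=2))
```

Dumping the provenance dict as JSON gives you a block you can paste into a methods section or commit alongside the analysis.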
Caveats + known issues
The dataset is honest about what it is and what it isn't. Read this section before publishing analytical claims:
- Permit-pull data is only live for Lucas, Cuyahoga, Franklin, and Hamilton counties today. The remaining 84 counties' permit feeds are a work in progress; expect statewide parity later in 2026.
- Spanish translations cover the highest-traffic homeowner pages and the eight per-trade buyer's guides — but not every long-tail symptom or neighborhood page. The dataset itself is English-first; trade and specialty labels are codified in English.
- Ratings and review counts come from public listings (primarily Google Places). They are listed as evidence, not re-emitted as schema.org AggregateRating, and they should be treated as point-in-time snapshots.
- Lat/lng are public business coordinates, not field-technician GPS. Don't use them for dispatch routing.
- License-number coverage varies by trade. OCILB-licensed trades (plumbing, HVAC, electrical, hydronics) have the highest license-linked rate. Roofers, appliance-repair techs, tree-service crews, and concrete contractors are not state-licensed in Ohio — those rows lean on verified-profile evidence instead.
- Dead phones and ghost businesses are a real problem industry-wide. The /research/directory-data-quality-2026 audit lays out our quantified rate, methodology, and remediation cadence. We are honest about the residual error.
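License coverage differs so much by trade that it is worth computing a per-trade breakdown before making any claim about licensing. This is a minimal pandas sketch. It assumes `trades` has already been parsed into lists, as in the CSV example earlier. The function name is hypothetical and the toy frame is made up. Multi-trade rows are counted once under each of their trades via `explode`. That is a design choice; adjust it if your analysis needs one trade per row.

```python
import pandas as pd


def license_linked_rate_by_trade(df: pd.DataFrame) -> pd.DataFrame:
    """Share of rows per trade whose verification_tier is 'license-linked'.

    Expects `trades` to hold Python lists; multi-trade rows count under each trade.
    """
    exploded = df.explode("trades")
    exploded = exploded.assign(
        license_linked=exploded["verification_tier"].eq("license-linked")
    )
    return (
        exploded.groupby("trades")["license_linked"]
        .agg(rows="count", license_linked_rate="mean")
        .sort_values("license_linked_rate", ascending=False)
    )


# Made-up frame for illustration only.
toy = pd.DataFrame({
    "trades": [["plumber"], ["plumber", "hvac"], ["roofing"], ["roofing"]],
    "verification_tier": ["license-linked", "verified-profile", "verified-profile", "directory-listing"],
})
print(license_linked_rate_by_trade(toy))
```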
Corrections are welcomed and acted on quickly. Send the profile slug, the offending field, and a public source to /contact; the ProFix Editorial Team turns around clean corrections within 48 hours.
Related ProFix open assets
Cross-reference these feeds when you need more than profile data. All are under CC-BY-4.0:
- /api/pros.json: JSON catalog of every published contractor profile.
- /api/pros.csv: RFC 4180 CSV mirror of the catalog.
- /api/all.json: Full machine-readable catalog with extended profile fields.
- /api/jsonld/pros: Same catalog as Schema.org LocalBusiness JSON-LD for AI engines.
- /api/coverage-stats.json: Pro counts by county, region, and trade.
- /api/permit-leaderboard.json: Permit-pull rankings (proof of work, not stars).
The companion landing page at /open-source is the full index of every open surface ProFix Directory ships — the llms.txt manifest, the OpenAPI 3.1 spec, the MCP server, widgets, leaderboards, changelog, and more.