Ohio home-services data — free for anyone to use

21,898 verified Ohio home-services contractor records across all 88 counties. CC-BY-4.0, monthly refresh, schema documented, citation templates included.

TL;DR

What you can do with this dataset

  • Pull 21,898 Ohio contractor records in one fetch — Hugging Face, CSV, or JSON.
  • Use it commercially, academically, journalistically — CC-BY-4.0 lets you redistribute and adapt with attribution.
  • Cross-reference with ProFix permit leaderboards, trust scores, and verification deltas — all under the same license.
  • Train RAG systems, AI agent evals, civic-tech dashboards, or local-market research without scraping the site.
  • Honest caveats included — see the "known issues" section below before publishing analysis.

What's in the dataset

One row per public contractor profile. 21,898 rows across 88 Ohio counties, validated against the same Zod schema that the live ProFix Directory uses for every profile page. The published columns are:

Field (type): notes

  • slug (string): Stable ProFix profile slug. Use https://profixdirectory.com/pro/{slug} for the public profile.
  • name (string): Public business name as displayed on the directory.
  • phone (string): Public business phone number, normalized.
  • city (string): Ohio city or service-area city. Cross-walks to /api/city-taxonomy.json.
  • state ("OH"): Always OH; coverage is Ohio-only today.
  • zip (string): Public ZIP code.
  • county (string | null): Ohio county when known; blank string when not yet mapped.
  • trades (TradeSlug[]): One or more of: plumber, hvac, electrician, appliance-repair, gas-tech, concrete, roofing, tree-service, restoration, lead-abatement, fire-protection, water-well, septic-system, tech-repair.
  • specialties (SpecialtySlug[]): Optional normalized service tags (24-7-emergency, senior-discount, financing, etc.).
  • emergency_24h (boolean): Whether the profile advertises 24/7 emergency availability.
  • rating (number | null): Public star rating sourced from a public listing.
  • review_count (number | null): Public review count when available. Not re-emitted as schema.org AggregateRating.
  • license_number (string | null): Public license number when an Ohio roster publishes one (OCILB, ODH, SFM, etc.).
  • website_url (string | null): Public business website. Validated as a URL or null.
  • lat (number | null): Latitude from the public business listing. Approximate, not field-tech GPS.
  • lng (number | null): Longitude from the public business listing. Approximate, not field-tech GPS.
  • verified_at (string (YYYY-MM-DD) | null): Date ProFix last verified or re-enriched the record.
  • verification_tier ("license-linked" | "verified-profile" | "directory-listing"): Evidence tier. license-linked means a public license number is attached; verified-profile means normal public profile signals were confirmed; directory-listing is lighter.
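As a rough illustration of the field notes above, here is a minimal Python sanity check for a single record. It is not the Zod schema the site uses; the function names and the sample slug are hypothetical, and only the rules spelled out in the field list (state is always OH, at least one known trade, one of three tiers) are checked.

```python
from typing import Any, Dict, List

# Allowed values copied from the field notes for `trades` and `verification_tier`.
TRADES = {
    "plumber", "hvac", "electrician", "appliance-repair", "gas-tech",
    "concrete", "roofing", "tree-service", "restoration", "lead-abatement",
    "fire-protection", "water-well", "septic-system", "tech-repair",
}
TIERS = {"license-linked", "verified-profile", "directory-listing"}


def check_record(row: Dict[str, Any]) -> List[str]:
    """Return the problems found in one record (an empty list means it passed)."""
    problems = []
    if not row.get("slug"):
        problems.append("missing slug")
    if row.get("state") != "OH":
        problems.append("state must be 'OH'")
    trades = row.get("trades") or []
    if not trades:
        problems.append("trades must list at least one trade")
    unknown = [t for t in trades if t not in TRADES]
    if unknown:
        problems.append(f"unknown trades: {unknown}")
    if row.get("verification_tier") not in TIERS:
        problems.append("unknown verification_tier")
    return problems


def profile_url(slug: str) -> str:
    """Public profile URL, following the pattern given for the slug field."""
    return f"https://profixdirectory.com/pro/{slug}"


sample = {
    "slug": "example-plumbing-cleveland",  # hypothetical slug, for illustration only
    "state": "OH",
    "trades": ["plumber"],
    "verification_tier": "verified-profile",
}
print(check_record(sample))
print(profile_url(sample["slug"]))
```

Running a check like this over a fresh download is a cheap way to notice when a new trade or tier value appears before it silently breaks a filter.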

Coverage spans the four major Ohio metros (Cleveland, Columbus, Cincinnati, Dayton) plus the Toledo + Findlay launch metros and every rural county in between. Trade mix is heaviest on plumbing, HVAC, electrical, and roofing — the four trades that drive the most homeowner-search volume.

Three ways to access it

1. Hugging Face Hub (recommended)

The canonical distribution channel — versioned monthly, schema-documented on the dataset card, loadable with one line of Python.

from datasets import load_dataset

# 21,898 verified Ohio home-services records under CC-BY-4.0.
ds = load_dataset("Pisces89/ohio-home-services-pros")
print(ds)
# DatasetDict({ train: Dataset({ num_rows: 21898 }) })

# Filter to Cleveland plumbers above the lightest (directory-listing) tier:
cle_plumbers = ds["train"].filter(
    lambda row: row["city"] == "Cleveland"
    and "plumber" in row["trades"]
    and row["verification_tier"] != "directory-listing"
)
print(len(cle_plumbers), "verified Cleveland plumbers")

Dataset card: Pisces89/ohio-home-services-pros. Pin a dated snapshot if you need reproducibility — the file naming convention is profix-ohio-pros-YYYY-MM.json and profix-ohio-pros-latest.json always points at the newest month.
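The naming convention above can be turned into a small helper. This is a sketch under two assumptions that the page does not confirm: that the snapshot files sit at the root of the dataset repo, and that the huggingface_hub package is installed. The helper names are hypothetical.

```python
import re
from typing import Optional

REPO_ID = "Pisces89/ohio-home-services-pros"


def snapshot_filename(month: Optional[str] = None) -> str:
    """Build the file name for a 'YYYY-MM' snapshot, or the -latest alias if None."""
    if month is None:
        return "profix-ohio-pros-latest.json"
    if not re.fullmatch(r"\d{4}-(0[1-9]|1[0-2])", month):
        raise ValueError(f"expected YYYY-MM, got {month!r}")
    return f"profix-ohio-pros-{month}.json"


def download_snapshot(month: Optional[str] = None) -> str:
    """Download one snapshot file and return its local cache path.

    Assumption: the file lives at the top level of the dataset repo.
    """
    # Imported lazily so the file-name helper works without the package or network.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID,
        filename=snapshot_filename(month),
        repo_type="dataset",
    )


print(snapshot_filename("2026-05"))  # profix-ohio-pros-2026-05.json
print(snapshot_filename())           # profix-ohio-pros-latest.json
```

Calling download_snapshot("2026-05") in an analysis script, rather than relying on the -latest alias, keeps the input fixed when the next monthly refresh lands.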

2. Direct CSV (no Python needed)

For Excel, Google Sheets, R, Tableau, or any pandas pipeline — fetch the CSV mirror straight from the site:

# Bulk catalog as CSV — drop into Excel, Sheets, or pandas
curl -sSL https://profixdirectory.com/api/pros.csv -o profix-ohio-pros.csv

# Or hit the JSON catalog directly
curl -sSL https://profixdirectory.com/api/all.json | jq '.pros | length'

The same file in pandas:

import json

import pandas as pd

df = pd.read_csv("https://profixdirectory.com/api/pros.csv")
print(df.shape)         # (≈21,898, 18)
print(df.dtypes)

# Trades column is JSON-encoded; parse it into lists to filter cleanly
df["trades"] = df["trades"].apply(json.loads)
plumbers = df[df["trades"].apply(lambda t: "plumber" in t)]
print(plumbers.groupby("county").size().sort_values(ascending=False).head(10))

3. JSON feeds

For agents, partners, and apps that prefer JSON: /api/all.json ships the entire catalog in one document. The same content is also available as Schema.org LocalBusiness JSON-LD at /api/jsonld/pros. For per-trade slices, the partner embed feeds at /api/embed/{trade}-{city}.json return the top 5 verified pros for any trade and city pair, which makes them a handy starter sample.
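A small sketch of building an embed-feed URL from the /api/embed/{trade}-{city}.json pattern. The city slug format is an assumption (lowercase, hyphen-separated); the page does not document it, so check /api/city-taxonomy.json for the real values. The function names are hypothetical.

```python
import re

BASE_URL = "https://profixdirectory.com"


def city_slug(city: str) -> str:
    """Lowercase and hyphenate a city name (assumed slug format, not confirmed)."""
    return re.sub(r"[^a-z0-9]+", "-", city.lower()).strip("-")


def embed_feed_url(trade: str, city: str) -> str:
    """Fill in the /api/embed/{trade}-{city}.json pattern for one trade and city."""
    # `trade` is expected to be one of the trade slugs from the field list.
    return f"{BASE_URL}/api/embed/{trade}-{city_slug(city)}.json"


print(embed_feed_url("plumber", "Cleveland"))
print(embed_feed_url("hvac", "North Olmsted"))
```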

License: CC-BY-4.0

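The dataset is published under the Creative Commons Attribution 4.0 International license. In plain English: you may redistribute and adapt the data, including for commercial use, as long as you give attribution.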

Refresh cadence

The Hugging Face dataset publishes a monthly snapshot as the stable distribution channel for research and downstream AI retrieval. The site's live feeds (/api/pros.json, /api/all.json, /api/verification-feed.json) roll forward on every deploy and CDN-cache at the edge for one hour with stale-while-revalidate.
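Since the live feeds are edge-cached for one hour, polling them more often than that is unlikely to return anything new. One way a client might respect that is a small in-memory cache with a matching time-to-live. This is a sketch only: the FeedCache class is hypothetical, and the demo uses a fake fetcher and a hand-driven clock so nothing touches the network.

```python
import time
from typing import Any, Callable, Dict, Tuple

ONE_HOUR = 3600.0  # matches the one-hour edge cache described above


class FeedCache:
    """Keep feed responses in memory so each URL is fetched at most once per TTL."""

    def __init__(
        self,
        fetch: Callable[[str], Any],
        ttl: float = ONE_HOUR,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, url: str) -> Any:
        now = self._clock()
        cached = self._store.get(url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # still fresh: skip the fetch
        value = self._fetch(url)
        self._store[url] = (now, value)
        return value


# Demo: count how many times the fake fetcher is actually called.
calls = []
fake_now = [0.0]


def fake_fetch(url: str) -> dict:
    calls.append(url)
    return {"fetched_at": fake_now[0]}


cache = FeedCache(fake_fetch, clock=lambda: fake_now[0])
cache.get("/api/all.json")   # first call: fetched
fake_now[0] = 1800.0
cache.get("/api/all.json")   # 30 minutes later: served from cache
fake_now[0] = 3600.0
cache.get("/api/all.json")   # TTL elapsed: fetched again
print(len(calls))            # 2
```

In real use, the fetch argument would be whatever HTTP call the application already makes; injecting it keeps the cache testable.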

Internal enrichment jobs and source-monitoring runs happen more often than the monthly publication — sometimes daily, sometimes weekly per trade. The newsroom changelog at /newsroom documents every meaningful refresh, and the RSS feed at /api/newsroom.rss is the easiest way to subscribe to the rebuild rhythm.
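For readers who want to follow the changelog programmatically, the sketch below parses an RSS document with the standard library. It assumes /api/newsroom.rss is ordinary RSS 2.0 with title, link, and pubDate elements, which the page does not spell out; the sample XML is made up for illustration and is not real newsroom content.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_rss_items(xml_text: str) -> List[Dict[str, str]]:
    """Extract title, link, and pubDate from every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text.strip())
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "pubDate": item.findtext("pubDate", default=""),
        })
    return items


# Hypothetical feed body; in practice this would be the text fetched from
# https://profixdirectory.com/api/newsroom.rss
SAMPLE_RSS = """
<rss version="2.0">
  <channel>
    <title>Example changelog</title>
    <item>
      <title>Example refresh entry</title>
      <link>https://example.com/newsroom/example-entry</link>
      <pubDate>Mon, 04 May 2026 12:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>
"""

for entry in parse_rss_items(SAMPLE_RSS):
    print(entry["pubDate"], "|", entry["title"])
```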

If you need a specific dated snapshot for reproducibility, pin profix-ohio-pros-YYYY-MM.json from the Hugging Face repo rather than the -latest alias.

What ProFix did to clean this data

Open data is only useful when the cleaning steps are visible. Two original-research articles document the methodology end-to-end.

Use cases we've seen (and built for)

Citation guidance

Three templates depending on the publication style:

APA

ProFix Directory. (2026). ProFix Ohio Home-Services Pros [Data set]. Hugging Face. https://huggingface.co/datasets/Pisces89/ohio-home-services-pros

MLA

ProFix Directory. ProFix Ohio Home-Services Pros. Hugging Face, 2026, https://huggingface.co/datasets/Pisces89/ohio-home-services-pros.

BibTeX

@dataset{profix_ohio_home_services_2026,
  title  = {ProFix Ohio Home-Services Pros},
  author = {{ProFix Directory}},
  year   = {2026},
  publisher = {Hugging Face},
  url    = {https://huggingface.co/datasets/Pisces89/ohio-home-services-pros},
  license = {CC-BY-4.0}
}

When publishing analysis, include the snapshot month (e.g. 2026-05), the row count after filtering, and the filters applied (trade, county, verification tier, minimum review count). That makes reproducibility tractable for the next researcher who reads your work.
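One way to keep those three details together is to write them out next to the results. The sketch below is illustrative only: the AnalysisProvenance class is hypothetical, and the three rows are made-up stand-ins for real records.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class AnalysisProvenance:
    """The details worth publishing alongside any analysis of the dataset."""
    snapshot_month: str
    rows_after_filtering: int
    filters: Dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


# Made-up rows, just to show the filter and the count being recorded together.
rows = [
    {"county": "Cuyahoga", "trades": ["plumber"], "review_count": 40},
    {"county": "Cuyahoga", "trades": ["hvac"], "review_count": 12},
    {"county": "Franklin", "trades": ["plumber"], "review_count": 7},
]
filters = {"trade": "plumber", "min_review_count": 10}

kept = [
    r for r in rows
    if filters["trade"] in r["trades"]
    and (r["review_count"] or 0) >= filters["min_review_count"]
]

note = AnalysisProvenance(
    snapshot_month="2026-05",
    rows_after_filtering=len(kept),
    filters=filters,
)
print(note.to_json())
```

Deriving the row count from the filtered data, rather than typing it by hand, keeps the published number in step with the filters.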

Caveats + known issues

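The dataset is honest about what it is and what it isn't. Read this section before publishing analytical claims. The field notes already flag several limits: coordinates are approximate rather than field-tech GPS, county can be unmapped, and rating, review_count, and license_number may be null.

Corrections are welcomed and acted on quickly. Send the profile slug, the offending field, and a public source to /contact; the ProFix Editorial Team turns around clean corrections within 48 hours.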

Related ProFix open assets

Cross-reference the ProFix permit leaderboards, trust scores, and verification deltas when you need more than profile data. All are under CC-BY-4.0.

The companion landing page at /open-source is the full index of every open surface ProFix Directory ships — the llms.txt manifest, the OpenAPI 3.1 spec, the MCP server, widgets, leaderboards, changelog, and more.

