Everything below is free, under CC-BY-4.0
- One Hugging Face dataset with 21,898 verified Ohio home-services records — every monthly snapshot under CC-BY-4.0.
- One OpenAPI 3.1 spec, one llms.txt manifest, one MCP server — three discovery surfaces that cover every AI agent on the market.
- 25+ public JSON / CSV / RSS feeds — leads, quality, coverage, permits, trust scores, verification deltas, changelog, research.
- Drop-in widget scripts at /widgets/{trade}-{city}.js so any partner can surface verified pros in one line of HTML.
- Plain HTTPS, no API key, no rate-limit auth, CDN-cached at the edge.
What's open
Every surface in this table is part of the directory's published-and-attributable open footprint. Click any row to fetch it. None require an API key.
| Surface | What it gives you |
|---|---|
| Hugging Face dataset | 21,898 verified Ohio home-services records across all 88 counties. CC-BY-4.0. Monthly snapshots, schema documented on the dataset card. |
| llms.txt manifest | Plain-text content map at the root, per llmstxt.org spec. Tools first, then content. Hand it to any LLM as ground truth. |
| llms-full.txt extended dump | Larger ungrouped manifest for crawlers that want every public URL in one fetch. |
| OpenAPI 3.1 spec | Full OpenAPI document covering every public endpoint. Importable into ChatGPT Custom GPT Actions in one paste. |
| MCP server (Model Context Protocol) | Streamable HTTP MCP endpoint with 16 tools — find_pros, get_pro, triage_symptom, get_emergency_contacts, and more. No auth. |
| IndexNow API key file | Public key file at the site root for Bing, Yandex, and Seznam IndexNow push notifications. Lets search engines re-crawl us within minutes. |
| All pros (JSON) | Complete machine-readable feed of every public contractor profile. |
| Top pros (JSON) | Top-N pros snapshot grouped by trade. Quick start for partners that don't need the full catalog. |
| All pros (CSV) | RFC 4180 CSV mirror of /api/pros.json — drops straight into Excel, Sheets, or pandas. |
| Widget catalog (JSON) | Every embeddable trade-by-city slug with its script URL and data URL. CORS-enabled for partner discovery. |
| Lead-volume feed (JSON) | 30-day rolling lead aggregates by trade, urgency, and county. Zero PII. Refreshes hourly. |
| Lead-volume feed (CSV) | Long-format pivot-friendly CSV mirror of the lead-volume aggregate. |
| Lead-quality stats (JSON) | 90-day lead-quality histogram + median score per trade. Zero PII. |
| Coverage stats (JSON) | Pro counts by county, region, and trade. Updated alongside every directory rebuild. |
| Coverage stats (CSV) | One-row-per-county CSV mirror of /api/coverage-stats.json. |
| Permit-pull leaderboard (JSON) | Contractors ranked by verified building permits pulled in the last 365 days. Covers Lucas, Cuyahoga, Franklin, and Hamilton counties so far. |
| Permit-pull leaderboard (CSV) | CSV companion to the leaderboard JSON. Same query params, same data. |
| Trust scores (JSON) | 0-100 composite Trust Score and tier (elite / solid / starter / minimal) for every pro. |
| Recently-verified pros (JSON) | Rolling 30-day feed of pros whose verifiedAt timestamp is fresh. |
| Verification deltas feed (JSON) | Live license-status changes, new permits, and audit deltas. Hourly refresh. |
| Newsroom changelog (JSON) | Machine-readable companion to /newsroom. Every milestone with type, headline, body, and url. |
| Newsroom RSS feed | RSS 2.0 feed combining changelog entries and published research. Subscribe in any reader. |
| Research articles feed (JSON) | Programmatic listing of every published original-research article with summary, slug, and tags. |
| Buyer's guides feed (JSON) | Programmatic listing of every per-trade buyer's guide (slug, title, summary, license status). |
| Cost report (JSON) | Toledo + Findlay 2026 cost benchmarks — 60 typical-job medians and ranges across 8 trades. |
| Cities (JSON) | Every covered Ohio city with ZIPs, county, metro, and population. |
| City taxonomy (JSON) | Canonical cities plus observed service-area cities and alias mappings. |
| Permit offices (JSON) | Permit-issuing offices with phone, hours, fees, and per-trade guidance. |
| License evidence (JSON) | State-linked vs. published-number-only evidence states for licensable trades. |
| JSON-LD feeds — full Schema.org graphs | Seven typed feeds: pros, cost-guides, faq, organization, local-business-index, faq-trade-{trade}, breadcrumb-coverage. |
| Per-widget embed JSON | Top 5 verified pros for any {trade}-{city} slug. Drop-in feed for blog, newsletter, or HOA widgets. |
| Widget JS embed scripts | One-line third-party widgets at /widgets/{trade}-{city}.js. Auto-styles, no iframe, accessible markup. |
| Permit-pull leaderboards (human pages) | Per-trade and per-county leaderboards with full HTML rendering — citable for journalists and homeowners. |
| Newsroom changelog (human page) | Editorial hub linking every published research article and major product milestone. |
| Buyer's guides hub | Eight per-trade buyer's guides — what to ask, red flags, red-tape, pricing, FAQ. HowTo + FAQPage schema. |
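As a rough sketch of what calling one of the MCP tools listed in the table looks like on the wire: MCP messages are JSON-RPC 2.0, and a tool is invoked with the `tools/call` method. The argument names below (`trade`, `city`) are assumptions for illustration, not the documented input schema of find_pros. A client would normally read the real schema from `tools/list` after the usual `initialize` handshake.

```python
import json


def build_tool_call(tool_name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 request body for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "trade" and "city" are hypothetical argument names; check tools/list
# for the schema the server actually publishes.
request = build_tool_call("find_pros", {"trade": "plumbing", "city": "toledo"})
body = json.dumps(request)  # string to POST to the MCP endpoint
```

In practice an MCP client library handles the handshake and transport, so hand-building the payload is mostly useful for debugging or for seeing how little is involved.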
Why we publish all this
ProFix Directory is built homeowner-first. Homeowners don't read editorial policy pages — they call the first number they trust. So our job is to make every trust signal independently verifiable, ideally without the homeowner ever needing to leave the search engine or the AI assistant they already use. Open data is the cheapest, fastest way to do that.
The directory is also built agent-first. ChatGPT, Claude, Perplexity, Gemini, and a long tail of custom agents are already where homeowners ask "who should I hire?" before any directory link gets clicked. The MCP server, the OpenAPI spec, and the llms.txt manifest exist so those agents can ground on real, dated, sourced contractor data — not on hallucinations or stale stars. Every recommendation an agent makes from ProFix is one it can cite.
Finally — trust through transparency. Anyone (homeowner, contractor, journalist, regulator) can fetch the same JSON we use to render the site. Anyone can rebuild our rankings from /algorithm and the published feeds. We'd rather lose an argument publicly than win it behind a closed API.
License: CC-BY-4.0
Every feed, every dataset, every JSON-LD graph on this site is published under the Creative Commons Attribution 4.0 International license. In plain English:
- You can use it commercially. Build a product on top, sell research, license a data slice — all fine.
- You can modify it. Filter to a single county, merge with your own records, add your own columns, run analytics — all fine.
- You must give credit. Include a one-liner near the surfaced answer such as "Contractor data via ProFix Directory (CC-BY-4.0)" and link back to the source page or dataset card.
- You can't suggest we endorse you. Attribution must not imply the ProFix Editorial Team has reviewed or approved your downstream product.
That's the deal. No additional click-through, no terms-of-use trap, no per-call rate-limit contract. The same license covers the Hugging Face dataset, the JSON / CSV / RSS feeds, and the MCP tool output.
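One way to make the credit requirement hard to forget is to generate the attribution line in code wherever the data is surfaced. The sketch below reuses the one-liner quoted above. The helper names and the example URL are placeholders, and the link target should be whichever source page or dataset card the data actually came from.

```python
from html import escape

ATTRIBUTION_TEXT = "Contractor data via ProFix Directory (CC-BY-4.0)"


def attribution_text(source_url: str) -> str:
    """Plain-text credit line, e.g. for a chatbot answer or a CSV footer."""
    return f"{ATTRIBUTION_TEXT}: {source_url}"


def attribution_html(source_url: str) -> str:
    """HTML credit line linking back to the source page or dataset card."""
    href = escape(source_url, quote=True)  # keep the attribute well-formed
    return f'<a href="{href}">{ATTRIBUTION_TEXT}</a>'


# "https://example.com/open-data" is a placeholder, not the real source URL.
credit = attribution_html("https://example.com/open-data")
```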
How to integrate
Three step-by-step paths depending on what you're building. All three use the same underlying feeds:
- /actions — copy-paste recipes for ChatGPT Custom GPT Actions, Claude MCP, Perplexity, Gemini, and any chatbot that supports a system prompt.
- /clients/javascript — runnable snippets against the REST feeds plus a TypeScript type for the embed response.
- /clients/python: quickstart with requests, httpx, datasets, and pandas. Bulk-catalog pulls, permit-leaderboard pulls, TypedDict hints, retry patterns.
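For a feel of what a bulk-catalog pull with a retry pattern might look like, here is a minimal standard-library sketch. It is not the quickstart published at /clients/python. The base URL is left to the caller, the helper names are illustrative, and nothing is assumed about the JSON shape of /api/pros.json beyond it being valid JSON.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """GET a URL and decode the body as UTF-8 (no API key header needed)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def with_retry(
    fetch: Callable[[], str],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(); on OSError (includes URLError and timeouts) back off and retry."""
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise ValueError("attempts must be at least 1")


def load_pros(base_url: str):
    """Fetch and decode the full catalog feed from <base_url>/api/pros.json."""
    url = base_url.rstrip("/") + "/api/pros.json"
    return json.loads(with_retry(lambda: fetch_text(url)))
```

Passing `sleep` as a parameter keeps the backoff testable without real delays.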
For partner embeds (blogs, newsletters, HOA portals, news sites), the /widgets page lists every available trade-by-city slug with a copy-paste one-line script.
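If a partner site generates embeds programmatically, the script tag can be built from the /widgets/{trade}-{city}.js pattern. The sketch below makes two assumptions: slugs are lowercase and hyphen-separated, and the base URL is supplied by the caller. The slugs and snippet listed on the /widgets page remain the authoritative version.

```python
import re
from html import escape

_NON_SLUG = re.compile(r"[^a-z0-9]+")


def slugify(value: str) -> str:
    """Lowercase and hyphenate a name (assumed slug convention, not confirmed)."""
    return _NON_SLUG.sub("-", value.strip().lower()).strip("-")


def widget_script_tag(base_url: str, trade: str, city: str) -> str:
    """Return a one-line <script> tag for the {trade}-{city} widget."""
    src = f"{base_url.rstrip('/')}/widgets/{slugify(trade)}-{slugify(city)}.js"
    return f'<script src="{escape(src, quote=True)}"></script>'


# "https://example.com" is a placeholder for the directory's real origin.
tag = widget_script_tag("https://example.com", "Plumbing", "Toledo")
```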
What we keep closed (and why)
Open data is a default, not a religion. A small number of surfaces stay private because they would either harm homeowners (PII leakage) or break the marketplace economics that fund the open footprint itself. Specifically:
- Lead-routing logic (which contractor a homeowner submission goes to and in what order)
- Contractor payment flows and Stripe webhook handlers
- Internal lead-quality scoring weights (the per-feature coefficients)
- Database credentials, API keys, and per-environment secrets
- Homeowner contact details before consent (we never publish lead PII)
The one place this can feel inconsistent is the trust-score formula. We publish the structure at /algorithm — every factor, every weight, every worked example, the full 100-point breakdown. What stays closed is the small set of fraud-resistance tweaks (specific outlier penalties, anti-gaming thresholds) that, if published in full, would let bad actors reverse-engineer their way to a fake elite tier. That trade-off is documented on the algorithm page itself.
How to contribute
Two paths today, more coming:
- Dataset corrections. If a profile is stale, duplicated, mis-categorized, or linked to the wrong license, send the profile slug, the offending field, and a public source to /contact. We turn corrections around within 48 hours when the evidence is clean.
- Feature requests + bug reports. Email or use the contact form — every message is read by the ProFix Editorial Team, not a chatbot.
- Public GitHub (coming). Once the per-lead marketplace stabilizes we plan to open-source the rendering layer (Next.js components, JSON-LD generators, widget scripts). The dataset and feeds will remain the canonical source either way.
Frequently asked
- Why are you giving so much away?
- Because the homeowner wins when the data is open. ProFix Directory's job is to be the most trustworthy front door to Ohio home services. We make money from contractors who pay $10–$35 per qualified lead and $99/year to claim a listing. Open data doesn't compete with that — it strengthens it by making every claim independently verifiable.
- Can I use this commercially?
- Yes. CC-BY-4.0 explicitly allows commercial use, including building products on top of the dataset, the feeds, and the MCP server. Attribution is required — credit ProFix Directory and link back to the source page or dataset card.
- Is the rendering code (Next.js app, React components) also open source?
- Not today — the application code currently lives in a private repository while we stabilize the per-lead marketplace. The data and protocols are open; the rendering is not yet. We expect to publish reference components (widget scripts, JSON-LD generators) ahead of any full repo opening.
- Will the dataset stay free?
- Yes. The Hugging Face dataset is the canonical public distribution channel and we have no plans to gate it. If we eventually offer paid tiers, those will be value-add (higher refresh, custom slices, support contracts) and the CC-BY-4.0 monthly snapshot will keep flowing.
- How fresh is everything?
- Most JSON feeds are CDN-cached for one hour with stale-while-revalidate on the edge. The verification-deltas and recently-verified feeds update hourly from the same job. The Hugging Face dataset publishes a monthly snapshot. Profile pages roll forward on every deploy.
- Found a bug or want to contribute?
- Email or open a ticket from /contact — we read everything. Once the application repo is public we'll add a CONTRIBUTING.md and GitHub Issues. In the meantime, dataset corrections are especially welcome: include the profile slug, the field that looks wrong, and the public source we should use to confirm.
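Because the freshness answer above puts most JSON feeds on a one-hour cache, polling more often than that is unlikely to return new data. A consumer can mirror that cadence with a small time-to-live cache, sketched below. The class name and the `loader` callable are illustrative, and the one-hour default is taken from the stated CDN window, not from any response header.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TTLCache(Generic[T]):
    """Hold one loaded value and reload it only after ttl_seconds have passed."""

    def __init__(
        self,
        loader: Callable[[], T],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[T] = None
        self._loaded_at: Optional[float] = None

    def get(self) -> T:
        now = self._clock()
        expired = self._loaded_at is None or now - self._loaded_at >= self._ttl
        if expired:
            self._value = self._loader()  # e.g. a function that fetches a feed
            self._loaded_at = now
        return self._value
```

Wrapping a feed fetch in `TTLCache(loader)` and calling `.get()` wherever the data is needed keeps request volume to roughly one call per hour per feed.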
Companion pages
- /open-data — the dataset-focused deep dive: schema, three access methods, refresh cadence, caveats, citation templates.
- /data — the human-readable open-data index with download cards.
- /methodology — how every record is verified before it ships.
- /algorithm — the full Trust Score formula with worked examples.
- /research/comparing-ohio-directories — what other Ohio directories publish (and don't).