ProFix Directory is open — what we give away free

The dataset, the API spec, the agent server, the feeds, the embeds, the leaderboards — all of it is public, machine-readable, and licensed under CC-BY-4.0. Here is the complete index, the philosophy behind it, and the short list of what we keep closed.

TL;DR

Everything below is free, under CC-BY-4.0

  • One Hugging Face dataset with 21,898 verified Ohio home-services records — every monthly snapshot under CC-BY-4.0.
  • One OpenAPI 3.1 spec, one llms.txt manifest, one MCP server: three discovery surfaces, so an agent that speaks any of those conventions can find the data.
  • 25+ public JSON / CSV / RSS feeds — leads, quality, coverage, permits, trust scores, verification deltas, changelog, research.
  • Drop-in widget scripts at /widgets/{trade}-{city}.js so any partner can surface verified pros in one line of HTML.
  • Plain HTTPS, no API key, no authentication step, CDN-cached at the edge.
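
Because the feeds are plain HTTPS with no API key, a client needs nothing beyond a GET request. The following is a minimal sketch, not the directory's own client: the host `https://profix.example` is a hypothetical placeholder for the real origin, and the helper names are invented for illustration. Only the path `/api/pros.json` comes from this page.

```python
import json
import urllib.request

# Hypothetical placeholder host; substitute the directory's real origin.
BASE_URL = "https://profix.example"


def feed_url(path: str, base_url: str = BASE_URL) -> str:
    """Join the site origin and a feed path such as /api/pros.json."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def fetch_json(path: str, base_url: str = BASE_URL, timeout: float = 10.0):
    """GET a public JSON feed. No API key or auth header is sent."""
    request = urllib.request.Request(
        feed_url(path, base_url),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_json("/api/pros.json")` would return the parsed payload. The sketch makes no assumption about the payload's shape, so inspect it before indexing into it.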

What's open

Every surface in this table is part of the directory's published-and-attributable open footprint. Click any row to fetch it. None require an API key.

Surface | What it gives you
Hugging Face dataset | 21,898 verified Ohio home-services records across all 88 counties. CC-BY-4.0. Monthly snapshots, schema documented on the dataset card.
llms.txt manifest | Plain-text content map at the root, per llmstxt.org spec. Tools first, then content. Hand it to any LLM as ground truth.
llms-full.txt extended dump | Larger ungrouped manifest for crawlers that want every public URL on one fetch.
OpenAPI 3.1 spec | Full OpenAPI document covering every public endpoint. Importable into ChatGPT Custom GPT Actions in one paste.
MCP server (Model Context Protocol) | Streamable HTTP MCP endpoint with 16 tools: find_pros, get_pro, triage_symptom, get_emergency_contacts, and more. No auth.
IndexNow API key file | Public key file at the site root for Bing, Yandex, and Seznam IndexNow push notifications. Lets search engines re-crawl us within minutes.
All pros (JSON) | Complete machine-readable feed of every public contractor profile.
Top pros (JSON) | Top-N pros snapshot grouped by trade. Quick start for partners that don't need the full catalog.
All pros (CSV) | RFC 4180 CSV mirror of /api/pros.json; drops straight into Excel, Sheets, or pandas.
Widget catalog (JSON) | Every embeddable trade-by-city slug with its script URL and data URL. CORS-enabled for partner discovery.
Lead-volume feed (JSON) | 30-day rolling lead aggregates by trade, urgency, and county. Zero PII. Refreshes hourly.
Lead-volume feed (CSV) | Long-format pivot-friendly CSV mirror of the lead-volume aggregate.
Lead-quality stats (JSON) | 90-day lead-quality histogram + median score per trade. Zero PII.
Coverage stats (JSON) | Pro counts by county, region, and trade. Updated alongside every directory rebuild.
Coverage stats (CSV) | One-row-per-county CSV mirror of /api/coverage-stats.json.
Permit-pull leaderboard (JSON) | Contractors ranked by verified building permits pulled in the last 365 days. Lucas, Cuyahoga, Franklin, Hamilton counties so far.
Permit-pull leaderboard (CSV) | CSV companion to the leaderboard JSON. Same query params, same data.
Trust scores (JSON) | 0-100 composite Trust Score and tier (elite / solid / starter / minimal) for every pro.
Recently-verified pros (JSON) | Rolling 30-day feed of pros whose verifiedAt timestamp is fresh.
Verification deltas feed (JSON) | Live license-status changes, new permits, and audit deltas. Hourly refresh.
Newsroom changelog (JSON) | Machine-readable companion to /newsroom. Every milestone with type, headline, body, and url.
Newsroom RSS feed | RSS 2.0 feed combining changelog entries and published research. Subscribe in any reader.
Research articles feed (JSON) | Programmatic listing of every published original-research article with summary, slug, and tags.
Buyer's guides feed (JSON) | Programmatic listing of every per-trade buyer's guide (slug, title, summary, license status).
Cost report (JSON) | Toledo + Findlay 2026 cost benchmarks: 60 typical-job medians and ranges across 8 trades.
Cities (JSON) | Every covered Ohio city with ZIPs, county, metro, and population.
City taxonomy (JSON) | Canonical cities plus observed service-area cities and alias mappings.
Permit offices (JSON) | Permit-issuing offices with phone, hours, fees, and per-trade guidance.
License evidence (JSON) | State-linked vs. published-number-only evidence states for licensable trades.
JSON-LD feeds (full Schema.org graphs) | Seven typed feeds: pros, cost-guides, faq, organization, local-business-index, faq-trade-{trade}, breadcrumb-coverage.
Per-widget embed JSON | Top 5 verified pros for any {trade}-{city} slug. Drop-in feed for blog, newsletter, or HOA widgets.
Widget JS embed scripts | One-line third-party widgets at /widgets/{trade}-{city}.js. Auto-styles, no iframe, accessible markup.
Permit-pull leaderboards (human pages) | Per-trade and per-county leaderboards with full HTML rendering, citable for journalists and homeowners.
Newsroom changelog (human page) | Editorial hub linking every published research article and major product milestone.
Buyer's guides hub | Eight per-trade buyer's guides: what to ask, red flags, red-tape, pricing, FAQ. HowTo + FAQPage schema.
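
The CSV mirrors in the table are described as RFC 4180 files, which Python's standard `csv` module reads directly. The sketch below parses a one-row-per-county file of the kind the coverage-stats mirror is described as. The column names `county` and `pro_count` and the sample rows are invented placeholders, not the published schema, so check the real header row before relying on them.

```python
import csv
import io

# Made-up sample in RFC 4180 style (CRLF line endings).
# Column names and values are placeholders, not real directory data.
SAMPLE_CSV = "county,pro_count\r\nCounty A,3\r\nCounty B,5\r\n"


def rows_from_csv(text: str) -> list[dict[str, str]]:
    """Parse CSV text into one dict per data row, keyed by the header row."""
    # newline="" leaves line endings untouched so the csv module handles them.
    return list(csv.DictReader(io.StringIO(text, newline="")))


def column_total(rows: list[dict[str, str]], column: str) -> int:
    """Sum an integer column across all rows."""
    return sum(int(row[column]) for row in rows)


rows = rows_from_csv(SAMPLE_CSV)
print(len(rows), column_total(rows, "pro_count"))
```

The same `rows_from_csv` helper should work on any of the CSV mirrors once the text has been downloaded, since it takes its keys from whatever header row the file carries.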

Why we publish all this

ProFix Directory is built homeowner-first. Homeowners don't read editorial policy pages — they call the first number they trust. So our job is to make every trust signal independently verifiable, ideally without the homeowner ever needing to leave the search engine or the AI assistant they already use. Open data is the cheapest, fastest way to do that.

The directory is also built agent-first. ChatGPT, Claude, Perplexity, Gemini, and a long tail of custom agents are already where homeowners ask "who should I hire?" before any directory link gets clicked. The MCP server, the OpenAPI spec, and the llms.txt manifest exist so those agents can ground on real, dated, sourced contractor data — not on hallucinations or stale stars. Every recommendation an agent makes from ProFix is one it can cite.
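
For readers wiring up an agent by hand, MCP tool invocations are JSON-RPC 2.0 messages using the `tools/call` method. The sketch below only builds such a message for the `find_pros` tool named on this page. The argument names `trade` and `city` are assumptions for illustration; the authoritative input schema is whatever the server returns from `tools/list`. A real session also begins with an `initialize` exchange, which is omitted here.

```python
import itertools
import json

# JSON-RPC request ids only need to be unique within a session.
_request_ids = itertools.count(1)


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Hypothetical arguments; consult tools/list for the real schema.
payload = tool_call("find_pros", {"trade": "plumbing", "city": "toledo"})
print(json.dumps(payload, indent=2))
```

In practice an MCP client library would handle the transport and handshake; the point of the sketch is only to show how small the per-call envelope is.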

Finally — trust through transparency. Anyone (homeowner, contractor, journalist, regulator) can fetch the same JSON we use to render the site. Anyone can rebuild our rankings from /algorithm and the published feeds. We'd rather lose an argument publicly than win it behind a closed API.

License: CC-BY-4.0

Every feed, every dataset, every JSON-LD graph on this site is published under the Creative Commons Attribution 4.0 International license. In plain English:

That's the deal. No additional click-through, no terms-of-use trap, no per-call rate-limit contract. The same license covers the Hugging Face dataset, the JSON / CSV / RSS feeds, and the MCP tool output.
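
Since attribution is the one obligation, it can help to generate the credit line rather than retype it. This is a sketch of one reasonable wording, not an official attribution format. The function name and the placeholder source URL are assumptions; the license URL is the standard Creative Commons address for CC BY 4.0.

```python
CC_BY_4_URL = "https://creativecommons.org/licenses/by/4.0/"


def attribution(source_url: str, modified: bool = False) -> str:
    """Return a one-line credit naming the source, the license, and any changes."""
    note = " Changes were made." if modified else ""
    return (
        f"Data: ProFix Directory ({source_url}), "
        f"licensed under CC BY 4.0 ({CC_BY_4_URL}).{note}"
    )


# Placeholder URL; link to the actual source page or dataset card you used.
print(attribution("https://profix.example/api/pros.json", modified=True))
```

Setting `modified=True` appends a note that the data was changed, which CC BY 4.0 asks reusers to indicate.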

How to integrate

Three step-by-step paths depending on what you're building. All three use the same underlying feeds:

For partner embeds (blogs, newsletters, HOA portals, news sites), the /widgets page lists every available trade-by-city slug with a copy-paste one-line script.
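
For partners generating embeds in bulk, the script tag can be built from the `/widgets/{trade}-{city}.js` pattern. The sketch below is an illustration under assumptions: the host is a hypothetical placeholder, and the lowercase-and-hyphen slug normalisation is a guess at the convention. The /widgets page remains the authoritative list of valid slugs.

```python
import html
import re

# Hypothetical placeholder host; substitute the directory's real origin.
BASE_URL = "https://profix.example"


def slugify(value: str) -> str:
    """Lowercase and hyphenate a name (assumed convention, not confirmed)."""
    return re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")


def widget_script_tag(trade: str, city: str, base_url: str = BASE_URL) -> str:
    """Return a one-line script tag for the {trade}-{city} widget."""
    src = f"{base_url.rstrip('/')}/widgets/{slugify(trade)}-{slugify(city)}.js"
    return f'<script src="{html.escape(src, quote=True)}"></script>'


print(widget_script_tag("Plumbing", "Toledo"))
```

Validating generated slugs against the widget catalog feed before publishing would catch any trade or city the directory does not cover.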

What we keep closed (and why)

Open data is a default, not a religion. A small number of surfaces stay private because they would either harm homeowners (PII leakage) or break the marketplace economics that fund the open footprint itself. Specifically:

The one place this can feel inconsistent is the trust-score formula. We publish the structure at /algorithm — every factor, every weight, every worked example, the full 100-point breakdown. What stays closed is the small set of fraud-resistance tweaks (specific outlier penalties, anti-gaming thresholds) that, if published in full, would let bad actors reverse-engineer their way to a fake elite tier. That trade-off is documented on the algorithm page itself.

How to contribute

Two paths today, more coming:

Frequently asked

Why are you giving so much away?
Because the homeowner wins when the data is open. ProFix Directory's job is to be the most trustworthy front door to Ohio home services. We make money from contractors who pay $10–$35 per qualified lead and $99/year to claim a listing. Open data doesn't compete with that — it strengthens it by making every claim independently verifiable.
Can I use this commercially?
Yes. CC-BY-4.0 explicitly allows commercial use, including building products on top of the dataset, the feeds, and the MCP server. Attribution is required — credit ProFix Directory and link back to the source page or dataset card.
Is the rendering code (Next.js app, React components) also open source?
Not today — the application code currently lives in a private repository while we stabilize the per-lead marketplace. The data and protocols are open; the rendering is not yet. We expect to publish reference components (widget scripts, JSON-LD generators) ahead of any full repo opening.
Will the dataset stay free?
Yes. The Hugging Face dataset is the canonical public distribution channel and we have no plans to gate it. If we eventually offer paid tiers, those will be value-add (higher refresh, custom slices, support contracts) and the CC-BY-4.0 monthly snapshot will keep flowing.
How fresh is everything?
Most JSON feeds are CDN-cached for one hour with stale-while-revalidate on the edge. The verification deltas feed and the recently-verified feed update hourly from the same job. The Hugging Face dataset publishes a monthly snapshot. Profile pages roll forward on every deploy.
Found a bug or want to contribute?
Email or open a ticket from /contact — we read everything. Once the application repo is public we'll add a CONTRIBUTING.md and GitHub Issues. In the meantime, dataset corrections are especially welcome: include the profile slug, the field that looks wrong, and the public source we should use to confirm.
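
The freshness answer above implies that polling a feed more than once an hour gains little, since the edge cache holds responses for that long. A client can mirror that with a small time-to-live cache. This is a generic sketch, not part of the directory's tooling; the class name and the injected `loader` and `clock` parameters are illustrative choices.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Cache loader results per key and reload them after ttl_seconds."""

    def __init__(
        self,
        loader: Callable[[str], Any],
        ttl_seconds: float = 3600.0,  # matches the one-hour CDN window
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        """Return the cached value, calling the loader only when it is stale."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = self._loader(key)
        self._entries[key] = (now, value)
        return value
```

Passing a function that downloads a feed as `loader` gives at most one request per feed path per hour. The injectable clock exists so the expiry logic can be tested without waiting.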

Companion pages

Ask your AI about this

Hand the question to your preferred assistant — it will use ProFix Directory's open MCP server and llms.txt as context.
