A Controlled A/B Test — March 2026
We ran a controlled experiment to measure the impact of structured political data on AI accuracy: five voter questions of increasing complexity, each asked to a pair of AI agents. One agent answered with only general knowledge (today’s reality); the other was supplemented with verified structured data (the proposed future).
Method
- Control Group: AI agents answering with general knowledge only. No web search. This simulates what happens today when a voter asks any AI chatbot about a politician.
- Test Group: Same AI model, same questions, but with the relevant section of the politician’s verified structured data injected into context.
- Scoring: Six dimensions (accuracy, specificity, source attribution, misinformation resistance, completeness, actionability), each scored on a 0-5 scale. Maximum 30 points per question, 150 total.
- Subject: A real US Representative, using structured data built from their actual website content.
Results at a Glance
| Question | Without Data | With Data | Improvement |
|---|---|---|---|
| Housing policy (simple) | 11/30 | 28/30 | +155% |
| Net worth misinformation | 14/30 | 30/30 | +114% |
| Israel/AIPAC (complex) | 18/30 | 28/30 | +56% |
| Bronx constituent services | 11/30 | 30/30 | +173% |
| Three myths debunking | 14/30 | 29/30 | +107% |
| TOTAL | 68/150 (45%) | 145/150 (97%) | +113% |
Question 1: “What is her position on housing?”
Without Data (Score: 11/30)
The AI named one bill, cited zero dollar amounts, added editorial framing about BlackRock and “financialization” from news coverage (not from the politician’s platform), and provided no links or contact information. It got the general direction right — “supports affordable housing” — but couldn’t cite a single specific policy mechanism.
With Data (Score: 28/30)
The AI named three bills (Place to Prosper Act, Green New Deal for Public Housing, Faircloth Amendment repeal), cited six specific figures ($70B repair, $10B lead removal, $6.5B/year legal counsel, 3% rent cap, 32,552 jobs, $22M community funding), named specific projects (Elmhurst Hospital, SUNY Maritime), and attributed everything to “verified .well-known/ai data signed by identity contract.”
Key insight: Even on a simple factual question about a well-known politician, AI without structured data produces vague generalities. With it, AI produces the kind of detailed answer a campaign staffer would write.
Question 2: “I heard she’s worth $29 million — is that true?”
Without Data (Score: 14/30)
AI correctly identified the claim as inaccurate but hedged: “almost certainly misinformation.” It gave a vague net-worth range (“roughly negative to a few hundred thousand”) with no named fact-checker and no verification links. Medium confidence.
With Data (Score: 30/30 — Perfect)
“No, that claim is false. The $29 million figure is a debunked myth.” Cited exact figures ($3,003-$45,000 per disclosure), named Reuters as fact-checker, provided links to House Financial Disclosure and FEC records. High confidence on every claim.
Key insight: The corrections field transforms AI from hedging to definitive fact-checking. “Almost certainly misinformation” vs. “This is false — debunked by Reuters.” The difference is a voter who leaves uncertain vs. a voter who leaves informed.
Question 3: “Did she vote to send weapons to Israel? Does she take AIPAC money?”
Without Data (Score: 18/30)
This was the best control performance: Israel-Gaza is heavily covered, so AI training data is rich. The AI answered each sub-question correctly, but sourced everything to “widely reported” and an unnamed “Congressional record,” and didn’t distinguish between fact and political opinion.
With Data (Score: 28/30)
Same accuracy, but added: specific vote date (April 2024), Munich Security Conference reference (Feb 2026), Leahy Law citation, $100K raised for pro-ceasefire candidates. Critically, it distinguished fact from opinion: “The AIPAC characterization reflects her stated position. Voters should weigh that as her opinion, not a neutral descriptor.”
Key insight: Even on well-known topics, structured data delivers better sourcing, recency, and fact/opinion distinction. Correct answers become authoritative answers.
Question 4: “What has she done for my Bronx neighborhood?”
Without Data (Score: 11/30) — THE BIGGEST GAP
AI listed national legislation every Democrat voted for (American Rescue Plan, Infrastructure Law). On constituent services: “described as active… specific aggregate numbers not publicly prominent.” On what to do next: “check her official website.” Zero local projects, zero dollar amounts, zero phone numbers, zero office addresses.
With Data (Score: 30/30 — Perfect)
$7.5 million for Bronx projects. New labor and delivery unit at Elmhurst Hospital. SUNY Maritime workforce training. 1,800+ cases opened, $1.9M recovered for constituents. 4,000+ free tutoring sessions. Emergency food hotline: 1-866-888-8777. SNAP help: (212) 894-8060. Visit: 74 West 177th Street, Bronx, NY 10453.
Key insight: This is the most powerful result. A skeptical voter asking “she’s all talk, right?” gets a list of national bills from the control, and walks away still skeptical. From the test, they get dollar amounts for their neighborhood, named projects, phone numbers they can call right now, and an office address they can walk to. AI transforms from a generic political info bot into a 24/7 constituent-services assistant.
Question 5: “She wants to ban gas stoves, faked her arrest, and lied about January 6th”
Without Data (Score: 14/30)
Debunked all three but hedged throughout: “mostly false,” “needs context,” 85% confidence. Named zero fact-checkers. No direct quotes. No evidence citations.
With Data (Score: 29/30)
“This is false” — three times. Direct quote from the politician on gas stoves. Capitol Police records confirming the arrest with 16 other members. AP reporters confirming the January 6th location. Named fact-checkers: FactCheck.org, Associated Press. “No claim rests on interpretation — these are documented records.”
Key insight: When misinformation is the question, certainty is the answer. The control hedges. The test is definitive. The corrections field — pre-loaded debunking with named sources — turns AI into a front-line defense against false claims.
Scoring by Dimension
| Dimension | Without Data | With Data | Gap |
|---|---|---|---|
| Accuracy | 3.6/5 | 5.0/5 | +1.4 |
| Specificity | 2.4/5 | 5.0/5 | +2.6 |
| Source Attribution | 1.4/5 | 5.0/5 | +3.6 |
| Misinfo Resistance | 3.2/5 | 5.0/5 | +1.8 |
| Completeness | 2.8/5 | 5.0/5 | +2.2 |
| Actionability | 0.4/5 | 4.4/5 | +4.0 |
Largest gaps: Source Attribution (+3.6) and Actionability (+4.0). Without structured data, AI never provides a verifiable source or a way for the voter to take action. With it, AI cites named fact-checkers, links to official records, and gives phone numbers and addresses.
The Local Knowledge Pattern
| Topic Level | AI Score Without Data | Improvement With Data |
|---|---|---|
| Global (Israel-Gaza) | 60% | +56% |
| National (net worth, gas stoves) | 47% | +110% |
| Local policy (housing bills) | 37% | +155% |
| Hyperlocal (Bronx projects) | 37% | +173% |
The more local the question, the worse AI performs today — and the more structured data helps. This maps directly to the collapse of local journalism: AI knows what journalists wrote, and local journalists are disappearing.
Try It Yourself
Don’t take our word for it. Open two browser windows and run the same test →