
Lived Experience & Trust Test

Evaluates whether the people who will be affected were genuinely involved in design and whether the system builds rather than erodes trust with marginalized communities.

A technically strong project will fail if people don't trust it. Half of US adults are more concerned than excited about AI, and trust patterns vary significantly by age, geography, and lived experience. Legitimacy doesn't come from benchmark scores; it comes from users feeling seen, respected, and empowered. Technical teams often miss real workflows, cultural contexts, and accessibility needs when they design without genuine community involvement. Lived expertise isn't an add-on; it's core to quality, adoption, and sustainable impact.

What Good Looks Like

Documented and compensated co-design with communities and frontline staff to set goals, data-use expectations, and interface choices
Accessible transparency: plain-language disclosures in users' actual languages explaining what the AI does and doesn't do and how data is used
Real-context testing: usability checks with actual users on their actual devices before scaling
User feedback noted and mapped to how it's been addressed
Outcomes tracked by relevant subgroups (language, disability, age, etc.) to catch disparities
Human review and appeals process for all high-stakes decisions
Trust metrics tracked: adoption rates, satisfaction scores, complaint resolution by subgroup (see the sketch after this list)
Clear path for users to reach a human when AI doesn't work for them
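
To make the subgroup tracking above concrete, here is a minimal sketch in Python (pandas is assumed; the column names, subgroup labels, sample values, and gap thresholds are illustrative placeholders, not part of this framework) showing how adoption, satisfaction, and complaint-resolution metrics could be disaggregated so that a gap for one group stays visible instead of being averaged away.

# Minimal illustrative sketch: disaggregating trust metrics by subgroup.
# The DataFrame below stands in for real survey/usage data; all values are made up.
import pandas as pd

records = pd.DataFrame({
    "subgroup": ["English", "English", "Spanish", "Spanish",
                 "Screen-reader users", "Screen-reader users"],
    "adopted": [True, True, True, False, False, True],        # kept using the tool?
    "satisfaction": [4.5, 4.0, 3.0, 2.5, 2.0, 3.5],           # 1-5 survey score
    "complaint_resolved": [1.0, float("nan"), 0.0, 1.0,       # NaN = no complaint filed
                           0.0, float("nan")],
})

# Aggregate each trust metric per subgroup so disparities are not hidden in the average.
trust_by_subgroup = records.groupby("subgroup").agg(
    adoption_rate=("adopted", "mean"),
    mean_satisfaction=("satisfaction", "mean"),
    complaint_resolution_rate=("complaint_resolved", "mean"),  # NaNs are skipped
)

# Flag subgroups falling well below the overall figures (thresholds are illustrative).
overall_adoption = records["adopted"].mean()
overall_satisfaction = records["satisfaction"].mean()
gaps = trust_by_subgroup[
    (trust_by_subgroup["adoption_rate"] < overall_adoption - 0.15)
    | (trust_by_subgroup["mean_satisfaction"] < overall_satisfaction - 0.5)
]

print(trust_by_subgroup.round(2))
print("Subgroups needing follow-up:", list(gaps.index))

In practice the same disaggregation would run on real survey and usage data, with the subgroup categories chosen together with the affected community rather than by the technical team alone.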

What to Watch Out For

Community involvement is tokenistic or only at the end ("user testing")
No compensation for community members' time and expertise
Transparency materials are generic or in language users don't speak
Usability testing happened with developers or staff, not actual end users
Not tracking trust or satisfaction, just assuming people will adopt it
No way for users to reach a human when the AI fails
Co-design input gathered but not acted upon (consultation washing)

Tests To Apply

□ Did community members and frontline staff co-design goals, interface, and data usage (not just react to finished product)?
□ Were participants compensated for their time and expertise?
□ Are AI disclosures in plain language, in users' actual languages, explaining what it does and doesn't do?
□ Was the system tested with real users on their actual devices in their real contexts?
□ Are outcomes tracked by relevant subgroups (language, disability, age, etc.) to catch disparities?
□ Is there a human review and appeals process for all high-stakes decisions?
□ Are trust metrics tracked: adoption rates, satisfaction scores, complaint resolution by subgroup?
□ Can users easily reach a human when AI doesn't work for them?

Key Questions to Ask

  • Who from the affected community was involved in designing this, when, and how were they compensated?

  • How do you explain what the AI does to a user with limited English proficiency or digital literacy?

  • Did you test this with actual users in their real environment, not just your staff?

  • If outcomes are worse for one subgroup, what will you do?

  • How do users reach a human when the AI doesn't work for them?

Apply the Cross-Cutting Lenses

After evaluating the core criteria above, apply these two additional lenses to assess equity outcomes and evidence quality.

Equity & Safety Check

When evaluating Lived Experience & Trust through the equity and safety lens, assess whether participation is genuinely inclusive and whether trust-building protects the most vulnerable.

Gate Assessment:

🟢 CONTINUE: Compensated co-design with diverse participants, ongoing trust maintained across groups

🟡 ADJUST: Some genuine participation but gaps in inclusivity, strengthening in progress

🔴 STOP: Tokenistic participation, or trust-building excludes most vulnerable populations

Check for:

□ Were participants compensated equitably (considering time, expertise, and economic circumstances)?
□ Did co-design include people most likely to be harmed (not just easiest to recruit)?
□ Are there safeguards preventing "consultation washing" (gathering input but not acting on it)?
□ Could trust-building processes themselves cause harm (e.g., extractive research, re-traumatization)?
□ Is there a named person from affected communities who can halt the project if trust erodes?
□ Are there rollback triggers tied to trust metrics by subgroup (if satisfaction drops for one group)? See the sketch after this checklist.
□ Do feedback mechanisms work for people with limited literacy, language barriers, or disabilities?
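
A rollback trigger tied to trust metrics can be as lightweight as a threshold check run on each reporting period's disaggregated satisfaction scores. The sketch below is a hypothetical illustration (the subgroup labels, baseline scores, absolute floor, and drop threshold are all assumptions, not values prescribed by this framework): it flags any subgroup whose satisfaction falls below a floor or drops sharply from its own baseline.

# Hypothetical rollback trigger tied to subgroup trust metrics.
# Subgroup labels, baselines, and thresholds are illustrative assumptions.
ABSOLUTE_FLOOR = 3.0   # minimum acceptable mean satisfaction on a 1-5 scale
MAX_DROP = 0.5         # maximum tolerated drop from a subgroup's own baseline

baseline = {"English": 4.2, "Spanish": 3.9, "Screen-reader users": 3.8}
current = {"English": 4.1, "Spanish": 3.2, "Screen-reader users": 3.7}

def rollback_review_needed(baseline, current):
    """Return the subgroups whose satisfaction breaches the floor or drops sharply."""
    flagged = []
    for subgroup, score in current.items():
        if score < ABSOLUTE_FLOOR or baseline[subgroup] - score > MAX_DROP:
            flagged.append(subgroup)
    return flagged

flagged = rollback_review_needed(baseline, current)
if flagged:
    print("Rollback review triggered for:", ", ".join(flagged))
    # Escalation would go to the named community contact empowered to halt the project.
else:
    print("No subgroup below threshold this period.")

The point is not the specific numbers but that the trigger is automatic, disaggregated by subgroup, and routed to a named person with the authority to act.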

Evidence & Uncertainty Check

When evaluating Lived Experience & Trust through the evidence and uncertainty lens, assess whether trust claims are measured and whether participation effectiveness is validated.

Quality Grade:

🅰️ A (Strong): Trust quantified by subgroup, evidence of responsive co-design, feedback mechanisms tested and working

🅱️ B (Moderate): Some trust measurement, evidence of participation, plan to validate effectiveness

🅲 C (Weak): Trust assumed rather than measured, participation not validated, feedback mechanisms untested; a legitimacy risk

Check for:

□ Is trust measured quantitatively by subgroup (not just assumed from participation)?
□ Are adoption/satisfaction rates tracked with disaggregation (to catch trust gaps)?
□ Is there evidence that community input actually changed the design (not just documented)?
□ Do they acknowledge uncertainty about whether they've reached representative voices?
□ Are feedback mechanisms tested for accessibility and responsiveness (not just designed)?
□ Is there comparison data (trust in this AI vs. trust in alternatives or status quo)?
□ Do they track complaint resolution rates and user-reported issues by subgroup?
