
PRACTICAL Framework

A decision architecture for equity-centered AI evaluation

How It Works

9 Core Tests
2 Cross-cutting Lenses
1 Recommendation

The Nine Tests

Privacy and Security

Personal data powers AI's ability to personalize at scale, but it also creates serious risks. For communities already over-surveilled, a leak or misuse can compromise safety and trust. AI introduces unique vulnerabilities that traditional software doesn't have.

Privacy and security aren't add-ons. Without them, even well-intentioned AI can widen inequities instead of closing them.

Green Flags
Clear opt-in consent with easy opt-outs
Only collect data that's needed
Encryption, access controls, and audit logs

Red Flags
Vague "we may use your data for AI" language
No plan for handling breaches or notifying users
Third-party tool use without proper data agreements

Relevance & Additionality

We fund AI when it’s the best tool for a real bottleneck—throughput, wait times, accuracy, or reach—not because it’s fashionable. A needs assessment compares AI to simpler options and shows the work wouldn’t happen (or wouldn’t work) otherwise. We ask for a 60–90‑day pilot plan tied to specific benefits and equity outcomes. Evidence: gap statement, alternatives considered, early proof plan, ROI + equity metrics.


Attribution & AI-Specific Metrics

Belief isn’t proof. We require a causal plan (A/B, shadow, or DiD), baselines, and owners/cadence. Track three lenses together—efficiency (time/capacity), safety (errors/incident severity), and equity (subgroup outcomes)—so averages don’t hide harm. Show confidence intervals and pre‑declared thresholds for go/adjust/stop. Evidence: 1‑page eval plan, baseline vs pilot chart with CI, subgroup table, rollback triggers.

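The subgroup table and confidence intervals described above can be sketched in a few lines. The subgroups, counts, and the normal-approximation interval below are illustrative assumptions; a real evaluation plan would pre-register its own method (e.g., Wilson intervals) and data.

```python
import math

def outcome_rate_ci(successes, n, z=1.96):
    """Outcome rate with a normal-approximation 95% confidence interval."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half), min(1.0, p + half)

# Hypothetical pilot results by subgroup: (positive outcomes, participants).
subgroups = {"subgroup_a": (180, 300), "subgroup_b": (40, 100)}
for name, (ok, n) in subgroups.items():
    p, lo, hi = outcome_rate_ci(ok, n)
    # Reporting per-subgroup intervals keeps averages from hiding harm.
    print(f"{name}: {p:.2f} [{lo:.2f}, {hi:.2f}]")
```

Wide intervals for small subgroups are themselves a finding: they signal the pilot cannot yet rule out subgroup-level harm.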

Cost Realism

AI isn’t “free once built.” We expect line‑item TCO—build, run, govern—plus usage‑based forecasts (volume × tokens × price) and controls to prevent runaway spend. Include scale math (10× volume), caching/cheaper‑model strategies, and a named cost owner. If vendor pricing shifts or usage spikes, the plan should still hold. Evidence: TCO sheet, spend controls, scale scenario, fallback model choices.
Report‑only: Environmental footprint (ops): estimated CO₂e range + reduction plan.

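The usage-based forecast (volume × tokens × price) and the 10× scale scenario reduce to simple arithmetic. The request volumes, token counts, per-token price, and cache hit rate below are illustrative assumptions, not real vendor pricing; real numbers come from the grantee's TCO sheet.

```python
def monthly_cost(requests, tokens_per_request, price_per_1k_tokens):
    """Usage-based forecast: volume x tokens x price."""
    return requests * (tokens_per_request / 1000) * price_per_1k_tokens

# Illustrative assumptions only: 20k requests/month at 1,500 tokens each,
# priced at $0.002 per 1k tokens.
pilot = monthly_cost(20_000, 1_500, 0.002)
at_10x = monthly_cost(200_000, 1_500, 0.002)   # 10x volume scenario
with_cache = at_10x * (1 - 0.40)               # assume 40% of calls served from cache
print(f"pilot ${pilot:,.0f}/mo, 10x ${at_10x:,.0f}/mo, 10x+cache ${with_cache:,.0f}/mo")
```

Running the same three lines with a cheaper fallback model's price is one way to make the "fallback model choices" evidence concrete.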

Timeline Clarity

Pilots drift without calendar discipline. We ask for a 30/60/90 plan with go/adjust/stop checkpoints, pre‑registered success criteria (efficiency, safety, equity), and scoped protections (e.g., assistive use before automation). Timebox the unglamorous work—data access, policy reviews, staff training—so you can ship safely and decide with evidence. Evidence: dated milestones, success thresholds, mid‑point review, decision memo template.

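The pre-registered go/adjust/stop logic can be made explicit before the pilot starts. The metric names and threshold values below are hypothetical placeholders for whatever a given pilot plan pre-registers across efficiency, safety, and equity.

```python
# Hypothetical pre-registered thresholds for a 30/60/90 checkpoint.
CRITERIA = {
    "min_efficiency_gain": 0.15,  # e.g., 15% reduction in processing time
    "max_incident_rate": 0.01,    # safety ceiling
    "min_equity_ratio": 0.90,     # worst subgroup outcome / best subgroup outcome
}

def checkpoint(efficiency_gain, incident_rate, equity_ratio):
    """Decide go / adjust / stop from pre-registered criteria."""
    if (incident_rate > CRITERIA["max_incident_rate"]
            or equity_ratio < CRITERIA["min_equity_ratio"]):
        return "stop"    # safety or equity breach: halt and investigate
    if efficiency_gain < CRITERIA["min_efficiency_gain"]:
        return "adjust"  # no harm detected, but benefits not yet demonstrated
    return "go"

print(checkpoint(0.20, 0.005, 0.95))  # meets all thresholds
```

Writing the criteria down as data, not prose, makes the mid-point review and decision memo a matter of filling in measurements rather than renegotiating thresholds.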

Implementability (Feasibility)

Great ideas fail on people, data, and workflow fit. We assess skills and capacity, data quality/rights, integration into real processes, and accessibility (devices, languages, assistive tech). If there are gaps, the plan names partners and sequencing. Usability checks with real users are a must. Evidence: capability map, data readiness notes, integration diagram, training plan, accessibility checklist, usability findings.


Change Resilience

Models drift; vendors change. We expect version pinning, a change log, shadow/A‑B for new features, watch‑metrics with alert thresholds, and safe‑degrade modes (tighten human‑in‑the‑loop or fall back to human‑only) if metrics slip. Document how you’ll adapt to policy/vendor shifts and keep users informed. Evidence: drift dashboard, deployment plan, fallback matrix, change log.

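The watch-metrics and safe-degrade modes described above might look like the following sketch. The choice of accuracy as the watch-metric and both threshold values are assumed for illustration; a real deployment would pre-register its own metrics and floors.

```python
# Assumed alert thresholds; a real plan would pre-register its own values.
ALERT_FLOOR = 0.90  # below this, tighten human-in-the-loop review
HARD_FLOOR = 0.80   # below this, fall back to human-only operation

def degrade_mode(watch_metric: float) -> str:
    """Map a slipping watch-metric to a safe-degrade mode."""
    if watch_metric < HARD_FLOOR:
        return "human-only"          # automation disabled entirely
    if watch_metric < ALERT_FLOOR:
        return "human-in-the-loop"   # every output reviewed before release
    return "normal"

for m in (0.93, 0.87, 0.75):
    print(m, "->", degrade_mode(m))
```

The point of encoding the fallback matrix this way is that degradation is automatic and auditable: a model or vendor change that moves the metric triggers the safer mode without waiting for a meeting.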

Accountability & Oversight

Humans own outcomes. We ask for named owners (product/ops, metrics, incident), a kill switch, rollback triggers, and a simple incident process (detect → notify → fix → learn). Public‑facing transparency about AI use builds trust; internal governance materials speed approvals. Evidence: owner list, policy snippet, incident SOP, transparency text, oversight cadence.


Lived Expertise & Trust

Trust is earned with people, not just metrics. We look for co‑design with frontline staff and communities, plain‑language notices, feedback/appeals routes, and proof of adoption by subgroup (not just topline). The goal is tools people choose to use—and that improve outcomes for those furthest from service. Evidence: co‑design notes, consent/notice copy, feedback pipeline, usage/retention by subgroup.


PRACTICAL
Nine Tests for Evaluating AI that Impacts People

Privacy & Security

Is personal data collected and used with consent, minimized, and protected?

Cost Realism

Are build/run/govern costs known now and at 10× scale with controls in place?

Change Resilience

Will the system monitor drift and fail safely if models or vendors change?

Relevance & Additionality

Is AI the best tool rather than a simpler alternative for a current problem?

Timeline Clarity

Are there 30/60/90 decisions and stop conditions to avoid pilot purgatory?

Accountability & Oversight

Who owns it, who can stop it, and how are incidents handled?

Attribution & AI-Specific Metrics

Can outcomes be proven to be the result of AI rather than other factors?

Implementability

Are the right staff, data, workflows, and accessibility in place for success?

Lived Expertise & Trust

Were intended beneficiaries involved in design and do subgroup outcomes hold?
