
PRACTICAL Framework

A decision architecture for equity-centered AI evaluation

How It Works

9 Core Tests
2 Cross-cutting Lenses
1 Recommendation

The Nine Tests

Privacy and Security

Personal data powers AI's ability to personalize at scale, but it also creates serious risks. For communities already over-surveilled, a leak or misuse can compromise safety and trust. AI introduces unique vulnerabilities that traditional software doesn't have.

Privacy and security aren't add-ons. Without them, even well-intentioned AI can widen inequities instead of closing them.

Green Flags
Clear opt-in consent with easy opt-outs
Only collect data that's needed
Encryption, access controls, and audit logs

Red Flags
Vague "we may use your data for AI" language
No plan for handling breaches or notifying users
Third-party tool use without proper data agreements

Relevance & Additionality

We fund AI when it’s the best tool for a real bottleneck—throughput, wait times, accuracy, or reach—not because it’s fashionable. A needs assessment compares AI to simpler options and shows the work wouldn’t happen (or wouldn’t work) otherwise. We ask for a 60–90‑day pilot plan tied to specific benefits and equity outcomes. Evidence: gap statement, alternatives considered, early proof plan, ROI + equity metrics.


Attribution & AI-Specific Metrics

Belief isn’t proof. We require a causal plan (A/B, shadow, or DiD), baselines, and owners/cadence. Track three lenses together—efficiency (time/capacity), safety (errors/incident severity), and equity (subgroup outcomes)—so averages don’t hide harm. Show confidence intervals and pre‑declared thresholds for go/adjust/stop. Evidence: 1‑page eval plan, baseline vs pilot chart with CI, subgroup table, rollback triggers.

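The subgroup table and confidence intervals described above can be sketched in a few lines. The subgroups, counts, and the normal-approximation interval below are illustrative assumptions; a real evaluation plan would pre-register its own method (e.g., Wilson intervals) and data.

```python
import math

def outcome_rate_ci(successes, n, z=1.96):
    """Outcome rate with a normal-approximation 95% confidence interval."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half), min(1.0, p + half)

# Hypothetical pilot results by subgroup: (positive outcomes, participants).
subgroups = {"subgroup_a": (180, 300), "subgroup_b": (40, 100)}
for name, (ok, n) in subgroups.items():
    p, lo, hi = outcome_rate_ci(ok, n)
    # Reporting per-subgroup intervals keeps averages from hiding harm.
    print(f"{name}: {p:.2f} [{lo:.2f}, {hi:.2f}]")
```

Wide intervals for small subgroups are themselves a finding: they signal the pilot cannot yet rule out subgroup-level harm.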

Cost Realism

AI isn’t “free once built.” We expect line‑item TCO—build, run, govern—plus usage‑based forecasts (volume × tokens × price) and controls to prevent runaway spend. Include scale math (10× volume), caching/cheaper‑model strategies, and a named cost owner. If vendor pricing shifts or usage spikes, the plan should still hold. Evidence: TCO sheet, spend controls, scale scenario, fallback model choices.
Report‑only: Environmental footprint (ops): estimated CO₂e range + reduction plan.

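The usage-based forecast (volume × tokens × price) and the 10× scale scenario reduce to simple arithmetic. The request volumes, token counts, per-token price, and cache hit rate below are illustrative assumptions, not real vendor pricing; real numbers come from the grantee's TCO sheet.

```python
def monthly_cost(requests, tokens_per_request, price_per_1k_tokens):
    """Usage-based forecast: volume x tokens x price."""
    return requests * (tokens_per_request / 1000) * price_per_1k_tokens

# Illustrative assumptions only: 20k requests/month at 1,500 tokens each,
# priced at $0.002 per 1k tokens.
pilot = monthly_cost(20_000, 1_500, 0.002)
at_10x = monthly_cost(200_000, 1_500, 0.002)   # 10x volume scenario
with_cache = at_10x * (1 - 0.40)               # assume 40% of calls served from cache
print(f"pilot ${pilot:,.0f}/mo, 10x ${at_10x:,.0f}/mo, 10x+cache ${with_cache:,.0f}/mo")
```

Running the same three lines with a cheaper fallback model's price is one way to make the "fallback model choices" evidence concrete.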

Timeline Clarity

Pilots drift without calendar discipline. We ask for a 30/60/90 plan with go/adjust/stop checkpoints, pre‑registered success criteria (efficiency, safety, equity), and scoped protections (e.g., assistive use before automation). Timebox the unglamorous work—data access, policy reviews, staff training—so you can ship safely and decide with evidence. Evidence: dated milestones, success thresholds, mid‑point review, decision memo template.

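The pre-registered go/adjust/stop logic can be made explicit before the pilot starts. The metric names and threshold values below are hypothetical placeholders for whatever a given pilot plan pre-registers across efficiency, safety, and equity.

```python
# Hypothetical pre-registered thresholds for a 30/60/90 checkpoint.
CRITERIA = {
    "min_efficiency_gain": 0.15,  # e.g., 15% reduction in processing time
    "max_incident_rate": 0.01,    # safety ceiling
    "min_equity_ratio": 0.90,     # worst subgroup outcome / best subgroup outcome
}

def checkpoint(efficiency_gain, incident_rate, equity_ratio):
    """Decide go / adjust / stop from pre-registered criteria."""
    if (incident_rate > CRITERIA["max_incident_rate"]
            or equity_ratio < CRITERIA["min_equity_ratio"]):
        return "stop"    # safety or equity breach: halt and investigate
    if efficiency_gain < CRITERIA["min_efficiency_gain"]:
        return "adjust"  # no harm detected, but benefits not yet demonstrated
    return "go"

print(checkpoint(0.20, 0.005, 0.95))  # meets all thresholds
```

Writing the criteria down as data, not prose, makes the mid-point review and decision memo a matter of filling in measurements rather than renegotiating thresholds.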

Implementability (Feasibility)

Great ideas fail on people, data, and workflow fit. We assess skills and capacity, data quality/rights, integration into real processes, and accessibility (devices, languages, assistive tech). If there are gaps, the plan names partners and sequencing. Usability checks with real users are a must. Evidence: capability map, data readiness notes, integration diagram, training plan, accessibility checklist, usability findings.


Change Resilience

Models drift; vendors change. We expect version pinning, a change log, shadow/A‑B for new features, watch‑metrics with alert thresholds, and safe‑degrade modes (tighten human‑in‑the‑loop or fall back to human‑only) if metrics slip. Document how you’ll adapt to policy/vendor shifts and keep users informed. Evidence: drift dashboard, deployment plan, fallback matrix, change log.

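The watch-metrics and safe-degrade modes described above might look like the following sketch. The choice of accuracy as the watch-metric and both threshold values are assumed for illustration; a real deployment would pre-register its own metrics and floors.

```python
# Assumed alert thresholds; a real plan would pre-register its own values.
ALERT_FLOOR = 0.90  # below this, tighten human-in-the-loop review
HARD_FLOOR = 0.80   # below this, fall back to human-only operation

def degrade_mode(watch_metric: float) -> str:
    """Map a slipping watch-metric to a safe-degrade mode."""
    if watch_metric < HARD_FLOOR:
        return "human-only"          # automation disabled entirely
    if watch_metric < ALERT_FLOOR:
        return "human-in-the-loop"   # every output reviewed before release
    return "normal"

for m in (0.93, 0.87, 0.75):
    print(m, "->", degrade_mode(m))
```

The point of encoding the fallback matrix this way is that degradation is automatic and auditable: a model or vendor change that moves the metric triggers the safer mode without waiting for a meeting.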

Accountability & Oversight

Humans own outcomes. We ask for named owners (product/ops, metrics, incident), a kill switch, rollback triggers, and a simple incident process (detect → notify → fix → learn). Public‑facing transparency about AI use builds trust; internal governance materials speed approvals. Evidence: owner list, policy snippet, incident SOP, transparency text, oversight cadence.


Lived Expertise & Trust

Trust is earned with people, not just metrics. We look for co‑design with frontline staff and communities, plain‑language notices, feedback/appeals routes, and proof of adoption by subgroup (not just topline). The goal is tools people choose to use—and that improve outcomes for those furthest from service. Evidence: co‑design notes, consent/notice copy, feedback pipeline, usage/retention by subgroup.


PRACTICAL
Nine Tests for Evaluating AI that Impacts People

Privacy & Security

Is personal data collected and used with consent, minimized, and protected?

Cost Realism

Are build/run/govern costs known now and at 10× scale with controls in place?

Change Resilience

Will the system monitor drift and fail safely if models or vendors change?

Relevance & Additionality

Is AI the best tool rather than a simpler alternative for a current problem?

Timeline Clarity

Are there 30/60/90 decisions and stop conditions to avoid pilot purgatory?

Accountability & Oversight

Who owns it, who can stop it, and how are incidents handled?

Attribution & AI-Specific Metrics

Can outcomes be proven to be the result of AI rather than other factors?

Implementability

Are the right staff, data, workflows, and accessibility in place for success?

Lived Expertise & Trust

Were intended beneficiaries involved in design and do subgroup outcomes hold?
