Privacy and Security
P
Have we minimized data and maximized consent, security, and user control, especially for over‑surveilled communities?
Why This Matters
Personal data is the backbone of one of AI's biggest unlocks (providing personalized solutions at scale) and the source of one of its biggest issues (security risks for the data entrusted to it), as well as an emerging risk for communities that are often already over-surveilled or under-served. Technology has generally improved the delivery of social impact, but AI is unique: its level of force-multiplication is unprecedented, and so are the risks. For communities already over-surveilled or under-served, the stakes are higher; a leak or misuse can compromise safety and privacy. AI's progression is defined by a shift from reactive to proactive technology, evidenced by the emerging "agentic" era in which AI can make decisions and take actions across tools, creating immense potential and immense risk ([1]). For an AI solution to actually expand equity, whether through quicker service delivery, improved accessibility, or contextually-aware customization, privacy and security must be embedded foundationally, not bolted on later. Federal regulators are already tightening rules accordingly, and the social sector should follow their lead. The FTC updated COPPA to prevent children's data from being shared without a separate opt-in and to require stronger retention/deletion safeguards ([2]). HHS requires concrete safeguards (access controls, encryption, audits) when AI systems touch protected health information ([3]). The wheel doesn't need to be reinvented: the public sector has resources that can be adapted, such as NIST frameworks and federal AI governance memos, all of which point to the same idea: optimize useful data use and minimize harms through consent, minimization, and continuous risk management ([4]).
What a Good Solution Looks Like
Opt-in data consent: Gives plain-language notices for when AI will be used with data, including clear expectations around retention/deletion; uses a plain-language opt-in approach so beneficiaries never feel "tricked"; provides alternate channels, since truly equitable solutions should not make consent a condition for basic services ([2])
Data minimization: Gathers only what the task truly needs; avoids "just in case" hoarding that creates unnecessary exposure ([4])
Security controls: Uses encryption, role-based access, and MFA; keeps audit logs of who accessed what and why; restricts data access to designated owners ([3])
Third-party tool safeguards: Anonymizes data by stripping direct identifiers and other PII whenever possible (a redaction sketch follows this list); secures the right contracts: a DPA at minimum, and a BAA in HIPAA contexts so the organization doesn't take on avoidable liability
Policy guardrails: Has an explicit, public-facing AI policy stating that AI cannot make high-stakes decisions about people without human review
Consent reminders & easy opt-outs: Highlights how to reach a human in the user flow; provides reminders and easy-to-access opt-outs for personal data usage; publishes what data is collected, why, and how long it is kept
Live monitoring: Implements mechanisms that automatically flag misuse; sets alerts on error spikes or unexpected outputs
Incident response plans: Notifies affected users; spells out mitigation and remediation steps to protect trust and reduce harm when things go wrong ([5])
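To make the anonymization item concrete, here is a minimal sketch of stripping direct identifiers from free text before it leaves the organization's boundary (e.g., before an API call to a third-party model). The patterns and placeholder labels are illustrative assumptions, not an exhaustive PII taxonomy; a real deployment would use a vetted PII-detection library and verify coverage against its own data.

```python
import re

# Illustrative patterns for direct identifiers (emails, US phone numbers, SSNs).
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace direct identifiers with typed placeholders before the text is shared."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()} REDACTED]", text)
    return text

if __name__ == "__main__":
    note = "Client reached us at jane@example.org or (555) 123-4567."
    print(redact(note))
    # -> "Client reached us at [EMAIL REDACTED] or [PHONE REDACTED]."
```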
Relevance & Urgency
R
Is AI demonstrably the right tool for this urgent problem, compared with simpler alternatives?
Why This Matters
Deploying AI at scale is expensive and difficult. Any good AI solution should be able to clearly demonstrate that AI is the optimal way to solve a real, timely problem, rather than simply chasing the "shiny new toy." Those building human-centered solutions have a responsibility to the communities they serve to maximize every dollar and resource. AI can be a great enabler in doing so, but forcing AI onto a problem to chase hype can mean overlooking simpler, higher-quality solutions. For a human-centered AI solution to be sustainable, a needs assessment clearly demonstrating the value of AI over alternatives is an essential prerequisite. The market reality backs this up: many AI pilots never reach production because of an unclear value proposition, and some analysts warn that over 40% of "agentic AI" projects may be scrapped by 2027 due to unclear value and rising costs ([7]). The strongest AI projects tackle problems that are already urgent and costly. If the challenge wouldn't justify effort or investment on its own, layering AI on top is unlikely to change that. Especially when serving people facing immediate barriers, a solution should close a real gap now and have a credible path to measurable value soon.
What a Good Solution Looks Like
Clear gap with quantitative support: Names a concrete bottleneck (e.g., "median wait time is 41 days") and who is most affected (e.g., non-English speakers, rural clients); ideally includes a human baseline comparison (e.g., 10% quicker than human support)
Needs assessment: Defines the intended benefit upfront ([6]); shows clear consideration of cheaper or simpler alternatives along with clear reasoning for choosing AI, typically tied to throughput, accuracy, or reach
Duplication scan: Lists nearby tools/pilots and how this solution differentiates in specific, isolated ways; partners where possible instead of rebuilding
Early proof plan: Commits to a 60–90-day pilot with go/no-go checkpoints; uses shadow tests or A/B tests to isolate AI's effect rather than standing up full infrastructure, avoiding a common failure-to-scale driver ([7]) (a minimal comparison sketch follows this list)
ROI + equity together: Defines one efficiency metric (e.g., hours saved per case) and one equity metric (e.g., wait-time drop for the slowest-served subgroup)
Scope fit for risk: Keeps pilot use cases low-risk (internal-facing, "read-only" permissions); avoids high-stakes decisions until benefits and safeguards are proven ([6])
Agentic caution: Limits agentic AI to narrow tasks with cost caps and human review; acknowledges warnings about unclear value and cost blowups ([8])
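One way to read the "early proof plan" item: a pilot can isolate AI's contribution by comparing an AI-assisted arm against a status-quo arm on the same efficiency and equity metrics. The sketch below is a hedged illustration with made-up field names and data (arm, subgroup, wait_days); it is not a full statistical analysis.

```python
from statistics import median

# Each record: (arm, subgroup, wait_days). Data is illustrative only.
records = [
    ("control", "english", 41), ("control", "spanish", 55), ("control", "english", 38),
    ("ai_assist", "english", 30), ("ai_assist", "spanish", 44), ("ai_assist", "english", 28),
]

def median_wait(arm, subgroup=None):
    waits = [w for a, g, w in records if a == arm and (subgroup is None or g == subgroup)]
    return median(waits)

overall_drop = 1 - median_wait("ai_assist") / median_wait("control")
slowest_served_drop = 1 - median_wait("ai_assist", "spanish") / median_wait("control", "spanish")

print(f"Overall median wait reduction: {overall_drop:.0%}")                 # efficiency metric
print(f"Reduction for slowest-served subgroup: {slowest_served_drop:.0%}")  # equity metric
```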
AI-Specific Metrics
A
Do we have concrete, AI‑attributable metrics with baselines, subgroup cuts, and rollback thresholds?
Why This Matters
Believing that AI is the right tool for a problem is one thing; proving it is another. A significant shift with AI is that identical inputs can lead to vastly different outputs. That is an inherent feature of the technology, further complicated by its potential to deliver biased results, hallucinate false information, and, especially for solutions built on foundation models, experience sudden changes in capabilities (also known as model drift). General outcome metrics like "number of users served" aren't enough, because they don't show what the AI itself is adding. Good AI-specific metrics isolate whether the technology is responsible for the change. Measures like time saved, error rates, and subgroup equity outcomes are more reliable signals in early pilots because they point directly to the AI's contribution rather than overall activity. Without a clear AI evaluation plan, teams risk confusing correlation with causation: positive outcomes may be attributed to AI when they actually stem from other factors (e.g., staff, funding, or even luck), overlooking the limitations of the AI integration and undermining scalability and sustainability. This is also a significant equity issue. Accenture (2024) and Stanford CRFM (2023) both found that some GenAI pilots that looked "successful" at the top line actually created uneven outcomes when results were broken down by subgroup, meaning average numbers hid widening disparities. For AI to help marginalized communities, there needs to be proof that it actually narrows gaps, not just moves numbers around. Before deployment, HCAI solutions should define intended benefits with metrics and have mechanisms in place to monitor for and respond to degradation.
What a Good Solution Looks Like
Gap-closing focus: Proves the AI is closing an identified, quantifiable gap, like cutting wait times for services or increasing throughput for case reviews
Stable early metrics: Tracks metrics like time saved and error rates, which are more reliable in early pilots than benchmark scores and serve as early indicators of potential problems (Stanford HAI, 2025)
Three-lens approach: Measures at least one metric of each core type: ROI (efficiency), how much capacity the AI frees up; Safety, whether the AI avoids harmful errors; Equity, whether outcomes are fair across groups
Disaggregated results: Prevents disparities from being hidden through aggregation; breaks results down by subgroup to catch inequities early and promote sustainability (a sketch follows this list)
Status-quo baseline: Demonstrates that AI is objectively better than "the old way," which is critical to incentivizing behavior change; validates that the "cost of wrong answers" is lower with AI than without
Key decisions defined upfront: Detects and resolves issues quickly and efficiently given the near-zero tolerance for errors; defines intended benefits, degradation-monitoring strategy, metrics ownership, result-review cadence, and rollback thresholds
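To make "disaggregated results" and "rollback thresholds" concrete, here is a hedged sketch: error rates computed per subgroup, with a flag raised when any subgroup's gap against the overall rate exceeds a pre-agreed threshold. The threshold value, subgroup labels, and data are placeholders the team would pre-register, not recommendations.

```python
from collections import defaultdict

# Each record: (subgroup, was_error). Illustrative data only.
outcomes = [
    ("english", False), ("english", True), ("english", False), ("english", False),
    ("spanish", True), ("spanish", True), ("spanish", False),
]

ROLLBACK_GAP = 0.15  # pre-registered: max allowed gap between any subgroup and the overall error rate

by_group = defaultdict(list)
for group, err in outcomes:
    by_group[group].append(err)

overall_rate = sum(e for _, e in outcomes) / len(outcomes)
for group, errs in by_group.items():
    rate = sum(errs) / len(errs)
    gap = rate - overall_rate
    status = "ROLLBACK REVIEW" if gap > ROLLBACK_GAP else "ok"
    print(f"{group}: error rate {rate:.0%} (gap {gap:+.0%}) -> {status}")
```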
Cost Realism
C
Do we understand total cost of ownership now and at 10× scale, and have controls to prevent runaway spend?
Why This Matters
Just because AI is efficient does not mean that it is cheap. Prototyping is quicker, cheaper, and more accessible than ever, but real-world usage carries ongoing costs: API calls to foundation models, data storage fees, and the experts needed to scale, monitor, and fix issues. AI usage often grows faster than expected, and its cost structure represents yet another departure from traditional technology: AI solutions incur costs every time they send a prompt to a foundation model. Unit prices keep falling (the cost to get GPT-3.5-level answers fell ~280× between Nov 2022 and Oct 2024), yet total bills still rise as organizations scale usage and adopt more powerful models ([9]). Finance and engineering teams are already flagging this. The percentage of organizations actively managing AI spend and building cost visibility has more than doubled in the last year ([10]). Organizations that rent AI cloud compute see an average of ~40% year-over-year growth in usage as they experiment and scale ([11]). While cost scaling with deployment isn't new, the slope for AI can be much steeper than for typical software, because AI bills rise with every prompt, longer contexts, and more powerful models even as unit prices fall. Bottom line: if a solution doesn't show its total cost of ownership (TCO), both build and run, there is a high chance a promising HCAI demo can't be sustained.
What a Good Solution Looks Like
TCO divided into line items: Avoids the "AI—$X" black box of a single lump sum; includes at minimum Build (setup, integrations), Run (tokens/inference, vector DB, storage/egress), and Govern (monitoring, evaluation time, model updates)
Usage-based forecasting: Models costs around expected volume (e.g., monthly conversations × average tokens per turn × price per million tokens) with best/base/worst-case scenarios; calculates how costs change if adoption doubles; includes forecasts for training, inference, storage, and egress, not just licenses ([12]) (a minimal forecasting sketch follows this list)
Controls to prevent runaway spend: Caps budgets through explicit hard-line thresholds, per-user/request quotas, and alerts when spend or tokens exceed thresholds, per the standard cloud cost-control practice for genAI detailed by AWS ([13])
Cost/quality trade-off plan: Defines explicit, interpretable lines for when to switch to smaller/cheaper models or cache results; optimizes prompt lengths and other tech-stack decisions for efficiency and quality; cuts costs with assurance that outcomes aren't hurt
Spend tracking: Names a cost owner and the tools/process to monitor AI costs; confirms pricing tiers/discounts in writing; includes a fallback if vendor prices change
Scale realism: Shows the scale math (how OPEX changes at 10× volume) and acknowledges infrastructure realities (GPU/serving costs rise with adoption) ([14]); shares the potential scaled cost structure during diligence
Sustainable funding: Defines a clear post-funding revenue path (operating budget, payer, or pricing) so services don't collapse when the raised money ends
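The usage-based forecasting item reduces to straightforward arithmetic. The sketch below assumes a conversations-per-month volume, average tokens per turn, and a blended per-million-token price; all numbers are placeholder assumptions a team would replace with its own vendor pricing and observed telemetry, and it covers token cost only.

```python
# Placeholder assumptions; replace with actual vendor pricing and observed usage.
CONVERSATIONS_PER_MONTH = 5_000
TURNS_PER_CONVERSATION = 8
TOKENS_PER_TURN = 1_200           # prompt + response, averaged
PRICE_PER_MILLION_TOKENS = 3.00   # USD, blended input/output

def monthly_model_cost(conversations, scenario_multiplier=1.0):
    """Token cost only; storage, egress, and monitoring would be separate line items."""
    tokens = conversations * TURNS_PER_CONVERSATION * TOKENS_PER_TURN * scenario_multiplier
    return tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

for label, mult in [("best", 0.5), ("base", 1.0), ("worst", 2.0)]:
    print(f"{label:>5} case: ${monthly_model_cost(CONVERSATIONS_PER_MONTH, mult):,.0f}/month")

print(f"at 10x scale (base case): ${monthly_model_cost(CONVERSATIONS_PER_MONTH * 10):,.0f}/month")
```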
Timeline Clarity
T
Are milestones, safety gates, and go/no‑go decisions time‑boxed and tied to evidence?
Why This Matters
For HCAI, clear timelines protect people from broken promises and endless pilots that never reach them. AI demos often look "almost ready," but much of the hardest work is converting a demo to production: policy reviews, data access and security checks, staff training, workflow integration, and monitoring setup. Without a calendar-backed plan for those steps, pilots drift, burning time, budgets, and trust while communities wait. Some analysts warn of a potential "pilot purgatory" and predict that over 40% of "agentic AI" projects may be canceled by 2027 because timelines don't allow value to be shown or costs to be controlled fast enough ([8]). Though there's room for nuance, HCAI solutions should prioritize quality over speed when the two are in tension. Timelines that explicitly account for the most common blockers, which GAO's recent reviews identify as workforce capacity, policy compliance, and up-to-date use policies ([15]), force teams to do the unglamorous work up front; this may delay a launch, in exchange for assurance that the tool can be used safely and consistently. That work is also critical to scalability, since tackling those problems upfront is much easier than retroactively adjusting a large, deployed system. Timelines aren't just about launch. AI should be treated as a living system that is monitored continuously and decommissioned when risk exceeds tolerance, preventing the harm created by open-ended experiments ([16]). Timelines also create accountability for funders: a clear start date, mid-point checks, and a decision date (scale, adjust, or stop) force teams to show evidence (e.g., reduced wait times and no widening equity gaps) before expanding impact claims.
What a Good Solution Looks Like
30/60/90 plans with stakes: Names the start, mid-point, and day-90 decision (go/adjust/stop); defines intended benefits and implements minimum safety practices by set dates ([6])
Pre-registered success criteria: Anchors to quantitative, pre-agreed metrics (efficiency, safety, equity); defines what "good enough to scale" means (e.g., ≥25% wait-time reduction and no equity-gap widening); defaults to pause if metrics aren't met (a decision-logic sketch follows this list)
Go/no-go checkpoints: Segments timelines into pieces that compare results to targets; triggers rollbacks if risk or cost exceeds thresholds
Time-boxed "plumbing": Includes integration with real workflows, staff training, data access approvals, and privacy/security reviews; stops the pilot if these aren't fully in place
Scoped protections: Limits AI to low-risk use (assist, draft, triage suggestions) until safety and equity metrics are stable; saves higher-stakes automation for a later phase with fresh approvals
Documented handoff to "steady state": Covers the first 90 days of scale-up: who owns it, how training continues, how monitoring is done, and when the next review happens (e.g., quarterly); plans a sunset and user notice in case the pilot fails
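A hedged sketch of the pre-registered checkpoint logic above: the day-90 decision is computed from agreed metrics rather than argued after the fact. The thresholds mirror the example criteria in the list (≥25% wait-time reduction, no equity-gap widening); the values and metric names are illustrative assumptions.

```python
# Pre-registered criteria, agreed before the pilot starts (illustrative values).
MIN_WAIT_REDUCTION = 0.25      # "good enough to scale"
MAX_EQUITY_GAP_INCREASE = 0.0  # equity gaps must not widen

def day_90_decision(wait_reduction: float, equity_gap_change: float, safety_incidents: int) -> str:
    """Map pilot results to the pre-agreed go / adjust / stop decision."""
    if safety_incidents > 0 or equity_gap_change > MAX_EQUITY_GAP_INCREASE:
        return "stop"    # default to pause/stop when safety or equity criteria fail
    if wait_reduction >= MIN_WAIT_REDUCTION:
        return "go"      # scale with the documented steady-state handoff
    return "adjust"      # benefit unproven; revise scope and re-check at the next gate

print(day_90_decision(wait_reduction=0.31, equity_gap_change=-0.02, safety_incidents=0))  # -> go
print(day_90_decision(wait_reduction=0.31, equity_gap_change=0.04, safety_incidents=0))   # -> stop
```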
Implementability (Feasibility)
I
Do we have the people, data, systems, and accessibility to run this safely in the real world?
Why This Matters
Building has never been more accessible, thanks to AI's ability to democratize expertise and work efficiently. But it is much easier to sketch a great AI idea than to run one. Most HCAI solutions don't fail because of the AI's functionality; they fail because of the difficulty of launching and sustaining the tool with the necessary people, data, and systems ([17]).
People: There are justified concerns over job displacement stemming from AI. It's true that AI is already replacing a lot of process work (CITE), but effective HCAI solutions depend on humans more than ever: deploying automated solutions in high-risk contexts requires human oversight and lived expertise.
Data: AI is only as good as the data given to it. The best model infrastructure can't help an AI system if its data, essentially its source of truth, isn't treated as a priority. Feasibility hinges on data that is correct, complete, and representative of all users.
Systems: HCAI solutions are never "just" AI; they require supporting infrastructure to keep the tool secure, monitorable, and trustworthy. Just as a beautiful home requires plumbing, AI requires supportive infrastructure.
Implementability begets equity. If an MVP doesn't fit the languages, accessibility needs, or other contexts of intended beneficiaries, the benefits go to the easiest-to-reach users while everyone else waits, often exacerbating divides. Feasibility for HCAI solutions is all about fit, whether with real users, real data, or real workflows.
What a Good Solution Looks Like
Ownership and authority: Names product/operations owners, a data privacy lead, and "frontline" users; defines decision makers for rollbacks
Data readiness: Maps consent, sharing, and vendor terms; documents available fields, quality snapshots, access rights, constraints, and update frequency (a readiness-snapshot sketch follows this list)
Embedded systems and operations: Monitors for quality, safety, equity, and cost; lays out an incident response plan, rollback mechanisms, and production deployment processes; educates and demystifies AI for staff-facing tools
Practical scope: Prioritizes low-risk tasks for initial deployments; proves accuracy and oversight before higher-stakes automation
Usability: Demonstrates simple, contextually-aware UX with little-to-no barrier to entry; embeds mechanisms for users to provide feedback or get connected with a human
Accessibility: Works on users' actual devices and meets them in their actual contexts; supports relevant languages and assistive tech; names gaps and lays out a plan to address them
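The data-readiness item can be made tangible with a small quality snapshot: for each field the pilot depends on, report completeness and when it was last updated. The field names, profile values, and freshness threshold below are assumptions for illustration; a real team would generate them from its own data profiling.

```python
from datetime import date

# Illustrative inventory: field -> (non-null share, last update). Replace with real profiling output.
field_profile = {
    "preferred_language": (0.62, date(2024, 11, 1)),
    "contact_phone":      (0.94, date(2025, 3, 1)),
    "case_status":        (0.99, date(2025, 6, 1)),
}

MIN_COMPLETENESS = 0.90
MAX_STALENESS_DAYS = 180  # assumed freshness requirement for this use case

today = date(2025, 6, 15)
for field_name, (completeness, updated) in field_profile.items():
    stale = (today - updated).days > MAX_STALENESS_DAYS
    ready = completeness >= MIN_COMPLETENESS and not stale
    print(f"{field_name}: {completeness:.0%} complete, updated {updated} -> {'ready' if ready else 'NOT ready'}")
```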
Change Resilience
C
Can the system withstand vendor/model drift and degrade safely while we fix issues?
Why This Matters
As mentioned throughout this framework, AI isn't static. Model drift is real: AI that worked yesterday can slip tomorrow, often because of decisions made by the vendor. When building on top of foundation models, as most do, users feel those shifts even if the code hasn't been touched. That's risky in human-impact contexts, where small changes can redefine functionality, raise error rates, or widen equity gaps. It's critical that HCAI solutions have mechanisms to stay on top of potential changes and correct course; public guidance even assumes that change is the default. Explainability is critical to resilience. Being able to see sources, reasoning, or uncertainty empowers people to understand why the AI responded as it did. For users, this builds trust by helping them notice shifts, flag issues, and choose safe alternatives. Internally, it allows teams to move quickly by pinpointing where issues stem from. Change resilience can be complex, but following compliance guidelines can kill two birds with one stone: not only does it protect an HCAI solution from legal (and often ethical) risks, it also promotes trustworthy design. Early compliance is a feature; it reduces approval friction, signals reliability to partners, and becomes a real differentiator as the AI landscape keeps shifting.
What a Good Solution Looks Like
Version control: Ties production to specific model, prompt, or retrieval versions; updates a change log whenever anything changes [18]
Transparent interfaces: Presents key sources and rationale to users; provides insight into confidence or limits so shifts are noticeable and reportable (NIST publications)
Watch metrics: Defines a short set of metrics to watch (accuracy, escalation/override rates, disaggregated deltas, cost spikes) with alert thresholds and named owners [6] (a monitoring sketch follows this list)
Shadow or A/B deployments: Deploys features in shadow mode to test on real data and confirm no regression before full rollout; runs A/B tests to confirm positive user impact over the status quo or alternatives [18]
Safe fallback modes: Degrades the system if metrics slip by limiting capabilities, tightening human-in-the-loop review, or defaulting to a temporary human-only mode until fixes are deployed [6]
"Risk & change" dossier: Documents (even simply) intended benefits and metrics, model and data inventories, latest eval results, known limits, and incident history; updates it regularly so it stays a living document [16]
Compliance as a feature: Includes procurement-ready artifacts (benefit definition, monitoring plan, change records) up front to speed approvals and build trust [19]
Contractual vendor resilience: Pushes for vendor agreements that include advance change notice, access to earlier models when possible, and portability for knowledge bases; plans alternatives by naming backup models and modes [8]
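A hedged sketch of the watch-metrics and safe-fallback items: a small check compares live metrics against alert thresholds and, if any is breached, switches the pipeline into a restricted human-review mode. The metric names, threshold values, and mode labels are illustrative assumptions, not prescribed settings.

```python
# Alert thresholds with named owners (illustrative; pre-agreed during rollout planning).
WATCH_THRESHOLDS = {
    "accuracy_drop": 0.05,      # vs. the pinned baseline eval
    "override_rate": 0.20,      # share of AI suggestions overridden by staff
    "subgroup_delta": 0.10,     # largest disaggregated gap
    "daily_cost_spike": 1.5,    # multiple of forecast spend
}

def select_mode(live_metrics: dict) -> str:
    """Return 'normal' or a degraded mode when any watch metric breaches its threshold."""
    breaches = [m for m, limit in WATCH_THRESHOLDS.items() if live_metrics.get(m, 0) > limit]
    if breaches:
        # Safe fallback: keep serving, but tighten human review until fixes are verified.
        print(f"ALERT -> human-review mode (breached: {', '.join(breaches)})")
        return "human_review_only"
    return "normal"

mode = select_mode({"accuracy_drop": 0.08, "override_rate": 0.12, "subgroup_delta": 0.03, "daily_cost_spike": 0.9})
```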
Accountability & Oversight
A
Who is answerable, how do users appeal, and how do we stop/rectify when harm appears?
Why This Matters
AI is primarily known for changing how work gets done, but it also changes the nature of control. AI can proactively make decisions, prescribe advice, and trigger follow-up steps. What will never move to the machine is responsibility: if an AI creates a negative outcome, the humans who decided to deploy it are responsible. When no one can answer for what the system does, or no one is watching, the tool is rightly viewed as unaccountable and dangerous. HCAI solutions are especially exposed. A generic B2B AI product can underwrite a small percentage of errors as a business cost; an HCAI solution can be viewed as untrustworthy or dangerous after even one bad outcome. While the primary incentive for safety is to avoid harm, setting up the infrastructure for accountability and oversight also helps an HCAI solution avoid eroding trust, weakening consent, and inviting reputational and ethical risk. Accountability and oversight are the primary protectors of users and legitimacy. They clarify who owns decisions, who explains them, and who answers when things go wrong, and they prevent users from feeling locked into an opaque result. There are internal incentives too: accountability as a feature reduces friction with partners and speeds up review processes, and oversight is a vehicle for learning as incidents become signals that guide product decisions. The benefits of clear ownership and transparency are widely shared. They tell communities, funders, and regulators that the organization intends to be answerable over time, keep AI in service of people, build trust when uncertainty is unavoidable, and sustain the secure foundation needed to scale.
What a Good Solution Looks Like
Named owners: Documents who is accountable for the system (product/ops lead), who monitors for quality and safety (metrics owner), and who decides when to launch, pause, and roll back [16]
Emergency stops and rollback triggers: Implements the ability to manually override and stop the system quickly; lists triggers that automatically move the system to safer modes (e.g., human review) when thresholds are crossed (e.g., error spikes, complaints, subgroup gaps) [20]
Incident response plan: Outlines a comprehensive plan for addressing incidents: how to detect, who triages, how users are notified, and how fixes are verified and logged; records incidents and changes for audits and learning [18] (a minimal incident-record sketch follows this list)
Explainability: Provides key sources, rationale, and uncertainty so users can choose safer options; ensures staff can access and explain why a decision was made to support reviews and appeals [16]
Transparency and feedback: Discloses AI use; implements feedback mechanisms that route flagged errors to a named owner; provides access to alternative human resources
Safe deployments and controlled rollouts: Runs regression tests and evaluations on representative behavior before going live; rolls out features incrementally [20]
Policy-ready foundation: Creates core artifacts such as an AI policy, benefit statement, monitoring plan, change log, and incident record, reducing approval friction and signaling reliability to stakeholders [16]
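To illustrate the incident-response item, here is a hedged sketch of a minimal incident record that captures detection, triage ownership, user notification, and verification of the fix. The fields are assumptions about what an auditable log might hold, not a prescribed schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class Incident:
    """Minimal auditable record for one incident in an HCAI system (illustrative schema)."""
    detected_at: datetime
    description: str
    triage_owner: str                        # the named owner accountable for the response
    users_notified: bool = False
    fix_verified_at: Optional[datetime] = None
    actions: list = field(default_factory=list)

    def log_action(self, note: str) -> None:
        # Timestamped trail of mitigation steps, kept for audits and learning.
        self.actions.append((datetime.now(), note))

incident = Incident(
    detected_at=datetime(2025, 6, 2, 9, 30),
    description="Error-rate spike for non-English intake forms",
    triage_owner="ops_lead",
)
incident.log_action("Switched intake assistant to human-review-only mode")
incident.users_notified = True
incident.log_action("Affected users notified; fix scheduled for next release")
```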
Lived Expertise & Trust
L
Have affected people shaped the design, and do they understand, trust, and control the system?
Why This Matters
The most important part of an HCAI solution is not the AI; it's the people who will use it. A technically strong project will fail if people do not trust it. Legitimacy doesn't come from scoring high on evaluations; adoption hinges on users feeling seen, respected, and empowered. Public sentiment around AI generally ranges from cautious to fearful: half of US adults are more concerned than excited about AI, primarily citing societal risks [21]. Trust patterns vary by age and geography [22], so trust can vary simply by who you're trying to serve. Lived expertise is not an add-on; it's core to quality. Technical wins often miss real workflows, constraints, or cultural contexts. Early, genuine involvement of those potentially affected by a system improves contextual fit and acceptance. As shown throughout the framework, transparency and explainability are core to making AI trustworthy because they empower people to feel in control of the tools they're using.
What a Good Solution Looks Like
Documented & compensated co-design: Collaborates with communities and frontline staff to set goals, determine expectations around data use, and shape interface choices; compensates participants and records participation [23]
Accessible transparency: Spells out what the AI does, what it doesn't do, and how users' data is used; tailors disclosures to the language of users in straightforward words [16]
Real-context testing: Runs usability checks with actual users and devices before scaling; records user feedback and maps how it has been addressed [24]
Disaggregated equity metrics: Tracks outcomes by relevant subgroups (e.g., language, disability status); avoids presenting overall averages as standalone information; conducts reviews if gaps widen between groups [21]
Safe boundaries: Includes human review and appeals for all high-stakes decisions; states limits in the product [16]
Feedback action pipeline: Provides channels to flag errors; routes flagged errors to named owners with timelines; shares fixes and summaries back in plain language [24]
Tracked, not assumed, trust: Measures adoption, satisfaction, and complaint resolution by subgroup to quantify trust; shares results with partners or publicly when appropriate [16]