Change Resilience
Evaluates whether the system can handle inevitable changes (e.g., model updates, vendor shifts, performance drift) without harming users.
AI isn't static. Model drift is real: AI that worked yesterday can slip tomorrow, often through vendor decisions outside your control. Small changes can raise error rates or widen equity gaps unless teams have mechanisms to detect and respond quickly. When serving people facing barriers, these shifts aren't just technical problems—they break trust and potentially cause harm. Change resilience means treating AI as a living system that needs continuous monitoring, version control, and safe fallback modes when things degrade.
What Good Looks Like
✓ Version control tying production to specific model, prompt, or retrieval versions with change logs (see the sketch after this list)
✓ Transparent interfaces presenting key sources and rationale to users (confidence/uncertainty visible)
✓ Watch metrics defined with alert thresholds and named owners monitoring for drift
✓ Shadow or A/B deployments testing changes on real data before full rollout
✓ Safe fallback modes that degrade gracefully (limit capabilities, tighten human-in-loop, revert to human-only)
✓ "Risk & Change" documentation tracking intended benefits, model/data inventories, eval results, known limits, incident history
✓ Contractual vendor agreements including advance change notice and access to earlier versions
✓ Backup models and modes identified in case primary vendor changes dramatically
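A minimal sketch of the version-pinning item above, assuming a small release manifest and change log maintained alongside the application; every identifier, field name, and model string below is an illustrative placeholder, not a specific vendor's API.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class ReleaseManifest:
    """Pins production to exact model, prompt, and retrieval versions."""
    release_id: str               # e.g. "2024-06-r3"
    model_version: str            # exact vendor model string live in production
    prompt_version: str           # git tag or hash of the prompt templates
    retrieval_index_version: str  # snapshot ID of the retrieval corpus/index
    eval_suite_version: str       # which evaluation suite signed off this release

@dataclass
class ChangeLogEntry:
    """One entry per change, so 'what is in production and why' is always answerable."""
    released_on: date
    manifest: ReleaseManifest
    reason: str                     # why the change was made
    eval_summary: str               # headline eval results before rollout
    rollback_to: str | None = None  # release_id to revert to if metrics slip

# Example entry (all values hypothetical): answering "which prompt version is live?"
# becomes a lookup rather than an investigation.
changelog: list[ChangeLogEntry] = [
    ChangeLogEntry(
        released_on=date(2024, 6, 12),
        manifest=ReleaseManifest(
            release_id="2024-06-r3",
            model_version="vendor-model-2024-05-13",
            prompt_version="prompts@a1b2c3d",
            retrieval_index_version="index-snapshot-2024-06-01",
            eval_suite_version="evals-v7",
        ),
        reason="Vendor model update; prompts unchanged",
        eval_summary="Accuracy 91% (was 90%); no subgroup regression detected",
        rollback_to="2024-05-r2",
    )
]
```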
What to Watch Out For
✗ No version control (can't identify which model/prompt version is in production)
✗ Results are black boxes to users (no explanation of sources or reasoning)
✗ No monitoring for performance degradation over time
✗ Haven't thought about what happens if vendor changes pricing or deprecates their model
✗ No plan for testing changes before deploying them to users
✗ No documented rollback process
✗ Vendor contracts don't include change notification or version access provisions
Tests To Apply
□ Is production tied to specific model/prompt versions with documented change logs?
□ Do users see key sources, rationale, or confidence levels (not just outputs)?
□ Are there defined metrics being watched with alert thresholds and named owners?
□ Do they use shadow or A/B testing before full rollouts?
□ Is there a safe fallback mode if metrics slip (e.g., tighter human review, pause automation)? (see the sketch after this list)
□ Do vendor contracts include advance notice of changes and access to prior versions if possible?
□ Is there a documented rollback process with evidence it works?
□ Are backup models/approaches identified in case primary vendor makes dramatic changes?
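To make the threshold-and-fallback tests concrete, here is a hedged sketch of a monitor that downgrades the operating mode when a watched metric slips below baseline; the metric, threshold values, and mode names are placeholders a team would set from its own baseline data.

```python
from enum import Enum

class Mode(Enum):
    FULL_AUTOMATION = "full"        # AI answers go out with light review
    HUMAN_IN_LOOP = "human_review"  # every AI answer is reviewed before release
    HUMAN_ONLY = "human_only"       # AI output suppressed; staff handle requests

# Illustrative thresholds; real values should come from the team's baseline data.
WARN_DROP = 0.03   # alert owners if accuracy falls 3 points below baseline
STOP_DROP = 0.08   # revert to human-only if it falls 8 points below baseline

def select_mode(baseline_accuracy: float, rolling_accuracy: float,
                current: Mode) -> tuple[Mode, str]:
    """Degrade gracefully when the watched metric slips.

    Returns the new operating mode and a short reason for the change log.
    """
    drop = baseline_accuracy - rolling_accuracy
    if drop >= STOP_DROP:
        return Mode.HUMAN_ONLY, f"accuracy down {drop:.1%}; automation paused"
    if drop >= WARN_DROP:
        return Mode.HUMAN_IN_LOOP, f"accuracy down {drop:.1%}; human review tightened"
    return current, "within threshold"

# Usage: called by the monitoring job each time the rolling metric is refreshed.
mode, reason = select_mode(baseline_accuracy=0.90, rolling_accuracy=0.84,
                           current=Mode.FULL_AUTOMATION)
# -> Mode.HUMAN_IN_LOOP, "accuracy down 6.0%; human review tightened"
```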
Key Questions to Ask
- If your AI vendor changes their model tomorrow, how will you know and what will you do?
- How do users understand why the AI gave them a particular result?
- What metrics are you watching to detect if performance degrades, and who's responsible for responding?
- What's your backup plan if this vendor raises prices 10× or shuts down?
- Have you tested your rollback process—does it actually work?
Apply the Cross-Cutting Lenses
After evaluating the core criteria above, apply these two additional lenses to assess equity outcomes and evidence quality.
Equity & Safety Check
When evaluating Change Resilience through the equity and safety lens, assess whether system changes could disproportionately harm certain user groups.
Gate Assessment:
🟢 CONTINUE: Equity maintained across model updates, testing with diverse users before changes
🟡 ADJUST: Monitoring exists, drift detected with mitigation in progress
🔴 STOP: No monitoring by subgroup, or detected drift harming vulnerable users with no response
Check for:
□ Are performance changes tracked separately by subgroup (to catch if updates help some but hurt others)? (see the sketch after this list)
□ Could model drift affect accuracy differently for users with different language, dialect, or cultural context?
□ Are there rollback triggers if equity gaps widen after system changes?
□ Is there a named owner monitoring for disparate impact when models/prompts update?
□ Do users have a way to report when "the AI started acting differently" (feedback loop)?
□ Are safety incidents tracked by subgroup to detect if changes increase harm to vulnerable populations?
□ Is there a process to test changes with diverse users before full deployment?
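As one illustration of tracking performance separately by subgroup, the sketch below assumes the team already logs outcomes with a subgroup label; the log shape, subgroup names, and gap threshold are hypothetical.

```python
from collections import defaultdict

# Illustrative gap threshold: flag if any subgroup trails the overall success
# rate by more than 5 percentage points after a model/prompt update.
EQUITY_GAP_THRESHOLD = 0.05

def subgroup_success_rates(events: list[dict]) -> dict[str, float]:
    """events: [{"subgroup": "es-speakers", "success": True}, ...] (hypothetical log shape)."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # subgroup -> [successes, count]
    for e in events:
        totals[e["subgroup"]][0] += int(e["success"])
        totals[e["subgroup"]][1] += 1
    return {group: s / n for group, (s, n) in totals.items() if n > 0}

def equity_regression_alerts(events: list[dict]) -> list[str]:
    """Return human-readable alerts when any subgroup falls behind the overall rate."""
    if not events:
        return []
    rates = subgroup_success_rates(events)
    overall = sum(e["success"] for e in events) / len(events)
    return [
        f"{group}: {rate:.0%} vs overall {overall:.0%} (gap {(overall - rate):.0%})"
        for group, rate in rates.items()
        if overall - rate > EQUITY_GAP_THRESHOLD
    ]
```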
Evidence & Uncertainty Check
When evaluating Change Resilience through the evidence and uncertainty lens, assess whether they can detect and respond to performance degradation with evidence.
Quality Grade:
🅰️ A (Strong): Version control implemented, continuous monitoring with alerts, shadow testing before changes, documented rollback capability
🅱️ B (Moderate): Some monitoring, plan for handling changes, testing approach defined
🅲 C (Weak): No version tracking, no change detection, vendor-dependent with no contractual protections; high drift risk
Check for:
□ Is production tied to specific model/prompt versions with documented change logs?
□ Are performance metrics tracked continuously (not just at launch)?
□ Are there defined thresholds that trigger alerts when performance degrades?
□ Do they use shadow deployments or A/B tests before rolling out changes?
□ Is there baseline data to compare performance before/after changes (with statistical tests)? (see the sketch after this list)
□ Do they acknowledge uncertainty in how vendor changes might affect their system?
□ Are contractual protections in place (advance notice of changes, access to prior versions)?
□ Is there a documented rollback process with evidence it works?
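For the before/after comparison check, one lightweight option is a two-proportion z-test on success rates from a shadow deployment versus the current baseline, implemented here with only the standard library; the counts and significance level are placeholders, and other tests may fit a team's data better.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(success_a: int, n_a: int,
                         success_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference in success rates (baseline A vs. candidate B).

    Returns (z statistic, p-value). Kept dependency-free for the sketch; a team
    might use scipy or statsmodels instead.
    """
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical shadow-deployment comparison: 900/1000 baseline vs. 860/1000 candidate.
z, p = two_proportion_ztest(success_a=900, n_a=1000, success_b=860, n_b=1000)
if p < 0.05 and z < 0:
    print(f"Candidate looks worse than baseline (z={z:.2f}, p={p:.3f}); hold the rollout.")
```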
