Executive Summary
Many generative AI programs fail before deployment, not because the model is weak, but because the organization has not validated workflow fit, data quality, governance, or operating readiness. A readiness assessment should answer one business question with technical evidence: can we move from experimentation to repeatable production value?
NIST frames AI risk management around four operating functions: Govern, Map, Measure, and Manage. [1][2] Its Generative AI Profile extends that foundation for generative workloads, explicitly focusing on the distinct risks of GenAI systems and the actions organizations should take to manage them. [1][3] That is a useful structure because it prevents the common mistake of treating a GenAI pilot as only a model-selection exercise.
For executives, the assessment should produce an investment decision, a prioritized use-case portfolio, and a risk posture. For engineers, it should produce architectural constraints, evaluation criteria, integration requirements, security controls, and operational ownership.
What a Readiness Assessment Should Actually Prove
A serious readiness assessment does not ask whether employees like a chatbot demo. It asks whether a target workflow can be improved with acceptable risk, cost, latency, and operational complexity.
The assessment should prove six things:
- There is a measurable business outcome worth improving.
- The workflow contains reasoning or content-generation steps that are suitable for probabilistic systems.
- The organization can ground the model with current, authorized enterprise context.
- The platform can enforce security, observability, evaluation, and rollback.
- Human decision rights remain explicit where the cost of error is high.
- A team exists that can own the system after the pilot ends.
If any of those remain unresolved, the right decision is usually not "launch later." It is "narrow the scope until the system becomes governable."
The Seven Assessment Domains
1. Business Value and Workflow Fit
Start with a workflow, not with a model. Good candidates usually contain one or more of the following:
- High cognitive load and low physical complexity
- Repetitive knowledge retrieval or synthesis
- Large documentation surfaces
- Many handoffs caused by missing context
- Long cycle times caused by drafting, review, or triage
Poor candidates usually involve hard real-time control, strict deterministic correctness, or decisions that are regulated but currently undocumented.
The first deliverable should be a ranked use-case list with clear baseline metrics such as cycle time, cost per transaction, first-response time, escalation rate, analyst hours saved, or content throughput.
2. Process Design and Human Work Allocation
NIST notes that AI risk management requires a broad set of actors across the AI lifecycle and that AI risks differ from traditional software risks. [2] In practice, that means the assessment must examine who reviews outputs, who overrides them, who owns failure, and what happens when the model is uncertain.
For each workflow, define:
- Which steps remain human-owned
- Which steps can be model-assisted
- Which steps can be model-executed under policy
- What confidence or business rules trigger escalation
- What evidence must be stored for audit or review
This step matters because many failed pilots automate text generation but leave the approval path unchanged, which adds cost without reducing cycle time.
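The allocation questions above can be written down as an explicit routing policy. The following is a minimal sketch, not a prescribed design: the names `StepPolicy`, `route`, and `audit_record`, the confidence threshold, and the amount limit are all hypothetical and would come from your own workflow rules.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    HUMAN = "human-owned"
    REVIEW = "model-assisted, human review"
    AUTO = "model-executed under policy"


@dataclass(frozen=True)
class StepPolicy:
    name: str
    human_owned: bool      # step never leaves human hands
    min_confidence: float  # below this score, escalate to a reviewer
    max_amount: float      # business rule: larger amounts need review


def route(policy: StepPolicy, confidence: float, amount: float) -> Route:
    """Decide who handles one instance of a workflow step."""
    if policy.human_owned:
        return Route.HUMAN
    if confidence < policy.min_confidence or amount > policy.max_amount:
        return Route.REVIEW
    return Route.AUTO


def audit_record(policy: StepPolicy, confidence: float, amount: float) -> dict:
    """Evidence to store so the routing decision can be reviewed later."""
    return {
        "step": policy.name,
        "confidence": confidence,
        "amount": amount,
        "route": route(policy, confidence, amount).value,
    }


refund = StepPolicy("draft_refund_reply", human_owned=False,
                    min_confidence=0.8, max_amount=100.0)
print(audit_record(refund, confidence=0.65, amount=40.0))
```

Keeping the rules in data rather than in prompt text makes the escalation path reviewable, and it makes visible whether the approval path actually changed.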
3. Data and Knowledge Readiness
Generative AI systems are only as useful as the context you can safely provide them. Data readiness is not just a vector database question. It includes:
- Source system quality
- Access controls and entitlements
- Document freshness and duplication
- Metadata quality
- Content segmentation strategy
- PII, PHI, IP, and regulated content handling
- Citation and provenance expectations
Ask the engineering team to prove that the model can retrieve the right context for at least twenty representative tasks. If retrieval quality is weak, the problem is usually content architecture, metadata, or permissions, not prompt wording.
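One way to make that proof concrete is a hit-rate check over the representative tasks. The sketch below assumes each task lists the document IDs a reviewer considers relevant; `hit_rate_at_k`, `toy_retrieve`, and the toy index are illustrative stand-ins, not part of any particular retrieval library.

```python
from typing import Callable, Sequence


def hit_rate_at_k(tasks: Sequence[dict],
                  retrieve: Callable[[str], list],
                  k: int = 5) -> float:
    """Fraction of tasks with at least one relevant document in the top k."""
    if not tasks:
        raise ValueError("need at least one representative task")
    hits = 0
    for task in tasks:
        top_k = retrieve(task["query"])[:k]
        if set(top_k) & set(task["relevant_ids"]):
            hits += 1
    return hits / len(tasks)


# Toy stand-in for a real retriever: keyword lookup over a tiny index.
TOY_INDEX = {
    "refund": ["policy-12", "faq-3"],
    "warranty": ["policy-7"],
}


def toy_retrieve(query: str) -> list:
    for keyword, doc_ids in TOY_INDEX.items():
        if keyword in query.lower():
            return doc_ids
    return []


tasks = [
    {"query": "How long do refunds take?", "relevant_ids": ["policy-12"]},
    {"query": "Is water damage under warranty?", "relevant_ids": ["policy-9"]},
]
print(hit_rate_at_k(tasks, toy_retrieve))  # 0.5: the second task misses
```

In a real assessment the task list would hold the twenty or more representative tasks, and `retrieve` would run with the permissions of the user who would actually ask the question.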
4. Model and Evaluation Readiness
The NIST AI RMF emphasizes measurement as a core function. [2] In a GenAI program, that means you need an evaluation system before you need a scaling plan.
Assessment questions should include:
- What constitutes a good answer for this workflow?
- Can quality be judged with deterministic tests, human review rubrics, or model-based graders?
- How will hallucination, omission, policy violation, and unsafe tool use be detected?
- What is the acceptable error budget for the workflow?
- How will regression be detected when prompts, retrieval logic, or models change?
A lightweight but real evaluation harness should include a gold set of representative tasks, expected answer characteristics, rubrics for groundedness, completeness, and policy compliance, and pass-fail thresholds tied to business impact.
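A harness of this kind can start very small. The sketch below is one possible shape, assuming deterministic string checks for completeness and policy compliance; groundedness usually needs a human rubric or a model-based grader and is left out here. `GoldTask`, `score`, `evaluate`, and the threshold values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoldTask:
    prompt: str
    required_facts: list[str]   # expected answer characteristics
    forbidden_terms: list[str]  # phrases that violate policy


# Illustrative thresholds; real values should follow from business impact.
THRESHOLDS = {"completeness": 0.8, "policy_compliance": 1.0}


def score(task: GoldTask, answer: str) -> dict[str, float]:
    """Score one answer with simple deterministic checks."""
    text = answer.lower()
    found = sum(fact.lower() in text for fact in task.required_facts)
    completeness = found / len(task.required_facts) if task.required_facts else 1.0
    violated = any(term.lower() in text for term in task.forbidden_terms)
    return {"completeness": completeness,
            "policy_compliance": 0.0 if violated else 1.0}


def evaluate(gold_set: list[GoldTask],
             generate: Callable[[str], str]) -> tuple[dict[str, float], bool]:
    """Return mean scores per rubric and an overall pass-fail verdict."""
    totals = {name: 0.0 for name in THRESHOLDS}
    for task in gold_set:
        for name, value in score(task, generate(task.prompt)).items():
            totals[name] += value
    means = {name: total / len(gold_set) for name, total in totals.items()}
    passed = all(means[name] >= THRESHOLDS[name] for name in THRESHOLDS)
    return means, passed


gold_set = [
    GoldTask("Refund window?", ["30 days", "receipt"], ["guarantee"]),
    GoldTask("Warranty length?", ["12 months", "manufacturer"], ["guarantee"]),
]


def stub_generate(prompt: str) -> str:
    # Stand-in for the system under test.
    if "Refund" in prompt:
        return "Refunds are accepted within 30 days with a receipt."
    return "The warranty lasts 12 months."


print(evaluate(gold_set, stub_generate))
```

Running the same gold set after every prompt, retrieval, or model change is what turns this from a demo into a regression check.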
5. Platform and Integration Readiness
This is the domain where platform engineers, DevOps, and AI engineers usually uncover the real blockers. A production-capable GenAI platform needs more than an API key. Minimum questions:
- How will identity propagate from user to model call?
- Where will prompts, policies, and tool schemas be versioned?
- How will secrets be managed?
- What logging is permitted and where will redaction occur?
- What are the latency budgets for retrieval, model inference, tool execution, and post-processing?
- How will rate limits, retries, fallbacks, and circuit breakers be handled?
- How will environments differ across development, test, and production?
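The retry and fallback question in particular is easy to prototype. The following is a minimal sketch under stated assumptions: `TransientError` is a hypothetical marker for retryable failures such as rate limits or timeouts, `primary` and `fallback` are any callables that take a prompt, and a full circuit breaker (which would also track failure rates over time) is not shown.

```python
import time
from typing import Callable


class TransientError(Exception):
    """Hypothetical marker for retryable failures (rate limit, timeout)."""


def call_with_fallback(primary: Callable[[str], str],
                       fallback: Callable[[str], str],
                       prompt: str,
                       max_attempts: int = 3,
                       base_delay: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Retry the primary model with exponential backoff, then fall back."""
    for attempt in range(max_attempts):
        try:
            return primary(prompt)
        except TransientError:
            # Wait before the next attempt, but not after the last one.
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return fallback(prompt)


def always_rate_limited(prompt: str) -> str:
    raise TransientError("429")


def smaller_model(prompt: str) -> str:
    return f"[fallback] {prompt}"


delays: list[float] = []
result = call_with_fallback(always_rate_limited, smaller_model,
                            "Summarize ticket 42", sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; the recorded delays also show how much of the latency budget retries can consume.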
6. Risk, Security, and Compliance Readiness
NIST states that the AI RMF is intended to help organizations incorporate trustworthiness considerations into the design, development, use, and evaluation of AI systems. [1] Translate that into concrete controls:
- Prompt injection and indirect instruction handling
- Data leakage prevention and tenant isolation
- Output moderation and policy screening
- Role-based tool access
- Human approval gates for consequential actions
- Retention and deletion policies
- Incident response ownership
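Two of these controls, role-based tool access and human approval gates, can be expressed as a small deny-by-default check. This is an illustrative sketch only: the tool names, roles, and the `TOOL_POLICY` and `authorize` identifiers are assumptions, and a production system would back them with the organization's identity provider.

```python
from typing import Optional

# Hypothetical policy table: which roles may call which tool,
# and which tools require a named human approver.
TOOL_POLICY = {
    "search_kb": {"roles": {"agent", "supervisor"}, "needs_approval": False},
    "issue_refund": {"roles": {"supervisor"}, "needs_approval": True},
}


def authorize(tool: str, user_roles: set,
              approved_by: Optional[str] = None) -> str:
    """Return 'allow', 'pending_approval', or 'deny' for a tool call."""
    policy = TOOL_POLICY.get(tool)
    if policy is None:
        return "deny"  # unknown tools are denied by default
    if not (policy["roles"] & user_roles):
        return "deny"  # caller holds none of the permitted roles
    if policy["needs_approval"] and approved_by is None:
        return "pending_approval"  # consequential action waits for a human
    return "allow"


print(authorize("search_kb", {"agent"}))
print(authorize("issue_refund", {"agent"}))
print(authorize("issue_refund", {"supervisor"}))
print(authorize("issue_refund", {"supervisor"}, approved_by="j.doe"))
```

The check runs on the caller's roles, not the model's, so a prompt-injected instruction cannot grant the model a tool the user is not entitled to use.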
7. Operating Model and Skills Readiness
Many organizations have enough budget for a pilot but no team for production. You need named owners for:
- Use-case portfolio prioritization
- Prompt and workflow design
- Retrieval and knowledge quality
- Evaluation and release approval
- Platform reliability and cost control
- Responsible AI governance
- End-user training and feedback capture
A Practical Assessment Sequence
Use a four-stage sequence.
Stage 1: Frame the Decision
Define the business outcome, workflow boundary, risk tolerance, and the decision you want to make at the end of the assessment. An example decision: "Proceed to a production pilot for customer-support knowledge drafting."
Stage 2: Baseline the Current Workflow
Capture the current-state process in enough detail to measure improvement later: trigger, inputs, systems touched, human roles, decision points, outputs, failure points, and metrics. Do not skip this step.
Stage 3: Build a Narrow Working Prototype
The prototype should test one bounded workflow with real enterprise context and realistic permissions. It should include retrieval, prompt logic, an evaluation loop, logging, and a simple human-review step.
Stage 4: Score and Decide
Score each use case across the seven domains using a simple rubric such as Green, Yellow, or Red.
| Domain | Green looks like | Yellow looks like | Red looks like |
|---|---|---|---|
| Business value | Clear KPI and sponsor | Useful but not tied to metric | Novelty project |
| Process fit | Clear assistive or agentic boundary | Human workflow unclear | Workflow unsuitable |
| Data | Authorized, fresh, structured enough | Retrieval possible with cleanup | Data access or quality broken |
| Evaluation | Gold set and thresholds exist | Rubric exists but weak coverage | Demo-only judgment |
| Platform | Secure path to production is known | Some controls missing | No viable production path |
| Risk | Risks documented with controls | Controls partial | Unknown or unowned risks |
| Operating model | Named owners and budget | Temporary staffing only | No durable ownership |
The output should be one of four decisions: Stop, Re-scope, Pilot, or Scale.
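The scorecard can be turned into a decision mechanically once the mapping rule is agreed. The text does not define that rule, so the one below is purely an assumption for illustration: a Red in business value or process fit stops the use case, any other Red forces a re-scope, any Yellow limits it to a pilot, and all Green permits scaling. The `DOMAINS` keys and the `decide` function are hypothetical names.

```python
DOMAINS = (
    "business_value", "process_fit", "data", "evaluation",
    "platform", "risk", "operating_model",
)


def decide(scores: dict) -> str:
    """Map a seven-domain scorecard to Stop, Re-scope, Pilot, or Scale.

    The mapping is an illustrative assumption, not a rule from the framework.
    """
    missing = [d for d in DOMAINS if d not in scores]
    if missing:
        raise ValueError(f"unscored domains: {missing}")
    ratings = [scores[d] for d in DOMAINS]
    if scores["business_value"] == "red" or scores["process_fit"] == "red":
        return "Stop"      # no value or no workflow fit: nothing to rescue
    if "red" in ratings:
        return "Re-scope"  # a blocking gap exists in a fixable domain
    if "yellow" in ratings:
        return "Pilot"     # viable, but not all controls are mature
    return "Scale"


scorecard = dict.fromkeys(DOMAINS, "green")
scorecard["evaluation"] = "yellow"
print(decide(scorecard))  # Pilot
```

Whatever rule is chosen, writing it down before scoring keeps the decision from being negotiated after the results are in.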
Common Failure Modes
The most common readiness mistakes are predictable:
- Starting with a broad enterprise chatbot instead of a specific workflow
- Confusing model quality with system quality
- Ignoring retrieval and permissions until late in the project
- Running pilots without evaluation baselines
- Treating governance as an approval step instead of a design input
- Assuming human review fixes a broken process automatically
- Funding experimentation without funding operational ownership
What the Final Assessment Package Should Contain
- Prioritized use-case portfolio
- Current-state and target-state workflow map
- Readiness scorecard across the seven domains
- Technical architecture sketch
- Evaluation plan and sample benchmark set
- Risk register with owners
- Cost and latency assumptions
- Recommendation: stop, re-scope, pilot, or scale
Closing View
The real purpose of a readiness assessment is not to prove that generative AI is exciting. It is to prove that a specific business outcome can be improved with controlled risk and repeatable operations. If you do that rigorously, feasibility turns into functionality. If you skip it, functionality turns back into theater.
That sequencing matters. If you have read "Architecting AI-First Business Processes: Technical Considerations in Generative AI Strategy," you should already know that workflow redesign only becomes credible after readiness work has identified which use cases are viable, governable, and measurable. [4]
References
[1] NIST, "AI Risk Management Framework," https://www.nist.gov/itl/ai-risk-management-framework
[2] NIST AI RMF Knowledge Base, "AI Risk Management Framework," https://airc.nist.gov/AI_RMF_Knowledge_Base/AI_RMF
[3] NIST, "Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile," https://doi.org/10.6028/NIST.AI.600-1
[4] Series Part 2, "Architecting AI-First Business Processes: Technical Considerations in Generative AI Strategy," ./ai-first-business-processes-strategy.md
[5] Series Part 3, "Choosing the Right Generative AI Framework: A Technical Comparison of OpenAI, Anthropic, and Stability AI APIs," ./generative-ai-framework-comparison-openai-anthropic-stability.md


