Prioritizing AI use cases for operators

Why most AI use-case lists never get built

Most teams do not have an AI problem. They have a prioritization problem dressed up in new vocabulary. The list of candidate projects grows — a ticket-triage agent, a RAG system on the knowledge base, a forecasting model, an internal copilot — and none of them ship, because nobody inside the company has a trusted way to decide which one goes first. Engineering capacity gets taxed not by the work, but by the indecision.

Call it the prioritization tax: the cost of carrying ten half-scoped AI ideas through quarterly reviews while shipping none of them. It compounds. Each quarter the list gets longer. Each quarter the team gets more tired of talking about AI and less interested in actually building it. The failure mode is not technical. The failure mode is organizational — a team that cannot decide between options because no option has been scored against a consistent rubric.

The fix is not another brainstorm. Another brainstorm is how you got here. The fix is a scoring framework that anyone in the room can follow, argue with, and reach the same ranking from independently. The point is not to be objective — no framework is objective — but to be legible: you can read the score, see why each candidate ranked where it did, and change an input if you disagree. Opinions become arguments about weights instead of arguments about gut feel.

This guide walks through the framework we use on engagements. It works whether you have four candidates or forty, whether your team has shipped AI before or never. It does not replace judgment. It forces judgment into a place where you can see it.

The four signals that predict AI project success

Four inputs predict whether an AI project ships and produces value. They are boring inputs. That is the point. Novelty is not one of them.

Value to the business. Revenue it generates, cost it removes, risk it mitigates, or some combination. Measured in the units your CFO uses — dollars, hours saved at a specific hourly rate, SLA breaches avoided. Not “transformation” or “strategic alignment.” If you cannot express the value in a sentence that names a number and a timeframe, the candidate is not ready for scoring — it is ready for more discovery.

Data readiness. Does the data required to make the system work exist, in a queryable location, with sufficient quality and coverage? This is the signal most teams underweight, because it is unsexy and it kills the projects the loudest voices want to build. A brilliant retrieval system on top of a ticket database that is 40% empty fields will produce brilliant answers about the 60% of reality it can see. Score data readiness honestly or score the project too high.
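
One way to ground a data readiness score is to measure how often the fields a project depends on are actually populated. The sketch below is a minimal illustration, not part of the framework itself: the `fill_rates` helper, the field names, and the sample tickets are all hypothetical.

```python
from typing import Any


def fill_rates(records: list[dict[str, Any]], fields: list[str]) -> dict[str, float]:
    """Return the share of records with a non-empty value for each field."""
    total = len(records)
    rates = {}
    for field in fields:
        # Treat missing keys, None, and empty strings as "empty".
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        rates[field] = filled / total if total else 0.0
    return rates


# Hypothetical sample of ticket records, for illustration only.
tickets = [
    {"id": 1, "category": "billing", "resolution": "refund issued"},
    {"id": 2, "category": "", "resolution": None},
    {"id": 3, "category": "login", "resolution": ""},
    {"id": 4, "category": "billing", "resolution": "password reset"},
    {"id": 5, "category": None, "resolution": "escalated"},
]

rates = fill_rates(tickets, ["category", "resolution"])
for field, rate in rates.items():
    print(f"{field}: {rate:.0%} populated")
```

Fill rate is only one facet of readiness. Staleness, coverage, and access across systems still need a human look before the score is final.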

Feasibility given the current team. Not “feasibility in general.” Can the people you have, in the time available, ship a working version of this system? A project that is technically easy but requires skills your team does not have and you cannot hire in time is not feasible. A project that requires building a novel agentic workflow is only feasible if someone on the team has shipped one before or is genuinely willing to fail in public while learning.

Strategic fit with what ships next. Does this project support the next six months of roadmap, or does it sit orthogonal to it? A well-scored AI project that has no business connection to the rest of the product is not a wrong project — it is a project for a different company at a different time. The fit dimension keeps the scoring anchored to the business, not to the enthusiasm of whoever advocated loudest.

These four signals are not exhaustive. They are the minimum. Teams that score well on all four tend to ship. Teams that score well on one or two and treat the others as “details” tend to produce demo footage and not much else.

A scoring framework for ranking your AI candidates

The framework is a matrix. Each candidate project gets a row. Each of the four signals gets a column. Each cell gets a score from 1 to 5, with 3 as “acceptable, not exceptional.” A weight row at the top, one weight per signal, lets you calibrate: if data readiness is the thing your team keeps underestimating, weight it 2× and let the math push that mistake out of your shortlist. A candidate’s weighted total is the sum of each score multiplied by its signal’s weight.

Here is how it runs in practice. We use a real engagement to make it concrete: Acme, a B2B SaaS company whose customer success team was spending two days of manual research on every new enterprise onboarding. The candidate list had six entries. We will walk through two of them.

Candidate 1: Ticket-triage agent. Value: 3. Triage was not the bottleneck; onboarding research was. Cost savings existed, but they did not relieve the constraint. Data readiness: 4. Ticket data was clean and queryable. Feasibility: 4. Standard LLM-plus-routing architecture. Strategic fit: 2. Would not unblock the next six months of onboarding work. Weighted total, with all four weights at 1×: 3 + 4 + 4 + 2 = 13.

Candidate 2: Onboarding research retrieval (the eventual Acme RAG platform). Value: 5. Two-day research cycles were a direct revenue bottleneck; the CS team was the constraint on expansion rollout. Data readiness: 4. Prior tickets, contract terms, integration history, and product usage were all queryable across four systems with some plumbing. Feasibility: 4. Retrieval architecture was known; the team had a strong backend engineer willing to own it. Strategic fit: 5. Mapped directly to the “expansion revenue unlock” roadmap theme for the quarter. Weighted total, with all four weights at 1×: 5 + 4 + 4 + 5 = 18.
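
The arithmetic behind the two totals can be sketched in a few lines. The scores below are the ones given for the two Acme candidates. The equal weights are an inference from the stated totals, and the `weighted_total` helper and the 2× data readiness variant are illustrative assumptions, not the engagement's actual tooling.

```python
SIGNALS = ("value", "data_readiness", "feasibility", "strategic_fit")


def weighted_total(scores: dict[str, int], weights: dict[str, float]) -> float:
    """Sum of each 1-5 signal score multiplied by that signal's weight."""
    for signal in SIGNALS:
        if not 1 <= scores[signal] <= 5:
            raise ValueError(f"{signal} score must be between 1 and 5")
    return sum(scores[s] * weights[s] for s in SIGNALS)


# Scores as given for the two Acme candidates.
candidates = {
    "Ticket-triage agent": {
        "value": 3, "data_readiness": 4, "feasibility": 4, "strategic_fit": 2,
    },
    "Onboarding research retrieval": {
        "value": 5, "data_readiness": 4, "feasibility": 4, "strategic_fit": 5,
    },
}

# Assumption: equal weights, which is what reproduces the totals of 13 and 18.
equal_weights = {s: 1 for s in SIGNALS}
# Hypothetical calibration: double the weight on data readiness.
data_heavy_weights = {**equal_weights, "data_readiness": 2}

totals_equal = {n: weighted_total(s, equal_weights) for n, s in candidates.items()}
totals_data_heavy = {n: weighted_total(s, data_heavy_weights) for n, s in candidates.items()}

ranking = sorted(candidates, key=lambda n: totals_equal[n], reverse=True)
print(ranking, totals_equal, totals_data_heavy)
```

With data readiness weighted 2×, both totals rise by 4 (to 17 and 22) because both candidates scored 4 on that signal, so the order does not change. Weights only reorder candidates that differ on the weighted signal.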

The matrix does not make the decision. It produces a ranked list that makes the decision obvious when the gap is wide, and forces a real conversation when the gap is narrow. Projects within 2 points of each other are genuinely close calls — the next step for those is not more scoring, it is field research on whichever dimension is muddiest.
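
The 2-point rule is easy to apply mechanically once totals exist. This is a minimal sketch under stated assumptions: the candidate names and totals are hypothetical, the `close_calls` helper is illustrative, and "within 2 points" is read as inclusive.

```python
from itertools import combinations


def close_calls(totals: dict[str, float], gap: float = 2) -> list[tuple[str, str]]:
    """Return pairs of candidates whose weighted totals differ by at most `gap`."""
    ranked = sorted(totals, key=totals.get, reverse=True)
    return [
        (a, b)
        for a, b in combinations(ranked, 2)
        if abs(totals[a] - totals[b]) <= gap
    ]


# Hypothetical weighted totals, for illustration only.
totals = {"Project A": 18, "Project B": 17, "Project C": 13, "Project D": 12}

pairs = close_calls(totals)
for a, b in pairs:
    print(f"Close call: {a} vs {b}. Do field research before choosing.")
```

Each flagged pair is a prompt for field research on the muddiest dimension, not a request to rescore.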

The output of this phase is a ranked list. Not a prioritized list yet — ranked. Prioritization factors in the sequencing considerations in the next section. But the ranking is the legible artifact: six (or ten, or twenty) candidate projects with four scores each and a weighted total, in an order anyone on the team can defend.

The framework also informs how we weight projects for ROI measurement later — once a project ships, the same four signals become the axes against which you measure whether the investment paid off. Scoring them poorly up front makes the ROI question impossible to answer cleanly later.

Sequencing the shortlist: which project becomes the first build

Ranking is not sequencing. The top-scored project is not automatically the first one to build. Sequencing factors in three additional inputs the scoring matrix does not capture.

Dependencies. If the top-ranked project requires infrastructure or data pipelines the team has not built yet, and the third-ranked project can ship with what exists today, the third-ranked project ships first. The dependency work happens in parallel, and the top-ranked project moves into the second slot. This is a sequencing rule, not a re-ranking: the top project is still the top project; it just ships second.

Organizational readiness. How much change is the team absorbing right now? If customer success just rolled out a new CRM in the same quarter, dropping a retrieval system into their workflow immediately is asking for a rejection regardless of how well the system works. “Good project, wrong month” is a real failure mode. Sequence around it.

The load-bearing criterion. The first AI project the organization ships sets the prior for every AI project after it. If the first project is technically correct but produces no visible business outcome — the dashboard works, the metric moves 2%, nobody cares — the second project starts with a headwind. The first build should be load-bearing: when it ships, someone at the exec level can point to a specific thing that got materially better. Pick a project that can clear this bar.

The Northwind engagement is a useful illustration of this. The top-scored project on the ranked list was an ambitious multi-step operations agent. The project that ended up shipping first — an internal copilot that operations actually used — was ranked second. The difference: Northwind had two prior failed AI pilots, so the sequencing criterion shifted. What the team needed first was a visible win, not the most sophisticated system. The agent stayed on the roadmap; the copilot shipped. Six weeks later, the copilot had 82% weekly active adoption and the room had changed its mind about AI. Then the agent became buildable.
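
The three sequencing inputs can be sketched as a filter applied to the ranked list: walk down the ranking and take the first candidate that clears all three. Everything in this sketch is hypothetical, including the `Candidate` fields and the example shortlist. The boolean flags stand in for judgment calls that a real team would argue through, so treat it as an illustration of the logic rather than a replacement for the conversation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Candidate:
    name: str
    rank: int  # 1 = top of the ranked list
    unmet_dependencies: list[str] = field(default_factory=list)
    team_can_absorb_now: bool = True   # organizational readiness
    visible_outcome: bool = True       # load-bearing: can an exec point to what improved?


def first_build(candidates: list[Candidate]) -> Optional[Candidate]:
    """Pick the highest-ranked candidate that clears all three sequencing inputs."""
    for c in sorted(candidates, key=lambda c: c.rank):
        if not c.unmet_dependencies and c.team_can_absorb_now and c.visible_outcome:
            return c
    return None  # nothing is ready; fix dependencies or timing first


# Hypothetical shortlist, for illustration only.
shortlist = [
    Candidate("Project A", rank=1, unmet_dependencies=["usage-data pipeline"]),
    Candidate("Project B", rank=2, team_can_absorb_now=False),
    Candidate("Project C", rank=3),
]

choice = first_build(shortlist)
print(choice.name if choice else "No candidate is ready to build")
```

The ranks are never modified. The function only decides what ships first, which keeps sequencing separate from ranking.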

Once the first project is sequenced, the next decision is whether to build it or buy it — which is the subject of a separate pillar, because the scoring that picked the project does not automatically tell you whether the right answer is to ship in-house, pay for a SaaS solution, or stitch both together.

Where operators get stuck: four common prioritization mistakes

Even with the framework, the same four mistakes recur across engagements. Name them up front so they do not derail the scoring round.

Over-weighting novelty. The project that sounds the coolest at the exec readout is rarely the project that produces the most value. Novelty is not a signal — it is a bias. If a candidate scores high on value, data readiness, feasibility, and fit but also happens to be unexciting, it is probably the right first build. Unexciting projects that ship and move metrics are how teams earn the right to build the exciting ones later.

Under-weighting data readiness. This is the most common failure mode, and it is specifically dangerous because it is invisible until the project is already staffed. A team scopes a brilliant system, assigns an engineer, runs for four weeks, and discovers the input data is 40% stale and riddled with legacy edge cases. By that point, sunk cost makes the project hard to kill. The fix is to score data readiness harshly — when in doubt, score it one lower. If the candidate still ranks high, trust the ranking.
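
The "score it one lower" rule can be run as a quick sensitivity check: knock a point off a candidate's data readiness score and see whether its rank survives. The sketch below reuses the two Acme candidates' scores. The `rank_after_penalty` helper is hypothetical, and it assumes equal weights on all four signals.

```python
def rank_after_penalty(
    candidates: dict[str, dict[str, int]],
    name: str,
    signal: str = "data_readiness",
    penalty: int = 1,
) -> int:
    """Return the 1-based rank of `name` after lowering one of its scores."""
    # Copy the scores so the original matrix is left untouched.
    adjusted = {n: dict(scores) for n, scores in candidates.items()}
    # Scores bottom out at 1, the lowest value on the scale.
    adjusted[name][signal] = max(1, adjusted[name][signal] - penalty)
    # Assumption: equal weights, so the total is a plain sum of the four scores.
    ordered = sorted(adjusted, key=lambda n: sum(adjusted[n].values()), reverse=True)
    return ordered.index(name) + 1


candidates = {
    "Ticket-triage agent": {
        "value": 3, "data_readiness": 4, "feasibility": 4, "strategic_fit": 2,
    },
    "Onboarding research retrieval": {
        "value": 5, "data_readiness": 4, "feasibility": 4, "strategic_fit": 5,
    },
}

new_rank = rank_after_penalty(candidates, "Onboarding research retrieval")
print(f"Rank after scoring data readiness one lower: {new_rank}")
```

If the rank holds after the penalty, the ranking does not hinge on an optimistic data readiness score.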

Letting the loudest stakeholder anchor the list. The candidate list is supposed to be the product of systematic discovery. It often ends up being the list of projects the CTO, the VP of Product, or the enthusiastic AI champion is personally excited about. Those candidates are not wrong to include — but the list should also contain the projects nobody is excited about that would actually move the business. If every candidate on the list was proposed by the same person, the discovery phase failed.

Treating the first AI project as a research investment. “Let’s learn by building something small and see what happens” is a strategic frame for teams with slack and a long runway. Most companies do not have that. The first AI project is not a research budget — it is a commitment to production. If you go in willing to ship something that nobody uses, you will ship something nobody uses. Scope the first project to produce a real outcome, or scope something else.

How we run AI use-case prioritization engagements

Most of what is above is public. The framework is legible on purpose — we would rather a team run it well themselves than hire us to run it badly externally. The reason teams still engage us on prioritization is narrower than “we have a framework”: the scoring dimension that most often breaks in-house is feasibility, and scoring feasibility credibly requires someone who has shipped the kind of system being evaluated.

Our prioritization engagements run two to three weeks. The first week is discovery — we interview the people who would own each candidate project, we read the data the project would run on, and we kill any candidate where the feasibility or readiness signal is too weak to continue. By the end of week one, the list is typically half its original size. Week two runs the scoring, produces the ranked list, and walks the team through the sequencing considerations specific to their organization. Week three, if needed, writes the build plan for the first project — not a strategy document, a concrete engineering plan with architecture, dependencies, owners, and a four-to-eight-week delivery target.

The output is a ranked and sequenced plan with a clear first deliverable. The team owns it. Whether we also build it is a separate decision, and it runs through the build-vs-buy framing in its own conversation. The prioritization work stands on its own.

If you have ten AI ideas on a roadmap and no legible way to decide which one ships first, book a 30-minute assessment. We will walk through the four signals against your top candidates on the call. If the framework is enough, you will leave with a head start and no invoice. If the feasibility dimension is the sticking point, we will have a productive conversation about what an engagement looks like.

FAQ

How many use cases should we evaluate at once? Six to ten is the sweet spot. Fewer misses patterns — a matrix with three rows is just a short list with weights on it. More dilutes attention and invites framework gaming, where candidates are added to create favorable comparisons for a preferred project. Trim aggressively during discovery.

What if our data isn’t ready for any of the top candidates? Data readiness is one of the four signals specifically because this is the default state. Ship the highest-value project you CAN execute on today, plan the data work as a parallel track, and revisit the ranking in a quarter. Waiting until the data is perfect is how teams spend two years not shipping AI.

Should we prioritize quick wins or strategically important projects first? This is a false choice when the framework is working. Quick wins that score low on value or fit are wasted slots — they ship, they produce no meaningful outcome, and they consume the slot a real project could have used. Strategic projects that score low on feasibility are the inverse failure. The matrix forces you to recognize both patterns as the same mistake: ignoring one of the four signals.

How long does a prioritization engagement take? Two to three weeks with outside help for six to ten candidates, from kickoff to a ranked and sequenced plan. That is deliberately longer than a workshop and shorter than a strategy consulting engagement.

What happens if the top-scored project fails? The sequencing step factors in organizational readiness, so the top-scored project is unlikely to fail on non-technical grounds. If it fails technically, the scores on the remaining candidates still inform the next pick. The ranking produces a bench, not a single bet.

Can we run this ourselves without hiring a consultant? Yes. The framework is public, this pillar is public, and we would rather you run it well in-house than hire us to run it badly externally. Where we add value is when the team doesn’t have someone technical enough to score the feasibility dimension credibly.