Most AI bills have 60% waste. We find it.
A 1–2 week assessment that decomposes your AI spend, sizes every optimization lever, and hands back a 90-day plan. Typical finding: 55–75% reduction at constant volume.
$10,000–$20,000 · One to two weeks · T&M with cap
A spend-decomposition treemap showing today's AI bill (one large block: "Frontier model, default routing") shrinking into a smaller block ("Right-sized model + cached + batched") with the difference labeled "$X recovered." Designed to be screenshotted by a CFO.
You shipped AI. Finance is asking what it costs at 10x. Your team is guessing.
You shipped an AI feature. It works. The bill arrived, and now finance is asking “what does this cost at 10x?” — and the team that built it is guessing.
The honest answer is that most production AI workloads in 2026 leave 60–80% of their bill on the floor. Teams tuned a prompt to hit a quality bar on a single frontier model and never built the operational layer underneath. No prompt caching (or caching with a broken cache key). No tier routing (every request hits the most expensive model regardless of complexity). No batch API for the workloads that don’t need real-time. No vendor-mix evaluation — you’re paying retail to one vendor when 80% of traffic could go elsewhere for 30% of the cost.
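The vendor-mix figure above can be checked with a few lines of arithmetic. This is a minimal sketch, not BST's model: it assumes cost is spread evenly across requests (so traffic share equals cost share), and the function name `blended_cost_fraction` is made up for illustration.

```python
def blended_cost_fraction(share_moved: float, relative_cost: float) -> float:
    """Fraction of the original bill that remains after rerouting traffic.

    share_moved:   fraction of traffic moved to a cheaper model or vendor (0..1)
    relative_cost: cost of the moved traffic relative to today's cost (0.30 = 30%)

    Assumption: every request costs the same today, so traffic share
    equals cost share. Real workloads are rarely this uniform.
    """
    if not 0.0 <= share_moved <= 1.0:
        raise ValueError("share_moved must be between 0 and 1")
    if relative_cost < 0.0:
        raise ValueError("relative_cost must be non-negative")
    stays = 1.0 - share_moved             # traffic left on the current model
    moves = share_moved * relative_cost   # rerouted traffic at its new price
    return stays + moves


# The figures quoted above: 80% of traffic at 30% of the cost.
remaining = blended_cost_fraction(share_moved=0.80, relative_cost=0.30)
reduction = 1.0 - remaining
print(f"remaining bill: {remaining:.0%}, reduction: {reduction:.0%}")
# prints "remaining bill: 44%, reduction: 56%"
```

Under that assumption, vendor mix alone accounts for roughly a 56% reduction, before caching or batching are counted.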
Vendor-side optimization advisors won’t fix this. They optimize within their own product — and the largest single lever in BST’s playbook is often moving 80% of traffic away from the vendor recommending the optimization. You need a vendor-neutral read.
The fee is small enough relative to the savings that a published price closes faster than any “request quote” gate would. $10K to find $400K/year of waste is a defensible line item on any CFO’s slide deck.
What you get
-
Cost baseline report
Current monthly spend decomposed by workload, model, vendor, and feature. The Pareto: typically 80% of cost concentrates in 2–3 workloads.
-
Optimization roadmap
Per workload, the levers that apply (routing, caching, batching, vendor mix, self-hosting) ranked by projected impact. Each lever sized in dollars, not hand-waves.
-
ROI calculator (yours to keep)
Populated spreadsheet projecting current vs. optimized cost at 0.5x / 1x / 2x volume. Includes payback period, sensitivity, and 12-month savings.
-
90-day implementation plan
Sequenced week-by-week. What ships first (always: caching + routing). What needs architectural change. What requires vendor migration. What needs a kill-switch first.
-
Walkthrough call
Engineering and finance both attend; we walk every assumption. Sensitivity ranges and decision points called out explicitly.
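To make the ROI calculator concrete, here is a rough sketch of the projection it describes. It is not the delivered spreadsheet. It assumes spend scales linearly with volume and that one reduction rate applies at every volume. The input figures ($50K/month baseline, 55% reduction, $14K one-time cost) are hypothetical placeholders, and the one-time cost covers the assessment fee only, not implementation effort.

```python
from dataclasses import dataclass


@dataclass
class Projection:
    volume_multiplier: float
    current_monthly: float
    optimized_monthly: float

    @property
    def monthly_savings(self) -> float:
        return self.current_monthly - self.optimized_monthly

    @property
    def twelve_month_savings(self) -> float:
        return 12 * self.monthly_savings


def project(baseline_monthly: float, reduction: float,
            multipliers=(0.5, 1.0, 2.0)) -> list:
    """Current vs. optimized monthly cost at each volume multiplier.

    Assumption: cost is linear in volume and the reduction rate is constant.
    """
    return [
        Projection(m, baseline_monthly * m, baseline_monthly * m * (1.0 - reduction))
        for m in multipliers
    ]


def payback_months(one_time_cost: float, monthly_savings: float) -> float:
    """Months of savings needed to recover a one-time cost."""
    if monthly_savings <= 0:
        return float("inf")
    return one_time_cost / monthly_savings


# Hypothetical inputs, for illustration only.
BASELINE_MONTHLY = 50_000.0
REDUCTION = 0.55
ONE_TIME_COST = 14_000.0

rows = project(BASELINE_MONTHLY, REDUCTION)
for row in rows:
    print(
        f"{row.volume_multiplier}x: current ${row.current_monthly:,.0f}, "
        f"optimized ${row.optimized_monthly:,.0f}, "
        f"12-month savings ${row.twelve_month_savings:,.0f}, "
        f"payback {payback_months(ONE_TIME_COST, row.monthly_savings):.2f} months"
    )
```

A sensitivity pass amounts to re-running `project` with a low and a high `reduction` value and comparing the resulting rows.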
How it works
-
Workload mapping
Days 1–2 · Deliverable: Spend decomposed; workload boundaries identified
We read your bills and your code. Spend decomposed; workload boundaries identified. The Pareto comes out of this phase — typically 80% of cost concentrates in 2–3 workloads.
-
Lever analysis
Days 3–4 · Deliverable: Routing, caching, batching, vendor-mix evaluations
Routing, caching, batching, vendor-mix evaluations. Cache hit rates measured against your actual traffic, not a benchmark. The most common finding among "we already cache" clients is a cache-hit-rate problem — usually a variable in the prefix invalidating cache on every request.
-
Modeling & roadmap
Days 5–6 · Deliverable: ROI projection, sensitivity analysis, 90-day plan
ROI projection, sensitivity analysis, 90-day plan. Every lever sized in dollars with explicit assumptions. If the assumptions hold and the levers are pulled correctly, the math is the math.
-
Delivery
Day 7 · Deliverable: Walkthrough + handoff of all four artifacts
Walkthrough call with engineering and finance. Handoff of cost baseline, optimization roadmap, ROI calculator, and 90-day plan. Multi-vendor profiles extend through week 2.
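The cache-hit-rate problem mentioned under lever analysis (a variable in the prefix invalidating the cache on every request) can be reproduced with a toy simulation. This is a simplified sketch, not how any particular vendor implements prompt caching: real systems match on token prefixes, whereas this hashes the first 64 characters. The system prompt, the timestamp format, and the helper names are all hypothetical.

```python
import hashlib


def prefix_key(prompt: str, prefix_chars: int = 64) -> str:
    """Hash the leading characters of a prompt, standing in for a prefix cache key."""
    return hashlib.sha256(prompt[:prefix_chars].encode("utf-8")).hexdigest()


def hit_rate(prompts, prefix_chars: int = 64) -> float:
    """Fraction of prompts whose prefix key was already seen earlier in the list."""
    if not prompts:
        return 0.0
    seen = set()
    hits = 0
    for prompt in prompts:
        key = prefix_key(prompt, prefix_chars)
        if key in seen:
            hits += 1
        else:
            seen.add(key)
    return hits / len(prompts)


# Hypothetical static instructions shared by every request.
SYSTEM = (
    "You are a support assistant for an example product. "
    "Answer briefly and cite the policy section."
)

# Broken: a per-request timestamp sits at the start, so every prefix differs.
broken = [
    f"Request time: 2026-01-01T00:00:{i:02d}Z\n{SYSTEM}\nUser: question {i}"
    for i in range(10)
]

# Fixed: static text first, variable content after it.
fixed = [
    f"{SYSTEM}\nRequest time: 2026-01-01T00:00:{i:02d}Z\nUser: question {i}"
    for i in range(10)
]

broken_rate = hit_rate(broken)
fixed_rate = hit_rate(fixed)
print(f"broken prefix hit rate: {broken_rate:.0%}")
print(f"fixed prefix hit rate:  {fixed_rate:.0%}")
```

In this toy setup, the only difference between the two lists is where the timestamp sits. Moving it behind the static instructions is what lets repeated requests share a prefix.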
Pricing
Engagement model: Tiered scope
Compact tier (single workload, single vendor, < $20K/mo) lands at $10K / 1 week. Standard tier (2–4 workloads, single vendor, $20K–$100K/mo) lands at $14K / 1.5 weeks. Multi-vendor tier (mixed-vendor, > $100K/mo) lands at $18K–$20K / 2 weeks. Time-and-materials with a not-to-exceed cap. If we land in 4 days, you pay for 4 days. Decline criterion: if your AI spend is below $5K/month, we'll tell you the engineering effort exceeds the savings.
Anchor pricing reflects typical engagement ranges. Actual fees are scoped per engagement under time-and-materials with a not-to-exceed cap. Pricing shown does not constitute a binding offer.
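The time-and-materials cap and the spend bands above can be sketched as follows. This is illustrative only and not a quote. The $2,000 day rate is an assumption, derived by spreading the $10K Compact cap over five working days. The tier picker looks at monthly spend alone, whereas actual scoping also weighs workload and vendor count.

```python
from typing import Optional


def capped_fee(days_worked: float, day_rate: float, cap: float) -> float:
    """Time-and-materials fee with a not-to-exceed cap."""
    if days_worked < 0 or day_rate < 0:
        raise ValueError("days_worked and day_rate must be non-negative")
    return min(days_worked * day_rate, cap)


def pick_tier(monthly_spend: float) -> Optional[str]:
    """Map monthly AI spend to a tier name using the spend bands only.

    Returns None below the $5K/month decline threshold.
    Simplification: workload count and vendor mix are ignored here.
    """
    if monthly_spend < 5_000:
        return None
    if monthly_spend < 20_000:
        return "compact"
    if monthly_spend <= 100_000:
        return "standard"
    return "multi-vendor"


# Hypothetical day rate: $10K cap spread over an assumed 5-day week.
DAY_RATE = 2_000.0
COMPACT_CAP = 10_000.0

early_finish = capped_fee(days_worked=4, day_rate=DAY_RATE, cap=COMPACT_CAP)
overrun = capped_fee(days_worked=7, day_rate=DAY_RATE, cap=COMPACT_CAP)
print(f"finished in 4 days: ${early_finish:,.0f}")
print(f"ran to 7 days:      ${overrun:,.0f}")
print(f"tier at $50K/month: {pick_tier(50_000)}")
```

Finishing early lowers the fee and an overrun stops at the cap, which is the behavior the pricing paragraph describes.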
Frequently asked questions
How long does this take?
What does this cost?
- Compact tier: $10K / 1 week (single workload, single vendor, < $20K/mo)
- Standard tier: $14K / 1.5 weeks (2–4 workloads, single vendor, $20K–$100K/mo)
- Multi-vendor tier: $18K–$20K / 2 weeks (mixed-vendor, > $100K/mo)
Why not a fixed price?
We're already on Vercel AI Gateway / OpenRouter / LiteLLM. Does that change anything?
We are regulated (HIPAA, SOC 2, ITAR). Can you do this?
What if your projected savings don't materialize?
Will this find anything if we already cache?
- No tier routing — every request hits the most expensive model
- Batch API unused for non-realtime workloads
- No vendor-mix evaluation — paying retail to one vendor when 80% of traffic could go elsewhere
What if our spend is below $5K/month?
Often combined with
-
AI Security Review
The wedge pair. Cost + Security under one MSA is the most common BST entry-point combination. Different findings, complementary fix paths.
-
Healthcare AI Compliance Review
When the workloads touch PHI. Cost optimization with a BAA-aware substrate evaluation.
Ready to know what your AI bill could be?
One to two weeks. Vendor-neutral. ROI calculator yours to keep. Scoping call books a 30-minute slot with a principal.