Case Study · April 30, 2025 · 9 min read

From ₹39L to ₹23L/Month: How a Series B AI Startup Cut Their LLM Bill 41% in Two Weeks

Zero feature changes. Zero model quality degradation. Zero additional engineers. Just attribution data and two weeks of surgical fixes. Here is the full breakdown.

About this case study

This is a real TokenFin private beta engagement. The company name and identifiable details have been anonymised at their request. All numbers are real and unrounded. The CFO reviewed this post before publication.

The situation on day 1

Team size: 45 people (Series B, B2B SaaS)

Monthly LLM spend: ₹39.2L (~$47,000 USD)

Attribution visibility: no breakdown by feature

The product is a B2B SaaS tool that uses LLMs across three distinct surfaces: document analysis for enterprise customers, a customer support chatbot, and an internal operations layer for their own team.

Their LLM bill had grown 4.3× in 8 months — from ₹9.1L/month at launch to ₹39.2L/month. The CEO was tracking revenue growth (healthy). The CFO was tracking cost growth (alarming). The gap between the two was closing.

When we asked their CTO to explain where the ₹39.2L was going, the honest answer was: "GPT-4o, mostly. Some Anthropic. Our document feature is the biggest driver, we think."

"We think" is not a FinOps posture.

Day 1: Instrumentation — 20 minutes

We added TokenFin instrumentation to every LLM call in their codebase. The team had 3 developers on a call. It took 20 minutes. Every existing call path was wrapped with attribution metadata:

// Before: no attribution
const result = await openai.chat.completions.create({ model, messages })

// After: 3 attribution tags added
const result = await track(
  openai.chat.completions.create({ model, messages }),
  { feature: 'document-analysis', env: process.env.NODE_ENV, team: 'product' }
)
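For readers wondering what a wrapper of this shape might do, here is a minimal sketch. It is not TokenFin's actual implementation: `trackSketch` and the `sink` array are hypothetical names, and the code assumes the awaited response carries OpenAI-style `model`, `usage.prompt_tokens` and `usage.completion_tokens` fields.

```javascript
// Hypothetical sketch of an attribution wrapper; NOT TokenFin's real `track`.
// Assumes the response exposes OpenAI-style `model` and `usage` fields.
async function trackSketch(callPromise, tags, sink) {
  const response = await callPromise;
  const usage = response.usage ?? {};
  // Record token counts next to the caller-supplied attribution tags.
  sink.push({
    ...tags,
    model: response.model,
    inputTokens: usage.prompt_tokens ?? 0,
    outputTokens: usage.completion_tokens ?? 0,
    at: new Date().toISOString(),
  });
  return response; // the caller gets the untouched response back
}
```

The point of the design is that the call site changes by one wrapper and one tags object, while the response the feature code sees stays the same.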

Within 48 hours of instrumentation, the attribution dashboard had real data on every call path. What it showed was not what anyone expected.

The 4 findings (and how much each cost)

FINDING 01

Dev and staging were running production models

₹10.9L monthly waste

The attribution data showed that 28% of total API spend was tagged with env: development or env: staging. These environments were calling gpt-4o — the most expensive model — with no rate limiting.

Cost breakdown: dev/staging vs production (before)

Production (revenue-generating): ₹28.2L (72%)

Dev + staging (no revenue): ₹11.0L (28%)
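A breakdown like this is a group-by over tagged spend records. The sketch below is illustrative only: `shareByTag` is a hypothetical helper, and the two input rows are the totals reported above, not raw call data.

```javascript
// Roll tagged spend records up by one tag and report each group's share.
function shareByTag(records, tag) {
  const totals = new Map();
  let grand = 0;
  for (const r of records) {
    totals.set(r[tag], (totals.get(r[tag]) ?? 0) + r.cost);
    grand += r.cost;
  }
  return [...totals].map(([key, cost]) => ({
    key,
    cost,
    sharePct: Math.round((cost / grand) * 100),
  }));
}

// Figures from the breakdown above, in ₹ lakh. Development and staging are
// lumped together because the post reports them as a single number.
const breakdown = shareByTag(
  [
    { env: 'production', cost: 28.2 },
    { env: 'dev+staging', cost: 11.0 },
  ],
  'env'
);
```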

// The fix: 4 lines of config
const model =
  process.env.NODE_ENV === 'production'
    ? 'gpt-4o'
    : 'gpt-4o-mini' // 16.7× cheaper on input tokens

GPT-4o-mini is 16.7× cheaper than GPT-4o on input tokens. Developers testing against it in dev/staging get functionally identical results for 95% of use cases. Saving: ₹10.9L/month.

FINDING 02

A batch pipeline had been silently retrying for 11 weeks

₹4.7L monthly waste

The document analysis pipeline had a JSON parsing bug. When the LLM returned malformed JSON (roughly 12% of calls due to context truncation), the retry logic fired immediately — up to 5 times — with the full context window each time. Each retry cost the same as the original call. The bug had been live for 11 weeks before anyone noticed, because nobody was looking at call volume vs expected call volume.

Pipeline retry analysis

Metric	Value
Documents processed / day	2,840
Parse failure rate	12.3%
Avg retries per failure	3.8×
Effective API calls / day	4,174 (not 2,840)
Wasted calls / day	1,334
Monthly wasted spend	₹4.7L
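The wasted-call figure can be sanity-checked from the other rows. This is a back-of-envelope sketch using the rounded inputs published above, so it lands a few calls below the reported 1,334, presumably because the dashboard used unrounded rates.

```javascript
// Inputs are the rounded figures from the retry analysis above.
const docsPerDay = 2840;
const parseFailureRate = 0.123;
const avgRetriesPerFailure = 3.8;

// Each failed document triggers extra calls on top of the original one.
const failedDocsPerDay = docsPerDay * parseFailureRate;
const wastedCallsPerDay = failedDocsPerDay * avgRetriesPerFailure;
const effectiveCallsPerDay = docsPerDay + wastedCallsPerDay;

console.log(Math.round(wastedCallsPerDay));    // 1327, vs 1,334 reported
console.log(Math.round(effectiveCallsPerDay)); // 4167, vs 4,174 reported
```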

Fixes: (1) Add explicit JSON mode to the API call to force valid JSON output. (2) Add exponential backoff with max 2 retries. (3) Log parse failure rate as a dashboard metric so it is visible. Result: parse failure rate dropped to 1.2%, retry cost dropped to near-zero.
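A minimal sketch of fixes (2) and (3), under stated assumptions: `callModel` is a hypothetical function that makes one LLM request and resolves to the raw response text, and the 500 ms base delay is an illustrative value, not the team's setting. For fix (1), the OpenAI chat completions API accepts `response_format: { type: 'json_object' }`, which would be set inside `callModel`.

```javascript
// Bounded retries with exponential backoff around a JSON-returning LLM call.
// `callModel` is hypothetical: it performs one request and resolves to text.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function callWithJsonRetry(callModel, { maxRetries = 2, baseDelayMs = 500 } = {}) {
  let parseFailures = 0; // fix (3): report this count as a dashboard metric
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const text = await callModel();
    try {
      return { data: JSON.parse(text), parseFailures };
    } catch {
      parseFailures++;
      if (attempt < maxRetries) {
        // Wait baseDelayMs, then 2x baseDelayMs, before the next attempt.
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw new Error(`Unparseable JSON after ${maxRetries + 1} attempts`);
}
```

Capping at 2 retries bounds the worst case at 3 calls per document instead of 6. JSON mode on its own does not protect against output cut off by a token limit, so the truncation noted above still needs enough output headroom.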

FINDING 03

Internal ops tooling ran GPT-4o for formatting tasks

₹1.8L monthly waste

Their ops team used an internal tool to reformat customer data exports, a task that needs GPT-3.5-level capability at most. The tool was configured to use gpt-4o because that was the company default. No engineer had explicitly chosen it; it was inherited from a config file written in month 1. Downgrading this tool to gpt-4o-mini produced zero functional change and a ₹1.8L/month saving.

FINDING 04

Customer support was already efficient — leave it alone

₹0, no change needed

The chatbot, which was the highest-volume feature, was already well-optimised. Short system prompts (280 tokens). Conversation history truncated to last 6 turns. Response caching for repeated intents (covering ~31% of calls). This was their most cost-efficient feature at ₹0.0019/conversation. The right call: document it, set a budget alert, and leave it alone.
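Two of those techniques are easy to sketch. This is an illustration, not the team's code: it assumes one "turn" is a user message plus the assistant reply, and `intentCache` is a hypothetical in-memory cache keyed by a normalised intent string.

```javascript
// Keep the system prompt plus the most recent `maxTurns` turns, where one
// turn is assumed to be a user message followed by an assistant reply.
function truncateHistory(messages, maxTurns = 6) {
  const system = messages.filter((m) => m.role === 'system');
  const dialogue = messages.filter((m) => m.role !== 'system');
  return [...system, ...dialogue.slice(-maxTurns * 2)];
}

// Hypothetical in-memory cache for repeated intents.
const intentCache = new Map();

async function answerWithCache(intent, generate) {
  const key = intent.trim().toLowerCase();
  if (intentCache.has(key)) return intentCache.get(key); // no LLM call made
  const answer = await generate();
  intentCache.set(key, answer);
  return answer;
}
```

Truncation caps input tokens per call regardless of conversation length, and each cache hit avoids a call entirely, which is how a high-volume feature can still be the cheapest per conversation.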

This is an important lesson. Attribution data does not just reveal waste — it also reveals what your team is doing right. Knowing that your support feature is efficient lets you invest confidently in scaling it.

The outcome — day 14

Before: ₹39.2L → After: ₹23.0L

Saved: ₹16.2L/mo (-41.3%)

Finding	Monthly saving	Effort	Time to fix
Dev/staging model swap	₹10.9L	4 lines of config	1 hour
Retry loop fix + JSON mode	₹4.7L	~50 lines of code	1 day
Internal tooling downgrade	₹1.8L	1 config change	20 minutes
Support feature (no change)	—	—	—
Total	₹17.4L	~1.5 engineer-days	14 days to full effect

Note: actual saving is ₹16.2L due to some natural volume growth during the period. The optimisations themselves saved ₹17.4L.
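The reconciliation between the two totals is simple arithmetic. The sketch below uses only the figures from the table and the note above; the "implied growth" line is derived from them and is not a separately reported number.

```javascript
// All figures in ₹ lakh per month, taken from the table and note above.
const savings = { devStaging: 10.9, retryFix: 4.7, toolingDowngrade: 1.8 };
const before = 39.2;
const after = 23.0;

const optimisationTotal = Object.values(savings).reduce((a, b) => a + b, 0);
const netSaving = before - after;
const netPct = (netSaving / before) * 100;
// Spend that grew back during the two weeks, inferred from the gap.
const impliedGrowth = optimisationTotal - netSaving;

console.log(optimisationTotal.toFixed(1)); // 17.4
console.log(netSaving.toFixed(1));         // 16.2
console.log(netPct.toFixed(1));            // 41.3
console.log(impliedGrowth.toFixed(1));     // 1.2
```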

What they built after

With attribution data live, they built three operational practices they now run permanently:

Monthly model review

Every month, pull the cost-per-feature-per-model report. Ask: has this feature's quality requirement changed? Has a cheaper model improved enough to handle it? The LLM market moves fast — what was true in January may not be true in July.

Budget alerts per feature

Set a spend threshold for each feature in TokenFin. If document-analysis exceeds ₹12L in a month, the on-call engineer gets a Slack alert. Not the CFO asking questions after the invoice — an alert when there is still time to act.
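The check behind such an alert can be sketched in a few lines. This is a hypothetical illustration, not TokenFin's alerting API: only the ₹12L threshold for document-analysis comes from the post, and delivery to Slack is left out because the post does not describe it.

```javascript
// Hypothetical per-feature budget check; thresholds in ₹ lakh per month.
// Only the document-analysis threshold comes from the post.
const budgets = { 'document-analysis': 12 };

function overBudgetFeatures(monthToDateSpend, limits = budgets) {
  return Object.entries(limits)
    .filter(([feature, limit]) => (monthToDateSpend[feature] ?? 0) > limit)
    .map(([feature, limit]) => ({
      feature,
      limit,
      spend: monthToDateSpend[feature],
    }));
}
```

Running the check on month-to-date spend, rather than on the invoice, is what leaves time to act before the month closes.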

Cost review in every feature spec

Before any new LLM feature ships, the spec now includes: estimated calls/day, model choice justification, expected monthly cost, and whether dev/staging will use a cheaper model. Takes 10 minutes to write. Prevents months of unattributed spend.
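The "expected monthly cost" line in a spec can come from a small estimator like the one below. It is a sketch with hypothetical parameter names; every input is something the spec author supplies, and prices are per 1M tokens in whatever currency the provider bills in.

```javascript
// Rough monthly cost estimate for a feature spec. All inputs are supplied
// by the spec author; prices are per 1M tokens in the billing currency.
function estimateMonthlyCost({
  callsPerDay,
  inputTokensPerCall,
  outputTokensPerCall,
  inputPricePerMillion,
  outputPricePerMillion,
  daysPerMonth = 30,
}) {
  const costPerCall =
    (inputTokensPerCall * inputPricePerMillion +
      outputTokensPerCall * outputPricePerMillion) / 1e6;
  return callsPerDay * daysPerMonth * costPerCall;
}
```

As an illustration with made-up inputs (not figures from this engagement): 1,000 calls/day at 2,000 input and 500 output tokens, priced at 2.50 and 10.00 per million tokens, comes to about 300 per month in the billing currency. Running the same inputs with a cheaper model's prices gives the dev/staging figure for the spec.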

What this actually means

₹16.2L/month saved is ₹1.94 crore per year. That is a mid-level engineer in India. It is 3 months of runway for a seed-stage startup.

None of these optimisations required a new engineering hire, a vendor change, or a product decision. They required visibility. The waste was always there. Attribution just made it visible.

Every AI team we onboard finds something in the first 48 hours. The question is not whether the waste exists. The question is how long you want to keep funding it.
