How Supernova Powers India's AI Spoken English Tutor with Sarvam
"(Customer quote: to be updated. A two- to three-paragraph quote from a Supernova spokesperson covering the macro problem they set out to solve, why Sarvam was the right partner, and the outcome they have delivered for learners would mirror the structure of similar Sarvam case studies.)"
(Spokesperson name)
(Title), Supernova
Introduction
Across India, millions of learners are trying to build spoken English fluency on their phones, often late at night, often alone, and almost always in markets where the price of a single lesson matters. For an AI tutor to work in this setting, the voice layer underneath has to do three things that general-purpose global ASR was not built for: understand Indian accents, follow sentences that drift between English and a regional language, and respond in real time at a cost that holds up across price-sensitive markets. Supernova found the answer to all three in Sarvam, the foundation on which its AI tutor, Nova AI, is built.
Background
Supernova is a mobile app offering a 24/7 AI tutor, Nova AI, for spoken English. The app delivers lessons that adapt to the learner's native language and supports multiple Indian languages including Hindi, Tamil, Telugu, and Bengali. The product is designed for quick fluency, building the kind of conversational confidence that comes only from being able to speak freely, make mistakes, and be understood.
The audience Supernova serves is largely first-time English speakers across price-sensitive markets in India, where the bar for product quality is no lower than anywhere else. For a learner in these markets, an AI tutor has to feel like a real conversation, in their language, every time they open the app.
The Problem
Supernova's earlier voice stack was holding the product back in three concrete ways, each with a direct line to the learner experience.
Accuracy on Indian languages, accents, and code-mixed speech was low. Learners were repeating themselves and being misheard mid-sentence, which is the fastest way to make someone give up on a language they are already nervous about speaking. Throughput was variable, which meant latency spiked unpredictably and conversations with Miss Nova felt robotic rather than responsive. Transcription costs were also high enough that scaling further into the price-sensitive regions Supernova was built for was becoming financially difficult.
The priority was to consolidate the voice layer on a single foundation that could:
- Transcribe speech accurately across Indian languages, accents, and code-mixed sentences
- Support natural translation when learners switch to their mother tongue to ask a doubt
- Deliver consistent, low-latency responses so conversations feel real rather than stilted
- Operate at unit economics that hold up across price-sensitive markets
Why Sarvam
(Supernova's detailed evaluation process, including the alternatives considered and the specific factors that tipped the decision toward Sarvam, would sit here: to be updated.)
Three things stood out across the comparison. The first was accuracy that was genuinely tuned for Indian languages, with a roughly 19% Word Error Rate across 11 Indian languages. The second was latency that was consistent enough to support real-time conversation, at around 300ms average response time for short utterances. The third was pricing that came in roughly 35% lower than OpenAI Whisper, which directly improved unit economics in the price-sensitive markets Supernova was built for. The combination of all three, rather than any one factor in isolation, made Sarvam the right foundation for the voice layer Miss Nova runs on.
Solution
Supernova consolidated its entire voice layer on Sarvam. Every learner utterance is routed through Sarvam's speech-to-text API in translate mode. English practice gets transcribed as is, and questions asked in Hindi, Tamil, Telugu, or Bengali are translated into English inside the same call. Whether the learner stays in English or switches mid-sentence into another language, it is all handled in a single API call. That is how a real conversation works in India, and now it is how Miss Nova works as well.
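The single-call flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Supernova's code: `TranslateResult`, `handle_utterance`, the `practice`/`doubt` split, and the language codes are hypothetical, and a fake client stands in for Sarvam's speech-to-text API, whose real endpoint, parameters, and response fields should be taken from Sarvam's API reference. What it shows is that every utterance makes exactly one call and always comes back as English text, so the tutor logic downstream never needs a separate translation step.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical shape of a translate-mode result: English text plus the
# language the learner actually spoke. Real field names are in Sarvam's docs.
@dataclass
class TranslateResult:
    english_text: str
    source_language: str  # assumed codes such as "en-IN" or "hi-IN"


# Hypothetical stand-in for one speech-to-text call in translate mode.
SpeechToEnglish = Callable[[bytes], TranslateResult]


@dataclass
class LearnerTurn:
    text: str       # always English, ready for the tutor model
    kind: str       # "practice" or "doubt" (illustrative labels only)
    spoken_in: str  # language the learner used


def handle_utterance(audio: bytes, stt_translate: SpeechToEnglish) -> LearnerTurn:
    """Route one learner utterance through a single translate-mode call."""
    result = stt_translate(audio)
    # Assumption for illustration: English speech is a practice attempt,
    # anything else is a question asked in the learner's mother tongue.
    is_english = result.source_language.lower().startswith("en")
    kind = "practice" if is_english else "doubt"
    return LearnerTurn(text=result.english_text, kind=kind, spoken_in=result.source_language)


# Fake client standing in for the real API, so the sketch runs offline.
def fake_stt(audio: bytes) -> TranslateResult:
    if audio == b"hindi-question":
        return TranslateResult("What is the past tense of go?", "hi-IN")
    return TranslateResult("I go to market yesterday.", "en-IN")


turn = handle_utterance(b"hindi-question", fake_stt)
print(turn.kind, "|", turn.text)
```

Here the fake client returns a Hindi question already rendered in English, so the script prints `doubt | What is the past tense of go?`. Keeping translation inside the speech call, rather than chaining a separate translation service, is what keeps this path to a single network round trip.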
The fastest signal that the integration was the right call came in the very first week. Within seven days of first contact, Supernova's workloads had moved to Sarvam's APIs and were scaling smoothly. Within two weeks of going live, Supernova had integrated the speech-to-text endpoint in translate mode, making translation central to the product rather than an afterthought. That turned out to be one of the platform's defining design choices.
(A more specific learner moment, ideally drawn from real product feedback, would further strengthen this section: to be updated.)
Impact
- Accuracy: Roughly 19% Word Error Rate across 11 Indian languages, with transcription error rates down 50% versus Supernova's earlier voice stack.
- Latency: Average response time of approximately 300ms for short utterances and sentences. End-to-end latency was cut in half versus the earlier stack, making conversations with Miss Nova feel responsive rather than robotic.
- Cost: Transcription costs came in roughly 35% lower than the alternatives Supernova evaluated, including OpenAI Whisper, directly improving unit economics in price-sensitive markets.
- Scale: Since deployment, Sarvam's speech-to-text usage on the platform has grown from 236,107 minutes transcribed in May 2025 to over 3.4 million minutes in April 2026, a 14x increase in under 12 months. Month-on-month growth peaked at 84% in June 2025, and several later months sustained growth of 40% or more. By early 2026, the platform was consistently transcribing over 1.8 million minutes per month, and it crossed 3.4 million minutes in both March and April 2026, reflecting how deeply Sarvam's STT is embedded in production.
- Growth trajectory: The growth has come in distinct phases. Between May and July 2025, the platform established a baseline at around 13.6 million STT calls. Between August and October 2025, volume rose 236% to roughly 45.7 million calls as new product capabilities scaled. Between November 2025 and January 2026, the platform stabilised, with volume rising a more measured 17% to around 53.4 million calls. Between February and April 2026, volume rose another 85% to approximately 98.9 million calls before settling into a stable, high-volume run rate.
- Learner retention: (Quantitative impact on learner retention, including any uplift in lesson completion, repeat usage, or week-on-week active learners following the move to Sarvam, would sit here: to be updated.)
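The scale and growth figures above can be cross-checked with a few lines of arithmetic. This sketch uses only the numbers reported in this section; it treats "over 3.4 million" minutes as exactly 3.4 million, so the resulting multiple is a lower bound, and it assumes each call figure is the reported volume for its three-month window. The variable names are illustrative.

```python
# Figures as reported in this case study.
minutes = {"2025-05": 236_107, "2026-04": 3_400_000}  # "over 3.4 million" taken as 3.4M

# Reported STT call volume (millions) for each three-month window.
period_calls_millions = {
    "May-Jul 2025": 13.6,
    "Aug-Oct 2025": 45.7,
    "Nov 2025-Jan 2026": 53.4,
    "Feb-Apr 2026": 98.9,
}


def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100


multiple = minutes["2026-04"] / minutes["2025-05"]
print(f"May 2025 -> Apr 2026: {multiple:.1f}x")

# Compare each window with the one before it.
periods = list(period_calls_millions)
for prev, curr in zip(periods, periods[1:]):
    change = pct_change(period_calls_millions[prev], period_calls_millions[curr])
    print(f"{prev} -> {curr}: {change:+.0f}%")
```

Running it gives a multiple of about 14.4x and window-on-window changes of +236%, +17%, and +85%, matching the rounded figures quoted above.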
Looking ahead
(Supernova's forward-looking plans for its use of Sarvam, including any new languages, dialects, endpoints such as streaming or diarization, and any additional product features being explored, would sit here: to be updated.)
The partnership has already demonstrated that voice AI for Indian language learners can work at scale, at the cost structure required to reach price-sensitive markets, and at the conversational quality needed for a tutor that learners actually want to come back to. As Supernova continues to expand the reach of Miss Nova, and as Sarvam continues to advance its multilingual voice capabilities, the collaboration is set to play an increasingly meaningful role in how English fluency is built in India, one conversation at a time.