Case Study · March 1, 2026 · 10 min read · 1,154 words

How we built a Hinglish chatbot for 2 million users when no NLP model existed

The Sony MAX Filmykaant chatbot served 2 million unique users and 1 million conversations per day. Here is the full technical and product story of how we built it — including the Hinglish NLP problem nobody had solved.

Conversational AI · NLP · Case Study · Chatbot · Product Delivery

Mahroof K

Senior Program & Product Manager · PMP®

In 2017, Sony MAX came to us with a brief that sounded deceptively simple: build a Facebook Messenger chatbot for Bollywood fans.

The chatbot — which would become Filmykaant, the "Ultimate Bollywood Fan" — needed to answer questions about films, run trivia quizzes, share TV schedules, and engage users with gamified content. It needed to feel natural, fun, and distinctly Bollywood.

What we did not fully appreciate at the start was that "natural" for the target audience meant Hinglish — and Hinglish was a language that no NLP model at the time could reliably understand.

This is the story of how we solved that problem, what we built, and what the numbers looked like at the end.

The Hinglish problem

Hinglish is Hindi written in English script. It is not a formal language with standardised grammar or spelling. It is how hundreds of millions of people in India actually type in informal digital contexts.

"Yaar kal Sony MAX par kya aayega?" — "Friend, what's coming on Sony MAX tomorrow?"

"Bhai Amitabh ki best movie kaunsi hai?" — "Brother, what's Amitabh's best movie?"

These are completely natural queries for Filmykaant's audience. And in 2017, no existing NLP platform — not Dialogflow (then API.ai), not IBM Watson, not Wit.ai — had meaningful Hinglish support.

You could not just retrain an existing model. The training data did not exist. The vocabulary, the mixed-script patterns, the colloquial abbreviations — none of it was in any available dataset.

We had two options. Build a rigid, keyword-based rule system and accept that the user experience would feel stilted. Or build the NLP capability ourselves from scratch.

We chose the harder path.

Building the NLP pipeline

The first step was data collection. We could not train a model without training data, and we could not create training data without first understanding the shape of the problem.

We ran structured data collection exercises: surveys asking fans to type questions they would naturally ask a Bollywood chatbot. We collected thousands of examples. We asked people from different regions, different age groups, different levels of Hindi fluency — because Hinglish is not one dialect, it is a spectrum.

From this corpus, we identified the key intent categories: film queries, actor queries, schedule queries, quiz participation, greetings and small talk, and a long tail of everything else.

We then hand-labelled the training data. Every utterance got an intent tag and entity annotations (film title, actor name, broadcast date, etc.). This was slow, painstaking work. There was no shortcut.
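A labelled utterance looked something like the following. This is an illustrative sketch, not our actual annotation schema; the field names and intent labels are hypothetical:

```python
# One hand-labelled Hinglish utterance (illustrative schema, not the real format).
example = {
    "text": "Bhai Amitabh ki best movie kaunsi hai?",
    "intent": "actor_query",
    "entities": [
        # Character offsets into "text": [start, end) of the entity mention.
        {"type": "actor_name", "value": "Amitabh Bachchan", "span": [5, 12]},
    ],
}

# The span points at the surface mention, while "value" holds the
# canonical entity it resolves to.
print(example["text"][5:12])  # → Amitabh
```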

The model we built used a combination of intent classification and entity extraction. Given the mixed-script nature of Hinglish, we paid particular attention to normalisation — handling phonetic variations, common abbreviations, and code-switching between Hindi and English mid-sentence.
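The normalisation step can be sketched roughly like this. The variant map below is a tiny, made-up sample of the kind of phonetic and abbreviation groupings we curated by hand; our real table was far larger and data-derived:

```python
import re

# Hypothetical sample of phonetic/abbreviation variants mapped to a
# canonical spelling. The real table was curated from the survey corpus.
VARIANTS = {
    "kya": {"kya", "kia", "kyaa"},
    "hai": {"hai", "h", "he"},
    "kaunsi": {"kaunsi", "konsi", "kaunsee"},
}

# Invert to a flat variant -> canonical lookup.
CANONICAL = {v: canon for canon, vs in VARIANTS.items() for v in vs}

def normalise(utterance: str) -> str:
    """Lowercase, strip punctuation, and collapse known spelling variants."""
    tokens = re.findall(r"[a-z0-9]+", utterance.lower())
    return " ".join(CANONICAL.get(t, t) for t in tokens)

print(normalise("Konsi movie h?"))  # → kaunsi movie hai
```

Collapsing variants before classification meant the model saw one token where users typed a dozen spellings, which mattered enormously given how small the training corpus was.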

The fallback problem

Any NLP system built on a closed set of intents will encounter queries it cannot handle. The question is not whether users will go off-script — they will, always — but what happens when they do.

We designed a robust fallback layer with three tiers:

Tier 1 (confident match): The intent classifier returns a high-confidence result. The bot responds with the appropriate content.

Tier 2 (low-confidence match): The classifier returns a result but below the confidence threshold. The bot responds with a clarifying question to confirm intent before proceeding.

Tier 3 (no match): The query does not match any intent reliably. The bot acknowledges gracefully, redirects to a menu of popular actions, and logs the query for training data review.

Tier 3 was not treated as a failure mode. It was treated as a data collection mechanism. Every unhandled query was reviewed and, where appropriate, added to the training dataset. The model improved continuously over the deployment period.
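The routing logic for the three tiers reduces to a few lines. The threshold value below is illustrative (the real one was tuned against held-out data), and the function names are mine, not the production code's:

```python
import logging
from typing import Optional

logger = logging.getLogger("fallback")

# Illustrative threshold; the production value was tuned on held-out queries.
CONFIDENCE_THRESHOLD = 0.7

def route(intent: Optional[str], confidence: float) -> str:
    """Map a classifier result onto the three fallback tiers."""
    if intent is not None and confidence >= CONFIDENCE_THRESHOLD:
        return "answer"        # Tier 1: respond with the matched content
    if intent is not None and confidence > 0.0:
        return "clarify"       # Tier 2: ask a clarifying question first
    # Tier 3: no reliable match — redirect to the menu and log the
    # query so it can be reviewed as candidate training data.
    logger.info("unhandled query logged for training review")
    return "menu"

print(route("actor_query", 0.92))  # → answer
print(route("actor_query", 0.40))  # → clarify
print(route(None, 0.0))            # → menu
```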

The product: Filmykaant persona

Technical capability is necessary but not sufficient. The character of the bot mattered as much as its ability to understand queries.

We named the bot Filmykaant — a play on filmy (film-obsessed) and a classic Bollywood name. The persona was the "Ultimate Bollywood Fan": enthusiastic, trivia-obsessed, slightly dramatic, and always up for a quiz.

The content included:

  • Film and actor information from a curated database
  • Real-time TV schedule integration with Sony MAX's broadcast feed
  • A Bollywood trivia quiz with a leaderboard and daily challenges
  • Gamified engagement mechanics: streaks, achievement badges, and competitive rankings

The quiz became the viral growth engine. Users challenged friends. Leaderboard positions were shared on social media. The daily quiz format created a reason to come back every day.

The numbers

At peak, Filmykaant handled:

  • 2 million+ unique users over the campaign period
  • 1 million+ conversations per day at peak traffic
  • 25,000+ daily active users at sustained peak

These were not metrics we engineered specifically for press releases. They were the natural outcome of a product that worked for its audience.

The platform was recognised by Facebook as a Top 10 Global Chatbot and was selected for the FBStart programme — Facebook's support programme for promising Messenger applications.

What we got wrong

No case study is complete without the hard parts.

We underestimated the operational burden of content freshness. The film and schedule content needed to be updated continuously. We had built the integration architecture, but the operational processes for keeping it current required more ongoing effort than we had scoped. This was a product design oversight — we should have built better tooling for the Sony MAX team to update content without engineering involvement.

We over-indexed on the quiz feature initially. The quiz drove engagement, but it also meant some users who came to the bot for practical schedule information felt the product was too game-like. We recalibrated the default experience over time, but it would have been better to get this right in the initial design.

The Tier 3 fallback review process was manual for too long. We reviewed unhandled queries by hand for the first few months. This was valuable for model improvement but did not scale. Building a semi-automated review pipeline earlier would have accelerated the model's improvement.

What I carry forward

Filmykaant remains one of the projects I am most proud of in 12 years of technology delivery.

Not because it was technically the most complex thing we built. But because it required us to solve a genuinely novel problem — the Hinglish NLP gap — and the solution we built served 2 million people well.

The lesson I carry forward: the constraint is often the product. The absence of a Hinglish NLP model did not mean we could not build the product. It meant we had to build the model first. And in doing so, we built something that no competitor could easily replicate.

That is usually how the best products happen.


Mahroof K was the project lead for the Filmykaant chatbot at Cedex Technologies. He is currently available for senior roles in Program Management, Product Management, and AI delivery.
