AI & Technology · October 15, 2025 · 8 min read · 852 words

What I learned building an AI company before GPT-4 existed

In 2016, we started building conversational AI products when most people had never heard the term. Here is what that journey taught me about AI, product delivery, and what actually matters.

Conversational AI · Startup · Product Delivery · GenAI

Mahroof K
Senior Program & Product Manager · PMP®

In 2016, Facebook opened the Messenger platform to bots. There was no GPT, no Claude, no readily available large language model to call. If you wanted a chatbot to understand natural language, you built the understanding yourself — pattern by pattern, intent by intent, entity by entity.

That is what we did at Cedex Technologies. And it taught me more about AI product delivery than any course, certification, or conference ever could.

The context: building AI when AI was hard

When we started our Conversational AI practice, the dominant approach was rule-based decision trees. You mapped out every possible thing a user might say, drew branches for every response path, and hoped your users would stay inside the lines you drew.

They never did.
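The rule-based approach can be sketched roughly like this. The intents, patterns, and replies below are invented examples, not the actual bot's rules — the point is the shape: every branch is drawn by hand, and anything outside the branches falls through.

```python
# A toy sketch of a rule-based chatbot: hand-drawn branches plus a fallback.
# All intents, patterns, and replies here are hypothetical examples.

RULES = {
    "greeting": (["hi", "hello", "hey"], "Hi! Ask me about movies or schedules."),
    "schedule": (["schedule", "timing", "what time"], "Tonight's lineup starts at 8 PM."),
}

FALLBACK = "Sorry, I didn't get that. Try asking about movies or schedules."

def reply(message: str) -> str:
    text = message.lower()
    for patterns, response in RULES.values():
        if any(pattern in text for pattern in patterns):
            return response
    # Anything outside the pre-drawn branches lands here -- which is
    # exactly where real users spend most of their time.
    return FALLBACK
```

A Hinglish query like "yaar kal kaunsi movie hai?" matches no branch and hits the fallback every time, which is why this approach broke down.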

The first major project that forced us to grow beyond rule-based systems was a chatbot for a popular Indian TV channel. The requirement seemed simple: build a bot that could answer fan questions about Bollywood films, run quizzes, and share TV schedules.

The problem: the audience spoke Hinglish.

Hinglish is Hindi written in English script — a fluid, informal hybrid that no off-the-shelf NLP model at the time could handle. "Yaar kal kaunsi movie hai?" was a perfectly normal query. No intent classifier we could buy or download had ever seen a sentence like that.

So we built one.
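To give a feel for the idea — not our production pipeline — here is a toy, dependency-free sketch: classify a query by its character n-gram overlap with hand-labelled example utterances. The utterances, intent labels, and the nearest-neighbour shortcut are all illustrative; the real system was trained on thousands of labelled queries. Character n-grams are the one honest detail here: they cope far better with code-mixed, loosely spelled text than word-level features do.

```python
# A toy intent classifier for code-mixed text: nearest labelled utterance
# by character-trigram Jaccard similarity. Examples and labels are invented.

LABELLED = [
    ("yaar kal kaunsi movie hai", "schedule"),
    ("aaj tv pe kya aa raha hai", "schedule"),
    ("quiz khelna hai", "quiz"),
    ("ek quiz question pucho", "quiz"),
    ("salman ki nayi film kab release hogi", "movie_info"),
]

def ngrams(text: str, n: int = 3) -> set:
    # Pad with spaces so word boundaries become part of the features.
    text = f" {text.lower()} "
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def classify(query: str) -> str:
    def similarity(example: str) -> float:
        a, b = ngrams(query), ngrams(example)
        return len(a & b) / len(a | b)
    # Return the label of the most similar labelled utterance.
    _, best_label = max(LABELLED, key=lambda item: similarity(item[0]))
    return best_label
```

With enough labelled data, swap the nearest-neighbour lookup for a trained classifier; the data collection and labelling effort stays the same, which was the expensive part.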

The real lesson: constraints force creativity

We spent weeks collecting training data. We ran surveys. We asked people to type out the kinds of questions they would naturally ask a Bollywood chatbot. We labelled thousands of utterances by hand, trained classifiers, tested them, threw them out, and started again.

It was slow, expensive, and nothing like the slick "connect to API, deploy, done" narrative that surrounds AI today.

But here is what I learned: the constraint was the product.

Because we had to build the NLP layer ourselves, we understood it deeply. We knew exactly what the bot could and could not do. We could set honest expectations with the client. We could design the conversation flows around the actual capabilities of the model — not an imaginary, idealised version of it.

The bot launched. It served 2 million unique users. At peak, it handled 1 million conversations in a single day. It was named one of Facebook's Top 10 Chatbots globally and was selected for the FBStart programme.

None of that would have happened if we had taken shortcuts.

What changed with GPT and modern LLMs

When OpenAI released GPT-3, and then ChatGPT, and then GPT-4, I watched with a mix of awe and recognition.

The awe was obvious. The capability jump was extraordinary.

The recognition was subtler: the hard problems had not gone away. They had just moved.

In the pre-LLM era, the hard problem was understanding language. You had to build that capability.

In the LLM era, the hard problems are:

  • Reliability. How do you get a model that can generate anything to consistently generate the right thing?
  • Grounding. How do you stop a model from confidently hallucinating facts that do not exist?
  • Cost at scale. How do you serve millions of queries without your infrastructure bill consuming all the value you create?
  • Evaluation. How do you even measure whether your AI product is working well?

These are product and engineering problems, not AI problems. They require the same rigour, the same discipline, and the same user-centricity that any good product delivery demands.
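The grounding problem in particular responds well to plain product engineering. A minimal guardrail pattern: only answer when a known source actually supports the query, and refuse otherwise. The knowledge base and the substring match below are toy placeholders for whatever retrieval your system uses.

```python
# A minimal sketch of grounding as a guardrail: answer only from a known
# source, refuse otherwise. The knowledge base and matching rule are toys.

KNOWLEDGE_BASE = {
    "refund policy": "Refunds are available within 30 days of purchase.",
    "support hours": "Support is available 9am-6pm IST, Monday to Friday.",
}

def grounded_answer(query: str) -> str:
    query_lower = query.lower()
    for topic, snippet in KNOWLEDGE_BASE.items():
        if topic in query_lower:
            # The answer is backed by a retrievable source.
            return snippet
    # No supporting snippet found: refuse rather than let the model guess.
    return "I don't have a reliable answer for that."
```

The refusal branch is the whole point: a model left to fill the gap will confidently invent something, and the product has to decide that silence beats fabrication.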

What I would tell my 2016 self

If I could go back and advise the version of me who was annotating training data by hand in a small office in Kochi, I would say three things:

1. The model is not the product. The model is one component. The product is the whole system: the conversation design, the fallback handling, the human escalation path, the analytics, the feedback loops. Teams that treat the model as the product ship demos, not products.

2. Measure what matters to users, not what is easy to measure. We spent too long optimising for intent classification accuracy. What actually mattered to users was resolution rate — did they get the answer they needed? Those are not the same thing.

3. Human oversight is a feature, not a failure. Early on, we treated human takeover as a sign that the bot had failed. Later we realised it was a feature. The best conversational products we built had seamless human escalation built into the design from day one, not bolted on as an afterthought.
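To make the distinction in point 2 concrete, here is a toy calculation over invented conversation records. A bot can classify the intent correctly and still fail the user, so the two metrics diverge:

```python
# Invented conversation records illustrating intent accuracy vs resolution rate.
conversations = [
    {"intent_correct": True,  "user_got_answer": True},
    {"intent_correct": True,  "user_got_answer": False},  # right intent, wrong answer
    {"intent_correct": False, "user_got_answer": False},
    {"intent_correct": True,  "user_got_answer": True},
]

intent_accuracy = sum(c["intent_correct"] for c in conversations) / len(conversations)
resolution_rate = sum(c["user_got_answer"] for c in conversations) / len(conversations)

print(f"intent accuracy: {intent_accuracy:.0%}")  # 75%
print(f"resolution rate: {resolution_rate:.0%}")  # 50%
```

A dashboard showing only the first number would have told us the bot was doing fine. The second number is the one users actually feel.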

Where we are now

Today I lead product and program delivery for GenAI platforms. The technology has changed dramatically. The principles have not.

You still need to start with a real user problem. You still need to define what success looks like before you build anything. You still need to design for failure modes. You still need to measure what matters.

The teams that are shipping great AI products today are not the ones with the best models. They are the ones with the best product discipline.

That was true in 2016. It is still true now.
