Khoj: The Trailblazing Journey of the First AI-Powered Fact-Checker in the Bengali Language

biggani org | September 27, 2025 | 7 min read

In today’s digital era, we are surrounded by a flood of information. But in the midst of it all, how easy is it to tell what’s true and what’s false? Whether it’s your Facebook news feed, group chats on WhatsApp or Telegram, or news apps—false information, rumors, and half-truths float everywhere. For the millions who speak Bengali, this is a huge challenge. Without reliable means to verify facts, the consequences can be dire.

It is from this time of chaotic misinformation that “Khoj” was born—the first and most comprehensive AI-based fact-checking platform in Bengali.

Whether it’s text, images, historical events, or viral rumors spreading online, Khoj can become your trusted companion for verification—in your mother tongue. Its purpose is to restore the truth in the digital age, making it an essential tool for increasing digital literacy.

Imagine this: you suddenly notice a piece of news online—not covered by major media, but gaining traction through shares and comments on Facebook and WhatsApp, making it seem true:
“Millions of dollars were robbed from a particular bank, leaving thousands destitute.”

In such uncertain moments, Khoj can be the tool you rely on. Within seconds, the AI sifts through sources, compares their reliability, and presents you with a complete fact-check report. It tells you what to trust, where the information originated, and where suspicion may lie. This fact-checking is not magic, and it’s not because AI is all-knowing. It’s the result of a carefully engineered pipeline.

To summarize briefly: first, the web, social posts, news archives, and multimedia are searched surrounding the news in question; then the reliability of those sources is evaluated, with contextual text and image analysis; and finally, a clear verdict—true, misleading, or false—is given, with sources cited. Let’s go deeper and understand how Khoj works.

How Khoj Works:

The backbone of Khoj is a brilliantly smart architecture developed by the Khoj team. It joins hands with AI to search for information from reliable sources and presents it in a way that is easy to understand. And don’t mistake it for merely a search engine—it’s an intelligent process that strives for precise results at every step. Let’s dive into the details:

(1) Curated List of Trusted Sources

We curated a list of over 150 Bengali and English websites that reliably cover current events: politics, economy, celebrities, science, religion, and health. Major news portals, fact-checking sites, scientific journals, and health platforms are all included. Why? Because if the source isn’t reliable, the whole process crumbles. A vetted list greatly reduces the “garbage in, garbage out” risk. As a result, every outcome is backed only by well-known, validated sources, boosting your confidence. We aimed to make this a secure library that preserves only authentic and validated information.

(2) Query Optimization: AI’s First Step

When you ask a question, say, “Was there vote rigging in the DUCSU Elections?”, that’s when our AI gets to work. It transforms your simple question into precise, effective search keywords. This process is called query parsing.

For example, your question might be transformed into:

“DUCSU Election 2019 rigging”
“DUCSU Election vote fraud”
“DUCSU Election irregularities allegations”
“DUCSU Election result controversy”

Then, AI pairs these keywords with names of reliable news media and fact-checking sites for even more accurate searching.

Here, AI is specifically instructed through system prompts to generate the most effective search keywords. This greatly reduces irrelevant results and significantly increases the chances of finding correct information. This step is one of the things that sets Khoj apart from regular search tools. Think of it as sending your question to the perfect target like a laser-guided missile, saving both time and effort.
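The pairing step described above can be sketched in a few lines. This is only an illustration, not Khoj's actual code: in Khoj the keyword phrases come from an AI model guided by system prompts, whereas here they are hard-coded, and the source names and the `build_search_queries` helper are hypothetical placeholders.

```python
from itertools import product

# Placeholder names; the article does not list the real curated sources.
TRUSTED_SOURCES = ["Example News Portal", "Example Fact Check"]


def build_search_queries(keyword_phrases, sources):
    """Pair every keyword phrase with every trusted source name."""
    return [f"{phrase} {source}" for phrase, source in product(keyword_phrases, sources)]


# In the real system these phrases would be generated by the AI model.
phrases = [
    "DUCSU Election 2019 rigging",
    "DUCSU Election vote fraud",
]

queries = build_search_queries(phrases, TRUSTED_SOURCES)
for query in queries:
    print(query)
```

Each keyword phrase is combined with each source name, so two phrases and two sources yield four targeted queries.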

(3) Web Crawling and Filtering

As we saw in the previous step, the user’s question is broken down by Khoj’s query parsing into smaller, effective keywords. Then the search begins, using the curated source list we built earlier.

This involves a web crawling pipeline, which scans each website’s HTML DOM structure and collects necessary content (headlines, meta tags, main text). For those unfamiliar with web crawling, it means fetching data from various websites, while the DOM structure refers to the internal layout and tags of a web page.
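To make the idea of reading a page's DOM concrete, here is a minimal sketch that pulls the headline, meta tags, and paragraph text out of an HTML string using only Python's standard library. The `PageExtractor` class, the `extract_page` helper, and the sample HTML are assumptions for illustration; the article does not say which parser or extraction rules Khoj uses, and a real crawler would also fetch pages over the network and handle far messier markup.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collects the <title>, named <meta> tags, and <p> text of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.paragraphs = []
        self._current = None  # tag whose text is currently being captured

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and "name" in attrs and "content" in attrs:
            self.meta[attrs["name"]] = attrs["content"]
        elif tag == "title":
            self._current = "title"
        elif tag == "p":
            self._current = "p"
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "p":
            self.paragraphs[-1] += data


def extract_page(html):
    """Return the headline, meta tags, and main text of one HTML document."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(p.strip() for p in parser.paragraphs if p.strip())
    return {"title": parser.title.strip(), "meta": parser.meta, "text": text}


sample_html = """
<html><head><title>DUCSU election: allegations reviewed</title>
<meta name="description" content="A review of the claims."></head>
<body><p>First paragraph.</p><p>Second paragraph.</p></body></html>
"""

page = extract_page(sample_html)
print(page)
```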

The collected documents are then scored with a weighted scoring algorithm that assigns each source a weight. Simply put, it determines which sources matter more and which matter less.
Three main metrics are used to assign these weights:

a) Recency Score – Measures how recent the content is based on its timestamp, separating new from old data.
b) Authority Score – Assesses the source’s trustworthiness by analyzing domain authority, PageRank, and trust signals.
c) Keyword Match Score – Measures where and how often keywords appear, and their relevance. This uses TF-IDF (Term Frequency–Inverse Document Frequency) to calculate keyword relevance in the headline and text.
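For readers who have not met TF-IDF before, the following minimal sketch computes it by hand. It uses a common smoothed variant of the inverse document frequency, which is an assumption; the article does not specify which variant Khoj uses. The tiny corpus and the `tf_idf` helper are invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def tf_idf(term, doc_tokens, corpus_tokens):
    """TF-IDF of one term in one document, using a smoothed IDF."""
    if not doc_tokens:
        return 0.0
    # Term frequency: share of the document's tokens equal to the term.
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    # Inverse document frequency: rarer terms across the corpus score higher.
    docs_with_term = sum(1 for tokens in corpus_tokens if term in tokens)
    idf = math.log((1 + len(corpus_tokens)) / (1 + docs_with_term)) + 1.0
    return tf * idf


corpus = [
    "ducsu election rigging allegations investigated",
    "cricket team wins the series",
    "election commission publishes ducsu result",
]
corpus_tokens = [tokenize(doc) for doc in corpus]

for doc, tokens in zip(corpus, corpus_tokens):
    print(round(tf_idf("rigging", tokens, corpus_tokens), 3), doc)
```

A document that never mentions the term scores zero, while a short document that mentions a rare term scores highest.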

Weighting rule: keyword matches in the title get more weight than matches in the body text. All three metrics are then combined into a single composite score between 0 and 1 for each source.
Finally, the top 8–10 sources are stored in JSON, a structured data format that programs can read easily. This filtering is designed to be precise enough to find the needle in the haystack: irrelevant data is discarded, and only the most reliable, recent, and relevant information surfaces.
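One way to picture the composite score is as a weighted average of the three metrics, each already scaled to the range 0 to 1. The sketch below is built on assumptions: the metric weights, the title and body weights, the 30-day half-life for recency, and the precomputed `authority`, `title_match`, and `body_match` values are all invented, since the article does not publish Khoj's real numbers.

```python
import json
from datetime import datetime, timezone

# Assumed weights (they sum to 1 so the composite stays between 0 and 1).
WEIGHTS = {"recency": 0.3, "authority": 0.3, "keyword": 0.4}
TITLE_WEIGHT, BODY_WEIGHT = 0.7, 0.3  # title matches count more than body matches
HALF_LIFE_DAYS = 30.0  # assumed: a document loses half its recency score every 30 days


def recency_score(published, now):
    """Exponential decay: 1.0 for brand-new content, approaching 0 with age."""
    age_days = max((now - published).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / HALF_LIFE_DAYS)


def keyword_score(title_match, body_match):
    """Blend title and body match scores, each expected in [0, 1]."""
    return TITLE_WEIGHT * title_match + BODY_WEIGHT * body_match


def composite_score(doc, now):
    """Weighted average of recency, authority, and keyword match."""
    parts = {
        "recency": recency_score(doc["published"], now),
        "authority": doc["authority"],
        "keyword": keyword_score(doc["title_match"], doc["body_match"]),
    }
    return sum(WEIGHTS[name] * value for name, value in parts.items())


def top_sources(docs, now, k=10):
    """Rank documents by composite score and keep the best k."""
    ranked = sorted(docs, key=lambda d: composite_score(d, now), reverse=True)
    return [{"url": d["url"], "score": round(composite_score(d, now), 3)} for d in ranked[:k]]


now = datetime(2025, 9, 27, tzinfo=timezone.utc)
docs = [
    {"url": "https://example.com/a", "published": datetime(2025, 9, 26, tzinfo=timezone.utc),
     "authority": 0.9, "title_match": 1.0, "body_match": 0.5},
    {"url": "https://example.com/b", "published": datetime(2024, 9, 27, tzinfo=timezone.utc),
     "authority": 0.6, "title_match": 0.0, "body_match": 0.4},
]

ranked = top_sources(docs, now, k=10)
print(json.dumps(ranked, indent=2))
```

The recent, authoritative page with a title match ranks first, and the result is serialized as JSON for the next stage.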

(4) Fallback Search: Leaving No Gaps

As we said earlier, Khoj searches for information for users from our curated sources. Now, suppose a claim appears that’s very new or unusual, and there’s no match in Khoj’s listed sources. In such moments, Khoj doesn’t just sit idle; it immediately switches to a special tool, the Tavily API. This combs through the whole web, but only collects information from credible and authentic sites.

This process includes a smart authority scoring system to block fake or irrelevant links. Plus, Khoj integrates 16 different API keys as backups—if one fails, another takes over. So, even with rate limits or usage issues, Khoj keeps running.
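The control flow described here, curated sources first, web search second, and key rotation on failure, might look roughly like the sketch below. It deliberately does not call the real Tavily client: `fake_web_search`, `fake_curated_search`, `RateLimitError`, and the key names are stand-ins invented for illustration, and the actual error handling in Khoj may differ.

```python
class RateLimitError(Exception):
    """Raised by the (hypothetical) search client when a key is exhausted."""


def search_with_key_rotation(query, api_keys, search_fn):
    """Try each API key in order and return the first successful result."""
    last_error = None
    for key in api_keys:
        try:
            return search_fn(query, key)
        except RateLimitError as error:
            last_error = error  # this key is exhausted, move on to the next
    raise RuntimeError("all API keys failed") from last_error


def find_evidence(query, curated_search, fallback_search, api_keys):
    """Use curated sources first; fall back to web search only if they find nothing."""
    results = curated_search(query)
    if results:
        return results
    return search_with_key_rotation(query, api_keys, fallback_search)


# Stand-ins for real clients, used only to demonstrate the control flow.
def fake_curated_search(query):
    return []  # simulate "no match in the curated list"


def fake_web_search(query, key):
    if key != "key-3":
        raise RateLimitError(f"{key} is rate limited")
    return [{"url": "https://example.com/report", "query": query}]


keys = [f"key-{i}" for i in range(1, 17)]  # 16 placeholder keys
evidence = find_evidence("new unusual claim", fake_curated_search, fake_web_search, keys)
print(evidence)
```

In this run the first two keys are rate limited, so the third key serves the request and the remaining keys are never touched.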

In short, this is Khoj’s ironclad safety net, ensuring that no matter how tough or complex the claim is, there’s always a way to find the truth. And thanks to this fallback system, Khoj’s accuracy rate is nearly 95%—higher than many international platforms.

(5) Report Generation: The Creativity and Accuracy of AI

Once Khoj has gathered all the data, the AI gets to work organizing it into a simple, reader-friendly report. Here, a three-tier AI fallback system is used, from a primary model down to a tertiary one, so that a failure at one tier rarely prevents a reliable report from being produced.

The report provides:

What each source says,
Where there is disagreement,
And whether the claim is ultimately true, false, or disputed.
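The three parts of the report listed above map naturally onto a small data structure. The sketch below is a simplified assumption, not Khoj's logic: in Khoj an AI model writes the report, whereas here a naive rule derives the verdict from each source's stance. The `SourceFinding` class, the stance labels, and the extra "unverified" verdict are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SourceFinding:
    url: str
    stance: str  # assumed labels: "supports" or "refutes" the claim
    summary: str


def build_report(claim, findings):
    """Summarize what each source says, flag disagreement, and give a verdict."""
    counts = Counter(finding.stance for finding in findings)
    supports, refutes = counts["supports"], counts["refutes"]
    if supports and refutes:
        verdict = "disputed"
    elif supports:
        verdict = "true"
    elif refutes:
        verdict = "false"
    else:
        verdict = "unverified"  # assumed label for "no usable evidence"
    return {
        "claim": claim,
        "verdict": verdict,
        "disagreement": bool(supports and refutes),
        "sources": [
            {"url": f.url, "stance": f.stance, "summary": f.summary} for f in findings
        ],
    }


findings = [
    SourceFinding("https://example.com/a", "refutes", "No such robbery was reported."),
    SourceFinding("https://example.com/b", "refutes", "The bank denied the claim."),
]
report = build_report("Millions of dollars were robbed from a particular bank", findings)
print(report["verdict"], report["disagreement"])
```

Keeping the source links inside the report is what lets a reader cross-check every statement.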

This is where the real magic of AI becomes clear. First, it optimizes your question to extract relevant information; then it arranges the results as a narrative tailored to the cultural context.

Every report includes source links, so you can cross-check yourself. The whole experience feels like having a thoughtful discussion with an expert—simple, reliable, and highly convincing.

Figure: A simplified architecture of Khoj’s AI fact-checker process

Special Features: What Makes Khoj Unbeatable

To make Khoj a complete platform for truth-seeking in Bengali, we’ve added several impactful features:

Mythbusting: In the digital age, fake news and rumors have become a daily reality. Most people stop at “is this claim true or false?” But Khoj chooses a different path.

One of Khoj’s unique features is that it doesn’t just present facts; it explains things conversationally. For example, if you come across a superstition or a scientific claim, the AI will explain it to you like a friend. It sets the context, presents scientific evidence, and finally debunks the myth clearly with a story-like narrative.

The biggest strength of this feature is that it doesn’t just protect people from rumors, but also teaches critical thinking in the long run. That skill creates a form of digital immunity, which will be extremely useful in the information battles of the future.

“Muktijuddho Corner” (Liberation War Corner): Even today, there’s widespread misinformation, distorted history, and rumors about 1971—all of which confuse the new generation. In this context, Khoj introduces a unique feature: the Liberation War Corner.

Here, AI works like an experienced historian. It doesn’t just deliver facts. It explains everything, from historical background, timelines, genocide statistics, war crimes, key personalities, and geographical details to cultural impacts, in simple and engaging storytelling. This feature is built on Retrieval-Augmented Generation (RAG): before answering, the AI retrieves relevant passages from numerous books, government documents, diaries, and archival records related to the Liberation War. As a result, information comes not from AI speculation but from evidence-based, verified sources.
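The retrieval half of RAG can be illustrated with a toy example: find the passages most similar to the question, then place them in the prompt so the model answers from evidence. Everything here is an assumption made for illustration. The three passages are placeholders rather than Khoj's archive, similarity is plain word-count cosine rather than the embedding search a production system would likely use, and the prompt wording is invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages, k=2):
    """Return the k passages most similar to the question."""
    q_vec = Counter(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q_vec, Counter(tokenize(p["text"]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, retrieved):
    """Assemble a prompt that asks the model to answer only from the passages."""
    context = "\n".join(f"[{p['id']}] {p['text']}" for p in retrieved)
    return (
        "Answer using only the passages below and cite their ids.\n"
        f"Passages:\n{context}\n\nQuestion: {question}"
    )


# Placeholder passages standing in for the real archive.
passages = [
    {"id": "p1", "text": "Bangladesh declared independence on 26 March 1971."},
    {"id": "p2", "text": "The war ended on 16 December 1971 with the surrender in Dhaka."},
    {"id": "p3", "text": "The national flower of Bangladesh is the water lily."},
]

question = "When did the war end?"
best = retrieve(question, passages, k=1)
prompt = build_prompt(question, best)
print(prompt)
```

Because the model is told to answer only from the retrieved passages and to cite them, its answer can be traced back to a specific document.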

This makes it easy to detect incorrect narratives, false histories, and rumors about the Liberation War, ensuring that the true story of our struggle, sacrifice, and victory is not hidden behind distortion.

Multimedia Verification: Khoj doesn’t just verify text; it can check the authenticity of images too. From detecting AI-generated content to finding a picture’s original source via reverse image search, it covers both. For text, Khoj uses AI detection and plagiarism checks to catch machine-generated or manipulated content. Thanks to advanced API integration, Khoj is well prepared for the multimedia era.

Manual Fact-Checking: The Inseparable Blend of AI and Human Touch

Alongside AI-powered features, the Khoj team regularly conducts manual fact-checks of rumors circulating online. The goal is simple: to block the easy spread of confusing or fake news. This combination of AI’s speed and human analytical skills has made Khoj a reliable standard for fact-checking.

Figure: Khoj at a glance

Limitations:

No system is perfect. No matter how extensive the source list is, information outside it may be missed; that is the challenge of source bias. Because sources are not updated in real time, very recent developments may appear with a slight delay. AI-generated summaries may also make subtle mistakes, so it’s always important to check the main source links.

Conclusion:

Khoj is not just a platform—it’s a new initiative to boost digital literacy and detect misinformation in the Bengali language. Its hybrid pipeline—AI-driven search, source selection, and summarization—demonstrates that truth can become much easier to verify. We hope Khoj will help its users and gradually become more impactful. In every moment of your search for truth, Khoj will be by your side.

Take a tour of Khoj: khoj-bd.com

References:
[1] See here: khoj-bd.com/fact-checking-verification
