BanglaTense: The First Large-Scale Tense Dataset for Bangla—A New Frontier in NLP Research!

biggani org | February 24, 2025

BanglaTense is the first standard tense dataset for the Bangla language and a notable addition to Bengali Natural Language Processing (NLP) and Artificial Intelligence (AI) research. It is the first large, reliable dataset created for detecting and classifying tense in Bengali text, and it is expected to open new directions in Bangla NLP research.

A New Addition to Bangla NLP

A shortage of NLP datasets for Bangla has long been a challenge. While high-resource languages such as English have advanced models and datasets, resources for Bangla remain limited. In particular, when tense is detected inaccurately in Bangla text, errors follow in machine translation, chatbots, and automatic text generation. BanglaTense was developed to address this gap and to help systems interpret both the meaning and the context of Bangla text.

BanglaTense: The Story Behind the Research

The dataset was developed at Daffodil International University (DIU) by a research team led by Md. Hasan Imam Bijoy (Lecturer, CSE, DIU), working with Umme Ayman (Lecturer, CSE, DIU) and Md. Monarul Islam Mithu (Lecturer, CSE, DIU). It is the first large and reliable tense dataset for the Bangla language, and their work opens new opportunities in Bangla NLP research.

Steps in Dataset Creation

For the BanglaTense dataset, 17,819 sentences were collected from blogs, newspapers, social media, and literature. Afterwards, three linguists categorized these sentences into past, present, and future tenses. This is the first manually annotated tense dataset for Bangla, representing language as it is used in real life.
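The article does not say how the three linguists' labels were combined into a single label per sentence. One common approach is majority voting, sketched below in Python purely as an illustration. The function name `majority_label`, the label strings, and the rule of flagging three-way disagreements for review are assumptions, not the team's documented procedure.

```python
from collections import Counter

# Assumed label names; the article only says past, present, and future tenses.
LABELS = ("past", "present", "future")


def majority_label(votes):
    """Return the label chosen by at least two annotators, or None if all disagree.

    `votes` is a sequence of three label strings, one per annotator.
    """
    if len(votes) != 3:
        raise ValueError("expected exactly three annotator votes")
    for vote in votes:
        if vote not in LABELS:
            raise ValueError(f"unknown label: {vote!r}")
    label, count = Counter(votes).most_common(1)[0]
    # With three votes, a count of 1 means a three-way split.
    return label if count >= 2 else None


# Hypothetical annotations keyed by a made-up sentence ID.
annotations = {
    "s1": ["past", "past", "past"],
    "s2": ["present", "future", "future"],
    "s3": ["past", "present", "future"],
}

for sentence_id, votes in annotations.items():
    final = majority_label(votes)
    print(sentence_id, final if final is not None else "needs review")
```

Under this scheme a sentence with a three-way split gets no label automatically and would go back to the annotators for discussion.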

Features of the BanglaTense Dataset

The first large-scale tense dataset for Bangla
17,819 sentences divided into three tense categories
Manually annotated by linguists
Collected from blogs, newspapers, social media, and literature
Benchmark dataset for NLP research and development
Free and open source
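A benchmark dataset like this is typically used to score classifiers against its gold tense labels. The sketch below computes macro-averaged F1 over the three tense classes, a common choice when classes may be unevenly sized. The function name, the label strings, and the toy label lists are illustrative assumptions; the article does not state which metric the authors report.

```python
def macro_f1(gold, pred, labels=("past", "present", "future")):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0  # class never predicted correctly
        scores.append(f1)
    return sum(scores) / len(scores)


# Toy labels for illustration only; these are not BanglaTense results.
gold = ["past", "present", "future", "past"]
pred = ["past", "present", "past", "past"]
print(round(macro_f1(gold, pred), 3))
```

Because every class counts equally in the average, a model that ignores one tense entirely is penalized even if that tense is rare in the data.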

Applications of BanglaTense

Natural Language Processing (NLP): Supports the development of advanced Bangla language models.
Automatic Translation Systems: Supports more accurate translation from Bangla to other languages by helping preserve tense.
Chatbots and Virtual Assistants: Helps Bangla chatbots interpret whether a user is referring to the past, present, or future.
Automated Grammar Checking: Supports tense-aware analysis of Bangla grammar.
Text Classification and Information Extraction: Useful for analyzing Bangla language data.

Future Plans

The next phase of BanglaTense will include more complex linguistic research, such as syntactic analysis and sentiment analysis. There are also plans to extend this work to other languages, which would support the development of multilingual NLP that includes Bangla.

BanglaTense is a significant milestone in the digital revolution of the Bangla language, serving as an invaluable resource for researchers, developers, and students alike.