When we humans open our eyes, we instantly recognize the world around us: where the chair is, where the door is, who is a person and who is just a face in a poster. To a computer, though, an image is initially nothing but countless colored dots, or pixels (the tiny colored elements that make up a picture). Just as the human brain creates meaning from these dots, computer vision is about teaching computers to interpret which group of pixels forms a “person”, which ones make up the “road”, and which are a “wall”. This is what we call “visual intelligence”: the intelligence of understanding what is seen.
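To see what this looks like in practice, here is a minimal Python sketch of how a picture reaches a program as nothing more than a grid of numbers. It assumes the Pillow and NumPy libraries are installed, and the file name “room.jpg” is a hypothetical stand-in:

```python
# A picture, as a computer receives it: a grid of numbers.
# Assumes Pillow and NumPy are installed; "room.jpg" is a hypothetical file.
import numpy as np
from PIL import Image

img = Image.open("room.jpg").convert("RGB")
pixels = np.asarray(img)   # array of shape (height, width, 3)

print(pixels.shape)        # e.g. (480, 640, 3): rows, columns, RGB channels
print(pixels[0, 0])        # the top-left "dot": three numbers for red, green, blue
```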
The recurring focus of Dr. Alimur Reza’s research is semantic segmentation (dividing an image into meaningful parts and labeling each part). Imagine a picture where a person is standing, with a wall behind and a window beside them. A basic system might simply say, “There is a person here.” But segmentation takes it a step further: it identifies exactly which parts of the image are the person, which parts are the wall, and which are the window. It is as if the whole picture becomes a map, with each area clearly labeled. This is crucial for robots, because a robot working inside a room needs to know where to walk, where to stop, what is an obstacle, and what is a safe path. Just as humans feel for the wall with their hands when moving in the dark, robots need the science of “seeing” to find their way.
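To make the idea concrete, here is a minimal sketch of semantic segmentation in Python, assuming PyTorch and torchvision are installed. The pretrained DeepLabV3 network and the “room.jpg” file are stand-ins chosen for illustration; this is a standard off-the-shelf model, not Dr. Reza’s own method:

```python
# A minimal semantic-segmentation sketch using a standard pretrained model.
# Assumes PyTorch and torchvision are installed; "room.jpg" is hypothetical.
import torch
from PIL import Image
from torchvision.models.segmentation import (
    deeplabv3_resnet50, DeepLabV3_ResNet50_Weights,
)

weights = DeepLabV3_ResNet50_Weights.DEFAULT
model = deeplabv3_resnet50(weights=weights).eval()
preprocess = weights.transforms()

img = Image.open("room.jpg").convert("RGB")
batch = preprocess(img).unsqueeze(0)      # add a batch dimension

with torch.no_grad():
    logits = model(batch)["out"]          # (1, num_classes, H, W)

# Every pixel gets the class it scores highest on: the "map" of the image.
label_map = logits.argmax(dim=1)[0]       # (H, W) grid of class ids
classes = weights.meta["categories"]      # e.g. "person", "tvmonitor", ...
print({classes[i] for i in label_map.unique().tolist()})
```

After this runs, every pixel in `label_map` carries a label, so the picture really has become the labeled map described above.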
The striking thing is that even though advances in AI should, in theory, make “seeing” easier, in practice the problem still hasn’t been fully solved. That is because a major part of computer vision relies on machine learning (methods of learning from data) and deep learning (learning based on multi-layer neural networks). In other words, how well a machine learns depends largely on what kind of data it is shown, the nature of lighting and shadow, how complex the environment is, and the model’s capacity. Just as you might stumble on tough math problems without enough practice, a machine also gets confused: an object seen under different lighting or from a new angle may trip it up. These limitations are what drive research forward, because “teaching a machine to see” is not about flashy demos but about the long practice of making reliable decisions in the real world.
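One common response to this fragility is data augmentation: during training, each image is shown under many randomized lighting and viewpoint changes so the model learns not to be tripped up by them. The sketch below assumes torchvision is installed and uses a hypothetical “room.jpg”; the specific transforms are illustrative of common practice, not taken from the interview:

```python
# A minimal data-augmentation sketch. Assumes torchvision is installed;
# "room.jpg" is a hypothetical file. Illustrates common practice only.
from PIL import Image
from torchvision import transforms

augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.4, contrast=0.4),  # vary lighting
    transforms.RandomRotation(degrees=15),                 # vary angle slightly
    transforms.RandomHorizontalFlip(p=0.5),                # mirrored views
])

img = Image.open("room.jpg").convert("RGB")
variants = [augment(img) for _ in range(8)]  # eight randomized training views
```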
The impact of computer vision is therefore not confined to laboratories. Detecting vehicles in urban traffic systems, catching traffic violations with camera-based monitoring, identifying product defects in factories, diagnosing crop diseases in agriculture: this “machine eye” can be applied everywhere. In Bangladesh, for instance, where traffic congestion and road management are major issues, where quality control is vital in the garment industry, and where drone images must be analyzed during river erosion or floods, computer vision has real-world uses in all these fields. In other words, no matter how “high-tech” the technology may sound, its roots are intertwined with everyday human needs.
That is why Dr. Alimur Reza’s quote is not just a research statement; it is a direction for the future of society. In the coming days, robots, smart cameras, and autonomous vehicles will all face one big question: do they truly understand, or are they just calculating? Computer vision is steadily making that understanding a reality, and in this journey lies inspiration for the youth of Bangladesh: if you combine math, programming, and curiosity, you too can become part of the quest to create a “machine eye”.
In the full interview, Dr. Alimur Reza discusses his educational journey, research details, the future of robotics, and practical questions about AI usage in greater depth. Read the complete interview below and watch it on YouTube.
