Written by: Mashiur Rahman
The midday sun had softened by then. The old corridor of the Faculty of Engineering—paint peeling gently off the walls, dust clinging to the windowpanes—yet time seemed to move differently there. Abir was approaching silently from the far end of the corridor. Middle-aged, calm eyes, a well-used leather bag slung over his shoulder, round-framed glasses in hand. Before starting the afternoon class, he paused for a moment, watched a couple of students tossing a football on the field outside, and found himself caught on a question—Why do so many AI projects, with such good models, still collapse within just a few months?
Today, he decided to answer, but not in the language of a typical lecture—he’d do it as a story, through scenes, weaving technology with imagination. Entering the classroom, he wrote one word large and bold on the board—Monitoring. Then he glanced at the students with a gentle smile.
“You may think,” he said in a calm voice, “that machine learning is all about building models, feeding them data, and reading the output. But in reality, it’s a complete play. The first act is the data, then the pipeline and model, and finally the user. And in every act, there’s a different sentinel—monitoring. If the sentinel falls asleep anywhere, the play collapses.”
The class seemed to turn silent as one. Abir drew a long line on the board and started sketching tiny boxes along it—Ingestion, Validation, Transform, Feature Store, Model Serving, Observability, User Feedback. Next to each box, small arrows, sometimes circles—signs of feedback loops. Then, he began the first act.
Act One: The Door of Input Data—The First Sentinel’s Eyes
“Imagine you’ve been handed a huge library,” Abir said. “But half the books are torn, many have printing mistakes, some are so old that what’s written isn’t true anymore. Could you conduct modern research with such books?” Several heads shook in unison—‘No’. “That’s the story of input data,” Abir smiled, “In machine learning, data is the raw material. If the raw materials are bad, everything else becomes an uphill struggle.”
He wrote four words on the board—Volume, Distribution, Quality, Timeliness.
Volume—How will a model learn if there’s too little data? But if it’s excessive, it’s hard to process, costs more, and irrelevant data causes more problems.
Distribution—If there are mountains on one side and plains on the other, the model learns to walk up hills but stumbles on flat ground.
Quality—The quality of information is crucial. Missing values, duplicates, typos, incorrect encoding, outliers—if these irregularities show up, the model learns the wrong things.
Timeliness—How up-to-date is the data? Has the data source changed? If you measure today with yesterday’s data, you’re bound to be wrong.
“The first sentinel,” Abir said, “is the data ingestion monitor. Every day, every hour, even every minute—how many records came in, how many were dropped, how many were corrupted—it’s all logged. If the record rate suddenly drops, alert. If duplicate rates rise, alert. If missing fields increase, alert. The sentinel’s eyes must never close.”
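The first sentinel’s logbook can be sketched in a few lines. This is a minimal, hypothetical example—the counter names and thresholds are illustrative, not from any particular platform:

```python
from dataclasses import dataclass

@dataclass
class IngestionStats:
    records: int         # records received in the window
    duplicates: int      # records whose key was already seen
    missing_fields: int  # records missing a required field

def ingestion_alerts(stats, baseline_records,
                     min_volume_ratio=0.5,
                     max_dup_rate=0.05,
                     max_missing_rate=0.02):
    """Return a list of alert messages; an empty list means all clear."""
    alerts = []
    if stats.records < baseline_records * min_volume_ratio:
        alerts.append("volume: record rate dropped below baseline")
    if stats.records and stats.duplicates / stats.records > max_dup_rate:
        alerts.append("quality: duplicate rate above threshold")
    if stats.records and stats.missing_fields / stats.records > max_missing_rate:
        alerts.append("quality: missing-field rate above threshold")
    return alerts
```

Run hourly or per-batch, a check like this is cheap; the hard part is choosing baselines that track seasonality rather than a single fixed number.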
Case Study 1: E-commerce Recommendation—The Curse of the Old Library
A Southeast Asian e-commerce platform, millions of users, thousands of products. They launch a new recommendation system—everyone’s happy for the first two weeks. In the third week, customers start complaining—the recommendations show old offers, out-of-stock products, even discontinued brands! The team investigates, only to find that in the ingestion pipeline, alongside the daily incremental data from the ‘flat-file’ drop folder, archived files from two years ago were slipping in occasionally. There was no hash-check or file-age-check; if the filename matched the pattern, it was swallowed. Result—the junk of history overshadowed the present.
Lesson learned—Duplicate file guards, timestamp fencing, data freshness SLAs, and schema-version checks must be mandatory in ingestion. If the sentinel stays awake at the very first door, the inner stage survives.
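The guards from that lesson—hash checks and file-age fencing—can be combined into one gate at the drop folder. A hypothetical sketch, assuming content bytes and a modification timestamp are available for each candidate file:

```python
import hashlib
from datetime import datetime, timedelta, timezone

def should_ingest(content: bytes, modified_at: datetime,
                  seen_hashes: set, max_age=timedelta(days=2)):
    """Gate a candidate file: reject stale or already-seen content.

    Returns (accept, reason). `seen_hashes` is updated on accept, so
    a two-year-old archive that matches the filename pattern but has
    an old timestamp or a known hash never reaches the pipeline.
    """
    digest = hashlib.sha256(content).hexdigest()
    if digest in seen_hashes:
        return False, "duplicate: content hash already ingested"
    if datetime.now(timezone.utc) - modified_at > max_age:
        return False, "stale: file older than the freshness SLA"
    seen_hashes.add(digest)
    return True, "ok"
```

In production the seen-hash set would live in durable storage, not memory; the filename pattern alone, as the case study shows, is never enough.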
Act Two: The Corridor of the Data Pipeline—Second Sentinel with a Torch
Beside the drawn line, Abir sketched little stations—Validation → Cleaning → Transform → Feature Engineering → Feature Store. “Most mistakes pile up in this corridor,” he said. “This is where math, software, and the raw material of the real world all blend together.”
Range Check—Why is there a 345-year-old person in the age column? Why negative values in transaction amounts?
Distribution Monitoring—Yesterday ‘city=Dhaka’ was 30%, today it’s 75%—is that sudden spike an instrumentation bug?
Lineage—Which feature came from which source and how? If the calculations changed today from what they were yesterday, the model’s foundation shakes.
Dependency—If one service goes down, how many features are affected? Is there a backfill?
Schema Evolution—A new column gets added, but the downstream job wasn’t ready—crash, silent data loss, or worse, silent errors.
“Automated tests are essential here,” said Abir. “Not just unit tests on code, but on data too—data unit tests. Also, data profiling—day-over-day, week-over-week. When your distributions shift, the ‘drift meter’ should sound an alarm.”
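Two of the checks above—range checks and day-over-day distribution profiling—fit in a handful of lines. A hedged sketch with invented inputs (the 345-year-old person and the city spike from the examples):

```python
def range_violations(values, lo, hi):
    """Data unit test: return every value outside [lo, hi] for triage."""
    return [v for v in values if not lo <= v <= hi]

def proportion_shift(yesterday, today):
    """Day-over-day profiling: absolute change in each category's share."""
    def shares(counts):
        total = sum(counts.values())
        return {k: c / total for k, c in counts.items()}
    y, t = shares(yesterday), shares(today)
    return {k: abs(t.get(k, 0.0) - y.get(k, 0.0)) for k in set(y) | set(t)}
```

If `proportion_shift({"Dhaka": 30, "Other": 70}, {"Dhaka": 75, "Other": 25})` reports a 0.45 jump for Dhaka, the “drift meter” should sound before anyone retrains a model on that day’s data.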
Case Study 2: Banking Fraud Detection—The Invisible Trap of Timezones
A large bank’s fraud model analyzes the past 24 hours of transactions every morning. Lately, they’ve noticed—fewer suspicious transactions show up in the morning report, then there’s a spike in the afternoon report. An investigation reveals—an error in the pipeline’s UTC→local time conversion, caused by a daylight saving time offset mistake. So, each day’s 6–8 hours of data was landing in ‘tomorrow’. Both model training and inference were affected.
The lesson—A single source of truth for timezone conversion (just one library/service), documented data lineage, and sliding window validation are crucial. Without a torchlight in the data pipeline corridor, only darkness prevails.
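A “single source of truth” for time conversion can literally be one function that every job imports, delegating DST handling to the timezone database instead of hand-written offsets. A minimal sketch (the New York zone is illustrative, chosen because it observes DST):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; DST rules come from the tz database

LOCAL_TZ = ZoneInfo("America/New_York")  # illustrative zone with DST

def to_local(utc_dt: datetime) -> datetime:
    """The one sanctioned UTC->local conversion; never offset by hand."""
    if utc_dt.tzinfo is None:
        utc_dt = utc_dt.replace(tzinfo=timezone.utc)
    return utc_dt.astimezone(LOCAL_TZ)
```

On 14 March 2021 the US sprang forward: 06:59 UTC lands at 01:59 EST, while 07:01 UTC lands at 03:01 EDT. A hand-rolled fixed offset gets exactly this wrong, which is how hours of transactions slide into “tomorrow”.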
Act Three: The Model’s Core—Spotlight on the Stage
“Now, we’re at center stage,” said Abir. “Here the algorithm displays its skills. But remember, even the finest model turns obsolete quickly if the environment changes.”
He wrote on the board: Accuracy, Precision, Recall, F1, AUC, Calibration. “Performance isn’t just about a single number. Even a stellar AUC can be misleading without calibration. Without stability, today’s success turns to sand tomorrow.” Then he wrote—concept drift, data drift, label lag.
“Drift,” Abir explained, “is the name for changes in the real world. If your model uses last year’s city map to measure today’s traffic—it’ll definitely fail. So, windowed baselines—take the last 7, 14, 30 days as your moving benchmarks—compare daily. Anomaly detection must catch sudden drops or spikes in metrics. And to guard against failures, use shadow models—the new model runs alongside the old in the background, without disturbing users.”
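The windowed-baseline idea can be sketched as a z-score check against a 7-day moving window. The threshold and window length below are illustrative defaults, not universal constants:

```python
from statistics import mean, stdev

def metric_is_anomalous(history, today, window=7, z_threshold=3.0):
    """Compare today's metric against a moving windowed baseline.

    Flags the value when it sits more than `z_threshold` sample
    standard deviations away from the window mean.
    """
    recent = history[-window:]
    mu, sigma = mean(recent), stdev(recent)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > z_threshold
```

With a stable history around 0.90, a drop to 0.70 fires immediately, while normal jitter to 0.91 passes—exactly the behavior you want from an alert that people must not learn to ignore.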
Case Study 3: Risk Prediction in Healthcare—The Hidden Lesson of the Pandemic
A hospital network’s risk prediction model performed well before the pandemic. During the pandemic, patients’ profiles changed—age distribution, comorbidities, treatment protocols, admission patterns—everything. But the model remained frozen in the pre-pandemic world. Result—missed detections increased, resource allocation mistakes happened. Later, they started tracking drift signals with population statistics (age, gender, comorbidities), outcome shifts, and feature importance shifts. After weekly retraining, decentralized validation, and refining calibration curves, the model was back.
The lesson—A model is alive; if you don’t do regular checkups, it loses touch with reality.
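One widely used signal for population shifts like those in the case study—an illustrative choice, not necessarily what the hospital team used—is the population stability index, which compares a feature’s binned distribution today against its training-time baseline:

```python
import math

def population_stability_index(expected, actual, eps=1e-6):
    """PSI between two binned proportion lists (each summing to ~1).

    A commonly quoted rule of thumb: < 0.1 stable, 0.1-0.25 moderate
    shift, > 0.25 a significant shift worth a retrain review.
    """
    return sum((a - e) * math.log((a + eps) / (e + eps))
               for e, a in zip(expected, actual))
```

Computed weekly over age bands or comorbidity categories, a rising PSI is the “checkup” the lesson calls for—it says the population moved before the accuracy numbers do.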
Act Four: The User’s Door—The Last Sentinel in the Audience
“Final act,” Abir wrote on the board in large letters—UX × AI = Impact. “If the output trips on the way to the user, even the best model is wasted. Latency, explainability, trust, reliability—here, they all weigh on the same scale.”
“Who do we mean by ‘user’? Customer support agents, truck drivers, doctors, bankers—each with their own needs. The language of output should vary accordingly. Sometimes a score is enough, sometimes a category, sometimes an explanation with reasons—‘I made this decision because…’”
Case Study 4: Logistics ETA—Right Time, Wrong Window
A logistics company rolled out ETA prediction. The model was good, errors dropped, but drivers weren’t using the app—five tabs on the screen, three charts, and to understand the score you needed training! The product team simplified the UI—one screen, ETA in large letters, three reasons below: “Traffic: High”, “Weather: Cloudy”, “Hub Queue: Medium”—with two buttons: Reroute and Alert Customer. Adoption rose from 30% to 82%.
The lesson—Users are not just customers, but judges as well. If complexity smacks them in the face, technology loses.
Abir turned back to the board and finished the drawing. Under each box, tiny circles—metrics; beside them, a bell—alerts; and an arrow looping back to the start—feedback loop. He stepped away from the board and looked at the class—“The real beauty of this picture is: close your eyes at any one point, and the whole image falls apart.”
Silence in the Classroom
At this point, Abir paused silently. The class was still. Some were copying a little flowchart into their notebooks, others ticking boxes next to metrics. Abir knew, this silence was the sound of understanding. Then he began again—in the tone of a story, slow and deliberate.
“Imagine,” he said, “you’re sitting by a riverbank. The river is your data, the current is ingestion. In the middle, there’s a dam—the pipeline. If the dam has a crack, water will seep through slowly; at first, you won’t notice. Eventually, the water building up under the soil will turn your field salty. Later, you’ll notice the leaves withering—wondering why. But the actual mistake happened long ago. Just the same, model errors often appear late; the seeds were sown in the input, or the pipeline.”
“Artificial intelligence is like a stage play,” Abir smiled again, “The actors are all talented—some are data, some pipelines, some algorithms, and some UI. But unless the music, lights, and production all come together, the audience will leave. And who’s the audience? The user.”
Beside the flowchart, he wrote another word—runbook.
“Alongside monitoring, we need a runbook—what to do when an alert is triggered, written out step by step. For example—‘If feature drift > threshold: 1) Cut model live traffic by 25% (canary), 2) Turn on shadow model, 3) Immediately run batch validation against the last 14 days’ baseline, 4) For affected segments, show a ‘low confidence’ badge in the UI.’ Without a runbook, an engineer getting a late-night call ends up wandering the city without a map.”
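A runbook can even live next to the code, so the late-night engineer and the alerting system read the same steps. A minimal sketch—the alert name and steps are the hypothetical ones from the example above:

```python
RUNBOOK = {
    "feature_drift_above_threshold": [
        "cut model live traffic to a 25% canary",
        "turn on the shadow model",
        "run batch validation against the last 14 days' baseline",
        "show a 'low confidence' badge in the UI for affected segments",
    ],
}

def steps_for(alert_name):
    """Return numbered steps, or an escalation note if no entry exists."""
    steps = RUNBOOK.get(alert_name)
    if steps is None:
        return [f"escalate: no runbook entry for '{alert_name}'"]
    return [f"{i}) {step}" for i, step in enumerate(steps, 1)]
```

Keeping the runbook in version control means it is reviewed like code—an alert with no entry is itself a finding, and the fallback makes that explicit.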
Conclusion—Respect for the Sentinels
The class was over. Abir picked up his bag and glanced once more at the flowchart drawn on the board. “Remember,” he said softly, “AI isn’t magic; it’s a well-governed city. At each junction stands a sentinel—ingestion, pipeline, model, serving, and user. If the sentinels remain alert, the city is safe. If they fall asleep, the city becomes unrecognizable.”
The students slowly filed out. Gentle murmurs in the corridor, soft sunlight outside the window, and on the board, that one chalked word—Monitoring—glimmering quietly in the empty class. Abir stood at the door and looked back once more, as if revisiting his own picture. Then he left, heading for the next class—carrying this thought inside—What is measured, survives; what survives, improves.