How to Monitor ML Data Drift with Evidently AI, Prometheus, and Grafana

Learn how to monitor ML data drift in real time with Evidently AI, Prometheus, and Grafana to catch model decay early and improve reliability.


I want you to put yourself in the shoes of a data scientist who just deployed a model that predicts customer churn. Everything looks great for two weeks. Then the CEO storms into your office: “Why did our retention metrics drop by 15% last month?” You dig into the logs and find that your model has been predicting “will stay” for customers who clearly left. The data shifted—summer users behave differently than winter ones—and you had no idea. That’s exactly why I’m writing this article today. I’ve seen too many teams invest months in training a model only to watch it silently decay in production, because they had no real‑time monitoring. The solution isn’t just alerts; it’s a pipeline that constantly compares live data against your reference distribution, computes drift scores, and surfaces them on a dashboard that anyone can read. Let me show you how I built this system using Evidently AI, Prometheus, and Grafana.

The core problem is data drift—the statistical properties of your input features change over time. Have you ever noticed that the average transaction amount on your website spikes on Black Friday? If your model was trained on weekly averages from March, it will freak out. That’s one example. Concept drift is even worse: the relationship between features and the target changes. A credit‑risk model trained before a recession will fail when economic conditions shift. I remember the first time I saw a deployed model’s accuracy drop by 30% overnight—it was because a new product category appeared, and the model had never seen it. That’s when I decided to build a monitoring stack that doesn’t require a PhD to interpret.

Let’s start with the key pieces. Evidently AI gives you a Python library that calculates drift for numerical and categorical features using statistical tests (like Kolmogorov‑Smirnov or Jensen‑Shannon divergence). Prometheus is the de facto standard for metrics storage and scraping. Grafana turns those metrics into live dashboards. The trick is to glue them together in a scheduled loop that runs inside your FastAPI serving app.

First, you need a reference dataset—the distribution of features your model saw during training. I generate it from my training data and save it as a CSV. Then, every time a new batch of predictions is made, I collect the current feature values. Here’s the minimal drift detector I use:

import pandas as pd

from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

def compute_drift(reference: pd.DataFrame, current: pd.DataFrame) -> float:
    """Return the share of drifted columns, between 0.0 and 1.0."""
    report = Report(metrics=[DataDriftPreset()])
    report.run(reference_data=reference, current_data=current,
               column_mapping=ColumnMapping())
    result = report.as_dict()["metrics"][0]["result"]
    # "dataset_drift" is a boolean flag; the continuous score we want
    # is the share of columns the preset flagged as drifted.
    return result["share_of_drifted_columns"]

That drift score is a number between 0 and 1: the share of features whose distribution has shifted. I set a threshold, say 0.15, and if the score exceeds it, the input data has changed enough to warrant a look. But you don’t want to check drift manually. You want a background scheduler that runs every five minutes. I use APScheduler inside my FastAPI app:

import pandas as pd
from apscheduler.schedulers.asyncio import AsyncIOScheduler

scheduler = AsyncIOScheduler()

@scheduler.scheduled_job("interval", minutes=5)
async def check_drift():
    ref = pd.read_csv("data/reference.csv")
    current = fetch_recent_predictions()  # your logic
    score = compute_drift(ref, current)
    drift_gauge.set(score)  # the Prometheus gauge defined alongside /metrics
    if score > 0.15:
        print("Drift detected!")

But printing isn’t enough. You need to expose the score as a Prometheus metric so it can be scraped. Here’s how I add it to the /metrics endpoint:

from fastapi import FastAPI, Response
from prometheus_client import CONTENT_TYPE_LATEST, Gauge, generate_latest

app = FastAPI()  # or your existing serving app

drift_gauge = Gauge("model_drift_score", "Current data drift score")

@app.on_event("startup")
async def start_scheduler():
    scheduler.start()

@app.get("/metrics")
async def metrics():
    return Response(content=generate_latest(), media_type=CONTENT_TYPE_LATEST)

Now, every time the scheduler runs, it updates the gauge. Prometheus scrapes it every 15 seconds. You can visualise it immediately in Grafana. I build a simple dashboard with a time‑series panel showing model_drift_score over the last hour, with a threshold line at 0.15. When the score crosses that line, Grafana fires an alert—an email, a Slack message, whatever you want.
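For completeness, here is a minimal Prometheus scrape configuration for this setup. The job name and target are assumptions, so point them at wherever your FastAPI app is actually listening:

```yaml
scrape_configs:
  - job_name: "churn-model"          # hypothetical job name
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:8000"]  # host:port serving the /metrics endpoint
```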

But that’s not all. I also track the distribution of individual features. For example, if transaction_amount drifts while others stay stable, you can spot a specific business change (like a new pricing tier). I export per-feature drift scores from the same DataDriftPreset report; its DataDriftTable section reports a drift score for every column. Then I create a Prometheus gauge per feature:

feature_drift = Gauge("feature_drift_score", "Drift per feature", ["feature"])

for name, score in feature_scores.items():
    feature_drift.labels(feature=name).set(score)
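Where does feature_scores come from? Here is a minimal sketch of pulling per-column scores out of the report dictionary. It assumes the as_dict() layout used by Evidently's DataDriftTable metric in the 0.2–0.4 releases (the nesting has changed between versions, so verify against your installed release):

```python
def extract_feature_scores(report_dict: dict) -> dict:
    """Walk an Evidently report dict and collect per-column drift scores.

    Looks for any metric result carrying a "drift_by_columns" mapping,
    as the DataDriftTable metric produces in Evidently 0.2-0.4.
    """
    scores = {}
    for metric in report_dict.get("metrics", []):
        by_columns = metric.get("result", {}).get("drift_by_columns")
        if by_columns:
            for name, info in by_columns.items():
                scores[name] = info["drift_score"]
    return scores
```

With the report already run, `feature_scores = extract_feature_scores(report.as_dict())` feeds the gauge loop above.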

Now my Grafana dashboard shows a table of all features with their drift scores. I can instantly see that hour_of_day drifted because our marketing campaign shifted from morning emails to evening push notifications.

Here’s the part that surprised me: monitoring drift doesn’t require storing all raw data forever. You only need a window of recent predictions. I keep a rolling buffer of the last 1,000 records in a Pandas DataFrame. Every five minutes, I run drift detection on that buffer compared to the reference. Memory usage stays low, and the scheduler overhead is negligible on a typical web server.
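The rolling buffer itself is only a few lines. This sketch uses a collections.deque capped at 1,000 records; the field names are placeholders for your own features:

```python
from collections import deque

import pandas as pd

BUFFER_SIZE = 1_000
_buffer: deque = deque(maxlen=BUFFER_SIZE)  # oldest records drop off automatically

def record_prediction(features: dict) -> None:
    """Append one prediction's feature values to the rolling window."""
    _buffer.append(features)

def fetch_recent_predictions() -> pd.DataFrame:
    """Materialize the current window as a DataFrame for drift detection."""
    return pd.DataFrame(list(_buffer))
```

Calling record_prediction inside the serving endpoint keeps the window fresh, and the scheduler's fetch_recent_predictions picks it up every five minutes.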

Now, what about concept drift? Data drift tells you the input changed, but concept drift tells you the model’s performance deteriorated even if the input looks normal. I combine both by logging actual outcomes when they become available (like churn labels after a month). I use Evidently’s TargetDriftPreset to compare recent predictions with reference predictions. If the distribution of predicted probabilities shifts, I know the model’s decision boundary is moving. I export that as another gauge.
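TargetDriftPreset handles the statistics for you, but the underlying idea is a two-sample test on the predicted-probability distributions. A minimal NumPy sketch of the Kolmogorov-Smirnov statistic (the same test mentioned earlier for numerical columns) shows what is actually being compared:

```python
import numpy as np

def ks_statistic(reference: np.ndarray, current: np.ndarray) -> float:
    """Max distance between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    combined = np.sort(np.concatenate([reference, current]))
    cdf_ref = np.searchsorted(np.sort(reference), combined, side="right") / len(reference)
    cdf_cur = np.searchsorted(np.sort(current), combined, side="right") / len(current)
    return float(np.max(np.abs(cdf_ref - cdf_cur)))
```

If this statistic on recent versus reference predicted probabilities is large, the decision boundary is effectively moving even when the raw inputs look stable.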

I’ve learned that you don’t need complex Kubernetes operators or heavy infrastructure. In one project, I ran the entire monitoring pipeline on a single Raspberry Pi for a small IoT model. The key is to keep it simple: reference data saved as a CSV, a scheduler that runs every few minutes, Prometheus scraping locally, and Grafana displaying a red line. Anyone on the team can understand it.
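To make the "keep it simple" point concrete, here is a sketch of a docker-compose file for the stack. The service names, ports, and the build context for the FastAPI app are assumptions you would adapt to your project:

```yaml
services:
  model-api:
    build: .                      # your FastAPI app exposing /metrics
    ports: ["8000:8000"]
  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports: ["9090:9090"]
  grafana:
    image: grafana/grafana
    ports: ["3000:3000"]
```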

Let me share a personal story. I once worked with a team that had a hotel booking model. They were retraining every month but still saw cancellations spike. We added drift monitoring and discovered that length_of_stay had shifted because a new corporate client always booked week‑long stays. The model was tuned for weekend getaways. We added a feature for booking source, retrained, and the problem vanished within a week. Without monitoring, they would have blamed random noise.

Now, I encourage you to like this article if you found it useful, share it with a colleague who’s still debugging with Excel spreadsheets, and comment below with your own war stories of silent model decay. The next step is to set up your own stack—you already have the code. Just clone the structure, run docker-compose up, and watch your models stay healthy.

