MLflow Tutorial: Track Experiments, Register Models, and Deploy Scikit-Learn APIs
Learn MLflow experiment tracking, model registry, and Scikit-learn deployment to build reproducible ML workflows and ship models faster.
Every serious machine learning project starts with excitement. You build a model, tune a few parameters, and the accuracy jumps. Then you try something else, and it gets even better. But a week later, when you need to reproduce that winning submission, you find yourself staring at a notebook full of cells executed in the wrong order, and you cannot remember which combination of learning rate and tree depth gave you that F1 score of 0.82. I lived that nightmare. After wasting a whole afternoon trying to reconstruct a single experiment, I decided there had to be a better way. That is when I discovered MLflow, an open-source platform that tracks everything from parameters and metrics to the model itself, and lets you serve it in production with a single command. In this guide, I will walk you through the exact workflow I now use: building a Scikit-learn pipeline, logging every run to a local MLflow server, comparing experiments, promoting the best model through the registry, and finally deploying it as a REST API.
First, you need to set up your environment. Install the required packages with pip:
pip install mlflow==2.11.1 scikit-learn==1.4.2 pandas==2.2.2 numpy==1.26.4 matplotlib==3.8.4 seaborn==0.13.2
Then launch the MLflow tracking server in your terminal:
mlflow server --backend-store-uri sqlite:///mlflow.db --default-artifact-root ./mlruns --host 127.0.0.1 --port 5000
Open your browser and go to http://127.0.0.1:5000. You will see an empty UI ready to receive runs. The SQLite database stores all metadata about experiments, parameters, and metrics, while the artifact root holds the actual model files, plots, and any other outputs. This setup works perfectly for individual developers or small teams who want a lightweight experiment tracker without any cloud costs.
Now, let us prepare a dataset. I will use the Bank Marketing dataset from UCI, a classic binary classification problem where we predict whether a client subscribes to a term deposit. The data contains both numeric and categorical columns, which is exactly the kind of mixed data you face in real projects. Here is a function to load and split the data:
import pandas as pd
from sklearn.model_selection import train_test_split
import io
import zipfile
import urllib.request

def load_bank_marketing():
    # UCI ships this dataset inside a zip archive, so download the archive
    # and read the full CSV straight out of it in memory
    url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank-additional.zip"
    with urllib.request.urlopen(url) as resp:
        archive = zipfile.ZipFile(io.BytesIO(resp.read()))
    with archive.open("bank-additional/bank-additional-full.csv") as f:
        df = pd.read_csv(f, sep=";")
    df["y"] = (df["y"] == "yes").astype(int)
    feature_cols = [c for c in df.columns if c != "y"]
    X = df[feature_cols]
    y = df["y"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )
    return X_train, X_test, y_train, y_test
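The term-deposit target is heavily imbalanced, which is why the split above passes stratify=y. A minimal sketch with synthetic labels (not the real dataset) shows what that buys you: the stratified split preserves the positive rate in the test set almost exactly.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic imbalanced labels: 10% positives, mimicking the deposit target
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(1000, 3))
y_toy = np.array([1] * 100 + [0] * 900)

# With stratify, the 10% positive rate carries over to the test split
_, _, _, y_te_strat = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)
print(y_te_strat.mean())  # 0.1 exactly: 20 positives out of 200 test rows
```

Without stratification, a small test set on rare positives can drift away from the true class ratio, which skews every metric you log afterwards.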
Have you ever tried to remember which imputation strategy you used in your last experiment? I have, and it is painful. That is why I build a pipeline that encapsulates every preprocessing step. The pipeline below uses a column transformer to handle numeric features with median imputation and standard scaling, and categorical features with most-frequent imputation and one-hot encoding. The classifier is a Gradient Boosting machine:
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.ensemble import GradientBoostingClassifier
NUMERIC_FEATURES = ["age", "duration", "campaign", "pdays", "previous", "emp.var.rate",
                    "cons.price.idx", "cons.conf.idx", "euribor3m", "nr.employed"]
CATEGORICAL_FEATURES = ["job", "marital", "education", "default", "housing", "loan",
                        "contact", "month", "day_of_week", "poutcome"]

def build_pipeline(n_estimators=200, max_depth=4, learning_rate=0.05, subsample=0.8):
    numeric_transformer = Pipeline([
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler())
    ])
    categorical_transformer = Pipeline([
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("encoder", OneHotEncoder(handle_unknown="ignore", sparse_output=False))
    ])
    preprocessor = ColumnTransformer([
        ("num", numeric_transformer, NUMERIC_FEATURES),
        ("cat", categorical_transformer, CATEGORICAL_FEATURES)
    ])
    pipeline = Pipeline([
        ("preprocessor", preprocessor),
        ("classifier", GradientBoostingClassifier(
            n_estimators=n_estimators, max_depth=max_depth,
            learning_rate=learning_rate, subsample=subsample, random_state=42
        ))
    ])
    return pipeline
Now comes the core part: training with full MLflow instrumentation. I set the tracking URI and experiment name once, then write a train function that logs parameters, metrics, the trained model, and even a confusion matrix image. For plain Scikit-learn estimators, mlflow.autolog() captures most of the important values automatically. For a model buried inside a pipeline, however, I prefer to log manually so I control exactly which parameters and metrics end up in the tracking server. Here is the function I use:
import mlflow
import matplotlib.pyplot as plt
from sklearn.metrics import f1_score, roc_auc_score, confusion_matrix, ConfusionMatrixDisplay
from mlflow.models import infer_signature
mlflow.set_tracking_uri("http://127.0.0.1:5000")
mlflow.set_experiment("bank-marketing-classification")
def train_and_log(n_estimators=200, max_depth=4, learning_rate=0.05, subsample=0.8, run_name="baseline"):
    X_train, X_test, y_train, y_test = load_bank_marketing()
    pipeline = build_pipeline(n_estimators, max_depth, learning_rate, subsample)
    with mlflow.start_run(run_name=run_name) as run:
        # Log parameters
        mlflow.log_param("n_estimators", n_estimators)
        mlflow.log_param("max_depth", max_depth)
        mlflow.log_param("learning_rate", learning_rate)
        mlflow.log_param("subsample", subsample)
        # Train
        pipeline.fit(X_train, y_train)
        y_pred = pipeline.predict(X_test)
        y_proba = pipeline.predict_proba(X_test)[:, 1]
        # Metrics
        f1 = f1_score(y_test, y_pred)
        auc = roc_auc_score(y_test, y_proba)
        mlflow.log_metric("f1", f1)
        mlflow.log_metric("roc_auc", auc)
        # Confusion matrix image
        cm = confusion_matrix(y_test, y_pred)
        disp = ConfusionMatrixDisplay(confusion_matrix=cm)
        disp.plot()
        plt.savefig("confusion_matrix.png")
        mlflow.log_artifact("confusion_matrix.png")
        plt.close()
        # Log the model with a signature
        signature = infer_signature(X_train, pipeline.predict(X_train))
        mlflow.sklearn.log_model(pipeline, "model", signature=signature)
        print(f"Run {run_name} finished. F1={f1:.3f}, AUC={auc:.3f}")
    return run.info.run_id
Run this function a few times with different parameters. Then open the MLflow UI and click on each run. You can compare their metrics side by side. Have you ever wished you could see all your experiments in one clean table? That is exactly what the UI provides. You can even sort by F1 score and instantly identify the best run.
But tracking individual runs is only half the battle. The real power comes from the Model Registry. Once you have selected your champion model, you can register it, assign a stage like “Staging”, and later promote it to “Production”. Here is a script that does this programmatically:
import mlflow
from mlflow.tracking import MlflowClient
mlflow.set_tracking_uri("http://127.0.0.1:5000")
client = MlflowClient()
# Assume run_id of best model is known (e.g., from UI or automated search)
best_run_id = "your_run_id_here"
# Register the model
model_uri = f"runs:/{best_run_id}/model"
registered_name = "bank-marketing-gb"
result = mlflow.register_model(model_uri, registered_name)
# Transition to Staging. (Stages are deprecated in favor of aliases in newer
# MLflow releases, but they work fine with the 2.11 pin used in this guide.)
client.transition_model_version_stage(
    name=registered_name,
    version=result.version,
    stage="Staging"
)
print(f"Model version {result.version} moved to Staging.")
Now that your model is in the registry, you can serve it as a REST endpoint with a single MLflow CLI command. Two details matter here: the serving process needs MLFLOW_TRACKING_URI set so it can resolve the models:/ URI against your tracking server, and --env-manager local tells MLflow to reuse your current Python environment instead of building a fresh one. The server loads the model from the artifact store and exposes a predict API:
export MLFLOW_TRACKING_URI=http://127.0.0.1:5000
mlflow models serve -m "models:/bank-marketing-gb/Staging" -p 5001 --env-manager local
You can test it with a curl command:
curl -X POST http://127.0.0.1:5001/invocations \
-H "Content-Type: application/json" \
-d '{"dataframe_split": {"columns": ["age", "job", "marital", "education", "default", "housing", "loan", "contact", "month", "day_of_week", "duration", "campaign", "pdays", "previous", "poutcome", "emp.var.rate", "cons.price.idx", "cons.conf.idx", "euribor3m", "nr.employed"], "data": [[30, "admin.", "married", "university.degree", "no", "yes", "no", "cellular", "may", "mon", 200, 1, 999, 0, "nonexistent", 1.1, 93.994, -36.4, 4.857, 5191]]}}'
The response contains the predicted class for each row, in the form {"predictions": [0]}. This is the exact same model you trained and promoted, now answering requests over HTTP.
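The same request can be made from Python with nothing but the standard library, which is handy for smoke tests in CI. A sketch, assuming the serving process from the previous step is running on port 5001; the predict helper is my own naming:

```python
import json
import urllib.request

COLUMNS = ["age", "job", "marital", "education", "default", "housing", "loan",
           "contact", "month", "day_of_week", "duration", "campaign", "pdays",
           "previous", "poutcome", "emp.var.rate", "cons.price.idx",
           "cons.conf.idx", "euribor3m", "nr.employed"]
ROW = [30, "admin.", "married", "university.degree", "no", "yes", "no",
       "cellular", "may", "mon", 200, 1, 999, 0, "nonexistent",
       1.1, 93.994, -36.4, 4.857, 5191]

def predict(rows, url="http://127.0.0.1:5001/invocations"):
    """POST rows to the MLflow scoring endpoint and return its predictions."""
    payload = json.dumps({"dataframe_split": {"columns": COLUMNS, "data": rows}})
    req = urllib.request.Request(url, data=payload.encode(),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["predictions"]

# Example (requires the serving process to be running):
# print(predict([ROW]))
```

Keeping the column list in one place like this also makes it obvious when a schema change in training needs to be mirrored in every client.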
I have leaned on this workflow for months. It saved me countless hours of manual logging and debugging. The question now is: will you give it a try? If this guide helped you see how experiment tracking and model deployment can be simple, please like this article, leave a comment with your own MLflow experience, and share it with a teammate who still uses random filenames to save models. Your feedback helps me write better content for the community.