Build Real-Time Image Classification System with PyTorch FastAPI Complete Tutorial

deep_learning

Learn to build a real-time image classification system using PyTorch and FastAPI. Complete tutorial covering CNN architecture, transfer learning, API deployment, and production optimization techniques.

Aug 15, 2025

I’ve been fascinated by how quickly computers can understand images. It’s a problem I’ve wrestled with for some time - how to build systems that see like we do, but at machine speed. That curiosity led me to create a real-time image classification system, and today I’ll show you how I did it using PyTorch and FastAPI. Follow along as we build something practical together - I think you’ll find it useful for your own projects.

Setting up the environment is our first step. We need a clean structure to keep everything organized. Here’s how I arrange my projects:

mkdir -p src/{models,data,training,api,utils} tests
touch requirements.txt config.yaml README.md

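Our dependencies are crucial. This requirements.txt covers the core stack plus the extra libraries used later in this tutorial (config loading, augmentation, metrics, and tests):

torch>=2.0.0
torchvision>=0.15.0
fastapi>=0.104.0
uvicorn[standard]>=0.24.0
python-multipart>=0.0.6
Pillow>=10.0.0
PyYAML>=6.0
albumentations
prometheus-fastapi-instrumentator
pytest
httpx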

Configuration management saves time later. I use YAML files because they’re human-readable and flexible. This config.yaml handles our settings:

model:
  name: "resnet18"
  num_classes: 10
  pretrained: true

api:
  port: 8000
  max_file_size: 10485760  # bytes (10 MiB)

But how do we use this in code? This configuration loader makes settings accessible anywhere:

# src/utils/config.py
import yaml

class Config:
    def __init__(self, config_path="config.yaml"):
        with open(config_path) as f:
            self.settings = yaml.safe_load(f)
        
    def get(self, key):
        return self.settings[key]

config = Config()
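The loader returns plain dictionaries, so settings are read as `config.get("model")["name"]`. If you prefer attribute-style access such as `config.model.name`, one option is to wrap the parsed YAML in `types.SimpleNamespace` objects. This is a minimal sketch: `to_namespace` is a hypothetical helper, and the dict below simply mirrors what `yaml.safe_load` returns for the config.yaml shown earlier.

```python
from types import SimpleNamespace


def to_namespace(value):
    """Recursively convert nested dicts (and dicts inside lists) to SimpleNamespace."""
    if isinstance(value, dict):
        return SimpleNamespace(**{key: to_namespace(v) for key, v in value.items()})
    if isinstance(value, list):
        return [to_namespace(v) for v in value]
    return value


# Mirrors what yaml.safe_load returns for the config.yaml above
settings = {
    "model": {"name": "resnet18", "num_classes": 10, "pretrained": True},
    "api": {"port": 8000, "max_file_size": 10485760},
}
cfg = to_namespace(settings)
print(cfg.model.name, cfg.api.port)  # resnet18 8000
```

Attribute access only works when every key is a valid Python identifier; keys like `max-file-size` would still need `getattr`.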

Data preparation often takes more time than modeling. I created a custom dataset handler that finds images at any depth in a directory tree:

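# src/data/dataset.py
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset

class CustomImageDataset(Dataset):
    def __init__(self, data_dir, transform=None):
        # Find .jpg files at any depth; each image's parent folder name is its class
        self.image_paths = sorted(Path(data_dir).rglob('*.jpg'))
        self.classes = sorted({p.parent.name for p in self.image_paths})
        self.class_to_idx = {name: i for i, name in enumerate(self.classes)}
        self.transform = transform

    def __len__(self):  # required by DataLoader
        return len(self.image_paths)

    def __getitem__(self, idx):
        path = self.image_paths[idx]
        img = Image.open(path).convert('RGB')  # force 3 channels (grayscale/RGBA inputs)
        label = self.class_to_idx[path.parent.name]
        return (self.transform(img) if self.transform else img), label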

Why a recursive glob? Because real-world data is messy, and rglob finds images however deeply they are nested. For preprocessing, I recommend Albumentations: its NumPy/OpenCV-based transforms are fast on the CPU and cover a wide range of augmentations:

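# src/data/transforms.py (suggested location)
import albumentations as A
import numpy as np
from albumentations.pytorch import ToTensorV2

_pipeline = A.Compose([
    A.Resize(224, 224),
    A.Normalize(),  # ImageNet mean/std by default
    ToTensorV2(),   # HWC NumPy array -> CHW torch tensor
])

def transform(img):
    # Albumentations takes a NumPy array via image= and returns a dict
    return _pipeline(image=np.array(img))["image"]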
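`A.Normalize()` with no arguments applies the ImageNet channel statistics (mean 0.485, 0.456, 0.406; std 0.229, 0.224, 0.225) with `max_pixel_value=255`, which matches the preprocessing torchvision's pretrained ResNets expect. Per channel it is equivalent to `(pixel / 255 - mean) / std`. The pure-Python sketch below reproduces that arithmetic for one RGB pixel so you can sanity-check inputs; `normalize_pixel` is a hypothetical helper, not part of Albumentations.

```python
# ImageNet statistics that A.Normalize() uses by default
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD, max_pixel_value=255.0):
    """Per-channel (value / max_pixel_value - mean) / std."""
    return tuple((v / max_pixel_value - m) / s for v, m, s in zip(rgb, mean, std))


print([round(c, 3) for c in normalize_pixel((255, 255, 255))])  # [2.249, 2.429, 2.64]
```

A pure white pixel lands a little above +2 and pure black a little below -2, so normalized inputs far outside roughly [-2.2, 2.7] usually mean a step (such as the division by 255) was skipped or applied twice.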

When building models, I start simple. This custom CNN gives a good baseline:

# src/models/custom_cnn.py
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(3, 16, 3),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32*54*54, num_classes)  # 32 channels x 54 x 54 for a 224x224 input; recompute for other sizes
        )
    
    def forward(self, x):
        return self.layers(x)
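The `32*54*54` input size of the final `nn.Linear` follows from the standard output-size formula for convolution and pooling layers, `out = floor((in + 2*padding - kernel) / stride) + 1`. With no padding, a 224x224 input shrinks 224 -> 222 -> 111 -> 109 -> 54. This small sketch (hypothetical helper names) walks through the same four layers so you can recompute the flattened size for other input resolutions:

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a Conv2d or MaxPool2d layer (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def simple_cnn_flat_features(input_size: int = 224, channels: int = 32) -> int:
    """Flattened feature count feeding SimpleCNN's final Linear layer."""
    s = conv_out(input_size, kernel=3)   # Conv2d(3, 16, 3)
    s = conv_out(s, kernel=2, stride=2)  # MaxPool2d(2)
    s = conv_out(s, kernel=3)            # Conv2d(16, 32, 3)
    s = conv_out(s, kernel=2, stride=2)  # MaxPool2d(2)
    return channels * s * s


print(simple_cnn_flat_features(224))  # 93312, i.e. 32*54*54
```

If you would rather not do this arithmetic at all, an `nn.AdaptiveAvgPool2d` before `nn.Flatten()` fixes the spatial size regardless of input resolution.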

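For production, transfer learning usually works better: a backbone pretrained on ImageNet needs far less data and training time than a network trained from scratch. ResNet models balance speed and accuracy well: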

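# src/models/transfer_learning.py
import torch.nn as nn
from torchvision import models

def get_model(config):
    model_cfg = config.get("model")
    # "DEFAULT" loads the best available ImageNet weights; None gives random init
    weights = "DEFAULT" if model_cfg["pretrained"] else None
    model = models.get_model(model_cfg["name"], weights=weights)
    # Swap the ImageNet head for one sized to our classes (ResNets expose it as .fc)
    model.fc = nn.Linear(model.fc.in_features, model_cfg["num_classes"])
    return model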

Training requires careful monitoring. I use this training loop with early stopping:

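# src/training/trainer.py
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, epochs, device, patience=5):
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters())
    best_acc, epochs_without_improvement = 0.0, 0

    for epoch in range(epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()  # clear gradients left over from the previous step
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()

        # Validation phase
        model.eval()
        total, correct = 0, 0
        with torch.no_grad():
            for images, labels in val_loader:
                images, labels = images.to(device), labels.to(device)
                predicted = model(images).argmax(dim=1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

        acc = 100 * correct / total
        print(f'Epoch {epoch+1}: Accuracy {acc:.2f}%')

        # Keep the best checkpoint; stop after `patience` epochs without improvement
        if acc > best_acc:
            best_acc, epochs_without_improvement = acc, 0
            torch.save(model.state_dict(), 'best_model.pth')
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                print(f'Early stopping at epoch {epoch+1}')
                break
    return best_acc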
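`train` expects separate `train_loader` and `val_loader` objects. One way to carve both out of a single labeled dataset is a stratified split, which keeps each class's share of the validation set close to its share of the whole dataset. Below is a minimal standard-library sketch, assuming you already have one label per image (for example, each file's parent folder name); `stratified_split` and its 20% default are illustrative choices, not part of the original project.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=42):
    """Return (train_indices, val_indices), splitting each class separately."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed -> reproducible split
    train_idx, val_idx = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        # At least one validation image per class, unless the class has only one image
        n_val = max(1, round(len(idxs) * val_fraction)) if len(idxs) > 1 else 0
        val_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return sorted(train_idx), sorted(val_idx)


labels = ["cat"] * 10 + ["dog"] * 5
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))  # 12 3
```

Each index list can then be wrapped with `torch.utils.data.Subset(dataset, indices)` and handed to a `DataLoader`. Both subsets share the underlying dataset's transform, so training-only augmentations would need a second dataset instance.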

The API is where everything comes together. FastAPI makes endpoint creation surprisingly simple:

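# src/api/main.py
import torch
from fastapi import FastAPI, HTTPException, UploadFile
from PIL import Image, UnidentifiedImageError

# Hypothetical modules: `transform` is the preprocessing pipeline (takes a PIL image),
# `load_model` rebuilds the network and loads best_model.pth, and `class_names`
# lists the labels in training index order
from src.data.transforms import transform
from src.models.loading import load_model, class_names

app = FastAPI()
model = load_model()  # Your trained model
model.eval()  # inference mode: no dropout, running batch-norm statistics

@app.post("/classify")
def classify_image(file: UploadFile):
    # A plain def endpoint runs in FastAPI's threadpool, so CPU-bound inference
    # does not block the event loop
    try:
        img = Image.open(file.file).convert("RGB")
    except UnidentifiedImageError:
        raise HTTPException(status_code=400, detail="Uploaded file is not a valid image")
    with torch.no_grad():
        prediction = model(transform(img).unsqueeze(0)).argmax(dim=1).item()
    return {"class": class_names[prediction]}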

To run it:

uvicorn src.api.main:app --reload --port 8000

Now you can send images via curl:

curl -X POST -F "file=@test.jpg" http://localhost:8000/classify
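The config defines `api.max_file_size`, but the `/classify` endpoint never checks it. A minimal way to enforce it is to read the upload in chunks and stop as soon as the limit is crossed. This is a sketch with hypothetical names (`read_limited`, `FileTooLarge`); in the endpoint you would call it on `file.file` with `config.get("api")["max_file_size"]`, open the returned bytes with `Image.open(io.BytesIO(data))`, and translate `FileTooLarge` into an HTTP 413 response.

```python
import io


class FileTooLarge(Exception):
    """Raised when an upload exceeds the configured size limit."""


def read_limited(fileobj, max_bytes, chunk_size=64 * 1024):
    """Read at most max_bytes from a binary file object, failing fast if it is larger."""
    chunks, total = [], 0
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            raise FileTooLarge(f"upload exceeds {max_bytes} bytes")
        chunks.append(chunk)
    return b"".join(chunks)


print(len(read_limited(io.BytesIO(b"x" * 10), max_bytes=1024)))  # 10
```

By the time the endpoint runs, the server has already received and spooled the multipart body, so this caps how much your code reads and decodes rather than how much a client can send; a body-size limit at a reverse proxy is the usual complement.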

What about performance? I optimize by:

Using ONNX for model export
Enabling TorchScript compilation
Implementing request batching
Running CPU-bound preprocessing off the event loop (in a worker thread)
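Request batching is the least obvious item on that list. The idea: hold each incoming request for a few milliseconds, gather whatever other requests arrive in that window, and run the model once on the whole batch, which usually uses a GPU far more efficiently than many batch-size-1 calls. The asyncio sketch below shows the mechanism with a stand-in batch function; `MicroBatcher`, `max_batch_size`, and `max_wait_s` are hypothetical names, and it needs Python 3.9+ for `asyncio.to_thread`.

```python
import asyncio
from typing import Any, Callable, List, Optional


class MicroBatcher:
    """Collects concurrent requests and runs them through batch_fn together."""

    def __init__(self, batch_fn: Callable[[List[Any]], List[Any]],
                 max_batch_size: int = 8, max_wait_s: float = 0.01):
        self.batch_fn = batch_fn          # takes a list of inputs, returns one output each
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s      # fixed window to wait for more requests
        self._queue: Optional[asyncio.Queue] = None
        self._worker: Optional[asyncio.Task] = None

    async def submit(self, item: Any) -> Any:
        # Create the queue and worker lazily, inside the running event loop
        if self._queue is None:
            self._queue = asyncio.Queue()
            self._worker = asyncio.create_task(self._run())
        fut = asyncio.get_running_loop().create_future()
        await self._queue.put((item, fut))
        return await fut

    async def _run(self) -> None:
        while True:
            batch = [await self._queue.get()]      # wait for the first request
            await asyncio.sleep(self.max_wait_s)   # let concurrent requests arrive
            while len(batch) < self.max_batch_size and not self._queue.empty():
                batch.append(self._queue.get_nowait())
            items = [item for item, _ in batch]
            try:
                # Run the blocking batch function in a thread so the loop stays responsive
                results = await asyncio.to_thread(self.batch_fn, items)
            except Exception as exc:
                for _, fut in batch:
                    if not fut.done():
                        fut.set_exception(exc)
                continue
            for (_, fut), result in zip(batch, results):
                if not fut.done():
                    fut.set_result(result)


async def demo() -> None:
    def double_all(xs: List[int]) -> List[int]:
        print(f"batch of {len(xs)}")
        return [x * 2 for x in xs]

    batcher = MicroBatcher(double_all, max_batch_size=8, max_wait_s=0.01)
    print(await asyncio.gather(*(batcher.submit(i) for i in range(5))))


if __name__ == "__main__":
    asyncio.run(demo())
```

In the API, `batch_fn` would stack the preprocessed tensors with `torch.stack`, run the model once under `torch.no_grad()`, and return one prediction per input, while an `async def` endpoint would `await batcher.submit(tensor)`. Keep `max_wait_s` small: it adds directly to every request's latency.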

For monitoring, I add Prometheus metrics:

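# src/api/main.py, after app = FastAPI()
from prometheus_fastapi_instrumentator import Instrumentator

# Records request counts and latencies and serves them at GET /metrics
Instrumentator().instrument(app).expose(app)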

Testing is non-negotiable. This pytest case guards the endpoint's response contract against regressions:

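# tests/test_api.py
from fastapi.testclient import TestClient

from src.api.main import app

client = TestClient(app)

def test_classify_endpoint():
    with open("test.jpg", "rb") as f:  # any small JPEG kept alongside the tests
        response = client.post("/classify", files={"file": f})
    assert response.status_code == 200
    assert "class" in response.json()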

Building this changed how I see deployment pipelines. The PyTorch/FastAPI combination can serve production traffic while staying developer-friendly. What surprised me most was how quickly we went from experiment to usable API: hours rather than days.

Give this approach a try in your next computer vision project. If you found this useful, share it with others facing similar challenges. I’d love to hear about your implementation - drop a comment about your experience!
