
Custom CNN for Multi-Class Image Classification with PyTorch: Complete Training and Deployment Guide

Build a custom CNN for image classification with PyTorch: a complete tutorial covering data loading, model training, and deployment on the CIFAR-10 dataset.

I’ve been tackling image classification challenges recently, particularly with PyTorch, and wanted to share a practical walkthrough. Many resources cover fragments of the process, but stitching together a complete pipeline—from raw data to deployable model—reveals fascinating nuances. Why not explore this together using CIFAR-10? It’s approachable yet complex enough to demonstrate real-world considerations.

Setting up the environment is straightforward. We’ll need these core packages:

pip install torch torchvision matplotlib seaborn scikit-learn tensorboard

Here’s our foundational import block:

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
import torchvision.transforms as transforms
from torchvision.datasets import CIFAR10

# Configure device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Active device: {device}")

CIFAR-10 contains 60,000 tiny 32x32 RGB images across 10 categories, split into 50,000 training and 10,000 test images. Small images force models to learn efficient features; have you considered how spatial compression affects feature extraction? We implement aggressive augmentation to simulate real-world variations:

train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    transforms.RandomErasing(p=0.1)
])

test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
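
The training loop further down expects train_loader, val_loader, and val_dataset, so let's build them now. Here's a minimal sketch of one way to do it: I carve 5,000 of the 50,000 training images out as a validation set and use a batch size of 128; both numbers are my own choices, not requirements. Loading the training set twice lets the validation subset use the test-time transform:

from torch.utils.data import Subset

# Two views of the same training data: one augmented, one clean for validation
full_train = CIFAR10(root="./data", train=True, download=True, transform=train_transform)
full_val = CIFAR10(root="./data", train=True, download=True, transform=test_transform)
test_dataset = CIFAR10(root="./data", train=False, download=True, transform=test_transform)

# Deterministic 45,000 / 5,000 split of the training indices
indices = torch.randperm(len(full_train), generator=torch.Generator().manual_seed(42)).tolist()
train_dataset = Subset(full_train, indices[:45000])
val_dataset = Subset(full_val, indices[45000:])

train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True, num_workers=2)
val_loader = DataLoader(val_dataset, batch_size=128, shuffle=False, num_workers=2)
test_loader = DataLoader(test_dataset, batch_size=128, shuffle=False, num_workers=2)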

Our CNN architecture balances complexity and efficiency. Notice the incremental channel expansion—why do you think this pattern works better than arbitrary layer sizes?

class CompactCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.Conv2d(32, 32, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Dropout(0.25),
            
            nn.Conv2d(32, 64, 3, padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.Conv2d(64, 64, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Dropout(0.35),
        )
        self.classifier = nn.Sequential(
            nn.Linear(64*8*8, 512),
            nn.ReLU(),
            nn.BatchNorm1d(512),
            nn.Dropout(0.5),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)
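
Before wiring up training, a quick sanity check is cheap insurance. This throwaway snippet just confirms that the flattened feature size (64*8*8) lines up with the classifier input and reports how many parameters we're training:

check = CompactCNN()
dummy = torch.randn(4, 3, 32, 32)   # fake batch of four 32x32 RGB images
print(check(dummy).shape)           # expect torch.Size([4, 10])
print(sum(p.numel() for p in check.parameters()), "trainable parameters")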

Training incorporates several optimizations. The learning rate scheduler is particularly crucial—how might adaptive rate adjustment prevent overfitting?

model = CompactCNN().to(device)
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5, patience=3, verbose=True
)
criterion = nn.CrossEntropyLoss()

for epoch in range(30):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
    
    # Validation phase
    model.eval()
    val_loss, correct = 0, 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            val_loss += criterion(outputs, labels).item()
            _, predicted = torch.max(outputs.data, 1)
            correct += (predicted == labels).sum().item()
    
    val_acc = 100 * correct / len(val_dataset)
    print(f"Epoch {epoch+1:02d} | val loss: {val_loss/len(val_loader):.4f} | val acc: {val_acc:.2f}%")
    scheduler.step(val_acc)  # Adjust learning rate when validation accuracy plateaus
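
One refinement I'd layer on top (not shown in the loop above) is keeping the checkpoint with the best validation accuracy, so the final model predates any late-stage overfitting. The helper name and file path below are my own conventions:

def save_if_best(model, val_acc, best_acc, path="best_cifar10_cnn.pth"):
    # Persist weights whenever validation accuracy improves; return the running best
    if val_acc > best_acc:
        torch.save(model.state_dict(), path)
        return val_acc
    return best_acc

Initialize best_acc = 0.0 before the loop, call best_acc = save_if_best(model, val_acc, best_acc) right after scheduler.step(val_acc), and restore the winner afterwards with model.load_state_dict(torch.load("best_cifar10_cnn.pth", map_location=device)).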

Evaluation goes beyond accuracy. This confusion matrix snippet reveals class-specific weaknesses:

from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

def plot_confusion_matrix(model, loader):
    model.eval()
    all_preds, all_labels = [], []
    with torch.no_grad():
        for images, labels in loader:
            images = images.to(device)
            outputs = model(images)
            _, preds = torch.max(outputs, 1)
            all_preds.extend(preds.cpu().numpy())
            all_labels.extend(labels.numpy())
    
    cm = confusion_matrix(all_labels, all_preds)
    plt.figure(figsize=(10,8))
    sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
    plt.xlabel("Predicted")
    plt.ylabel("Actual")
    plt.show()
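
A per-class report complements the matrix nicely; scikit-learn's classification_report summarizes precision, recall, and F1 for each category. The class names below follow CIFAR-10's standard label order:

from sklearn.metrics import classification_report

CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]

def print_class_report(model, loader):
    model.eval()
    all_preds, all_labels = [], []
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images.to(device)).argmax(dim=1)
            all_preds.extend(preds.cpu().numpy())
            all_labels.extend(labels.numpy())
    print(classification_report(all_labels, all_preds, target_names=CIFAR10_CLASSES))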

For deployment, we export with TorchScript:

model.eval()  # ensure dropout and batch norm run in inference mode before export
scripted_model = torch.jit.script(model.cpu())
scripted_model.save("cifar10_cnn.pt")

This preserves model architecture while decoupling from Python runtime—essential for production systems. What other deployment considerations might arise in your projects?
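
On the serving side, the exported file loads without the original class definition. Here's a minimal inference sketch; example.jpg is a placeholder file name, and the preprocessing mirrors test_transform with an added resize so arbitrary images can be fed in:

from PIL import Image

loaded = torch.jit.load("cifar10_cnn.pt")
loaded.eval()

preprocess = transforms.Compose([
    transforms.Resize((32, 32)),   # the network expects 32x32 inputs
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

image = Image.open("example.jpg").convert("RGB")   # placeholder input image
with torch.no_grad():
    logits = loaded(preprocess(image).unsqueeze(0))
print("Predicted class index:", logits.argmax(dim=1).item())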

The journey from data to deployable model involves numerous design choices. Each decision—augmentation intensity, regularization strength, topology depth—creates tradeoffs between accuracy, speed, and robustness. I’ve found iterative refinement based on validation metrics yields the best results. What techniques have worked well in your projects?

If you found this walkthrough helpful, share it with others exploring PyTorch. Questions or insights? Let’s discuss in the comments—I’ll respond to thoughts and suggestions.
