Building GANs with PyTorch: Complete Guide to Training Image Generation Networks from Scratch

Nov 9, 2025

I’ve always been fascinated by how computers can learn to create images that look real. It started when I saw AI-generated art for the first time and wondered how a machine could mimic human creativity. That curiosity led me to explore Generative Adversarial Networks, or GANs. Today, I want to share a practical guide to building and training GANs using PyTorch. We’ll generate images step by step, and I’ll include code snippets to make it hands-on. If you’re ready to dive into this exciting area, let’s get started.

GANs work by pitting two neural networks against each other. One network, called the generator, tries to create fake images. The other, the discriminator, learns to tell real images from fake ones. They improve together through competition. Think of it as a forger and an art expert constantly trying to outsmart each other. This process pushes both to get better over time.
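
For readers who like to see the competition written down, here is the standard GAN objective as a sketch. The symbols are my notation, not something defined elsewhere in this post: D(x) is the discriminator's estimated probability that x is real, G(z) is the image generated from noise z, p_data is the distribution of real images, and p_z is the noise distribution.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator pushes this value up by scoring real images near 1 and fakes near 0, while the generator pushes it down by making D(G(z)) large.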

Setting up the environment is straightforward. We’ll use PyTorch for its flexibility and ease of use. Here’s how to install the necessary libraries:

pip install torch torchvision matplotlib numpy

Now, let’s import the modules and set up our device. Using a GPU speeds things up significantly if available.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
import torchvision  # needed later for torchvision.utils.make_grid
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import matplotlib.pyplot as plt

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

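Have you ever considered what makes an image look “real” to a machine? It’s all about patterns in the data. For this project, we’ll use the CIFAR-10 dataset: 60,000 color images of 32×32 pixels in 10 classes, of which we use the 50,000 training images. Preprocessing is key to helping the model learn effectively. The Normalize step below rescales pixel values from [0, 1] to [-1, 1], which matches the range of the Tanh output the generator will produce.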

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
dataloader = DataLoader(dataset, batch_size=128, shuffle=True)
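
To see what that normalization does without downloading anything, here is a small sketch that applies the same arithmetic to a random stand-in batch. The tensor `batch` is made up for illustration; it only mimics the [0, 1] range that ToTensor() produces.

```python
import torch

# Stand-in for a batch produced by ToTensor(): values in [0, 1].
batch = torch.rand(8, 3, 32, 32)

# Same arithmetic as Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)): (x - mean) / std per channel.
mean = torch.tensor([0.5, 0.5, 0.5]).view(1, 3, 1, 1)
std = torch.tensor([0.5, 0.5, 0.5]).view(1, 3, 1, 1)
normalized = (batch - mean) / std

# Values now lie in [-1, 1].
print(normalized.min().item(), normalized.max().item())

# Multiplying by std and adding mean undoes the scaling, which is what we do before plotting.
restored = normalized * std + mean
```

The same `* 0.5 + 0.5` step shows up again when we display generated images.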

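Next, we build the generator. This network takes random noise and transforms it into an image. I like to start simple and gradually increase complexity. Here’s a basic generator built from transposed convolutional layers, which upsample a 100-dimensional noise vector step by step into a 3-channel 32×32 image: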

class Generator(nn.Module):
    def __init__(self, input_dim=100, output_channels=3):
        super(Generator, self).__init__()
        self.main = nn.Sequential(
            nn.ConvTranspose2d(input_dim, 256, 4, 1, 0, bias=False),
            nn.BatchNorm2d(256),
            nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),
            nn.BatchNorm2d(128),
            nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(True),
            nn.ConvTranspose2d(64, output_channels, 4, 2, 1, bias=False),
            nn.Tanh()
        )
    
    def forward(self, x):
        return self.main(x)

generator = Generator().to(device)
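
If you want to check that these layers really end at 32×32, the arithmetic can be sketched in plain Python. The helper name `conv_transpose_out` is mine; the formula is the ConvTranspose2d output size for the default dilation of 1 and output_padding of 0, which is what the generator above uses.

```python
def conv_transpose_out(size, kernel, stride, padding):
    """Output size of ConvTranspose2d with dilation=1 and output_padding=0."""
    return (size - 1) * stride - 2 * padding + kernel

# (kernel, stride, padding) for each ConvTranspose2d layer in the Generator above
generator_layers = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1)]

size = 1  # the noise vector enters as a 1x1 spatial map
generator_sizes = [size]
for kernel, stride, padding in generator_layers:
    size = conv_transpose_out(size, kernel, stride, padding)
    generator_sizes.append(size)

print(generator_sizes)  # [1, 4, 8, 16, 32]
```

The first layer expands the 1×1 input to 4×4, and each stride-2 layer after that doubles the resolution.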

The discriminator acts as the critic. It examines images and decides if they’re real or fake. Balancing both networks is crucial; if one becomes too strong, training can stall.

class Discriminator(nn.Module):
    def __init__(self, input_channels=3):
        super(Discriminator, self).__init__()
        self.main = nn.Sequential(
            nn.Conv2d(input_channels, 64, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, 2, 1, bias=False),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 256, 4, 2, 1, bias=False),
            nn.BatchNorm2d(256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(256, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
        )
    
    def forward(self, x):
        return self.main(x).view(-1)

discriminator = Discriminator().to(device)
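
The discriminator runs the same idea in reverse, shrinking a 32×32 image down to a single score. Here is a quick sketch of that arithmetic, using the Conv2d output-size formula for the default dilation of 1; the helper name `conv_out` is my own.

```python
def conv_out(size, kernel, stride, padding):
    """Output size of Conv2d with dilation=1."""
    return (size + 2 * padding - kernel) // stride + 1

# (kernel, stride, padding) for each Conv2d layer in the Discriminator above
discriminator_layers = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 0)]

size = 32  # CIFAR-10 images are 32x32
discriminator_sizes = [size]
for kernel, stride, padding in discriminator_layers:
    size = conv_out(size, kernel, stride, padding)
    discriminator_sizes.append(size)

print(discriminator_sizes)  # [32, 16, 8, 4, 1]
```

Because the last layer leaves a 1×1 map with one channel, `view(-1)` in `forward` turns the output into one probability per image.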

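Training GANs can be tricky. I give each network its own optimizer, since each one minimizes a different loss and should only update its own weights. The Adam settings below (learning rate 0.0002, beta1 of 0.5) are the values commonly used for DCGAN-style models. Within every batch, we update the discriminator first and then the generator.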

optimizer_g = torch.optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
optimizer_d = torch.optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))
criterion = nn.BCELoss()

for epoch in range(100):
    for i, (real_images, _) in enumerate(dataloader):
        batch_size = real_images.size(0)
        real_labels = torch.ones(batch_size, device=device)
        fake_labels = torch.zeros(batch_size, device=device)
        
        # Train discriminator
        optimizer_d.zero_grad()
        outputs = discriminator(real_images.to(device))
        loss_real = criterion(outputs, real_labels)
        
        noise = torch.randn(batch_size, 100, 1, 1, device=device)
        fake_images = generator(noise)
        outputs = discriminator(fake_images.detach())
        loss_fake = criterion(outputs, fake_labels)
        
        loss_d = loss_real + loss_fake
        loss_d.backward()
        optimizer_d.step()
        
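        # Train generator: score the same fakes again, this time without detach(),
        # so gradients flow back into the generator. Using real_labels here means
        # the generator's loss drops when the discriminator is fooled.
        optimizer_g.zero_grad()
        outputs = discriminator(fake_images)
        loss_g = criterion(outputs, real_labels)
        loss_g.backward()
        optimizer_g.step()

    # Log the last batch's losses once per epoch
    print(f"Epoch {epoch + 1}/100  loss_d: {loss_d.item():.4f}  loss_g: {loss_g.item():.4f}")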
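
It helps to see what the binary cross-entropy loss does with those labels. The sketch below reimplements the per-element formula in plain Python; the function `bce` and the probabilities fed into it are illustrative values I picked, not outputs from a trained model.

```python
import math

def bce(prediction, label):
    """Binary cross-entropy for a single probability (the per-element BCELoss formula)."""
    return -(label * math.log(prediction) + (1 - label) * math.log(1 - prediction))

# Discriminator step: real images carry label 1, generated images label 0.
d_loss_confident = bce(0.9, 1) + bce(0.1, 0)  # discriminator is right about both
d_loss_unsure = bce(0.5, 1) + bce(0.5, 0)     # discriminator outputs 0.5 for everything

# Generator step: fakes are scored against the "real" label,
# so the loss is -log(D(G(z))) and shrinks as the discriminator is fooled.
g_loss_caught = bce(0.1, 1)  # discriminator spots the fake
g_loss_fooled = bce(0.9, 1)  # discriminator is fooled

print(f"{d_loss_confident:.3f} {d_loss_unsure:.3f} {g_loss_caught:.3f} {g_loss_fooled:.3f}")
```

A discriminator loss near 2·ln 2 ≈ 1.386 means it is outputting about 0.5 for everything, which is a useful reference point when reading the training logs.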

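What do you think happens when the generator fools the discriminator too easily, too early? The discriminator’s feedback stops being informative, and training can become unstable. The reverse is a problem too: a discriminator that wins every round leaves the generator with very little gradient to learn from. Watching both loss values and, when one network clearly dominates, lowering its learning rate can help.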
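
Single-batch losses are noisy, so I find averages easier to read. Here is a minimal sketch of a running-average tracker. `RunningMean` is a hypothetical helper, and the loss pairs in the loop are placeholder numbers standing in for `loss_d.item()` and `loss_g.item()` from the training loop.

```python
class RunningMean:
    """Hypothetical helper: averages the values seen since the last reset."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value):
        self.total += float(value)
        self.count += 1

    def mean(self):
        return self.total / self.count if self.count else 0.0

    def reset(self):
        self.total = 0.0
        self.count = 0


d_meter, g_meter = RunningMean(), RunningMean()

# Placeholder values; in the training loop these would be loss_d.item() and loss_g.item().
for loss_d_value, loss_g_value in [(1.4, 0.7), (1.2, 0.9), (1.0, 1.1)]:
    d_meter.update(loss_d_value)
    g_meter.update(loss_g_value)

print(f"avg loss_d: {d_meter.mean():.3f}  avg loss_g: {g_meter.mean():.3f}")
```

Calling `reset()` at the start of each epoch gives one average per epoch, which makes it easier to spot one loss collapsing toward zero while the other climbs.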

After training, it’s rewarding to see the generated images improve over time. You can save and visualize them to track progress.

with torch.no_grad():
    test_noise = torch.randn(16, 100, 1, 1, device=device)
    generated_images = generator(test_noise).cpu()
    # Denormalize and display images
    generated_images = generated_images * 0.5 + 0.5
    grid = torchvision.utils.make_grid(generated_images, nrow=4)
    plt.imshow(grid.permute(1, 2, 0))
    plt.axis('off')
    plt.show()

Through this process, I’ve learned that patience and experimentation are key. GANs open doors to creative applications, from art to data augmentation. I hope this guide helps you start your own projects. If you found this useful, please like, share, and comment with your experiences or questions. Let’s keep the conversation going and learn together!
