Build a Real-Time Object Detection API with YOLOv8 and FastAPI: Complete Python Tutorial

Learn to build a production-ready real-time object detection system with YOLOv8 and FastAPI. Complete tutorial with webcam streaming, batch processing, and Docker deployment.

I’ve been fascinated by the rapid advancements in computer vision, especially how quickly object detection has evolved. Just last week, while watching security cameras identify delivery drones in real-time, I realized how accessible this technology has become. Today, I’ll show you how to create your own real-time detection system using cutting-edge tools. Follow along as we build something practical that you can adapt for security systems, retail analytics, or even wildlife monitoring.

Setting up our environment is straightforward. We need Python packages for computer vision and web services:

pip install ultralytics fastapi uvicorn opencv-python-headless pillow websockets

Our project structure keeps components organized. We separate configuration, models, services, and API routes - this makes maintenance easier as our system grows. Why do you think clean architecture matters in machine learning projects?
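As a rough sketch of that layout (only the modules actually built in this tutorial are fixed; main.py and the exact grouping are assumptions):

app/
    main.py        # creates/imports the FastAPI app and registers middleware (assumed entry point)
    settings.py    # configuration
    detection.py   # Pydantic data models
    detector.py    # YOLOv8 detection service
    routes.py      # API endpoints
    metrics.py     # performance middleware
requirements.txt
Dockerfile

The snippets below import sibling modules by bare name for brevity; if you run everything as the app package (as the Dockerfile’s app.main:app target does), switch those to relative imports.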

Configuration management comes first. We use Pydantic for robust settings:

# settings.py
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    model_name: str = "yolov8n.pt"
    confidence_threshold: float = 0.25
    max_file_size: int = 10 * 1024 * 1024  # 10MB
    
settings = Settings()
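Because Settings inherits from BaseSettings, each field can be overridden with an environment variable of the same name (matched case-insensitively), so you can swap models without touching code. For example, launching from inside the app/ directory:

MODEL_NAME=yolov8s.pt CONFIDENCE_THRESHOLD=0.4 uvicorn main:app --reload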

Next, we define data models for our detection results. Precise data structures ensure consistent API responses:

# detection.py
from pydantic import BaseModel

class BoundingBox(BaseModel):
    x1: float  # Top-left X
    y1: float  # Top-left Y
    x2: float  # Bottom-right X
    y2: float  # Bottom-right Y

class Detection(BaseModel):
    class_name: str
    confidence: float
    bbox: BoundingBox
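When FastAPI serializes a Detection, the response for one object looks roughly like this (values are illustrative):

{
  "class_name": "person",
  "confidence": 0.91,
  "bbox": {"x1": 34.2, "y1": 58.7, "x2": 212.9, "y2": 410.3}
}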

Now for the core - our YOLOv8 detector service. This class handles model loading and inference:

# detector.py
from ultralytics import YOLO
import torch

from detection import Detection, BoundingBox  # Pydantic models defined above (adjust the import path to your layout)
from settings import settings                 # model name and confidence threshold from settings.py

class YOLODetector:
    def __init__(self):
        self.model = YOLO(settings.model_name)
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model.to(self.device)  # move weights to the GPU when one is available
    
    def detect(self, image):
        # run inference, filtering out boxes below the configured confidence
        results = self.model(image, conf=settings.confidence_threshold)
        return self._process_results(results)
    
    def _process_results(self, results):
        # flatten the ultralytics Results objects into our API models
        detections = []
        for result in results:
            for box in result.boxes:
                x1, y1, x2, y2 = box.xyxy[0].tolist()
                detections.append(Detection(
                    class_name=result.names[int(box.cls)],
                    confidence=box.conf.item(),
                    bbox=BoundingBox(x1=x1, y1=y1, x2=x2, y2=y2)
                ))
        return detections

Notice how we automatically use GPU if available? That’s crucial for real-time performance. What other optimizations might boost speed for video streams?
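A couple of knobs are worth experimenting with. As a sketch, here is a possible drop-in variant of the detect() method above (not part of the original service): a smaller inference resolution and FP16 weights usually buy extra frames per second on a GPU.

# detector.py - a possible speed-oriented variant of YOLODetector.detect() (sketch only)
    def detect(self, image):
        results = self.model(
            image,
            imgsz=640,                     # smaller inference resolution: faster, slightly less accurate
            half=(self.device == "cuda"),  # FP16 inference; only meaningful on a CUDA device
            conf=settings.confidence_threshold,
            verbose=False,                 # silence per-frame logging
        )
        return self._process_results(results)

Frame skipping is another easy win for webcam streams: run detection on every second or third frame and reuse the last boxes in between.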

With our detection engine ready, we build the FastAPI interface. We’ll create three endpoints: single-image processing, batch processing, and a real-time video stream. The single-image and live-stream routes come first; the batch route follows the same pattern and is sketched a little further below:

# routes.py
import cv2
import numpy as np
from fastapi import FastAPI, UploadFile
from fastapi.responses import StreamingResponse

from detector import YOLODetector  # the service built above (adjust the import path to your layout)

app = FastAPI()
detector = YOLODetector()  # load the model once at startup, not per request

@app.post("/detect")
async def detect_image(file: UploadFile):
    # decode the uploaded bytes into an OpenCV BGR image
    image = cv2.imdecode(np.frombuffer(await file.read(), np.uint8), cv2.IMREAD_COLOR)
    detections = detector.detect(image)
    return {"detections": detections}

@app.get("/live")
async def video_stream():
    camera = cv2.VideoCapture(0)  # default webcam

    async def generate_frames():
        try:
            while True:
                success, frame = camera.read()
                if not success:
                    break
                detections = detector.detect(frame)
                annotated_frame = _draw_boxes(frame, detections)
                _, buffer = cv2.imencode('.jpg', annotated_frame)
                # MJPEG: each frame is its own part in the multipart response
                yield b'--frame\r\nContent-Type: image/jpeg\r\n\r\n' + buffer.tobytes() + b'\r\n'
        finally:
            camera.release()  # free the device when the client disconnects

    return StreamingResponse(generate_frames(), media_type="multipart/x-mixed-replace;boundary=frame")
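The stream references a _draw_boxes helper that isn’t defined yet; here is a minimal sketch (the green color and label placement are arbitrary choices):

# routes.py (continued) - minimal _draw_boxes helper (sketch)
def _draw_boxes(frame, detections):
    for det in detections:
        x1, y1 = int(det.bbox.x1), int(det.bbox.y1)
        x2, y2 = int(det.bbox.x2), int(det.bbox.y2)
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        label = f"{det.class_name} {det.confidence:.2f}"
        cv2.putText(frame, label, (x1, max(y1 - 10, 0)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    return frame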

For the live endpoint, we stream video with bounding boxes drawn in real-time. The StreamingResponse efficiently handles frame-by-frame delivery. How might we scale this for multiple simultaneous users?
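The batch endpoint promised earlier is essentially the single-image route in a loop. A minimal sketch, assuming a /detect/batch path and a per-file response shape of my own choosing:

# routes.py (continued) - batch endpoint sketch
from typing import List
from fastapi import File

@app.post("/detect/batch")
async def detect_batch(files: List[UploadFile] = File(...)):
    results = []
    for file in files:
        data = np.frombuffer(await file.read(), np.uint8)
        image = cv2.imdecode(data, cv2.IMREAD_COLOR)
        if image is None:
            # skip uploads that aren't decodable images
            results.append({"filename": file.filename, "error": "invalid image"})
            continue
        results.append({"filename": file.filename,
                        "detections": detector.detect(image)})
    return {"results": results}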

Performance monitoring is essential. We add middleware to track processing times:

# metrics.py
import time
from fastapi import Request

async def track_performance(request: Request, call_next):
    start_time = time.time()
    response = await call_next(request)
    process_time = time.time() - start_time
    response.headers["X-Process-Time"] = str(process_time)
    return response
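The function only takes effect once it’s registered on the app. One way to wire it up, assuming a main.py that imports the app from routes.py:

# main.py (sketch) - register the performance middleware
from routes import app
from metrics import track_performance

app.middleware("http")(track_performance)  # equivalent to decorating it with @app.middleware("http")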

Finally, we containerize our application for easy deployment:

FROM python:3.10-slim
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY app /app
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80"]

This Dockerfile produces a compact production image; the final size depends mostly on which PyTorch build gets pulled in, so a CPU-only wheel keeps it far smaller than the default CUDA build. You could deploy it on cloud services or on edge devices such as an NVIDIA Jetson (for GPU acceleration) or a Raspberry Pi running CPU-only inference.

We’ve built a complete system that handles images, video batches, and live streams. The real magic happens when you adapt it to your needs - add custom trained models for specialized detection, integrate with alert systems, or analyze traffic patterns. What problem would you solve with this technology?

If you found this guide helpful, share it with others exploring computer vision! I’d love to hear about your implementation ideas in the comments - what creative applications can you imagine for real-time object detection?
