How to Build Long-Term Memory for AI Chatbots with pgvector and PostgreSQL
Learn how to build long-term memory for AI chatbots using pgvector and PostgreSQL to deliver personalized, context-aware conversations.
Let’s talk about something we all struggle with: forgetting. I was reminded of this recently when a helpful chatbot I built forgot a user’s specific coffee order preferences between sessions. It felt clumsy, impersonal. That moment sparked a question: how can we make artificial intelligence not just smart, but thoughtful? How can we give it a proper memory, something that lasts beyond a single conversation?
This isn’t about keeping a simple log of the last few messages. That’s just a short-term buffer. I’m interested in building a long-term memory—a system where an LLM-powered application can recall your preferences, past decisions, and important facts weeks or months later, weaving them seamlessly into new conversations. It’s the difference between a forgetful acquaintance and a trusted partner.
The core challenge is moving beyond the chat history. We need a system that stores, finds, and uses information intelligently. Ever wondered how your favorite tools seem to just know what you need? The answer often lies in a smart memory layer.
So, how do we build it? We need a place to store memories that understands meaning, not just keywords. This is where vector databases shine. By converting text into numerical embeddings—essentially, a mathematical fingerprint of its meaning—we can search for memories based on conceptual relevance, not just exact word matches. For this, PostgreSQL with the pgvector extension is a robust, production-ready choice.
But a database is just storage. We need something to manage the logic: deciding what to remember, how to find it, and when to use it. This is where a memory management layer comes in. Let’s build one step-by-step.
First, we define what a “memory” even is. Not every chat message deserves long-term storage.
# What gets saved? Not everything.
memory_categories = {
    "fact": "The user's company is called 'SolarFlare Tech'.",
    "preference": "They prefer detailed code examples with explanations.",
    "decision": "They chose the Python SDK for integration on March 15th.",
    "summary": "A condensed version of a lengthy discussion about API design."
}
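How do we decide which messages cross that bar? Here's a minimal sketch of a gating function. The signal phrases and the classify_message helper are my own illustrative assumptions, not a fixed recipe; in practice you'd more likely ask the LLM itself to classify and extract memory-worthy statements.

# A minimal, hypothetical gate for what to persist.
# These keyword heuristics are illustrative assumptions; a production
# system would typically have the LLM classify/extract memories instead.
SIGNAL_PHRASES = {
    "fact": ["my company is", "i work at", "my name is"],
    "preference": ["i prefer", "i like", "please always"],
    "decision": ["we decided", "let's go with", "i'll use"],
}

def classify_message(text):
    """Return a memory_type if the message looks memory-worthy, else None."""
    lowered = text.lower()
    for memory_type, phrases in SIGNAL_PHRASES.items():
        if any(phrase in lowered for phrase in phrases):
            return memory_type
    return None  # ordinary chatter: don't store it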
Next, we set up our foundation. We’ll use a simple database schema.
-- Enable the pgvector extension first (once per database)
CREATE EXTENSION IF NOT EXISTS vector;

-- Core memory storage table
CREATE TABLE memories (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id TEXT NOT NULL,
    content TEXT NOT NULL,
    embedding VECTOR(1536), -- the meaning as numbers; 1536 matches text-embedding-3-small
    memory_type TEXT,
    created_at TIMESTAMPTZ DEFAULT now()
);
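One optional step before we move on: a plain sequential scan is fine for a prototype, but it gets slow once the table grows. pgvector (0.5.0 and later) supports HNSW indexes for approximate nearest-neighbor search; this one matches the cosine-distance operator we'll use below.

-- Optional: approximate nearest-neighbor index for large tables
CREATE INDEX ON memories USING hnsw (embedding vector_cosine_ops);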
With our storage ready, the real magic happens in two parts: saving and recalling. Saving a memory means creating an embedding for it. Think of this as giving the memory a searchable fingerprint based on its meaning.
# Simplified example of creating and storing a memory
# (assumes OPENAI_API_KEY is set in the environment)
import numpy as np
import openai
import psycopg2
from pgvector.psycopg2 import register_vector

conn = psycopg2.connect("dbname=memories_demo")  # adjust for your setup
register_vector(conn)  # lets psycopg2 send numpy arrays as vector values
cursor = conn.cursor()

def create_memory(user_id, text, memory_type="fact"):
    # Turn the text into an embedding (its numerical fingerprint)
    response = openai.embeddings.create(model="text-embedding-3-small", input=text)
    embedding = response.data[0].embedding
    # Store the text and its embedding together
    cursor.execute(
        "INSERT INTO memories (user_id, content, embedding, memory_type) VALUES (%s, %s, %s, %s)",
        (user_id, text, np.array(embedding), memory_type),
    )
    conn.commit()
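Using it is a one-liner; the user ID and text here are just placeholders:

create_memory("user_123", "Prefers oat-milk flat whites, no sugar.", "preference")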
Now for the recall. When a user starts a new conversation, we don’t dump their entire history into the prompt. That’s inefficient and wastes precious token space. Instead, we search for the most relevant past memories related to their current query.
def get_relevant_memories(user_id, current_query, limit=5):
    # Get the embedding for the user's current question
    response = openai.embeddings.create(model="text-embedding-3-small", input=current_query)
    query_embedding = response.data[0].embedding
    # Find the most semantically similar stored memories
    cursor.execute(
        """
        SELECT content FROM memories
        WHERE user_id = %s
        ORDER BY embedding <=> %s  -- cosine distance: smaller means more similar
        LIMIT %s
        """,
        (user_id, np.array(query_embedding), limit),
    )
    return [row[0] for row in cursor.fetchall()]
This gives us a handful of the most contextually appropriate past snippets. But is relevance enough? What about time? A memory from yesterday about changing a project deadline is probably more important than a general preference from six months ago. A good system balances relevance with recency. We can adjust our search to score and rank memories based on both factors.
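Here's one way to fold recency in; treat it as a sketch rather than a tuned formula. The 30-day decay window and the 0.3 weight are assumptions you'd calibrate against your own data.

def get_ranked_memories(user_id, current_query, limit=5):
    # Blend semantic distance with an age penalty; both weights are assumptions to tune
    response = openai.embeddings.create(model="text-embedding-3-small", input=current_query)
    query_embedding = np.array(response.data[0].embedding)
    cursor.execute(
        """
        SELECT content FROM memories
        WHERE user_id = %s
        ORDER BY (embedding <=> %s)  -- relevance: cosine distance (lower is better)
               + 0.3 * (1 - exp(-extract(epoch FROM now() - created_at) / (30 * 86400.0)))  -- age penalty
        LIMIT %s
        """,
        (user_id, query_embedding, limit),
    )
    return [row[0] for row in cursor.fetchall()]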
Finally, we inject these selected memories into the LLM’s prompt. We format them as a simple, clear context block.
# Constructing the final prompt with memory context
relevant_memories = get_relevant_memories("user_123", "What was the API key format we discussed?")
memory_context = "\n".join([f"- {mem}" for mem in relevant_memories])
final_prompt = f"""
Previous context relevant to this conversation:
{memory_context}
Current user question: What was the API key format we discussed?
"""
And just like that, the AI has context. It can now respond, “Based on our talk last Tuesday, you decided to use the ‘Bearer token’ format for your API keys.” The interaction feels continuous and intelligent.
What about managing this memory over time? Systems can get bloated. We might add logic to summarize old, related memories into a single concise one, or gently fade out memories that haven’t been used in a very long time. This keeps the system fast and focused.
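As a sketch of what that fade-out could look like: our schema above has no usage tracking, so this assumes a hypothetical last_accessed_at column that you update on every retrieval.

# Assumes a hypothetical last_accessed_at column, e.g.:
#   ALTER TABLE memories ADD COLUMN last_accessed_at TIMESTAMPTZ DEFAULT now();
# Update it in get_relevant_memories whenever a memory is returned.

def fade_stale_memories(max_idle_days=180):
    # Remove memories that haven't been recalled in a long time
    cursor.execute(
        "DELETE FROM memories WHERE last_accessed_at < now() - make_interval(days => %s)",
        (max_idle_days,),
    )
    conn.commit()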
Building this changes everything. It transforms one-off interactions into ongoing relationships. The AI becomes a tool that learns and adapts alongside the user. The code examples above are the basic building blocks. From here, you can layer in complexity: setting importance scores, handling different memory categories, and creating a seamless API for your applications.
The result is an application that feels remarkably human in its ability to remember. It’s not just recalling data; it’s building context. And in a world saturated with AI, that context is what creates real value and trust. Isn’t that the kind of application we all want to build and use?
I’d love to hear what you think. What’s the first feature you would add to this memory system? Share your thoughts in the comments below, and if you found this walk-through helpful, please pass it along to someone else who might be building the next generation of thoughtful AI applications.