How to Build an AI Chatbot Memory System with Redis, PostgreSQL, and pgvector
Learn how to build an AI chatbot memory system using Redis, PostgreSQL, and pgvector for persistent, context-aware conversations.
I was building yet another chatbot when I hit the same old wall. It was brilliant for a single conversation but completely forgot everything once the page refreshed. It was like talking to someone with amnesia every single day. This frustration isn’t just mine; it’s the biggest gap between a fun demo and a genuinely useful AI application. A tool that can’t remember your preferences, past conversations, or learned facts isn’t a tool—it’s a novelty. That’s what pushed me to move beyond simple session caches and build a memory system that works for real people. Let’s build something that learns and lasts.
To start, think of memory not as one thing, but three. You have your short-term recall, like remembering what you just said in a chat. You have your episodic memory, the story of past events. Then you have semantic memory, the facts and concepts you know. An AI needs versions of all three to feel coherent.
Our first layer is quick, temporary, and essential for a flowing chat. We’ll use Redis. It’s incredibly fast and perfect for holding the active conversation. We’ll store the recent back-and-forth, but we must keep it tidy. Let it grow too large, and you’ll waste money and slow everything down for no reason.
Here’s a simple way to keep that chat history clean and fast. We store messages in a list but trim it to only keep the latest few. We also set a time limit, so idle sessions don’t clutter the system forever.
import redis
import json

class ChatBuffer:
    def __init__(self, redis_url, session_id, max_length=10):
        self.redis = redis.from_url(redis_url)
        self.key = f"chat:{session_id}:messages"
        self.max_len = max_length

    def add_exchange(self, user_input, ai_response):
        # Store a turn of conversation
        turn = json.dumps({"user": user_input, "ai": ai_response})
        # Push new, trim old, all in one step
        self.redis.lpush(self.key, turn)
        self.redis.ltrim(self.key, 0, self.max_len - 1)
        # Make it expire after 1 day of inactivity
        self.redis.expire(self.key, 86400)

    def get_recent_history(self):
        # Return turns in chronological order (lpush stores newest first)
        data = self.redis.lrange(self.key, 0, -1)
        return [json.loads(turn) for turn in reversed(data)]
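To see it in action, here's a quick usage sketch. It assumes a local Redis on the default port; the session ID is whatever uniquely identifies the conversation in your app:

buffer = ChatBuffer("redis://localhost:6379", session_id="user-42")
buffer.add_exchange("My project deadline is March 15.", "Got it, I'll remember March 15.")
print(buffer.get_recent_history())
# [{'user': 'My project deadline is March 15.', 'ai': "Got it, I'll remember March 15."}]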
This gives our AI a working mind for the current talk. But what happens when the conversation ends? Those insights vanish. That brings us to the second layer: saving the story.
Saving every single chat line forever is messy and expensive. Instead, we can periodically summarize the conversation. This creates a condensed story, or “episode,” that we can store. It’s like taking meeting notes instead of saving the full video recording.
How do we create a good summary? We can ask the AI itself. After a chat reaches a natural breakpoint, we feed the recent buffer to a model with a simple instruction: “Briefly summarize the key points and outcomes of this conversation.”
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

summarizer = ChatOpenAI(model="gpt-4-turbo-preview")

prompt = ChatPromptTemplate.from_template("""
You are creating a concise narrative summary of a conversation.
Extract key user facts, decisions, and resolved topics.

Conversation:
{chat_history}

Summary:
""")

def create_episode(history_text):
    # Pipe the prompt into the model and return the plain-text summary
    chain = prompt | summarizer
    summary = chain.invoke({"chat_history": history_text})
    return summary.content
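In practice, you might trigger this when a session goes idle or a topic wraps up. A minimal sketch, using the ChatBuffer from earlier (how you flatten the history into text is up to you):

history = buffer.get_recent_history()
history_text = "\n".join(f"User: {t['user']}\nAI: {t['ai']}" for t in history)
episode = create_episode(history_text)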
We save this summary to a standard PostgreSQL database. It’s just text, linked to a user ID and a timestamp. Now we have a searchable log of what happened. But is this enough? What if you ask, “What have I told you about my project deadlines?” Sorting through pages of summaries is clumsy. This is where our third layer changes the game.
We need memory based on meaning, not just keywords. This is where vector databases come in. When we save an episode summary, we also convert its meaning into a mathematical vector—a list of numbers that represents its concepts. We store this vector using pgvector, an extension for PostgreSQL. Later, when you ask a question, we convert your question into a vector too and find the most semantically similar memories from the past.
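One detail the snippets below assume: a memory_log table with an embedding column of the right dimensionality. Here's a plausible one-time setup (the column names match the queries in this post; 1536 is the dimension of text-embedding-3-small, so adjust it if you swap models):

import psycopg2

def init_schema():
    # One-time setup: enable pgvector and create the memory table
    conn = psycopg2.connect(DATABASE_URL)
    cur = conn.cursor()
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS memory_log (
            id BIGSERIAL PRIMARY KEY,
            user_id TEXT NOT NULL,
            summary TEXT NOT NULL,
            embedding vector(1536),  -- text-embedding-3-small dimension
            created_at TIMESTAMPTZ DEFAULT now()
        )
    """)
    conn.commit()
    cur.close()
    conn.close()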
Ever wonder how a system can just “know” what’s relevant? It’s not magic; it’s math. It’s finding stored memories that “point in a similar direction” in a high-dimensional space.
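If you want to see that math without any database, here's a toy example with 3-dimensional vectors (real embeddings have around 1536 dimensions, but the idea is identical). pgvector's <=> operator returns cosine distance, which is just 1 minus this similarity score:

import numpy as np

def cosine_similarity(a, b):
    # 1.0 = same direction, 0.0 = unrelated, -1.0 = opposite
    a, b = np.asarray(a), np.asarray(b)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy "embeddings": imagine each axis encodes a concept
deadline_memory = [0.9, 0.1, 0.2]
deadline_query  = [0.8, 0.2, 0.1]
weather_memory  = [0.1, 0.9, 0.8]

print(cosine_similarity(deadline_query, deadline_memory))  # ~0.99, very related
print(cosine_similarity(deadline_query, weather_memory))   # ~0.34, not so much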
Here’s the core idea. We generate an embedding (the vector) for our text and save it alongside the summary.
import numpy as np
import psycopg2
from langchain_openai import OpenAIEmbeddings
from pgvector.psycopg2 import register_vector

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

def store_memory(user_id, summary_text):
    # Generate the vector for the summary
    vector = embeddings.embed_query(summary_text)
    # Connect and store in PostgreSQL with pgvector
    # (DATABASE_URL is your connection string, defined elsewhere)
    conn = psycopg2.connect(DATABASE_URL)
    register_vector(conn)
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO memory_log (user_id, summary, embedding) VALUES (%s, %s, %s)",
        # np.array lets the registered pgvector adapter serialize the embedding
        (user_id, summary_text, np.array(vector))
    )
    conn.commit()
    cur.close()
    conn.close()
When a new question comes in, we don’t just look at the last chat. We search this vector space for related past episodes.
def find_related_memories(user_id, query, limit=3):
    # Convert the question to a vector
    query_vector = embeddings.embed_query(query)
    conn = psycopg2.connect(DATABASE_URL)
    register_vector(conn)
    cur = conn.cursor()
    # Find the closest stored vectors; <=> is pgvector's cosine distance
    cur.execute("""
        SELECT summary
        FROM memory_log
        WHERE user_id = %s
        ORDER BY embedding <=> %s
        LIMIT %s
    """, (user_id, np.array(query_vector), limit))
    results = cur.fetchall()
    cur.close()
    conn.close()
    return [row[0] for row in results]
Now, for the final act: we bring all three layers together. When a user sends a message, our system does four things. It gets the recent chat from Redis. It fetches relevant long-term memories from PostgreSQL. It combines these with the new question into a final prompt. It sends this rich, context-packed prompt to the AI for a response that knows you.
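Here's a minimal sketch of that glue code, reusing the pieces defined above. The prompt wording and the reuse of the summarizer model for generation are illustrative choices, not requirements:

from langchain_core.prompts import ChatPromptTemplate

answer_prompt = ChatPromptTemplate.from_template("""
You are a helpful assistant with long-term memory.

Relevant past memories:
{memories}

Recent conversation:
{recent}

User question: {question}
""")

def respond(buffer, user_id, question):
    # 1. Short-term: the active conversation from Redis
    recent = buffer.get_recent_history()
    recent_text = "\n".join(f"User: {t['user']}\nAI: {t['ai']}" for t in recent)
    # 2. Long-term: semantically related episodes from pgvector
    memories = find_related_memories(user_id, question)
    # 3 & 4. Combine everything into one prompt and generate the reply
    chain = answer_prompt | summarizer
    reply = chain.invoke({
        "memories": "\n".join(memories),
        "recent": recent_text,
        "question": question,
    }).content
    buffer.add_exchange(question, reply)
    return reply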
This architecture turns a forgetful program into a thoughtful assistant. The Redis buffer keeps the chat snappy. The episodic summaries in PostgreSQL create a searchable history. The semantic vector search in pgvector allows for smart, relevant recall. You’re not just getting a reply; you’re getting a reply from an agent that has a sense of your history.
Building this changed how I see AI applications. It’s the difference between a single-use tool and a persistent collaborator. The code is just the start; the real impact is in creating systems that grow with their users. What kind of application could you transform if it truly remembered?
If this journey from a stateless bot to a memory-aware agent was helpful, please share it with someone who’s building the next wave of AI tools. I’d love to hear your thoughts or questions in the comments below—what’s the first thing you’d build with a memory system like this?