Project overview
AndresAI is a portfolio chatbot that answers questions about my career, stack, hobbies, and projects using a retrieval-augmented, model-agnostic agent. It's split into two decoupled services: a Next.js 16 frontend on Vercel that serves both the public chat and a private admin dashboard, and a FastAPI backend that runs the AI agent, owns the database, and exposes the streaming chat endpoint.
The agent is built on Pydantic AI, so it can run on Claude, OpenAI, or any other provider the framework supports. It exposes a single retrieval tool over a pgvector knowledge base embedded with OpenAI's text-embedding-3-small. Both the system prompt and the retrievable facts live in PostgreSQL and can be edited from the admin without a redeploy: a new project or hobby becomes answerable in chat as soon as its entry is saved and embedded.
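To make the retrieval tool concrete, here is a minimal in-memory sketch of a category-filtered cosine-distance search. It is not the project's actual code: the real lookup runs inside PostgreSQL (pgvector exposes cosine distance through its `<=>` operator), and the `embed` function, `VOCAB`, `Entry`, and the sample entries below are hypothetical stand-ins for the embedding model and the knowledge-base table.

```python
import math
import re
from dataclasses import dataclass

# Hypothetical toy vocabulary standing in for a real embedding model.
VOCAB = ("chatbot", "hiking", "postgres")


def embed(text: str) -> list[float]:
    """Toy embedding: count vocabulary words (NOT a real embedding model)."""
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(term)) for term in VOCAB]


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0  # treat an all-zero vector as unrelated to everything
    return 1.0 - dot / norm


@dataclass
class Entry:
    category: str
    text: str
    embedding: list[float]


# Hypothetical sample rows; the real ones live in PostgreSQL.
_ROWS = [
    ("projects", "A portfolio chatbot backed by Postgres"),
    ("projects", "A trail map app for hiking"),
    ("hobbies", "Weekend hiking trips"),
]
KNOWLEDGE_BASE = [Entry(cat, text, embed(text)) for cat, text in _ROWS]


def search_knowledge_base(category: str, query: str, limit: int = 3) -> list[str]:
    """Return the closest entries in one category, nearest first."""
    query_vec = embed(query)
    candidates = [e for e in KNOWLEDGE_BASE if e.category == category]
    candidates.sort(key=lambda e: cosine_distance(query_vec, e.embedding))
    return [e.text for e in candidates[:limit]]
```

Filtering by category before ranking keeps an equally close entry from another category (here, the hiking hobby) out of a projects query.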
The admin dashboard layers a WebSocket pub/sub channel on top, so live counters update the moment a new message lands. It also provides realtime KPIs, latency percentiles, and full CRUD over every entity the agent reads from.
Key features
Token-streaming chat: The server echoes the user's prompt first so the UI updates within tens of milliseconds, then streams the assistant's response token-by-token. A Stop button aborts the in-flight stream mid-token via AbortController.
Model-agnostic Pydantic AI agent: A single, well-scoped tool, search_knowledge_base(category, query), lets the agent decide when to retrieve. The system prompt is fetched from the database for each conversation, so persona changes ship without a deploy, and the underlying LLM provider (Claude, OpenAI, …) can be swapped without touching the agent's code.
RAG with pgvector: OpenAI text-embedding-3-small (1536-dim) indexed in the same PostgreSQL instance via pgvector. Cosine-distance similarity search filtered by category, with embeddings auto-regenerated whenever an admin edits a knowledge-base entry.
Realtime admin dashboard: Next.js admin at /admin with live KPI cards, throughput and latency charts, and full CRUD over users, conversations, messages, knowledge base, and agent contexts. Authenticated via an httpOnly JWT cookie and backed by a WebSocket pub/sub for live counters.
Persistent per-browser history: Each browser gets a UUID stored in localStorage; the conversation reloads on revisit. The server owns all message IDs and timestamps, so there's no optimistic UI or client-side dedupe.
Containerized, self-hosted backend: FastAPI, PostgreSQL + pgvector, Redis (for fastapi-limiter rate limits), and Caddy with automatic HTTPS, all orchestrated by Docker Compose. Logfire and Sentry trace every request, model call, and SQL query.
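As an illustration of the token-streaming chat feature listed above, the echo-first ordering and early stop can be sketched as a pair of async generators. Everything here is an assumption made for the example: the newline-delimited JSON wire format, the event names, and `fake_model_tokens`, which stands in for a streaming model call. In the real app the abort comes from the browser's AbortController; the `stop_after` parameter only simulates that on the consumer side.

```python
import asyncio
import json
from typing import AsyncIterator, Optional


async def fake_model_tokens(prompt: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming LLM call."""
    for token in ["Hello", ",", " world"]:
        await asyncio.sleep(0)  # yield control, as a network read would
        yield token


async def chat_stream(prompt: str) -> AsyncIterator[str]:
    """Yield newline-delimited JSON events: the user echo first, then tokens."""
    yield json.dumps({"type": "user_message", "content": prompt}) + "\n"
    async for token in fake_model_tokens(prompt):
        yield json.dumps({"type": "token", "content": token}) + "\n"
    yield json.dumps({"type": "done"}) + "\n"


async def collect(prompt: str, stop_after: Optional[int] = None) -> list[dict]:
    """Consume the stream; stop_after simulates the client aborting early."""
    events = []
    stream = chat_stream(prompt)
    async for line in stream:
        events.append(json.loads(line))
        if stop_after is not None and len(events) >= stop_after:
            await stream.aclose()  # stop generating further tokens
            break
    return events
```

Because the echo is the first event, the client can render the user's message before the model has produced anything, and closing the generator stops further tokens from being produced.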