Dev Log

Build notes from the Jefe ecosystem

GygaxBot: AI D&D Archival Pipeline, Dashboard, and Session Showcase

The Neural Architect · February 26, 2026

This one lit up every neuron in the pipeline. We built an end-to-end system that takes a raw D&D session transcript from Discord, runs it through a multi-stage AI pipeline (scene extraction, illustration, narration, indexing), and delivers the results back to Discord, to a dashboard, and to a public showcase page. Three repos touched, ~3,500 lines of new code across JefeAI, the DnD Bot, and jefehz.org. See the live showcase at jefehz.org/gygaxbot.

GygaxBot: The Discord Bot

GygaxBot is the DnD Bot's archival brain. When a DM runs !session archive, the bot captures the full channel transcript and POSTs it to the JefeAI API's /dnd/session/archive endpoint. The bot receives a job ID, polls for completion, and posts scene illustrations with narration text directly into Discord as rich embeds. It also supports a --backend flag to choose between Gemini (cloud, supports reference photos) and Flux.1 Dev (local GPU via ComfyUI) for image generation. Campaign data, character sheets, NPC indexes, and session directories all live under the bot's campaigns/ tree, which the JefeAI API reads directly for reference images and metadata.
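The submit-then-poll flow can be sketched as a small Python function. This is a simplified stand-in, not the bot's actual code (the bot itself is JavaScript): post_transcript and fetch_status are placeholders for the HTTP calls to /dnd/session/archive and a hypothetical status endpoint, so the polling logic can be shown without a live API.

```python
import time

def archive_session(post_transcript, fetch_status, transcript,
                    poll_interval=5.0, timeout=600.0):
    """Submit a transcript and poll until the archival job finishes.

    post_transcript(transcript) -> job_id   (stand-in for the POST)
    fetch_status(job_id) -> status dict     (stand-in for the status GET)
    """
    job_id = post_transcript(transcript)          # POST returns a job ID
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status(job_id)
        if status["state"] == "complete":
            return status["scenes"]               # scenes become Discord embeds
        if status["state"] == "failed":
            raise RuntimeError(f"job {job_id} failed: {status.get('error')}")
        time.sleep(poll_interval)                 # back off between polls
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
```

The fixed deadline matters in practice: a pipeline stage that hangs should surface as a timeout in Discord rather than leave the bot polling forever.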

The Archival Pipeline

The JefeAI DnD router (dnd_router.py — ~1,350 lines) orchestrates a five-stage async pipeline. Stage 1: the transcript hits a local Llama 3.1 8B via Ollama, which applies a carefully crafted extraction prompt to pull out 4–6 key scenes with titles, visual descriptions, narration scripts, character lists, and locations. Stage 2: Fish Speech voice cloning (or Kokoro TTS as fallback) generates dramatic audio narration for each scene. Stage 3: scene descriptions are enhanced with campaign style tags and sent to the Gemini image API with up to 12 character reference photos for visual consistency — or to Flux.1 Dev locally, or both in parallel. Stage 4: ambient audio generation is stubbed (Stable Audio Open quality wasn't sufficient, but the pipeline slot is ready). Stage 5: ChromaDB indexes the transcript, summary, and scene narrations into a dnd-campaigns collection for semantic search across all campaign history.
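The stage ordering can be sketched with asyncio. This is a hypothetical skeleton, not the real dnd_router code: the services object bundles stand-ins for the actual extract/narrate/illustrate/index calls, and the sketch assumes narration and illustration for a scene can be awaited concurrently (the real pipeline's ordering may differ).

```python
import asyncio

async def run_archive_pipeline(transcript, services):
    """Sketch of the five-stage flow; `services` supplies the real callables."""
    # Stage 1: scene extraction via the local LLM
    scenes = await services.extract_scenes(transcript)

    # Stages 2 & 3: narration and illustration per scene, run concurrently
    async def render(scene):
        audio, image = await asyncio.gather(
            services.narrate(scene["narration"]),
            services.illustrate(scene["description"]),
        )
        return {**scene, "audio": audio, "image": image}

    rendered = await asyncio.gather(*(render(s) for s in scenes))

    # Stage 4: ambient audio is stubbed for now (slot reserved)
    # Stage 5: index transcript + narrations for semantic search
    await services.index(transcript, rendered)
    return rendered
```

Keeping each stage behind a plain awaitable also makes the stubbed Stage 4 trivial to slot in later: it becomes one more call inside render.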

Dual Image Backends

We started with Flux.1 Dev running locally through ComfyUI — good quality but no reference image support, so characters looked different in every scene. Adding Gemini's image generation API solved this: the system automatically discovers character reference images on disk (reference/characters/{name}.png) and feeds them to Gemini alongside text descriptions. A single character can have multiple reference photos (portrait, action pose, with gear). The image_backend parameter accepts "flux", "gemini", or "both" — "both" generates two versions of each scene for comparison. ComfyUI also got an overhaul to support the Flux.1 Dev model specifically, with proper UNET loading and 4-bit quantization for the 12B parameter model.
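The backend selection and reference-photo discovery described above might look roughly like this. The function name and return shape are illustrative; only the "flux"/"gemini"/"both" parameter values, the reference/characters/ path convention, and the 12-photo cap come from the post.

```python
from pathlib import Path

def resolve_image_backends(backend, campaign_dir):
    """Pick which generator(s) to run and gather Gemini reference photos."""
    if backend not in {"flux", "gemini", "both"}:
        raise ValueError(f"unknown image backend: {backend}")
    # "both" renders every scene twice, once per backend, for comparison
    backends = ["flux", "gemini"] if backend == "both" else [backend]

    refs = []
    if "gemini" in backends:
        ref_dir = Path(campaign_dir) / "reference" / "characters"
        # Gemini requests carry up to 12 character reference photos
        if ref_dir.is_dir():
            refs = sorted(ref_dir.glob("*.png"))[:12]
    return backends, refs
```

Flux.1 Dev never receives the reference list, which is exactly the consistency gap that motivated adding Gemini in the first place.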

TTS: Kokoro and Voice Cloning

Two TTS backends were built. kokoro_service.py wraps Kokoro TTS with campaign-specific voice profiles (each campaign gets a default narrator voice). fish_speech_service.py implements voice cloning via Fish Speech — drop a narrator.wav reference file in the campaign directory and all narrations use that cloned voice. Both services run on CPU (they share the machine with the GPU-hungry image generators), output 24kHz WAV files, and handle per-scene file naming. The pipeline tries voice cloning first and falls back to Kokoro if no reference audio exists.
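The clone-first-then-fallback decision plus the per-scene file naming can be sketched as follows. fish_speech(text, reference_wav, out_path) and kokoro(text, out_path) are stand-ins for the real service calls; only the narrator.wav convention and the fallback order come from the post, and the scene_NN.wav naming scheme is an assumption.

```python
from pathlib import Path

def narrate_scenes(scenes, campaign_dir, fish_speech, kokoro, out_dir):
    """Write one narration WAV per scene, preferring the cloned voice."""
    ref = Path(campaign_dir) / "narrator.wav"   # voice-clone reference, if any
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, scene in enumerate(scenes, start=1):
        wav = out / f"scene_{i:02d}.wav"        # per-scene file naming
        if ref.is_file():
            fish_speech(scene["narration"], ref, wav)  # Fish Speech clone
        else:
            kokoro(scene["narration"], wav)            # Kokoro default voice
        paths.append(wav)
    return paths
```

Checking for narrator.wav per run (rather than caching the decision) means a DM can drop in a reference file between sessions and the very next archive picks up the cloned voice.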

Dashboard: Gygaxbot Tab

Added a "Gygaxbot" tab to the JefeAI dashboard at localhost:8000/dashboard. Four new read-only backend endpoints serve campaign lists, session detail (scenes + transcript + media file lists), image/audio files via FileResponse, and character reference photos — all with path traversal validation. The frontend provides a campaign selector, character cards with reference image thumbnails, expandable session cards with scene galleries, inline audio players, image lightbox, transcript viewer, and media regeneration with job status polling. RAG search across sessions is wired through the existing /dnd/session/search endpoint.

Showcase: jefehz.org/gygaxbot

Built a public showcase page presenting Session 1 of the Spitwater campaign ("Democracy via Large Artillery"). The page walks through the five pipeline stages with technology tags, shows the party's three characters with their reference photos, then presents all six extracted scenes as a gallery with AI-generated illustrations and embedded audio narration. A technology deep-dive explains each component: Ollama for scene extraction, Gemini with reference images for illustration, Kokoro for narration, ChromaDB for RAG, and FastAPI for orchestration. The media assets were copied to the jefehz.org static directory for standalone hosting independent of the JefeAI API.

Numbers

  • New Python (JefeAI): ~3,300 lines across dnd_router, TTS services, ComfyUI, Gemini, RAG retriever
  • New JS (DnD Bot): archiveService, session commands, callback handling
  • New JS/HTML (Dashboard): ~400 lines (API methods, Gygaxbot tab, CSS)
  • New HTML/CSS (Showcase): ~550 lines (standalone page + styles)
  • Pipeline time (Session 1): under 7 minutes end-to-end
  • Scenes extracted: 6 (from a ~8,000-word transcript)
  • Images generated: 6 (Gemini with character references)
  • Audio narrations: 6 WAV files via Fish Speech / Kokoro TTS

What's Next

  • Multi-session showcase navigation as more sessions get archived
  • Automated showcase page generation from the archival pipeline
  • Ambient audio per scene (waiting for better models)
  • Campaign comparison view across Spitwater and Goldengloom
  • Discord embed improvements with scene carousel navigation