fix: 🐛 more stable ingestion

2026-03-08 17:28:29 +00:00
10 changed files with 130 additions and 100 deletions
@@ -1,3 +0,0 @@
-[submodule "toon-python"]
-	path = toon-python
-	url = git@github.com:toon-format/toon-python.git
@@ -1,22 +1,36 @@
-# Dungeon Masters Vault: Local RAG Assistant
+# 🐉 Dungeon Masters Vault: Local RAG Assistant

-An advanced Retrieval-Augmented Generation (RAG) system designed for Dungeon Masters. This tool ingests markdown-based campaign notes, enriches them with AI-generated metadata, and provides an interactive terminal interface to query your world's lore using **DSPy** and **Local LLMs**.
+An advanced Retrieval-Augmented Generation (RAG) system designed for Dungeon Masters. This tool ingests markdown-based campaign notes, enriches them with AI-generated metadata, and provides an interactive terminal interface to query your world’s lore using **DSPy** and **Local LLMs**.

-## Key Features
+## ⚔️ Key Features

-* **Parallel Enrichment:** Configurable multithreading processes multiple document chunks simultaneously across local LLM slots for high-speed ingestion.
-* **Deep Context Retrieval:** Retrieves relevant chunks and "peeks" at the full source file to provide the LLM with broader narrative context.
-* **Local-First:** Runs entirely on your hardware using **Ollama**, keeping your campaign secrets private.
+* **Parallel Enrichment:** Uses configurable multithreading to process multiple document chunks simultaneously across local LLM slots for high-speed ingestion.
+* **Deep Context Retrieval:** Unlike standard RAG, this system retrieves relevant chunks and then "peeks" at the full source file to provide the LLM with broader narrative context.
+* **Local-First:** Designed to run entirely on your hardware using **LM Studio**, keeping your campaign secrets private.

 ---

-## Setup
+## 🏗️ Architecture
+
+1. **Ingestion:** Scans `DATA_DIR` for `.md` files.
+2. **Chunking:** Splits documents into 800-character segments with overlap.
+3. **Enrichment:** A DSPy `IngestionAgent` analyzes each chunk to extract:
+    * **Synopsis:** A one-sentence summary.
+    * **Tags:** Plot points, item names, or themes.
+    * **Entities:** Specific NPCs, Locations, or Factions.
+4. **Vector Store:** Chunks and metadata are embedded using the configured embedding model (for example `text-embedding-qwen3-embedding-8b`) and stored in a local **Turso** database.
+5. **Interactive RAG:** A terminal loop that uses **ReAct (Reasoning and Acting)** to answer queries based on retrieved context.
+
+---
+
+## 🛠️ Setup

 ### Prerequisites

-* **[UV](https://docs.astral.sh/uv/)** — Python package manager
-* **Ollama** — Running a local server (default `localhost:11434`)
-* **Local Models** — Pull your inference and embedding models with `ollama pull`
+* **[UV](https://docs.astral.sh/uv/):** Python package manager.
+* **LM Studio:** Running a local server at `localhost:1234` (or your specific IP).
+* **Models:** Inference and embedding models are configurable. Download your preferred models in LM Studio and update `config.yaml`.
+

 ### Installation

@@ -26,52 +40,52 @@ uv sync

 ---

-## Usage
+## 🚀 Usage

-### Ingest & Enrich
+### 1. Ingest & Enrich

-Process your markdown campaign files and build the vector database:
+Run the ingestion script to process your markdown files and build the vector database.

 ```bash
 uv run src/ingest.py
 ```

-### Query the LLM
+### 2. Query the LLM

-Launch the interactive session to ask questions about your campaign:
+Launch the interactive session to ask questions about your campaign.

 ```bash
 uv run src/retrieve.py
 ```

-**Example interaction:**
+**Example Query:**

-> Query: Why did the party get free bread at the Golden Grain Inn?
->
-> Based on the session notes from 'Session_12.md', the party received free bread because the Rogue intimidated the baker's assistant and the Cleric performed Thaumaturgy to impress the owner.
+> `📝 Query: Why did the party get free bread at the Golden Grain Inn?`  
+> `📜 AI RESPONSE: Based on the session notes from 'Session_12.md', the party received free bread because the Rogue successfully intimidated the baker's assistant, and the Cleric later performed a minor miracle (Thaumaturgy) that impressed the owner.`

 ---

-## File Structure
+## 📂 File Structure

 ```
 .
-├── config.yaml                   # App configuration
-├── load_ingestion_llms.sh        # Script to load multiple LLMs (run before ingest)
+├── config.yaml # Configuration for the app
+├── load_ingestion_llms.sh # Script to load multiple LLMs (run before ingest)
 ├── README.md 
 ├── ROADMAP.md
-├── src/
-│   ├── config_loader.py          # Loads config.yaml
-│   ├── embedding.py              # Ollama embedding model client
-│   ├── experts/
-│   │   ├── ingestion_agent.py    # AI agent for document enrichment
-│   │   └── retrieval_agent.py    # AI agent for queries, with tools and DB calls
-│   ├── ingest.py                 # Campaign notes ingestion script
-│   └── retrieve.py               # Interactive Q&A interface
-├── data/                         # Campaign database (gitignored)
-│   ├── dmv.db
-│   ├── dmv.log
-│   └── time_file.txt
+├── src
+│   ├── config_loader.py # Loads config.yaml
+│   ├── embedding.py # Client class for the LM Studio embedding model server
+│   ├── experts
+│   │   ├── ingestion_agent.py # Agent class for ingestion enrichment
+│   │   └── retrieval_agent.py # Agent class for retrieval, with tools and database calls
+│   ├── ingest.py # Ingestion script to load your DnD campaign notes
+│   └── retrieve.py # Main Q&A interface for your notes
+├── data # Gitignored folder for the notes database
+│   ├── dmv.db
+│   ├── dmv.db-wal
+│   ├── dmv.log
+│   └── time_file.txt
 ├── pyproject.toml
 ├── LICENSE
 └── uv.lock
@@ -79,12 +93,12 @@ uv run src/retrieve.py

 ---

-## Configuration
+## ⚙️ Configuration

-Edit `config.yaml` to customize:
+In `config.yaml`, you can adjust multiple things:

-* Inference and embedding models
-* Campaign notes location (`data_dir`)
-* System prompts for ingestion and retrieval agents
+* Enrichment, embedding, and retrieval models
+* DnD notes location (`data_dir`)
+* System prompts for the ingestion and retrieval agents

 ---
@@ -11,11 +11,15 @@

 ## Planned Next

-* AI in the middle - make the llm generate multiple queries for a wider search
+* database retrieve for tag or entity

 ## Planned Later

 * entity chunking & re-ranking
 * Logging in Ingestion
-* database retrieve for tag or entity
-*
+* More robust ingestion - LLM responses sometimes deviate from the expected format
+
+
+## Done
+
+* AI in the middle - make the llm generate multiple queries for a wider search
@@ -1,47 +1,61 @@
 # --- Connection Settings ---
 api:
-  base_url: "http://100.110.238.94:11434"
+  base_url: "http://framework.tawny-bellatrix.ts.net:1234"
  api_version: "/v1/"

 # --- Model Settings ---
 models:
-  enrich: "ollama/granite4.1:3b"
-  embedding: "qwen3-embedding:4b"
-  retrieval: "ollama/qwen3.6:latest"
-  expansion: "ollama/granite4.1:3b"
+  enrich: "lm_studio/qwen-" # prefix only; ingestion appends the index of the active LLM (see ./load_ingestion_llms.sh)
+  embedding: "text-embedding-qwen3-embedding-8b"
+  retrieval: "lm_studio/qwen/qwen3-30b-a3b-2507"
+  expansion: "lm_studio/qwen/qwen3-30b-a3b-2507"

 # --- Ingestion Settings ---
 ingestion:
  data_dir: "/home/jake/DnD"
  db_path: "./data/"
  db_name: "dmv.db"
-  active_llms: 1
-  parallel_requests_per_llm: 6
-  chunk_size: 1200
-  chunk_overlap: 200
+  active_llms: 2
+  parallel_requests_per_llm: 4
+  chunk_size: 800
+  chunk_overlap: 100
  embedding_batch_size: 32
  time_file_location: "./data/time_file.txt"

 # ---- Agent Settings ----
 ingestion_agent:
  ingestion_signature: |
-    You are an expert Dungeon Master's assistant.
-    Analyze the provided notes and extract a concise synopsis and relevant metadata.
-    synopsis = A one-sentence summary of the document.
-    tags = Relevant tags (NPCs, Locations, Items, Plot Points).
-    entities = a list of names for people, places, or factions.
-    "note -> synopsis:str, tags: list[str], entities: list[str]"
+    You are an expert Dungeon Master's assistant specialized in campaign note enrichment.
+    Your task is to analyze DnD session notes and extract structured metadata.
+
+    Follow these guidelines:
+    - SYNOPSIS: One concise sentence capturing the key event or development (use active voice)
+    - TAGS: Extract 3-7 relevant tags from: Campaign arcs, NPC names, Locations, Items, Spells, Factions, Plot hooks, Themes
+    - ENTITIES: List all proper nouns (NPCs, locations, organizations) - be specific and consistent with naming
+    The TAGS and ENTITIES must be a list of strings, not json objects
+    Format output as JSON with keys: synopsis, tags, entities

 retrieval_agent:
  retrieval_signature: |
-    You are an expert Dungeon Master's assistant.
-    Given the context and the question, answer the question.
-    Do not make things up, base all of your answers on the context.
-    Always site the file location of your source of information.
+    You are an expert Dungeon Master's assistant helping to run a campaign.
+    When answering questions about your DnD world:
+
+    1. Strictly use ONLY the provided context from campaign notes
+    2. If information is incomplete, infer plausibly based on established lore (flag inferences)
+    3. Always cite sources: "Per [filename], [quote/summary]"
+    4. Maintain character voice and narrative style when appropriate
+    5. For rules questions, distinguish between rules-as-written and DM interpretation
+
+    Provide comprehensive answers that help you run the game, including relevant details about NPCs, locations, or plot points.

 expansion_agent:
  expansion_signature: |
-    You are an expert Dungeon Master's assistant.
-    Given a user's question, generate 3-5 similar but enhanced search queries that would help find more relevant information in DnD notes.
-    Each expanded query should be distinct and add different perspective to the original question.
-    Return only the queries as a JSON list with key "queries"."""
+    You are a query expansion expert specialized in Dungeons & Dragons campaign management.
+
+    Given a user question about their DnD world, generate 3-5 enhanced search queries that:
+    - Cover different aspects (characters, locations, lore, rules)
+    - Include synonyms and related terms (e.g., "dragon" → "wyrm", "scales" → "armor")
+    - Address potential follow-up questions the DM might have
+    - Vary specificity (broad to narrow)
+
+    Return ONLY a JSON object with a single key "queries" whose value is an array of strings. Keep queries concise (5-10 words each).
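The ingestion settings in this config (`chunk_size: 800`, `chunk_overlap: 100`) describe fixed-size character windows that share a margin with their neighbours. The sketch below illustrates that idea in plain Python. It is an assumption-laden illustration, not the splitter `ingest.py` actually uses (which is not shown in this diff), and `split_with_overlap` is a hypothetical helper name.

```python
def split_with_overlap(text: str, chunk_size: int = 800, chunk_overlap: int = 100) -> list[str]:
    """Split text into character windows of chunk_size that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # distance between consecutive window starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


# Hypothetical sample input: 2000 characters of repeating digits.
sample = "".join(str(i % 10) for i in range(2000))
chunks = split_with_overlap(sample)
print([len(c) for c in chunks])
```

With these defaults each window starts 700 characters after the previous one, so the last 100 characters of one chunk reappear at the start of the next.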
@@ -0,0 +1,5 @@
+lms load qwen-4b-instruct-2507 --parallel 2 --identifier "qwen-0" --ttl 1800
+lms load qwen-4b-instruct-2507 --parallel 2 --identifier "qwen-1" --ttl 1800
+# lms load qwen-4b-instruct-2507 --parallel 2 --identifier "qwen-2" --ttl 1800
+# lms load qwen-4b-instruct-2507 --parallel 2 --identifier "qwen-3" --ttl 1800
+# lms load qwen-4b-instruct-2507 --parallel 2 --identifier "qwen-4" --ttl 1800
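The `enrich` model name in `config.yaml` is only a prefix, and the identifiers loaded by this script (`qwen-0`, `qwen-1`) complete it. A minimal sketch of how chunks could be spread across the loaded instances, assuming a simple round-robin on the chunk index. The helper name `model_for_chunk` is hypothetical, and how `ingest.py` actually derives `lm_index` is not shown in this diff.

```python
def model_for_chunk(chunk_index: int, model_base: str = "lm_studio/qwen-", active_llms: int = 2) -> str:
    """Pick a model identifier for a chunk by cycling through the loaded instances."""
    if active_llms < 1:
        raise ValueError("active_llms must be at least 1")
    # Round-robin assumption: chunk 0 -> qwen-0, chunk 1 -> qwen-1, chunk 2 -> qwen-0, ...
    return f"{model_base}{chunk_index % active_llms}"


assignments = [model_for_chunk(i) for i in range(4)]
print(assignments)
```

If `active_llms` is raised in the config, the matching `lms load` lines need to be uncommented so that every generated identifier refers to a loaded model.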
@@ -10,7 +10,7 @@ API_VERSION = CFG["api"]["api_version"]

 class LocalLMEmbeddings(Embeddings):
    def __init__(self, model: str, base_url: str = API_BASE, batch_size: int = 32):
-        self.url = f"{base_url}/api/embed"
+        self.url = f"{base_url}{API_VERSION}embeddings"  # API_VERSION already starts and ends with "/"
        self.model = model
        self.batch_size = batch_size

@@ -22,11 +22,10 @@ class LocalLMEmbeddings(Embeddings):
            response = requests.post(
                self.url, json=payload, timeout=120
            )  # Longer timeout for batches
-            # print(response)
            response.raise_for_status()
            data = response.json()
            # print(data)
-            return data["embeddings"]
+            return [item["embedding"] for item in data["data"]]
        except Exception as e:
            print(f"❌ Batch request failed: {e}")
            # Returning empty lists to maintain index integrity if needed,
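Moving from `/api/embed` to the `/v1/` route changes the response shape: an OpenAI-compatible embeddings endpoint returns vectors under `data[i].embedding` rather than in a top-level `embeddings` list. The sketch below shows URL construction and response parsing under that assumption. The helper names are hypothetical and the sample payload is invented for illustration.

```python
def build_embeddings_url(base_url: str, api_version: str = "/v1/") -> str:
    """Join base URL, API version, and route without producing doubled slashes."""
    parts = [base_url.rstrip("/"), api_version.strip("/"), "embeddings"]
    return "/".join(p for p in parts if p)


def extract_embeddings(data: dict) -> list[list[float]]:
    """Return the vectors from an OpenAI-style embeddings response, ordered by index."""
    # Sorting by "index" keeps vectors aligned with the inputs even if the server
    # returns items out of order; a missing index leaves the original order intact.
    items = sorted(data["data"], key=lambda item: item.get("index", 0))
    return [item["embedding"] for item in items]


# Invented sample payload in the OpenAI-compatible shape.
sample_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
    ],
}

print(build_embeddings_url("http://localhost:1234"))
print(extract_embeddings(sample_response))
```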
@@ -18,7 +18,7 @@ EXPANSION_CONFIG = CFG["expansion_agent"]

 def retrieve_from_turso(embedded_question, k=5):
    query = f"""
-    SELECT file_path, synopsis, tags, chunk_data,
+    SELECT file_path, synopsis, tags, entities, chunk_data,
    vector_distance_cos(embedding, vector32('{embedded_question}')) AS distance
    FROM notes
    ORDER BY distance ASC
@@ -55,7 +55,9 @@ class DnDRAG(dspy.Module):
            base_url=API_BASE,
            # batch_size=1,
        )
-        self.retrieval_lm = dspy.LM(model=CFG["models"]["retrieval"], api_base=API_BASE)
+        self.retrieval_lm = dspy.LM(
+            model=CFG["models"]["retrieval"], api_base=API_BASE + CFG["api"]["api_version"]
+        )
        with dspy.context(lm=self.retrieval_lm, signature=ExpansionSignature):
            self.query_expander = dspy.Predict("question -> queries:list[str]")

@@ -66,9 +68,9 @@ class DnDRAG(dspy.Module):
        print("Enhancing Question")
        with dspy.context(lm=self.retrieval_lm):
            expanded_queries = self.query_expander(question=question).queries
-        # print("Enhanced Queries:")
-        # for q in expanded_queries:
-        #     print("    ", q)
+        print("Enhanced Queries:")
+        for q in expanded_queries:
+            print("    ", q)
        all_embeddings = self.embeddings_model.embed_documents([question] + expanded_queries)
        # print(all_embeddings)
        all_results = []
@@ -79,7 +81,7 @@ class DnDRAG(dspy.Module):
        seen = set()
        unique_results = []
        for row in all_results:
-            key = (row[0], row[3])
+            key = (row[0], row[4])
            if key not in seen:
                seen.add(key)
                unique_results.append(row)
@@ -89,18 +91,18 @@ class DnDRAG(dspy.Module):
            source = row[0]
            synopsis = row[1]
            tags = row[2]
-            # entities = row[3]
-            content = row[3]
-            closeness = row[4]
+            entities = row[3]
+            content = row[4]
+            closeness = row[5]

            context_parts.append(f"""
 --- Chunk {i + 1} from {source} ---
 synopsis: {synopsis},
 tags: {tags},
+entities: {entities},
 closeness: {closeness},
 {content}
 """)
-        # entities: {entities},

        context = "\n\n".join(context_parts)

@@ -83,7 +83,7 @@ def enrich_chunks(chunks: list) -> list:

        try:
            with dspy.context(
-                lm=dspy.LM(model=f"{MODEL_BASE}", api_base=API_BASE),
+                lm=dspy.LM(model=f"{MODEL_BASE}{lm_index}", api_base=API_BASE + API_VERSION),
                chat_template_kwargs={"enable_thinking": False},
            ):
                response = IngestionAgent().ingest(note=chunk.page_content)
@@ -140,11 +140,10 @@ def embed_chunks(chunks: List[Any], batch_size: int = EMBEDDING_BATCH_SIZE) -> L
    # Process chunks in batches
    for i in tqdm(range(0, total_chunks, batch_size), desc="Embedding batches"):
        batch = chunks[i : i + batch_size]
-        # print(f"🚀 Processing batch {(i // batch_size) + 1} (Size: {len(batch)})...")
+        print(f"🚀 Processing batch {(i // batch_size) + 1} (Size: {len(batch)})...")
        batch_content = [chunk.page_content for chunk in batch]
        try:
            batch_embeddings = embeddings_model.embed_documents(batch_content)
-            # print(len(batch_embeddings[0]))
            # Process each chunk in the batch
            for j, (chunk, embedding) in enumerate(zip(batch, batch_embeddings)):
                # Extract metadata
@@ -177,8 +176,8 @@ def embed_chunks(chunks: List[Any], batch_size: int = EMBEDDING_BATCH_SIZE) -> L
            print(f"⚠️ Batch processing failed at index {i}: {e}")
            # Fallback: process individually (if needed)
            for j, chunk in enumerate(batch):
-                try:
                content = chunk.page_content
+                try:
                    embedding = embeddings_model.embed_query(content)

                    file_path_orig = chunk.metadata.get("full_path", "unknown")
@@ -234,15 +233,8 @@ def save_to_db(chunk_dicts):
    # SQL with named placeholders for clarity and safety
    insert_sql = """
    INSERT INTO notes (
-        file_path,
-        file_name,
-        chunk_data,
-        synopsis,
-        tags,
-       -- entities,
-        embedding,
-        timestamp
-    ) VALUES (?, ?, ?, ?, ?, vector32(?), ?)
+        file_path, file_name, chunk_data, synopsis, tags, entities, embedding, timestamp
+    ) VALUES (?, ?, ?, ?, ?, ?, vector32(?), ?)
    """

    # Prepare batch data: convert each dict to a tuple in correct order
@@ -258,7 +250,10 @@ def save_to_db(chunk_dicts):
                entry["chunk_data"],
                entry["synopsis"],
                ",".join(entry["tags"]),  # Store as comma-separated string
-                # ",".join(entry["entities"]),  # Store as comma-separated string
+                ",".join(
+                    e.get("name", str(e)) if isinstance(e, dict) else str(e)
+                    for e in entry["entities"]
+                ),  # Comma-separated string; tolerates dict or non-string entities
                embedding_str,
                entry["timestamp"],
            )
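The ROADMAP notes that LLM responses sometimes fall outside the expected shape, which is why entities can arrive as objects rather than plain strings. Below is a hedged sketch of a standalone normalizer that could sit in front of the join. `normalize_entities` is a hypothetical helper, the `name` key mirrors the fallback used in this hunk, and the de-duplication step is an addition that the commit itself does not perform.

```python
from typing import Any, Iterable, Optional


def normalize_entities(raw: Optional[Iterable[Any]]) -> list[str]:
    """Coerce a mixed list of entities into unique, trimmed strings, preserving order."""
    names: list[str] = []
    seen: set[str] = set()
    for item in raw or []:
        if isinstance(item, dict):
            # Assumed shape: {"name": "..."}; fall back to the dict's string form.
            name = str(item.get("name", item))
        else:
            name = str(item)
        name = name.strip()
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names


# Invented sample: a string, a dict, a duplicate with padding, and a non-string value.
entities = normalize_entities(["Strahd", {"name": "Ireena"}, " Strahd ", 42])
print(",".join(entities))
```

Note that a comma inside an entity name would still be ambiguous once the list is stored as a comma-separated string.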
@@ -285,8 +280,8 @@ def create_db():
        chunk_data TEXT NOT NULL,
        synopsis TEXT,
        tags TEXT,  -- comma-separated
-        -- entities TEXT,  -- comma-separated
-        embedding F32_BLOB(2560),
+        entities TEXT,  -- comma-separated
+        embedding F32_BLOB(4096),
        timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
    )
    """)
@@ -87,7 +87,8 @@ def main():
    dspy.configure(verbose_errors=True)
    dspy.configure(callbacks=[CallbackHandler(logger)])
    # 1. Setup the LLM
-    lm = dspy.LM(RETRIEVE_MODEL, api_base=API_BASE)
+    print(f"🚀 Initializing {RETRIEVE_MODEL} via LM Studio...")
+    lm = dspy.LM(RETRIEVE_MODEL, api_base=API_BASE + API_VERSION)
    dspy.configure(lm=lm)

    # 2. Load the RAG System (only happens once!)