D&D Second Brain
Overview:
What you’ll have: an Obsidian vault built from your selected PDFs (converted to
Markdown), plus a chatbot plugin configured to run RAG over that vault only.
How answers are produced: when you ask a question in Obsidian, the chatbot:
1. pulls the most relevant passages from your vault,
2. combines them, and
3. writes a concise reply based on those passages (no external sources unless you
explicitly allow them).
Behavior guarantees:
Output is derived solely from your included documents (scope-limited RAG).
No manual “conflict handling” rules—if your sources differ, the reply is synthesized
from what was retrieved (it may present multiple variants if that’s what the passages say).
No citations required by default; you can toggle links to the source notes/sections if you
want.
Search & retrieval: standard keyword search remains available in Obsidian; the chatbot uses
AI-assisted passage matching over your vault to fetch its context (top-k configurable).
Privacy/control: your vault lives locally; the retrieval index is local; the LLM endpoint used
by the chatbot is your choice (cloud or local). Nothing leaves the machine unless you configure
it to.
Performance target (for the test set): ask a question → get a synthesized answer and
optional jump-links into the relevant notes in roughly 1–3 seconds on your Mac.
Maintenance: you add or remove documents from the vault; reindexing picks up the
changes; no training, no synthetic data, no automatic file moves.
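The retrieve-then-answer flow described above can be sketched in a few lines. This is only an illustration: a real chatbot plugin will most likely use embedding-based matching, while this sketch swaps in a simple keyword-overlap score so it stays self-contained. The names tokenize, score, retrieve and build_prompt are hypothetical and do not come from any plugin.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, passage: str) -> int:
    """Toy relevance score: how often the query's tokens occur in the passage."""
    query_tokens = set(tokenize(query))
    counts = Counter(tokenize(passage))
    return sum(counts[token] for token in query_tokens)


def retrieve(query: str, passages: list[str], top_k: int = 3) -> list[str]:
    """Return up to top_k passages with a non-zero score, best first."""
    scored = [(score(query, p), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep vault order
    return [p for s, p in scored[:top_k] if s > 0]


def build_prompt(query: str, passages: list[str], top_k: int = 3) -> str:
    """Combine the retrieved passages into a scope-limited prompt for the LLM."""
    context = "\n\n".join(retrieve(query, passages, top_k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

Raising top_k gives the model more context at the cost of a longer prompt; passages that share no words with the question are never sent.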
PDFs Converted to Markdown:
Folder layout (exact)
DND_Second_Brain/
├─ pdf/ # original PDFs only
└─ md/ # one .md file per PDF, same base name
File naming (exact)
For every pdf/Name.pdf, create md/Name.md.
Keep the same base name (“Name”).
If a filename has characters your tools choke on, replace only these:
   - spaces → _
   - : ? * " < > | → _
Result must still match between PDF and MD.
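A minimal sketch of the naming rule, assuming you sanitize the base name once and use it for both files so the pair keeps matching. The function names safe_base_name and paired_names are hypothetical helpers, not part of any tool.

```python
import re
from pathlib import PurePosixPath

# The only characters the rule replaces: space and : ? * " < > |
_UNSAFE = re.compile(r'[ :?*"<>|]')


def safe_base_name(base: str) -> str:
    """Replace each unsafe character with an underscore; leave everything else alone."""
    return _UNSAFE.sub("_", base)


def paired_names(pdf_filename: str) -> tuple[str, str]:
    """Return (pdf name, md name) sharing one sanitized base name."""
    base = safe_base_name(PurePosixPath(pdf_filename).stem)
    return f"{base}.pdf", f"{base}.md"
```

For example, paired_names("Monster Manual: 5e.pdf") yields Monster_Manual__5e.pdf and Monster_Manual__5e.md; the colon and the space each become one underscore.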
What each Markdown file must look like (exact)
Encoding: UTF-8, Unix line endings (\n).
Format: plain Markdown only (no HTML blocks).
Headings: use #, ##, ### where they exist in the text.
Lists: use - for bullets; 1. for numbered lists.
Tables: Markdown pipe tables if present, otherwise plain text.
Images/figures: omit (text-only output).
Links: none required.
Page markers: insert a plain comment line between pages:
<!-- page:12 -->
No hard wraps inside paragraphs. One paragraph = one line (no mid-sentence breaks).
Processing pipeline (what your n8n run must do)
For each file in pdf/ that doesn’t already have a matching md/Name.md:
1. OCR if needed
   - If the PDF text layer is missing/empty, run OCR so text becomes selectable.
2. Extract text
   - Pull the text in reading order. Keep basic structure if available (headings, lists,
     tables).
3. Convert to Markdown
   - Generate clean Markdown (no HTML), using #/-/tables as above.
4. Fix paragraphs
   - Join broken lines inside paragraphs (remove end-of-line breaks that aren’t list
     items or headings).
5. Normalize lists
   - Make every bullet start with - and every ordered-list item use the 1. style. Keep a
     blank line before/after lists.
6. Remove repeating junk
   - Delete header/footer lines that repeat every page (book title, running head, lone
     page numbers).
7. Insert page markers
   - Add <!-- page:N --> between pages (N = original page number). This is your
     only “anchor” format.
8. Save
   - Write to md/Name.md (UTF-8, \n). Do not overwrite an existing file unless you
     intend to refresh it.
9. Log
   - If anything fails, do not create the .md. Record the PDF name and the error so you
     can retry.
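Steps 4 to 7 are plain text clean-up, so they can be sketched without any PDF library. The sketch below assumes the extraction step already gave you one list of lines per page. It also assumes that the marker <!-- page:N --> goes directly before the text of page N, that a line counts as repeating junk when it shows up on at least 60% of pages, and that wrapped list items are left as they are. All function names and the min_share parameter are hypothetical.

```python
import re
from collections import Counter

# Lines that start a new block and must never be glued onto the previous line.
BLOCK_START = re.compile(r"^\s*(#{1,6} |- |\d+\. |\||<!--)")
BULLET = re.compile(r"^(\s*)[•*+▪◦]\s+")
LONE_PAGE_NUMBER = re.compile(r"^\s*\d+\s*$")


def normalize_bullet(line: str) -> str:
    """Step 5: rewrite common bullet glyphs to '- '."""
    return BULLET.sub(r"\1- ", line)


def join_paragraphs(lines: list[str]) -> list[str]:
    """Step 4: merge hard-wrapped lines so one paragraph is one line."""
    out: list[str] = []
    for line in lines:
        line = line.rstrip()
        can_join = bool(
            out and out[-1] and line
            and not BLOCK_START.match(line)      # current line starts a block
            and not BLOCK_START.match(out[-1])   # previous line is a heading/list/table row
        )
        if can_join:
            out[-1] += " " + line.lstrip()
        else:
            out.append(line)
    return out


def strip_repeating(pages: list[list[str]], min_share: float = 0.6) -> list[list[str]]:
    """Step 6: drop lines seen on most pages, plus lines that are only a number."""
    seen: Counter = Counter()
    for page in pages:
        seen.update({line.strip() for line in page if line.strip()})
    threshold = max(2, min_share * len(pages))
    junk = {text for text, n in seen.items() if n >= threshold}
    return [
        [l for l in page if l.strip() not in junk and not LONE_PAGE_NUMBER.match(l)]
        for page in pages
    ]


def assemble(pages: list[list[str]], first_page: int = 1) -> str:
    """Step 7: clean each page and join pages with <!-- page:N --> markers."""
    chunks: list[str] = []
    for offset, page in enumerate(strip_repeating(pages)):
        lines = join_paragraphs([normalize_bullet(l) for l in page])
        if offset > 0:  # markers sit between pages, so none before the first page
            chunks.append(f"<!-- page:{first_page + offset} -->")
        chunks.append("\n".join(lines).strip())
    return "\n\n".join(chunks) + "\n"
```

Because pages are cleaned one at a time, a paragraph that continues across a page break stays split around the marker. The lone-number rule can also remove a legitimate line that contains only digits, so spot-check tables after a run.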
Quality checklist (each .md should pass)
Opens in any Markdown editor; no garbled characters (UTF-8 good).
Headings render as headings.
Lists render as lists.
No random line breaks inside sentences.
You can search for a phrase from the PDF and find it in the .md.
<!-- page:N --> appears at expected spots.
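Part of this checklist can be automated. The sketch below covers only the mechanical items (encoding, line endings, garbled characters, stray HTML, bullet style, page-marker order, front matter); whether headings and lists actually render still needs a look in an editor. check_markdown and the list of HTML tags it looks for are assumptions made for illustration.

```python
import re

PAGE_MARKER = re.compile(r"^<!-- page:(\d+) -->$")
HTML_TAG = re.compile(r"</?(table|tr|td|th|div|img|span|p|br)\b", re.IGNORECASE)
WRONG_BULLET = re.compile(r"^\s*[•*+]\s+")


def check_markdown(raw: bytes) -> list[str]:
    """Return the problems found in one .md file; an empty list means it passes."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return ["not valid UTF-8"]

    problems: list[str] = []
    if "\r" in text:
        problems.append("contains CR characters (expected \\n line endings)")
    if "\ufffd" in text:
        problems.append("contains U+FFFD replacement characters (garbled text)")
    if text.startswith("---\n"):
        problems.append("starts with YAML front matter")

    page_numbers: list[int] = []
    for number, line in enumerate(text.split("\n"), start=1):
        marker = PAGE_MARKER.match(line)
        if marker:
            page_numbers.append(int(marker.group(1)))
        elif HTML_TAG.search(line):
            problems.append(f"line {number}: HTML tag")
        elif WRONG_BULLET.match(line):
            problems.append(f"line {number}: bullet is not '-'")

    # Markers must appear in strictly increasing order with no repeats.
    if page_numbers != sorted(set(page_numbers)):
        problems.append("page markers are not strictly increasing")
    return problems
```

Read each file in binary mode (for example Path("md/Name.md").read_bytes()) so the UTF-8 and line-ending checks see the bytes as they are on disk.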
Exactly what not to produce
No asset folders.
No embedded images.
No HTML tables.
No front-matter (YAML) unless you later decide to add it by hand.
No extra folders beyond pdf/ and md/.
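To confirm a run left nothing unexpected behind, you can audit the folder against the pdf/ + md/ layout. This is a sketch under two assumptions: hidden entries (names starting with a dot, such as .DS_Store or Obsidian's settings folder) are ignored, and PDFs use a lowercase .pdf extension. audit_vault is a hypothetical helper.

```python
from pathlib import Path


def audit_vault(root: Path) -> dict[str, list[str]]:
    """Report entries that break the layout and PDFs that still need converting."""
    pdf_stems = {p.stem for p in (root / "pdf").glob("*.pdf")}
    md_stems = {p.stem for p in (root / "md").glob("*.md")}
    return {
        # Anything at the top level other than pdf/ and md/ (dot-entries skipped).
        "extra_entries": sorted(
            p.name for p in root.iterdir()
            if p.name not in {"pdf", "md"} and not p.name.startswith(".")
        ),
        # PDFs with no matching .md yet: the files the next run should process.
        "pending_pdfs": sorted(pdf_stems - md_stems),
        # .md files whose PDF is gone or named differently.
        "orphan_md": sorted(md_stems - pdf_stems),
    }
```

An empty list under every key means the layout matches and every PDF has its Markdown twin.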