Post

Your Notes Are the Wrong Shape for AI

Your Notes Are the Wrong Shape for AI

I have 800+ notes in an Obsidian vault. An AI agent searches them dozens of times a day. Some notes return perfect results. Others are useless. The difference isn’t content quality. It’s shape.

Notes written as long, multi-topic documents consistently fail at retrieval. Notes written as atomic, single-idea files with explicit links consistently succeed. This isn’t a preference. It’s a structural property of how AI retrieval works.

The retrieval problem

When an AI agent searches your notes, it converts your query into a vector embedding and finds the notes closest in semantic space. This is the core of every RAG (retrieval-augmented generation) system. The problem is what happens when the “closest” note is a 2,000-word document about five different topics.

The embedding for a long document is an average across all its ideas. A note titled “Project Alpha Planning” that covers architecture decisions, team concerns, timeline risks, and vendor evaluations produces an embedding that’s sort of close to queries about all of those topics and precisely close to none of them. It’s a blurry centroid.

A note titled “Vendor lock-in risk increases with proprietary data formats” produces an embedding that matches exactly the queries it should match. It’s a sharp point in semantic space, not a cloud.

Research backs this up. A 2025 cross-domain evaluation of chunking strategies for dense retrieval found that content-aware splitting significantly outperforms naive fixed-length chunking. Smaller, focused chunks (128-256 tokens) achieve 8-15% higher recall at retrieval time compared to large chunks. The optimal unit for retrieval is not a document. It’s a single, coherent idea.

Atomic notes are that unit, pre-chunked by the author at the point of maximum semantic understanding.

Atomic notes: human-quality chunking

RAG systems spend enormous effort on chunking strategies. Fixed-size splits. Recursive character splitting. Semantic boundary detection. Sliding windows with overlap. Parent-document retrieval. All of these are attempts to solve the same problem: documents are too big and too unfocused for precise retrieval.

Atomic notes skip the problem entirely. When every note contains one idea, there’s nothing to chunk. The retrieval unit and the authoring unit are the same thing. The author, who understands the material, has already done the work that algorithms approximate.

Niklas Luhmann understood this in the 1960s. His Zettelkasten contained over 90,000 index cards, each holding one idea. He described the system not as a filing cabinet but as a “communication partner” that could surprise him by surfacing connections he hadn’t planned. The value wasn’t in any single card. It was in the 90,000 precise nodes connected by deliberate links.

The same principle applies to AI retrieval, but with a mechanical advantage Luhmann didn’t have. When your notes are atomic, the AI doesn’t need to parse which section of a long document is relevant. It retrieves exactly the note that matches, and that note contains exactly one idea, complete and self-contained.

Atomic notes without links are isolated data points. Atomic notes with links are a knowledge graph.

This matters because the hardest AI retrieval problems aren’t single-hop lookups. “What’s our vendor evaluation for data storage?” is easy. Any retrieval system finds that note. The hard problems are multi-hop: “How does our vendor lock-in risk interact with the migration timeline and the team’s capacity constraints?”

Answering that requires connecting three different notes. A flat vector search might find one or two of them. But if your notes link to each other ([[Vendor lock-in risk]] links to [[Migration timeline depends on data portability]] links to [[Team capacity is the binding constraint in Q3]]), the AI can follow those links and assemble context that no single embedding would retrieve.

This is the insight behind GraphRAG, which emerged as a major research direction in 2024-2025. Instead of retrieving isolated chunks, GraphRAG follows edges in a knowledge graph to pull back connected clusters. The graph it traverses? You build it by linking your notes.

Mark Granovetter’s 1973 research on social networks (the most-cited paper in sociology) found that weak ties (connections between distant clusters) are more valuable for discovery than strong ties (connections within a cluster). The same applies to notes. A link between your “cognitive biases” note and your “software architecture anti-patterns” note creates a weak tie that bridges two knowledge domains. That cross-domain link is exactly the kind of connection that vector search alone won’t find but graph traversal will.

Every wikilink you write today is an edge your AI can follow tomorrow.

Two graphs, not one

Your notes actually live in two graphs simultaneously.

The explicit graph is the one you build through wikilinks. [[Note A]] links to [[Note B]] because you’ve reasoned about their relationship. These edges carry semantic precision. “This contradicts X.” “This is an instance of Y.” “This depends on Z.” No embedding model can infer that level of specificity.

The implicit graph is the one that emerges when your notes are embedded in vector space. Notes with similar meanings cluster together, even if you never linked them. A note about “decision fatigue” written in January and a note about “cognitive resource depletion in crisis management” written in June might sit close together in embedding space. A connection you never noticed, but the math surfaced.

AI tools that work with your notes (Obsidian’s Smart Connections, any semantic search extension, an agent like mine) operate on both graphs. They retrieve from the implicit graph (semantic similarity) and can traverse the explicit graph (follow your links for context). Together, these two layers give AI both the serendipity of discovery and the precision of your own reasoning.

But only if your notes are shaped for it. Long, multi-topic documents are blurry in the implicit graph and have imprecise links in the explicit one (what exactly does a link to a 2,000-word document mean?). Atomic notes are sharp in both.

Confidence becomes computable

There’s an underappreciated benefit of atomic notes for AI: metadata becomes actionable.

A single note with a confidence: conjecture tag tells the AI something precise: “This claim comes from one source. Treat it as a hypothesis.” A single note with confidence: corroborated says: “Multiple independent sources agree on this.”

Try putting confidence levels on a 2,000-word document. Does the whole document have the same confidence? Of course not. The first paragraph might be established fact and the last paragraph might be speculation. Per-document metadata can’t capture that. Per-idea metadata can.

The same applies to sources, timestamps, and tags. When every note is about one thing, every piece of metadata is about one thing. The AI can reason about it precisely instead of treating it as an approximate label on a heterogeneous blob.

What this looks like in practice

Here’s a concrete example from my own system. I track competitive Pokémon strategy notes in my vault. Each note is one idea:

  • “Tailwind lasts four turns including the turn it’s used”
  • “Icy Wind is preferred over Electroweb when ground types are common”
  • “Speed control is the most important team building consideration in doubles”

Each has a confidence field, a sources list, and wikilinks to related notes. When I ask my AI agent “what speed control options work against ground-heavy teams?”, it retrieves exactly the relevant notes, knows their confidence level, and can follow links to team compositions that use those options.

If all of that lived in one document called “Pokémon Strategy Notes,” the agent would retrieve the whole thing and have to figure out which parts matter. With atomic notes, retrieval is surgical.

The objection: “This is too much work”

Writing atomic notes takes more upfront effort than dumping everything into a long document. That’s true. But consider what you’re actually doing: you’re pre-processing your knowledge into a format that both humans and AI can retrieve precisely.

Long documents are write-optimized. They’re easy to create because you just keep typing. Atomic notes are read-optimized. They cost more to write but pay dividends every time they’re retrieved, whether by you or by an AI.

And you don’t have to do it all at once. Tiago Forte’s progressive summarization model describes how notes ripen over time. Start with rough captures. Refine them when you revisit. Decompose long notes into atomic ones as their ideas prove useful. The notes that matter will naturally get sharpened. The ones that don’t will stay rough, and that’s fine.

The inbox-to-evergreen pipeline is itself a knowledge management system: capture loosely, process weekly, promote the best ideas into atomic notes, delete the rest. Most fleeting thoughts die, and that’s the point. The survivors get the structure they deserve.

The punchline

Every RAG system in production is solving a chunking problem: how do you split documents into units that retrieve well? Atomic notes are the answer that predates the question by sixty years. Luhmann was writing retrieval-optimized knowledge nodes on index cards in the 1960s. He just didn’t have a vector database to prove it.

If you’re using an AI assistant (a coding agent, a writing tool, a research copilot, anything that searches your notes), the shape of those notes determines the quality of what the AI can do with them. Long documents produce blurry retrieval. Atomic notes produce sharp retrieval. Links between atomic notes produce a traversable knowledge graph that enables the multi-hop reasoning where AI actually shines.

Your notes aren’t just for you anymore. They’re for every AI system that will ever search them. Make them the right shape.

This post is licensed under CC BY 4.0 by the author.