Stop Documentation Drift

March 29, 2026


I use npm run docs:check as a lightweight guardrail against documentation drift.

Under the hood, that command runs one script: node scripts/check-docs.js.

This post uses that script as a concrete example.

The bigger idea is reusable: add one fast check that catches stale doc references and contract drift before they hit production.

Why documentation drifts

Docs drift for boring reasons:

  • files get renamed
  • paths move during refactors
  • APIs evolve while docs stay frozen

None of that is dramatic on its own. The pain shows up later, when links break or examples no longer reflect reality.

The pattern in one sentence

Index the repo, extract file references from docs, validate those paths, verify a small set of symbol contracts[1], and fail fast when anything drifts.

That is exactly what this script does.

How it works in this repo

1. Define what to check

The script starts with an explicit docs scope and symbol contracts:

const DOC_FILES = ['README.md', 'ARCHITECTURE.md', 'COMPONENTS.md', 'MAP.md'];

SYMBOL_CHECKS then lists high-signal string markers in source files that should not disappear accidentally.
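The post does not show the shape of SYMBOL_CHECKS. As a minimal sketch, assuming each entry pairs a source file with the strings that must stay in it, it could look like this. The field names `file` and `needles` are hypothetical; the files and markers are the examples listed later in this post.

```javascript
// Hypothetical shape: one entry per source file, listing strings that must remain present.
// The field names `file` and `needles` are assumptions, not the script's actual schema.
const SYMBOL_CHECKS = [
  { file: 'src/components/seo.jsx', needles: ['export default Seo;'] },
  {
    file: 'src/components/navigation-item.jsx',
    needles: ['slug: PropTypes.string.isRequired'],
  },
  {
    file: 'src/hooks/useIsBrowser.jsx',
    needles: ['export default function useIsBrowser()'],
  },
];
```

Keeping the list this small is deliberate: every needle is a string someone has to update when the code changes on purpose.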

2. Build a repository index

walkFiles() recursively builds allRepoFiles while skipping noise directories (node_modules, .git, public).

That index makes basename lookups[2] possible (for references like seo.jsx), not just full path lookups.
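A rough sketch of the indexing step, assuming a synchronous recursive walk. The explicit `root` parameter, the `SKIP_DIRS` constant, and the `indexByBasename` helper are illustrative names, not necessarily what the real script uses.

```javascript
const fs = require('fs');
const path = require('path');

// Noise directories named in the post.
const SKIP_DIRS = new Set(['node_modules', '.git', 'public']);

// Recursively collect repo-relative file paths (with forward slashes) under `root`.
function walkFiles(root, dir = root, out = []) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const abs = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      if (!SKIP_DIRS.has(entry.name)) walkFiles(root, abs, out);
    } else if (entry.isFile()) {
      out.push(path.relative(root, abs).split(path.sep).join('/'));
    }
  }
  return out;
}

// Hypothetical helper: group paths by basename so `seo.jsx` can be looked up directly.
function indexByBasename(files) {
  const index = new Map();
  for (const file of files) {
    const base = path.posix.basename(file);
    if (!index.has(base)) index.set(base, []);
    index.get(base).push(file);
  }
  return index;
}
```

Normalizing to forward slashes keeps the index comparable with paths written in markdown, whatever the host OS.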

3. Extract references from docs

collectReferencedPaths(text) scans docs using:

  • CODE_SPAN_RE for inline code references
  • MD_LINK_RE for markdown links

Each candidate is passed through normalizeDocPath().
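The two patterns are not reproduced in this post, so the regexes below are assumptions that show the general idea. The sketch also takes the normalizer as a second argument to stay self-contained, whereas the real collectReferencedPaths(text) takes only the text.

```javascript
// Assumed patterns; the real CODE_SPAN_RE and MD_LINK_RE may differ.
const CODE_SPAN_RE = /`([^`\n]+)`/g; // inline code such as `src/foo.js`
const MD_LINK_RE = /\[[^\]]*\]\(([^)\s]+)[^)]*\)/g; // link target in [text](target "title")

// Return the de-duplicated candidates that survive normalization.
// `normalize` returns a cleaned path, or null/empty when the candidate should be ignored.
function collectReferencedPaths(text, normalize) {
  const refs = new Set();
  for (const re of [CODE_SPAN_RE, MD_LINK_RE]) {
    for (const match of text.matchAll(re)) {
      const normalized = normalize(match[1]);
      if (normalized) refs.add(normalized);
    }
  }
  return [...refs];
}
```

Collecting into a Set means a file mentioned ten times in one doc is only resolved once.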

4. Normalize aggressively[3]

normalizeDocPath(rawPath) removes fragments, wrappers, and obvious non-file references.

It rejects things like URLs (://), mailto: links, and /memories/ paths, then enforces a file-extension allowlist via FILE_EXT_RE.

One additional guard: any path-like reference whose first segment is not a known top-level directory in the repo is discarded. This prevents phantom references to plausible-looking paths that don't map to anything real.

This is where most false positives are filtered out.
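The rules above can be sketched roughly as follows. The extension allowlist is borrowed from the starter script later in this post, and the `topLevelDirs` parameter (a Set of first-level directory names) is an assumption about how the top-level guard gets its data.

```javascript
// Extension allowlist, borrowed from the starter script in this post.
const FILE_EXT_RE = /\.(?:md|mdx|js|jsx|json|yml|yaml)$/;

// Returns a cleaned repo-relative path, or null when the candidate should be ignored.
// `topLevelDirs` is an assumed input: a Set of first-level directory names in the repo.
function normalizeDocPath(rawPath, topLevelDirs) {
  let ref = rawPath.trim();
  ref = ref.replace(/^[<("']+|[>)"',.;:]+$/g, ''); // strip wrappers and trailing punctuation
  ref = ref.split('#')[0].split('?')[0]; // drop fragments and query strings
  if (!ref) return null;
  if (ref.includes('://') || ref.startsWith('mailto:')) return null; // external links
  if (ref.includes('/memories/')) return null;
  ref = ref.replace(/^\.\//, '').replace(/^\//, ''); // make repo-relative
  if (!FILE_EXT_RE.test(ref)) return null; // not a file we track
  // Path-like refs must start in a directory that actually exists at the repo root.
  if (ref.includes('/') && !topLevelDirs.has(ref.split('/')[0])) return null;
  return ref;
}
```

Returning null for anything doubtful keeps the checker quiet about references it cannot judge, which is the point of normalizing aggressively.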

5. Resolve path and basename references

resolveReference(ref) supports two formats:

  1. path-like references with /
  2. basename-only references like navigation.jsx

Path-like refs are checked with fs.existsSync(path.join(repoRoot, rel)).

Basename refs are matched against allRepoFiles.

Resolution behavior:

  • one match: pass
  • multiple matches: pass (first match wins)
  • zero matches: fail

That ambiguity rule is intentional. It favors practical drift detection over strict disambiguation.
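A minimal sketch of that resolution logic. The real resolveReference(ref) presumably reads repoRoot and allRepoFiles from the enclosing scope; here they are passed in explicitly so the example runs on its own.

```javascript
const fs = require('fs');
const path = require('path');

// Returns the resolved repo-relative path, or null when nothing matches.
// `allRepoFiles` is the array of repo-relative paths built by the index step.
function resolveReference(ref, repoRoot, allRepoFiles) {
  if (ref.includes('/')) {
    // Path-like reference: must exist at exactly that location.
    return fs.existsSync(path.join(repoRoot, ref)) ? ref : null;
  }
  // Basename-only reference: the first indexed file with that name wins.
  const match = allRepoFiles.find((file) => path.posix.basename(file) === ref);
  return match ?? null;
}
```

Because the first match wins, a basename that exists in two directories never fails, even if the doc meant the one that was deleted.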

6. Run two check passes

Documentation pass:

  1. read each file in DOC_FILES
  2. extract and normalize references
  3. resolve references
  4. collect misses in missingPaths

Symbol pass:

  1. ensure target file exists
  2. read source text
  3. verify each needle[4] via code.includes(needle)
  4. collect misses in missingSymbols

Example symbol contracts:

  • export default Seo; in src/components/seo.jsx
  • slug: PropTypes.string.isRequired in src/components/navigation-item.jsx
  • export default function useIsBrowser() in src/hooks/useIsBrowser.jsx
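The symbol pass can be sketched in a few lines. The function name `findMissingSymbols` and the `{ file, needles }` entry shape are assumptions made for illustration; only the code.includes(needle) check and the missingSymbols list come from the description above.

```javascript
const fs = require('fs');
const path = require('path');

// `checks` is assumed to be an array of { file, needles } entries (hypothetical shape).
// Returns human-readable descriptions of every contract that no longer holds.
function findMissingSymbols(repoRoot, checks) {
  const missingSymbols = [];
  for (const { file, needles } of checks) {
    const abs = path.join(repoRoot, file);
    if (!fs.existsSync(abs)) {
      // A missing target file breaks every contract declared for it.
      missingSymbols.push(`${file}: file not found`);
      continue;
    }
    const code = fs.readFileSync(abs, 'utf8');
    for (const needle of needles) {
      if (!code.includes(needle)) missingSymbols.push(`${file}: ${needle}`);
    }
  }
  return missingSymbols;
}
```

Plain substring matching is sensitive to formatting, so a needle such as export default Seo; fails if a formatter drops the semicolon. That is a reason to choose markers that rarely change shape.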

7. Exit cleanly for CI[5]

If both miss lists are empty, the script exits 0.

Otherwise, it prints missing references/contracts and exits 1.

That makes it easy to run locally and in CI without extra tooling.

How to adapt this pattern to your repo

If you want to build your own version, start with this checklist:

  1. Choose which docs are source of truth (DOC_FILES).
  2. Decide which file extensions count as valid references.
  3. Add a small set of high-value symbol contracts.
  4. Handle both full paths and basename references.
  5. Wire the script into npm scripts and CI.
  6. Keep output readable so failures are quick to fix.

Keep it small first. You can always tighten rules later.

Minimal starter you can copy

If you want a working baseline, this is enough to get started.

This starter focuses on markdown link checks first; symbol contract checks can be layered in after.

1. Starter script

const fs = require('fs');
const path = require('path');

const repoRoot = path.resolve(__dirname, '..');
const DOC_FILES = ['README.md'];
const FILE_EXT_RE = /[A-Za-z0-9_./-]+\.(?:md|mdx|js|jsx|json|yml|yaml)$/;

const missing = [];

for (const doc of DOC_FILES) {
  const docPath = path.join(repoRoot, doc);
  if (!fs.existsSync(docPath)) continue;

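  const text = fs.readFileSync(docPath, 'utf8');
  // Capture the target of every markdown link: [text](target)
  const refs = [...text.matchAll(/\[[^\]]+\]\(([^)]+)\)/g)].map((m) => m[1]);

  refs.forEach((ref) => {
    // Skip external links; without this, a URL ending in .js or .md is reported as missing.
    if (ref.includes('://') || ref.startsWith('mailto:')) return;
    // Drop #fragments and a leading ./ or / so the path is relative to the repo root.
    const normalized = ref.split('#')[0].replace(/^\.\//, '').replace(/^\//, '');
    if (!FILE_EXT_RE.test(normalized)) return;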

    const exists = fs.existsSync(path.join(repoRoot, normalized));
    if (!exists) missing.push(`${doc}: ${normalized}`);
  });
}

if (missing.length === 0) {
  console.log('Docs check passed: references and symbol contracts are in sync.');
  process.exit(0);
}

console.error('Missing doc references:');
missing.forEach((m) => console.error(`- ${m}`));
process.exit(1);

2. Wire it into npm scripts

{
  "scripts": {
    "docs:check": "node scripts/check-docs.js"
  }
}

3. Run it in CI

- name: Check docs drift
  run: npm run docs:check

4. Expected output

These match the starter script above, not the production script in this repo (which has a more descriptive pass message and a different fail header).

Pass:

Docs check passed: references and symbol contracts are in sync.

Fail:

Missing doc references:
- README.md: src/components/old-file.jsx

Once this baseline works, add symbol checks and stricter normalization rules.

Current limits

Because this approach is string-based, it does not:

  • Understand AST-level semantics[6]
  • Detect narrative drift in prose
  • Strictly disambiguate ambiguous basename references

Those tradeoffs are acceptable here because the goal is speed and signal, not perfect static analysis.


Closing thought

This script is custom to this project, but the strategy is broadly reusable.

A fast docs drift check catches a surprising amount of real breakage for very little maintenance cost.

Footnotes

  1. A symbol contract is a small, explicit code marker (string) that should remain present, used as a lightweight stability check.

  2. A basename is just the filename without its directory path (for example, seo.jsx instead of src/components/seo.jsx).

  3. Normalization means cleaning and standardizing input so equivalent references can be compared consistently.

  4. In search examples, "needle" is a naming convention for the target term you are trying to find (from "needle in a haystack").

  5. CI means continuous integration: automated checks that run in your pipeline before changes are merged or deployed.

  6. AST-level semantics refers to understanding code via its parsed syntax tree structure, not only via plain text matching.

Marc Santos