I use npm run docs:check as a lightweight guardrail against documentation drift.
Under the hood, that command runs one script: node scripts/check-docs.js.
This post uses that script as a concrete example.
The bigger idea is reusable: add one fast check that catches stale doc references and contract drift before they hit production.
Why documentation drifts
Docs drift for boring reasons:
- files get renamed
- paths move during refactors
- APIs evolve while docs stay frozen
None of that is dramatic on its own. The pain shows up later, when links break or examples no longer reflect reality.
The pattern in one sentence
Index the repo, extract file references from docs, validate those paths, verify a small set of symbol contracts1, and fail fast when anything drifts.
That is exactly what this script does.
How it works in this repo
1. Define what to check
The script starts with an explicit docs scope and symbol contracts:
const DOC_FILES = ['README.md', 'ARCHITECTURE.md', 'COMPONENTS.md', 'MAP.md'];
SYMBOL_CHECKS then lists high-signal string markers in source files that should not disappear accidentally.
2. Build a repository index
walkFiles() recursively builds allRepoFiles while skipping noise directories (node_modules, .git, public).
That index makes basename lookups2 possible (for references like seo.jsx), not just full path lookups.
3. Extract references from docs
collectReferencedPaths(text) scans docs using:
CODE_SPAN_REfor inline code referencesMD_LINK_REfor markdown links
Each candidate is passed through normalizeDocPath().
4. Normalize aggressively3
normalizeDocPath(rawPath) removes fragments, wrappers, and obvious non-file references.
It rejects things like URLs (://), mailto: links, and /memories/ paths, then enforces a file-extension allowlist via FILE_EXT_RE.
One additional guard: any path-like reference whose first segment is not a known top-level directory in the repo is discarded. This prevents phantom references to plausible-looking paths that don't map to anything real.
This is where most false positives are filtered out.
5. Resolve path and basename references
resolveReference(ref) supports two formats:
- path-like references with
/ - basename-only references like
navigation.jsx
Path-like refs are checked with fs.existsSync(path.join(repoRoot, rel)).
Basename refs are matched against allRepoFiles.
Resolution behavior:
- one match: pass
- multiple matches: pass (first match wins)
- zero matches: fail
That ambiguity rule is intentional. It favors practical drift detection over strict disambiguation.
6. Run two check passes
Documentation pass:
- read each file in
DOC_FILES - extract and normalize references
- resolve references
- collect misses in
missingPaths
Symbol pass:
- ensure target file exists
- read source text
- verify each
needle4 viacode.includes(needle) - collect misses in
missingSymbols
Example symbol contracts:
export default Seo;insrc/components/seo.jsxslug: PropTypes.string.isRequiredinsrc/components/navigation-item.jsxexport default function useIsBrowser()insrc/hooks/useIsBrowser.jsx
7. Exit cleanly for CI5
If both miss lists are empty, the script exits 0.
Otherwise, it prints missing references/contracts and exits 1.
That makes it easy to run locally and in CI without extra tooling.

How to adapt this pattern to your repo
If you want to build your own version, start with this checklist:
- Choose which docs are source of truth (
DOC_FILES). - Decide which file extensions count as valid references.
- Add a small set of high-value symbol contracts.
- Handle both full paths and basename references.
- Wire the script into npm scripts and CI.
- Keep output readable so failures are quick to fix.
Keep it small first. You can always tighten rules later.
Minimal starter you can copy
If you want a working baseline, this is enough to get started.
This starter focuses on markdown link checks first; symbol contract checks can be layered in after.
1. Starter script
const fs = require('fs');
const path = require('path');
const repoRoot = path.resolve(__dirname, '..');
const DOC_FILES = ['README.md'];
const FILE_EXT_RE = /[A-Za-z0-9_./-]+\.(?:md|mdx|js|jsx|json|yml|yaml)$/;
const missing = [];
for (const doc of DOC_FILES) {
const docPath = path.join(repoRoot, doc);
if (!fs.existsSync(docPath)) continue;
const text = fs.readFileSync(docPath, 'utf8');
const refs = [...text.matchAll(/\[[^\]]+\]\(([^)]+)\)/g)].map((m) => m[1]);
refs.forEach((ref) => {
const normalized = ref.split('#')[0].replace(/^\.\//, '').replace(/^\//, '');
if (!FILE_EXT_RE.test(normalized)) return;
const exists = fs.existsSync(path.join(repoRoot, normalized));
if (!exists) missing.push(`${doc}: ${normalized}`);
});
}
if (missing.length === 0) {
console.log('Docs check passed: references and symbol contracts are in sync.');
process.exit(0);
}
console.error('Missing doc references:');
missing.forEach((m) => console.error(`- ${m}`));
process.exit(1);
2. Wire it into npm scripts
{
"scripts": {
"docs:check": "node scripts/check-docs.js"
}
}
3. Run it in CI
- name: Check docs drift
run: npm run docs:check
4. Expected output
These match the starter script above, not the production script in this repo (which has a more descriptive pass message and a different fail header).
Pass:
Docs check passed: references and symbol contracts are in sync.

Fail:
Missing doc references:
- README.md: src/components/old-file.jsx
Once this baseline works, add symbol checks and stricter normalization rules.
Current limits
Because this approach is string-based, it does not:
- Understand AST-level semantics6
- Detect narrative drift in prose
- Strictly disambiguate ambiguous basename references
Those tradeoffs are acceptable here because the goal is speed and signal, not perfect static analysis.
Closing thought
This script is custom to this project, but the strategy is broadly reusable.
A fast docs drift check catches a surprising amount of real breakage for very little maintenance cost.
Footnotes
-
A symbol contract is a small, explicit code marker (string) that should remain present, used as a lightweight stability check. ↩
-
A basename is just the filename without its directory path (for example,
seo.jsxinstead ofsrc/components/seo.jsx). ↩ -
Normalization means cleaning and standardizing input so equivalent references can be compared consistently. ↩
-
In search examples, "needle" is a naming convention for the target term you are trying to find (from "needle in a haystack"). ↩
-
CI means continuous integration: automated checks that run in your pipeline before changes are merged or deployed. ↩
-
AST-level semantics refers to understanding code via its parsed syntax tree structure, not only via plain text matching. ↩
