1. What Vale Is and Why It Matters
Vale is a standalone Go binary that lints prose the way ESLint lints JavaScript. You define rules in YAML files, point Vale at your content, and it returns structured feedback — warnings, errors, and suggestions — scoped to headings, paragraphs, sentences, or any other markup element.
Key properties
- Markup-aware: Vale understands Markdown, HTML, AsciiDoc, reStructuredText, Org mode, DITA, and MDX. It skips code blocks, avoids false positives from markup syntax, and can scope rules to specific elements (e.g., only check headings for title case).
- Deterministic: Every rule produces the same output for the same input, every time. No API calls, no model drift, no hallucinations. This makes Vale auditable and suitable for CI/CD enforcement.
- Offline and private: Content never leaves the local machine. This is critical for proprietary documentation, internal playbooks, and anything you wouldn't paste into ChatGPT.
- Fast: The benchmarks published in Vale's repository show it running faster than textlint, RedPen, write-good, and proselint. It can lint hundreds of files in seconds.
- No runtime dependencies: Vale ships as a single binary, with no Node.js, Python, or Java required. Download, configure, run.
Who uses Vale
| Organization | Use case |
|---|---|
| Datadog | Oxford comma enforcement, jargon flagging, temporal word detection, abbreviation substitution across all documentation |
| GitLab | Extensive documentation testing integrated into CI pipeline |
| Grafana | Centralized style (Writers' Toolkit) applied across multiple repositories |
| Elastic | Product/feature name capitalization and spelling enforcement |
| Linode/Akamai | Documentation style consistency |
2. The Rule System: 11 Check Types
Every Vale rule is a YAML file that extends one of 11 built-in check types. Each rule requires an extends field and a message field. Optional fields include level (suggestion, warning, error), scope (heading, paragraph, sentence, etc.), and link (URL to documentation).
| Check type | Purpose | Example use |
|---|---|---|
| existence | Flag tokens that appear in content | Ban em dashes, flag weasel words like "arguably" or "basically" |
| substitution | Map observed terms to preferred replacements | Replace "utilize" with "use", "leverage" with "use" |
| occurrence | Enforce min/max counts of a token within a scope | No more than 3 commas per sentence |
| repetition | Detect repeated tokens | Catch "the the" even across markup boundaries |
| consistency | Ensure only one of two competing terms appears per document | "Color" vs. "colour", "advisor" vs. "adviser" |
| conditional | If pattern A exists, pattern B must also exist | Every acronym (e.g., "API") must be defined on first use |
| capitalization | Enforce case styles | Headings must use title case (AP or Chicago style) |
| metric | Evaluate mathematical formulas on document-level variables | Flesch-Kincaid readability score must be below 8.0 |
| sequence | Grammar rules using POS tagging | Flag split infinitives, dangling modifiers |
| spelling | Hunspell-compatible dictionary spell checking | American English with custom technical dictionary |
| script | Arbitrary logic via Tengo scripting language | Count paragraphs per section, enforce custom metrics |
Vale's regex engine supports lookahead and lookbehind ((?=re), (?!re), (?<=re), (?<!re)), making complex pattern matching possible without scripting.
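A quick Python sketch is a handy way to experiment with lookaround before writing a rule. Python's `re` module stands in for Vale's engine here. It supports all four lookaround forms, but it only accepts fixed-width lookbehind. Both patterns below are hypothetical examples, not rules that ship with Vale.

```python
import re

# Hypothetical rule: "e.g." and "i.e." must be followed by a comma.
# The negative lookahead (?!,) rejects occurrences that already have one.
ABBREV_NO_COMMA = re.compile(r"\b(?:e\.g|i\.e)\.(?!,)")

# Hypothetical rule: flag "source" only when "open " (with a space) precedes it,
# nudging writers toward "open-source". The positive lookbehind (?<=...) checks
# the preceding text without including it in the match.
OPEN_SPACE_SOURCE = re.compile(r"(?<=\bopen )source\b")


def find_all(pattern: re.Pattern, text: str) -> list[str]:
    """Return every substring of text that pattern matches, left to right."""
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    sample = "Use linters, e.g. Vale, or checkers, i.e., tools for open source and open-source docs."
    print(find_all(ABBREV_NO_COMMA, sample))    # ['e.g.']
    print(find_all(OPEN_SPACE_SOURCE, sample))  # ['source']
```

Once a pattern behaves as expected, it can go into an `existence` rule's `tokens` list. Test it against Vale itself before relying on it.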
3. Writing Custom Rules (With Examples)
Rules are YAML files organized into style directories under your StylesPath. Here are practical examples for common writing constraints.
Ban em dashes
NoEmDashes.yml
```yaml
extends: existence
message: "Don't use em dashes. Use a comma, semicolon, colon, or period instead."
level: error
# Em dashes are not word characters, so drop the default \b wrapping.
nonword: true
tokens:
  - '—'
```
Ban passive voice
PassiveVoice.yml
```yaml
extends: existence
message: "'%s' looks like passive voice. Rewrite in active voice."
level: warning
ignorecase: true
tokens:
  - 'is being \w+ed'
  - 'was \w+ed'
  - 'were \w+ed'
  - 'has been \w+ed'
  - 'have been \w+ed'
  - 'had been \w+ed'
  - 'will be \w+ed'
  - 'being \w+ed'
```
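This regex approach is a heuristic, and it fails in both directions. The Python sketch below approximates how an `existence` rule combines these tokens. They become one alternation, wrapped in word boundaries because `nonword` is not set, and case-insensitive because of `ignorecase: true`. The sketch shows both kinds of error: an irregular participle slips through, and an adjective ending in "-ed" gets flagged. It is an approximation for experimentation, not Vale's code.

```python
import re

# Tokens copied from PassiveVoice.yml.
TOKENS = [
    r"is being \w+ed", r"was \w+ed", r"were \w+ed", r"has been \w+ed",
    r"have been \w+ed", r"had been \w+ed", r"will be \w+ed", r"being \w+ed",
]

# Approximation of how an existence rule combines its tokens: one alternation,
# wrapped in word boundaries (nonword is not set), case-insensitive (ignorecase: true).
PASSIVE = re.compile(r"\b(?:" + "|".join(TOKENS) + r")\b", re.IGNORECASE)


def passive_hits(text: str) -> list[str]:
    """Return the phrases this rule would flag in text."""
    return [m.group(0) for m in PASSIVE.finditer(text)]


if __name__ == "__main__":
    print(passive_hits("The bug was fixed yesterday."))    # ['was fixed']  true positive
    print(passive_hits("The report was written by Ana."))  # []             irregular participle missed
    print(passive_hits("She was tired of waiting."))       # ['was tired']  adjective, false positive
```

Adding common irregular participles ("written", "built", "seen") as extra alternatives closes some of the gap. The false positives still need human judgment, which is one reason this rule is a warning rather than an error.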
Enforce "we" over "I" in documentation
NoFirstPersonSingular.yml
```yaml
extends: existence
message: "Use first-person plural ('we', 'our') instead of '%s' in documentation."
level: warning
scope: paragraph
tokens:
  - '\bI\b'
  - '\bmy\b'
  - '\bmine\b'
  - '\bmyself\b'
```
Enforce short sentences
SentenceLength.yml
```yaml
extends: occurrence
message: "Sentence too long. Keep sentences to 25 words or fewer."
level: warning
scope: sentence
max: 25
token: '\b\w+\b'
```
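The `occurrence` check counts how many times `token` matches inside each scope and alerts when the count exceeds `max`. The Python sketch below mirrors that idea for the sentence-length rule. It assumes the sentences are already split, which Vale does itself when `scope: sentence` is set. It illustrates the idea and is not Vale's implementation. One practical consequence appears quickly: `\b\w+\b` treats contractions and hyphenated words as two tokens, so counts run slightly high.

```python
import re

WORD = re.compile(r"\b\w+\b")  # same token as SentenceLength.yml
MAX_WORDS = 25                 # same limit as max: 25


def long_sentences(sentences: list[str], max_words: int = MAX_WORDS) -> list[tuple[int, int]]:
    """Return (index, word_count) for every sentence with more than max_words tokens."""
    flagged = []
    for i, sentence in enumerate(sentences):
        count = len(WORD.findall(sentence))
        if count > max_words:  # alert only when the count exceeds max
            flagged.append((i, count))
    return flagged


if __name__ == "__main__":
    print(long_sentences(["Short one.", " ".join(["word"] * 30) + "."]))  # [(1, 30)]
    print(len(WORD.findall("Don't use e-mail.")))  # 5: Don, t, use, e, mail
```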
No weasel words
Weasel.yml
```yaml
extends: existence
message: "Remove the weasel word '%s'. Be specific."
level: warning
ignorecase: true
tokens:
  - arguably
  - basically
  - clearly
  - essentially
  - extremely
  - generally
  - in order to
  - it should be noted
  - literally
  - obviously
  - quite
  - simply
  - somewhat
  - very
  - virtually
```
Consistent terminology
Terminology.yml
```yaml
extends: substitution
message: "Use '%s' instead of '%s'."
level: error
ignorecase: true
swap:
  e-mail: email
  e mail: email
  repo: repository
  config: configuration
  'open source': open-source
  web site: website
  data base: database
  end point: endpoint
```
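A substitution rule reports each observed term along with its preferred replacement. The Python sketch below approximates that behavior for part of the swap map above. It assumes, as the rule's message implies, that the first `%s` receives the preferred term and the second receives the matched text. Treat it as a mental model for testing a swap list, not as Vale's code.

```python
import re

# Subset of the swap map from Terminology.yml (observed -> preferred).
SWAP = {
    "e-mail": "email",
    "e mail": "email",
    "web site": "website",
    "data base": "database",
    "end point": "endpoint",
}

# One case-insensitive alternation with word boundaries. Longer keys go first
# so a longer phrase wins over any shorter key it contains.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in sorted(SWAP, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def substitution_messages(text: str) -> list[str]:
    """Build one message per match, filling both %s slots of the rule's message."""
    messages = []
    for m in _PATTERN.finditer(text):
        observed = m.group(0)
        preferred = SWAP[observed.lower()]  # keys are stored in lowercase
        messages.append("Use '%s' instead of '%s'." % (preferred, observed))
    return messages


if __name__ == "__main__":
    for msg in substitution_messages("Send an E-mail from the web site."):
        print(msg)
```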
Readability gate
FleschKincaid.yml
```yaml
extends: metric
message: "Flesch-Kincaid grade level (%s) is too high. Aim for 8.0 or below."
level: warning
formula: |
  (0.39 * (words / sentences)) + (11.8 * (syllables / words)) - 15.59
condition: '> 8.0'
```
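A worked example makes the formula concrete. The Python below evaluates the same expression with made-up counts. In practice Vale derives `words`, `sentences`, and `syllables` from the document, and its syllable estimates may differ from a hand count.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level, using the formula from FleschKincaid.yml."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def too_hard(grade: float, threshold: float = 8.0) -> bool:
    """Mirror of the rule's condition: '> 8.0'."""
    return grade > threshold


if __name__ == "__main__":
    # Hypothetical counts: 100 words, 5 sentences, 150 syllables.
    grade = flesch_kincaid_grade(100, 5, 150)
    print(round(grade, 2), too_hard(grade))  # 9.91 True
```

With 20 words per sentence and 1.5 syllables per word, the rule fires. The same 100 words split into 8 sentences with 130 syllables (12.5 words per sentence, 1.3 syllables per word) score about 4.6 and pass.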
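No exclamation or question marks in headings
HeadingPunctuation.yml
```yaml
extends: existence
message: "Don't use '%s' in headings."
level: error
scope: heading
# Punctuation is not a word character, so drop the default \b wrapping.
nonword: true
tokens:
  - '!'
  - '\?'
```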
4. Configuration and Project Setup
A .vale.ini file at the project root controls everything.
Minimal setup
```ini
StylesPath = .vale/styles
MinAlertLevel = suggestion
Vocab = MyProject

Packages = Google, write-good

[*.md]
BasedOnStyles = Vale, Google, write-good, MyStyle

[*.html]
BasedOnStyles = Vale, Google, MyStyle
```
Directory structure
```text
project/
  .vale.ini
  .vale/
    styles/
      config/
        vocabularies/
          MyProject/
            accept.txt      # Approved terms (one per line)
            reject.txt      # Banned terms (one per line)
      MyStyle/
        NoEmDashes.yml
        Weasel.yml
        SentenceLength.yml
      Google/               # Downloaded via `vale sync`
      write-good/           # Downloaded via `vale sync`
```
Key configuration options
| Setting | Purpose |
|---|---|
| StylesPath | Path to all styles, configs, and scripts |
| Packages | Styles to download via `vale sync` |
| Vocab | Vocabulary directories to load |
| MinAlertLevel | Minimum severity to display: suggestion, warning, error |
| BasedOnStyles | Which styles to apply (per file glob) |
| TokenIgnores | Regex patterns for inline content to skip (e.g., LaTeX formulas) |
| BlockIgnores | Regex patterns for block-level content to skip |
Vocabulary system
Vocabularies are project-specific term lists. accept.txt contains approved terms (auto-added to exception lists across all active styles and fed into Vale.Terms for casing enforcement). reject.txt contains banned terms (auto-populates Vale.Avoid as errors). Case-sensitive by default; prefix with (?i) for case-insensitive entries.
Running Vale
```sh
# Install
brew install vale                                        # macOS
go install github.com/errata-ai/vale/v3/cmd/vale@latest  # Go

# Download packages
vale sync

# Lint files
vale docs/
vale --output=JSON docs/   # Machine-readable output
vale --glob='*.md' .       # Filter by pattern
```
5. Example Style Guide: Rules for This Site
Here is a concrete set of rules tailored for a personal website with blog articles and documentation. These encode specific editorial preferences as deterministic checks.
| Rule | Type | Level | Rationale |
|---|---|---|---|
| No em dashes | existence | error | Use commas, semicolons, or periods instead |
| No exclamation marks | existence | warning | Maintain a calm, measured tone |
| No weasel words | existence | warning | Be specific rather than hedging |
| Sentences under 25 words | occurrence | warning | Short sentences are easier to read |
| Readability below grade 8 | metric | warning | Accessible to a broad audience |
| American English spelling | consistency | error | Pick one and stick with it |
| No "click here" links | existence | error | Link text should describe the destination |
| Consistent terminology | substitution | error | "email" not "e-mail", "website" not "web site" |
| Title case headings | capitalization | warning | Consistent heading style |
| No repeated words | repetition | error | Catch "the the" typos |
| Acronyms defined on first use | conditional | warning | Don't assume the reader knows every abbreviation |
| No passive voice | existence | warning | Active voice is more direct and engaging |
Combined with reject.txt:
```text
synergy
synergize
leverage
learnings
deep dive
circle back
move the needle
low-hanging fruit
paradigm shift
disrupt
```
6. Vale + LLMs: Architecture Patterns
Vale and LLMs have complementary strengths. Vale is deterministic, fast, offline, and auditable. LLMs understand context, nuance, tone, and can rewrite prose. Neither alone is sufficient for great writing. Together, they form a powerful feedback loop.
Pattern 1: Actor/Critic loop
The LLM writes (actor). Vale lints the output (critic). The LLM rewrites based on Vale's structured feedback. Repeat until clean.
```text
1. LLM generates draft
2. Vale lints draft (vale --output=JSON draft.md)
3. If errors exist:
   a. Feed Vale's JSON output back to LLM
   b. LLM rewrites flagged passages
   c. Go to step 2
4. Output clean draft
```
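A minimal Python sketch of the loop follows. `vale_lint_text` shells out to the real `vale` CLI through a temporary Markdown file. `rewrite` is a hypothetical callback that stands in for whatever LLM client you use; it receives the draft and the error-level alerts. Only errors drive the loop, and `max_rounds` caps the iterations in case the model keeps reintroducing violations. Both are design choices, not requirements of the pattern.

```python
import json
import os
import subprocess
import tempfile
from typing import Callable

Report = dict[str, list[dict]]  # Vale JSON: file path -> list of alerts


def vale_lint_text(text: str, suffix: str = ".md") -> Report:
    """Write text to a temporary file, lint it with the vale CLI, and parse the JSON."""
    with tempfile.NamedTemporaryFile("w", suffix=suffix, delete=False, encoding="utf-8") as f:
        f.write(text)
        path = f.name
    try:
        proc = subprocess.run(
            ["vale", "--output=JSON", path], capture_output=True, text=True, check=False
        )
        return json.loads(proc.stdout or "{}")
    finally:
        os.remove(path)


def error_alerts(report: Report) -> list[dict]:
    """Keep only error-level alerts; these are the ones the loop must clear."""
    return [a for alerts in report.values() for a in alerts if a.get("Severity") == "error"]


def actor_critic(
    draft: str,
    rewrite: Callable[[str, list[dict]], str],  # hypothetical LLM call: (draft, errors) -> new draft
    lint: Callable[[str], Report] = vale_lint_text,
    max_rounds: int = 5,
) -> tuple[str, int]:
    """Lint, hand errors to the LLM, and repeat until clean or out of rounds.

    Returns the final draft and the number of rewrite rounds performed. If
    max_rounds is reached, the last rewrite has not been re-linted.
    """
    for round_no in range(max_rounds):
        errors = error_alerts(lint(draft))
        if not errors:
            return draft, round_no
        draft = rewrite(draft, errors)
    return draft, max_rounds
```

Passing `lint` in as a parameter also makes the loop easy to test with a fake linter, without Vale installed.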
This is the most practical pattern. Vale's JSON output is machine-readable, so the LLM can parse exactly which lines have issues, what the rule says, and what severity it is. The LLM can then make targeted fixes rather than rewriting the entire document.
Pattern 2: LLM as Vale interpreter
Vale finds issues. The LLM explains them in context and suggests specific rewrites. This is useful for writers who want to learn from the feedback rather than just accepting automated fixes.
```text
1. Writer drafts content
2. Vale lints and produces structured output
3. LLM receives: original text + Vale output
4. LLM produces: explanation of each issue + suggested rewrite + reasoning
5. Writer reviews and decides
```
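For Pattern 2, most of the engineering work is packaging the text and Vale's findings into one prompt. The sketch below assumes alert dictionaries with `Line`, `Severity`, `Check`, and `Message` keys, as in Vale's JSON output. The function name and prompt wording are illustrative.

```python
def build_explain_prompt(text: str, alerts: list[dict]) -> str:
    """Combine the original text and Vale findings into one prompt for an LLM."""
    lines = text.splitlines()
    findings = []
    for a in alerts:
        line_no = a["Line"]
        # Quote the offending line so the LLM sees the issue in context.
        source = lines[line_no - 1] if 0 < line_no <= len(lines) else ""
        findings.append(
            f"- Line {line_no} ({a['Severity']}, {a['Check']}): {a['Message']}\n"
            f"  Text: {source.strip()}"
        )
    return (
        "For each Vale finding below, explain the issue in context, suggest a rewrite, "
        "and give your reasoning. Do not apply changes; the writer decides.\n\n"
        "Findings:\n" + "\n".join(findings) + "\n\nOriginal text:\n" + text
    )
```

The instruction not to apply changes keeps step 5 in the writer's hands.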
Pattern 3: LLM as pre-filter, Vale as enforcer
The LLM does a first pass for tone, structure, and coherence (things Vale cannot check). Vale then enforces the hard rules that the LLM might miss or ignore.
```text
1. Writer drafts content
2. LLM reviews for: logical flow, argument structure, tone, audience fit
3. Writer incorporates LLM feedback
4. Vale enforces: terminology, sentence length, readability, banned patterns
5. Writer fixes Vale errors
6. Ship
```
Pattern 4: Dual enforcement in CI
Vale runs as a fast, cheap first pass in CI. Only documents with zero Vale errors get sent to an LLM for deeper analysis. This minimizes API costs.
```text
1. PR opened with documentation changes
2. GitHub Action runs Vale (seconds, free)
3. If Vale passes:
   a. Send changed files to LLM API
   b. LLM checks: coherence, accuracy, tone
   c. LLM posts review comments on PR
4. If Vale fails:
   a. Block merge
   b. Author fixes deterministic issues first
```
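The gating decision in steps 3 and 4 is simple enough to keep in a small function that a CI script calls after Vale runs. This sketch takes the list of changed files explicitly instead of reading it from Vale's report, because a report may not list files that produced no alerts. The `.md`/`.html` filter mirrors the workflow example in the CI/CD section. The function name is hypothetical.

```python
DOC_EXTENSIONS = (".md", ".html")


def ci_gate(changed_files: list[str], report: dict[str, list[dict]]) -> tuple[bool, list[str]]:
    """Return (vale_passed, files_to_send_to_llm) for a pull request.

    Any error-level alert fails the gate and nothing goes to the LLM (step 4).
    Otherwise the changed documentation files are queued for review (step 3).
    """
    has_errors = any(
        alert.get("Severity") == "error" for alerts in report.values() for alert in alerts
    )
    if has_errors:
        return False, []
    return True, [f for f in changed_files if f.endswith(DOC_EXTENSIONS)]
```

A CI script would exit non-zero when the first value is `False`, which blocks the merge, and only call the LLM API for the returned files.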
7. MCP Integration: Vale as an LLM Tool
The Vale-MCP server exposes Vale to AI coding assistants via the Model Context Protocol. This means an LLM like Claude can call Vale as a tool, lint a document, receive structured results, and provide contextual writing advice informed by Vale's rule-based findings.
How it works
Vale-MCP is a TypeScript/Node.js server that provides three MCP tools:
| Tool | Purpose |
|---|---|
| vale_status | Check that Vale is installed and configured |
| vale_sync | Download and install style packages |
| check_file | Lint a file and return structured results |
Compatible clients
- Claude Desktop
- Claude Code (via MCP server config)
- Cursor
- VS Code with GitHub Copilot
- Any MCP-compatible client
This is the most natural integration point for using Vale with LLMs today. The LLM doesn't need to understand Vale's rule syntax; it just calls the tool and interprets the results. The LLM can then explain issues to the user, suggest fixes, or automatically rewrite flagged passages.
Setup
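Requirements: Node.js 22+ and Vale 3.0+.

Add an entry like the following to your MCP client configuration. Where the entry goes depends on the client; Claude Desktop, for example, keeps servers under an `mcpServers` key.

```json
{
  "vale": {
    "command": "npx",
    "args": ["-y", "@christianchiama/vale-mcp"]
  }
}
```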
8. Valegen: Generating Rules from Natural Language
Valegen is a web application that generates Vale YAML rules from plain English descriptions using RAG (Retrieval-Augmented Generation).
How it works
- User describes a desired rule in natural language: "Flag sentences that use passive voice"
- Valegen searches a vector database of Vale documentation and real-world rules from Vale core, Google, and Microsoft styles
- Retrieved context + user prompt are sent to an LLM (supports Gemini, GPT, and Claude)
- The LLM generates three candidate YAML rules with confidence ratings
- User picks the best one and saves it to their style directory
Why this matters
Writing Vale rules requires understanding YAML syntax, regex patterns, and Vale's check type system. Valegen lowers that barrier substantially: a technical writer with no programming experience can describe what they want in English and get a candidate rule to test. This makes it practical to encode an entire editorial style guide as Vale rules, even if the guide has dozens of preferences.
Example
| Natural language input | Generated rule (simplified) |
|---|---|
| "Don't allow sentences longer than 20 words" | extends: occurrence, scope: sentence, max: 20, token: '\b\w+\b' |
| "Replace 'utilize' with 'use'" | extends: substitution, swap: { utilize: use } |
| "No em dashes anywhere" | extends: existence, tokens: ['---', '—'] |
| "Headings should use sentence case" | extends: capitalization, scope: heading, match: $sentence |
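LLM-generated candidates are worth a cheap structural check before you save them. The sketch below checks only what this article states about rules: `extends` must name one of the 11 check types, `message` is required, and `level`, if present, must be suggestion, warning, or error. It assumes the YAML has already been parsed into a dict, for example with a YAML library. It does not replace running the rule through Vale against sample text.

```python
# The 11 check types listed in the rule-system table.
CHECK_TYPES = {
    "existence", "substitution", "occurrence", "repetition", "consistency",
    "conditional", "capitalization", "metric", "sequence", "spelling", "script",
}
LEVELS = {"suggestion", "warning", "error"}


def rule_problems(rule: dict) -> list[str]:
    """Return basic structural problems in a candidate rule (an empty list means none found)."""
    problems = []
    if rule.get("extends") not in CHECK_TYPES:
        problems.append(f"unknown or missing 'extends': {rule.get('extends')!r}")
    if not rule.get("message"):
        problems.append("missing 'message'")
    if "level" in rule and rule["level"] not in LEVELS:
        problems.append(f"invalid 'level': {rule['level']!r}")
    return problems
```

The simplified rules in the table above would fail this check because they omit `message`, which is a useful reminder that they are sketches rather than complete rule files.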
9. CI/CD Pipeline Integration
Vale has a first-party GitHub Action (errata-ai/vale-action) used by ~3,700 projects. It runs Vale on pull requests and surfaces results as inline annotations, PR reviews, or GitHub Checks.
Basic GitHub Actions setup
```yaml
name: Lint Prose
on: pull_request

jobs:
  vale:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: errata-ai/vale-action@v2
        with:
          files: docs/
          reporter: github-pr-review
          fail_on_error: true
```
Vale + LLM in CI (advanced)
```yaml
name: Writing Quality
on: pull_request

jobs:
  vale:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: errata-ai/vale-action@v2
        with:
          files: docs/
          reporter: github-pr-check
          fail_on_error: true

  llm-review:
    needs: vale  # Only runs if the vale job succeeds
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Get changed docs
        id: changed
        env:
          GH_TOKEN: ${{ github.token }}  # the gh CLI needs a token in Actions
        run: |
          # Escape the dot in the extension pattern and join paths with spaces:
          # a raw multi-line value would break the name=value format of $GITHUB_OUTPUT.
          FILES=$(gh pr diff ${{ github.event.pull_request.number }} --name-only \
            | grep -E '\.(md|html)$' | tr '\n' ' ')
          echo "files=$FILES" >> "$GITHUB_OUTPUT"
      - name: LLM review
        run: |
          # Send changed files to LLM API for tone/coherence review
          # Post results as PR comment
```
Pre-commit hook
```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/errata-ai/vale
    rev: v3.12.0
    hooks:
      - id: vale
        args: [--glob, '*.md']
```
Editor integrations
| Editor | Integration |
|---|---|
| VS Code | "Vale CLI" extension (real-time linting, quick fixes) |
| Neovim | ALE plugin |
| JetBrains IDEs | Official Vale CLI plugin |
| Sublime Text | LSP package |
| Emacs | flymake-vale |
| Obsidian | Community plugin |
| Zed | LSP integration |
10. Practical Workflow: End to End
Here is a concrete workflow for writing a blog article or documentation page using Vale and an LLM together.
Step 1: Set up Vale for the project
- Install Vale: `brew install vale`
- Create `.vale.ini` at the project root
- Create a custom style directory with your rules
- Run `vale sync` to download community packages
- Test with `vale docs/`
Step 2: Write the first draft
Write freely. Don't self-censor. Get the ideas down. The tooling will catch style issues later.
Step 3: Run Vale
```sh
vale --output=JSON article.md > vale-report.json
```
Fix all errors (hard rules like terminology, banned patterns). Consider warnings (soft rules like sentence length, readability).
Step 4: LLM review
Send the article to an LLM with a prompt like:
```text
Review this article for:
- Logical flow and argument structure
- Missing context that a reader would need
- Tone consistency
- Places where examples would help

Do NOT change terminology, formatting, or style conventions. Those are handled by our linter.

[article content]
```
The key instruction is telling the LLM not to touch what Vale already handles. This prevents the LLM from re-introducing banned patterns.
Step 5: Final Vale pass
After incorporating LLM feedback, run Vale again. The LLM may have introduced style violations in its suggestions. Vale catches them deterministically.
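The final pass can also show which violations the LLM's edits introduced. The sketch below compares two Vale JSON reports, one taken before and one after incorporating the feedback. It keys alerts by rule name and matched text, because line numbers shift when text moves, and it assumes alert dicts with `Check` and `Match` keys as in Vale's JSON output. This matching is a heuristic: if a violation is fixed in one place and the same text reappears elsewhere, it will not show up as new.

```python
from collections import Counter

Report = dict[str, list[dict]]  # Vale JSON: file path -> list of alerts


def _alert_keys(report: Report) -> Counter:
    """Count alerts by (Check, Match); line numbers are ignored because edits shift them."""
    return Counter((a["Check"], a["Match"]) for alerts in report.values() for a in alerts)


def introduced_alerts(before: Report, after: Report) -> list[tuple[str, str]]:
    """List (Check, Match) pairs that occur more often after the LLM's edits than before."""
    new = _alert_keys(after) - _alert_keys(before)  # Counter subtraction drops counts <= 0
    return sorted(new.elements())
```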
Step 6: Publish
If you use CI, the PR is linted automatically, and merging requires zero Vale errors.
11. What Vale Catches vs. What LLMs Catch
| Concern | Vale | LLM |
|---|---|---|
| Banned words/phrases (em dashes, jargon) | Yes (deterministic) | Unreliable (may ignore instructions) |
| Consistent terminology | Yes (substitution rules) | Unreliable (may use synonyms) |
| Spelling | Yes (Hunspell dictionaries) | Unreliable |
| Sentence length | Yes (occurrence check) | Can suggest but not enforce |
| Readability metrics | Yes (Flesch-Kincaid, etc.) | No (cannot compute reliably) |
| Heading capitalization | Yes (AP/Chicago style) | Unreliable |
| Acronym definitions | Yes (conditional check) | Sometimes |
| Repeated words | Yes (across markup boundaries) | Sometimes |
| Logical coherence | No | Yes |
| Argument structure | No | Yes |
| Tone and voice assessment | No | Yes |
| Audience appropriateness | No | Yes |
| Missing context | No | Yes |
| Factual accuracy | No | Partially (with caveats) |
| Creative rewriting | No | Yes |
| Cultural sensitivity | Partial (alex style) | Yes |
The takeaway: Vale handles everything that can be expressed as a pattern. LLMs handle everything that requires understanding meaning. Using both means your writing is both mechanically correct and genuinely good.
As Datadog's engineering team put it: Vale's "crisp, computer-understandable rules" are foundational infrastructure that should exist before integrating LLMs. LLMs lack awareness of organization-specific style choices, but Vale encodes those choices precisely.
12. Resources and Further Reading
- Vale on GitHub — source code, benchmarks, releases
- vale.sh — official documentation
- Vale-MCP — MCP server for AI assistant integration
- Valegen — generate Vale rules from natural language using LLMs
- vale-action — GitHub Actions integration
- Write Better with Vale — book by Brian P. Hogan (Pragmatic Programmers, 2025)
- Community packages — Google, Microsoft, write-good, proselint, alex styles