

Vale + LLMs: Deterministic Prose Linting Meets AI for Better Writing

Vale is an open-source, command-line prose linter written in Go. It applies code-like linting to written content: enforcing style guides, catching grammar issues, and ensuring terminological consistency. It runs entirely offline, processes Markdown, HTML, AsciiDoc, reStructuredText, and more, and requires no runtime dependencies. With 5,300+ GitHub stars and 3M+ downloads, it is widely used by documentation teams, including those at Datadog, GitLab, Grafana, and Elastic.

Core thesis: Vale and LLMs are complementary, not competing. Vale provides fast, deterministic, auditable rule enforcement (no em dashes, no passive voice, consistent terminology). LLMs provide contextual understanding, nuanced rewriting, and tone assessment. Combined in an actor/critic pipeline, they produce writing that is both stylistically consistent and genuinely good. This report details how to set up that pipeline for documentation and blog articles.



1. What Vale Is and Why It Matters

Vale is a standalone Go binary that lints prose the way ESLint lints JavaScript. You define rules in YAML files, point Vale at your content, and it returns structured feedback (suggestions, warnings, and errors) scoped to headings, paragraphs, sentences, or any other markup element.

Key properties

  • Markup-aware: Vale understands Markdown, HTML, AsciiDoc, reStructuredText, Org mode, DITA, and MDX. It skips code blocks, avoids false positives from markup syntax, and can scope rules to specific elements (e.g., only check headings for title case).
  • Deterministic: Every rule produces the same output for the same input, every time. No API calls, no model drift, no hallucinations. This makes Vale auditable and suitable for CI/CD enforcement.
  • Offline and private: Content never leaves the local machine. Critical for proprietary documentation, internal playbooks, and anything you wouldn't paste into ChatGPT.
  • Fast: Benchmarks show Vale outperforms textlint, RedPen, write-good, and proselint. It can lint hundreds of files in seconds.
  • No runtime dependencies: Single binary. No Node.js, Python, or Java. Download, configure, run.

Who uses Vale

| Organization | Use case |
| --- | --- |
| Datadog | Oxford comma enforcement, jargon flagging, temporal word detection, abbreviation substitution across all documentation |
| GitLab | Extensive documentation testing integrated into CI pipeline |
| Grafana | Centralized style (Writers' Toolkit) applied across multiple repositories |
| Elastic | Product/feature name capitalization and spelling enforcement |
| Linode/Akamai | Documentation style consistency |

2. The Rule System: 11 Check Types

Every Vale rule is a YAML file that extends one of 11 built-in check types. Each rule requires an extends field and a message field. Optional fields include level (suggestion, warning, error), scope (heading, paragraph, sentence, etc.), and link (URL to documentation).

Vale's 11 check types
| Check type | Purpose | Example use |
| --- | --- | --- |
| existence | Flag tokens that appear in content | Ban em dashes, flag weasel words like "arguably" or "basically" |
| substitution | Map observed terms to preferred replacements | Replace "utilize" with "use", "leverage" with "use" |
| occurrence | Enforce min/max counts of a token within a scope | No more than 3 commas per sentence |
| repetition | Detect repeated tokens | Catch "the the" even across markup boundaries |
| consistency | Ensure only one of two competing terms appears per document | "Color" vs. "colour", "advisor" vs. "adviser" |
| conditional | If pattern A exists, pattern B must also exist | Every acronym (e.g., "API") must be defined on first use |
| capitalization | Enforce case styles | Headings must use title case (AP or Chicago style) |
| metric | Evaluate mathematical formulas on document-level variables | Flesch-Kincaid readability score must be below 8.0 |
| sequence | Grammar rules using POS tagging | Flag split infinitives, dangling modifiers |
| spelling | Hunspell-compatible dictionary spell checking | American English with custom technical dictionary |
| script | Arbitrary logic via Tengo scripting language | Count paragraphs per section, enforce custom metrics |

Vale's regex engine supports lookahead and lookbehind ((?=re), (?!re), (?<=re), (?<!re)), making complex pattern matching possible without scripting.
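To make those four assertions concrete, here is a small sketch using Python's `re` module, which accepts the same syntax (Python restricts lookbehind to fixed-width patterns, and nothing here claims to reproduce Vale's engine). The two patterns and the sample sentence are invented for illustration.

```python
import re

# Hypothetical patterns, for illustration only.
# Negative lookbehind: match "very" unless it directly follows "not ".
very_unless_negated = re.compile(r"(?<!not )\bvery\b")
# Negative lookahead: match "data" unless "base" follows immediately.
data_not_database = re.compile(r"\bdata(?!base)")

text = "This is very fast, but not very portable. The database stores data."

# Character offsets of each match in `text`.
very_hits = [m.start() for m in very_unless_negated.finditer(text)]
data_hits = [m.start() for m in data_not_database.finditer(text)]

print(very_hits, data_hits)
```

Only the first "very" and the standalone "data" are reported. In a Vale rule, the same expressions would go in a `tokens` list.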


3. Writing Custom Rules (With Examples)

Rules are YAML files organized into style directories under your StylesPath. Here are practical examples for common writing constraints.

Ban em dashes

NoEmDashes.yml
extends: existence
message: "Don't use em dashes. Use a comma, semicolon, colon, or period instead."
level: error
# Punctuation is not a word character, so drop the default \b word boundaries.
nonword: true
tokens:
  - '—'
  - '&mdash;'
  - '\u2014'

Ban passive voice

PassiveVoice.yml
extends: existence
message: "'%s' looks like passive voice. Rewrite in active voice."
level: warning
ignorecase: true
tokens:
  - 'is being \w+ed'
  - 'was \w+ed'
  - 'were \w+ed'
  - 'has been \w+ed'
  - 'have been \w+ed'
  - 'had been \w+ed'
  - 'will be \w+ed'
  - 'being \w+ed'
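A regex list like this is a heuristic, not a grammar check. The sketch below tries the same tokens with Python's `re` module to show what they catch and what they miss. It is an illustration of the patterns only. How Vale wraps and applies tokens internally is not modeled here, and the sample sentences are made up.

```python
import re

# The token list from PassiveVoice.yml, joined into one alternation.
TOKENS = [
    r"is being \w+ed", r"was \w+ed", r"were \w+ed", r"has been \w+ed",
    r"have been \w+ed", r"had been \w+ed", r"will be \w+ed", r"being \w+ed",
]
PASSIVE = re.compile(r"\b(?:" + "|".join(TOKENS) + r")\b", re.IGNORECASE)


def find_passive(sentence: str) -> list[str]:
    """Return every substring of `sentence` that one of the tokens matches."""
    return [m.group(0) for m in PASSIVE.finditer(sentence)]


# Regular participle ending in "-ed": caught.
caught = find_passive("The file was deleted by the script.")
# Irregular participle ("written"): not caught.
missed = find_passive("The report was written by the team.")
# Adjective ending in "-ed": flagged even though it is not passive.
false_positive = find_passive("She was excited about the release.")

print(caught, missed, false_positive)
```

This is one reason the rule sits at `warning` rather than `error`: a writer, or an LLM later in the pipeline, still has to judge each hit.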

Enforce "we" over "I" in documentation

NoFirstPersonSingular.yml
extends: existence
message: "Use 'we' instead of '%s' in documentation."
level: warning
scope: paragraph
tokens:
  - '\bI\b'
  - '\bmy\b'
  - '\bmine\b'
  - '\bmyself\b'

Enforce short sentences

SentenceLength.yml
extends: occurrence
message: "Sentence too long. Keep sentences to 25 words or fewer."
level: warning
scope: sentence
max: 25
token: '\b\w+\b'
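The logic of this check is simple enough to sketch in a few lines of Python: count the matches of the token regex in one sentence and compare against the limit. This is a rough model of the idea, not Vale's implementation, and sentence splitting (which Vale handles through `scope: sentence`) is assumed to have happened already.

```python
import re

WORD = re.compile(r"\b\w+\b")  # same token as SentenceLength.yml
MAX_WORDS = 25                 # same limit as `max: 25`


def too_long(sentence: str, limit: int = MAX_WORDS) -> bool:
    """Return True when the sentence has more than `limit` word tokens."""
    return len(WORD.findall(sentence)) > limit


short = "Vale lints prose the way ESLint lints JavaScript."
long_ = " ".join(["word"] * 26) + "."

print(too_long(short), too_long(long_))
```

One consequence of this token choice, at least under Python's regex semantics: a contraction such as "don't" counts as two tokens, so the effective limit is slightly stricter than 25 dictionary words.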

No weasel words

Weasel.yml
extends: existence
message: "Remove the weasel word '%s'. Be specific."
level: warning
ignorecase: true
tokens:
  - arguably
  - basically
  - clearly
  - essentially
  - extremely
  - generally
  - in order to
  - it should be noted
  - literally
  - obviously
  - quite
  - simply
  - somewhat
  - very
  - virtually

Consistent terminology

Terminology.yml
extends: substitution
message: "Use '%s' instead of '%s'."
level: error
ignorecase: true
swap:
  e-mail: email
  e mail: email
  repo: repository
  config: configuration
  'open source': open-source
  web site: website
  data base: database
  end point: endpoint

Readability gate

FleschKincaid.yml
extends: metric
message: "Flesch-Kincaid grade level (%s) is too high. Aim for 8.0 or below."
level: warning
formula: |
  (0.39 * (words / sentences)) + (11.8 * (syllables / words)) - 15.59
condition: '> 8.0'
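As a worked example of the formula, the Python sketch below plugs in two sets of made-up document counts. The numbers are invented, and how Vale counts syllables is not modeled.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level, using the same formula as FleschKincaid.yml."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Long sentences, longer words: 0.39*20 + 11.8*1.5 - 15.59, about 9.91.
dense = flesch_kincaid_grade(words=100, sentences=5, syllables=150)
# Short sentences, shorter words: 0.39*10 + 11.8*1.3 - 15.59, about 3.65.
plain = flesch_kincaid_grade(words=100, sentences=10, syllables=130)

# The rule's condition is `> 8.0`, so only the first document would be flagged.
print(round(dense, 2), dense > 8.0)
print(round(plain, 2), plain > 8.0)
```

Halving the average sentence length moves the score by almost four grade levels here, which is why the sentence-length rule and the readability gate tend to be satisfied together.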

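No exclamation or question marks in headings

HeadingPunctuation.yml
extends: existence
message: "Don't use '%s' in headings."
level: error
scope: heading
# Punctuation is not a word character, so drop the default \b word boundaries.
nonword: true
tokens:
  - '!'
  - '\?'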

4. Configuration and Project Setup

A .vale.ini file at the project root controls everything.

Minimal setup

StylesPath = .vale/styles
MinAlertLevel = suggestion
Packages = Google, write-good

[*.md]
BasedOnStyles = Vale, Google, write-good

[*.html]
BasedOnStyles = Vale, Google

Directory structure

project/
  .vale.ini
  .vale/
    styles/
      config/
        vocabularies/
          MyProject/
            accept.txt    # Approved terms (one per line)
            reject.txt    # Banned terms (one per line)
      MyStyle/
        NoEmDashes.yml
        Weasel.yml
        SentenceLength.yml
      Google/              # Downloaded via `vale sync`
      write-good/          # Downloaded via `vale sync`

Key configuration options

| Setting | Purpose |
| --- | --- |
| StylesPath | Path to all styles, configs, and scripts |
| Packages | Styles to download via vale sync |
| Vocab | Vocabulary directories to load |
| MinAlertLevel | Minimum severity to display: suggestion, warning, error |
| BasedOnStyles | Which styles to apply (per file glob) |
| TokenIgnores | Regex patterns for inline content to skip (e.g., LaTeX formulas) |
| BlockIgnores | Regex patterns for block-level content to skip |

Vocabulary system

Vocabularies are project-specific term lists. accept.txt contains approved terms (auto-added to exception lists across all active styles and fed into Vale.Terms for casing enforcement). reject.txt contains banned terms (auto-populates Vale.Avoid as errors). Case-sensitive by default; prefix with (?i) for case-insensitive entries.

Running Vale

# Install
brew install vale          # macOS
go install github.com/errata-ai/vale/v3/cmd/vale@latest  # Go

# Download packages
vale sync

# Lint files
vale docs/
vale --output=JSON docs/   # Machine-readable output
vale --glob='*.md' .       # Filter by pattern
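The JSON output is what makes Vale scriptable. Here is a minimal Python sketch for consuming it, assuming the report is an object that maps each file path to a list of alerts carrying `Line`, `Check`, `Severity`, and `Message` fields. The helper names and the sample report are hypothetical, and the sample is hand-written rather than captured from a real run.

```python
import json
import subprocess
from collections import Counter


def run_vale(path: str) -> dict:
    """Run Vale on `path` and return its parsed JSON report."""
    # No check=True: Vale exits non-zero when it reports error-level alerts.
    proc = subprocess.run(
        ["vale", "--output=JSON", path], capture_output=True, text=True
    )
    return json.loads(proc.stdout)


def count_by_severity(report: dict) -> Counter:
    """Tally alerts across all files by their Severity field."""
    return Counter(a["Severity"] for alerts in report.values() for a in alerts)


# Hand-written sample in the assumed shape (not real Vale output).
sample = json.loads("""
{
  "docs/intro.md": [
    {"Line": 3, "Check": "MyStyle.Weasel", "Severity": "warning",
     "Message": "Remove the weasel word 'very'. Be specific."},
    {"Line": 7, "Check": "MyStyle.Terminology", "Severity": "error",
     "Message": "Use 'email' instead of 'e-mail'."}
  ],
  "docs/setup.md": []
}
""")

counts = count_by_severity(sample)
print(counts["error"], counts["warning"])
```

In a real project, `count_by_severity(run_vale("docs/"))` would give the same tally for live output, provided Vale is installed and configured.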


5. Example Style Guide: Rules for This Site

Here is a concrete set of rules tailored for a personal website with blog articles and documentation. These encode specific editorial preferences as deterministic checks.

Proposed rules for alexisbouchez.com
| Rule | Type | Level | Rationale |
| --- | --- | --- | --- |
| No em dashes | existence | error | Use commas, semicolons, or periods instead |
| No exclamation marks | existence | warning | Maintain a calm, measured tone |
| No weasel words | existence | warning | Be specific rather than hedging |
| Sentences under 25 words | occurrence | warning | Short sentences are easier to read |
| Readability below grade 8 | metric | warning | Accessible to a broad audience |
| American English spelling | consistency | error | Pick one and stick with it |
| No "click here" links | existence | error | Link text should describe the destination |
| Consistent terminology | substitution | error | "email" not "e-mail", "website" not "web site" |
| Title case headings | capitalization | warning | Consistent heading style |
| No repeated words | repetition | error | Catch "the the" typos |
| Acronyms defined on first use | conditional | warning | Don't assume the reader knows every abbreviation |
| No passive voice | existence | warning | Active voice is more direct and engaging |

Combined with reject.txt:

synergy
synergize
leverage
learnings
deep dive
circle back
move the needle
low-hanging fruit
paradigm shift
disrupt

6. Vale + LLMs: Architecture Patterns

Vale and LLMs have complementary strengths. Vale is deterministic, fast, offline, and auditable. LLMs understand context, nuance, tone, and can rewrite prose. Neither alone is sufficient for great writing. Together, they form a powerful feedback loop.

Pattern 1: Actor/Critic loop

The LLM writes (actor). Vale lints the output (critic). The LLM rewrites based on Vale's structured feedback. Repeat until clean.

1. LLM generates draft
2. Vale lints draft (vale --output=JSON draft.md)
3. If errors exist:
   a. Feed Vale's JSON output back to LLM
   b. LLM rewrites flagged passages
   c. Go to step 2
4. Output clean draft

This is the most practical pattern. Vale's JSON output is machine-readable, so the LLM can parse exactly which lines have issues, what the rule says, and what severity it is. The LLM can then make targeted fixes rather than rewriting the entire document.
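The loop can be sketched in Python with the linter and the LLM passed in as plain callables. Everything named here is hypothetical: `lint` would wrap `vale --output=JSON` and `rewrite` would wrap whatever LLM API you use, while the stand-ins below only exist to make the sketch runnable. The `max_rounds` cap is an added assumption, since nothing guarantees an LLM converges on a clean draft.

```python
from typing import Callable

Report = dict[str, list[dict]]


def error_alerts(report: Report) -> list[dict]:
    """Flatten a report into its error-level alerts."""
    return [a for alerts in report.values() for a in alerts
            if a["Severity"] == "error"]


def actor_critic(
    draft: str,
    lint: Callable[[str], Report],
    rewrite: Callable[[str, list[dict]], str],
    max_rounds: int = 3,
) -> tuple[str, bool]:
    """Alternate lint and rewrite; return the final draft and whether it is clean."""
    for _ in range(max_rounds):
        errors = error_alerts(lint(draft))
        if not errors:
            return draft, True
        draft = rewrite(draft, errors)  # the LLM fixes only what was flagged
    return draft, not error_alerts(lint(draft))


# Stand-ins for Vale and the LLM (illustration only).
def fake_lint(text: str) -> Report:
    alerts = [
        {"Line": n, "Severity": "error", "Check": "MyStyle.NoEmDashes",
         "Message": "Don't use em dashes."}
        for n, line in enumerate(text.splitlines(), start=1)
        if "\u2014" in line
    ]
    return {"draft.md": alerts}


def fake_rewrite(text: str, errors: list[dict]) -> str:
    return text.replace(" \u2014 ", ", ")


final, clean = actor_critic("Vale is fast \u2014 and offline.", fake_lint, fake_rewrite)
print(final, clean)
```

Returning the clean flag alongside the draft lets the caller decide what to do when the cap is hit, for example handing the remaining alerts to a human.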

Pattern 2: LLM as Vale interpreter

Vale finds issues. The LLM explains them in context and suggests specific rewrites. This is useful for writers who want to learn from the feedback rather than just accepting automated fixes.

1. Writer drafts content
2. Vale lints and produces structured output
3. LLM receives: original text + Vale output
4. LLM produces: explanation of each issue + suggested rewrite + reasoning
5. Writer reviews and decides
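Step 3 of this pattern amounts to assembling a prompt from two inputs. A possible Python sketch follows, assuming each alert carries `Line`, `Check`, `Severity`, and `Message` fields. The function name and the prompt wording are illustrative, not a recommended template.

```python
def build_explainer_prompt(text: str, alerts: list[dict]) -> str:
    """Pair each alert with the line it refers to, so the LLM sees it in context."""
    lines = text.splitlines()
    parts = [
        "Explain each issue below, suggest a rewrite, and give your reasoning.",
        "",
    ]
    for a in alerts:
        n = a["Line"]
        # Vale line numbers are 1-based; guard against out-of-range values.
        quoted = lines[n - 1] if 0 < n <= len(lines) else ""
        parts.append(f'- Line {n} [{a["Check"]}, {a["Severity"]}]: {a["Message"]}')
        parts.append(f"  Text: {quoted}")
    return "\n".join(parts)


article = "Intro line.\nWe utilize caching."
alerts = [{
    "Line": 2, "Check": "MyStyle.Terminology", "Severity": "error",
    "Message": "Use 'use' instead of 'utilize'.",
}]

prompt = build_explainer_prompt(article, alerts)
print(prompt)
```

Quoting the offending line next to each alert keeps the LLM's explanation anchored to the text Vale actually flagged.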

Pattern 3: LLM as pre-filter, Vale as enforcer

The LLM does a first pass for tone, structure, and coherence (things Vale cannot check). Vale then enforces the hard rules that the LLM might miss or ignore.

1. Writer drafts content
2. LLM reviews for: logical flow, argument structure, tone, audience fit
3. Writer incorporates LLM feedback
4. Vale enforces: terminology, sentence length, readability, banned patterns
5. Writer fixes Vale errors
6. Ship

Pattern 4: Dual enforcement in CI

Vale runs as a fast, cheap first pass in CI. Only documents with zero Vale errors get sent to an LLM for deeper analysis. This minimizes API costs.

1. PR opened with documentation changes
2. GitHub Action runs Vale (seconds, free)
3. If Vale passes:
   a. Send changed files to LLM API
   b. LLM checks: coherence, accuracy, tone
   c. LLM posts review comments on PR
4. If Vale fails:
   a. Block merge
   b. Author fixes deterministic issues first

7. MCP Integration: Vale as an LLM Tool

The Vale-MCP server exposes Vale to AI coding assistants via the Model Context Protocol. This means an LLM like Claude can call Vale as a tool, lint a document, receive structured results, and provide contextual writing advice informed by Vale's rule-based findings.

How it works

Vale-MCP is a TypeScript/Node.js server that provides three MCP tools:

| Tool | Purpose |
| --- | --- |
| vale_status | Check that Vale is installed and configured |
| vale_sync | Download and install style packages |
| check_file | Lint a file and return structured results |

Compatible clients

  • Claude Desktop
  • Claude Code (via MCP server config)
  • Cursor
  • VS Code with GitHub Copilot
  • Any MCP-compatible client

This is the most natural integration point for using Vale with LLMs today. The LLM doesn't need to understand Vale's rule syntax; it just calls the tool and interprets the results. The LLM can then explain issues to the user, suggest fixes, or automatically rewrite flagged passages.

Setup

# Requirements: Node.js 22+, Vale 3.0+

Add to your MCP client configuration:

{
  "vale": {
    "command": "npx",
    "args": ["-y", "@christianchiama/vale-mcp"]
  }
}


8. Valegen: Generating Rules from Natural Language

Valegen is a web application that generates Vale YAML rules from plain English descriptions using RAG (Retrieval-Augmented Generation).

How it works

  1. User describes a desired rule in natural language: "Flag sentences that use passive voice"
  2. Valegen searches a vector database of Vale documentation and real-world rules from Vale core, Google, and Microsoft styles
  3. Retrieved context + user prompt are sent to an LLM (supports Gemini, GPT, and Claude)
  4. The LLM generates three candidate YAML rules with confidence ratings
  5. User picks the best one and saves it to their style directory

Why this matters

Writing Vale rules requires understanding YAML syntax, regex patterns, and Vale's check type system. Valegen lowers that barrier considerably. A technical writer with no programming experience can describe what they want in English and get a candidate rule to review and test. This makes it practical to encode an entire editorial style guide as Vale rules, even if the guide has dozens of preferences.

Example

| Natural language input | Generated rule (simplified) |
| --- | --- |
| "Don't allow sentences longer than 20 words" | extends: occurrence, scope: sentence, max: 20, token: '\b\w+\b' |
| "Replace 'utilize' with 'use'" | extends: substitution, swap: { utilize: use } |
| "No em dashes anywhere" | extends: existence, tokens: ['---', '&mdash;'] |
| "Headings should use sentence case" | extends: capitalization, scope: heading, match: $sentence |

9. CI/CD Pipeline Integration

Vale has a first-party GitHub Action (errata-ai/vale-action) used by ~3,700 projects. It runs Vale on pull requests and surfaces results as inline annotations, PR reviews, or GitHub Checks.

Basic GitHub Actions setup

name: Lint Prose
on: pull_request

jobs:
  vale:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: errata-ai/vale-action@v2
        with:
          files: docs/
          reporter: github-pr-review
          fail_on_error: true

Vale + LLM in CI (advanced)

name: Writing Quality
on: pull_request

jobs:
  vale:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: errata-ai/vale-action@v2
        with:
          files: docs/
          reporter: github-pr-check
          fail_on_error: true

  llm-review:
    needs: vale  # Only runs if Vale passes
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Get changed docs
        id: changed
        env:
          GH_TOKEN: ${{ github.token }}  # the gh CLI needs a token in Actions
        run: |
          # Join file names with spaces so the output stays on one line;
          # `|| true` keeps the step green when no docs changed.
          FILES=$(gh pr diff ${{ github.event.pull_request.number }} --name-only | grep -E '\.(md|html)$' | tr '\n' ' ' || true)
          echo "files=$FILES" >> "$GITHUB_OUTPUT"
      - name: LLM review
        run: |
          # Send changed files to LLM API for tone/coherence review
          # Post results as PR comment

Pre-commit hook

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/errata-ai/vale
    rev: v3.12.0
    hooks:
      - id: vale
        args: [--glob, '*.md']

Editor integrations

| Editor | Integration |
| --- | --- |
| VS Code | "Vale CLI" extension (real-time linting, quick fixes) |
| Neovim | ALE plugin |
| JetBrains IDEs | Official Vale CLI plugin |
| Sublime Text | LSP package |
| Emacs | flymake-vale |
| Obsidian | Community plugin |
| Zed | LSP integration |

10. Practical Workflow: End to End

Here is a concrete workflow for writing a blog article or documentation page using Vale and an LLM together.

Step 1: Set up Vale for the project

  1. Install Vale: brew install vale
  2. Create .vale.ini at project root
  3. Create custom style directory with your rules
  4. Run vale sync to download community packages
  5. Test with vale docs/

Step 2: Write the first draft

Write freely. Don't self-censor. Get the ideas down. The tooling will catch style issues later.

Step 3: Run Vale

vale --output=JSON article.md > vale-report.json

Fix all errors (hard rules like terminology, banned patterns). Consider warnings (soft rules like sentence length, readability).

Step 4: LLM review

Send the article to an LLM with a prompt like:

Review this article for:
- Logical flow and argument structure
- Missing context that a reader would need
- Tone consistency
- Places where examples would help

Do NOT change terminology, formatting, or style conventions. Those are handled by our linter.

[article content]

The key instruction is telling the LLM not to touch what Vale already handles. This prevents the LLM from re-introducing banned patterns.

Step 5: Final Vale pass

After incorporating LLM feedback, run Vale again. The LLM may have introduced style violations in its suggestions. Vale catches them deterministically.

Step 6: Publish

If using CI, the PR will be automatically linted. Zero Vale errors required to merge.


11. What Vale Catches vs. What LLMs Catch

Complementary strengths
| Concern | Vale | LLM |
| --- | --- | --- |
| Banned words/phrases (em dashes, jargon) | Yes (deterministic) | Unreliable (may ignore instructions) |
| Consistent terminology | Yes (substitution rules) | Unreliable (may use synonyms) |
| Spelling | Yes (Hunspell dictionaries) | Unreliable |
| Sentence length | Yes (occurrence check) | Can suggest but not enforce |
| Readability metrics | Yes (Flesch-Kincaid, etc.) | No (cannot compute reliably) |
| Heading capitalization | Yes (AP/Chicago style) | Unreliable |
| Acronym definitions | Yes (conditional check) | Sometimes |
| Repeated words | Yes (across markup boundaries) | Sometimes |
| Logical coherence | No | Yes |
| Argument structure | No | Yes |
| Tone and voice assessment | No | Yes |
| Audience appropriateness | No | Yes |
| Missing context | No | Yes |
| Factual accuracy | No | Partially (with caveats) |
| Creative rewriting | No | Yes |
| Cultural sensitivity | Partial (alex style) | Yes |

The takeaway: Vale handles everything that can be expressed as a pattern. LLMs handle everything that requires understanding meaning. Using both means your writing is both mechanically correct and genuinely good.

As Datadog's engineering team put it: Vale's "crisp, computer-understandable rules" are foundational infrastructure that should exist before integrating LLMs. LLMs lack awareness of organization-specific style choices, but Vale encodes those choices precisely.


12. Resources and Further Reading