2026-05-13

Building a Coding Agent in Zig: bash and grep


At the end of the sixth post, Hypercode could read, write and edit files. Enough to touch the code, not enough to verify it hasn't been broken.

This post adds the two tools that close the loop: bash (run a shell command — so zig build, npm test, pytest, anything) and grep (recursive substring search across the codebase).

With these five tools — read, write, edit, bash, grep — we have an agent that can observe, edit, and verify. That triad is what defines a usable coding assistant.

The code still lives at github.com/alexisbchz/hypercode.

The bash tool

The challenge with bash isn't the code — std.process.run does 90% of the work — it's the discipline of limits. A shell command can:

  • Run forever (a tail -f or an infinite loop).
  • Spew enormous output to stdout (cat /dev/urandom).
  • Do the same on stderr.

For each, we need a bound. All of them live in constants.zig:

src/constants.zig
pub const bash_default_timeout_ms: u32 = 30_000;
pub const bash_timeout_ms_max: u32 = 120_000;
pub const bash_stdout_bytes_max: u32 = 64 * KiB;
pub const bash_stderr_bytes_max: u32 = 16 * KiB;

The model can request a timeout shorter or longer than the 30-second default, but anything above bash_timeout_ms_max is clamped to that 120-second ceiling. That's TigerStyle's "everything has a limit" rule applied to execution time.
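The clamping rule can be isolated into a few lines. This is a minimal sketch: effectiveTimeoutMs is a hypothetical helper name, not something from the repository, and the two constants are copied from the constants.zig listing above.

```zig
const std = @import("std");

// Values copied from the src/constants.zig listing in this post.
const bash_default_timeout_ms: u32 = 30_000;
const bash_timeout_ms_max: u32 = 120_000;

/// Hypothetical helper: resolves the timeout the tool would enforce.
/// A missing value falls back to the default; anything above the
/// ceiling is clamped down to it.
fn effectiveTimeoutMs(requested: ?u32) u32 {
    const wanted = requested orelse bash_default_timeout_ms;
    return @min(wanted, bash_timeout_ms_max);
}

pub fn main() void {
    // No value, a short value, and an oversized value.
    std.debug.print("{d}\n", .{effectiveTimeoutMs(null)});
    std.debug.print("{d}\n", .{effectiveTimeoutMs(5_000)});
    std.debug.print("{d}\n", .{effectiveTimeoutMs(600_000)});
}
```

A request for ten minutes is not rejected, it is silently reduced to the ceiling, so the model always gets an answer rather than an argument error.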

src/tools/bash.zig
const Args = struct {
    command: []const u8,
    timeout_ms: ?u32 = null,
};

pub fn run(gpa: std.mem.Allocator, io: std.Io, args_json: []const u8) ![]u8 {
    const parsed = try std.json.parseFromSlice(Args, gpa, args_json, .{ .ignore_unknown_fields = true });
    defer parsed.deinit();
    const args = parsed.value;

    const requested = args.timeout_ms orelse constants.bash_default_timeout_ms;
    const timeout_ms = @min(requested, constants.bash_timeout_ms_max);

    const result = try std.process.run(gpa, io, .{
        .argv = &.{ "sh", "-c", args.command },
        .stdout_limit = .limited(constants.bash_stdout_bytes_max),
        .stderr_limit = .limited(constants.bash_stderr_bytes_max),
        .timeout = .{ .duration = .{
            .raw = .fromMilliseconds(@intCast(timeout_ms)),
            .clock = .awake,
        } },
    });
    defer gpa.free(result.stdout);
    defer gpa.free(result.stderr);

    const exit_code: i32 = switch (result.term) {
        .exited => |c| @intCast(c),
        .signal => |s| -@as(i32, @intCast(@intFromEnum(s))),
        .stopped, .unknown => -1,
    };

    return std.fmt.allocPrint(
        gpa,
        "exit: {d}\n--- stdout ---\n{s}\n--- stderr ---\n{s}",
        .{ exit_code, result.stdout, result.stderr },
    );
}

Four details:

Detail | Why
sh -c <command> | We hand interpretation to the shell. The model writes pipelines, &&, redirects; we don't have to parse.
.clock = .awake | Monotonic clock that excludes system suspend time. For a timeout, that's what you want.
result.term is a tagged union | exited / signal / stopped / unknown: four real Unix process outcomes. We map them to a signed i32 (negative = killed by signal).
Uniform exit/stdout/stderr response | The format is deliberately uniform so the model learns the convention.
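To see the exit-code mapping and the uniform response format in isolation, here is a small sketch. The Term union below is a local stand-in with plain integer payloads, defined only so the example compiles on its own; the real type comes from the standard library and may differ. exitCode and formatResult are hypothetical helper names.

```zig
const std = @import("std");

/// Local stand-in for the termination union used in the post.
/// Assumption: plain integer payloads, chosen for illustration only.
const Term = union(enum) {
    exited: u8,
    signal: u32,
    stopped: u32,
    unknown: u32,
};

/// Maps a termination to one signed integer: the exit status when the
/// process exited, the negated signal number when it was killed, and
/// -1 for the two remaining cases.
fn exitCode(term: Term) i32 {
    return switch (term) {
        .exited => |code| @as(i32, code),
        .signal => |sig| -@as(i32, @intCast(sig)),
        .stopped, .unknown => -1,
    };
}

/// Renders the response layout from the post into a caller-provided buffer.
fn formatResult(buf: []u8, code: i32, stdout: []const u8, stderr: []const u8) ![]u8 {
    return std.fmt.bufPrint(
        buf,
        "exit: {d}\n--- stdout ---\n{s}\n--- stderr ---\n{s}",
        .{ code, stdout, stderr },
    );
}

pub fn main() void {
    var buf: [256]u8 = undefined;
    // A process killed by signal 9 with a little output on stderr.
    const text = formatResult(&buf, exitCode(.{ .signal = 9 }), "", "killed") catch return;
    std.debug.print("{s}\n", .{text});
}
```

One consequence of this mapping is that a process killed by signal 1 and a stopped or unknown termination both produce -1, which is acceptable as long as the model only needs to tell success from failure.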

Safety note

sh -c <command> runs exactly what the model asks for. If the model decides to write rm -rf ~/, it does. For Hypercode as it stands, the contract with the user is: you trust the model. A future post will add a sandbox (probably bwrap on Linux, sandbox-exec on macOS). Out of scope here.

The grep tool

This is the other half of the ritual: before editing, you search.

The name is misleading: we don't do regex, just substring search. That's what models use 90% of the time in practice, and it's much faster than loading a regex engine.

Output format mirrors ripgrep / git grep — path:line:content — so the model recognises it:

src/tools/read.zig:13:pub fn run(gpa: std.mem.Allocator, io: std.Io, args_json: []const u8) ![]u8 {
src/tools/write.zig:16:pub fn run(gpa: std.mem.Allocator, io: std.Io, args_json: []const u8) ![]u8 {

Simple schema:

src/tools/grep.schema.json
{
  "type": "object",
  "properties": {
    "pattern": { "type": "string", "description": "Literal substring to search for. No regex." },
    "path":    { "type": "string", "description": "Directory to search. Defaults to working directory." }
  },
  "required": ["pattern"]
}

The implementation uses std.Io.Dir.Walker:

src/tools/grep.zig
pub fn run(gpa: std.mem.Allocator, io: std.Io, args_json: []const u8) ![]u8 {
    const parsed = try std.json.parseFromSlice(Args, gpa, args_json, .{ .ignore_unknown_fields = true });
    defer parsed.deinit();
    const args = parsed.value;

    if (args.pattern.len == 0) return error.PatternEmpty;

    var base = if (args.path) |p|
        try std.Io.Dir.cwd().openDir(io, p, .{ .iterate = true })
    else
        try std.Io.Dir.cwd().openDir(io, ".", .{ .iterate = true });
    defer base.close(io);

    var walker = try base.walk(gpa);
    defer walker.deinit();

    var out: std.Io.Writer.Allocating = .init(gpa);
    defer out.deinit();

    const file_buf = try gpa.alloc(u8, constants.tool_file_bytes_max);
    defer gpa.free(file_buf);

    var matches: u32 = 0;
    var files_scanned: u32 = 0;

    while (try walker.next(io)) |entry| {
        if (entry.kind != .file) continue;
        if (files_scanned >= constants.grep_files_scanned_max) break;
        files_scanned += 1;

        const contents = entry.dir.readFile(io, entry.basename, file_buf) catch continue;
        var line_num: u32 = 1;
        var cursor: usize = 0;
        while (cursor < contents.len) {
            const line_end = std.mem.indexOfScalarPos(u8, contents, cursor, '\n') orelse contents.len;
            const line = contents[cursor..line_end];
            if (std.mem.indexOf(u8, line, args.pattern)) |_| {
                try out.writer.print("{s}:{d}:{s}\n", .{ entry.path, line_num, line });
                matches += 1;
                if (matches >= constants.grep_matches_max) break;
            }
            cursor = line_end + 1;
            line_num += 1;
        }
        if (matches >= constants.grep_matches_max) break;
    }

    if (matches == 0) {
        return std.fmt.allocPrint(gpa, "no matches for '{s}' ({d} files scanned)", .{ args.pattern, files_scanned });
    }
    return gpa.dupe(u8, out.writer.buffer[0..out.writer.end]);
}

Two essential bounds:

  • grep_files_scanned_max = 5000 — any reasonable codebase is under that. node_modules isn't, but that's node_modules's problem.
  • grep_matches_max = 200 — past that the model drowns. Better to say "200 matches, refine" than ship 5000 lines.
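The listing above stops at the cap but does not tell the model it stopped. One way to surface that, sketched here on an in-memory buffer rather than on files, is to return a flag next to the count. ScanResult and scanLines are hypothetical names, and the notice wording is only an example.

```zig
const std = @import("std");

const ScanResult = struct {
    matches: u32,
    /// True when scanning stopped at the cap with text left unscanned,
    /// so further matches may exist.
    truncated: bool,
};

/// Counts the lines of `contents` that contain `pattern`, stopping as
/// soon as `matches_max` matching lines have been seen.
fn scanLines(contents: []const u8, pattern: []const u8, matches_max: u32) ScanResult {
    var matches: u32 = 0;
    var cursor: usize = 0;
    while (cursor < contents.len) {
        const line_end = std.mem.indexOfScalarPos(u8, contents, cursor, '\n') orelse contents.len;
        const line = contents[cursor..line_end];
        cursor = line_end + 1;
        if (std.mem.indexOf(u8, line, pattern) != null) {
            matches += 1;
            if (matches >= matches_max) {
                return .{ .matches = matches, .truncated = cursor < contents.len };
            }
        }
    }
    return .{ .matches = matches, .truncated = false };
}

pub fn main() void {
    const text = "foo\nbar foo\nbaz\nfoo again\n";
    const result = scanLines(text, "foo", 2);
    if (result.truncated) {
        std.debug.print("{d} matches shown, more may exist: refine the pattern\n", .{result.matches});
    }
}
```

The flag is conservative: it only says that unscanned text remained, not that it definitely held more matches.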

entry.dir.readFile(io, entry.basename, file_buf) catch continue: if we hit an unreadable file (binary larger than 256 KiB, permission denied), we skip it. No fatal error.

No .gitignore, no regex

A clean implementation would honour .gitignore, skip .git/, node_modules/, target/, zig-out/. That'll come later. For this post, we picked simplicity — the user can pass path: "src" in a project with a heavy node_modules.
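Short of full .gitignore support, a fixed skip list already removes most of the noise. The sketch below is an assumption about how that could look, not code from the repository: isSkipped and skipped_dirs are hypothetical names, and the check assumes '/'-separated relative paths.

```zig
const std = @import("std");

// Directory names taken from the paragraph above; the list is illustrative.
const skipped_dirs = [_][]const u8{ ".git", "node_modules", "target", "zig-out" };

/// Returns true when any component of `path` equals one of `skipped_dirs`.
/// Assumption: '/'-separated relative paths. A regular file that happens
/// to be named like a skipped directory is skipped too.
fn isSkipped(path: []const u8) bool {
    var parts = std.mem.splitScalar(u8, path, '/');
    while (parts.next()) |part| {
        for (skipped_dirs) |name| {
            if (std.mem.eql(u8, part, name)) return true;
        }
    }
    return false;
}

pub fn main() void {
    const samples = [_][]const u8{
        "src/tools/grep.zig",
        "node_modules/left-pad/index.js",
        "web/node_modules/x.js",
    };
    for (samples) |path| {
        std.debug.print("{s}: skipped={}\n", .{ path, isSkipped(path) });
    }
}
```

Calling such a check on entry.path before files_scanned is incremented would keep skipped files from consuming the file budget. It would not stop the walker from descending into those directories, so the traversal itself stays as expensive as before.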

Live test — the code-bash-verify loop

With bash and grep, we can now ask Hypercode for something resembling real work.

./zig-out/bin/hypercode "Use grep to find where the constant tool_file_bytes_max is used, then run zig build to make sure everything compiles."
→ grep({"pattern": "tool_file_bytes_max"})
→ bash({"command": "./zig/zig build"})

`tool_file_bytes_max` is used in 4 places:
- src/constants.zig:5 (definition)
- src/tools/read.zig:18 (read tool buffer cap)
- src/tools/edit.zig:33 (edit tool read buffer)
- src/tools/edit.zig:39 (edit tool write check)
- src/tools/grep.zig:39 (grep file buffer cap)

The build succeeded (exit code 0). Everything compiles cleanly.

That's the moment Hypercode flips from toy to tool. It locates a reference, verifies it compiles, narrates what's happening.

Shell error:

./zig-out/bin/hypercode "Run 'false' via bash and tell me what happened."
→ bash({"command": "false"})

The command `false` returned a non-zero exit code (1) — by convention, this
indicates an error. There was no output to stdout or stderr. The command
`false` is designed specifically to always exit with a failure status,
typically used in shell scripting for conditional logic.

Exit code propagates correctly; the model understands it.

--help reflects the world

Five tools, five lines:

Tools (always available to the model):
  read    Read the contents of a UTF-8 text file.
  write   Create or overwrite a file.
  edit    Replace one exact occurrence of a string in a file.
  bash    Run a shell command with a hard timeout.
  grep    Recursively search for a literal substring.

The commits

2516735 feat(tools): grep — bounded recursive substring search
4222421 feat(tools): bash with hard timeout and output caps

Two commits, two tools. Neither tool knows anything about the other — the dispatch table in tools.zig joins them, that's all.
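The dispatch table itself is not shown in this post. As a rough idea of the shape, here is a sketch in which everything is hypothetical: the Handler signature is simplified to a single JSON argument (the real tools also take an allocator and an Io), and the two handlers are stubs.

```zig
const std = @import("std");

// Hypothetical, simplified handler signature for illustration.
const Handler = *const fn (args_json: []const u8) []const u8;

// Stub handlers standing in for the real tools.
fn bashStub(_: []const u8) []const u8 {
    return "bash called";
}

fn grepStub(_: []const u8) []const u8 {
    return "grep called";
}

const Tool = struct {
    name: []const u8,
    handler: Handler,
};

// The table is the only place where the tools appear side by side.
const tools = [_]Tool{
    .{ .name = "bash", .handler = &bashStub },
    .{ .name = "grep", .handler = &grepStub },
};

/// Looks up a tool by name and calls it; returns null for an unknown name.
fn dispatch(name: []const u8, args_json: []const u8) ?[]const u8 {
    for (tools) |tool| {
        if (std.mem.eql(u8, tool.name, name)) return tool.handler(args_json);
    }
    return null;
}

pub fn main() void {
    const reply = dispatch("grep", "{\"pattern\": \"foo\"}") orelse "unknown tool";
    std.debug.print("{s}\n", .{reply});
}
```

With a handful of tools a linear scan is enough, and adding a tool means adding one row to the table.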

Conclusion

Five tools. Hypercode can now write code, search it, and verify it works. We've left "assistant that talks" and entered "agent that acts".

But the experience is still rough: every user turn waits for the model's full reply, showing nothing in between. You see → read({...}) then silence for 5 seconds, then the answer arrives in one block. A shame, because modern models stream their output token by token.

In the next post, we finally tackle SSE streaming. The model talks, we print as the tokens arrive. Less fundamental than tools, but it's what flips Hypercode from "demo" to "I actually want to use this".
