Building a Coding Agent in Zig: the first call


In the first post, we laid down the skeleton. In the second, we built the CLI that gathers model, API key and prompt. The binary proudly printed all three and signed off with (no model call yet — that's Post 03.).

We're there. This post replaces that placeholder with a real call to OpenRouter. By the end, the binary sends a prompt over HTTPS and prints the model's reply.

Code still lives at github.com/alexisbchz/hypercode.

What the OpenRouter request looks like

OpenRouter exposes an OpenAI-compatible API — POST /api/v1/chat/completions with an Authorization: Bearer <key> header. The body is JSON with two essential fields:

{
  "model": "poolside/laguna-m.1:free",
  "messages": [
    { "role": "user", "content": "Say hello in exactly 5 words." }
  ]
}

The response comes back in this shape:

{
  "id": "...",
  "model": "...",
  "choices": [
    {
      "message": { "role": "assistant", "content": "Hello, how are you today?" }
    }
  ]
}

Many other fields ride along (usage, finish_reason, created...), but for a first single-turn call, we only need choices[0].message.content.

The two-step plan

To keep layers separate, we isolate everything in a new src/openrouter.zig. Its job: given a key, a model, and a prompt, return the response text. That's it. No user-facing error handling, no TUI — just the mechanics of the protocol.

main.zig orchestrates, as before.

Modelling the request in Zig

Zig structs serialise naturally to JSON via std.json. No mapping attribute needed — field names become keys.

src/openrouter.zig

const Message = struct {
    role: []const u8,
    content: []const u8,
};

const Request = struct {
    model: []const u8,
    messages: []const Message,
};

const Response = struct {
    choices: []const struct {
        message: Message,
    },
};

Three structs, the minimum for one round-trip. Response only includes choices — by default std.json rejects unknown fields, but we'll tell it .{ .ignore_unknown_fields = true } at parse time, and it'll silently skip id, usage, etc.
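To make the unknown-field behaviour concrete, here is a small sketch that can be run with zig test. The sample body is made up for illustration (it is not a captured OpenRouter response), and the structs are repeated so the file stands alone.

```zig
const std = @import("std");

const Message = struct {
    role: []const u8,
    content: []const u8,
};

const Response = struct {
    choices: []const struct {
        message: Message,
    },
};

// Illustrative body with fields (id, finish_reason, usage) that Response does not declare.
const sample =
    \\{"id":"gen-1","choices":[{"finish_reason":"stop",
    \\"message":{"role":"assistant","content":"Hello, how are you today?"}}],
    \\"usage":{"total_tokens":12}}
;

test "unknown fields are rejected by default" {
    const gpa = std.testing.allocator;
    try std.testing.expectError(
        error.UnknownField,
        std.json.parseFromSlice(Response, gpa, sample, .{}),
    );
}

test "ignore_unknown_fields skips them" {
    const gpa = std.testing.allocator;
    const parsed = try std.json.parseFromSlice(
        Response,
        gpa,
        sample,
        .{ .ignore_unknown_fields = true },
    );
    defer parsed.deinit();
    try std.testing.expectEqualStrings(
        "Hello, how are you today?",
        parsed.value.choices[0].message.content,
    );
}
```

Keeping Response this small means new fields added upstream never break the parse, as long as choices keeps its shape.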

Encoding with `std.json.Stringify.valueAlloc`

Encoding a struct to a JSON string is one line:

const payload = try std.json.Stringify.valueAlloc(gpa, req, .{});
defer gpa.free(payload);

valueAlloc allocates the buffer, writes the JSON, hands it back. We defer the free. The standard library covers string escaping, control characters, quotes — things we absolutely don't want to roll by hand for an agent that's about to receive user code.
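A quick way to convince yourself that escaping is handled is a round-trip test: encode a prompt containing quotes, a newline and a tab, then parse it back. This is a sketch rather than code from the repository; the prompt string is invented, and the structs are repeated so the file runs on its own with zig test.

```zig
const std = @import("std");

const Message = struct {
    role: []const u8,
    content: []const u8,
};

const Request = struct {
    model: []const u8,
    messages: []const Message,
};

test "a prompt with quotes and control characters survives a round-trip" {
    const gpa = std.testing.allocator;
    // Invented prompt that would break naive string concatenation.
    const prompt = "print(\"hi\")\n\tdone";

    const messages = [_]Message{.{ .role = "user", .content = prompt }};
    const req: Request = .{
        .model = "poolside/laguna-m.1:free",
        .messages = &messages,
    };

    const payload = try std.json.Stringify.valueAlloc(gpa, req, .{});
    defer gpa.free(payload);

    // The newline is escaped, so no raw newline byte appears in the payload.
    try std.testing.expect(std.mem.indexOfScalar(u8, payload, '\n') == null);

    // Decoding gives back the exact original bytes.
    const parsed = try std.json.parseFromSlice(Request, gpa, payload, .{});
    defer parsed.deinit();
    try std.testing.expectEqualStrings(prompt, parsed.value.messages[0].content);
}
```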

The auth header

The Authorization: Bearer <key> header gets built in a buffer whose size we control. Static allocation, per CLAUDE.md §2.4.

var auth_buf: [512]u8 = undefined;
const auth = try std.fmt.bufPrint(&auth_buf, "Bearer {s}", .{api_key});

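512 bytes is plenty for a Bearer token (OpenRouter keys are ~75 chars). bufPrint returns a []u8 pointing into auth_buf, so auth lives exactly as long as the enclosing function's stack frame. If a key ever outgrew the buffer, bufPrint would return error.NoSpaceLeft rather than overflow. No heap.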
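The two properties that matter here (the result aliases the stack buffer, and an oversized input fails cleanly) can be checked in isolation. The key below is a placeholder, not a real credential, and the 8-byte buffer exists only to force the failure path.

```zig
const std = @import("std");

test "bufPrint formats into the caller's buffer" {
    var auth_buf: [512]u8 = undefined;
    // Placeholder key for illustration only.
    const auth = try std.fmt.bufPrint(&auth_buf, "Bearer {s}", .{"sk-or-v1-example"});
    try std.testing.expectEqualStrings("Bearer sk-or-v1-example", auth);
    // The returned slice starts at the first byte of auth_buf: nothing was heap-allocated.
    try std.testing.expect(@intFromPtr(auth.ptr) == @intFromPtr(&auth_buf));
}

test "an oversized key fails instead of overflowing" {
    // Deliberately too small for "Bearer " plus any key.
    var tiny: [8]u8 = undefined;
    try std.testing.expectError(
        error.NoSpaceLeft,
        std.fmt.bufPrint(&tiny, "Bearer {s}", .{"sk-or-v1-example"}),
    );
}
```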

The HTTPS call via `std.http.Client`

This is where Zig 0.16 does a lot of work for us. std.http.Client speaks TLS through std.crypto.tls, handles redirects, keep-alive, status codes. You give it a URL and a payload; it gives you back a status and writes the body into a writer you supply.

var response: std.Io.Writer.Allocating = .init(gpa);
defer response.deinit();

var client: std.http.Client = .{ .allocator = gpa, .io = io };
defer client.deinit();

const fetched = client.fetch(.{
    .location = .{ .url = endpoint },
    .method = .POST,
    .payload = payload,
    .extra_headers = &.{
        .{ .name = "authorization", .value = auth },
        .{ .name = "content-type", .value = "application/json" },
    },
    .response_writer = &response.writer,
}) catch return .network_error;

A few details worth a word.

Element	Role
`std.Io.Writer.Allocating`	A writer that grows its buffer as bytes arrive. Perfect for collecting a response body of unknown size.
`.io = io`	In 0.16, `std.http.Client` requires a `std.Io`, the I/O interface value the client performs its network operations through. We take it from `init.io` in `main`.
`.extra_headers`	Our own headers, on top of what the stdlib adds by default (User-Agent, Host, etc.).
`catch return .network_error`	Any error from `fetch` (DNS, TLS, connection refused...) collapses into one variant. The catch is unconditional, so non-network failures such as out-of-memory land here too. The exact diagnostic is rarely useful to the end user.

Decoding the response

if (fetched.status != .ok) {
    return .{ .http_status = @intFromEnum(fetched.status) };
}

const body = response.writer.buffer[0..response.writer.end];
const parsed = std.json.parseFromSlice(
    Response,
    gpa,
    body,
    .{ .ignore_unknown_fields = true },
) catch return .bad_response;
defer parsed.deinit();

if (parsed.value.choices.len == 0) return .bad_response;
return .{ .ok = try gpa.dupe(u8, parsed.value.choices[0].message.content) };

Three steps:

1. Check the HTTP status. Anything other than 200 goes into the http_status variant with the code.
2. Parse the JSON. If the shape doesn't match (because OpenRouter returns an HTML error page, say) or choices is empty, we return bad_response.
3. Copy the text with gpa.dupe(u8, ...). parsed is freed by the deferred deinit when the function returns, so the content slice would dangle otherwise. The caller owns the copy and frees it when done.
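The parse-and-copy part can be exercised without any network by pulling it into a pure function over bytes. firstContent below is a hypothetical helper written for this illustration (the repository may structure this differently); it returns null where the real code returns bad_response.

```zig
const std = @import("std");

const Message = struct { role: []const u8, content: []const u8 };
const Response = struct {
    choices: []const struct { message: Message },
};

/// Hypothetical helper: returns a caller-owned copy of the first choice's
/// text, or null when the body is not a usable chat-completions response.
fn firstContent(gpa: std.mem.Allocator, body: []const u8) !?[]u8 {
    const parsed = std.json.parseFromSlice(
        Response,
        gpa,
        body,
        .{ .ignore_unknown_fields = true },
    ) catch return null;
    // Frees everything the parser allocated, including the content string.
    defer parsed.deinit();

    if (parsed.value.choices.len == 0) return null;
    // Copy before the deferred deinit runs, so the caller gets live memory.
    return try gpa.dupe(u8, parsed.value.choices[0].message.content);
}

test "valid body yields an owned copy" {
    const gpa = std.testing.allocator;
    const body =
        \\{"choices":[{"message":{"role":"assistant","content":"pong"}}]}
    ;
    const text = (try firstContent(gpa, body)).?;
    defer gpa.free(text);
    try std.testing.expectEqualStrings("pong", text);
}

test "HTML error page and empty choices both yield null" {
    const gpa = std.testing.allocator;
    try std.testing.expect((try firstContent(gpa, "<html>Bad Gateway</html>")) == null);
    try std.testing.expect((try firstContent(gpa, "{\"choices\":[]}")) == null);
}
```

Because std.testing.allocator reports leaks, these tests also confirm that the failure paths free whatever the parser allocated.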

The `Result` union

pub const Result = union(enum) {
    /// Assistant text, owned by the caller's allocator.
    ok: []const u8,
    /// Couldn't reach the server (DNS, TLS, connection reset, ...).
    network_error,
    /// Reached the server, got a status other than 200 OK.
    http_status: u16,
    /// 200 OK but the body didn't look like a chat-completions response.
    bad_response,
};

Four variants: one success and three distinct failure stories the caller can decide to handle differently (print a message, retry, escalate). Same pattern we used in cli.Result and config.Result: no opaque errors, no lost diagnostics.
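As an illustration of handling the variants differently, here is one possible retry policy. shouldRetry and its thresholds (retry on network errors, on 429, and on 5xx) are assumptions made for this sketch, not something hypercode implements; the union is repeated so the file stands alone.

```zig
const std = @import("std");

pub const Result = union(enum) {
    ok: []const u8,
    network_error,
    http_status: u16,
    bad_response,
};

/// Hypothetical policy: retry only failures that are plausibly transient.
fn shouldRetry(result: Result) bool {
    return switch (result) {
        .ok => false,
        // The server was never reached, so a later attempt may succeed.
        .network_error => true,
        // Assumption: rate limits and server-side errors are worth retrying;
        // other statuses such as 401 will not fix themselves.
        .http_status => |code| code == 429 or code >= 500,
        // A malformed body is unlikely to improve on a second try.
        .bad_response => false,
    };
}

test "only transient failures are retried" {
    try std.testing.expect(shouldRetry(.network_error));
    try std.testing.expect(shouldRetry(.{ .http_status = 503 }));
    try std.testing.expect(!shouldRetry(.{ .http_status = 401 }));
    try std.testing.expect(!shouldRetry(.bad_response));
}
```

The switch has no else branch, so adding a variant to Result later makes this function fail to compile until the new case is decided.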

Wiring into `main`

src/main.zig

const gpa = init.gpa;
const result = try openrouter.call(gpa, io, cfg.api_key, cfg.model, cfg.prompt);
switch (result) {
    .ok => |text| {
        defer gpa.free(text);
        try stdout.writeAll(text);
        try stdout.writeAll("\n");
        try stdout.flush();
    },
    .network_error => fail(stderr, "could not reach {s}", .{openrouter.endpoint}),
    .http_status => |code| fail(stderr, "{s} returned HTTP {d}", .{ openrouter.endpoint, code }),
    .bad_response => fail(stderr, "unexpected response shape from {s}", .{openrouter.endpoint}),
}

init.gpa is the general allocator std.process.Init hands us. In debug mode it detects leaks — handy for catching a forgotten defer gpa.free(text).

The exhaustive switch on result makes every variant visible. If we add rate_limited someday, the compiler forces us to handle it here.

Live test

We already have OPENROUTER_API_KEY in the environment and HYPERCODE_MODEL=poolside/laguna-m.1:free — Poolside via OpenRouter, free tier.

./zig-out/bin/hypercode "Say hello in exactly 5 words."

Hello, how are you today?

First real round-trip. The model doesn't quite follow the instruction (five words but turned into a question), but we don't care — what matters is that the mechanics work.

The error paths

Bad key:

OPENROUTER_API_KEY=sk-bogus ./zig-out/bin/hypercode "ping"

error: https://openrouter.ai/api/v1/chat/completions returned HTTP 401
Run `hypercode --help` for usage.

Exit code 2, clean. The user knows exactly where to look.

No network (simulated by turning off Wi-Fi):

error: could not reach https://openrouter.ai/api/v1/chat/completions
Run `hypercode --help` for usage.

The commits

Two commits, two layers:

45654a6 feat(main): send the prompt to openrouter and print the reply
dd0aaee feat(openrouter): minimal chat-completions client

git show dd0aaee shows the network mechanics in isolation, knowing nothing about the CLI. git show 45654a6 shows the orchestration alone. The split between network layer and orchestration stays legible.

Conclusion

We have an agent that talks. Not one that thinks yet — single exchange, no memory between calls, no tools, no streaming. But the network pipe is laid, and everything else stacks on top.

In the next post, we add streaming. Instead of waiting for the full response, we process SSE chunks as they arrive and write them straight to stdout. That's what turns a model call into a model experience.

Stuck, or want to share notes? Join the Discord server.