
In the first post, we laid down the skeleton. In the second, we built the CLI that gathers model, API key and prompt. The binary proudly printed all three and signed off with (no model call yet — that's Post 03.).
We're there. This post replaces that placeholder with a real call to OpenRouter. By the end, we'll have an agent that can literally answer a prompt.
Code still lives at github.com/alexisbchz/hypercode.
OpenRouter exposes an OpenAI-compatible API — POST /api/v1/chat/completions with an Authorization: Bearer <key> header. The body is JSON with two essential fields:
{
  "model": "poolside/laguna-m.1:free",
  "messages": [
    { "role": "user", "content": "Say hello in exactly 5 words." }
  ]
}
The response comes back in this shape:
{
  "id": "...",
  "model": "...",
  "choices": [
    {
      "message": { "role": "assistant", "content": "Hello, how are you today?" }
    }
  ]
}
Many other fields ride along (usage, finish_reason, created...), but for a first single-turn call, we only need choices[0].message.content.
To keep layers separate, we isolate everything in a new src/openrouter.zig. Its job: given a key, a model, and a prompt, return the response text. That's it. No user-facing error handling, no TUI — just the mechanics of the protocol.
main.zig orchestrates, as before.
Zig structs serialise naturally to JSON via std.json. No mapping attribute needed — field names become keys.
const Message = struct {
    role: []const u8,
    content: []const u8,
};
const Request = struct {
    model: []const u8,
    messages: []const Message,
};
const Response = struct {
    choices: []const struct {
        message: Message,
    },
};
Three structs, the minimum for one round-trip. Response only includes choices — by default std.json rejects unknown fields, but we'll tell it .{ .ignore_unknown_fields = true } at parse time, and it'll silently skip id, usage, etc.
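To see that option at work without touching the network, here is a small sketch that can be run with zig test. The sample body and its values (gen-123, example/model, the usage object) are made up for illustration; only the Message and Response shapes come from the post.

```zig
const std = @import("std");

const Message = struct {
    role: []const u8,
    content: []const u8,
};

const Response = struct {
    choices: []const struct {
        message: Message,
    },
};

// Illustrative body: id, model, finish_reason and usage are fields that
// Response does not declare.
const sample_body =
    \\{
    \\  "id": "gen-123",
    \\  "model": "example/model",
    \\  "choices": [
    \\    {
    \\      "finish_reason": "stop",
    \\      "message": { "role": "assistant", "content": "Hello there" }
    \\    }
    \\  ],
    \\  "usage": { "total_tokens": 12 }
    \\}
;

test "ignore_unknown_fields skips what Response does not declare" {
    const gpa = std.testing.allocator;
    const parsed = try std.json.parseFromSlice(
        Response,
        gpa,
        sample_body,
        .{ .ignore_unknown_fields = true },
    );
    defer parsed.deinit();

    try std.testing.expectEqual(@as(usize, 1), parsed.value.choices.len);
    try std.testing.expectEqualStrings("Hello there", parsed.value.choices[0].message.content);
}
```

With the default options (an empty .{}), the same call should fail on the first undeclared key instead of returning a value.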
std.json.Stringify.valueAlloc
Encoding a struct to a JSON string is one line:
const payload = try std.json.Stringify.valueAlloc(gpa, req, .{});
defer gpa.free(payload);
valueAlloc allocates the buffer, writes the JSON, hands it back. We defer the free. The standard library covers string escaping, control characters, quotes — things we absolutely don't want to roll by hand for an agent that's about to receive user code.
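A quick way to convince ourselves the escaping holds is a round-trip: encode a prompt full of quotes, backslashes and control characters, then decode it and compare. This is a sketch under the assumption that valueAlloc behaves as used above; the prompt and the example/model name are invented for the test.

```zig
const std = @import("std");

const Message = struct {
    role: []const u8,
    content: []const u8,
};

const Request = struct {
    model: []const u8,
    messages: []const Message,
};

// Illustrative prompt: quotes, a newline, a tab and backslashes.
const awkward_prompt = "print(\"hi\")\n\tpath = C:\\tmp";

test "awkward prompt survives encode and decode" {
    const gpa = std.testing.allocator;
    const req: Request = .{
        .model = "example/model",
        .messages = &.{.{ .role = "user", .content = awkward_prompt }},
    };

    const payload = try std.json.Stringify.valueAlloc(gpa, req, .{});
    defer gpa.free(payload);

    // The raw newline must not appear in the encoded payload.
    try std.testing.expect(std.mem.indexOfScalar(u8, payload, '\n') == null);

    // Decoding gives back the exact bytes we started with.
    const parsed = try std.json.parseFromSlice(Request, gpa, payload, .{});
    defer parsed.deinit();
    try std.testing.expectEqualStrings(awkward_prompt, parsed.value.messages[0].content);
}
```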
The Authorization: Bearer <key> header gets built in a buffer whose size we control. Static allocation, per CLAUDE.md §2.4.
var auth_buf: [512]u8 = undefined;
const auth = try std.fmt.bufPrint(&auth_buf, "Bearer {s}", .{api_key});
512 bytes is plenty for a Bearer token (OpenRouter keys are ~75 chars). bufPrint returns a []u8 pointing into auth_buf — so auth's lifetime is the function's. No heap.
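What happens if a key ever exceeds the buffer? bufPrint reports error.NoSpaceLeft rather than writing past the end, so the try above surfaces it to the caller. A minimal sketch, where bearer is a hypothetical helper introduced only for this test and the key is a placeholder:

```zig
const std = @import("std");

/// Hypothetical helper (not in the post's code): formats the header value
/// into a caller-supplied buffer and returns a slice pointing into it.
fn bearer(buf: []u8, api_key: []const u8) ![]const u8 {
    return std.fmt.bufPrint(buf, "Bearer {s}", .{api_key});
}

test "a key that fits is formatted in place" {
    var buf: [512]u8 = undefined;
    const auth = try bearer(&buf, "sk-or-v1-example");
    try std.testing.expectEqualStrings("Bearer sk-or-v1-example", auth);
}

test "an oversized key is an error, not an overflow" {
    // "Bearer " plus the 16-byte placeholder key needs 23 bytes.
    var buf: [16]u8 = undefined;
    try std.testing.expectError(error.NoSpaceLeft, bearer(&buf, "sk-or-v1-example"));
}
```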
std.http.Client
This is where Zig 0.16 does a lot of work for us. std.http.Client speaks TLS through std.crypto.tls, handles redirects, keep-alive, status codes. You give it a URL and a payload; it gives you back a status and writes the body into a writer you supply.
var response: std.Io.Writer.Allocating = .init(gpa);
defer response.deinit();
var client: std.http.Client = .{ .allocator = gpa, .io = io };
defer client.deinit();
const fetched = client.fetch(.{
    .location = .{ .url = endpoint },
    .method = .POST,
    .payload = payload,
    .extra_headers = &.{
        .{ .name = "authorization", .value = auth },
        .{ .name = "content-type", .value = "application/json" },
    },
    .response_writer = &response.writer,
}) catch return .network_error;
A few details worth a word.
| Element | Role |
|---|---|
| std.Io.Writer.Allocating | A writer that grows its buffer as bytes arrive. Perfect for collecting a response body of unknown size. |
| client.io = io | In 0.16, std.http.Client requires a std.Io. It's the I/O abstraction passed to every async layer. We take it from init.io in main. |
| .extra_headers | Our own headers, on top of what the stdlib adds by default (User-Agent, Host, etc.). |
| catch return .network_error | Any network error (DNS, TLS, connection refused...) → one variant. The exact diagnostic is rarely useful to the end user. |
if (fetched.status != .ok) {
    return .{ .http_status = @intFromEnum(fetched.status) };
}
const body = response.writer.buffer[0..response.writer.end];
const parsed = std.json.parseFromSlice(
    Response,
    gpa,
    body,
    .{ .ignore_unknown_fields = true },
) catch return .bad_response;
defer parsed.deinit();
if (parsed.value.choices.len == 0) return .bad_response;
return .{ .ok = try gpa.dupe(u8, parsed.value.choices[0].message.content) };
Three steps:
1. Any status other than 200 OK returns the http_status variant with the code.
2. A body that fails to parse, or that has an empty choices array, returns bad_response.
3. gpa.dupe(u8, ...) copies the text into caller-owned memory, because parsed is freed when the function exits. The caller owns the string and frees it when done.
Result union
pub const Result = union(enum) {
    /// Assistant text, owned by the caller's allocator.
    ok: []const u8,
    /// Couldn't reach the server (DNS, TLS, connection reset, ...).
    network_error,
    /// Reached the server, got a non-2xx status.
    http_status: u16,
    /// 2xx but the body didn't look like a chat-completions response.
    bad_response,
};
Four variants. Four distinct error stories the caller can decide to handle differently — print a message, retry, escalate. Same pattern we used in cli.Result and config.Result: no opaque errors, no lost diagnostics.
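As an illustration of "handle differently", here is a sketch of a retry policy a caller could layer on top. shouldRetry and its rules (retry on network failures, HTTP 429 and 5xx) are assumptions made up for this example, not something the hypercode repository does; Result is repeated so the snippet compiles on its own.

```zig
const std = @import("std");

const Result = union(enum) {
    ok: []const u8,
    network_error,
    http_status: u16,
    bad_response,
};

/// Hypothetical policy: which outcomes are worth a second attempt.
/// The switch has no else branch, so adding a variant to Result
/// makes this function fail to compile until it is handled.
fn shouldRetry(result: Result) bool {
    return switch (result) {
        .ok => false,
        .network_error => true,
        .http_status => |code| code == 429 or code >= 500,
        .bad_response => false,
    };
}

test "transient failures are retried, permanent ones are not" {
    try std.testing.expect(shouldRetry(.network_error));
    try std.testing.expect(shouldRetry(.{ .http_status = 503 }));
    try std.testing.expect(!shouldRetry(.{ .http_status = 401 }));
    try std.testing.expect(!shouldRetry(.bad_response));
}
```

Because each failure carries its own tag (and the status code where there is one), a policy like this needs no string matching on error messages.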
main
const gpa = init.gpa;
const result = try openrouter.call(gpa, io, cfg.api_key, cfg.model, cfg.prompt);
switch (result) {
    .ok => |text| {
        defer gpa.free(text);
        try stdout.writeAll(text);
        try stdout.writeAll("\n");
        try stdout.flush();
    },
    .network_error => fail(stderr, "could not reach {s}", .{openrouter.endpoint}),
    .http_status => |code| fail(stderr, "{s} returned HTTP {d}", .{ openrouter.endpoint, code }),
    .bad_response => fail(stderr, "unexpected response shape from {s}", .{openrouter.endpoint}),
}
init.gpa is the general allocator std.process.Init hands us. In debug mode it detects leaks — handy for catching a forgotten defer gpa.free(text).
The exhaustive switch on result makes every variant visible. If we add rate_limited someday, the compiler forces us to handle it here.
We already have OPENROUTER_API_KEY in the environment and HYPERCODE_MODEL=poolside/laguna-m.1:free — Poolside via OpenRouter, free tier.
./zig-out/bin/hypercode "Say hello in exactly 5 words."
Hello, how are you today?
First real round-trip. The reply is exactly five words, though the model turned the greeting into a question. We don't care; what matters is that the mechanics work.
Bad key:
OPENROUTER_API_KEY=sk-bogus ./zig-out/bin/hypercode "ping"
error: https://openrouter.ai/api/v1/chat/completions returned HTTP 401
Run `hypercode --help` for usage.
Exit code 2, clean. The user knows exactly where to look.
No network (simulate by unplugging wifi):
error: could not reach https://openrouter.ai/api/v1/chat/completions
Run `hypercode --help` for usage.
Two commits, two layers:
45654a6 feat(main): send the prompt to openrouter and print the reply
dd0aaee feat(openrouter): minimal chat-completions client
git show dd0aaee shows the network mechanics in isolation, knowing nothing about the CLI. git show 45654a6 shows the orchestration alone. The split between network layer and orchestration stays legible.
We have an agent that talks. Not one that thinks yet — single exchange, no memory between calls, no tools, no streaming. But the network pipe is laid, and everything else stacks on top.
In the next post, we add streaming. Instead of waiting for the full response, we process SSE chunks as they arrive and write them straight to stdout. That's what turns a model call into a model experience.
Stuck, or want to share notes? Join the Discord server.