This post builds on an existing homelab setup that won't be covered here: a self-hosted n8n instance running on the local network, Claude Code installed and configured, and Nginx Proxy Manager handling internal routing. If you're curious about any of that, the first post covers the broader infrastructure philosophy.
What this post is about: wanting to talk to Claude from the couch, from a phone, from anywhere on the network... without sitting at a terminal.
The First Idea: Slack
The natural instinct was a Slack bot. n8n has a Slack node, Claude is already wired up... it seemed like a five-minute job.
The problem: Slack delivers events to bots via webhooks. Slack's servers need to reach your n8n instance to POST a message when someone mentions the bot. A self-hosted n8n that lives entirely on the local network is unreachable from the outside world. There's no public endpoint for Slack to hit.
You could expose n8n through the Cloudflare Tunnel. But that opens a workflow automation engine, one that has access to the local network, runs scripts, and sends notifications, to the public internet. That's a hard pass.
So the Slack idea died, and creativity kicked in.
The Architecture That Actually Works
The key insight: instead of waiting for an external service to push events in, build something that pulls responses out.
The flow works like this:
Browser → POST /api/chat → FastAPI (claude-chat)
↓
n8n webhook (new or resume session)
↓
n8n runs Claude
↓
n8n POSTs response → /api/callback
↓
SSE stream delivers response to browser
Three actors. One session ID ties them together.
n8n: Two Webhooks, One ID
n8n exposes two webhook endpoints:
- New session: POST /webhook/<uuid>, body: {"query": "..."}, returns {"sessionId": "..."}
- Resume session: POST /webhook/<uuid>, body: {"query": "...", "sessionId": "..."}, same ID echoed back
The session ID is what n8n uses to maintain conversation memory across turns. On the first message, n8n generates it. On every subsequent message, you hand it back.
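From the caller's side, the two webhooks differ only in the URL and in whether the payload carries a sessionId. A minimal sketch of that choice, using only the standard library; the URLs and the function name build_webhook_call are placeholders for illustration, since the post only gives the shape /webhook/<uuid>:

```python
import json
from typing import Optional

# Hypothetical URLs: substitute the real n8n host and webhook UUIDs.
NEW_SESSION_URL = "http://n8n.local/webhook/<new-session-uuid>"
RESUME_SESSION_URL = "http://n8n.local/webhook/<resume-session-uuid>"


def build_webhook_call(query: str, session_id: Optional[str] = None) -> tuple[str, bytes]:
    """Pick the webhook and JSON body for a chat turn."""
    if session_id is None:
        # First message: n8n generates the session ID and returns it.
        return NEW_SESSION_URL, json.dumps({"query": query}).encode("utf-8")
    # Later messages: hand the ID back so n8n can restore conversation memory.
    payload = {"query": query, "sessionId": session_id}
    return RESUME_SESSION_URL, json.dumps(payload).encode("utf-8")
```

The returned pair can be handed to whatever HTTP client the app uses; the reply in both cases is expected to carry the sessionId.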
When Claude finishes, n8n doesn't return the response in the HTTP reply — that would mean the browser has to wait with an open HTTP connection for however long Claude takes to think. Instead, n8n fires a separate POST to /api/callback with the session ID and the response text. This is the async handoff.
FastAPI: Bridging the Gap
claude-chat is a small FastAPI app running on nh2. It does three things:
POST /api/chat — receives the query from the browser, forwards it to the right n8n webhook (new or resume based on whether a sessionId is present), and returns the sessionId. The browser now knows what session to listen on.
GET /api/stream/{session_id} — opens a Server-Sent Events connection. The browser connects here immediately after sending the message. The endpoint waits on an asyncio.Queue for up to 10 minutes. When the callback arrives, it pushes the response into the queue and the SSE stream delivers it.
POST /api/callback — n8n posts here when Claude is done. If the SSE stream is already open, the response goes into the queue immediately. If not (the stream hasn't opened yet), it gets stored in a holding dict until the stream connects.
That last part — the two-part pending state — is what handles the race condition. n8n is fast sometimes. Claude can finish and the callback can fire before the browser has even opened the SSE stream. Without the holding dict, that response would vanish.
# If callback already arrived before this stream opened, deliver immediately.
if session_id in pending_responses:
    response = pending_responses.pop(session_id)
    yield f"data: {json.dumps({'response': response})}\n\n"
    return
# Otherwise register a queue and wait up to 10 minutes for the callback to fill it.
queue: asyncio.Queue = asyncio.Queue()
pending_queues[session_id] = queue
response = await asyncio.wait_for(queue.get(), timeout=600.0)
yield f"data: {json.dumps({'response': response})}\n\n"
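The snippet above is the stream side of the handoff. The callback side is not shown in the post, so what follows is only a sketch of how it might look, reusing the pending_responses and pending_queues names. The functions deliver, wait_for_response, and demo are hypothetical, and the SSE framing and FastAPI wiring are left out so the logic runs with the standard library alone. The cleanup in the finally block is an addition, not something the post describes.

```python
import asyncio

# Assumed module-level state, named after the stream snippet.
pending_responses: dict[str, str] = {}
pending_queues: dict[str, asyncio.Queue] = {}


def deliver(session_id: str, response: str) -> str:
    """Hypothetical core of POST /api/callback: hand off or hold."""
    queue = pending_queues.get(session_id)
    if queue is not None:
        # A stream is already waiting: wake it up.
        queue.put_nowait(response)
        return "queued"
    # No stream yet: park the response until one connects.
    pending_responses[session_id] = response
    return "held"


async def wait_for_response(session_id: str, timeout: float = 600.0) -> str:
    """Stream side without the SSE framing."""
    if session_id in pending_responses:
        return pending_responses.pop(session_id)
    queue: asyncio.Queue = asyncio.Queue()
    pending_queues[session_id] = queue
    try:
        return await asyncio.wait_for(queue.get(), timeout=timeout)
    finally:
        # Drop the queue so finished or timed-out sessions do not accumulate.
        pending_queues.pop(session_id, None)


async def demo() -> tuple[str, str]:
    # Ordering 1: the callback fires before the stream opens.
    deliver("early", "hello")
    first = await wait_for_response("early")
    # Ordering 2: the stream opens first, then the callback arrives.
    waiter = asyncio.create_task(wait_for_response("late"))
    await asyncio.sleep(0)  # let the waiter register its queue
    deliver("late", "world")
    second = await waiter
    return first, second
```

Running asyncio.run(demo()) exercises both orderings of the race; in each case the response reaches the waiting side instead of being dropped.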
The UI
The frontend is a single index.html. A dark GitHub-palette theme, chat bubbles, a session badge in the header that shows the first 8 characters of the session ID and lets you click to copy the full thing.
Markdown rendering is hand-rolled: fenced code blocks, tables, headings, lists, inline bold/italic/code. No library. The responses Claude gives are often code-heavy, so this was non-negotiable.
PIN authentication keeps it from being open to anyone on the network. On first visit, a lock screen asks for the PIN. A successful entry sets an HttpOnly cookie (a SHA-256 hash derived from the PIN, valid for 30 days). Subsequent visits skip the gate automatically.
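The post does not say exactly how the cookie value is derived, so the following is a sketch under one assumption: the token is a plain SHA-256 hex digest of the PIN. The names pin_token, cookie_is_valid, and COOKIE_MAX_AGE are made up for illustration.

```python
import hashlib
import hmac
from typing import Optional

COOKIE_MAX_AGE = 30 * 24 * 60 * 60  # 30 days, in seconds


def pin_token(pin: str) -> str:
    """Assumed derivation: hex SHA-256 of the PIN."""
    return hashlib.sha256(pin.encode("utf-8")).hexdigest()


def cookie_is_valid(cookie_value: Optional[str], expected_pin: str) -> bool:
    """Check a cookie against the configured PIN in constant time."""
    if not cookie_value:
        return False
    expected = pin_token(expected_pin)
    # Compare as bytes so a non-ASCII cookie value cannot raise TypeError.
    return hmac.compare_digest(cookie_value.encode("utf-8"), expected.encode("utf-8"))
```

An unsalted hash of a short PIN is easy to brute-force if the cookie ever leaks, so mixing in a server-side secret would be a reasonable hardening step even on a trusted LAN.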
The session badge in the header doubles as a conversation reference — if something interesting comes up, you can grab the ID and resume that exact thread later.
Putting It Together
The compose stack is straightforward:
services:
  claude-chat:
    build: .
    container_name: claude-chat
    restart: unless-stopped
    ports:
      - "7576:8000"
    environment:
      - PIN_CODE=${PIN_CODE}
A Dockerfile builds the FastAPI app and copies the static HTML in. Because source files are baked into the image, any UI change requires a rebuild: docker compose up -d --build.
To close the loop, add an HTTP Request node at the end of the Claude workflow in n8n. It sends a POST request to http://192.168.6.108:7576/api/callback with the body {"sessionId": "{{ $json.sessionId }}", "response": "{{ $json.output }}"}.
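For testing the callback endpoint without running the whole n8n workflow, a request of the same shape can be built by hand. This is a sketch using only the standard library; build_callback is a hypothetical helper, and the URL is the one quoted above. Serializing with json.dumps keeps quotes and newlines in Claude's output from breaking the JSON body, which is worth checking on the n8n side too.

```python
import json
import urllib.request

CALLBACK_URL = "http://192.168.6.108:7576/api/callback"  # address from the post


def build_callback(session_id: str, response: str) -> urllib.request.Request:
    """Build the POST that n8n is expected to send when Claude finishes."""
    body = json.dumps({"sessionId": session_id, "response": response}).encode("utf-8")
    return urllib.request.Request(
        CALLBACK_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# A response with quotes and a newline survives the round trip intact.
req = build_callback("test-session", 'He said "hi"\nthen left')
# urllib.request.urlopen(req, timeout=10)  # uncomment on the LAN to actually send
```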
The Result
A chat interface accessible from any device on the local network. No Slack. No ports exposed to the internet. No public endpoint. The entire flow lives inside the homelab.
The journey from "just use Slack" to "build a three-component async system with SSE and a race condition workaround" is a fair summary of self-hosting in general. Nothing works the obvious way. The interesting part is what you build when it doesn't.