Killing the Backend: 60 Commits in 72 Hours
How we migrated an entire FastAPI + Celery + Redis backend to Cloudflare Workers. Auth, Telegram, MCP, the full AI pipeline, vector search — all of it. One human, two AI agents, zero downtime.
What we were running
TameYeti is a journaling app that atomizes your entries — breaking freeform text into typed, tagged, embeddable atoms using AI. Behind the scenes, that means a multi-stage pipeline: atomization, entity extraction, entity enrichment (Spotify, TMDB, Wikipedia, dictionary APIs), embedding generation, and widget computation.
The stack that powered all of this:
- FastAPI — REST API, auth, SSE push, sync endpoints
- Celery + Redis — job queues for the 5-stage AI pipeline
- Redis — sessions, cache, pub/sub for SSE, rate limiting
- Docker Compose — 6 containers (API, 2 workers, Redis, nginx, monitoring)
- Turso — per-user LibSQL databases for cloud sync
It worked. It was also expensive, slow to deploy, and had more moving parts than a Swiss watch factory. Every feature touched at least three services. Adding a Telegram bot meant wiring up a webhook handler, a Celery task, Redis pub/sub for SSE notifications, and Turso writes. Cold starts were brutal. Redis would occasionally just... forget things.
The plan
Replace everything with a single Cloudflare Worker.
Not "put a CDN in front of the API." Not "cache some responses at the edge." Replace the entire backend — auth, the AI pipeline, Telegram webhooks, MCP server, search, SSE notifications — with one Worker and a handful of Durable Objects.
The phone already had SQLite locally. The AI providers are all external APIs. Turso is serverless. What was the backend actually doing that couldn't happen at the edge?
Turns out: not much. It was a $40/month middleman.
The sprint
- Day 1 — March 20. Phase 0: deployed a CF Worker as an AI proxy — just forwarding requests to OpenAI/Anthropic with API keys the phone shouldn't hold. Then immediately started moving the pipeline onto the phone itself. Atom processing, entity extraction, enrichment, embeddings — all running locally, calling the Worker only for AI inference and external API calls (Spotify, TMDB, Wikipedia). Ran into atom doubling (the pipeline fired twice from an SSE + pull-to-refresh race). Fixed it. Enabled observability. By end of day, the full 5-phase pipeline was running from the phone through the Worker.
- Day 2 — March 21. Auth, MCP, Telegram, environment isolation. Migrated JWT auth from FastAPI to the Worker. Built an MCP server using `@cloudflare/workers-oauth-provider` plus a `McpAgent` Durable Object — OAuth 2.1 with PKCE, Google/Apple sign-in on the authorize page, five tools (capture, search, context, instructions, status). Wired up Telegram webhooks so messages go straight to the Worker, which writes to Turso and pings the phone via a UserNotifier Durable Object. Built per-environment isolation: separate Workers (prod/dev/local), separate KV namespaces, separate Turso DBs, separate SQLite filenames on the phone.
- Day 3 — March 21-22. Search, security hardening, breakage, recovery. Moved search to Cloudflare Vectorize with namespace-based user isolation. Added an admin MCP server on a separate Durable Object. Then a security review flagged every `/proxy/*` endpoint as unauthed — so we added JWT verification to all of them. Which immediately broke everything, because the phone's pipeline was still using raw `fetch()` without auth headers. Built `workerFetch` — a single authenticated wrapper — and updated 11 files across the codebase. While that was happening, Hiro was on another branch doing liquid glass UI, user panels, and paywall polish.
What the Worker actually does
One Worker. Four Durable Objects. Two KV namespaces. One Vectorize index. Here's the routing:
- `/proxy/ai` — AI inference relay (OpenRouter, Anthropic, Groq, DeepInfra)
- `/proxy/embeddings` — OpenAI text-embedding-3-small, syncs to Vectorize
- `/proxy/external` — External API relay (Spotify, TMDB, Wikipedia, Dictionary)
- `/proxy/search/*` — Raw, semantic, and smart search via Vectorize
- `/proxy/deepgram-token` — Temporary Deepgram tokens for voice transcription
- `/api/auth/*` — Google/Apple sign-in, token refresh, JWT verification
- `/api/telegram/*` — Webhook handler, writes entries to Turso
- `/sync/credentials` — Turso DB credentials for direct FE sync
- `/mcp` — MCP server (user tools via Durable Object)
- `/admin-mcp` — Admin MCP (prompt management, Telegram, test tools)
- `/sse/connect` — UserNotifier DO for real-time push to phone
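The routing above can be sketched as a simple prefix table. This is an illustration, not the Worker's actual code — the handler names are hypothetical — but it shows the longest-prefix-first ordering that avoids ambiguous `startsWith` matches:

```typescript
// Hypothetical sketch of prefix-based Worker routing (handler names invented).
const routes: Array<[prefix: string, handler: string]> = [
  ["/proxy/ai", "aiRelay"],
  ["/proxy/embeddings", "embeddings"],
  ["/proxy/external", "externalRelay"],
  ["/proxy/search/", "search"],
  ["/api/auth/", "auth"],
  ["/api/telegram/", "telegram"],
  ["/mcp", "mcp"],
  ["/admin-mcp", "adminMcp"],
  ["/sse/connect", "sse"],
];

// Check longest prefixes first so "/admin-mcp" can never be shadowed by "/mcp".
function route(pathname: string): string | null {
  const sorted = [...routes].sort((a, b) => b[0].length - a[0].length);
  const hit = sorted.find(([prefix]) => pathname.startsWith(prefix));
  return hit ? hit[1] : null;
}
```

A plain registration-order `find` would work too, as long as no route is a prefix of another — which, as the OAuthProvider story below shows, is easy to get wrong.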
Durable Objects
UserNotifier holds SSE connections to the phone. When Telegram or MCP creates an entry, it messages the DO, which pushes to the phone instantly. If the phone is asleep, messages queue in DO storage and flush on reconnect.
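The queue-while-asleep behavior can be modeled in a few lines. This is a simplified in-memory sketch, not the real Durable Object — in the actual DO, delivery writes to an open SSE stream and the backlog lives in DO storage:

```typescript
// Simplified model of UserNotifier's queue/flush behavior (not the real DO code).
class UserNotifier {
  private connected = false;
  private queue: string[] = [];
  private delivered: string[] = [];

  notify(event: string): void {
    if (this.connected) {
      this.delivered.push(event); // live SSE connection: push immediately
    } else {
      this.queue.push(event);     // phone asleep: buffer until reconnect
    }
  }

  connect(): string[] {
    this.connected = true;
    const backlog = this.queue.splice(0); // flush everything queued offline
    this.delivered.push(...backlog);
    return backlog;
  }

  disconnect(): void {
    this.connected = false;
  }

  get sent(): string[] {
    return this.delivered;
  }
}
```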
TameYetiMCP is a full MCP server — OAuth 2.1 authorization, five tools for Claude/Cursor to capture entries, search atoms, check pipeline status. Each session is its own DO instance with SQLite-backed state.
TameYetiAdminMCP is a separate DO (learned the hard way that `McpAgent.serve()` defaults to the same binding) for admin operations — sending Telegram messages, editing AI prompts, resetting test data, viewing errors from GlitchTip.
Vectorize
Search uses Cloudflare Vectorize with namespace-based isolation. Every vector is tagged with a `user_id` metadata field, and queries filter by it. We learned that metadata indexes must be created before vector insertion — otherwise filter queries silently return nothing. We had to re-backfill 331 vectors after adding the index.
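The per-user scoping looks roughly like this. The interface is trimmed to just the calls used, and the function names are ours, not TameYeti's — treat it as a sketch of the pattern, assuming the `user_id` metadata index already exists:

```typescript
// Sketch of user-scoped Vectorize access. VectorizeLike is a trimmed-down
// stand-in for the real binding; upsertAtom/searchAtoms are hypothetical names.
interface VectorizeLike {
  upsert(vectors: { id: string; values: number[]; metadata: Record<string, string> }[]): Promise<unknown>;
  query(vector: number[], opts: { topK: number; filter: Record<string, string> }): Promise<{ matches: { id: string }[] }>;
}

async function upsertAtom(index: VectorizeLike, userId: string, atomId: string, embedding: number[]) {
  // Tag every vector with its owner so queries can be scoped per user.
  await index.upsert([{ id: atomId, values: embedding, metadata: { user_id: userId } }]);
}

async function searchAtoms(index: VectorizeLike, userId: string, embedding: number[], topK = 10) {
  // The filter only works if the user_id metadata index existed BEFORE inserts.
  const res = await index.query(embedding, { topK, filter: { user_id: userId } });
  return res.matches.map((m) => m.id);
}
```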
What broke (and what we learned)
OAuthProvider prefix matching
Cloudflare's OAuthProvider matches API handler routes using `startsWith`. We had `/mcp` and `/mcp-admin` as routes. Guess which one matched first for `/mcp-admin`? Renamed to `/admin-mcp`.
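The trap is easy to reproduce with two lines of `startsWith` matching (this is an illustration of the failure mode, not OAuthProvider's actual internals):

```typescript
// First-match prefix routing: any route that is a prefix of another shadows it.
function firstMatch(routes: string[], path: string): string | undefined {
  return routes.find((r) => path.startsWith(r));
}

// "/mcp-admin".startsWith("/mcp") is true, so "/mcp" wins.
// Renaming the admin route to "/admin-mcp" removes the shared prefix entirely.
```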
Security hardening broke the pipeline
A security review added JWT auth to every `/proxy/*` endpoint. Correct move. But 11 files in the React Native app were using raw `fetch()` without Authorization headers. Everything 401'd. Fix: one `workerFetch` wrapper that pulls the JWT from `tokenService` and attaches it to every request.
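A wrapper like that can be sketched as a factory over `fetch`. `tokenService` is named in the post, but its exact interface here is an assumption:

```typescript
// Sketch of a single authenticated fetch wrapper. TokenSource's shape and
// makeWorkerFetch are assumptions, not TameYeti's actual signatures.
type TokenSource = { getAccessToken(): Promise<string> };

function makeWorkerFetch(tokenService: TokenSource, baseFetch: typeof fetch = fetch) {
  return async function workerFetch(input: string, init: RequestInit = {}): Promise<Response> {
    const token = await tokenService.getAccessToken();
    const headers = new Headers(init.headers);
    headers.set("Authorization", `Bearer ${token}`); // every call now carries the JWT
    return baseFetch(input, { ...init, headers });
  };
}
```

Injecting `baseFetch` keeps the wrapper testable without touching the network; callers just swap `fetch(...)` for `workerFetch(...)`.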
Atom doubling
The pipeline was running twice per entry — once from the local pipeline trigger, once from a pull-to-refresh race condition. Added an existence check before reprocessing and a `deleteAtomsByEntry` call before writing new atoms to prevent duplicates.
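The two guards together make processing idempotent. A minimal sketch, with a hypothetical store interface (`deleteAtomsByEntry` is from the post; the rest is invented for illustration):

```typescript
// Idempotency guard sketch: skip entries that already have atoms, and clear
// any stale/partial atoms before writing fresh ones.
interface AtomStore {
  atomsForEntry(entryId: string): string[];
  deleteAtomsByEntry(entryId: string): void;
  insertAtoms(entryId: string, atoms: string[]): void;
}

function processEntry(store: AtomStore, entryId: string, makeAtoms: () => string[]): boolean {
  if (store.atomsForEntry(entryId).length > 0) return false; // already processed: bail
  store.deleteAtomsByEntry(entryId);        // belt and braces: clear partial writes
  store.insertAtoms(entryId, makeAtoms());
  return true;
}
```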
Environment contamination
Local SQLite was using the same filename across environments. Dev atoms showed up in prod. Fix: `user-{id}-{env}.db`. Same for sync timestamps — scoped to environment in AsyncStorage.
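The fix is just environment-scoped naming. The filename pattern is from the post; the AsyncStorage key shape is our assumption:

```typescript
// Environment-scoped local storage names, so dev atoms can't leak into prod.
type AppEnv = "prod" | "dev" | "local";

function dbFileName(userId: string, env: AppEnv): string {
  return `user-${userId}-${env}.db`;
}

// Hypothetical key shape for the per-environment sync timestamp in AsyncStorage.
function syncTimestampKey(env: AppEnv): string {
  return `lastSync:${env}`;
}
```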
Deepgram voice transcription
Temp tokens use the `Bearer` scheme, not `Token`. React Native's WebSocket supports custom headers via a third constructor argument (unlike the browser WebSocket). Lesson: read the manual before writing integration code.
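Both fixes in one place — a sketch of how the connection could be assembled (the Deepgram URL and helper name are illustrative):

```typescript
// Sketch: Bearer (not Token) auth for temp tokens, passed via React Native's
// third WebSocket constructor argument. URL and helper name are assumptions.
function deepgramSocketArgs(tempToken: string) {
  return {
    url: "wss://api.deepgram.com/v1/listen",
    protocols: [] as string[],
    options: { headers: { Authorization: `Bearer ${tempToken}` } },
  };
}

// In React Native (the browser WebSocket has no such options argument):
//   const { url, protocols, options } = deepgramSocketArgs(token);
//   const ws = new WebSocket(url, protocols, options);
```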
By the numbers
- Before: 6 Docker containers, ~$40/month, 30s cold starts, Redis amnesia
- After: 1 Worker, ~$5/month (Workers Paid plan), sub-50ms cold starts, zero ops
- Pipeline: 5 phases run on-device, Worker handles AI inference only
- Search: Vectorize with per-user namespace isolation
- MCP: Full OAuth 2.1 + 5 user tools + 5 admin tools
- Environments: prod, dev, local — each with its own Worker, KV, and Turso DBs
The team
This was built by one human and two AI agents working in parallel:
- Blair — direction, architecture decisions, merging, prod deploys, UI work
- Lagos (Claude Opus) — backend migration, Worker code, pipeline, Turso, security fixes. Works from the terminal, ships PRs, cannot merge them.
- Hiro (Claude Sonnet) — code review, UI polish, security audit, frontend work on a parallel branch. The one who flagged the auth gap that Lagos then fixed.
Lagos and Hiro coordinate via a shared Telegram group. Blair merges. That's the whole org chart.
What's next
The backend is dead. Long live the Worker. What's left:
- CI/CD pipeline for automated Worker deploys
- Static site migration (this blog, the landing pages)
- Shut down the Docker Compose stack for good
- Historical figures generation via Durable Object alarms (stashed, coming back)
The goal was always to get to a place where the infrastructure disappears — where adding a feature means writing one function in one file, not wiring up three services. We're there now.