Killing the Backend: 60 Commits in 72 Hours
How we migrated an entire FastAPI + Celery + Redis backend to Cloudflare Workers. Auth, Telegram, MCP, the full AI pipeline, vector search — all of it. One human, two AI agents, zero downtime.
What we were running
TameYeti is a journaling app that atomizes your entries — breaking freeform text into typed, tagged, embeddable atoms using AI. Behind the scenes, that means a multi-stage pipeline: atomization, entity extraction, entity enrichment (Spotify, TMDB, Wikipedia, dictionary APIs), embedding generation, and widget computation.
The stack that powered all of this:
- FastAPI — REST API, auth, SSE push, sync endpoints
- Celery + Redis — job queues for the 5-stage AI pipeline
- Redis — sessions, cache, pub/sub for SSE, rate limiting
- Docker Compose — 6 containers (API, 2 workers, Redis, nginx, monitoring)
- Turso — per-user LibSQL databases for cloud sync
It worked. It was also expensive, slow to deploy, and had more moving parts than a Swiss watch factory. Every feature touched at least three services. Adding a Telegram bot meant wiring up a webhook handler, a Celery task, Redis pub/sub for SSE notifications, and Turso writes. Cold starts were brutal. Redis would occasionally just... forget things.
The plan
Replace everything with a single Cloudflare Worker.
Not "put a CDN in front of the API." Not "cache some responses at the edge." Replace the entire backend — auth, the AI pipeline, Telegram webhooks, MCP server, search, SSE notifications — with one Worker and a handful of Durable Objects.
The phone already had SQLite locally. The AI providers are all external APIs. Turso is serverless. What was the backend actually doing that couldn't happen at the edge?
Turns out: not much. It was a $40/month middleman.
The sprint
- Day 1 — March 20. Phase 0: deployed a CF Worker as an AI proxy — just forwarding requests to OpenAI/Anthropic with API keys the phone shouldn't hold. Then immediately started moving the pipeline onto the phone itself. Atom processing, entity extraction, enrichment, embeddings — all running locally, calling the Worker only for AI inference and external API calls (Spotify, TMDB, Wikipedia). Ran into atom doubling (the pipeline fired twice from an SSE + pull-to-refresh race). Fixed it. Enabled observability. By end of day, the full 5-phase pipeline was running from the phone through the Worker.
- Day 2 — March 21. Auth, MCP, Telegram, environment isolation. Migrated JWT auth from FastAPI to the Worker. Built an MCP server using `@cloudflare/workers-oauth-provider` plus a `McpAgent` Durable Object — OAuth 2.1 with PKCE, Google/Apple sign-in on the authorize page, five tools (capture, search, context, instructions, status). Wired up Telegram webhooks so messages go straight to the Worker, which writes to Turso and pings the phone via a UserNotifier Durable Object. Built per-environment isolation: separate Workers (prod/dev/local), separate KV namespaces, separate Turso DBs, separate SQLite filenames on the phone.
- Day 3 — March 21-22. Search, security hardening, breakage, recovery. Moved search to Cloudflare Vectorize with namespace-based user isolation. Added an admin MCP server on a separate Durable Object. Then a security review flagged every `/proxy/*` endpoint as unauthed — so we added JWT verification to all of them. Which immediately broke everything, because the phone's pipeline was still using raw `fetch()` without auth headers. Built `workerFetch` — a single authenticated wrapper — and updated 11 files across the codebase. While that was happening, Hiro was on another branch doing liquid glass UI, user panels, and paywall polish.
What the Worker actually does
One Worker. Four Durable Objects. Two KV namespaces. One Vectorize index. Here's the routing:
- `/proxy/ai` — AI inference relay (OpenRouter, Anthropic, Groq, DeepInfra)
- `/proxy/embeddings` — OpenAI text-embedding-3-small, syncs to Vectorize
- `/proxy/external` — External API relay (Spotify, TMDB, Wikipedia, Dictionary)
- `/proxy/search/*` — Raw, semantic, and smart search via Vectorize
- `/proxy/deepgram-token` — Temporary Deepgram tokens for voice transcription
- `/api/auth/*` — Google/Apple sign-in, token refresh, JWT verification
- `/api/telegram/*` — Webhook handler, writes entries to Turso
- `/sync/credentials` — Turso DB credentials for direct FE sync
- `/mcp` — MCP server (user tools via Durable Object)
- `/admin-mcp` — Admin MCP (prompt management, Telegram, test tools)
- `/sse/connect` — UserNotifier DO for real-time push to phone
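The routing above can be sketched as a simple prefix table. This is an illustration, not the Worker's actual code — the handler names are hypothetical — but it shows the longest-prefix-first ordering that avoids ambiguous `startsWith` matches:

```typescript
// Hypothetical sketch of prefix-based Worker routing (handler names invented).
const routes: Array<[prefix: string, handler: string]> = [
  ["/proxy/ai", "aiRelay"],
  ["/proxy/embeddings", "embeddings"],
  ["/proxy/external", "externalRelay"],
  ["/proxy/search/", "search"],
  ["/api/auth/", "auth"],
  ["/api/telegram/", "telegram"],
  ["/mcp", "mcp"],
  ["/admin-mcp", "adminMcp"],
  ["/sse/connect", "sse"],
];

// Check longest prefixes first so "/admin-mcp" can never be shadowed by "/mcp".
function route(pathname: string): string | null {
  const sorted = [...routes].sort((a, b) => b[0].length - a[0].length);
  const hit = sorted.find(([prefix]) => pathname.startsWith(prefix));
  return hit ? hit[1] : null;
}
```

A plain registration-order `find` would work too, as long as no route is a prefix of another — which, as the OAuthProvider story below shows, is easy to get wrong.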
Durable Objects
UserNotifier holds SSE connections to the phone. When Telegram or MCP creates an entry, it messages the DO, which pushes to the phone instantly. If the phone is asleep, messages queue in DO storage and flush on reconnect.
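The queue-while-asleep behavior can be modeled in a few lines. This is a simplified in-memory sketch, not the real Durable Object — in the actual DO, delivery writes to an open SSE stream and the backlog lives in DO storage:

```typescript
// Simplified model of UserNotifier's queue/flush behavior (not the real DO code).
class UserNotifier {
  private connected = false;
  private queue: string[] = [];
  private delivered: string[] = [];

  notify(event: string): void {
    if (this.connected) {
      this.delivered.push(event); // live SSE connection: push immediately
    } else {
      this.queue.push(event);     // phone asleep: buffer until reconnect
    }
  }

  connect(): string[] {
    this.connected = true;
    const backlog = this.queue.splice(0); // flush everything queued offline
    this.delivered.push(...backlog);
    return backlog;
  }

  disconnect(): void {
    this.connected = false;
  }

  get sent(): string[] {
    return this.delivered;
  }
}
```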
TameYetiMCP is a full MCP server — OAuth 2.1 authorization, five tools for Claude/Cursor to capture entries, search atoms, check pipeline status. Each session is its own DO instance with SQLite-backed state.
TameYetiAdminMCP is a separate DO (learned the hard way that `McpAgent.serve()` defaults to the same binding) for admin operations — sending Telegram messages, editing AI prompts, resetting test data, viewing errors from GlitchTip.
Vectorize
Search uses Cloudflare Vectorize with namespace-based isolation. Every vector is tagged with a `user_id` metadata field, and queries filter by it. We learned that metadata indexes must be created before vector insertion — otherwise filter queries silently return nothing. We had to re-backfill 331 vectors after adding the index.
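The per-user scoping looks roughly like this. The interface is trimmed to just the calls used, and the function names are ours, not TameYeti's — treat it as a sketch of the pattern, assuming the `user_id` metadata index already exists:

```typescript
// Sketch of user-scoped Vectorize access. VectorizeLike is a trimmed-down
// stand-in for the real binding; upsertAtom/searchAtoms are hypothetical names.
interface VectorizeLike {
  upsert(vectors: { id: string; values: number[]; metadata: Record<string, string> }[]): Promise<unknown>;
  query(vector: number[], opts: { topK: number; filter: Record<string, string> }): Promise<{ matches: { id: string }[] }>;
}

async function upsertAtom(index: VectorizeLike, userId: string, atomId: string, embedding: number[]) {
  // Tag every vector with its owner so queries can be scoped per user.
  await index.upsert([{ id: atomId, values: embedding, metadata: { user_id: userId } }]);
}

async function searchAtoms(index: VectorizeLike, userId: string, embedding: number[], topK = 10) {
  // The filter only works if the user_id metadata index existed BEFORE inserts.
  const res = await index.query(embedding, { topK, filter: { user_id: userId } });
  return res.matches.map((m) => m.id);
}
```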
What broke (and what we learned)
OAuthProvider prefix matching
Cloudflare's OAuthProvider matches API handler routes using `startsWith`. We had `/mcp` and `/mcp-admin` as routes. Guess which one matched first for `/mcp-admin`? Renamed to `/admin-mcp`.
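The trap is easy to reproduce with two lines of `startsWith` matching (this is an illustration of the failure mode, not OAuthProvider's actual internals):

```typescript
// First-match prefix routing: any route that is a prefix of another shadows it.
function firstMatch(routes: string[], path: string): string | undefined {
  return routes.find((r) => path.startsWith(r));
}

// "/mcp-admin".startsWith("/mcp") is true, so "/mcp" wins.
// Renaming the admin route to "/admin-mcp" removes the shared prefix entirely.
```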
Security hardening broke the pipeline
A security review added JWT auth to every `/proxy/*` endpoint. Correct move. But 11 files in the React Native app were using raw `fetch()` without Authorization headers. Everything 401'd. Fix: one `workerFetch` wrapper that pulls the JWT from `tokenService` and attaches it to every request.
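A wrapper like that can be sketched as a factory over `fetch`. `tokenService` is named in the post, but its exact interface here is an assumption:

```typescript
// Sketch of a single authenticated fetch wrapper. TokenSource's shape and
// makeWorkerFetch are assumptions, not TameYeti's actual signatures.
type TokenSource = { getAccessToken(): Promise<string> };

function makeWorkerFetch(tokenService: TokenSource, baseFetch: typeof fetch = fetch) {
  return async function workerFetch(input: string, init: RequestInit = {}): Promise<Response> {
    const token = await tokenService.getAccessToken();
    const headers = new Headers(init.headers);
    headers.set("Authorization", `Bearer ${token}`); // every call now carries the JWT
    return baseFetch(input, { ...init, headers });
  };
}
```

Injecting `baseFetch` keeps the wrapper testable without touching the network; callers just swap `fetch(...)` for `workerFetch(...)`.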
Atom doubling
The pipeline was running twice per entry — once from the local pipeline trigger, once from a pull-to-refresh race condition. Added an existence check before reprocessing and a `deleteAtomsByEntry` call before writing new atoms to prevent duplicates.
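The two guards together make processing idempotent. A minimal sketch, with a hypothetical store interface (`deleteAtomsByEntry` is from the post; the rest is invented for illustration):

```typescript
// Idempotency guard sketch: skip entries that already have atoms, and clear
// any stale/partial atoms before writing fresh ones.
interface AtomStore {
  atomsForEntry(entryId: string): string[];
  deleteAtomsByEntry(entryId: string): void;
  insertAtoms(entryId: string, atoms: string[]): void;
}

function processEntry(store: AtomStore, entryId: string, makeAtoms: () => string[]): boolean {
  if (store.atomsForEntry(entryId).length > 0) return false; // already processed: bail
  store.deleteAtomsByEntry(entryId);        // belt and braces: clear partial writes
  store.insertAtoms(entryId, makeAtoms());
  return true;
}
```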
Environment contamination
Local SQLite was using the same filename across environments. Dev atoms showed up in prod. Fix: `user-{id}-{env}.db`. Same for sync timestamps — scoped to environment in AsyncStorage.
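The fix is just environment-scoped naming. The filename pattern is from the post; the AsyncStorage key shape is our assumption:

```typescript
// Environment-scoped local storage names, so dev atoms can't leak into prod.
type AppEnv = "prod" | "dev" | "local";

function dbFileName(userId: string, env: AppEnv): string {
  return `user-${userId}-${env}.db`;
}

// Hypothetical key shape for the per-environment sync timestamp in AsyncStorage.
function syncTimestampKey(env: AppEnv): string {
  return `lastSync:${env}`;
}
```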
Deepgram voice transcription
Temp tokens use the `Bearer` scheme, not `Token`. React Native's WebSocket supports custom headers via a third constructor argument (unlike the browser WebSocket). Lesson: read the manual before writing integration code.
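Both fixes in one place — a sketch of how the connection could be assembled (the Deepgram URL and helper name are illustrative):

```typescript
// Sketch: Bearer (not Token) auth for temp tokens, passed via React Native's
// third WebSocket constructor argument. URL and helper name are assumptions.
function deepgramSocketArgs(tempToken: string) {
  return {
    url: "wss://api.deepgram.com/v1/listen",
    protocols: [] as string[],
    options: { headers: { Authorization: `Bearer ${tempToken}` } },
  };
}

// In React Native (the browser WebSocket has no such options argument):
//   const { url, protocols, options } = deepgramSocketArgs(token);
//   const ws = new WebSocket(url, protocols, options);
```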
By the numbers
- Before: 6 Docker containers, ~$40/month, 30s cold starts, Redis amnesia
- After: 1 Worker, ~$5/month (Workers Paid plan), sub-50ms cold starts, zero ops
- Pipeline: 5 phases run on-device, Worker handles AI inference only
- Search: Vectorize with per-user namespace isolation
- MCP: Full OAuth 2.1 + 5 user tools + 5 admin tools
- Environments: prod, dev, local — each with its own Worker, KV, and Turso DBs
The team
This was built by one human and two AI agents working in parallel:
- Blair — direction, architecture decisions, merging, prod deploys, UI work
- Lagos (Claude Opus) — backend migration, Worker code, pipeline, Turso, security fixes. Works from the terminal, ships PRs, cannot merge them.
- Hiro (Claude Sonnet) — code review, UI polish, security audit, frontend work on a parallel branch. The one who flagged the auth gap that Lagos then fixed.
Lagos and Hiro coordinate via a shared Telegram group. Blair merges. That's the whole org chart.
What's next
The backend is dead. Long live the Worker. What's left:
- CI/CD pipeline for automated Worker deploys
- Static site migration (this blog, the landing pages)
- Shut down the Docker Compose stack for good
- Historical figures generation via Durable Object alarms (stashed, coming back)
The goal was always to get to a place where the infrastructure disappears — where adding a feature means writing one function in one file, not wiring up three services. We're there now.