Two weeks ago, Alexey Grigorev — founder of DataTalks.Club and someone who teaches over 100,000 engineers how to build production AI systems — watched Claude Code run terraform destroy on his production infrastructure.

His database, his automated snapshots, 2.5 years of student submissions — gone in seconds.

The AI agent had decided the cleanest way to resolve what it believed were duplicate resources was to delete everything and start over.

He got lucky.

After upgrading to AWS Business Support at midnight and waiting 24 hours for an internal escalation, AWS located a hidden snapshot and restored his data.

Most people won't be that lucky.


This isn't an isolated incident.

Replit's AI agent dropped a production database during an explicit code freeze. Google's Antigravity IDE wiped a photographer's entire D: drive when asked to "clear the project cache." Amazon's own Kiro AI tool took down AWS Cost Explorer for 13 hours after deciding a minor bug fix required deleting and recreating the entire environment.

The pattern is clear: AI agents are incredibly productive — and incredibly dangerous — precisely because they have the same access we do.

They can write code, push commits, run infrastructure commands, and delete files.

They do exactly what we give them permission to do — and sometimes quite a bit more.

I use AI agents every day. I give them write access to my repositories.

And none of the scenarios above keep me up at night — because I built an infrastructure layer that makes destruction recoverable by design.


Layer 1: Mirroring GitHub Locally with Forgejo

Every repository I own on GitHub is automatically mirrored to a self-hosted Forgejo instance running on my Kubernetes cluster.

Forgejo is a lightweight, community-driven fork of Gitea — a full Git forge with a browsable UI, API, and SSH access.

The mirroring works on two levels:

Passive sync — Forgejo polls GitHub every hour and pulls down any changes. This catches everything, even repositories I haven't touched in months.

Real-time sync — Every git push to GitHub fires a webhook to a receiver service I built. It validates the HMAC-SHA256 signature, triggers an immediate Forgejo mirror sync, and then kicks off a backup job.
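For reference, GitHub's X-Hub-Signature-256 header is an HMAC-SHA256 of the raw request body, hex-encoded and prefixed with sha256=. A minimal sketch of the check the receiver performs — the secret and payload here are made-up stand-ins:

```shell
# Recompute the signature GitHub would send for a given body.
# SECRET and PAYLOAD are hypothetical stand-ins for the real values.
SECRET="my-webhook-secret"
PAYLOAD='{"ref":"refs/heads/main"}'

EXPECTED="sha256=$(printf '%s' "$PAYLOAD" \
  | openssl dgst -sha256 -hmac "$SECRET" | awk '{print $2}')"

echo "$EXPECTED"
# Compare EXPECTED against the X-Hub-Signature-256 header using a
# constant-time comparison before acting on the payload.
```

If the recomputed value doesn't match the header, the request didn't come from GitHub (or the body was tampered with) and gets dropped.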

The entire pipeline — a push lands on GitHub, gets mirrored locally, gets backed up to S3 — completes in under 30 seconds.

The webhook receiver runs as a small Flask app inside the cluster with its own RBAC permissions to create Kubernetes Jobs on demand. GitHub reaches it through a Cloudflare Tunnel, so there's no port forwarding or firewall holes.

Why this matters: If GitHub goes down tomorrow, I point my git remotes to git.xmojo.net and keep working. No downtime, no scrambling, no waiting for GitHub's status page to turn green.
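Switching an existing clone over is one command per repository. A sketch with a hypothetical repository path:

```shell
# Set up a clone that points at GitHub (path is hypothetical).
git init -q demo && cd demo
git remote add origin git@github.com:someuser/demo.git

# Repoint it at the Forgejo mirror; history and branches are untouched.
git remote set-url origin git@git.xmojo.net:someuser/demo.git
git remote -v   # fetch and push now hit the local mirror
```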

If my GitHub account gets compromised or locked, every line of code — including private repositories — is already sitting on hardware I control.


Layer 2: Caching Every Container Image I Use

I run Harbor as a pull-through proxy cache in front of four upstream registries, Docker Hub among them.

Every time my cluster pulls an image, Harbor intercepts the request, fetches it from upstream if it's not cached, and stores a local copy.
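From the cluster's point of view, the only visible change is the image reference: pulls go through a Harbor proxy project instead of straight to the upstream registry. A sketch of that rewrite, with a hypothetical Harbor hostname and project name:

```shell
# Map an upstream Docker Hub reference to its proxied form.
# harbor.local and dockerhub-proxy are hypothetical names.
proxied() {
  case "$1" in
    */*) echo "harbor.local/dockerhub-proxy/$1" ;;
    *)   echo "harbor.local/dockerhub-proxy/library/$1" ;;  # official images live under library/
  esac
}

proxied nginx:1.27              # → harbor.local/dockerhub-proxy/library/nginx:1.27
proxied grafana/grafana:11.2.0  # → harbor.local/dockerhub-proxy/grafana/grafana:11.2.0
```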

Rate limits — Docker Hub caps anonymous pulls at 100 per six hours. When you're running 50+ containers across multiple nodes and doing rolling updates, you hit that wall quickly. With Harbor in front, upstream sees one pull per unique image. My cluster sees unlimited pulls from the local cache.

Speed — Pulling a 500MB image from Docker Hub over the internet takes time. Pulling it from a cache on the same 10GbE network the cluster runs on is nearly instant.

Resilience — If Docker Hub is having a bad day (and it regularly does), my deployments don't care. The images are already local.


Layer 3: Immutable, Append-Only Backups to AWS S3

Here's the thing about mirrors: they mirror everything.

Including mistakes.

If an AI coding agent goes rogue and force-pushes garbage to a repository — or if a bad script deletes files and commits the result — Forgejo will dutifully sync that destruction within minutes.

The mirror doesn't protect you from bad changes. It replicates them.

This is the gap the S3 backup layer fills, and it's built on one non-negotiable principle:

Data flows in, never out.

The S3 bucket is configured as append-only. Backup jobs can write bundles to it. Nothing — not the CronJob, not the webhook receiver, not even the IAM credentials the cluster holds — can delete objects from the bucket.
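One way to enforce this is a bucket policy that denies deletion to every principal, so even valid credentials in the cluster can't remove objects (S3 Object Lock in compliance mode is another route). A sketch, with a hypothetical bucket name and the apply step commented out:

```shell
# Deny object deletion to all principals; writes still succeed.
# The bucket name my-git-backups is hypothetical.
cat > policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyDelete",
    "Effect": "Deny",
    "Principal": "*",
    "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
    "Resource": "arn:aws:s3:::my-git-backups/*"
  }]
}
EOF
# aws s3api put-bucket-policy --bucket my-git-backups --policy file://policy.json
```

Because the Deny applies to every principal, removing it requires editing the bucket policy itself — something compromised cluster credentials can't do.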

If my cluster gets compromised, if an AI agent runs amok, or if I accidentally rm -rf the wrong directory, the backups remain untouchable.

Every repository gets backed up as a complete Git bundle — the entire history, every branch, every tag — to AWS S3.

Daily full backups — A Kubernetes CronJob runs at 3 AM UTC, clones every repository I own using git clone --mirror, creates a bundle containing the complete history, and syncs the full set to S3.
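Per repository, the nightly job boils down to a mirror clone and a bundle. A self-contained sketch — a throwaway local repo stands in for the real GitHub clone URL, and the upload is commented out:

```shell
# Stand-in for the upstream repository (normally a GitHub clone URL).
git init -q -b main upstream
git -C upstream -c user.email=ci@example.com -c user.name=ci \
  commit -q --allow-empty -m "initial"

# The nightly cycle: mirror clone, then bundle the complete history.
git clone -q --mirror upstream upstream-mirror
git -C upstream-mirror bundle create ../repo.bundle --all  # every branch and tag

# aws s3 cp repo.bundle s3://my-git-backups/daily/repo.bundle
```

A --mirror clone carries every ref, so the resulting bundle is a complete, standalone copy of the repository's history.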

Per-push snapshots — When the webhook receiver fires on a push event, it also creates a timestamped snapshot of that specific repository in S3. If an AI agent pushes a destructive commit at 2:47 PM, I have the clean state from 2:46 PM sitting in S3, untouched and unmodifiable.
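The snapshot key just embeds a UTC timestamp, so successive pushes never overwrite each other. A sketch, with a hypothetical bucket and repository name:

```shell
# Build a unique, time-ordered key for this push's snapshot.
STAMP="$(date -u +%Y%m%dT%H%M%SZ)"
KEY="snapshots/demo/${STAMP}.bundle"
echo "$KEY"
# aws s3 cp demo.bundle "s3://my-git-backups/${KEY}"
```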

Recovery is a single command:

git clone repo.bundle

Push it to a new remote and you're back in business with the full history intact.

And the cost? Negligible. Source code compresses extremely well. S3 lifecycle policies automatically transition older backups to Glacier Deep Archive after 30 days, dropping storage costs to fractions of a penny per gigabyte.
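The transition is a single lifecycle rule on the bucket. A sketch, again with a hypothetical bucket name and the apply step commented out:

```shell
# Move every object to Glacier Deep Archive 30 days after creation.
cat > lifecycle.json <<'EOF'
{
  "Rules": [{
    "ID": "archive-old-bundles",
    "Status": "Enabled",
    "Filter": {"Prefix": ""},
    "Transitions": [{"Days": 30, "StorageClass": "DEEP_ARCHIVE"}]
  }]
}
EOF
# aws s3api put-bucket-lifecycle-configuration --bucket my-git-backups \
#   --lifecycle-configuration file://lifecycle.json
```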

We're talking single-digit dollars per year for complete, immutable protection of every repository you own.


Layer 4: Automated Dependency Management

Mirroring and backups handle availability and disaster recovery.

But there's another risk: running outdated software with known vulnerabilities because nobody noticed an update was available.

Renovate Bot runs hourly against my infrastructure repository, scanning Helm charts, container image tags, and Kubernetes manifests for available updates.

It waits three days after a new release before opening a PR (letting the community find the worst bugs first), rate-limits to two PRs per hour to avoid noise, and never auto-merges.
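In Renovate's configuration, those behaviors map to a handful of options. A sketch of the relevant fragment — the rest of the config is omitted, and the exact option set in my repo may differ:

```shell
# Wait three days after a release, cap PRs at two per hour, and never
# auto-merge — written as the renovate.json fragment it lives in.
cat > renovate.json <<'EOF'
{
  "$schema": "https://docs.renovatebot.com/renovate-schema.json",
  "minimumReleaseAge": "3 days",
  "prHourlyLimit": 2,
  "automerge": false
}
EOF
```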

Every update gets reviewed.

This keeps the entire stack current without me having to manually check changelogs for 50+ services.


The Justification

"Isn't this overkill?"

Ask Alexey Grigorev, who spent a sleepless night on the phone with AWS support hoping a hidden snapshot existed.

Ask Jason Lemkin, whose Replit agent fabricated 4,000 fake database records to cover up the production data it had just destroyed.

Ask Nick Davidov, who nearly lost 15 years of family photos because he told Claude to "organize" his wife's desktop.

AI agents are the most powerful tools we've ever had for writing software.

I'm not going to stop using them.

But I'm also not going to pretend they're safe by default.

The principles are straightforward, and so is the implementation.

The entire system runs on Kubernetes with about ten manifests total. No special tooling. No expensive SaaS. No vendor lock-in.

Just standard open-source components (Forgejo, Harbor, Renovate, AWS CLI) wired together with webhooks and CronJobs.

The best part? I built it once. It runs itself.

And the next time an AI agent tries to terraform destroy my world, it will discover something important:

The only thing it can't touch is the one thing that matters most — the backups.