The Code Is the New Data. Nobody Is Recovering It Right.

This is the forward-looking argument. Most of what follows isn't shipping yet. Some of it doesn't have a vendor behind it. All of it is either already a problem in your environment or will be within the next eighteen months.

We've spent a lot of time talking about what "protect and recover" means for data. This piece is about code, and the distinction between the two is collapsing.

I've spent 20 years at the C-level as a CISO, CIO, and CTO, and the code layer is now the part of recovery nobody has handled.

The vibe coding security problem

In 2025, 29 million secrets were leaked on GitHub. API keys. Database credentials. Service tokens. Access certificates. The single largest driver was AI-assisted coding: developers using tools like Cursor, Copilot, and Claude to write code faster than their credential hygiene practices were designed to handle.

This is not a small problem dressed up with a large number. Researchers analyzing publicly available applications built with vibe coding platforms found over 2,000 vulnerabilities in the sample they studied. More than 400 exposed secrets. 175 instances of PII committed directly to code repositories. In public repos, on the open internet, findable by anyone.

The enterprise version of this problem is bigger and less visible. Enterprise repos aren't public. But the same developers using the same AI tools with the same habits are committing to your internal repositories every day.

Your backup strategy almost certainly doesn't cover what's in those repos. Your security tooling is probably not scanning them. Your recovery plan doesn't account for what happens when a repo containing embedded credentials gets exfiltrated or encrypted.

The stateless vs. stateful split

Before we talk about how backup fits into the code recovery story, we need to address the container question. Because the answer is not what most backup vendors want to admit.

If a stateless container goes down, restoring it from backup is often the wrong answer. Redeploying it from source, through the CI/CD pipeline and the GitOps workflow, is faster, cleaner, and better. You get the current declared configuration, not the configuration that existed at backup time, and no drift from what's in source. A restore from backup inherits whatever drift existed at the moment the backup ran. Ideally that's zero. It's not always zero.

This is an important architectural truth that backup vendors should be honest about. For stateless workloads, the backup isn't the recovery. The source repo and the deployment pipeline are the recovery. Backup for stateless containers is primarily useful for compliance, audit history, and forensics. Not operational recovery.
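To make the drift point concrete, here is a minimal Python sketch, assuming a workload's configuration can be flattened into key/value pairs. `config_drift` and the sample values are hypothetical. A redeploy from source applies `desired` exactly; a restore from backup brings back every difference this function reports.

```python
# Hypothetical example: configuration flattened into key/value pairs.
def config_drift(desired: dict, backed_up: dict) -> dict:
    """Return {key: (desired_value, backed_up_value)} for every key that differs.

    A key missing on one side shows up as None on that side.
    """
    drift = {}
    for key in sorted(desired.keys() | backed_up.keys()):
        if desired.get(key) != backed_up.get(key):
            drift[key] = (desired.get(key), backed_up.get(key))
    return drift


if __name__ == "__main__":
    # Desired state as declared in the repo (what a redeploy applies).
    desired = {"image": "api:1.4.2", "replicas": 3, "LOG_LEVEL": "info"}
    # State captured by the backup job, including an untracked manual hotfix.
    backed_up = {"image": "api:1.4.2", "replicas": 5,
                 "LOG_LEVEL": "debug", "HOTFIX_FLAG": "true"}
    for key, (want, have) in config_drift(desired, backed_up).items():
        print(f"{key}: source={want!r} backup={have!r}")
```

An empty result is the "ideally zero" case; anything else is drift you would be restoring along with the container.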

Stateful workloads are a completely different conversation. The persistent data (database volumes, configuration state, secrets, user data) is what backup protects. And backing up stateful containers properly is genuinely hard. The mature backup tools can do it today. The challenge is getting the consistency guarantees right across the container and its persistent storage: a database volume captured mid-write, or out of step with the other volumes it depends on, may not restore cleanly.
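The split can be written down as a per-workload recovery plan. This is a minimal Python sketch, assuming each workload's persistent volumes are already known (for example, from its deployment manifests); `Workload` and `recovery_plan` are hypothetical names. Every workload comes back from source; only workloads with persistent state need anything from backup.

```python
from dataclasses import dataclass, field


@dataclass
class Workload:
    name: str
    # Persistent volumes the workload mounts (e.g. database data directories).
    persistent_volumes: list[str] = field(default_factory=list)


def recovery_plan(w: Workload) -> list[str]:
    """Return ordered recovery steps for one workload."""
    # Stateful data first, so the application starts against restored state.
    steps = [f"restore volume {vol} from the last consistent backup"
             for vol in w.persistent_volumes]
    # The container itself always comes back from source via the pipeline.
    steps.append(f"redeploy {w.name} from source via the CI/CD pipeline")
    return steps


if __name__ == "__main__":
    for wl in (Workload("web-frontend"), Workload("orders-db", ["orders-data"])):
        print(wl.name, recovery_plan(wl))
```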

The real value backup vendors could unlock is integrating backup into the CI/CD and GitOps pipeline itself. Not replacing the pipeline. Adding the security and resilience layer the pipeline currently lacks. Scan the repo before the commit lands. Detect secrets before they hit history. Audit dependencies before they're pulled into production. Flag PII in test fixtures before they're committed. This is where backup intelligence applied to the code layer becomes a genuinely new capability.
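The "scan before the commit lands" step can be sketched as a gate over staged file contents. This is a minimal Python illustration, not how any particular product works: the three regex rules, `scan_text`, and `gate_commit` are hypothetical. In practice you would call an existing scanner from a pre-commit hook, feeding it the files listed by `git diff --cached --name-only`.

```python
import re

# Illustrative rules only (assumptions for this sketch); real scanners ship far
# larger, tuned rule sets plus entropy checks and provider-specific validators.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    "hardcoded_password": re.compile(r"(?i)password\s*[:=]\s*['\"][^'\"]{8,}['\"]"),
}


def scan_text(path: str, text: str) -> list[tuple[str, int, str]]:
    """Return (path, line_number, rule_name) for every line matching a rule."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((path, lineno, rule))
    return findings


def gate_commit(staged: dict[str, str]) -> bool:
    """Return True if staged files are clean, False to block the commit."""
    findings = [f for path, text in staged.items() for f in scan_text(path, text)]
    for path, lineno, rule in findings:
        print(f"BLOCKED {path}:{lineno} matches {rule}")
    return not findings
```

Blocking here is the whole point: a finding at this stage never reaches history, so there is nothing to purge later.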

The NDI problem in your codebase

Your AI coding agent is a Non-Deterministic Identity (NDI) operating on your codebase.

The distinction is between three kinds of identity: human identities; non-human identities, such as service accounts; and the new category of AI agents, whose behavior is probabilistic rather than deterministic. Service accounts do what they're configured to do. AI agents reason toward a goal and find creative paths to get there.

An AI coding tool with access to your codebase will explore that codebase to complete its task. It will read files it wasn't explicitly pointed to. It will suggest dependencies based on what it found. It will incorporate context from one part of the repo into code it writes for another. That's the feature. The intelligence comes from traversal and synthesis.

Now ask: what did it read? What context did it incorporate? If a secret was sitting in a config file three directories away from where the developer was working, and the AI tool found it and used it as context for a recommendation, is that secret now embedded somewhere in the generated code?

These are not hypothetical scenarios. They're the kind of thing that doesn't show up in a security audit because nobody thought to look for it.
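One narrow, mechanical place to start looking is to check AI-generated code for verbatim copies of values that live in config files elsewhere in the repo. A minimal Python sketch, assuming `.env`-style `KEY=VALUE` config files; `parse_env` and `leaked_values` are hypothetical helpers. It only catches verbatim reuse: a value the tool split, reformatted, or encoded would slip past it.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines from a .env-style file, skipping blanks and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def leaked_values(config_files: dict[str, str], generated_code: str,
                  min_len: int = 8) -> list[tuple[str, str]]:
    """Return (config_path, key) for each config value reappearing in generated code.

    Values shorter than min_len (ports, booleans) are ignored to avoid noise.
    """
    hits = []
    for path, text in config_files.items():
        for key, value in parse_env(text).items():
            if len(value) >= min_len and value in generated_code:
                hits.append((path, key))
    return hits
```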

Your recovery plan for your codebase needs to account for what your AI tools have been doing in it.

What a GitOps recovery roadmap should mean

Some backup vendors have signaled intent to extend threat intelligence and malware detection into GitOps repositories. The direction is right and the timing makes sense, even where the announcements are still directional, with no GA date and no feature-complete description.

The architectural advantage a backup vendor has here is that they already hold backup copies of your data. Extending scanning to backup copies of repositories, looking for secrets, PII, dependency anomalies, and supply chain indicators, is a natural extension of capabilities they already run against backup data. They don't have to build a new product from scratch. They have to point existing intelligence at a new target.

The vendor that executes on this would be the first backup vendor in this space. The current players in Git security scanning are purpose-built tools: GitGuardian, Jit, Snyk, Semgrep, Trivy. Good products. But they don't have the longitudinal backup history that gives you forensic context. Knowing that a secret appeared in a commit eighteen months ago is different from knowing it appeared in a repo for which you hold 180 days of backup history, history that tells you exactly when it was introduced and what changed around it.

That forensic context is genuinely valuable, and it's something only a company with a long backup history could provide.
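The forensic question itself is simple to state in code. A minimal Python sketch, assuming each backup snapshot of a repo can be read as a mapping of file paths to contents; `first_seen` and `changed_files` are hypothetical helpers, not any vendor's API. The first finds the earliest snapshot containing a given secret; the second lists what else changed between that snapshot and the one before it.

```python
from datetime import date
from typing import Optional

Snapshot = tuple[date, dict[str, str]]  # (backup_date, {file_path: file_text})


def first_seen(snapshots: list[Snapshot], secret: str) -> Optional[tuple[date, str]]:
    """Return (backup_date, path) of the earliest snapshot containing `secret`, or None."""
    # Sort by date so callers can pass snapshots in any order.
    for taken, files in sorted(snapshots, key=lambda s: s[0]):
        for path, text in sorted(files.items()):
            if secret in text:
                return taken, path
    return None


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Paths added, removed, or modified between two snapshots."""
    return sorted(p for p in before.keys() | after.keys()
                  if before.get(p) != after.get(p))
```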

What to do now, before the tooling catches up

Don't wait for a vendor to solve this.

Scan your repos today. Trivy, GitGuardian, and similar tools exist and work. Run a scan on your most sensitive repos this week. Find out what's already there. The results will be uncomfortable. Do it anyway.
Build a secrets policy. Define what can and can't be in a repo. Make it explicit. Enforce it in your pre-commit hooks before anything lands in history. Purging secrets from git history after the fact is painful, incomplete, and time-consuming, and it does nothing about copies that were already cloned or pushed: the exposed credential still has to be rotated. Prevention is far cheaper.
Treat your CI/CD pipeline as a recovery path for stateless workloads. Know which of your containerized workloads can be redeployed from source and which require backup for their stateful data. Build the runbook for each. Don't find out which category something falls into during an incident.
Inventory your AI coding tool access. Know which tools have access to which repos. Know what data they can see. This is an identity and access management problem as much as a security scanning problem. Your AI coding tools are NDIs with codebase access. Govern them like it.
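Governing AI coding tools as identities starts with an access inventory. A minimal Python sketch, assuming you can export (tool, repo) grants from your source control platform or the tools' admin consoles; `Grant`, `sensitive_exposure`, and the tool names are hypothetical. The output is the review list to start with: which non-deterministic identities can read the repos you've marked sensitive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    tool: str  # AI coding tool or agent identity (hypothetical names)
    repo: str  # repository the tool can read


def sensitive_exposure(grants: list[Grant], sensitive_repos: set[str]) -> dict[str, list[str]]:
    """Map each tool to the sorted list of sensitive repos it can read."""
    exposure: dict[str, set[str]] = {}
    for g in grants:
        if g.repo in sensitive_repos:
            exposure.setdefault(g.tool, set()).add(g.repo)
    return {tool: sorted(repos) for tool, repos in sorted(exposure.items())}


if __name__ == "__main__":
    grants = [Grant("assistant-a", "payments-api"), Grant("agent-b", "infra-terraform"),
              Grant("assistant-c", "docs-site")]
    print(sensitive_exposure(grants, {"payments-api", "infra-terraform"}))
```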

To enterprise IT leaders: Your AI coding tools are generating intellectual property and security risk simultaneously. Both need to be in your recovery and resilience plan.

To CISOs: The repo is the new attack surface for AI-era enterprises. Secret scanning and dependency auditing aren't optional. They're table stakes for any organization doing meaningful software development with AI assistance.

To vendors: The backup vendor that integrates threat intelligence into the CI/CD pipeline owns a category that doesn't exist yet. Build it correctly. Self-service scanning, honest dependency auditing, and paths that don't always lead to buying more of your product. Win the developer trust conversation that every security vendor is trying to have.