Setting Up Your Three-Layer Backup

This is Part 3 of a series. In Part 1, I argued that open source licenses protect your code but not the infrastructure around it. In Part 2, I surveyed the landscape of options for keeping your code somewhere other than GitHub.

Today: I am setting it up.

At the end of Part 2, I said the strategy I would pick has three layers: an archive, a live mirror, and a local backup. Here is what that looks like in practice, from start to finish.


Layer 1: Software Heritage (The Archive)

Software Heritage automatically crawls most public GitHub repositories, but “most” is not “all,” and crawl frequency varies. The first step is confirming your repos are actually in the archive.

Check if your repo is archived

Use the Software Heritage API to check the last visit:
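The sketch below shows that check and assumes curl is installed. The `visit/latest` route is part of the Software Heritage REST API. The helper function names and the example repository URL are my own placeholders.

```shell
#!/usr/bin/env bash
# Sketch: ask Software Heritage about the latest visit of one origin.
# Assumes curl; function names and the example URL are placeholders.

# Build the "latest visit" API URL; the origin URL goes into the path as-is.
swh_latest_visit_url() {
  printf 'https://archive.softwareheritage.org/api/1/origin/%s/visit/latest/\n' "$1"
}

# Print the visit JSON. With --fail, curl exits non-zero on HTTP errors,
# including the 404 returned for an origin the archive has never visited.
swh_latest_visit() {
  curl --silent --show-error --fail "$(swh_latest_visit_url "$1")"
}

# Example (makes a network call):
#   swh_latest_visit https://github.com/your-username/your-repo
```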

If the response includes "status": "full" and a recent "date", you are covered. If you get a 404, the repo has never been crawled.

Trigger an archive if needed

For repos that have not been crawled (or were last visited months ago), use “Save Code Now”:

  1. Go to archive.softwareheritage.org/save/
  2. Select “git” as the origin type
  3. Enter the full repository URL (e.g., https://github.com/your-username/your-repo)
  4. Submit

The visit is queued and typically completes within a few hours.[1] You can also do this via the API:
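Here is a sketch of the API route for the same request, assuming curl. Software Heritage exposes Save Code Now as a POST to `/api/1/origin/save/<visit_type>/url/<origin_url>/`. The function names are hypothetical, and unauthenticated calls are rate-limited.

```shell
#!/usr/bin/env bash
# Sketch: submit a "Save Code Now" request for a git origin.
# Assumes curl; function names and the example URL are placeholders.

# Build the save-request URL; the visit type is fixed to "git" here.
swh_save_url() {
  printf 'https://archive.softwareheritage.org/api/1/origin/save/git/url/%s/\n' "$1"
}

# POST the request; the JSON response describes the queued save request.
swh_save_now() {
  curl --silent --show-error --fail -X POST "$(swh_save_url "$1")"
}

# Example (makes a network call):
#   swh_save_now https://github.com/your-username/your-repo
```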

Verify the archive

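Once the visit completes, you can browse your archived code in the Software Heritage web interface at archive.softwareheritage.org by searching for your repository URL.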

Every commit in the archive gets a SWHID (SoftWare Hash IDentifier). For git commits, the SWHID revision hash is the same as the git commit SHA, so swh:1:rev:abc123... maps directly to commit abc123... in your repo.
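Because the revision hash is the commit SHA, you can compute a commit's SWHID locally and then ask the archive whether it knows that SWHID. This sketch assumes git and curl and uses the API's `/api/1/resolve/<swhid>/` route. The helper names are my own.

```shell
#!/usr/bin/env bash
# Sketch: derive a commit's SWHID locally and build the URL that resolves it.
# Relies on the rule that a git commit's SWHID rev hash equals its SHA.

# Print "swh:1:rev:<sha>" for a revision (default HEAD) of the repo at $1.
commit_swhid() {
  local sha
  sha="$(git -C "$1" rev-parse --verify "${2:-HEAD}^{commit}")" || return 1
  printf 'swh:1:rev:%s\n' "$sha"
}

# URL of the Software Heritage resolve endpoint for a SWHID.
swhid_resolve_url() {
  printf 'https://archive.softwareheritage.org/api/1/resolve/%s/\n' "$1"
}

# Example (the curl call should fail if the commit is not archived yet):
#   id="$(commit_swhid ~/src/your-repo)"
#   curl --fail "$(swhid_resolve_url "$id")"
```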

Batch-check multiple repos

If you have more than a handful of repositories, checking them one at a time gets tedious. Here is a bash script that checks all public repos for a GitHub user:
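Below is a minimal sketch of that idea, assuming curl, jq, and GNU date are installed. It lists a user's repos through the GitHub REST API (`/users/<user>/repos`), skips forks, and labels each repo [OK], [STALE], or [MISSING]. The 90-day staleness threshold (`MAX_AGE_DAYS`) is an assumption on my part, so set it to whatever "recent" means to you.

```shell
#!/usr/bin/env bash
# Sketch: check a GitHub user's public, non-fork repos against Software Heritage.
# Assumes curl, jq and GNU date. Unauthenticated API rate limits apply to both services.
# MAX_AGE_DAYS is an assumed threshold for calling a visit "stale".
MAX_AGE_DAYS="${MAX_AGE_DAYS:-90}"

# classify STATUS VISIT_EPOCH NOW_EPOCH -> prints OK, STALE or MISSING
classify() {
  local status="$1" visit_epoch="$2" now_epoch="$3" age_days
  # No visit data at all: unknown to the archive (or the lookup failed)
  if [ -z "$status" ]; then echo MISSING; return 0; fi
  age_days=$(( (now_epoch - visit_epoch) / 86400 ))
  # Only a recent, complete ("full") visit counts as covered
  if [ "$status" = "full" ] && [ "$age_days" -le "$MAX_AGE_DAYS" ]; then
    echo OK
  else
    echo STALE
  fi
}

check_user() {
  local user="$1" page=1 now repos url visit status visit_date visit_epoch
  now="$(date +%s)"
  while :; do
    # Up to 100 repos per page; stop at the first empty page
    repos="$(curl -sSf "https://api.github.com/users/$user/repos?type=owner&per_page=100&page=$page")" || return 1
    [ "$(printf '%s' "$repos" | jq 'length')" -eq 0 ] && break
    for url in $(printf '%s' "$repos" | jq -r '.[] | select(.fork | not) | .html_url'); do
      # A 404 (never visited) or any other error leaves $visit empty
      visit="$(curl -sf "https://archive.softwareheritage.org/api/1/origin/$url/visit/latest/")" || visit=""
      status="$(printf '%s' "$visit" | jq -r '.status // empty')"
      visit_date="$(printf '%s' "$visit" | jq -r '.date // empty')"
      visit_epoch=0
      # Keep only the YYYY-MM-DD part; day precision is enough here
      [ -n "$visit_date" ] && visit_epoch="$(date -d "${visit_date%%T*}" +%s)"
      printf '[%s] %s\n' "$(classify "$status" "$visit_epoch" "$now")" "$url"
    done
    page=$((page + 1))
  done
}

# Run only when given a username, e.g.: bash check-swh.sh your-username
if [ "$#" -gt 0 ]; then check_user "$1"; fi
```

In this sketch, a network error and a genuine 404 both show up as [MISSING]. Rerun the check before submitting anything.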

And the PowerShell equivalent. Save this as Check-SoftwareHeritage.ps1 and run it from any directory. It has full comment-based help, so Get-Help .\Check-SoftwareHeritage.ps1 -Full works out of the box.

Here are some example command lines to get you started:

The script produces color-coded output: [OK] in green, [MISSING] in red, [STALE] in yellow. If something goes wrong with the API, you will see [ERROR], [RATELIM], or [BLOCKED] with an explanation of what happened and what to do about it. Forked repos are skipped by default since the upstream maintainer is typically responsible for archiving the original.

If you ran the script without the submission switches, any repo showing [MISSING] or [STALE] still needs a “Save Code Now” submission. Submit it manually, or rerun the script with the appropriate switch.

Total effort: About 15 minutes for a personal collection of 20-30 repos. Zero ongoing maintenance; Software Heritage handles future crawls automatically.

Layer 2: Codeberg Mirror (The Live Copy)

A Codeberg mirror gives you a functional forge that stays in sync with GitHub. If GitHub becomes unavailable, you can point people to Codeberg and they can clone, browse, and (if you choose to accept contributions there) open issues.

Step 1: Create a Codeberg account

Sign up at codeberg.org. Use the same username as GitHub if it is available; this makes URLs predictable.

Step 2: Create the mirror repository on Codeberg

For each repo you want to mirror, create a new empty repository on Codeberg. Do not initialize it with a README, license, or .gitignore. It needs to be completely empty so the first push from GitHub succeeds cleanly.

Step 3: Generate an SSH key pair

Create a dedicated key pair for the mirror workflow. This key should not be your personal SSH key; it is a deploy key with a single purpose.
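Here is the key generation step as a sketch. The command is wrapped in a small function (my own name) that refuses to overwrite an existing key. The `-C` comment is just an arbitrary label.

```shell
#!/usr/bin/env bash
# Sketch: create the dedicated mirror key pair (private key + .pub file).
# The -C comment "codeberg-mirror" is only a label; change it freely.

make_mirror_key() {
  local key_path="${1:-$HOME/.ssh/codeberg-mirror}"
  # Never clobber an existing key (ssh-keygen would otherwise prompt)
  if [ -e "$key_path" ]; then
    echo "refusing to overwrite $key_path" >&2
    return 1
  fi
  mkdir -p -m 700 "$(dirname "$key_path")" || return 1
  # -t ed25519: key type; -f: output path; -N "": no passphrase; -q: quiet
  ssh-keygen -q -t ed25519 -C "codeberg-mirror" -f "$key_path" -N ""
}

# Usage: make_mirror_key    # writes ~/.ssh/codeberg-mirror and ~/.ssh/codeberg-mirror.pub
```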

The -N "" flag creates the key without a passphrase. This is required for unattended CI/CD use; GitHub Actions cannot type a passphrase interactively. The private key is protected by GitHub’s encrypted secrets storage rather than a passphrase, so this is the expected pattern for deploy keys.

This creates two files: ~/.ssh/codeberg-mirror (private key) and ~/.ssh/codeberg-mirror.pub (public key). On Windows, ssh-keygen ships with the built-in OpenSSH client (Windows 10 1809 and later), and the same command works from PowerShell or Command Prompt. If the tilde does not resolve, use $env:USERPROFILE\.ssh\codeberg-mirror in PowerShell or %USERPROFILE%\.ssh\codeberg-mirror in Command Prompt instead of ~/.ssh/codeberg-mirror.

A word about secrets. The private key file (codeberg-mirror, no .pub extension) must never be committed to a repository, pasted into a public issue, or shared in a chat log. Only the public key goes to Codeberg; only the GitHub Secrets web UI should ever see the private key. If you accidentally expose it, delete the Codeberg deploy key immediately and generate a new pair. Consider adding **/codeberg-mirror to your global .gitignore as an extra safeguard.
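Here is one way to do that from the shell, as a sketch. It reuses whatever `core.excludesFile` you already have configured. Otherwise it falls back to `~/.gitignore_global`, a file name I picked for illustration.

```shell
#!/usr/bin/env bash
# Sketch: add the private key's file name to your global gitignore.
# Falls back to ~/.gitignore_global (an illustrative name) if no excludes file is set.

ignore_mirror_key_globally() {
  local ignore_file
  ignore_file="$(git config --global --get core.excludesFile)"
  if [ -z "$ignore_file" ]; then
    ignore_file="$HOME/.gitignore_global"
    git config --global core.excludesFile "$ignore_file" || return 1
  fi
  # git accepts a leading "~/" in this setting; expand it ourselves (drop 2 chars)
  case "$ignore_file" in "~/"*) ignore_file="$HOME/${ignore_file#??}" ;; esac
  touch "$ignore_file" || return 1
  # Append the pattern only if it is not already present
  grep -qxF '**/codeberg-mirror' "$ignore_file" ||
    printf '%s\n' '**/codeberg-mirror' >> "$ignore_file"
}
```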

Step 4: Add the public key to Codeberg

In Codeberg, go to your mirror repository, then Settings, then Deploy Keys. Add the contents of ~/.ssh/codeberg-mirror.pub as a deploy key. Enable “Write” access; the mirror needs to push.

If you are mirroring multiple repos, you can add the public key to your Codeberg account settings (SSH/GPG Keys) instead of per-repo, which covers all repos at once.

Step 5: Add the private key to GitHub Actions

In your GitHub repository, go to Settings, then Secrets and variables, then Actions. Create a new repository secret named CODEBERG_SSH_KEY and paste the contents of ~/.ssh/codeberg-mirror (the private key).

Step 6: Create the workflow

Create .github/workflows/mirror-codeberg.yml in your repository:

What this does:

  • fetch-depth: 0 clones the full history, not a shallow copy
  • webfactory/ssh-agent loads the private key into the SSH agent for the workflow run
  • ssh-keyscan adds Codeberg’s host key so SSH does not prompt for confirmation
  • --force --mirror pushes all refs (branches, tags) and prunes anything on Codeberg that no longer exists on GitHub[5]
  • Triggers on pushes to main, branch deletions, and manual dispatch
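The core of the workflow is the push step. Below is a sketch of the same idea as plain shell, which you can run by hand to seed a mirror or test the Codeberg key. The function names are my own. One deliberate difference: it fetches only branches and tags into a scratch bare repository instead of using `git clone --mirror`, which would also copy GitHub's read-only `refs/pull/*` refs.

```shell
#!/usr/bin/env bash
# Sketch of the workflow's push step, runnable by hand.

# Equivalent of the ssh-keyscan step: record Codeberg's host key up front.
trust_codeberg_host() {
  mkdir -p "$HOME/.ssh" && ssh-keyscan codeberg.org >> "$HOME/.ssh/known_hosts"
}

# mirror_to SOURCE_URL TARGET_URL
mirror_to() {
  local source_url="$1" target_url="$2" workdir status
  workdir="$(mktemp -d)" || return 1
  git init -q --bare "$workdir/repo.git"
  # Fetch every branch and tag with full history (like fetch-depth: 0).
  # --mirror then pushes every local ref and deletes target refs missing here;
  # --force overwrites any divergence on the target.
  git -C "$workdir/repo.git" fetch -q "$source_url" \
      '+refs/heads/*:refs/heads/*' '+refs/tags/*:refs/tags/*' &&
    git -C "$workdir/repo.git" push -q --force --mirror "$target_url"
  status=$?
  rm -rf "$workdir"
  return "$status"
}
```

For example, run `trust_codeberg_host` once, then `mirror_to https://github.com/your-username/your-repo.git git@codeberg.org:your-username/your-repo.git`.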

Step 7: Test it

Push a commit to main (or trigger the workflow manually via the Actions tab). Check the Actions run log. Then visit your Codeberg repo and verify the code, branches, and tags all match GitHub.
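Checking by eye works for a couple of branches. For more, compare what each remote advertises. This sketch uses `git ls-remote` with placeholder URLs and helper names of my own. It returns non-zero and prints the differing lines if the branch or tag lists do not match.

```shell
#!/usr/bin/env bash
# Sketch: confirm two remotes expose identical branches and tags.

# --heads --tags: branches and tags only; --refs: skip peeled "^{}" lines
list_refs() {
  local refs
  refs="$(git ls-remote --heads --tags --refs "$1")" || return 1
  printf '%s\n' "$refs" | sort
}

compare_remotes() {
  local a_refs b_refs status
  a_refs="$(mktemp)"; b_refs="$(mktemp)"
  if list_refs "$1" > "$a_refs" && list_refs "$2" > "$b_refs"; then
    diff "$a_refs" "$b_refs"; status=$?   # 0 means identical
  else
    status=2                              # one of the remotes was unreachable
  fi
  rm -f "$a_refs" "$b_refs"
  return "$status"
}

# Example:
#   compare_remotes https://github.com/your-username/your-repo.git \
#                   https://codeberg.org/your-username/your-repo.git && echo "in sync"
```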

Security note

Pin the webfactory/ssh-agent action to a commit SHA rather than a tag for production use. Tags can be moved; commit SHAs cannot.[3] Check the release page for the current SHA.
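If you would rather look the SHA up from a terminal, `git ls-remote` shows which commit each tag points at. Here is a sketch. In the example, `vX.Y.Z` is a placeholder for whatever the latest release tag is, and the function name is my own.

```shell
#!/usr/bin/env bash
# Sketch: print the commit SHA a tag points at in a remote repository.

tag_commit_sha() {
  local repo_url="$1" tag="$2" refs peeled
  refs="$(git ls-remote --tags "$repo_url")" || return 1
  # Annotated tags come with a peeled "refs/tags/<tag>^{}" line naming the commit
  peeled="$(printf '%s\n' "$refs" | awk -v r="refs/tags/$tag^{}" '$2 == r { print $1 }')"
  if [ -n "$peeled" ]; then
    printf '%s\n' "$peeled"
  else
    # Lightweight tags point straight at the commit
    printf '%s\n' "$refs" | awk -v r="refs/tags/$tag" '$2 == r { print $1 }'
  fi
}

# Example: tag_commit_sha https://github.com/webfactory/ssh-agent.git vX.Y.Z
```

Reference the result as `webfactory/ssh-agent@<sha>` in the workflow. You can add a trailing `# vX.Y.Z` comment so humans can still see the version.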

Also be aware: anyone with write access to the workflow file in your GitHub repo can modify it to exfiltrate the SSH private key. If you have collaborators, use a per-repo deploy key on Codeberg (not an account-wide key) to limit the blast radius.[2] In Codeberg, navigate to your repository, then Settings, then Deploy Keys; each key added there is scoped to that single repository.

Multiple repos

You need one workflow file per repo, but the pattern is identical. The only things that change are the repository secret name (if you use different keys per repo) and the Codeberg remote URL. If you use an account-wide SSH key on Codeberg, the same CODEBERG_SSH_KEY secret works across all repos; you just need to create the secret in each GitHub repo’s settings.
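Since only the Codeberg URL changes between repos, one template can serve them all. This sketch uses a hypothetical convention: the template spells the repo name as `__REPO__`, and the script replaces it with each clone's directory name. It therefore assumes your local directory names match your Codeberg repo names.

```shell
#!/usr/bin/env bash
# Sketch: copy one workflow template into several local clones.
# Hypothetical convention: the template contains a URL like
#   git@codeberg.org:your-username/__REPO__.git
# and __REPO__ is replaced with each clone's directory name.

install_mirror_workflow() {
  local template="$1" repo_dir name
  shift
  for repo_dir in "$@"; do
    name="$(basename "$repo_dir")"
    mkdir -p "$repo_dir/.github/workflows" || return 1
    sed "s|__REPO__|$name|g" "$template" \
      > "$repo_dir/.github/workflows/mirror-codeberg.yml" || return 1
  done
}

# Example: install_mirror_workflow ~/mirror-codeberg.template.yml ~/src/*/
```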

Layer 3: git bundle (The Local Backup)

A git bundle is the simplest possible backup: one file per repo, containing the complete git history, storable anywhere.

One-time backup
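Here is a sketch of the one-time backup, assuming the repo is a local clone. `bundle_repo` is a helper name of my own, and the output file is named after the repo directory.

```shell
#!/usr/bin/env bash
# Sketch: write one bundle with every ref of a repo, then verify it.
# bundle_repo REPO_DIR OUT_DIR -> creates OUT_DIR/<repo-name>.bundle

bundle_repo() {
  local repo_dir="$1" out_dir name out
  # Absolute paths, because git -C changes the working directory
  out_dir="$(mkdir -p "$2" && cd "$2" && pwd)" || return 1
  name="$(cd "$repo_dir" && pwd)" || return 1
  out="$out_dir/$(basename "$name").bundle"
  # --all: every branch, tag and other ref
  git -C "$repo_dir" bundle create "$out" --all || return 1
  # Checks the bundle and lists the refs it contains
  git -C "$repo_dir" bundle verify "$out"
}

# Example: bundle_repo ~/src/your-repo /mnt/backup
```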

The verify command confirms the bundle is valid and lists what it contains.[4]

Automated backups on Linux

Create a script that loops through your repos and produces dated bundles:
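Here is a sketch of such a script, assuming your clones sit directly under one directory. The function name, the argument order, and the 90-day retention default are my own choices. The script skips directories that are not git repos, as well as repos that have no commits yet.

```shell
#!/usr/bin/env bash
# Sketch: bundle every repo under SRC_DIR into DEST_DIR with a date stamp,
# then delete bundles last modified more than RETENTION_DAYS days ago.
# Usage: backup-repos.sh SRC_DIR DEST_DIR [RETENTION_DAYS]

backup_repos() {
  local src_dir="$1" dest_dir="$2" retention_days="${3:-90}" stamp repo name
  stamp="$(date +%Y-%m-%d)"
  mkdir -p "$dest_dir" || return 1
  dest_dir="$(cd "$dest_dir" && pwd)"
  for repo in "$src_dir"/*/; do
    [ -d "$repo/.git" ] || continue                          # only git clones
    # git refuses to write an empty bundle, so skip repos without refs
    [ -n "$(git -C "$repo" for-each-ref --count=1)" ] || continue
    name="$(basename "$repo")"
    if git -C "$repo" bundle create "$dest_dir/$name-$stamp.bundle" --all 2>/dev/null; then
      echo "OK      $name"
    else
      echo "FAILED  $name" >&2
    fi
  done
  # Retention: drop old dated bundles from the destination directory
  find "$dest_dir" -maxdepth 1 -name '*.bundle' -mtime +"$retention_days" -delete
}

# Run only when given both directories (e.g. from cron)
if [ "$#" -ge 2 ]; then backup_repos "$@"; fi
```

Save it as, say, `~/bin/backup-repos.sh`, make it executable, and call it with a source and a destination directory, for example `~/bin/backup-repos.sh ~/src /mnt/backup/git`.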

Add it to crontab to run weekly:
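The crontab entry is a single line. This sketch builds it from a command and a log path, both placeholders. Cron needs absolute paths. The comment shows how to install the entry without clobbering your other entries.

```shell
#!/usr/bin/env bash
# Sketch: build the weekly cron entry for the backup command.

# Fields: minute hour day-of-month month day-of-week (0 = Sunday)
cron_line() {
  printf '0 3 * * 0 %s >> %s 2>&1\n' "$1" "$2"
}

# Install, replacing any older entry for the same script:
#   line="$(cron_line "$HOME/bin/backup-repos.sh $HOME/src /mnt/backup/git" "$HOME/backup-repos.log")"
#   { crontab -l 2>/dev/null | grep -vF "backup-repos.sh"; echo "$line"; } | crontab -
```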

That runs every Sunday at 3:00 AM, logging output for review.

Automated backups on Windows

The equivalent PowerShell script:

Schedule it with Task Scheduler:

  1. Open Task Scheduler and click “Create Basic Task”
  2. Name it “Git Bundle Backup” and set it to run weekly
  3. Action: Start a program
  4. Program: powershell.exe
  5. Arguments: -ExecutionPolicy Bypass -File "C:\Users\your-username\backup-repos.ps1"
  6. Check “Open the Properties dialog” on finish, then set “Run whether user is logged on or not”

Restoring from a bundle

If you ever need to restore:

The clone creates a fully functional repo containing everything up to the moment the bundle was made. Set the remote URL back to your actual upstream and you are back where you were when the backup ran.

Putting It All Together

Here is what the complete setup looks like:

Software Heritage (one-time, 15 minutes)

  • Checked all public repos via the API
  • Submitted “Save Code Now” for any that were missing
  • No ongoing maintenance needed

Codeberg mirror (per-repo, 10 minutes each)

  • Created matching repos on Codeberg
  • Generated a dedicated SSH key pair
  • Added the workflow file to each GitHub repo
  • Every push to main automatically mirrors to Codeberg

git bundle (one-time setup, runs automatically)

  • Script loops through all local repos
  • Creates dated bundle files weekly
  • Cleans up bundles older than 90 days
  • Runs on a schedule via crontab or Task Scheduler

Total cost: zero. Total ongoing effort: near zero, aside from occasionally checking that the Codeberg mirror workflow is still green.

Is this overkill for a personal project collection? Maybe. But the point of Part 1 was that platforms change, companies get acquired, and policies shift in ways we cannot predict. The total investment here is an afternoon of setup and a few hundred kilobytes of bundle files per week. Compared to the alternative (hoping everything will be fine), that seems like a reasonable trade.

What Comes Next

The three-layer strategy covers preservation and mirroring. But there is a step beyond mirroring: running your own forge.

Self-hosting a Forgejo instance as your primary source of truth, with GitHub acting as a distribution mirror rather than the canonical home, inverts the dependency entirely. Your code lives on your infrastructure, under your control, in whatever jurisdiction you choose. GitHub becomes a convenience, not a requirement.

That is a larger project with its own set of decisions (hosting, CI/CD runners, database backends, reverse proxy configuration), and it deserves its own writeup. Stay tuned.


Have you set up your own backup strategy? Find me on Bluesky or LinkedIn.


Notes

[1] Software Heritage’s “Save Code Now” feature queues an immediate crawl of any publicly accessible git repository. Processing time varies based on queue depth but typically completes within a few hours. The feature is available both via the web interface and the REST API. Rate limits apply to the API (approximately 120 requests per hour for unauthenticated users). Source: Software Heritage Save Code Now.

[2] Codeberg deploy keys are repository-scoped by default. When “Write” access is enabled, the key can push to that specific repository. Account-level SSH keys (added via user settings) grant access to all repositories under that account. For mirroring, account-level keys reduce setup effort but increase blast radius if compromised. Source: Codeberg SSH key documentation.

[3] The webfactory/ssh-agent GitHub Action loads an SSH private key into the agent for the duration of the workflow run. The key is not written to disk in plaintext; it is loaded directly into the agent via ssh-add. However, any step in the workflow can access the agent, so malicious or compromised actions in the same job could use the key. Pinning to a commit SHA (rather than a version tag) prevents tag-reassignment attacks. Source: webfactory/ssh-agent repository.

[4] git bundle create --all includes all refs: branches, tags, and any other refs (e.g., refs/stash). The --all flag is equivalent to listing every ref manually. Bundle files are self-contained and can be verified with git bundle verify, which checks that all prerequisite commits are present. Incremental bundles (using a basis bundle) are possible for reducing transfer size in ongoing backup scenarios. Source: git-bundle documentation.

[5] git push --mirror pushes all local refs (branches, tags, notes) to the remote and deletes any remote refs that do not exist locally. This makes the remote an exact copy of the local refs. Combined with --force, it overwrites any divergence on the remote. Caution: if used carelessly, --mirror can delete branches on the remote that were created there independently. For a one-way mirror (GitHub to Codeberg), this is the desired behavior. Source: git-push documentation.