
Engineering a Hardened AI Agent Node

A comprehensive guide to deploying autonomous AI agents on public infrastructure with zero-trust networking, kernel-level sandboxing, and cryptographic security.

14 min read · João Pedro Maia
Tags: security, devops, ai-agents, infrastructure, linux

Deploying an autonomous agent on a public VPS is fundamentally more dangerous than running one locally. Within minutes of provisioning, automated scanners (Shodan, Censys) index your IP, brute-force botnets begin dictionary attacks against SSH, and any exposed service becomes an entry point to the agent's memory, credentials, and compute.

This guide builds a Sovereign Node—a server that is cryptographically secured, invisible to public port scans, sandboxed at the kernel level, and accessible only through a private mesh network. We use Hetzner Cloud for compute, Tailscale for zero-trust networking, and systemd for process isolation.

Threat Model

Before executing any commands, define what you're defending against:

  • Automated Scanners: Bots continuously sweep the IPv4 space for open ports (22, 80, 443) and vulnerable banners. Any publicly listening service will be found within hours.
  • Brute-Force Botnets: Distributed networks running dictionary attacks against SSH and web gateways.
  • Privilege Escalation: If the agent runtime is compromised (e.g., via prompt injection leading to code execution), running as root grants the attacker full kernel control.
  • Data Exfiltration: Access to the agent implies access to its memory (chat history, summarized data) and environment variables (API keys, tokens).

The defense strategy rests on four pillars: zero-trust networking (no public management ports), least privilege (unprivileged runtime user), cryptographic identity (Ed25519 keys, no passwords), and automated patching.


1. Cryptographic Identity

We use Ed25519 exclusively. Unlike RSA, which requires 2048- to 4096-bit keys to remain secure, Ed25519 uses a 256-bit key on a twisted Edwards curve. It offers fast signing, compact keys and signatures, deterministic signatures, and resistance to several classes of side-channel attacks.

Generate the key pair on your local machine (not the server):

bash
ssh-keygen -t ed25519 -C "ai-agent-admin-2025" -f ~/.ssh/hetzner_ed25519
  • -t ed25519: Selects the Edwards-curve algorithm.
  • -C: Appends a comment for auditing authorized_keys files across multiple servers and rotation cycles.
  • -f: Saves to a specific path, preventing accidental overwrites of your default keys.

You must set a passphrase. If your laptop is stolen or compromised, the passphrase is the only barrier preventing immediate use of your private key. An unencrypted key on a stolen device is equivalent to a stolen password.

Retrieve the public key:

bash
cat ~/.ssh/hetzner_ed25519.pub

Rotation: A static key is a liability. If a private key is suspected of compromise, immediately remove the corresponding public key from authorized_keys on all managed servers and from the cloud provider's SSH key metadata. Consider rotating keys annually, using the -C comment field (e.g., admin-2025, admin-2026) to track lifecycle.
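
To make the rotation advice concrete, here is a minimal sketch of an audit pass over an authorized_keys file. The function name audit_authorized_keys is hypothetical, and the sketch assumes entries carry no leading options field (such as command="..."), so the key type is always the first field and the -C comment the third.

```shell
# Hypothetical helper: print key type and comment for each entry and flag
# anything that is not Ed25519. Assumes no leading options field.
audit_authorized_keys() {
    local file="$1"
    awk '
        /^[ \t]*(#|$)/ { next }                 # skip comments and blank lines
        {
            status  = ($1 == "ssh-ed25519") ? "OK" : "REVIEW"
            comment = (NF >= 3) ? $3 : "(no comment)"
            printf "%s %s %s\n", status, $1, comment
        }
    ' "$file"
}

# Usage on a server: audit_authorized_keys /home/ops/.ssh/authorized_keys
```

Entries marked REVIEW, or carrying a comment from an older rotation cycle, are candidates for removal.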


2. Infrastructure Provisioning

We use Hetzner Cloud for its performance-to-cost ratio, but these principles apply to any VPS provider.

  1. Inject Identity: In the Hetzner Console, navigate to Security > Add SSH Key and paste the contents of ~/.ssh/hetzner_ed25519.pub.
  2. Create the Server:
    • Image: Ubuntu 24.04 LTS (Noble Numbat).
    • Type: CPX21 (3 vCPU, 4GB RAM) or similar.
    • SSH Key: Select the key you uploaded. Do not skip this. If no key is selected, Hetzner emails a root password—an immediate security failure.

3. Initial Access and User Privilege Separation

Connect as root. This is the only time we operate as root directly.

bash
ssh -i ~/.ssh/hetzner_ed25519 root@<PUBLIC_IP>

Running the agent as root is the single most dangerous configuration choice in this guide. A compromised agent process running as root owns the entire machine. We create two distinct accounts:

  • ops: An administrative user with sudo access for system configuration.
  • agent: A restricted system user with no shell login and no sudo, used exclusively to run the OpenClaw daemon.

Create the ops user

bash
adduser --gecos "" ops

This will prompt for a password. Set a strong one. The password is required for sudo commands during setup. It is not used for SSH (key-only), but it protects against unauthorized privilege escalation if someone gains shell access.

bash
# Grant sudo access (required for setup; we revoke this at the end)
usermod -aG sudo ops
 
# Migrate SSH identity
mkdir -p /home/ops/.ssh
cp /root/.ssh/authorized_keys /home/ops/.ssh/
chown -R ops:ops /home/ops/.ssh
chmod 700 /home/ops/.ssh
chmod 600 /home/ops/.ssh/authorized_keys
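
sshd refuses keys when the directory or file modes are too permissive, so it can be worth confirming the modes before closing the root session. The following is a small sketch; check_ssh_perms is a hypothetical helper name, and it relies on GNU stat (the default on Ubuntu).

```shell
# Hypothetical check: confirm a .ssh directory is 700 and its
# authorized_keys file is 600. Uses GNU stat's -c format option.
check_ssh_perms() {
    local dir="$1" dir_mode file_mode
    dir_mode=$(stat -c '%a' "$dir") || return 2
    file_mode=$(stat -c '%a' "$dir/authorized_keys") || return 2
    if [ "$dir_mode" = "700" ] && [ "$file_mode" = "600" ]; then
        echo "ok"
    else
        echo "bad: dir=$dir_mode authorized_keys=$file_mode"
        return 1
    fi
}

# Usage on the server: check_ssh_perms /home/ops/.ssh
```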

Create the agent user

bash
adduser --system --group --home /home/agent --shell /usr/sbin/nologin agent

This creates a system account with no login shell and no password. It exists solely to own the agent process.

Verify ops access

Open a new terminal on your local machine:

bash
ssh -i ~/.ssh/hetzner_ed25519 ops@<PUBLIC_IP>

If successful, close the root session. All subsequent commands are executed as ops.


4. SSH Hardening

Ubuntu 24.04 introduced an insidious configuration anomaly that silently undermines SSH hardening.

The 50-cloud-init.conf Override

Standard procedure says: edit /etc/ssh/sshd_config, set PasswordAuthentication no. In Ubuntu 24.04, this is quietly overridden by a drop-in file at /etc/ssh/sshd_config.d/50-cloud-init.conf, which cloud-init can generate on first boot with PasswordAuthentication yes. The stock sshd_config includes /etc/ssh/sshd_config.d/*.conf at the top of the file, and sshd keeps the first value it reads for each keyword, so the drop-in wins over anything set later in the main config. An administrator may believe passwords are disabled while the server remains vulnerable to brute-force attacks.

Fix

Remove the override file:

bash
sudo rm -f /etc/ssh/sshd_config.d/50-cloud-init.conf

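As a belt-and-suspenders measure, create a drop-in that explicitly enforces key-only auth. Name it so that it sorts first: sshd keeps the first value it reads for each keyword, so the earliest file in the directory takes precedence.

bash
sudo tee /etc/ssh/sshd_config.d/00-hardening.conf > /dev/null << 'EOF'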
PermitRootLogin no
PasswordAuthentication no
PermitEmptyPasswords no
ChallengeResponseAuthentication no
UsePAM no
PubkeyAuthentication yes
X11Forwarding no
EOF
  • PermitRootLogin no: Forces all access through the unprivileged ops account. Even if the Ed25519 key is compromised, the attacker lands as ops, not root.
  • UsePAM no: Prevents PAM modules from re-enabling password auth through alternative authentication stacks.
  • X11Forwarding no: Eliminates the X11 attack surface on a headless server.

Reload the daemon:

bash
sudo systemctl reload ssh

Verification

From your local machine, attempt a password-only login:

bash
ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no ops@<PUBLIC_IP>

This must return Permission denied (publickey). If it prompts for a password, the override was not fully removed.
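
A complementary server-side check is to inspect the effective configuration rather than the files. sshd -T prints the merged settings with lowercase keywords. The sketch below reads that output on stdin and flags deviations from the values set in this section. check_sshd_effective is a hypothetical helper, and the three keywords it checks are a subset chosen for illustration.

```shell
# Hypothetical helper: read `sshd -T` output on stdin and report any
# setting that differs from the hardened values in this section.
check_sshd_effective() {
    awk '
        BEGIN {
            want["passwordauthentication"] = "no"
            want["permitrootlogin"]        = "no"
            want["pubkeyauthentication"]   = "yes"
        }
        ($1 in want) {
            seen[$1] = 1
            if ($2 != want[$1]) { printf "MISMATCH %s=%s\n", $1, $2; bad = 1 }
        }
        END {
            for (k in want) if (!(k in seen)) { printf "MISSING %s\n", k; bad = 1 }
            exit bad
        }
    '
}

# Usage on the server: sudo sshd -T | check_sshd_effective && echo "hardened"
```

Because this reads the merged result, it catches an override from any drop-in file, not only 50-cloud-init.conf.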


5. Automated Security Patching

A provisioned-and-forgotten server accumulates unpatched CVEs. We configure unattended-upgrades to apply security patches automatically.

bash
sudo apt update && sudo apt install unattended-upgrades -y
sudo dpkg-reconfigure --priority=low unattended-upgrades

Select Yes when prompted.

Verify the security repository is enabled in /etc/apt/apt.conf.d/50unattended-upgrades:

plaintext
Unattended-Upgrade::Allowed-Origins {
    "${distro_id}:${distro_codename}-security";
};
Unattended-Upgrade::Automatic-Reboot "true";
Unattended-Upgrade::Automatic-Reboot-Time "03:30";

Enabling automatic reboot is controversial but necessary. Kernel patches do not take effect until reboot. Scheduling reboots at 03:30 minimizes disruption. If you run time-critical workloads, disable automatic reboot and schedule maintenance windows manually.
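
If you do disable automatic reboot, you need another way to notice that a patched kernel is waiting. Ubuntu signals this with the flag file /var/run/reboot-required. A minimal sketch follows; reboot_pending is a hypothetical helper name, and the optional path argument exists only so the function can be exercised off-server.

```shell
# Hypothetical helper: report whether Ubuntu's reboot-required flag exists.
reboot_pending() {
    local flag="${1:-/var/run/reboot-required}"
    if [ -f "$flag" ]; then
        echo "reboot pending"
        return 0
    fi
    echo "no reboot pending"
    return 1
}

# Usage on the server: reboot_pending && echo "schedule a maintenance window"
```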


6. Network Isolation: Tailscale & Firewall

We now transition the server to "dark mode"—no publicly accessible management ports.

6.1 Tailscale Installation

Tailscale creates an encrypted WireGuard mesh. Instead of exposing SSH on the public internet, we move it inside this private overlay.

bash
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up

A note on curl | sh: Piping remote scripts into a shell is a known trust tradeoff that appears twice in this guide (here and for Homebrew in Section 8). For higher assurance, download the scripts first, inspect them, and then execute: curl -fsSL https://tailscale.com/install.sh -o install.sh && less install.sh && sh install.sh. We use the one-liner for brevity, not because it's best practice.

Authenticate via the URL Tailscale provides. Your server is now part of your tailnet.

Save the Tailscale IP—you'll need it for OpenClaw configuration:

bash
tailscale ip -4
# Example: 100.105.45.12

6.2 Firewall: The Split-Horizon Strategy

We use UFW to enforce deny-by-default, then selectively permit traffic.

Set defaults:

bash
sudo ufw default deny incoming
sudo ufw default allow outgoing

Allow SSH only on the Tailscale interface:

bash
sudo ufw allow in on tailscale0 to any port 22 proto tcp

Packets hitting the public IP on port 22 will be silently dropped.

Allow NAT traversal (UDP 41641):

Tailscale listens on UDP port 41641 by default for direct peer-to-peer WireGuard connections. Without this, traffic is relayed through DERP servers: still encrypted end-to-end, but with higher latency. Opening this port does not expose application-level services; WireGuard silently drops any packet that fails cryptographic validation, so to an external scanner the port is indistinguishable from a filtered one.

bash
sudo ufw allow 41641/udp

Activation sequence (order matters):

  1. Ensure Tailscale is running on your local machine.
  2. Open a second terminal.
  3. In the second terminal, connect via Tailscale: ssh -i ~/.ssh/hetzner_ed25519 ops@100.105.45.12
  4. If that succeeds, enable the firewall in your original terminal:
bash
sudo ufw enable
  5. Verify the public IP is now unreachable: ssh ops@<PUBLIC_IP> → should time out.

Do not enable UFW before confirming Tailscale connectivity. If the mesh isn't working, enabling deny-by-default locks you out with no recovery path short of Hetzner's web console.
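
Once the firewall is active, the rule set can be checked mechanically. The sketch below reads ufw status output on stdin and flags any ALLOW rule that is neither scoped to tailscale0 nor the 41641/udp exception. audit_ufw_rules is a hypothetical helper, and the matching assumes interface-scoped rules contain the text "on tailscale0" in the status listing, which you should confirm against your own output.

```shell
# Hypothetical helper: flag ALLOW rules outside the split-horizon policy.
# Assumes interface-scoped rules appear with "on tailscale0" in the listing.
audit_ufw_rules() {
    awk '
        /ALLOW/ {
            if ($0 ~ /on tailscale0/ || $1 == "41641/udp") next
            printf "UNEXPECTED %s\n", $0
            bad = 1
        }
        END { exit bad }
    '
}

# Usage on the server: sudo ufw status | audit_ufw_rules && echo "rules look right"
```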

6.3 Intrusion Prevention: Fail2Ban

With SSH restricted to Tailscale, Fail2Ban's SSH jail will be largely silent—which is the desired state. It remains valuable as defense-in-depth: if a misconfiguration later re-exposes a port, Fail2Ban catches the resulting probes.

bash
sudo apt install fail2ban -y

Create /etc/fail2ban/jail.local:

ini
[DEFAULT]
bantime = 1h
findtime = 10m
maxretry = 5
backend = systemd
 
[sshd]
enabled = true
port = ssh
filter = sshd

The backend = systemd directive is essential on modern Ubuntu. Without it, Fail2Ban attempts to parse /var/log/auth.log, which may not be populated correctly on journal-based systems, causing it to silently fail to detect intrusions.

bash
sudo systemctl enable fail2ban --now

7. OpenClaw Gateway Binding

The most critical application-level setting is the gateway bind address.

  • 0.0.0.0: Exposes the control plane to the entire internet. Never use this.
  • 127.0.0.1: Localhost only. Safe, but inaccessible from your laptop—even over Tailscale—without SSH port forwarding.
  • Tailscale IP (e.g., 100.105.45.12): Accessible only to authenticated devices on your mesh. Safe and convenient.

Get your IP:

bash
tailscale ip -4

Edit the OpenClaw configuration (typically ~/.openclaw/config.json):

json
{
  "gateway": {
    "bind": "100.105.45.12",
    "port": 18789
  }
}

Allow access to this port only from the Tailscale interface:

bash
sudo ufw allow in on tailscale0 to any port 18789 proto tcp
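
The three bind options above can be told apart mechanically, because Tailscale assigns addresses from the 100.64.0.0/10 range. Here is a small sketch that classifies a bind address. classify_bind is a hypothetical helper, not part of OpenClaw, and extracting the address from config.json is left to you.

```shell
# Hypothetical helper: classify a gateway bind address.
# Tailscale addresses fall in 100.64.0.0/10 (second octet 64 to 127).
classify_bind() {
    local addr="$1" o1 o2 rest
    o1=${addr%%.*}
    rest=${addr#*.}
    o2=${rest%%.*}
    case "$addr" in
        0.0.0.0)   echo "exposed" ;;
        127.0.0.1) echo "localhost" ;;
        *)
            if [ "$o1" = "100" ] && [ "$o2" -ge 64 ] 2>/dev/null && [ "$o2" -le 127 ]; then
                echo "tailscale"
            else
                echo "other"
            fi ;;
    esac
}

# Usage: classify_bind 100.105.45.12
```

Anything classified as exposed or other deserves a second look before the service starts.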

8. Runtime Environment: Homebrew & Node.js

System package managers like apt freeze package versions for stability. AI agent tooling typically requires current Node.js releases. Homebrew installs the latest toolchain in user-space (/home/linuxbrew/.linuxbrew) without requiring root, preventing conflicts with system libraries.

This is a deliberate tradeoff. Alternatives (nvm, direct Node tarballs, Docker) are equally valid. We use Homebrew for its simplicity and because it manages both Node.js and ancillary tools in a single workflow.

bash
# Run as ops, not root
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
 
# Configure PATH
(echo; echo 'eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"') >> ~/.bashrc
eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"
 
# Install Node.js
brew install node
corepack enable
corepack prepare pnpm@latest --activate

9. Application Deployment

9.1 Installation

We install via npm, which leverages the registry's package signing and integrity checks. This is preferable to piping an opaque shell script for application-level code.

bash
npm install -g @openclaw/cli

Once installed, run the onboarding wizard with the daemon flag to automatically configure the systemd service:

bash
openclaw onboard --install-daemon

This command:

  • Creates the ~/.openclaw configuration directory with secure permissions (700)
  • Generates a default config.json bound to 127.0.0.1 (you'll update this to your Tailscale IP)
  • Installs a sandboxed systemd unit file at /etc/systemd/system/openclaw.service
  • Configures the service to run as the agent user with NoNewPrivileges and filesystem isolation
  • Enables the service to start on boot

9.2 Security Audit

Run the built-in auditor before going live:

bash
openclaw security audit --deep

This checks binding exposure (is the gateway on 0.0.0.0?), credential file permissions (are they 600?), and plugin integrity. If it reports failures:

bash
openclaw security audit --deep --fix

10. Sudo Revocation

Setup is complete. The ops user no longer needs sudo access for routine operation. Removing it limits the blast radius if the ops SSH session is ever compromised.

bash
# Run this from the ops account
sudo deluser ops sudo

After running this, ops can no longer execute privileged commands. If you need sudo access for future maintenance, use the Hetzner web console to log in as root, re-add sudo temporarily, perform the task, and revoke it again.

If you prefer to keep sudo on ops permanently (a reasonable choice for solo operators), the NoNewPrivileges=yes directive in the systemd unit ensures the agent process itself can never leverage it, even if compromised.
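
To confirm the revocation took effect, check group membership from the account database. A login session that was already open keeps its old groups until it ends, so query the database with id -nG ops rather than relying on the current shell. The helper below is a hypothetical sketch that tests a group list for an exact name.

```shell
# Hypothetical helper: succeed when GROUP appears as a whole word in a
# space-separated list such as the output of `id -nG USER`.
group_list_has() {
    printf '%s\n' "$1" | tr ' ' '\n' | grep -qxF -- "$2"
}

# Usage on the server:
#   if group_list_has "$(id -nG ops)" sudo; then echo "ops still has sudo"; fi
```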


11. Operational Security

11.1 The Contractor Mindset

Treat the agent as an untrusted third-party contractor with access to your data.

Credential Scoping:

  • GitHub: Do not use your personal PAT. Create a Fine-Grained Personal Access Token. Select only the specific repositories the agent needs. Grant Contents: Read and write only. GitHub adds the mandatory Metadata: Read-only permission automatically; leave Administration and every other permission at No access.
  • OpenRouter: Set a hard monthly spend limit (e.g., $10–20) in the OpenRouter dashboard. If the agent enters an infinite loop or is compromised, this cap prevents financial drain.
  • General principle: Every API key the agent holds should have the minimum scope required. A leaked key with repo:admin access is a catastrophe; a leaked key with contents:read on a single repository is a nuisance.

11.2 Telegram Command & Control

When creating your bot via @BotFather:

  1. Disable group access: /setjoingroups → Disable. This prevents your agent from being added to public groups where it could leak data or be triggered by strangers.
  2. Restrict DMs: In the OpenClaw configuration, set dmPolicy to pairing or allowlist. Never set it to open.
  3. Whitelist your user ID only. Add your specific Telegram numeric ID to the allowed users list.

11.3 Interface Discipline

  • Prefer TUI + Telegram over the Web UI. The web interface is a larger attack surface.
  • If using the Web UI, access it only via the Tailscale IP (http://100.105.45.12:18789), never over the public internet.
  • Do not ask the agent to browse untrusted websites. Prompt injection via web content is a real and demonstrated attack vector.

12. Monitoring

A node you cannot observe is a node you cannot trust. At minimum, implement a dead man's switch so you know if the agent goes down.

Create /usr/local/bin/healthcheck-agent.sh:

bash
#!/bin/bash
set -euo pipefail
 
if systemctl is-active --quiet openclaw; then
    # Agent is running — ping the dead man's switch
    curl -fsS --max-time 10 -o /dev/null https://hc-ping.com/YOUR-UUID-HERE
else
    # Agent is down — log the failure (the missed ping triggers an alert)
    logger -p user.err "openclaw service is not active"
fi

Register with a free monitoring service like Healthchecks.io, which alerts you (email, Slack, Telegram) if the ping stops arriving.
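
Healthchecks.io also accepts an explicit failure signal: appending /fail to a check's ping URL marks it as down immediately instead of waiting for the missed-ping timeout. The sketch below builds either URL. hc_ping_url is a hypothetical helper name, and YOUR-UUID-HERE remains a placeholder.

```shell
# Hypothetical helper: build the Healthchecks.io ping URL for a check.
# STATE is "up" (normal ping) or "down" (explicit failure signal).
hc_ping_url() {
    local uuid="$1" state="$2"
    case "$state" in
        up)   echo "https://hc-ping.com/$uuid" ;;
        down) echo "https://hc-ping.com/$uuid/fail" ;;
        *)    echo "unknown state: $state" >&2; return 1 ;;
    esac
}

# Usage inside the health check's else branch:
#   curl -fsS --max-time 10 -o /dev/null "$(hc_ping_url YOUR-UUID-HERE down)"
```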

Schedule via cron:

bash
crontab -e
# Add:
*/5 * * * * /usr/local/bin/healthcheck-agent.sh

For deeper observability, monitor journald output:

bash
journalctl -u openclaw -f

13. Backup & Recovery

The agent's memory, configuration, and credential files are irreplaceable without a backup strategy.

Create /usr/local/bin/backup-agent.sh:

bash
#!/bin/bash
set -euo pipefail
 
DATE=$(date +%Y-%m-%d)
SRC="/home/agent/.openclaw/"
DEST="user@backup-host:/backups/openclaw/"
LOG_TAG="openclaw-backup"
 
if rsync -avz --delete \
    -e "ssh -i /home/ops/.ssh/backup_key" \
    "$SRC" "$DEST"; then
    logger -p user.info -t "$LOG_TAG" "Backup completed successfully for $DATE"
else
    logger -p user.err -t "$LOG_TAG" "Backup FAILED for $DATE"
    # Optionally ping a failure webhook here
    exit 1
fi
  • set -euo pipefail: Exits on any error. The script does not silently swallow failures.
  • --delete: Ensures the backup is an exact mirror.
  • -z: Compresses data in transit.

The destination can be a second VPS, a NAS, or an S3-compatible bucket mounted via FUSE.
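
One caveat with --delete: if the source directory is missing or unexpectedly empty, the mirror faithfully empties the backup too. A guard along the following lines can run before rsync. src_is_populated is a hypothetical helper, and treating "non-empty" as "safe" is a simplifying assumption.

```shell
# Hypothetical guard: succeed only when DIR exists and contains at least
# one entry (including dotfiles), so an empty source never reaches --delete.
src_is_populated() {
    local dir="$1"
    [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Usage near the top of the backup script:
#   src_is_populated "$SRC" || { logger -p user.err "backup source empty"; exit 1; }
```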

Schedule daily:

bash
crontab -e
# Add:
0 4 * * * /usr/local/bin/backup-agent.sh

14. Final Audit Checklist

Before considering the node operational, verify each layer:

Layer | Check | Expected Result
--- | --- | ---
SSH | ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no ops@<TAILSCALE_IP> | Permission denied (publickey)
SSH | ssh ops@<PUBLIC_IP> | Timeout (port closed)
SSH | ssh ops@<TAILSCALE_IP> | Success
SSH | ssh root@<TAILSCALE_IP> | Permission denied
Firewall | sudo ufw status | Default deny, tailscale0 rules only + 41641/udp
Patches | systemctl status unattended-upgrades | Active
Agent | systemctl status openclaw | Active, running as agent user
Sandbox | grep NoNewPrivs /proc/$(pgrep -of openclaw)/status | NoNewPrivs: 1
Audit | openclaw security audit --deep | No high-severity warnings
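
The checks that reduce to an exit status can be wrapped in a tiny harness so the audit is repeatable. This is a sketch only: check is a hypothetical helper, and the example commands use systemctl is-active, which returns success when a unit is running.

```shell
# Hypothetical harness: run a command quietly and print PASS or FAIL
# followed by a label. Returns non-zero on failure.
check() {
    local label="$1"
    shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS $label"
    else
        echo "FAIL $label"
        return 1
    fi
}

# Usage on the server:
#   check "patching active" systemctl is-active --quiet unattended-upgrades
#   check "agent active"    systemctl is-active --quiet openclaw
```

Checks that depend on reading output, such as the SSH rows, still need a human eye.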
