Multi-Host Monitoring (Pro)

Maintenant can monitor containers and workloads running on multiple remote hosts from a single central server. Remote hosts run a lightweight agent process that streams events to the server over a persistent gRPC connection.


Modes

The maintenant binary supports three operating modes via the --mode= flag:

| Mode | Description |
|---|---|
| embedded | Default. Monitors the local runtime and stores data in local SQLite. No network exposure. |
| server | Central server. Receives events from remote agents via gRPC. Exposes the web UI and REST API. Pro only. |
| agent | Remote agent. Monitors the local runtime and pushes events to a central server. Pro only. |

Community Edition enforces --mode=embedded at boot. Starting with --mode=server or --mode=agent exits with an error.


Server Setup

Network interfaces

The server exposes two interfaces:

| Interface | Default | Purpose |
|---|---|---|
| HTTP | 127.0.0.1:8080 | REST API + web UI (reverse-proxied) |
| gRPC/TLS | 127.0.0.1:8443 | Agent event ingestion |

The gRPC port must be reachable from agent hosts. Configure the public URL so that generated install commands point to the right address:

MAINTENANT_GRPC_PUBLIC_URL=grpcs://monitoring.example.com

The port is optional and defaults to 443 — only add it (e.g. :8443) if agents reach gRPC directly instead of through a reverse proxy/DNS terminating TLS on 443.

If not set, the server infers the URL from the HTTP request headers (X-Forwarded-Host, Host). A warning is shown in the UI if the resolved URL appears to be a private address.
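The port default and the private-address warning can be illustrated with a short sketch. The function names `grpc_port` and `looks_private` are hypothetical, and the exact rules the server applies are not documented here; this version only inspects literal IP addresses and `localhost`, and does not resolve DNS names.

```python
import ipaddress
from urllib.parse import urlsplit


def grpc_port(url: str) -> int:
    """Return the explicit port of a grpcs:// URL, or 443 when none is given."""
    return urlsplit(url).port or 443


def looks_private(url: str) -> bool:
    """Return True when the URL host is localhost or a private/loopback IP literal."""
    host = urlsplit(url).hostname or ""
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name; this sketch does not resolve it
    return addr.is_private or addr.is_loopback or addr.is_link_local
```

A check like this would flag `grpcs://10.0.0.5:8443` but not `grpcs://monitoring.example.com`.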

Starting the server

maintenant \
  --mode=server \
  --grpc-listen=127.0.0.1:8443 \
  --grpc-tls-cert=/etc/maintenant/tls.crt \
  --grpc-tls-key=/etc/maintenant/tls.key

Plaintext gRPC is not supported. A valid TLS certificate is required.


Agent Enrollment

Agents authenticate with the server using a one-time enrollment token and an Ed25519 keypair generated locally on first boot.

1. Generate an enrollment token

From the web UI: Agents → Generate enrollment token

Or via API:

POST /api/v1/agents/enrollment-tokens
Authorization: Bearer <session>
Content-Type: application/json

{ "ttl_hours": 24 }

The response contains:
- token: the cleartext token, shown once only
- install_templates: a map of ready-to-run install snippets, one per environment. Keys:
  - standalone: curl … | sudo bash invocation of the install script (binary + systemd unit)
  - docker_run: single docker run command with the right socket/proc mounts
  - docker_compose: compose.yml snippet
  - kubernetes: Namespace + Secret + RBAC + DaemonSet manifest
- warnings: present if the server URL appears to be a private/local address

The token cannot be retrieved again after creation. If lost, delete it and generate a new one.
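As a rough illustration of the API call, the request can be built with the Python standard library. The helper name `build_token_request` and the base URL are assumptions for this sketch; the path, headers, and body come from the request shown earlier.

```python
import json
import urllib.request


def build_token_request(base_url: str, session: str, ttl_hours: int = 24) -> urllib.request.Request:
    """Build (but do not send) the POST that creates an enrollment token."""
    body = json.dumps({"ttl_hours": ttl_hours}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/agents/enrollment-tokens",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {session}",
            "Content-Type": "application/json",
        },
    )
```

Passing the request to `urllib.request.urlopen` and decoding the body with `json.load` should give access to the `token`, `install_templates`, and `warnings` fields, assuming they are top-level JSON keys. Because the token is shown only once, store it before discarding the response.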

2. Run the install command on the remote host

Pick the snippet matching the host environment (the UI exposes these as tabs in the modal). For a bare-metal/VM host, the standalone snippet is:

curl -fsSL https://install.maintenant.dev | sudo bash -s -- \
  --mode=agent \
  --server=grpcs://monitoring.example.com \
  --enrollment-token=mnt_enr_XXXXXXXXXXXXXXXX

For a host where the binary is already installed, the equivalent invocation is:

maintenant \
  --mode=agent \
  --server=grpcs://monitoring.example.com \
  --enrollment-token=mnt_enr_XXXXXXXXXXXXXXXX \
  --label="prod-worker-01"

On first boot the agent:
1. Detects the local runtime (Docker, Swarm, or Kubernetes)
2. Generates an Ed25519 keypair and persists it to identity.json (mode 0600)
3. Calls RegisterAgent on the server with the token and public key
4. Marks itself as enrolled and enters the streaming loop

The --label flag sets a human-readable display name (max 64 chars). If omitted, the hostname is used.


Streaming Protocol

Once enrolled, the agent maintains a persistent bidirectional gRPC stream to the server.

Authentication handshake (per stream)

Every time the agent connects or reconnects, it performs a challenge-response authentication:

Server → AuthChallenge { nonce: 32 random bytes }
Agent  → AuthResponse  { agent_id, timestamp, signature }
         where signature = Ed25519.Sign(private_key,
                             nonce || agent_uuid_bytes || timestamp_be64)
Server → validates signature, checks agent status = active,
         checks |server_time - client_time| ≤ 300s

This design requires no PKI infrastructure. The agent's public key is registered once at enrollment.
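The signed message and the clock-skew check can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: it assumes the timestamp is Unix seconds encoded as an unsigned 64-bit big-endian integer and that agent_uuid_bytes is the 16-byte binary form of the agent UUID. The function names are hypothetical.

```python
import struct
import uuid

MAX_SKEW_SECONDS = 300  # the 300 s tolerance from the handshake description


def signing_payload(nonce: bytes, agent_id: uuid.UUID, timestamp: int) -> bytes:
    """Concatenate nonce || agent_uuid_bytes || timestamp_be64."""
    if len(nonce) != 32:
        raise ValueError("nonce must be exactly 32 bytes")
    # ">Q" = big-endian unsigned 64-bit (assumed encoding of timestamp_be64)
    return nonce + agent_id.bytes + struct.pack(">Q", timestamp)


def within_skew(server_time: int, client_time: int) -> bool:
    """Accept the response only if the clocks differ by at most 300 seconds."""
    return abs(server_time - client_time) <= MAX_SKEW_SECONDS
```

Under these assumptions the payload is always 56 bytes (32 + 16 + 8). The agent would sign it with its Ed25519 private key, for example through a third-party library such as `cryptography`, and the server would verify it against the public key stored at enrollment.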

Event collection

After authentication the agent collects and pushes events continuously:

| Event type | Frequency |
|---|---|
| Container start/stop/die/pause | Real-time |
| Per-container resource metrics (CPU, memory, network, disk) | Every 10 s |
| Host-level metrics (machine CPU, memory, disk) | Every 10 s |
| Certificate scans | Every 60 s |

Reconnection

If the connection drops, the agent reconnects automatically with exponential backoff:

delay = min(60s, 1s × 2^attempt) ± 25% jitter

The attempt counter resets to 0 if the previous stream was stable for more than 30 seconds. Reconnection stops permanently only if the server responds with agent_revoked.
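The backoff formula and the reset rule can be written out as a small sketch. It assumes the jitter is applied after the 60 s cap, as the formula reads, so the largest possible delay is 75 s. The names are hypothetical.

```python
import random

BASE_SECONDS = 1.0
CAP_SECONDS = 60.0
JITTER = 0.25
STABLE_SECONDS = 30.0


def backoff_delay(attempt: int, rng=random.random) -> float:
    """Return min(60s, 1s * 2^attempt) scaled by a factor in [0.75, 1.25]."""
    # 2^6 already exceeds the cap, so clamp the exponent to avoid huge numbers.
    base = min(CAP_SECONDS, BASE_SECONDS * 2 ** min(attempt, 6))
    factor = 1.0 + JITTER * (2.0 * rng() - 1.0)
    return base * factor


def next_attempt(attempt: int, stream_lifetime_seconds: float) -> int:
    """Reset the counter after a stable stream, otherwise increment it."""
    return 0 if stream_lifetime_seconds > STABLE_SECONDS else attempt + 1
```

Without jitter the delays run 1 s, 2 s, 4 s, 8 s, 16 s, 32 s, then stay at 60 s.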

Rate limiting

The server enforces a per-agent limit of 1 000 events/second (token bucket). If exceeded, the server sends an in-stream error with a retry_after_ms hint.
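A token bucket of this kind can be sketched in a few lines. The burst capacity (set equal to the rate here) and the way the retry hint is computed are assumptions; the source only states the 1 000 events/second limit and the existence of a retry_after_ms hint.

```python
import math
import time


class TokenBucket:
    """Per-agent limiter: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate: float = 1000.0, capacity: float = 1000.0, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def try_consume(self, n: int = 1) -> int:
        """Return 0 if n events are admitted, else a retry_after_ms hint."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= n:
            self.tokens -= n
            return 0
        # Time needed to refill the missing tokens, rounded up to whole ms.
        deficit = n - self.tokens
        return max(1, math.ceil(deficit * 1000 / self.rate))
```

The injectable `clock` keeps the limiter deterministic in tests; production code would use the default monotonic clock.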


Per-Host Resource Metrics

In addition to per-container stats, each agent reports the machine-level CPU, memory and disk usage of the host it runs on. The central server keeps the latest sample for every host in memory (local server + each agent) and exposes it to the UI.

Host selector

The dashboard's resource header (CPU / MEM / DISK gauges) shows a host selector as soon as more than one host is present. Pick a host to scope the gauges to that machine; the top consumers widget follows the same selection. With a single host the selector is hidden and behaviour is unchanged.

Each container card carries a host badge (hostname / label) so you can tell at a glance which machine a workload runs on. The badge is hidden when every visible container lives on the same host — there is nothing to disambiguate.

Endpoints

| Endpoint | Description |
|---|---|
| GET /api/v1/resources/hosts | Lists every host (local + agents) with its current CPU / memory / disk and running-container count. |
| GET /api/v1/resources/summary?agent_id=local\|<id> | Resource summary scoped to one host. Omitting agent_id returns the local server. |
| GET /api/v1/resources/top?...&agent_id=local\|<id> | Top consumers scoped to one host. Omitting agent_id aggregates all hosts. |
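A client could build the host-scoped summary URL as in the sketch below. The helper name and base URL are assumptions; only the path and the agent_id parameter come from the endpoint list. The other query parameters of the top endpoint are elided in the source, so it is left out here.

```python
from typing import Optional
from urllib.parse import urlencode


def summary_url(base_url: str, agent_id: Optional[str] = None) -> str:
    """Build the resource-summary URL, optionally scoped to one host."""
    url = f"{base_url.rstrip('/')}/api/v1/resources/summary"
    if agent_id:
        # agent_id is either the literal "local" or an agent's ID.
        url += "?" + urlencode({"agent_id": agent_id})
    return url
```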

Requirements

Host CPU and memory are read from /proc. When the agent runs inside a container it needs the host /proc mounted read-only:

-v /proc:/host/proc:ro

The generated docker run, Compose and Kubernetes install snippets already include this mount, so no extra configuration is needed when you use them. Bare-metal/systemd agents read /proc natively.
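To show why the mount matters, here is a minimal sketch of reading host memory from a /proc tree. It is not the agent's actual code: the function names are hypothetical, and defining "used" as MemTotal minus MemAvailable is an assumption.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo content into {field: value} (values are in kB where a unit is given)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def memory_used_percent(info: dict) -> float:
    """Share of memory in use, taking MemAvailable as the free pool (assumption)."""
    return 100.0 * (info["MemTotal"] - info["MemAvailable"]) / info["MemTotal"]


def read_host_memory(proc_root: str = "/host/proc") -> float:
    """Read meminfo from the mounted host /proc (use "/proc" on bare metal)."""
    return memory_used_percent(parse_meminfo(Path(proc_root, "meminfo").read_text()))
```

Inside a container without the mount, `/proc/meminfo` may reflect the container's own view rather than the host's, which is why the snippet reads from `/host/proc`.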


Data Model

All monitored entities (containers, endpoints, heartbeats, resources, certificates) carry an agent_id column:

| Value | Meaning |
|---|---|
| NULL | Local origin: embedded mode, or the embedded agent on the server |
| <uuid> | Event from a remote agent |

Deleting an agent purges all its associated rows via SQL ON DELETE CASCADE.
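The cascade behaviour can be demonstrated with an in-memory SQLite database. The table and column names other than agent_id are hypothetical and do not reflect the real schema. Note that SQLite enforces foreign keys only when `PRAGMA foreign_keys` is enabled on the connection.

```python
import sqlite3


def cascade_demo() -> list:
    """Delete one agent and return the container names that survive."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(
        """
        CREATE TABLE agents (id TEXT PRIMARY KEY, label TEXT);
        CREATE TABLE containers (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            agent_id TEXT REFERENCES agents(id) ON DELETE CASCADE
        );
        """
    )
    conn.execute("INSERT INTO agents VALUES ('agent-1', 'prod-worker-01')")
    # agent_id NULL marks a local-origin row; it has no parent to cascade from.
    conn.execute("INSERT INTO containers (name, agent_id) VALUES ('local-nginx', NULL)")
    conn.execute("INSERT INTO containers (name, agent_id) VALUES ('remote-api', 'agent-1')")
    conn.execute("DELETE FROM agents WHERE id = 'agent-1'")
    rows = conn.execute("SELECT name FROM containers ORDER BY id").fetchall()
    conn.close()
    return [name for (name,) in rows]
```

Only the local row remains after the delete; the remote agent's row is removed by the cascade.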


Agent Management

From Agents in the web UI (Pro):

| Action | Effect |
|---|---|
| Revoke | Closes the active stream immediately. Agent receives PermissionDenied: agent_revoked and stops retrying. |
| Delete | Revokes the stream and purges all historical events for that agent in a single transaction. Irreversible. |
| Edit label | Updates the display name (max 64 chars). Takes effect on next UI refresh. |

Agent status is updated in real time via SSE. The connection_state field reflects whether the agent is actively streaming (connected) or has not been seen for more than 60 seconds (disconnected).
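The connection_state rule amounts to a threshold comparison. The sketch below assumes the server tracks a last-seen timestamp per agent; the function and parameter names are hypothetical, and the 60-second default matches MAINTENANT_AGENT_STALE_THRESHOLD_SECONDS.

```python
STALE_THRESHOLD_SECONDS = 60.0


def connection_state(last_seen: float, now: float,
                     threshold: float = STALE_THRESHOLD_SECONDS) -> str:
    """Return "disconnected" once the agent has been silent for more than the threshold."""
    return "disconnected" if now - last_seen > threshold else "connected"
```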


Security

| Concern | Mechanism |
|---|---|
| Transport encryption | TLS required on gRPC. Plaintext rejected at server boot. |
| Per-stream authentication | Ed25519 challenge-response, fresh nonce per connection |
| Clock skew tolerance | ±5 minutes between agent and server clocks |
| Token exposure | Cleartext shown once at creation, stored hashed, masked in subsequent reads |
| Key compromise | Revoke the agent from the UI; re-enroll generates a new keypair |
| CE enforcement | All /api/v1/agents/* endpoints return 402 pro_required in Community Edition |

--grpc-insecure-skip-tls-verify

Available for development and testing against self-signed certificates. A boot-time warning is logged. Do not use in production.


Configuration Reference

| Variable / Flag | Default | Description |
|---|---|---|
| MAINTENANT_GRPC_LISTEN / --grpc-listen | 127.0.0.1:8443 | gRPC bind address (server mode) |
| MAINTENANT_GRPC_PUBLIC_URL / --grpc-public-url | (inferred) | Public gRPC URL injected into install commands |
| --grpc-tls-cert | | Path to TLS certificate (server mode) |
| --grpc-tls-key | | Path to TLS private key (server mode) |
| --grpc-insecure-skip-tls-verify | false | Skip TLS cert verification (agent mode, dev only) |
| --server | | Server gRPC URL (agent mode, e.g. grpcs://monitoring.example.com; port defaults to 443) |
| --enrollment-token | | One-time enrollment token (agent mode, first boot only) |
| --label | (hostname) | Display label for this agent |
| --runtime | (auto-detected) | Override runtime detection: docker, swarm, kubernetes |
| MAINTENANT_AGENT_RATE_LIMIT_PER_SECOND | 1000 | Max events/s per agent (server mode) |
| MAINTENANT_AGENT_STALE_THRESHOLD_SECONDS | 60 | Seconds before an agent is considered disconnected |