Docker Swarm Monitoring¶

Automatic discovery and monitoring of Docker Swarm clusters. Services, tasks, nodes, and rolling updates — all visible without configuration.

How It Works¶

When maintenant runs on a Swarm manager node, it automatically detects Swarm mode via the Docker daemon and discovers all services, tasks, and nodes in the cluster. No configuration is needed — the same "observe without config" experience as standalone Docker containers.

Swarm services appear in the existing container dashboard, grouped by service (and stack when applicable). Standalone containers not managed by Swarm continue to appear normally alongside Swarm services.

Swarm Detection¶

On startup, maintenant queries the Docker daemon to determine:

Is the engine in Swarm mode? — If not, standard Docker monitoring is used.
Is this node a manager? — Only manager nodes have access to the Swarm management API.

| Scenario | Behavior |
|---|---|
| Not a Swarm node | Standard Docker container monitoring |
| Swarm worker node | Standard Docker container monitoring + log message |
| Swarm manager node | Full Swarm monitoring (services, tasks, nodes) |

Detection is fully automatic. If the node is demoted from manager to worker at runtime, maintenant gracefully degrades to container-only monitoring and broadcasts a status update.
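The decision table above can be sketched in a few lines. This is an illustration only, not maintenant's actual code: it assumes the decision is made from the `Swarm.LocalNodeState` and `Swarm.ControlAvailable` fields that the Docker Engine `/info` endpoint returns, and the function name and return values are hypothetical.

```python
from typing import Any, Mapping


def monitoring_mode(info: Mapping[str, Any]) -> str:
    """Map a Docker `/info` payload to one of three monitoring modes.

    The mode names returned here are hypothetical labels for the
    three rows of the scenario table.
    """
    swarm = info.get("Swarm") or {}
    # LocalNodeState is "active" only when the engine has joined a swarm.
    if swarm.get("LocalNodeState") != "active":
        return "standalone"
    # ControlAvailable is true only on manager nodes.
    if swarm.get("ControlAvailable"):
        return "swarm-manager"
    return "swarm-worker"


if __name__ == "__main__":
    worker = {"Swarm": {"LocalNodeState": "active", "ControlAvailable": False}}
    print(monitoring_mode(worker))
```

Re-running the same check whenever the daemon reports a node change would be one way to notice a manager-to-worker demotion at runtime.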

Service Discovery¶

maintenant calls the Docker Swarm API (ServiceList, TaskList) to discover all services and their tasks. Each service is tracked with:

Service name and image
Mode — replicated (fixed replica count) or global (one task per node)
Replica health — desired vs running count (e.g., "3/3 running")
Task states — running, failed, shutdown, pending, preparing
Published ports — with protocol, target port, published port, and publish mode (ingress or host)
Attached networks — overlay, ingress, and custom networks with scope

A service with no running tasks is still listed and shown as "0/N running", where N is the desired replica count.
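As a rough illustration of the replica health figure, the sketch below counts task states and formats the "running/desired" string. It assumes the task states have already been extracted from the task list as plain strings; the function name is hypothetical and this is not maintenant's implementation.

```python
from collections import Counter
from typing import Iterable


def replica_summary(desired: int, task_states: Iterable[str]) -> str:
    """Format replica health as '<running>/<desired> running'."""
    counts = Counter(task_states)
    # Counter returns 0 for missing keys, so an empty task list
    # naturally yields "0/N running".
    return f"{counts['running']}/{desired} running"


if __name__ == "__main__":
    print(replica_summary(3, ["running", "running", "failed", "running"]))
    print(replica_summary(2, []))
```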

Real-Time Updates¶

Swarm service events (create, update, remove) are streamed in real time via the Docker event API and pushed to the browser via SSE. A new service appears in the dashboard within seconds of creation.

Startup Reconciliation¶

On startup (or after a reconnection), maintenant performs a full reconciliation — discovering all services and tasks to ensure the dashboard reflects the current cluster state, even if events were missed.
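Reconciliation amounts to comparing what was known before with what the cluster reports now. The sketch below shows one way to compute that difference, assuming each service is identified by its ID and carries a version number that changes on every update (Docker exposes one as `Version.Index`). Whether maintenant compares versions this way is an assumption, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def reconcile(
    known: Dict[str, int], current: Dict[str, int]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare two service-ID -> version maps.

    Returns (discovered, updated, removed) lists of service IDs.
    """
    discovered = sorted(sid for sid in current if sid not in known)
    removed = sorted(sid for sid in known if sid not in current)
    updated = sorted(
        sid for sid in current if sid in known and current[sid] != known[sid]
    )
    return discovered, updated, removed


if __name__ == "__main__":
    before = {"svc-a": 10, "svc-b": 4}
    after = {"svc-a": 12, "svc-c": 1}
    print(reconcile(before, after))
```

Each of the three lists corresponds to one kind of service change that the dashboard would need to reflect after missed events.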

Grouping¶

Swarm services are grouped in the dashboard using two mechanisms:

Stack Grouping¶

Services deployed with docker stack deploy are automatically grouped under their stack name. maintenant reads the com.docker.stack.namespace label that Docker sets on every service in a stack.

```bash
docker stack deploy -c docker-compose.yml myapp
# All services appear grouped under "myapp"
```

Custom Grouping¶

Override the default grouping with the maintenant.group label on the service:

```yaml
# In your docker-compose.yml (for stack deploy)
services:
  api:
    image: myapp:latest
    deploy:
      labels:
        maintenant.group: "production"
```

Label precedence

When both com.docker.stack.namespace and maintenant.group are present, maintenant.group takes precedence.
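The precedence rule can be expressed as a small lookup. This is a minimal sketch with a hypothetical function name; it assumes the service labels are available as a plain string-to-string mapping and that a service with neither label has no group.

```python
from typing import Mapping, Optional


def resolve_group(labels: Mapping[str, str]) -> Optional[str]:
    """Return the dashboard group for a service, or None if ungrouped."""
    # The custom label wins when it is present and non-empty.
    custom = labels.get("maintenant.group")
    if custom:
        return custom
    # Otherwise fall back to the stack name set by `docker stack deploy`.
    return labels.get("com.docker.stack.namespace") or None


if __name__ == "__main__":
    labels = {
        "com.docker.stack.namespace": "myapp",
        "maintenant.group": "production",
    }
    print(resolve_group(labels))
```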

Supported Labels¶

maintenant reads labels from Swarm service definitions (not individual containers). All standard maintenant.* labels are supported:

| Label | Values | Description |
|---|---|---|
| `maintenant.group` | any string | Custom group name (overrides stack namespace) |
| `maintenant.ignore` | `true` | Exclude this service from monitoring |
| `maintenant.alert.severity` | `critical`, `warning`, `info` | Default alert severity |
| `maintenant.alert.restart_threshold` | integer | Restart loop threshold |
| `maintenant.alert.channels` | comma-separated | Route alerts to specific channels |

Labels must be placed in the deploy.labels section (service-level), not the top-level labels section (container-level):

```yaml
services:
  api:
    image: myapp:latest
    deploy:
      labels:
        maintenant.group: "backend"
        maintenant.alert.severity: "critical"
        maintenant.alert.channels: "ops-webhook"
```
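To make the label semantics concrete, here is a sketch of how the labels in the table could be turned into typed settings. The `ServiceSettings` class, the `parse_settings` function, and the handling of missing or invalid values (they are simply left unset) are all assumptions for illustration; the documentation does not state maintenant's defaults.

```python
from dataclasses import dataclass, field
from typing import List, Mapping, Optional

VALID_SEVERITIES = {"critical", "warning", "info"}


@dataclass
class ServiceSettings:
    """Hypothetical container for per-service label settings."""
    group: Optional[str] = None
    ignore: bool = False
    severity: Optional[str] = None
    restart_threshold: Optional[int] = None
    channels: List[str] = field(default_factory=list)


def parse_settings(labels: Mapping[str, str]) -> ServiceSettings:
    """Parse maintenant.* service labels into a ServiceSettings value."""
    settings = ServiceSettings()
    settings.group = labels.get("maintenant.group") or None
    settings.ignore = labels.get("maintenant.ignore", "").strip().lower() == "true"

    severity = labels.get("maintenant.alert.severity", "").strip().lower()
    if severity in VALID_SEVERITIES:
        settings.severity = severity

    threshold = labels.get("maintenant.alert.restart_threshold", "").strip()
    if threshold.isdigit():
        settings.restart_threshold = int(threshold)

    # Channels are comma-separated; surrounding whitespace is dropped.
    raw = labels.get("maintenant.alert.channels", "")
    settings.channels = [c.strip() for c in raw.split(",") if c.strip()]
    return settings


if __name__ == "__main__":
    print(parse_settings({
        "maintenant.group": "backend",
        "maintenant.alert.severity": "critical",
        "maintenant.alert.channels": "ops-webhook",
    }))
```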

Community vs Enterprise Features¶

Community Edition (Free)¶

Basic Swarm awareness — the "observe without config" equivalent for Swarm:

Swarm mode detection and auto-discovery
Service listing with mode, image, and replica counts
Task states (running, failed, shutdown)
Service grouping by stack and custom labels
Published ports with mode (ingress/host)
Attached network visibility
Real-time SSE updates on service events
Startup reconciliation
Worker node fallback

Enterprise Edition¶

Cluster intelligence — analysis, alerting, and a dedicated dashboard:

Node health overview — All nodes with role, status, availability, engine version, task count
Node alerting — Alerts when nodes go down, are drained, or quorum is degraded
Crash-loop detection — Automatic detection of task failure patterns (3+ failures in 5 minutes)
Replica health alerting — Alerts when running replicas fall below desired count
Rolling update tracking — Real-time progress, stall detection, rollback alerts
Dedicated Swarm dashboard — Cluster-wide view with nodes, services, task distribution, and aggregate health
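The crash-loop rule (3 or more failures in 5 minutes) is a sliding-window count. The sketch below is one way to implement such a rule, not the Enterprise Edition's actual detector: the class name is hypothetical, timestamps are plain seconds, and failures are assumed to arrive in chronological order.

```python
from collections import deque
from typing import Deque


class CrashLoopDetector:
    """Flag a crash loop when `threshold` failures fall within the window."""

    def __init__(self, threshold: int = 3, window_seconds: float = 300.0) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._failures: Deque[float] = deque()

    def record_failure(self, timestamp: float) -> bool:
        """Record one task failure; return True if a crash loop is detected."""
        self._failures.append(timestamp)
        # Drop failures that are older than the window.
        cutoff = timestamp - self.window_seconds
        while self._failures and self._failures[0] < cutoff:
            self._failures.popleft()
        return len(self._failures) >= self.threshold


if __name__ == "__main__":
    detector = CrashLoopDetector()
    for ts in (0.0, 100.0, 200.0):
        print(ts, detector.record_failure(ts))
```

Keeping one detector per service would keep failures of unrelated services from being counted together.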

API Endpoints¶

Community Edition¶

| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/api/v1/swarm/info` | Cluster status (active, cluster_id, is_manager, node counts) |
| `GET` | `/api/v1/swarm/services` | List all services with task summary. Supports `?stack=` filter |
| `GET` | `/api/v1/swarm/services/{serviceID}` | Service detail with full task list |
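For scripted access, the request URLs for these endpoints can be assembled as below. This is only a sketch: the base URL (`http://localhost:8080`) is a placeholder, since the address where maintenant listens depends on your deployment, and the helper names are hypothetical. The response schema is not documented here, so the example stops at building the URL.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def services_url(base_url: str, stack: Optional[str] = None) -> str:
    """Build the service-list URL, optionally filtered by stack name."""
    url = base_url.rstrip("/") + "/api/v1/swarm/services"
    if stack:
        # urlencode escapes characters that are not safe in a query string.
        url += "?" + urlencode({"stack": stack})
    return url


def service_detail_url(base_url: str, service_id: str) -> str:
    """Build the detail URL for a single service ID."""
    return (
        base_url.rstrip("/")
        + "/api/v1/swarm/services/"
        + quote(service_id, safe="")
    )


if __name__ == "__main__":
    base = "http://localhost:8080"  # placeholder address, adjust as needed
    print(services_url(base, stack="myapp"))
    print(service_detail_url(base, "abc123"))
```

The resulting URL can be passed to `urllib.request.urlopen` or any other HTTP client.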

Enterprise Edition¶

| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/api/v1/swarm/nodes` | List all nodes with status and task counts |
| `GET` | `/api/v1/swarm/nodes/{nodeID}` | Node detail with task list |
| `GET` | `/api/v1/swarm/services/{serviceID}/update-status` | Rolling update progress |
| `GET` | `/api/v1/swarm/dashboard` | Aggregated cluster dashboard data |

SSE Events¶

| Event | Edition | Description |
|---|---|---|
| `swarm.status` | CE | Cluster status changes (active, manager demotion) |
| `swarm.service_discovered` | CE | New service detected |
| `swarm.service_updated` | CE | Service configuration or task state changed |
| `swarm.service_removed` | CE | Service removed from cluster |
| `swarm.node_status_changed` | Enterprise | Node status or availability changed |
| `swarm.task_failed` | Enterprise | Individual task failure |
| `swarm.crash_loop_detected` | Enterprise | Crash-loop pattern detected on a service |
| `swarm.crash_loop_recovered` | Enterprise | Service recovered from crash-loop |
| `swarm.update_progress` | Enterprise | Rolling update progress tick |
| `swarm.update_completed` | Enterprise | Rolling update finished |
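A client outside the browser can consume these events by parsing the standard `text/event-stream` framing (`event:` and `data:` lines, with a blank line ending each event). The parser below is a minimal sketch of that framing. It assumes each event's data is a JSON document, which this page does not confirm, and the function name is hypothetical.

```python
import json
from typing import Any, Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, Any]]:
    """Yield (event_name, decoded_json_data) pairs from SSE text lines."""
    event = "message"  # default event name in the SSE format
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The format allows one optional space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)


if __name__ == "__main__":
    stream = [
        "event: swarm.status\n",
        'data: {"is_manager": false}\n',
        "\n",
    ]
    for name, payload in parse_sse(stream):
        print(name, payload)
```

Dispatching on the event name then lets a client handle only the events available in its edition.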

Worker Node Fallback¶

When maintenant detects it is running on a worker node (Swarm active but not a manager), it:

Logs a clear message: Swarm monitoring requires a manager node
Falls back to standard Docker container monitoring
Broadcasts swarm.status with is_manager: false

There are no errors in the UI — the dashboard shows standalone containers as usual. To enable Swarm monitoring, deploy maintenant on a manager node.

Related¶

Docker Labels Reference: Full label reference
Container Monitoring: Standalone container monitoring
Alert Engine: Alert routing and notification channels
Swarm Deployment Guide: Deployment requirements and setup
