docs: address issue #24139 #24148

Open
github-actions[bot] wants to merge 1 commit into main from agent/issue-24139

@github-actions
Contributor

Summary

Added comprehensive documentation of Docker Prometheus metrics including descriptions, types, and usage examples.

Changes

  • Added "Available metrics" section to Prometheus documentation
  • Documented metric types (Counter, Gauge, Histogram) with explanations
  • Explained histogram suffixes (_bucket, _count, _sum) with practical examples
  • Added table of 11 engine_daemon_* metrics with descriptions
  • Added table of 15 swarm_* metrics with descriptions
  • Organized metrics by category (Engine vs Swarm) for easy reference
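
To illustrate the histogram suffixes listed above, here is a minimal Python sketch that derives a mean duration from the `_sum` and `_count` series. The scrape text and its values are invented for illustration, and the `action` label name is an assumption rather than something confirmed in this PR.

```python
# Hypothetical scrape output: the values are made up, and the "action"
# label name is assumed for illustration.
HISTOGRAM_SAMPLE = """\
engine_daemon_container_actions_seconds_bucket{action="start",le="0.5"} 3
engine_daemon_container_actions_seconds_bucket{action="start",le="1"} 4
engine_daemon_container_actions_seconds_bucket{action="start",le="+Inf"} 5
engine_daemon_container_actions_seconds_sum{action="start"} 2.5
engine_daemon_container_actions_seconds_count{action="start"} 5
"""


def parse_samples(text):
    """Map each series (name plus label block) to its float value.

    Only handles simple lines without timestamps, which is enough here.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples


def mean_seconds(samples, base, labels):
    """Mean observed duration: <base>_sum divided by <base>_count."""
    total = samples[f"{base}_sum{labels}"]
    count = samples[f"{base}_count{labels}"]
    return total / count if count else 0.0


samples = parse_samples(HISTOGRAM_SAMPLE)
mean = mean_seconds(
    samples, "engine_daemon_container_actions_seconds", '{action="start"}'
)
print(f"mean start duration: {mean} s")
```

The `_bucket` series are cumulative, so the `le="+Inf"` bucket always equals `_count`.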

Fixes #24139
Fixes #19759


🤖 Generated with cagent

This change was automatically generated by the documentation agent team
in response to issue #24139.

@netlify

netlify bot commented Feb 16, 2026

Deploy Preview for docsdocker ready!

Name Link
🔨 Latest commit 77294e1
🔍 Latest deploy log https://app.netlify.com/projects/docsdocker/deploys/6992fa58c2dbcc0008914ed1
😎 Deploy Preview https://deploy-preview-24148--docsdocker.netlify.app
@usha-mandya
Member

@thaJeztah Could you PTAL?

Contributor

@dvdksn dvdksn left a comment

I asked opus to review this PR against moby source code and here's what it had to say:


Thanks for working on this — documenting Prometheus metrics is valuable. I cross-referenced the metrics descriptions against the moby/moby source code and found several inaccuracies that need to be fixed before this can be merged.

Incorrect label references (engine metrics)

  • engine_daemon_container_actions_seconds: The description says "start, stop, create, etc." but the actual labels in the source are: start, changes, commit, create, delete. There is no stop label. (source: daemon/internal/metrics/metrics.go:56-64)
  • engine_daemon_network_actions_seconds: The description says "create, connect, disconnect, etc." but the actual labels used in the source are: update, allocate, connect, release. There is no create or disconnect label. (source: daemon/container_operations.go)
  • engine_daemon_events_total: The description says "Labels indicate the event action and type" but this is a plain counter with no labels. It's defined as metricsNS.NewCounter("events", "The number of events logged"). (source: daemon/internal/metrics/metrics.go:49)
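
A label mismatch like the ones above can also be checked against a live scrape rather than only against the source. The following Python sketch compares documented label values with those present in exposition text. The sample text is hypothetical, its values are invented, and the `action` label name is an assumption.

```python
import re

# Hypothetical scrape output built from the label values listed in this
# review. The counts are made up and the "action" label name is assumed.
LABEL_SAMPLE = """\
engine_daemon_container_actions_seconds_count{action="changes"} 0
engine_daemon_container_actions_seconds_count{action="commit"} 1
engine_daemon_container_actions_seconds_count{action="create"} 4
engine_daemon_container_actions_seconds_count{action="delete"} 2
engine_daemon_container_actions_seconds_count{action="start"} 4
"""


def label_values(text, metric, label):
    """Collect the distinct values of `label` across all series of `metric`."""
    series = re.compile(
        r"^" + re.escape(metric) + r"(?:_bucket|_sum|_count)?\{([^}]*)\}"
    )
    value = re.compile(r"(?:^|,)" + re.escape(label) + r'="([^"]*)"')
    found = set()
    for line in text.splitlines():
        match = series.match(line.strip())
        if match:
            hit = value.search(match.group(1))
            if hit:
                found.add(hit.group(1))
    return found


# Label values the PR description mentions for this metric.
documented = {"start", "stop", "create"}
observed = label_values(
    LABEL_SAMPLE, "engine_daemon_container_actions_seconds", "action"
)

not_exposed = sorted(documented - observed)     # documented but never seen
not_documented = sorted(observed - documented)  # seen but not documented
print("documented but not exposed:", not_exposed)
print("exposed but not documented:", not_documented)
```

A label value only shows up in a scrape once the daemon has initialized that series, so an absent value in real output is a hint to check the source, not proof on its own.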

Inaccurate descriptions (swarm metrics)

  • swarm_dispatcher_scheduling_delay_seconds: PR says "Time from task creation to scheduling decision." The source help text says: "Scheduling delay is the time a task takes to go from NEW to RUNNING state." — this measures the full NEW→RUNNING duration, not just scheduling. (source: vendor/.../dispatcher/dispatcher.go:74-75)
  • swarm_raft_snapshot_latency_seconds: PR says "Time taken to create and restore Raft snapshots." The source says "Raft snapshot create latency." — "restore" is not part of this metric. (source: vendor/.../raft/storage.go:26-27)
  • swarm_raft_transaction_latency_seconds: PR says "Time taken to commit Raft transactions. Measures consensus performance." The source just says "Raft transaction latency." The additions ("commit", "consensus performance") are not grounded in the source. (source: vendor/.../raft/raft.go:207)
  • swarm_store_lookup_latency_seconds: PR says "Time taken for lookup operations in the swarm store." The source help text is "Raft store read latency." (source: vendor/.../store/memory.go:105-106)
  • swarm_store_memory_store_lock_duration_seconds: PR says "Duration of lock acquisitions in the memory store." The source says "Duration for which the raft memory store lock was held." These are different things — acquisition time vs. hold duration. (source: vendor/.../store/memory.go:109-110)
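
The acquisition versus hold distinction in the last item can be made concrete with a small Python sketch. This is an illustration of the two measurements only, not how swarmkit implements its metric. `TimedLock` and the fake clock are hypothetical names introduced for this example.

```python
import threading
import time


class TimedLock:
    """Lock wrapper that records two different durations.

    wait_seconds: time spent waiting to acquire the lock (acquisition time).
    hold_seconds: time between acquiring and releasing it (hold duration).
    """

    def __init__(self, clock=time.perf_counter):
        self._lock = threading.Lock()
        self._clock = clock
        self._acquired_at = None
        self.wait_seconds = []
        self.hold_seconds = []

    def __enter__(self):
        requested = self._clock()
        self._lock.acquire()
        self._acquired_at = self._clock()
        self.wait_seconds.append(self._acquired_at - requested)
        return self

    def __exit__(self, *exc_info):
        # Record the hold time while still holding the lock so that no
        # other thread can overwrite _acquired_at in the meantime.
        self.hold_seconds.append(self._clock() - self._acquired_at)
        self._lock.release()
        return False


# Deterministic demonstration with a fake clock: the lock is requested at
# t=0.0, acquired at t=0.25, and released at t=1.25.
ticks = iter([0.0, 0.25, 1.25])
demo = TimedLock(clock=lambda: next(ticks))
with demo:
    pass

print("acquisition wait:", demo.wait_seconds)  # [0.25]
print("hold duration:", demo.hold_seconds)     # [1.0]
```

A metric described as "duration for which the lock was held" corresponds to `hold_seconds` here, which can be large even when `wait_seconds` is close to zero.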

Missing metrics

The following metrics exist in the source but are not documented in the PR:

  • engine_daemon_image_actions_seconds (Histogram) — "The number of seconds it takes to process each image action", labels: delete, push, history, pull
  • engine_daemon_health_check_start_duration_seconds (Histogram) — "The number of seconds it takes to prepare to run health checks"
  • swarm_node_info (LabeledGauge) — "Information related to the swarm", labels: swarm_id, node_id
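
Gaps like these could be caught by diffing the documented metric names against the names in a scrape. The Python sketch below does this for hypothetical exposition text. The sample values and the `n1`/`s1` label values are invented, and stripping the histogram suffixes is a heuristic that would misfire on a non-histogram metric whose name happens to end in `_count` or `_sum`.

```python
HISTOGRAM_SUFFIXES = ("_bucket", "_sum", "_count")

# Hypothetical scrape output. All values and label values are made up.
NAMES_SAMPLE = """\
# TYPE engine_daemon_events_total counter
engine_daemon_events_total 7
engine_daemon_image_actions_seconds_sum{action="pull"} 1.5
engine_daemon_image_actions_seconds_count{action="pull"} 2
swarm_node_info{node_id="n1",swarm_id="s1"} 1
"""


def base_names(text):
    """Return metric names with histogram suffixes stripped (heuristic)."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The series name ends at the label block or at the first space.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        for suffix in HISTOGRAM_SUFFIXES:
            if name.endswith(suffix):
                name = name[: -len(suffix)]
                break
        names.add(name)
    return names


# Names assumed to be covered by the documentation, for illustration.
documented_names = {"engine_daemon_events_total"}
undocumented = sorted(base_names(NAMES_SAMPLE) - documented_names)
print(undocumented)
```

Reading the `# TYPE` lines instead of guessing from suffixes would be more robust when the scrape includes them.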

Development

Successfully merging this pull request may close these issues.

Document Prometheus metrics
Describe prometheus metrics

3 participants