This article focuses on deploying a containerized Prometheus monitoring stack by building images, orchestrating services, and integrating alerting across Node Exporter, Alertmanager, Grafana, Prometheus, and cAdvisor. It addresses common issues in traditional monitoring setups, such as fragmented installation, complex configuration, and poor scalability. Keywords: Prometheus, Docker, Grafana.
Technical Specifications Snapshot
| Parameter | Description |
|---|---|
| Languages | Shell, YAML, Dockerfile |
| Base Runtime | centos:centos7.9.2009 |
| Communication Protocols | HTTP, SMTP |
| Core Ports | 9090, 9100, 9093, 9094, 3000, 8080 |
| Core Dependencies | Prometheus 2.13.0, Alertmanager 0.19.0, Grafana 6.4.1, Node Exporter 0.18.1, cAdvisor 0.37.5 |
| Orchestration Method | Docker Compose |
This monitoring architecture covers hosts, containers, visualization, and alerting end to end.
Prometheus Server acts as the collection and query core. It periodically pulls metrics from target endpoints and writes them into its local time-series database. Node Exporter provides host-level metrics, cAdvisor provides container-level resource visibility, Grafana handles visualization, and Alertmanager handles alert deduplication, routing, and notifications.
AI Visual Insight: This diagram shows a typical container monitoring data flow: cAdvisor exposes container metrics, Node Exporter exposes host metrics, Prometheus pulls them into a unified backend, Grafana visualizes the results, and Alertmanager distributes anomalies as notifications. Together, they form a complete loop of collection, storage, query, visualization, and alerting.
You can quickly identify each component by its port.
- Prometheus: 9090, responsible for scraping, storage, PromQL queries, and rule evaluation.
- Node Exporter: 9100, responsible for host CPU, memory, disk, and network metrics.
- Alertmanager: 9093/9094, responsible for alert aggregation and notification delivery.
- Grafana: 3000, responsible for data source integration and dashboard visualization.
- cAdvisor: 8080, responsible for container lifecycle and resource metrics.
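Once the full stack is running, a quick loop over these ports confirms that each component is reachable. This is a minimal sketch that assumes all services are published on localhost, as in the Compose file later in the article:

```shell
# Probe each component's published port; any HTTP status (even 302 or 401)
# means the service is listening, while 000 means nothing answered.
for port in 9090 9100 9093 3000 8080; do
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 3 "http://localhost:$port/")
  echo "port $port -> HTTP $code"
done
```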
```shell
# Extract installation packages and import the base image
# Core logic: prepare all offline binary packages and the base image in advance
mkdir -p Monitor && cd Monitor
tar -zxvf ../Monitor.tar.gz
for i in *.tar.gz; do tar xf "$i"; done
docker load -i CentOS_7.9.2009.tar
```
These commands prepare the offline environment and serve as a prerequisite for all subsequent image builds.
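Before building anything, it is worth confirming that the base image actually loaded. A short check, assuming the image name shown above:

```shell
# The image builds below will fail fast if this prints nothing
docker images centos --format '{{.Repository}}:{{.Tag}}' | grep centos7.9.2009
```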
The key to the Node Exporter image is minimizing the startup path.
Node Exporter uses direct binary deployment. The image keeps only the extracted directory, the startup script, and port 9100, which makes it ideal for quickly validating that the host monitoring pipeline works.
```dockerfile
FROM centos:centos7.9.2009
MAINTAINER gxl [email protected]
ADD node_exporter-0.18.1.linux-amd64.tar.gz /opt
RUN mv /opt/node_exporter-0.18.1.linux-amd64 /opt/node_exporter
COPY exporter.sh /opt/node_exporter/
WORKDIR /opt/node_exporter
EXPOSE 9100
CMD ["/bin/bash","/opt/node_exporter/exporter.sh"]
```
This Dockerfile packages Node Exporter quickly by using the binary release directly.
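To verify the pipeline end to end before wiring in Compose, the image can be built and smoke-tested on its own. The tag matches the one used in the Compose file; the test container name is arbitrary:

```shell
# Build from the directory containing the Dockerfile and exporter.sh
docker build -t monitor-exporter:v1.0 .
docker run -d --name node-test -p 9100:9100 monitor-exporter:v1.0
curl -s http://localhost:9100/metrics | head -n 5   # should print Prometheus-format metrics
docker rm -f node-test
```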
The startup script should keep a single responsibility.
```shell
#!/bin/bash
# Core logic: start node_exporter in the foreground so the container's main process stays alive
# exec replaces the shell, so node_exporter becomes PID 1 and receives stop signals directly
exec ./node_exporter
```
This script runs the exporter so that the container process remains active.
Alertmanager focuses on routing configuration and SMTP notifications.
Alertmanager does not collect metrics. Instead, it receives rule evaluation results from Prometheus. The original configuration demonstrates both webhook and SMTP email notifications. In production, you should keep one clear notification path whenever possible to avoid unnecessary configuration complexity.
```yaml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.qq.com:587'
  smtp_from: '[email protected]'
  smtp_auth_username: '[email protected]'
  smtp_auth_password: 'your_qq_auth_code'
  smtp_require_tls: true
route:
  receiver: 'email'
  group_by: ['alertname']
receivers:
  - name: 'email'
    email_configs:
      - to: '[email protected]'
        send_resolved: true
```
This configuration defines the minimum viable email alerting path.
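Before baking the file into an image, it can be validated with amtool. A sketch that assumes the official prom/alertmanager image (which ships amtool) is available, in case amtool is not installed on the host:

```shell
# check-config parses the file and reports routing or SMTP field errors
docker run --rm --entrypoint amtool \
  -v "$PWD/alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro" \
  prom/alertmanager:v0.19.0 check-config /etc/alertmanager/alertmanager.yml
```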
Grafana turns PromQL results into readable dashboards.
Grafana has the simplest container build in the stack. The key is to keep grafana-server running in the foreground. After deployment, configure Prometheus as a data source named Prometheus, and then import an official or community dashboard.
```dockerfile
FROM centos:centos7.9.2009
MAINTAINER gxl [email protected]
ADD grafana-6.4.1.linux-amd64.tar.gz /opt
RUN mv /opt/grafana-6.4.1 /opt/grafana
COPY grafana.sh /opt/grafana/bin/
WORKDIR /opt/grafana/bin/
EXPOSE 3000
CMD ["/bin/bash","/opt/grafana/bin/grafana.sh"]
```
This image packages the Grafana service and exposes port 3000 externally.
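The data source binding can also be scripted instead of clicked through the UI. A sketch using Grafana's HTTP API, assuming the default admin:admin credentials and the container names from the Compose file:

```shell
# Register Prometheus as the default data source; 'proxy' access lets
# Grafana reach it over the Compose network by container name.
curl -s -X POST http://admin:admin@localhost:3000/api/datasources \
  -H 'Content-Type: application/json' \
  -d '{
        "name": "Prometheus",
        "type": "prometheus",
        "url": "http://monitor-prometheus:9090",
        "access": "proxy",
        "isDefault": true
      }'
```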
AI Visual Insight: This image shows the Prometheus Targets page. The key validation point is whether each job appears as UP, which directly reflects whether service discovery, network connectivity, and exporter exposure are working correctly.
AI Visual Insight: This screen is the Grafana login page, which confirms that the visualization layer is exposed successfully. The next focus should be binding the data source and importing dashboards, not the service installation itself.
AI Visual Insight: A successful Save & Test result means Grafana can reach the Prometheus API. In most cases, it also confirms that the URL, port, and container network configuration are correct.
The Prometheus configuration file determines the breadth of the monitoring surface.
The core of Prometheus is prometheus.yml. The original article configures Prometheus self-monitoring, Node Exporter, Alertmanager, Grafana, and later cAdvisor. Use explicit job names consistently to avoid query confusion caused by empty or unclear job definitions.
```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['monitor-alertmanager:9093']
rule_files:
  - '*rules.yml'
scrape_configs:
  - job_name: 'monitor-prometheus'
    static_configs:
      - targets: ['monitor-prometheus:9090']
  - job_name: 'monitor-node'
    static_configs:
      - targets: ['monitor-node:9100']
  - job_name: 'monitor-cadvisor'
    static_configs:
      - targets: ['monitor-cadvisor:8080']
```
This configuration defines the scrape interval, alert integration, and core monitoring targets.
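Syntax errors in prometheus.yml prevent the server from starting, so validating the file first saves a rebuild cycle. A sketch using promtool from the official image, in case it is not installed locally:

```shell
# promtool reports YAML errors, bad rule_files globs, and malformed targets
docker run --rm --entrypoint promtool \
  -v "$PWD/prometheus.yml:/etc/prometheus/prometheus.yml:ro" \
  prom/prometheus:v2.13.0 check config /etc/prometheus/prometheus.yml
```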
Docker Compose makes multi-service relationships repeatable and deliverable.
Compose packages exporter, alertmanager, grafana, prometheus, and cadvisor into one reproducible deployment unit. It works well for lab environments and small to medium-sized deployments. For cAdvisor, mounting host directories and enabling privileged mode are essential to collecting container metrics.
```yaml
version: '3'
services:
  monitor-node:
    container_name: monitor-node
    image: monitor-exporter:v1.0
    ports:
      - '9100:9100'
  monitor-alertmanager:
    container_name: monitor-alertmanager
    image: monitor-alert:v1.0
    ports:
      - '9093:9093'
      - '9094:9094'
  monitor-grafana:
    container_name: monitor-grafana
    image: monitor-grafana:v1.0
    ports:
      - '3000:3000'
  monitor-prometheus:
    container_name: monitor-prometheus
    image: monitor-prometheus:v1.0
    ports:
      - '9090:9090'
  monitor-cadvisor:
    container_name: monitor-cadvisor
    image: monitor-cadvisor:v1.0
    ports:
      - '8080:8080'
    privileged: true
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
```
This Compose file defines the complete monitoring stack and the required host mounts for cAdvisor.
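With the file in place, the whole stack comes up in one command, and the Prometheus targets API gives a fast health summary. A sketch assuming a Compose v2 CLI (use docker-compose on older installs):

```shell
docker compose up -d
docker compose ps
# Each scrape target should eventually report "health":"up"
curl -s http://localhost:9090/api/v1/targets | grep -o '"health":"[a-z]*"'
```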
cAdvisor extends monitoring depth into the container layer.
The cAdvisor image is minimal. You only need to copy the binary and grant execute permission. It requires additional mounts because it directly inspects Docker, kernel, and filesystem runtime state.
```dockerfile
FROM centos:centos7.9.2009
MAINTAINER gxl [email protected]
COPY cadvisor /usr/bin/cadvisor
RUN chmod +x /usr/bin/cadvisor
EXPOSE 8080
CMD ["/usr/bin/cadvisor", "-logtostderr"]
```
This image provides container-level monitoring with minimal changes.
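A single curl against the metrics endpoint confirms that cAdvisor is exporting container-level series before Prometheus ever scrapes it:

```shell
# container_cpu_usage_seconds_total is the series the CPU alert rule relies on
curl -s http://localhost:8080/metrics | grep -m 3 '^container_cpu_usage_seconds_total'
```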
AI Visual Insight: This dashboard shows container monitoring after importing a cAdvisor template. It typically includes multidimensional metrics such as CPU, memory, filesystem, and network usage, which confirms that Prometheus has successfully scraped container-level data and Grafana is consuming it.
Alerting rules should be designed around actionable responses, not just metric volume.
The original article provides multiple rules for CPU, memory, network, filesystem, restart counts, and non-running states. The quality of alerting depends less on the number of rules and more on whether thresholds, durations, and notification targets match operational reality.
```yaml
groups:
  - name: docker-alert-rules
    rules:
      - alert: ContainerHighCpuUsage
        expr: sum(rate(container_cpu_usage_seconds_total{name!=""}[5m])) by (instance,name) * 100 > 70
        for: 3m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.name }} on instance {{ $labels.instance }} has high CPU usage"
```
This rule fires when a container's CPU usage stays above 70% for more than three minutes.
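Rule files deserve the same validation as the main configuration. A sketch assuming the rules live in a file matching the *rules.yml glob from prometheus.yml (the exact filename is not given in the article):

```shell
# Reports syntax errors and invalid PromQL before the file is baked into the image
docker run --rm --entrypoint promtool \
  -v "$PWD/docker-rules.yml:/rules/docker-rules.yml:ro" \
  prom/prometheus:v2.13.0 check rules /rules/docker-rules.yml
```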
Validate the email notification path by checking SMTP first, then template rendering.
QQ Mail alerting depends on three key points: enabling SMTP service, using an authorization code instead of the login password, and ensuring the container network can reach smtp.qq.com:587. If Save & Test succeeds but no email arrives, the problem is usually mailbox permissions or an invalid authorization code.
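Reachability of the SMTP endpoint can be checked from inside the Alertmanager container before chasing configuration issues. This assumes openssl is present in the CentOS-based image:

```shell
# A "220 smtp.qq.com ..." banner followed by a successful STARTTLS handshake
# rules out network problems; remaining failures are almost always auth-related.
docker exec monitor-alertmanager \
  openssl s_client -starttls smtp -connect smtp.qq.com:587 -crlf < /dev/null
```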
AI Visual Insight: This image shows a delivered alert email. The key is not the interface style but whether the alert fields correctly map severity, alertname, instance, and summary, because that directly affects first-line operational response efficiency.
FAQ
1. Why does the Prometheus Targets page show a target as DOWN?
Common causes include containers not running, ports not exposed, unreachable Compose networking, or incorrect job names and target addresses. Start by checking docker ps, container logs, and the targets entries in prometheus.yml.
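The checks above can be run as one short triage sequence, assuming the container names from the Compose file and that curl exists in the Prometheus image's CentOS base:

```shell
docker ps --format '{{.Names}}\t{{.Status}}'    # is the container running?
docker logs --tail 20 monitor-node              # did the exporter crash on start?
docker exec monitor-prometheus \
  curl -s http://monitor-node:9100/metrics | head -n 3   # reachable over the Compose network?
```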
2. Why does cAdvisor need privileged mode and multiple mounted directories?
Because it must read low-level host information such as /sys, /var/lib/docker, and /dev/disk to correctly correlate container, filesystem, and device metrics. Without these permissions, the exported metrics will be incomplete.
3. Why does Alertmanager still fail to send QQ Mail even when the configuration looks correct?
Check these three items first: whether SMTP is enabled, whether you are using the 16-character authorization code, and whether the container can reach smtp.qq.com:587. If you see 535 Login fail, the authorization code is almost certainly incorrect.
Core Summary
This article reconstructs a containerized deployment workflow for a Prometheus monitoring stack based on CentOS 7.9. It covers image builds for Node Exporter, Alertmanager, Grafana, Prometheus, and cAdvisor, Docker Compose orchestration, Grafana data source integration, and QQ Mail alert configuration.