Prometheus Monitoring: Complete Setup & Best Practices
Set up robust infrastructure monitoring with Prometheus
Prometheus has become the de facto standard for monitoring cloud-native applications and infrastructure, offering metrics collection, querying, and integration with visualization tools.

What is Prometheus?
Prometheus is an open-source monitoring and alerting toolkit originally developed at SoundCloud in 2012 and now a Cloud Native Computing Foundation (CNCF) graduated project. It’s specifically designed for reliability and scalability in dynamic cloud environments, making it the go-to solution for monitoring microservices, containers, and Kubernetes clusters.
Key Features
Time-Series Database: Prometheus stores all data as time-series, identified by metric names and key-value pairs (labels), enabling flexible and powerful querying capabilities.
Pull-Based Model: Rather than waiting for applications to push data, Prometheus scrapes metrics over HTTP from configured targets at a set interval. The server controls the collection rate, and a failed scrape shows right away that a target is unreachable.
PromQL Query Language: A powerful functional query language allows you to slice and dice your metrics data in real-time, performing aggregations, transformations, and complex calculations.
Service Discovery: Automatic discovery of monitoring targets through various mechanisms including Kubernetes, Consul, EC2, and static configurations.
No External Dependencies: Prometheus operates as a single binary with no required external dependencies, simplifying deployment and reducing operational complexity.
Built-in Alerting: Prometheus evaluates alerting rules and sends firing alerts to AlertManager, a separate component that handles deduplication, grouping, and routing to notification channels like email, PagerDuty, or Slack.
Architecture Overview
Understanding Prometheus architecture is crucial for effective deployment. The main components include:
- Prometheus Server: Scrapes and stores metrics, evaluates rules, and serves queries
- Client Libraries: Instrument application code to expose metrics
- Exporters: Bridge third-party systems to Prometheus format
- AlertManager: Handles alerts and notifications
- Pushgateway: Accepts metrics from short-lived jobs that can’t be scraped
The typical data flow: Applications expose metrics endpoints → Prometheus scrapes these endpoints → Data is stored in time-series database → PromQL queries retrieve and analyze data → Alerts are generated based on rules → AlertManager processes and routes notifications.
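The first step of that flow, an application exposing a metrics endpoint, can be sketched with only the Python standard library. This is a minimal illustration, not how production services are usually instrumented (an official client library is the normal choice). The metric name demo_http_requests_total, the REQUEST_COUNTS dictionary, and port 8000 are assumptions made up for this example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter keyed by (method, status).
REQUEST_COUNTS = {("GET", "200"): 0}


def render_metrics(counts):
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP demo_http_requests_total Total HTTP requests handled.",
        "# TYPE demo_http_requests_total counter",
    ]
    for (method, status), value in sorted(counts.items()):
        lines.append(
            f'demo_http_requests_total{{method="{method}",status="{status}"}} {value}'
        )
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Answers GET /metrics with the current counter values."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUEST_COUNTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve /metrics until interrupted (port 8000 is an arbitrary choice)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() and adding localhost:8000 as a scrape target would let Prometheus collect the counter. Each distinct combination of label values becomes its own time series.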
When deploying infrastructure on Ubuntu 24.04, Prometheus provides an excellent foundation for comprehensive monitoring.
Installing Prometheus on Ubuntu
Let’s walk through installing Prometheus on a Linux system. We’ll use Ubuntu as the example, but the process is similar for other distributions.
Download and Install
First, create a dedicated user for Prometheus:
sudo useradd --no-create-home --shell /bin/false prometheus
Download a Prometheus release (v2.48.0 is used here; check the releases page for the current version):
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.48.0/prometheus-2.48.0.linux-amd64.tar.gz
tar xvf prometheus-2.48.0.linux-amd64.tar.gz
cd prometheus-2.48.0.linux-amd64
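Before installing binaries from a downloaded archive, it is worth comparing its SHA-256 digest with the published one. The sketch below assumes the release page offers a sha256sums.txt file in the usual "digest  filename" layout next to the tarball. That is an assumption to verify for the version you download, and the helper names are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file without loading it whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_digest(sums_text, filename):
    """Find the digest listed for filename in 'digest  filename' lines."""
    for line in sums_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].lstrip("*") == filename:
            return parts[0]
    return None


def verify(path, sums_text, filename):
    """True only if the file's digest matches the listed one."""
    expected = expected_digest(sums_text, filename)
    return expected is not None and sha256_of(path) == expected
```

With the checksum file read into a string, verify("/tmp/prometheus-2.48.0.linux-amd64.tar.gz", sums_text, "prometheus-2.48.0.linux-amd64.tar.gz") should return True for an intact download.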
Copy binaries and create directories:
sudo cp prometheus /usr/local/bin/
sudo cp promtool /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/prometheus
sudo chown prometheus:prometheus /usr/local/bin/promtool
sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus
sudo chown prometheus:prometheus /etc/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus
sudo cp -r consoles /etc/prometheus
sudo cp -r console_libraries /etc/prometheus
sudo cp prometheus.yml /etc/prometheus/prometheus.yml
sudo chown -R prometheus:prometheus /etc/prometheus
For package management on Ubuntu, refer to our comprehensive Ubuntu Package Management guide.
Configure Prometheus
Edit /etc/prometheus/prometheus.yml:
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

rule_files:
  - 'alert_rules.yml'

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100']
Create Systemd Service
Create /etc/systemd/system/prometheus.service:
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --storage.tsdb.retention.time=30d
[Install]
WantedBy=multi-user.target
Start and enable Prometheus:
sudo systemctl daemon-reload
sudo systemctl start prometheus
sudo systemctl enable prometheus
sudo systemctl status prometheus
Access the Prometheus web interface at http://localhost:9090.
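Besides the web interface, the HTTP API can confirm that the server is scraping its targets. The following sketch reads /api/v1/targets and lists every active target that is not healthy. It assumes the server address used in this guide; the function names are illustrative.

```python
import json
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # address used throughout this guide


def fetch_targets(base_url=PROMETHEUS_URL, timeout=5):
    """Fetch the target list from the Prometheus HTTP API."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)


def unhealthy_targets(payload):
    """Return (job, instance, lastError) for each active target not 'up'."""
    problems = []
    for target in payload.get("data", {}).get("activeTargets", []):
        if target.get("health") != "up":
            labels = target.get("labels", {})
            problems.append(
                (labels.get("job"), labels.get("instance"), target.get("lastError", ""))
            )
    return problems
```

Running unhealthy_targets(fetch_targets()) on a healthy setup should return an empty list. Any entry it does return carries the scrape error reported by Prometheus.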
Setting Up Node Exporter
Node Exporter exposes hardware and OS metrics for Linux systems. Install it to monitor your servers:
cd /tmp
wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvf node_exporter-1.7.0.linux-amd64.tar.gz
sudo cp node_exporter-1.7.0.linux-amd64/node_exporter /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/node_exporter
Create systemd service /etc/systemd/system/node_exporter.service:
[Unit]
Description=Node Exporter
After=network.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target
Start Node Exporter:
sudo systemctl daemon-reload
sudo systemctl start node_exporter
sudo systemctl enable node_exporter
Node Exporter now exposes metrics on port 9100.
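What Node Exporter serves on that port is plain text, one sample per line. As a rough sketch of how that format is structured, the parser below splits each line into a metric name, a label dictionary, and a value. It is deliberately simplified: escape sequences inside label values are not decoded and optional timestamps are ignored. The sample values in SAMPLE_TEXT are invented.

```python
import re

# name, optional {labels}, then the sample value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Invented sample in the shape Node Exporter returns from /metrics.
SAMPLE_TEXT = """\
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
node_cpu_seconds_total{cpu="0",mode="user"} 234.5
node_load1 0.42
"""


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if not match:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples
```

Feeding it the output of curl http://localhost:9100/metrics gives a quick way to see which metrics and labels the exporter provides before writing queries.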
Understanding PromQL
PromQL (Prometheus Query Language) is the heart of querying Prometheus data. Here are essential query patterns:
Basic Queries
Select all time-series for a metric:
node_cpu_seconds_total
Filter by label:
node_cpu_seconds_total{mode="idle"}
Multiple label filters:
node_cpu_seconds_total{mode="idle",cpu="0"}
Range Vectors and Aggregations
Calculate rate over time:
rate(node_cpu_seconds_total{mode="idle"}[5m])
Sum across all CPUs:
sum(rate(node_cpu_seconds_total{mode="idle"}[5m]))
Group by label:
sum by (mode) (rate(node_cpu_seconds_total[5m]))
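To make the rate() calls above less abstract, here is a simplified sketch of the idea: the per-second increase of a counter over a window, with counter resets taken into account. The real function also extrapolates toward the window boundaries, so its results can differ slightly from this version. The function name simple_rate is made up for the example.

```python
def simple_rate(samples):
    """Approximate per-second rate of a counter.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    Returns None when fewer than two samples are available.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter was reset (for example after a restart),
        # so the new value is counted as growth from zero.
        increase += cur if cur < prev else cur - prev
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None
```

A counter that grows by 15 every 15 seconds yields a rate of 1.0 per second, whether or not it was reset along the way. This is why rate() is applied to counters rather than to raw values.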
Practical Examples
CPU usage percentage:
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
Memory usage:
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100
Disk usage:
(node_filesystem_size_bytes - node_filesystem_avail_bytes) / node_filesystem_size_bytes * 100
Network traffic rate:
rate(node_network_receive_bytes_total[5m])
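Any of these expressions can also be evaluated outside the web interface through the /api/v1/query endpoint. The sketch below builds the request URL and converts an instant-vector response into a plain dictionary. It assumes the server address from this guide; the helper names are illustrative.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(expr, base_url="http://localhost:9090"):
    """URL-encode a PromQL expression for an instant query."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def run_query(expr, base_url="http://localhost:9090", timeout=10):
    """Send the instant query and return the decoded JSON response."""
    with urllib.request.urlopen(build_query_url(expr, base_url), timeout=timeout) as resp:
        return json.load(resp)


def vector_to_dict(payload, label="instance"):
    """Map an instant-vector response to {label value: sample value}."""
    data = payload.get("data", {})
    if payload.get("status") != "success" or data.get("resultType") != "vector":
        raise ValueError("expected a successful instant-vector response")
    return {
        item["metric"].get(label, ""): float(item["value"][1])
        for item in data.get("result", [])
    }
```

For instance, vector_to_dict(run_query('100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)')) would return CPU usage per instance. Sample values arrive as strings in the JSON, hence the float conversion.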
Docker Deployment
Running Prometheus in Docker containers offers flexibility and easier management:
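Create docker-compose.yml. Containers in this stack reach each other by service name, so the targets in prometheus.yml should be node_exporter:9100 and alertmanager:9093 rather than localhost: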
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
    ports:
      - "9090:9090"
    restart: unless-stopped

  node_exporter:
    image: prom/node-exporter:latest
    container_name: node_exporter
    command:
      - '--path.rootfs=/host'
    volumes:
      - '/:/host:ro,rslave'
    ports:
      - "9100:9100"
    restart: unless-stopped

  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml
      - alertmanager_data:/alertmanager
    ports:
      - "9093:9093"
    restart: unless-stopped

volumes:
  prometheus_data:
  alertmanager_data:
Start the stack:
docker-compose up -d
Kubernetes Monitoring
Prometheus excels at monitoring Kubernetes clusters. The kube-prometheus-stack Helm chart provides a complete monitoring solution.
Install using Helm:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack
This installs:
- Prometheus Operator
- Prometheus instance
- AlertManager
- Grafana
- Node Exporter
- kube-state-metrics
- Pre-configured dashboards and alerts
Access Grafana:
kubectl port-forward svc/prometheus-grafana 3000:80
Default credentials: admin/prom-operator
For various Kubernetes distributions, the deployment process is similar with minor adjustments for platform-specific features.
Setting Up Alerting
AlertManager handles alerts sent by Prometheus. Configure alert rules and notification channels.
Alert Rules
Create /etc/prometheus/alert_rules.yml:
groups:
  - name: system_alerts
    interval: 30s
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is above 80% (current value: {{ $value }}%)"

      - alert: HighMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is above 85% (current value: {{ $value }}%)"

      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 < 15
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Available disk space is below 15% on {{ $labels.mountpoint }}"

      - alert: InstanceDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.job }} instance {{ $labels.instance }} has been down for more than 2 minutes"
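The for: field in these rules delays firing until the condition has held continuously for the given duration. Until then the alert is pending, and a single evaluation where the condition is false resets it. The following sketch models that behaviour under simplifying assumptions: evaluations happen at a fixed interval and the function name alert_states is invented for illustration.

```python
def alert_states(condition_by_eval, eval_interval_s, for_s):
    """Return the alert state after each rule evaluation.

    condition_by_eval: list of booleans, True when the rule expression
    returned a result at that evaluation.
    """
    states = []
    active_since = None  # time at which the condition first became true
    for i, condition in enumerate(condition_by_eval):
        now = i * eval_interval_s
        if not condition:
            # Condition cleared: the pending/firing timer starts over.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        states.append("firing" if now - active_since >= for_s else "pending")
    return states
```

With the 30s group interval and the 2m hold time of InstanceDown, a target has to stay down across five consecutive evaluations before the alert reaches AlertManager. Short blips are filtered out.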
AlertManager Configuration
Create /etc/prometheus/alertmanager.yml:
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'alerts@example.com'
  smtp_auth_username: 'alerts@example.com'
  smtp_auth_password: 'your-password'

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 12h
  receiver: 'team-email'
  routes:
    - match:
        severity: critical
      receiver: 'team-pagerduty'
    - match:
        severity: warning
      receiver: 'team-slack'

receivers:
  - name: 'team-email'
    email_configs:
      - to: 'team@example.com'
        headers:
          Subject: '{{ .GroupLabels.alertname }}: {{ .Status | toUpper }}'
  - name: 'team-slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
        channel: '#alerts'
        title: 'Alert: {{ .GroupLabels.alertname }}'
        text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'
  - name: 'team-pagerduty'
    pagerduty_configs:
      - service_key: 'your-pagerduty-key'
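The routing tree in this configuration can be read as: try the child routes in order, use the first one whose labels match, and fall back to the top-level receiver otherwise. The sketch below mirrors that logic for the three receivers defined here. It is a simplified model that ignores grouping, timing, nested routes, and the continue option.

```python
# Child routes from the configuration, in the order they are listed.
ROUTES = [
    ({"severity": "critical"}, "team-pagerduty"),
    ({"severity": "warning"}, "team-slack"),
]
DEFAULT_RECEIVER = "team-email"  # receiver of the top-level route


def pick_receiver(alert_labels, routes=ROUTES, default=DEFAULT_RECEIVER):
    """Return the receiver of the first child route matching all its labels."""
    for match, receiver in routes:
        if all(alert_labels.get(key) == value for key, value in match.items()):
            return receiver
    return default
```

Under this model the DiskSpaceLow and InstanceDown alerts (severity: critical) go to PagerDuty, the CPU and memory warnings go to Slack, and anything without a matching severity label lands in the team mailbox.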
Integration with Grafana
While Prometheus has a basic web interface, Grafana provides superior visualization capabilities for creating comprehensive dashboards.
Add Prometheus as Data Source
- Open Grafana and navigate to Configuration → Data Sources
- Click “Add data source”
- Select “Prometheus”
- Set URL to http://localhost:9090 (or your Prometheus server)
- Click “Save & Test”
Popular Dashboard IDs
Import pre-built dashboards from grafana.com:
- Node Exporter Full (ID: 1860): Comprehensive Linux metrics
- Kubernetes Cluster Monitoring (ID: 7249): K8s overview
- Docker Container Monitoring (ID: 193): Container metrics
- Prometheus Stats (ID: 2): Prometheus internal metrics
Creating Custom Dashboards
Create panels using PromQL queries:
{
  "title": "CPU Usage",
  "targets": [{
    "expr": "100 - (avg(rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)"
  }]
}
Popular Exporters
Extend Prometheus monitoring with specialized exporters:
Blackbox Exporter
Probes endpoints over HTTP, HTTPS, DNS, TCP, and ICMP:
scrape_configs:
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://example.com
          - https://api.example.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115
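The three relabeling steps are easier to follow when traced on a single target. The sketch below applies them in order to a label set and then shows the URL Prometheus ends up scraping. It only models this specific configuration, not relabeling in general, and the function names are illustrative.

```python
from urllib.parse import urlencode


def relabel_blackbox_target(address, exporter="localhost:9115"):
    """Apply the three relabel steps from the configuration to one target."""
    labels = {"__address__": address}
    # Step 1: copy the original target into the ?target= URL parameter.
    labels["__param_target"] = labels["__address__"]
    # Step 2: keep the probed URL as the human-readable instance label.
    labels["instance"] = labels["__param_target"]
    # Step 3: point the actual scrape at the Blackbox Exporter.
    labels["__address__"] = exporter
    return labels


def scrape_url(labels, metrics_path="/probe", module="http_2xx"):
    """Build the URL that gets scraped for the relabeled target."""
    query = urlencode({"module": module, "target": labels["__param_target"]})
    return f"http://{labels['__address__']}{metrics_path}?{query}"
```

For https://example.com the scrape goes to the exporter on localhost:9115 with the site passed as the target parameter, while the instance label on the resulting series still shows the probed URL.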
Database Exporters
- mysqld_exporter: MySQL/MariaDB metrics
- postgres_exporter: PostgreSQL metrics
- mongodb_exporter: MongoDB metrics
- redis_exporter: Redis metrics
Application Exporters
- nginx_exporter: NGINX web server metrics
- apache_exporter: Apache HTTP server metrics
- haproxy_exporter: HAProxy load balancer metrics
Cloud Exporters
- cloudwatch_exporter: AWS CloudWatch metrics
- stackdriver_exporter: Google Cloud metrics
- azure_exporter: Azure Monitor metrics
Best Practices
Data Retention
Configure appropriate retention based on your needs:
--storage.tsdb.retention.time=30d
--storage.tsdb.retention.size=50GB
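To pick sensible values for these flags, disk usage can be estimated as retention time in seconds × ingested samples per second × bytes per sample. The sketch below applies that formula. The default of 2 bytes per sample is a rough planning figure (compressed samples typically take about 1 to 2 bytes), and the series count in the usage note is an invented example.

```python
def estimate_disk_bytes(retention_days, active_series, scrape_interval_s,
                        bytes_per_sample=2.0):
    """Rough TSDB disk estimate: retention * ingestion rate * sample size."""
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample
```

For 100,000 active series scraped every 15 seconds and kept for 30 days, estimate_disk_bytes(30, 100_000, 15) comes to roughly 34.6 GB, which would fit under the 50GB size limit shown above with some headroom.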
Recording Rules
Pre-calculate frequently queried expressions:
groups:
  - name: example_rules
    interval: 30s
    rules:
      - record: job:node_cpu_utilization:avg
        expr: 100 - (avg by (job) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
Label Management
- Keep label cardinality low
- Use consistent naming conventions
- Avoid high-cardinality labels (user IDs, timestamps)
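The reason cardinality matters is multiplicative growth: in the worst case a metric produces one series for every combination of label values. A small sketch of that upper bound follows, with invented label counts.

```python
from math import prod


def worst_case_series(label_value_counts):
    """Upper bound on series for one metric: product of distinct label values."""
    return prod(label_value_counts.values())
```

With 5 methods, 10 status codes, and 20 instances, a metric tops out at 1,000 series. Adding a user_id label with 100,000 distinct values raises that ceiling to 100 million, which is why identifiers of that kind belong in logs rather than in labels.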
Security
- Enable authentication and HTTPS
- Restrict access to Prometheus API
- Use network policies in Kubernetes
- Implement RBAC for sensitive metrics
High Availability
- Run multiple Prometheus instances
- Use Thanos or Cortex for long-term storage
- Implement federation for hierarchical setups
Troubleshooting Common Issues
High Memory Usage
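- Reduce the number of active series by dropping unused metrics and high-cardinality labels (usually the largest factor)
- Reduce scrape frequency
- Decrease retention period (this mainly frees disk space rather than memory)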
- Optimize PromQL queries
- Implement recording rules
Missing Metrics
- Check target status in /targets
- Verify network connectivity
- Validate scrape configuration
- Check exporter logs
Slow Queries
- Use recording rules for complex aggregations
- Optimize label filters
- Reduce time range
- Add indices if using remote storage
Performance Optimization
Query Optimization
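# Costly: reads every http_requests_total series from every job
sum(rate(http_requests_total[5m]))
# Better: narrow the selector first, then group by only the labels you need
# (job="api" is a placeholder; use a label value from your own setup)
sum by (status, method) (rate(http_requests_total{job="api"}[5m]))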
Resource Limits
For Kubernetes deployments:
resources:
  requests:
    memory: "2Gi"
    cpu: "1000m"
  limits:
    memory: "4Gi"
    cpu: "2000m"
Conclusion
Prometheus provides a robust, scalable monitoring solution for modern infrastructure. Its pull-based architecture, powerful query language, and extensive ecosystem of exporters make it ideal for monitoring everything from bare-metal servers to complex Kubernetes clusters.
By combining Prometheus with Grafana for visualization and AlertManager for notifications, you create a comprehensive observability platform capable of handling enterprise-scale monitoring requirements. The active community and CNCF backing ensure continued development and support.
Start with basic metrics collection, gradually add exporters for your specific services, and refine your alerting rules based on real-world experience. Prometheus scales with your infrastructure, from single-server deployments to multi-datacenter monitoring architectures.
Related Resources
- How to Install Ubuntu 24.04 & useful tools
- Ubuntu Package Management: APT and dpkg Cheatsheet
- Install and Use Grafana on Ubuntu: Complete Guide
- Kubernetes Cheatsheet
- Kubernetes distributions - quick overview of kubeadm, k3s, MicroK8s, Minikube, Talos Linux and RKE2
- Docker Cheatsheet