Grafana Visualization: Building a Striking Real-Time Monitoring Wall Display

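Introduction
In the cloud-native and DevOps era, real-time monitoring has become a core operations capability. Grafana is a widely used open-source visualization platform. It combines rich data presentation with flexible configuration, which makes it a popular choice for monitoring wall displays.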
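Why choose Grafana?
❌ Pain points of traditional monitoring tools:
- Poor visualization makes problems hard to spot quickly
- Integrating multiple data sources is difficult
- Limited customization
- Collaboration and sharing are inconvenient
✅ Grafana's strengths:
- Powerful data visualization
- Support for 40+ data sources (built in and via plugins)
- A wide range of panel types and themes
- Built-in alerting and notification
- An active user community
Typical use cases:
- 📊 Server monitoring wall displays
- 🌐 Application performance monitoring (APM)
- 📈 Business metric visualization
- 🔔 Alert notification and display
- 📱 Mobile monitoring
Intended readers: operations engineers, SREs, data analysts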
---
Grafana core concepts
1. Core components
```
Grafana core concepts
├─ Datasources
│   └─ Connections to external monitoring systems (Prometheus, InfluxDB, etc.)
├─ Panels
│   └─ A single chart or statistic
├─ Dashboards
│   └─ A collection of panels
└─ Variables
    └─ Dynamic parameters that make queries flexible
```
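2. Supported data sources
| Data source type | Typical examples |
|---|---|
| Time-series databases | Prometheus, InfluxDB, TimescaleDB |
| Relational databases | MySQL, PostgreSQL, Oracle (Enterprise plugin) |
| Cloud monitoring | AWS CloudWatch, Azure Monitor |
| Logging systems | Elasticsearch, Loki |
| Message queues | Kafka (via plugin) |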
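3. Panel types
- Time series: how a metric changes over time
- Stat: a single key value
- Graph: the legacy line/bar chart panel, superseded since Grafana 8 by Time series and Bar chart
- Table: tabular data
- Heatmap: how values are distributed over time
- Pie chart: proportions
- Gauge: a single value on a dial, shown against its min, max and thresholds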
---
Installation and configuration
1. Docker installation
```yaml
# docker-compose.yml
version: '3.8'
services:
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin123   # change this in production
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_SERVER_ROOT_URL=http://localhost:3000
    volumes:
      - grafana-data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/api/health"]
      interval: 30s
      timeout: 10s
      retries: 5

volumes:
  grafana-data:
```
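A deployment script often has to wait until the container is actually serving requests before it provisions dashboards. The sketch below mirrors the compose healthcheck in Python: same `/api/health` endpoint, 5 retries, 30 s interval. It assumes the health response is JSON with a `"database": "ok"` field. The function names are hypothetical, and the HTTP call is injected as `fetch` so the retry logic can be tested without a server.

```python
import json
import time
from typing import Callable


def is_healthy(body: str) -> bool:
    """True if a /api/health response body reports the database as "ok"."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and data.get("database") == "ok"


def wait_for_grafana(fetch: Callable[[], str], retries: int = 5,
                     interval: float = 30.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until Grafana reports healthy or the retries run out.

    `fetch` returns the raw response body or raises OSError (connection
    refused, timeout, HTTP error), so the loop can be tested without a server.
    """
    for attempt in range(retries):
        try:
            if is_healthy(fetch()):
                return True
        except OSError:
            pass  # container still starting: treat as "not healthy yet"
        if attempt < retries - 1:
            sleep(interval)
    return False
```

Against a live instance, `fetch` could be `lambda: urllib.request.urlopen("http://localhost:3000/api/health", timeout=10).read().decode()`. `urllib` raises `URLError`/`HTTPError`, both subclasses of `OSError`, so they count as "not ready yet".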
2. YUM installation
```bash
# Add the Grafana repository (definition from the official RPM instructions)
cat <<EOF | sudo tee /etc/yum.repos.d/grafana.repo
[grafana]
name=grafana
baseurl=https://rpm.grafana.com
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://rpm.grafana.com/gpg.key
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
EOF

sudo yum install grafana -y

# Start the service and enable it at boot
sudo systemctl start grafana-server
sudo systemctl enable grafana-server

# Check the status
sudo systemctl status grafana-server
```
3. Plugin management
```bash
# Install common plugins
# (Pie chart is built in since Grafana 8; grafana-piechart-panel is only needed by legacy dashboards)
grafana-cli plugins install grafana-piechart-panel
grafana-cli plugins install grafana-worldmap-panel
grafana-cli plugins install grafana-clock-panel

# List installed plugins
grafana-cli plugins ls

# grafana-cli has no enable/disable subcommand: app plugins are enabled from
# the plugin's page in the Grafana UI; to disable a plugin, remove it
grafana-cli plugins remove grafana-worldmap-panel

# Update all plugins
grafana-cli plugins update-all

# Restart so newly installed or updated plugins are loaded
sudo systemctl restart grafana-server
```
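4. Access control
```ini
# Configuration file: /etc/grafana/grafana.ini
[users]
allow_sign_up = false
allow_org_create = false
auto_assign_org = true
auto_assign_org_role = Viewer

[auth.anonymous]
enabled = false

[security]
admin_user = admin
# Only applied the first time Grafana starts
admin_password = admin123
# SW2YcwTIb9zpOOhoPsMm is the value shipped in defaults.ini; replace it with your own random key
secret_key = SW2YcwTIb9zpOOhoPsMm

[auth.basic]
enabled = true

[auth.proxy]
enabled = true
header_name = X-User-Name
header_property = username
auto_sign_up = true
```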
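Several of these values are easy to leave at insecure defaults, so a small audit script before going live can help. The helper below is hypothetical and uses Python's standard `configparser`, which, like grafana.ini, ignores lines starting with `#` or `;`. The list of checks is illustrative, not exhaustive.

```python
import configparser

# Value shipped in Grafana's defaults.ini; keeping it in production is unsafe.
DEFAULT_SECRET_KEY = "SW2YcwTIb9zpOOhoPsMm"


def audit_grafana_ini(text: str) -> list[str]:
    """Return human-readable warnings for risky settings in grafana.ini text."""
    # interpolation=None: Grafana values may legitimately contain '%'
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)

    def get(section: str, key: str) -> str:
        # fallback also covers a missing section
        return parser.get(section, key, fallback="").strip()

    warnings = []
    if get("security", "secret_key") == DEFAULT_SECRET_KEY:
        warnings.append("security.secret_key is still the default value")
    if get("security", "admin_password") in {"admin", "admin123"}:
        warnings.append("security.admin_password is a well-known password")
    if get("users", "allow_sign_up").lower() == "true":
        warnings.append("users.allow_sign_up lets anyone create an account")
    if get("auth.anonymous", "enabled").lower() == "true":
        warnings.append("auth.anonymous is enabled")
    if get("auth.proxy", "enabled").lower() == "true":
        warnings.append("auth.proxy trusts a request header; make sure only "
                        "the reverse proxy can reach Grafana")
    return warnings
```

Running it on the configuration above flags the default `secret_key`, the well-known admin password and the header-trusting `auth.proxy` setup.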
---
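Data source configuration
1. Prometheus and Loki configuration
```yaml
# Data source provisioning file, e.g. /etc/grafana/provisioning/datasources/datasources.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: true
    jsonData:
      timeInterval: "15s"    # should match the Prometheus scrape interval
      queryTimeout: "60s"
      httpMethod: POST

  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    editable: true
    jsonData:
      maxLines: 1000
      derivedFields:
        # Extract a trace ID written as "[4bf92f3577b34da6]" in log lines
        - name: TraceID
          matcherRegex: "\\[([a-f0-9]+)\\]"
          # "$$" escapes "$" so provisioning does not treat it as an environment variable
          url: "$${__value.raw}"
```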
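To check `matcherRegex` before deploying, the pattern can be tried outside Grafana. A derived field takes its value from the pattern's capture group. After YAML unescaping, `"\\[([a-f0-9]+)\\]"` becomes the regex `\[([a-f0-9]+)\]`, which behaves the same in Python's `re` and Go's RE2. The helper name below is hypothetical.

```python
import re
from typing import Optional

# matcherRegex from the provisioning file, after YAML unescaping
TRACE_ID_PATTERN = re.compile(r"\[([a-f0-9]+)\]")


def extract_trace_id(line: str) -> Optional[str]:
    """Return the captured group, i.e. the value the TraceID field would get."""
    match = TRACE_ID_PATTERN.search(line)
    return match.group(1) if match else None
```

The pattern is case-sensitive, so a trace ID logged in upper-case hex (`[ABC123]`) would not be linked. Widen the character class if your tracer emits that form.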
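2. InfluxDB configuration
```yaml
apiVersion: 1
datasources:
  - name: InfluxDB
    type: influxdb
    access: proxy
    url: http://influxdb:8086
    database: metrics
    user: admin
    editable: true
    jsonData:
      httpMode: POST
      version: Flux
      organization: my-org
    secureJsonData:
      # Secrets belong in secureJsonData. In Flux mode InfluxDB 2.x normally
      # authenticates with an API token (secureJsonData.token) instead.
      password: admin123
```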
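3. MySQL configuration
```yaml
apiVersion: 1
datasources:
  - name: MySQL
    type: mysql
    access: proxy
    url: mysql:3306
    database: metrics_db
    user: grafana
    jsonData:
      maxOpenConns: 10
      maxIdleConns: 5
      connMaxLifetime: 14400   # seconds (4 hours)
    secureJsonData:
      password: grafana123
```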
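4. Testing data sources
```bash
# List the configured data sources
curl -X GET http://localhost:3000/api/datasources \
  -H "Authorization: Bearer YOUR_TOKEN"

# Test the connection through the data source proxy
# (data source id 1, Prometheus build-info endpoint)
curl -X GET http://localhost:3000/api/datasources/proxy/1/api/v1/status/buildinfo \
  -H "Authorization: Bearer YOUR_TOKEN"

# Add a data source through the API
curl -X POST http://localhost:3000/api/datasources \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{
    "name": "Prometheus",
    "type": "prometheus",
    "url": "http://prometheus:9090",
    "access": "proxy",
    "isDefault": true
  }'
```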
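The same `POST /api/datasources` call can be made from a provisioning script. The sketch below uses only the standard library. It builds the request without sending it, so the URL, headers and body can be checked. `YOUR_TOKEN` is the placeholder from the curl example, and the function name is hypothetical.

```python
import json
from urllib.request import Request


def build_create_datasource_request(base_url: str, token: str,
                                    payload: dict) -> Request:
    """Build (but do not send) a POST /api/datasources request."""
    return Request(
        url=f"{base_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen(req)`. Building and sending are kept separate so the payload can be logged or validated first.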
---
Visualization panels
1. Basic chart configuration
```json
{
  "dashboard": {
    "title": "Server monitoring",
    "panels": [
      {
        "id": 1,
        "type": "timeseries",
        "title": "CPU usage",
        "datasource": "Prometheus",
        "targets": [
          {
            "expr": "100 - (avg(rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) by (instance) * 100)",
            "legendFormat": "{{instance}}",
            "refId": "A"
          }
        ],
        "gridPos": {"x": 0, "y": 0, "w": 12, "h": 8}
      },
      {
        "id": 2,
        "type": "timeseries",
        "title": "Memory usage",
        "datasource": "Prometheus",
        "targets": [
          {
            "expr": "(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100",
            "legendFormat": "{{instance}}",
            "refId": "A"
          }
        ],
        "gridPos": {"x": 12, "y": 0, "w": 12, "h": 8}
      }
    ],
    "refresh": "30s",
    "time": {
      "from": "now-1h",
      "to": "now"
    }
  }
}
```
2. Stat panel configuration
```json
{
  "id": 3,
  "type": "stat",
  "title": "Average response time",
  "datasource": "Prometheus",
  "targets": [
    {
      "expr": "avg(rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m]))",
      "format": "time_series",
      "legendFormat": "Average response time",
      "refId": "A"
    }
  ],
  "fieldConfig": {
    "defaults": {"unit": "s"}
  },
  "options": {
    "colorMode": "background",
    "graphMode": "area",
    "justifyMode": "auto",
    "orientation": "auto",
    "reduceOptions": {
      "calcs": ["lastNotNull"],
      "fields": "",
      "values": false
    },
    "textMode": "auto"
  },
  "gridPos": {"x": 0, "y": 8, "w": 6, "h": 4}
}
```
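Why does dividing the two rates give a mean? Over the same 5-minute window, `rate(..._sum)` is the latency accumulated per second and `rate(..._count)` is requests per second, so their ratio is seconds per request. The window length cancels out. A minimal sketch with made-up counter samples, not measurements; unlike PromQL's `rate()`, it ignores counter resets.

```python
import math


def counter_rate(start: float, end: float, window_seconds: float) -> float:
    """Per-second increase of a counter over a window (no reset handling)."""
    return (end - start) / window_seconds


def mean_latency(sum_start: float, sum_end: float,
                 count_start: float, count_end: float,
                 window_seconds: float = 300.0) -> float:
    """rate(..._sum) / rate(..._count): average seconds per request."""
    requests_per_s = counter_rate(count_start, count_end, window_seconds)
    if requests_per_s == 0:
        return math.nan  # no requests: 0/0, which PromQL also evaluates to NaN
    return counter_rate(sum_start, sum_end, window_seconds) / requests_per_s
```

For example, if the `_sum` counter grows by 12.5 s while `_count` grows by 250 requests in the window, `mean_latency(100.0, 112.5, 2000, 2250)` is about 0.05 s. With the `"unit": "s"` field setting, the stat panel renders that as 50 ms.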
3. Heatmap configuration
A latency heatmap needs the raw histogram buckets grouped by `le`. A single `histogram_quantile(0.95, ...)` series would plot only the p95 line. The data arrive already bucketed, so `calculate` is `false`.
```json
{
  "id": 4,
  "type": "heatmap",
  "title": "Request latency heatmap",
  "datasource": "Prometheus",
  "targets": [
    {
      "expr": "sum by (le) (rate(http_request_duration_seconds_bucket[5m]))",
      "format": "heatmap",
      "legendFormat": "{{le}}",
      "refId": "A"
    }
  ],
  "options": {
    "calculate": false,
    "cellGap": 1,
    "color": {
      "exponent": 0.5,
      "fill": "dark-orange",
      "mode": "opacity",
      "reverse": false,
      "scale": "exponential",
      "steps": 5
    },
    "exemplars": {
      "color": "rgba(255,0,255,0.7)"
    },
    "filterValues": {
      "le": 1e-9
    },
    "legend": {
      "show": true
    },
    "rowsFrame": {
      "layout": "auto"
    },
    "tooltip": {
      "show": true,
      "yHistogram": true
    },
    "yAxis": {
      "axisPlacement": "left",
      "unit": "s"
    }
  },
  "gridPos": {"x": 6, "y": 8, "w": 12, "h": 6}
}
```
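The heatmap and a p95 stat come from the same buckets. A percentile is estimated by locating the bucket that contains the target rank and interpolating linearly inside it, assuming observations are spread evenly within the bucket. The sketch below is a simplified re-implementation of that idea, modelled on Prometheus' `histogram_quantile`, with hypothetical names. It takes cumulative `(le, count)` pairs sorted by bound and ending in `+Inf`, and omits edge cases such as negative bucket bounds.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate quantile q from cumulative buckets [(le, count), ..., (inf, total)]."""
    if not 0 <= q <= 1:
        raise ValueError("q must be in [0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound
                return prev_bound
            if count == prev_count:
                return bound
            # Linear interpolation inside the bucket
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound
```

With buckets `[(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 100), (inf, 100)]`, the median is 0.1 s and p95 is 0.5 s. The estimate can never be finer than the bucket layout, which is why the heatmap shows the whole distribution instead.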
4. Gauge panel configuration
```json
{
  "id": 5,
  "type": "gauge",
  "title": "Disk usage",
  "datasource": "Prometheus",
  "targets": [
    {
      "expr": "(1 - (node_filesystem_avail_bytes{mountpoint=\"/\"} / node_filesystem_size_bytes{mountpoint=\"/\"})) * 100",
      "legendFormat": "{{instance}}",
      "refId": "A"
    }
  ],
  "options": {
    "showThresholdLabels": true,
    "showThresholdMarkers": true,
    "orientation": "horizontal"
  },
  "gridPos": {"x": 18, "y": 8, "w": 6, "h": 4}
}
```
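The gauge's color comes from absolute thresholds: a base step (no value, effectively minus infinity) plus ascending steps. The active color belongs to the last step whose value is less than or equal to the current reading. The sketch below reproduces that rule together with the disk-usage arithmetic from the query. The threshold values 70 and 85 are illustrative, not part of the dashboard above, and the function names are hypothetical.

```python
from typing import Optional


def disk_usage_percent(avail_bytes: float, size_bytes: float) -> float:
    """Same arithmetic as the PromQL expression: (1 - avail / size) * 100."""
    return (1 - avail_bytes / size_bytes) * 100


def threshold_color(value: float, steps: list[tuple[Optional[float], str]]) -> str:
    """Color of the last step whose value <= value; steps[0] is the base step.

    Steps after the base must be sorted in ascending order.
    """
    color = steps[0][1]
    for limit, step_color in steps[1:]:
        if limit is not None and value >= limit:
            color = step_color
    return color


# Illustrative thresholds: green below 70 %, orange from 70 %, red from 85 %
STEPS = [(None, "green"), (70.0, "orange"), (85.0, "red")]
```

A root filesystem with 25 GiB free out of 100 GiB is 75 % used, so `threshold_color(75.0, STEPS)` returns `"orange"`.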
---
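Wall-display design
1. Layout tips
Dashboards use a 24-column grid. In `gridPos`, `x` and `w` count columns, and `y` and `h` count grid rows. A `row` panel is always one unit high and groups the panels below it. `up` is 1 for a reachable scrape target and 0 otherwise. So `count(up)` counts targets, and `sum(up) / count(up) * 100` gives the percentage of targets that are up.
```json
{
  "dashboard": {
    "title": "Production monitoring wall display",
    "timezone": "browser",
    "refresh": "10s",
    "schemaVersion": 30,
    "panels": [
      {
        "id": 1,
        "type": "row",
        "title": "Overview",
        "collapsed": false,
        "panels": [],
        "gridPos": {"x": 0, "y": 0, "w": 24, "h": 1}
      },
      {
        "id": 2,
        "type": "stat",
        "title": "Total servers",
        "datasource": "Prometheus",
        "targets": [
          {
            "expr": "count(up)",
            "legendFormat": "Total",
            "refId": "A"
          }
        ],
        "gridPos": {"x": 0, "y": 1, "w": 4, "h": 4}
      },
      {
        "id": 3,
        "type": "stat",
        "title": "Health",
        "datasource": "Prometheus",
        "targets": [
          {
            "expr": "sum(up) / count(up) * 100",
            "legendFormat": "Percentage",
            "refId": "A"
          }
        ],
        "gridPos": {"x": 4, "y": 1, "w": 4, "h": 4}
      }
    ]
  }
}
```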
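When a wall display is generated by a script rather than dragged together in the UI, the `gridPos` values have to be computed. The sketch below is a hypothetical "shelf" packer. It places panels left to right on the 24-column grid and wraps to a new line below the tallest panel once the line is full. Use `start_y=1` to leave room for a row header at `y = 0`.

```python
def auto_layout(sizes: list[tuple[int, int]], columns: int = 24,
                start_y: int = 0) -> list[dict]:
    """Return a gridPos dict for each (w, h) panel size, packed left to right."""
    positions = []
    x, y, line_height = 0, start_y, 0
    for w, h in sizes:
        if not 0 < w <= columns:
            raise ValueError(f"panel width {w} does not fit a {columns}-column grid")
        if x + w > columns:
            # Current line is full: start a new one below its tallest panel
            x, y, line_height = 0, y + line_height, 0
        positions.append({"x": x, "y": y, "w": w, "h": h})
        x += w
        line_height = max(line_height, h)
    return positions
```

For example, six 4-column stat tiles fill the first line exactly, and a following 12-column chart wraps to the next line. Assign each result to the matching panel's `gridPos`.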
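2. Real-time refresh configuration
`liveNow` keeps redrawing panels so the "now" edge of the time range keeps moving. Grafana rejects refresh intervals shorter than `min_refresh_interval` in the `[dashboards]` section, which defaults to 5s. Line styling for a time series panel lives under `fieldConfig.defaults.custom`.
```json
{
  "dashboard": {
    "refresh": "5s",
    "timezone": "utc",
    "liveNow": true,
    "panels": [
      {
        "id": 1,
        "type": "timeseries",
        "datasource": "Prometheus",
        "targets": [
          {
            "expr": "rate(node_cpu_seconds_total[1m])",
            "refId": "A",
            "format": "time_series"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "custom": {
              "lineWidth": 2,
              "fillOpacity": 10,
              "gradientMode": "opacity"
            }
          }
        }
      }
    ]
  }
}
```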
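A very short refresh only pays off if new samples actually arrive, and a `rate()` window needs several samples to be reliable. Grafana's own `$__rate_interval` uses at least four times the scrape interval. The sketch below checks a dashboard's refresh and rate window against the scrape interval, which is `timeInterval: "15s"` in the Prometheus data source above. The helper names are hypothetical, and the duration parser handles only single-unit values such as `5s` or `1m`.

```python
import re
from typing import Optional

_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text: str) -> float:
    """Parse a single-unit duration such as "5s", "1m" or "6h" into seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def refresh_warnings(refresh: str, scrape_interval: str,
                     rate_window: Optional[str] = None,
                     min_refresh: str = "5s") -> list[str]:
    """Flag refresh/scrape/rate-window combinations that waste queries or leave gaps."""
    r = parse_duration(refresh)
    s = parse_duration(scrape_interval)
    warnings = []
    if r < parse_duration(min_refresh):
        warnings.append(f"refresh {refresh} is below min_refresh_interval {min_refresh}")
    if r < s:
        warnings.append(f"refresh {refresh} is shorter than the scrape interval "
                        f"{scrape_interval}; some refreshes fetch no new samples")
    if rate_window is not None and parse_duration(rate_window) < 4 * s:
        warnings.append(f"rate window {rate_window} is under 4x the scrape interval; "
                        "rate() may return gaps")
    return warnings
```

For the panel above, `refresh_warnings("5s", "15s", rate_window="1m")` reports only that the 5s refresh is shorter than the 15s scrape interval. The 1m window is exactly four scrapes.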
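3. Alert notification configuration
These rules use the Prometheus rule-file format, loaded through `rule_files` in prometheus.yml. The group and alert names are illustrative.
```yaml
groups:
  - name: node-alerts              # illustrative group name
    rules:
      - alert: HighCpuUsage        # illustrative alert name
        expr: 100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage above 80%"
          description: "CPU usage on {{ $labels.instance }} has exceeded 80%"

      - alert: HighMemoryUsage     # illustrative alert name
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Memory usage above 90%"
          description: "Memory usage on {{ $labels.instance }} has exceeded 90%"
```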
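The `for: 5m` clause is what keeps a brief spike from paging anyone. The first evaluation where the condition holds makes the alert *pending*. It becomes *firing* only once the condition has held continuously for the full duration, and any evaluation where it is false resets it. The sketch below models those state transitions with hypothetical names. Each sample is an evaluation timestamp in seconds plus whether the `expr` condition held.

```python
def alert_states(samples: list[tuple[float, bool]], for_seconds: float) -> list[str]:
    """Return the alert state after each evaluation: inactive, pending or firing."""
    states = []
    active_since = None
    for ts, condition in samples:
        if not condition:
            active_since = None  # any gap resets the pending timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts
        states.append("firing" if ts - active_since >= for_seconds else "pending")
    return states
```

With evaluations every 60 s and `for_seconds=300`, a condition that is true at 0 s and 60 s, false at 120 s, then true again from 180 s onward fires only at 480 s. That is five minutes after the condition became true again.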
---
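Best practices
1. Panel optimization
A moderate refresh rate, a bounded time range and a minimum query interval (`interval` on the target) keep the number of points each query returns under control.
```json
{
  "dashboard": {
    "refresh": "30s",
    "panels": [
      {
        "id": 1,
        "type": "timeseries",
        "datasource": "Prometheus",
        "targets": [
          {
            "expr": "rate(http_requests_total[5m])",
            "legendFormat": "{{instance}}",
            "interval": "10s",
            "refId": "A"
          }
        ],
        "options": {
          "legend": {
            "showLegend": true,
            "displayMode": "table",
            "placement": "bottom",
            "calcs": ["lastNotNull", "mean", "max"]
          }
        }
      }
    ],
    "time": {
      "from": "now-6h",
      "to": "now"
    }
  }
}
```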
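The cost of a panel scales with the number of samples returned. A Prometheus range query returns `floor(range / step) + 1` points per series. Grafana derives the step from the time range divided by the panel's max data points, but never goes below the min interval. The sketch below reproduces that arithmetic approximately; Grafana also rounds the step to a "nice" value, which is omitted here. The function names are hypothetical.

```python
def points_per_series(range_seconds: float, step_seconds: float) -> int:
    """Samples a Prometheus range query returns per series: floor(range / step) + 1."""
    return int(range_seconds // step_seconds) + 1


def approx_step(range_seconds: float, max_data_points: int,
                min_interval_seconds: float) -> float:
    """Approximate query step: range / maxDataPoints, floored at the min interval."""
    return max(range_seconds / max_data_points, min_interval_seconds)
```

For the 6-hour range above with the 10s minimum, `points_per_series(6 * 3600, 10)` is 2161 points for every instance series. With 1000 max data points, the step grows to about 21.6 s before rounding. Multiply by the number of series to estimate the load of a whole dashboard.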
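2. Template variables
`"refresh": 1` reloads the variable's values when the dashboard loads; `2` reloads them whenever the time range changes. For a multi-value variable with All selected, the stored value is `$__all`.
```json
{
  "templating": {
    "list": [
      {
        "name": "instance",
        "type": "query",
        "datasource": "Prometheus",
        "query": "label_values(up, instance)",
        "refresh": 1,
        "multi": true,
        "includeAll": true,
        "allValue": ".+",
        "current": {
          "text": ["All"],
          "value": ["$__all"]
        }
      },
      {
        "name": "job",
        "type": "query",
        "datasource": "Prometheus",
        "query": "label_values(up, job)",
        "refresh": 1,
        "multi": false,
        "includeAll": true
      }
    ]
  },
  "panels": [
    {
      "id": 1,
      "type": "timeseries",
      "datasource": "Prometheus",
      "targets": [
        {
          "expr": "up{instance=~\"$instance\", job=~\"$job\"}",
          "refId": "A"
        }
      ]
    }
  ]
}
```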
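It helps to know what `$instance` turns into inside the `=~` matcher. Below is a simplified model of how a multi-value or All variable is rendered for Prometheus, not Grafana's actual code. Selected values are regex-escaped and joined as `(a|b)`, and All is replaced by `allValue` (`.+` above) when one is set. Backslashes are doubled so the result stays valid inside a double-quoted PromQL string. The escaping character set and the function names are assumptions, and Grafana's exact rules may differ between versions.

```python
from typing import Optional

ALL = "$__all"
# Regex metacharacters escaped in this model (assumed set)
_REGEX_META = set("\\^$.|?*+()[]{}")


def _escape(value: str) -> str:
    # Escape regex metacharacters, then double every backslash so the
    # result is a valid escape sequence inside a PromQL "..." string
    regex = "".join("\\" + ch if ch in _REGEX_META else ch for ch in value)
    return regex.replace("\\", "\\\\")


def interpolate_regex_variable(selected: list[str], options: list[str],
                               all_value: Optional[str] = None) -> str:
    """Render a variable's selection for use inside label=~"..."."""
    if selected == [ALL]:
        if all_value is not None:
            return all_value  # a custom all value is inserted verbatim
        selected = options
    escaped = [_escape(v) for v in selected]
    if len(escaped) == 1:
        return escaped[0]
    return "(" + "|".join(escaped) + ")"
```

With All selected, the panel query becomes `up{instance=~".+", ...}`. That is usually cheaper than expanding a long alternation of every instance, which is why setting `allValue` pays off on large fleets.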
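3. Team collaboration
Dashboard permissions are set with `POST /api/dashboards/uid/<dashboard-uid>/permissions`. Each item targets exactly one of `userId`, `teamId` or `role`, and `permission` is 1 (View), 2 (Edit) or 4 (Admin). The request replaces the dashboard's existing permission list. Flags such as `canAdmin` and `canSave` are read-only metadata returned with the dashboard, not settings.
```json
{
  "items": [
    {"teamId": 1, "permission": 2},
    {"userId": 2, "permission": 2},
    {"teamId": 2, "permission": 1},
    {"userId": 3, "permission": 1}
  ]
}
```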
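Because a permissions request overwrites the whole list, it is safer to build the body from named levels than to hand-edit numbers. The sketch below has hypothetical helper names, and the View/Edit/Admin to 1/2/4 mapping follows the API description above. It enforces that each item targets exactly one user, team or role.

```python
from typing import Optional

PERMISSION_LEVELS = {"View": 1, "Edit": 2, "Admin": 4}


def permission_item(level: str, *, user_id: Optional[int] = None,
                    team_id: Optional[int] = None,
                    role: Optional[str] = None) -> dict:
    """One entry of the "items" list; exactly one target must be given."""
    targets = {"userId": user_id, "teamId": team_id, "role": role}
    chosen = {key: value for key, value in targets.items() if value is not None}
    if len(chosen) != 1:
        raise ValueError("set exactly one of user_id, team_id or role")
    if level not in PERMISSION_LEVELS:
        raise ValueError(f"unknown permission level: {level!r}")
    return {**chosen, "permission": PERMISSION_LEVELS[level]}


def permissions_body(items: list[dict]) -> dict:
    """Request body for POST /api/dashboards/uid/<dashboard-uid>/permissions."""
    return {"items": items}
```

For example, `permissions_body([permission_item("Edit", team_id=1), permission_item("View", user_id=3)])` produces two of the entries shown above. Passing both a user and a team to one item raises an error instead of sending an ambiguous request.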
---
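Summary
Key takeaways
✅ Grafana core concepts: Datasources, Panels, Dashboards, Variables
✅ Installation and setup: Docker/YUM installation, plugin management, access control
✅ Data sources: connecting Prometheus, InfluxDB, MySQL and others
✅ Visualization: panel types, queries, stat panels
✅ Best practices: panel optimization, template variables, team collaboration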
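Performance tuning suggestions
```ini
# Tuning settings in /etc/grafana/grafana.ini
[server]
# Directory for frontend static files (default: public, relative to the Grafana home path)
static_root_path = public

[dataproxy]
# Timeout for data source proxy requests, in seconds
timeout = 300

[alerting]
# Keep alert annotations for 24 hours (legacy alerting)
max_annotation_age = 24h
# Not found in the Grafana 9 configuration reference; verify before use
max_alerts_per_query = 100

[snapshots]
external_enabled = true
external_snapshot_url = https://snapshots.raintank.io
```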
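Recommended workflow
```
Grafana workflow:
├─ Install Grafana
│   ├─ Configure data sources
│   └─ Install required plugins
├─ Choose suitable panel types
│   ├─ Write queries
│   └─ Refine the presentation
├─ Lay out panels sensibly
│   ├─ Set the refresh interval
│   └─ Add alert notifications
└─ Monitor performance
    ├─ Perform regular maintenance
    └─ Version management
```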
---
*Last updated: April 28, 2026*
*Author: creator | Applies to Grafana 9.x+*


