Grafana Visualization: Building Impressive Real-Time Monitoring Wallboards

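Introduction

In the cloud-native and DevOps era, real-time monitoring has become a core operations capability. Grafana, a leading visualization platform, is a common first choice for monitoring wallboards thanks to its strong data presentation and flexible configuration options.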

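Why Grafana?

❌ Pain points of traditional monitoring tools:
  • Weak visualization, which makes problems hard to spot quickly
  • Difficulty combining multiple data sources
  • Limited customization
  • Awkward collaboration and sharing
✅ Grafana's strengths:
  • Powerful data visualization
  • Support for 40+ data sources
  • A wide range of panel types and themes
  • Built-in alerting and notification
  • An active user community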

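Use cases:

  • 📊 Server monitoring wallboards
  • 🌐 Application performance monitoring (APM)
  • 📈 Business metrics visualization
  • 🔔 Alert notification and display
  • 📱 Mobile monitoring

Intended readers: operations engineers, SREs, and data analysts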

Grafana Core Concepts

1. Core components

┌───────────────────────────────────────────────────────────────┐
│                     Grafana Core Concepts                     │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│  Datasources                                                  │
│  └─ Connect to external systems (Prometheus, InfluxDB, etc.)  │
│      │                                                        │
│      ├─ Panels                                                │
│      │   └─ A single chart or statistic                       │
│      │                                                        │
│      ├─ Dashboards                                            │
│      │   └─ A collection of panels                            │
│      │                                                        │
│      └─ Variables                                             │
│          └─ Dynamic parameters for flexible queries           │
│                                                               │
└───────────────────────────────────────────────────────────────┘

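2. Supported data sources

| Data source type | Typical examples |
| --- | --- |
| Time-series databases | Prometheus, InfluxDB, TimescaleDB |
| Relational databases | MySQL, PostgreSQL, Oracle |
| Cloud monitoring | AWS CloudWatch, Azure Monitor |
| Logging systems | Elasticsearch, Loki |
| Message queues | Kafka |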

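3. Panel types

  • Time series: shows how metrics change over time
  • Stat: displays the value of a key metric
  • Graph: the legacy line and bar chart panel, superseded by Time series in recent versions
  • Table: tabular data display
  • Heatmap: shows how values are distributed over time
  • Pie chart: proportion analysis
  • Gauge: shows a value against a minimum/maximum range and thresholds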

Installation and Configuration

1. Installing with Docker

```yaml
# docker-compose.yml
version: '3.8'

services:
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      # Example password only; change it before exposing Grafana
      - GF_SECURITY_ADMIN_PASSWORD=admin123
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_SERVER_ROOT_URL=http://localhost:3000
    volumes:
      - grafana-data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    restart: unless-stopped
    healthcheck:
      # Assumes curl is available inside the image
      test: ["CMD", "curl", "-f", "http://localhost:3000/api/health"]
      interval: 30s
      timeout: 10s
      retries: 5

volumes:
  grafana-data:
```
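Once the container is up, the same `/api/health` endpoint that the healthcheck probes can be polled from a script. The following is a minimal Python sketch, assuming Grafana is reachable at `http://localhost:3000` as in the compose file. The helper names `is_healthy` and `wait_for_grafana` are hypothetical, and the check assumes the health response carries a `database` field whose value is `"ok"` when Grafana is ready.

```python
import json
import time
import urllib.error
import urllib.request


def is_healthy(payload: dict) -> bool:
    """Return True when a /api/health response body reports a working database."""
    return payload.get("database") == "ok"


def wait_for_grafana(base_url: str = "http://localhost:3000",
                     attempts: int = 5, delay_s: float = 2.0) -> bool:
    """Poll /api/health until it reports healthy or the attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(f"{base_url}/api/health", timeout=10) as resp:
                if is_healthy(json.load(resp)):
                    return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # not reachable yet, or the body was not valid JSON
        time.sleep(delay_s)
    return False
```

Calling `wait_for_grafana()` before provisioning dashboards through the API avoids racing against a container that is still starting.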


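2. Installing with YUM

```bash
# Add the Grafana repository
# (the `cat <` heredoc that writes the repository file is cut off here)

# Install Grafana
sudo yum install grafana -y

# Start the service and enable it at boot
sudo systemctl start grafana-server
sudo systemctl enable grafana-server

# Check the status
sudo systemctl status grafana-server
```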


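3. Plugin management

```bash
# Install common plugins
# (recent Grafana versions ship a built-in Pie chart panel, and Geomap supersedes Worldmap)
grafana-cli plugins install grafana-piechart-panel
grafana-cli plugins install grafana-worldmap-panel
grafana-cli plugins install grafana-clock-panel

# List installed plugins
grafana-cli plugins ls

# grafana-cli has no enable/disable subcommand: restart Grafana to load newly
# installed plugins, and remove a plugin to disable it, for example:
#   grafana-cli plugins remove grafana-piechart-panel
sudo systemctl restart grafana-server

# Update all plugins
grafana-cli plugins update-all
```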


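4. Access control

```ini
; Configuration file: /etc/grafana/grafana.ini

[users]
allow_sign_up = false
allow_org_create = false
auto_assign_org = true
auto_assign_org_role = Viewer

[auth.anonymous]
enabled = false

[security]
admin_user = admin
; Example values only: use a strong password and generate your own secret_key
admin_password = admin123
secret_key = SW2YcwTIb9zpOOhoPsMm

[auth.basic]
enabled = true

[auth.proxy]
; Enable only when Grafana is reachable solely through a trusted reverse proxy,
; because the header below is accepted as the user's identity
enabled = true
header_name = X-User-Name
header_property = username
auto_sign_up = true
```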


---

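Data Source Configuration

1. Prometheus configuration

```yaml
# Data source provisioning file (placed under provisioning/datasources/)
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: true
    jsonData:
      timeInterval: "15s"
      queryTimeout: "60s"
      httpMethod: POST

  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    editable: true
    jsonData:
      maxLines: 1000
      derivedFields:
        # Turns a bracketed hex token in a log line into a link to the tempo data source
        - datasourceUid: tempo
          matcherRegex: "\\[([a-f0-9]+)\\]"
          name: TraceID
          # $$ escapes $ so that provisioning does not expand it as an environment variable
          url: "$${__value.raw}"
```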
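The `matcherRegex` of the Loki derived field decides which part of a log line becomes the trace ID, so it is worth checking the pattern against a sample line before provisioning it. The sketch below does this with Python's `re` module. Grafana evaluates the pattern with its own regex engine, so treat the result as an approximation; the helper name `extract_trace_id` and the sample log lines are made up for illustration.

```python
import re
from typing import Optional

# Same pattern as matcherRegex in the provisioning file, with the YAML escaping removed.
TRACE_ID_PATTERN = re.compile(r"\[([a-f0-9]+)\]")


def extract_trace_id(log_line: str) -> Optional[str]:
    """Return the first bracketed lowercase-hex token in the line, or None."""
    match = TRACE_ID_PATTERN.search(log_line)
    return match.group(1) if match else None
```

Note that the pattern accepts any bracketed lowercase hex token, including short words such as `[dead]`. If trace IDs have a fixed length, a quantifier such as `{16}` or `{32}` in place of `+` makes the match stricter.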


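2. InfluxDB configuration

```yaml
  - name: InfluxDB
    type: influxdb
    access: proxy
    url: http://influxdb:8086
    # database, user and password apply to InfluxQL mode
    database: metrics
    user: admin
    editable: true
    jsonData:
      httpMode: POST
      # Flux mode (InfluxDB 2.x) also needs a token in secureJsonData
      version: Flux
      organization: my-org
    secureJsonData:
      # Example only; secrets belong in secureJsonData, not at the top level
      password: admin123
```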


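3. MySQL configuration

```yaml
  - name: MySQL
    type: mysql
    access: proxy
    url: mysql:3306
    database: metrics_db
    user: grafana
    jsonData:
      maxOpenConns: 10
      maxIdleConns: 5
      connMaxLifetime: 14400  # seconds
    secureJsonData:
      # Example only; secrets belong in secureJsonData, not at the top level
      password: grafana123
```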


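4. Testing data sources

```bash
# List the configured data sources (an API token or basic auth is required)
curl -H "Authorization: Bearer YOUR_TOKEN" http://localhost:3000/api/datasources

# Query Prometheus through the Grafana data source proxy (data source id 1)
curl -H "Authorization: Bearer YOUR_TOKEN" \
  http://localhost:3000/api/datasources/proxy/1/api/v1/status/buildinfo

# Add a data source through the API
curl -X POST http://localhost:3000/api/datasources \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{
    "name": "Prometheus",
    "type": "prometheus",
    "url": "http://prometheus:9090",
    "access": "proxy",
    "isDefault": true
  }'
```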
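The same `POST /api/datasources` call can be scripted when several data sources have to be registered. Here is a minimal Python sketch using only the standard library. The function name `build_datasource_request` is hypothetical, and `YOUR_TOKEN` stands for a Grafana API token as in the curl example. The function only builds the request, so it can be inspected or tested without a running Grafana.

```python
import json
import urllib.request


def build_datasource_request(base_url: str, token: str, name: str,
                             ds_type: str, url: str,
                             is_default: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST /api/datasources request."""
    body = {
        "name": name,
        "type": ds_type,
        "url": url,
        "access": "proxy",
        "isDefault": is_default,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/datasources",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends it. Keeping construction and sending apart makes it easy to log or review the payload first.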


---

Visualization Panels

1. Basic chart configuration

```json
{
  "dashboard": {
    "title": "Server monitoring",
    "panels": [
      {
        "id": 1,
        "type": "timeseries",
        "title": "CPU usage",
        "datasource": "Prometheus",
        "targets": [
          {
            "expr": "100 - (avg(rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) by (instance) * 100)",
            "legendFormat": "{{instance}}",
            "refId": "A"
          }
        ],
        "gridPos": {"x": 0, "y": 0, "w": 12, "h": 8}
      },
      {
        "id": 2,
        "type": "timeseries",
        "title": "Memory usage",
        "datasource": "Prometheus",
        "targets": [
          {
            "expr": "(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100",
            "legendFormat": "{{instance}}",
            "refId": "A"
          }
        ],
        "gridPos": {"x": 12, "y": 0, "w": 12, "h": 8}
      }
    ],
    "refresh": "30s",
    "time": {
      "from": "now-1h",
      "to": "now"
    }
  }
}
```
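A document wrapped in a top-level `dashboard` key, like the one above, is the shape that Grafana's `POST /api/dashboards/db` endpoint accepts. When many similar panels are needed, generating that body is less error-prone than editing JSON by hand. The sketch below is one way to do it in Python. The helper names `timeseries_panel` and `import_payload` are hypothetical, and it assumes the `timeseries` panel type and a data source named `Prometheus`.

```python
import json


def timeseries_panel(panel_id: int, title: str, expr: str, x: int, y: int,
                     w: int = 12, h: int = 8) -> dict:
    """Build a minimal Prometheus-backed time series panel definition."""
    return {
        "id": panel_id,
        "type": "timeseries",
        "title": title,
        "datasource": "Prometheus",
        "targets": [{"expr": expr, "legendFormat": "{{instance}}", "refId": "A"}],
        "gridPos": {"x": x, "y": y, "w": w, "h": h},
    }


def import_payload(title: str, panels: list, overwrite: bool = True) -> str:
    """Serialize the request body for POST /api/dashboards/db."""
    dashboard = {
        "id": None,  # null asks Grafana to create a new dashboard
        "title": title,
        "panels": panels,
        "refresh": "30s",
        "time": {"from": "now-1h", "to": "now"},
    }
    return json.dumps({"dashboard": dashboard, "overwrite": overwrite},
                      ensure_ascii=False, indent=2)
```

For example, `import_payload("Server monitoring", [timeseries_panel(1, "CPU usage", "up", 0, 0)])` returns a JSON string that can be sent with an authenticated POST request.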


2. Stat panel configuration

Panel configuration:

```json
{
  "id": 3,
  "type": "stat",
  "title": "Average response time",
  "datasource": "Prometheus",
  "targets": [
    {
      "expr": "avg(rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m]))",
      "format": "time_series",
      "legendFormat": "Average response time",
      "refId": "A"
    }
  ],
  "options": {
    "colorMode": "background",
    "graphMode": "area",
    "justifyMode": "auto",
    "orientation": "auto",
    "reduceOptions": {
      "calcs": ["lastNotNull"],
      "fields": "",
      "values": false
    },
    "textMode": "auto"
  },
  "gridPos": {"x": 0, "y": 8, "w": 6, "h": 4}
}
```


3. Heatmap configuration

A heatmap of a Prometheus histogram queries the raw `_bucket` series grouped by `le` rather than a single quantile, and leaves `calculate` off because the data is already bucketed.

```json
{
  "id": 4,
  "type": "heatmap",
  "title": "Request latency heatmap",
  "datasource": "Prometheus",
  "targets": [
    {
      "expr": "sum by (le) (rate(http_request_duration_seconds_bucket[5m]))",
      "format": "heatmap",
      "legendFormat": "{{le}}",
      "refId": "A"
    }
  ],
  "options": {
    "calculate": false,
    "cellGap": 1,
    "color": {
      "exponent": 0.5,
      "fill": "dark-orange",
      "mode": "opacity",
      "reverse": false,
      "scale": "exponential",
      "steps": 5
    },
    "exemplars": {
      "color": "rgba(255,0,255,0.7)"
    },
    "filterValues": {
      "le": 1e-9
    },
    "legend": {
      "show": true
    },
    "rowsFrame": {
      "layout": "auto"
    },
    "tooltip": {
      "show": true,
      "yHistogram": true
    },
    "yAxis": {
      "max": null,
      "min": null
    }
  },
  "gridPos": {"x": 6, "y": 8, "w": 12, "h": 6}
}
```
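Prometheus histogram `_bucket` series are cumulative: the bucket with `le="0.5"` counts every observation at or below 0.5 seconds, including those already counted in smaller buckets. A heatmap cell, by contrast, represents only the observations that fall between two adjacent bounds. The Python sketch below illustrates that conversion on made-up numbers. The helper name `per_bucket_counts` is hypothetical, and Grafana's own handling of heatmap-format queries may differ in detail.

```python
import math


def per_bucket_counts(cumulative: dict) -> dict:
    """Convert cumulative {le_bound: count} values into per-bucket counts."""
    result = {}
    previous = 0
    for bound in sorted(cumulative):
        # Each bucket keeps only what the next-smaller bucket did not already count.
        result[bound] = cumulative[bound] - previous
        previous = cumulative[bound]
    return result
```

With cumulative counts `{0.1: 5, 0.5: 12, 1.0: 15, math.inf: 16}`, the per-bucket view is 5 requests under 0.1 s, 7 between 0.1 s and 0.5 s, 3 between 0.5 s and 1 s, and 1 slower than 1 s.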


4. Gauge panel configuration

```json
{
  "id": 5,
  "type": "gauge",
  "title": "Disk usage",
  "datasource": "Prometheus",
  "targets": [
    {
      "expr": "(1 - (node_filesystem_avail_bytes{mountpoint=\"/\"} / node_filesystem_size_bytes{mountpoint=\"/\"})) * 100",
      "legendFormat": "{{instance}}",
      "refId": "A"
    }
  ],
  "options": {
    "showThresholdLabels": true,
    "showThresholdMarkers": true,
    "orientation": "horizontal"
  },
  "gridPos": {"x": 18, "y": 8, "w": 6, "h": 4}
}
```


---

Wallboard Design

1. Layout techniques

Wallboard layout configuration:

```json
{
  "dashboard": {
    "title": "Production monitoring wallboard",
    "timezone": "browser",
    "refresh": "10s",
    "schemaVersion": 30,
    "panels": [
      {
        "id": 1,
        "type": "row",
        "title": "Overview",
        "collapsed": false,
        "gridPos": {"x": 0, "y": 0, "w": 24, "h": 1},
        "panels": []
      },
      {
        "id": 2,
        "type": "stat",
        "title": "Total servers",
        "datasource": "Prometheus",
        "targets": [
          {
            "expr": "count(up)",
            "legendFormat": "Total",
            "refId": "A"
          }
        ],
        "gridPos": {"x": 0, "y": 1, "w": 4, "h": 4}
      },
      {
        "id": 3,
        "type": "stat",
        "title": "Health",
        "datasource": "Prometheus",
        "targets": [
          {
            "expr": "sum(up) / count(up) * 100",
            "legendFormat": "Percentage",
            "refId": "A"
          }
        ],
        "gridPos": {"x": 4, "y": 1, "w": 4, "h": 4}
      }
    ]
  }
}
```
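Grafana positions panels on a grid that is 24 columns wide, and each `gridPos` gives the panel's left edge (`x`), top edge (`y`), width (`w`) and height (`h`) in grid units. Working out these numbers by hand gets tedious on a large wallboard. The following Python sketch assigns positions left to right and wraps to a new row when the current one is full. The helper name `flow_layout` is hypothetical and the wrapping rule is one simple choice among many.

```python
GRID_COLUMNS = 24  # Grafana dashboards use a 24-column grid


def flow_layout(sizes, start_y=0):
    """Assign gridPos values left to right, wrapping to a new row when full.

    sizes is a list of (w, h) tuples; the result is a list of gridPos dicts.
    """
    positions = []
    x, y, row_height = 0, start_y, 0
    for w, h in sizes:
        if not 1 <= w <= GRID_COLUMNS:
            raise ValueError(f"panel width {w} does not fit a {GRID_COLUMNS}-column grid")
        if x + w > GRID_COLUMNS:  # no room left on this row
            x, y, row_height = 0, y + row_height, 0
        positions.append({"x": x, "y": y, "w": w, "h": h})
        x += w
        row_height = max(row_height, h)
    return positions
```

Six stat panels of width 4 fill one row exactly, so `flow_layout([(4, 4)] * 6 + [(12, 8)], start_y=1)` puts the wide panel on the next row at `y = 5`.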


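2. Real-time refresh configuration

Auto-refresh configuration (line styling for a time series panel lives under `fieldConfig.defaults.custom`):

```json
{
  "dashboard": {
    "refresh": "5s",
    "timezone": "utc",
    "liveNow": true,
    "panels": [
      {
        "id": 1,
        "type": "timeseries",
        "datasource": "Prometheus",
        "targets": [
          {
            "expr": "rate(node_cpu_seconds_total[1m])",
            "refId": "A",
            "format": "time_series"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "custom": {
              "lineWidth": 2,
              "fillOpacity": 10,
              "gradientMode": "opacity"
            }
          }
        }
      }
    ]
  }
}
```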


3. Alert notification configuration

```yaml
# Alert rules in Prometheus rule-file format
# (loaded by Prometheus or a compatible ruler, not by Grafana provisioning)
groups:
  - name: grafana_alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage above 80%"
          description: "CPU usage on {{ $labels.instance }} has exceeded 80%"

      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Memory usage above 90%"
          description: "Memory usage on {{ $labels.instance }} has exceeded 90%"
```
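The `for: 5m` clause is what keeps short spikes from paging anyone: the expression has to stay true on every evaluation for five minutes before the alert moves from pending to firing, and a single evaluation below the threshold resets the timer. The Python sketch below simulates that behaviour on a list of samples. It is a simplified model and not Prometheus code; the function name `alert_states` and the sample values are made up.

```python
def alert_states(samples, threshold, for_seconds):
    """Return the alert state after each (timestamp_s, value) evaluation.

    States follow Prometheus naming: inactive, pending, firing.
    """
    states = []
    active_since = None  # time of the first evaluation that breached the threshold
    for timestamp, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = timestamp
            held = timestamp - active_since
            states.append("firing" if held >= for_seconds else "pending")
        else:
            active_since = None  # any dip below the threshold resets the timer
            states.append("inactive")
    return states
```

With one evaluation per minute, a CPU series that dips to 70% at the two-minute mark has to stay above 80% for a further five minutes before `HighCPUUsage` would fire.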


---

Best Practices

1. Panel optimization

Performance-oriented configuration:

```json
{
  "dashboard": {
    "refresh": "30s",
    "panels": [
      {
        "id": 1,
        "datasource": "Prometheus",
        "targets": [
          {
            "expr": "rate(http_requests_total[5m])",
            "legendFormat": "{{instance}}",
            "interval": "10s",
            "refId": "A"
          }
        ],
        "options": {
          "legend": {
            "show": true,
            "calcs": ["lastNotNull", "mean", "max"]
          }
        }
      }
    ],
    "time": {
      "from": "now-6h",
      "to": "now"
    }
  }
}
```
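The range inside `rate(...[5m])` and the query `interval` interact: the range has to span several scrapes, or `rate()` has too few samples to work with. Grafana's `$__rate_interval` variable encodes a common rule of thumb, namely the larger of the query interval plus one scrape interval and four scrape intervals. The Python sketch below reproduces that rule, assuming the 15-second scrape interval set as `timeInterval` on the Prometheus data source. The function name `rate_interval` is hypothetical.

```python
def rate_interval(interval_s: float, scrape_interval_s: float = 15.0) -> float:
    """Approximate Grafana's $__rate_interval for a given query interval."""
    return max(interval_s + scrape_interval_s, 4 * scrape_interval_s)
```

For the 10-second query interval used above this gives 60 seconds, so the `[5m]` range is comfortably wide. Narrowing it below one minute would risk gaps in the graph.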


2. Template variables

Variable definitions:

```json
{
  "templating": {
    "list": [
      {
        "name": "instance",
        "type": "query",
        "datasource": "Prometheus",
        "query": "label_values(up, instance)",
        "refresh": 1,
        "multi": true,
        "includeAll": true,
        "allValue": ".+",
        "current": {
          "value": ["$__all"],
          "text": "All"
        }
      },
      {
        "name": "job",
        "type": "query",
        "datasource": "Prometheus",
        "query": "label_values(up, job)",
        "refresh": 1,
        "multi": false,
        "includeAll": true
      }
    ]
  },
  "panels": [
    {
      "id": 1,
      "targets": [
        {
          "expr": "up{instance=~\"$instance\",job=~\"$job\"}",
          "refId": "A"
        }
      ]
    }
  ]
}
```
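The query uses `=~` rather than `=` because a multi-value variable expands to a regular expression: selecting two instances turns `$instance` into an alternation such as `(node1|node2)`. The Python sketch below imitates that substitution so the resulting PromQL can be previewed. It is a simplified approximation of Grafana's interpolation (the exact escaping rules differ), and the function name `interpolate` is hypothetical.

```python
import re


def interpolate(expr: str, variables: dict) -> str:
    """Replace $name placeholders with a regex alternation of the selected values."""
    def substitute(match):
        values = variables[match.group(1)]
        if isinstance(values, str):
            values = [values]
        escaped = [re.escape(v) for v in values]
        # A single value is inserted as-is; several values become (a|b|c).
        return escaped[0] if len(escaped) == 1 else "(" + "|".join(escaped) + ")"

    return re.sub(r"\$(\w+)", substitute, expr)
```

When All is selected and `allValue` is set, Grafana inserts that value (here `.+`) instead of listing every option, which keeps the query short on large fleets.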


3. Team collaboration

Dashboard permissions, written as the request body for `POST /api/dashboards/uid/<dashboard-uid>/permissions`. Each item targets one user, team or role, and the permission levels are 1 = View, 2 = Edit, 4 = Admin:

```json
{
  "items": [
    {"userId": 2, "permission": 2},
    {"teamId": 1, "permission": 2},
    {"userId": 3, "permission": 1},
    {"teamId": 2, "permission": 1}
  ]
}
```
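Grafana's dashboard permissions API (`POST /api/dashboards/uid/:uid/permissions`) expects numeric permission levels (1 = View, 2 = Edit, 4 = Admin), and each item in the list targets exactly one user, team or role. The Python sketch below builds such a body from readable level names. The helper names `permission_item` and `permissions_body` are hypothetical.

```python
import json
from typing import Optional

PERMISSION_LEVELS = {"View": 1, "Edit": 2, "Admin": 4}


def permission_item(level: str, user_id: Optional[int] = None,
                    team_id: Optional[int] = None,
                    role: Optional[str] = None) -> dict:
    """Build one permission item; exactly one of user_id, team_id or role must be set."""
    targets = {"userId": user_id, "teamId": team_id, "role": role}
    chosen = {key: value for key, value in targets.items() if value is not None}
    if len(chosen) != 1:
        raise ValueError("set exactly one of user_id, team_id or role")
    return {**chosen, "permission": PERMISSION_LEVELS[level]}


def permissions_body(items: list) -> str:
    """Serialize the request body for the dashboard permissions endpoint."""
    return json.dumps({"items": items})
```

The endpoint replaces the dashboard's whole permission list, so the body should contain every entry that is meant to remain, not only the new ones.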


---

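Summary

Key takeaways

✅ Grafana core: Datasources, Panels, Dashboards, Variables
✅ Installation and configuration: Docker/YUM installation, plugin management, access control
✅ Data sources: connecting Prometheus, InfluxDB, MySQL and others
✅ Visualization: panel types, queries, stat panels
✅ Best practices: panel optimization, template variables, team collaboration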

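Performance tuning suggestions

```ini
; Tuning example for grafana.ini
; Option names vary between Grafana versions: check each one against the
; configuration reference for the version you run.

[server]
max_request_timeout = 300
static_root_path = public

[alerting]
max_annotation_age = 24h
max_alerts_per_query = 100

[snapshots]
external_enabled = true
external_snapshot_url = https://snapshots.raintank.io
```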

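Recommended workflow

Grafana workflow:

  1. Environment setup
     ├─ Install Grafana
     ├─ Configure data sources
     └─ Install required plugins

  2. Panel creation
     ├─ Choose a suitable panel type
     ├─ Write the queries
     └─ Refine the presentation

  3. Dashboard organization
     ├─ Lay out panels sensibly
     ├─ Set the refresh rate
     └─ Add alert notifications

  4. Continuous improvement
     ├─ Performance monitoring
     ├─ Regular maintenance
     └─ Version management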

*Last updated: April 28, 2026*
*Author: creator | Applies to Grafana 9.x and later*
