Ansible Automation: Batch Management of 100 Servers

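Introduction
In the era of cloud computing and DevOps, automated operations has become key to working efficiently. Ansible is an agentless automation tool whose simplicity and power make it a popular choice for managing large server fleets.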
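Why choose Ansible?
❌ Pain points of shell scripts:
- Hard to maintain and extend
- No idempotency guarantee
- Complex error handling
- Difficult to track state
❌ Traditional configuration management tools:
- Require an agent on every host (e.g. Puppet, Chef)
- Steep learning curve
- High architectural complexity
✅ Ansible's advantages:
- Agentless: managed nodes need only SSH and Python, no extra daemon
- Runs over SSH, so no additional ports have to be opened
- Simple, easy-to-learn YAML syntax
- Large module ecosystem (2000+ modules)
- Idempotent modules: re-running a playbook changes only what has drifted (raw `command`/`shell` tasks are the exception)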
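Use cases:
- 🚀 Batch server configuration management
- 📦 Batch application deployment
- 🔄 Batch upgrades and patch management
- 📊 Compliance checks and auditing
- 🔧 Day-to-day operations automation
Audience: systems operations engineers, DevOps engineers, IT administrators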
—
Ansible core concepts
1. Core components
```text
┌─────────────────────────────────────────────────────────┐
│                  Ansible core concepts                  │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Inventory                                              │
│  └─ Defines the managed hosts and their groups          │
│                                                         │
│  Modules                                                │
│  └─ Units of work (e.g. copy, file, service)            │
│                                                         │
│  Playbooks                                              │
│  └─ YAML files that define an ordered list of tasks     │
│                                                         │
│  Roles                                                  │
│  └─ Reusable bundles of tasks, handlers and templates   │
│                                                         │
└─────────────────────────────────────────────────────────┘
```
2. Inventory configuration
```ini
# hosts: the host inventory
[webservers]
web01 ansible_host=192.168.1.10
web02 ansible_host=192.168.1.11
web03 ansible_host=192.168.1.12

[database]
db01 ansible_host=192.168.1.20
db02 ansible_host=192.168.1.21

[all:vars]
ansible_user=ops
ansible_port=22
ansible_python_interpreter=/usr/bin/python3

[webservers:vars]
http_port=80
max_clients=200
```
3. Inventory group structure
```ini
# Multi-level groups: a group whose members are other groups
[all:children]
webservers
dbservers
appservers

[webservers]
web01
web02
web03

[appservers]
app01
app02
```
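The title talks about 100 servers, and listing each one by hand does not scale. Both INI and YAML inventories accept numeric ranges in host names, and leading zeros are kept. The sketch below assumes hypothetical host names of the form `webNNN.example.com` and `dbNN.example.com` and an arbitrary split of 80 web and 20 database servers; adjust both to your environment.

```yaml
# inventory/hosts.yml (hypothetical names; 80 web + 20 database = 100 hosts)
all:
  vars:
    ansible_user: ops
    ansible_python_interpreter: /usr/bin/python3
  children:
    webservers:
      hosts:
        # Expands to web001.example.com ... web080.example.com
        web[001:080].example.com:
    database:
      hosts:
        # Expands to db01.example.com ... db20.example.com
        db[01:20].example.com:
```

Running `ansible-inventory -i inventory/hosts.yml --graph` prints the expanded group tree, which is a quick way to confirm the ranges resolve as intended before targeting real hosts.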
—
Installation and configuration
1. Installing Ansible
```bash
# CentOS/RHEL
sudo yum install epel-release
sudo yum install ansible

# Ubuntu/Debian
sudo apt update
sudo apt install ansible

# Install the latest release with pip
pip install ansible

# Verify the installation
ansible --version
ansible-config list
```
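2. SSH key setup
```bash
# Generate an SSH key pair
ssh-keygen -t ed25519 -C "ops@ansible"

# Copy the public key to every host
for host in web01 web02 web03 db01 db02; do
  ssh-copy-id "ops@$host"
done

# Or distribute it with Ansible itself. authorized_key appends the key instead of
# overwriting the file; --ask-pass (requires sshpass) supplies the SSH password
# because no key is installed yet.
ansible all -m ansible.posix.authorized_key \
  -a "user=ops key='{{ lookup('file', '~/.ssh/id_ed25519.pub') }}'" \
  --ask-pass
```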
3. Configuration file
```ini
# ansible.cfg example
[defaults]
inventory = ./hosts
# Disables SSH host key verification: convenient for first contact, weaker against spoofed hosts
host_key_checking = False
remote_user = ops
timeout = 30
forks = 50
retry_files_enabled = False
stdout_callback = yaml
roles_path = ./roles

[inventory]
enable_plugins = host_list, script, yaml, ini

[privilege_escalation]
become = True
become_method = sudo
become_user = root
become_ask_pass = False
```
---
Batch management
1. Host group management
```ini
# Group hosts by environment
[production]
web01 ansible_host=10.0.1.10
web02 ansible_host=10.0.1.11
web03 ansible_host=10.0.1.12

[staging]
web04 ansible_host=10.0.2.10
web05 ansible_host=10.0.2.11

[production:children]
webservers
loadbalancers
```
2. Variable definitions
```yaml
# group_vars/webservers.yml
http_port: 80
max_connections: 200
ssl_enabled: true
ssl_certificate: /etc/ssl/nginx.crt
ssl_key: /etc/ssl/nginx.key

# host_vars/web01.yml
nginx_worker_processes: 4
nginx_cache_size: 500m
```
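To check which value each host actually receives, a small debug play helps. In Ansible's variable precedence, host_vars override group_vars, so web01 should report 4 workers while the other web servers fall back to the template default of 2. This is an illustrative check (the file name `check_vars.yml` is hypothetical), not part of the deployment.

```yaml
# check_vars.yml: print the effective values for each host
- name: Show effective variables
  hosts: webservers
  gather_facts: false
  tasks:
    - name: Print port and worker count for each host
      ansible.builtin.debug:
        # Folded scalar: the three lines are joined with spaces
        msg: >-
          {{ inventory_hostname }}:
          http_port={{ http_port }},
          workers={{ nginx_worker_processes | default(2) }}
```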
3. Template configuration
```jinja
# templates/nginx.conf.j2
# worker_processes belongs to the main context, outside any block
worker_processes {{ nginx_worker_processes | default(2) }};

events {
    worker_connections 1024;
}

http {
    server {
        listen {{ http_port }};
        server_name {{ ansible_fqdn }};

        location / {
            proxy_pass http://localhost:{{ application_port | default(3000) }};
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }

{% if ssl_enabled %}
        listen 443 ssl;
        ssl_certificate {{ ssl_certificate }};
        ssl_certificate_key {{ ssl_key }};
{% endif %}
    }
}
```
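Pushing a broken nginx.conf to 100 servers at once is costly. The `template` module's `validate` option runs a command against the rendered temporary file (passed in as `%s`) and moves it into place only if that command succeeds. A sketch, assuming nginx is already installed on the targets and a `restart nginx` handler exists in the play:

```yaml
- name: Render nginx.conf and validate it before replacing the live file
  ansible.builtin.template:
    src: templates/nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    mode: '0644'
    # nginx -t checks the temporary file; on failure the old config stays in place
    validate: nginx -t -c %s
  notify: restart nginx
```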
---
Batch tasks
1. Running commands in batch
```bash
# Run a single command
ansible all -m command -a "uptime"
ansible webservers -m shell -a "df -h"

# Run a module directly
ansible all -m ping
ansible webservers -m copy -a "src=file.txt dest=/tmp/"
```
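Ad hoc commands suit one-off checks; once the same check is repeated, a playbook makes it reviewable and reusable. A minimal sketch of the `df -h` example in playbook form (`df_out` is an illustrative variable name):

```yaml
- name: Collect disk usage from the web tier
  hosts: webservers
  gather_facts: false
  tasks:
    - name: Run df -h (read-only, so never report "changed")
      ansible.builtin.command: df -h
      register: df_out
      changed_when: false

    - name: Show the output
      ansible.builtin.debug:
        var: df_out.stdout_lines
```

For a single ad hoc run, `-f 50` on the command line overrides the `forks` setting from ansible.cfg.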
2. Batch configuration
```yaml
# playbook.yml
- name: Configure Nginx
  hosts: webservers
  become: yes
  vars:
    http_port: 80
  tasks:
    - name: Install Nginx
      apt:
        name: nginx
        state: present
      when: ansible_os_family == "Debian"

    - name: Copy the configuration file
      template:
        src: templates/nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        mode: '0644'
      notify: restart nginx

    - name: Start Nginx
      service:
        name: nginx
        state: started
        enabled: yes

  handlers:
    - name: restart nginx
      service:
        name: nginx
        state: restarted
```
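3. Batch software installation
```yaml
- name: Install common packages in batch
  hosts: all
  become: yes
  vars:
    packages:
      - vim
      - htop
      - net-tools
      - git
      - curl
  tasks:
    - name: Install packages
      package:
        name: "{{ packages }}"
        state: present

    - name: Verify the installation
      # net-tools has no "net-tools" binary, so that item reports N/A below
      command: "{{ item }} --version"
      register: version_output
      loop: "{{ packages }}"
      changed_when: false
      failed_when: false

    - name: Show version information
      # A looped register stores one result per item under .results
      debug:
        msg: "{{ item.item }}: {{ item.stdout_lines[0] | default('N/A') }}"
      loop: "{{ version_output.results }}"
      loop_control:
        label: "{{ item.item }}"
```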
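Across 100 servers, a transient mirror or network error on a single host is common. The `until`/`retries`/`delay` keywords re-run a task until a condition holds. A sketch of the install task with retries; the retry count and delay are arbitrary choices:

```yaml
- name: Install packages, retrying on transient failures
  ansible.builtin.package:
    name: "{{ packages }}"
    state: present
  register: pkg_result
  # Retry up to 3 times, 10 seconds apart, until the task succeeds
  until: pkg_result is succeeded
  retries: 3
  delay: 10
```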
---
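Security management
1. SSH key management
```yaml
# Authenticate with an SSH key
- name: Use an SSH key
  hosts: webservers
  connection: ssh
  remote_user: ops
  become: yes
  become_user: root
  vars:
    # The private key is a connection variable, not a play keyword
    ansible_ssh_private_key_file: ~/.ssh/id_ed25519
  tasks:
    - name: Confirm key-based login and privilege escalation work
      ping:
```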
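2. Vault encryption
```bash
# Create an encrypted file
ansible-vault create secrets.yml

# Edit the encrypted file
ansible-vault edit secrets.yml

# Run a playbook that uses encrypted files (prompts for the vault password)
ansible-playbook playbook.yml --ask-vault-pass

# Read the vault password from a file instead
ansible-playbook playbook.yml --vault-password-file=.vault_pass
```
```yaml
# secrets.yml (content before encryption)
db_password: "s3cur3p@ssw0rd"
api_key: "sk-1234567890abcdef"
jwt_secret: "your-jwt-secret-here"
```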
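A common convention, and the one the database example later relies on with `vault_mysql_root_password`, is to keep secrets in an encrypted file under a `vault_` prefix and reference them from a plain-text vars file. Variable names then stay greppable while the values stay encrypted. A sketch with hypothetical file names and placeholder values; Ansible reads every file in a `group_vars/<group>/` directory:

```yaml
# group_vars/database/vars.yml (plain text, safe to commit and grep)
mysql_root_password: "{{ vault_mysql_root_password }}"
app_password: "{{ vault_app_password }}"

# group_vars/database/vault.yml (encrypt with: ansible-vault encrypt group_vars/database/vault.yml)
vault_mysql_root_password: "change-me"
vault_app_password: "change-me-too"
```

Tasks that handle these values can also set `no_log: true` so the secrets do not appear in the run output.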
3. Privilege control
```yaml
# playbook.yml: privilege control
- name: Restricted administration
  hosts: webservers
  become: yes
  vars_files:
    - secrets.yml
  tasks:
    - name: Read-only operation (no privilege needed)
      ansible.builtin.command: cat /etc/hostname
      become: false
      changed_when: false

    - name: System configuration (privilege required)
      ansible.builtin.copy:
        content: "privileged content"
        dest: /etc/privileged.conf
        mode: '0600'
      become: yes
```
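Privilege control also works in the other direction: `become_user` runs a task as a specific unprivileged account instead of root. A sketch assuming the `www-data` account used in the deployment example exists on the targets; becoming one unprivileged user from another may require the `acl` package on the targets or `pipelining = True`.

```yaml
- name: Run tasks as the application account
  hosts: webservers
  become: yes
  tasks:
    - name: Confirm which user the task runs as
      ansible.builtin.command: whoami
      become_user: www-data
      register: who
      changed_when: false

    - name: Show the effective user
      ansible.builtin.debug:
        msg: "Task ran as {{ who.stdout }}"
```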
---
Hands-on examples
1. Batch deployment of web servers
```yaml
- name: Deploy web servers in batch
  hosts: webservers
  become: yes
  gather_facts: yes
  vars:
    app_user: www-data
    app_group: www-data
    app_dir: /var/www/myapp
  tasks:
    - name: Install dependencies
      apt:
        name:
          - nginx
          - python3
          - python3-pip
          - build-essential
          - rsync          # synchronize needs rsync on the target as well
        state: present

    - name: Create the application user
      user:
        name: "{{ app_user }}"
        group: "{{ app_group }}"
        shell: /bin/bash
        create_home: no

    - name: Create the application directory
      file:
        path: "{{ app_dir }}"
        state: directory
        owner: "{{ app_user }}"
        group: "{{ app_group }}"
        mode: '0755'

    - name: Deploy the application code
      synchronize:
        src: ./app/
        dest: "{{ app_dir }}/"
        archive: true
        # owner/group in synchronize are booleans (preserve ownership),
        # so the target ownership is set with rsync's --chown
        rsync_opts:
          - "--chown={{ app_user }}:{{ app_group }}"

    - name: Configure Nginx
      template:
        src: templates/nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        owner: root
        group: root
        mode: '0644'
      notify: restart nginx

    - name: Test the Nginx configuration
      command: nginx -t
      changed_when: false

    - name: Enable the site
      file:
        src: /etc/nginx/sites-available/default
        dest: /etc/nginx/sites-enabled/default
        state: link

  handlers:
    - name: restart nginx
      service:
        name: nginx
        state: restarted
```
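Rolling a deployment out to every web server at once risks taking the whole tier down if something goes wrong. The play-level `serial` keyword processes hosts in batches, and `max_fail_percentage` aborts the run when too many hosts in a batch fail. A sketch; the batch sizes and the threshold are illustrative values:

```yaml
- name: Rolling update of the web tier
  hosts: webservers
  become: yes
  # First 1 host (canary), then 5, then 20% of the play's hosts per batch;
  # the last entry repeats until every host has been processed
  serial:
    - 1
    - 5
    - "20%"
  # Abort if more than 10% of the hosts in the current batch fail
  max_fail_percentage: 10
  tasks:
    - name: Render and validate nginx.conf
      ansible.builtin.template:
        src: templates/nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        mode: '0644'
        validate: nginx -t -c %s
      notify: restart nginx

  handlers:
    - name: restart nginx
      ansible.builtin.service:
        name: nginx
        state: restarted
```

Because each batch runs the whole play, handlers fire at the end of every batch, so one batch is restarted before the next one starts.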
2. Batch configuration of database servers
```yaml
- name: Configure database servers
  hosts: database
  become: yes
  vars:
    mysql_root_password: "{{ vault_mysql_root_password }}"
    mysql_datadir: /var/lib/mysql
    mysql_bind_address: 127.0.0.1
  tasks:
    - name: Install MySQL
      apt:
        name:
          - mysql-server
          - mysql-client
          - python3-mysqldb
        state: present

    - name: Configure MySQL
      template:
        src: templates/mysql.cnf.j2
        dest: /etc/mysql/mysql.conf.d/server.cnf
        owner: root
        group: root
        mode: '0644'
      notify: restart mysql

    - name: Set the root password
      mysql_user:
        name: root
        host: localhost
        password: "{{ mysql_root_password }}"
      no_log: true
      when: ansible_os_family == "Debian"

    - name: Create databases
      mysql_db:
        name: "{{ item }}"
        state: present
        collation: utf8_general_ci
        encoding: utf8
      loop:
        - production_db
        - staging_db

    - name: Create application users
      mysql_user:
        name: "{{ item.user }}"
        password: "{{ item.password }}"
        priv: "{{ item.priv }}"
        host: "%"
        state: present
      no_log: true
      loop:
        # Each account gets only what it needs: its own database, or read-only access
        - { user: 'app_user', password: "{{ vault_app_password }}", priv: 'production_db.*:ALL' }
        - { user: 'readonly_user', password: "{{ vault_readonly_password }}", priv: '*.*:SELECT' }

  handlers:
    - name: restart mysql
      service:
        name: mysql
        state: restarted
```
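Handlers normally run at the end of the play, so tasks after "Configure MySQL" may talk to MySQL before the restart triggered by the config change has happened. `meta: flush_handlers` runs pending handlers immediately, and `wait_for` blocks until the port accepts connections. A sketch, assuming MySQL listens on the default port 3306 at `mysql_bind_address`; these tasks would sit right after the "Configure MySQL" task:

```yaml
- name: Run the pending MySQL restart now
  ansible.builtin.meta: flush_handlers

- name: Wait until MySQL accepts connections
  ansible.builtin.wait_for:
    host: "{{ mysql_bind_address }}"
    port: 3306
    timeout: 60
```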
---
Best practices
1. Playbook organization
```text
project/
├── ansible.cfg
├── inventory/
│   ├── hosts
│   ├── group_vars/
│   │   ├── all.yml
│   │   └── webservers.yml
│   └── host_vars/
│       └── web01.yml
├── playbooks/
│   ├── deploy.yml
│   ├── upgrade.yml
│   └── cleanup.yml
├── roles/
│   ├── nginx/
│   │   ├── tasks/
│   │   ├── handlers/
│   │   ├── templates/
│   │   ├── vars/
│   │   └── defaults/
│   └── mysql/
└── vault/
    └── secrets.yml
```
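In this layout, each role keeps its entry points in `main.yml` files under `tasks/` and `handlers/`, and playbooks stay short by listing roles. A minimal sketch of an `nginx` role and a `deploy.yml` that uses it; the three files are shown in one block and their task contents are illustrative:

```yaml
# roles/nginx/tasks/main.yml
- name: Install nginx
  ansible.builtin.package:
    name: nginx
    state: present

- name: Deploy nginx.conf
  ansible.builtin.template:
    src: nginx.conf.j2          # looked up in roles/nginx/templates/
    dest: /etc/nginx/nginx.conf
    mode: '0644'
  notify: restart nginx         # defined in roles/nginx/handlers/main.yml

# roles/nginx/handlers/main.yml
- name: restart nginx
  ansible.builtin.service:
    name: nginx
    state: restarted

# playbooks/deploy.yml
- name: Deploy the web tier
  hosts: webservers
  become: yes
  roles:
    - nginx
```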
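2. CI/CD integration
```yaml
# .github/workflows/ansible.yml
name: Ansible CI
on:
  push:
    paths:
      - 'ansible/**'
  pull_request:
    paths:
      - 'ansible/**'
jobs:
  validate:
    runs-on: ubuntu-latest
    defaults:
      run:
        # The Ansible project lives in ansible/, matching the path filters above
        working-directory: ansible
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v5
        with:
          python-version: '3.x'
      - name: Install Ansible
        # The ansible-vault CLI is installed together with Ansible
        run: pip install ansible ansible-lint
      - name: Lint Playbooks
        run: ansible-lint playbooks/
      - name: Syntax Check
        run: ansible-playbook playbooks/*.yml --syntax-check
      - name: Check Vault Files
        # Fails unless the file starts with the vault header; no vault password needed
        run: head -n 1 vault/secrets.yml | grep -q '^\$ANSIBLE_VAULT;'
```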
3. Error handling
```yaml
- name: Error handling example
  hosts: all
  tasks:
    - name: Run an operation that may fail
      shell: risky_command
      ignore_errors: yes
      register: result

    - name: Run the recovery logic
      shell: recovery_command
      when: result is failed

    - name: Log the error
      debug:
        msg: "Error: {{ result }}"
      when: result is failed

    - name: Send a notification
      shell: "notify_admin '{{ inventory_hostname }}: {{ result.msg }}'"
      when: result is failed and 'skip_notify' not in ansible_run_tags

    # fail stops the host, so it must come after the recovery and reporting tasks
    - name: Mark the host as failed
      fail:
        msg: "Operation failed: {{ result.msg }}"
      when: result is failed
```
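Ansible also offers a structured alternative to the `ignore_errors` + `when` pattern: `block`/`rescue`/`always`, which behaves like try/except/finally, combined with `until`/`retries` for the retry part. A sketch that reuses the placeholder commands `risky_command`, `recovery_command` and `notify_admin` from the example above:

```yaml
- name: Error handling with block/rescue/always
  hosts: all
  tasks:
    - name: Risky operation with recovery
      block:
        - name: Run the risky command, retrying up to 3 times
          ansible.builtin.shell: risky_command
          register: result
          until: result.rc == 0
          retries: 3
          delay: 5
      rescue:
        # Runs only if a task in the block failed
        - name: Run the recovery command
          ansible.builtin.shell: recovery_command

        - name: Notify the administrator
          ansible.builtin.shell: "notify_admin '{{ inventory_hostname }}: {{ ansible_failed_result.msg | default('unknown error') }}'"
      always:
        # Runs whether the block succeeded or failed
        - name: Record completion
          ansible.builtin.debug:
            msg: "risky operation finished on {{ inventory_hostname }}"
```

If the rescue tasks succeed, the host is not counted as failed and the play continues with it.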
---
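Summary
Key takeaways
✅ Ansible's strengths: agentless, managed over SSH, simple YAML
✅ Core concepts: Inventory, Modules, Playbooks, Roles
✅ Batch management: host groups, variables, templates
✅ Security: SSH keys, Vault, privilege control
✅ Best practices: playbook organization, CI/CD, error handling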
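Performance tuning tips
```ini
# Performance-related settings
[defaults]
# Number of hosts handled in parallel (ansible.cfg allows only ";" for inline comments)
forks = 50
timeout = 30
retry_files_enabled = False

[ssh_connection]
# Sends modules over the existing SSH session instead of copying files first;
# with become, "requiretty" must be disabled in sudoers
pipelining = True
# %% escapes % in ansible.cfg; %h, %p, %r are the SSH host, port and remote user
control_path = /tmp/ansible-ssh-%%h-%%p-%%r
```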
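Besides ansible.cfg, a few play-level settings affect speed on large fleets. Skipping fact gathering saves a round trip per host when no facts are used, and the `free` strategy lets each host move on to its next task without waiting for the slowest host. A sketch for a read-only check where task order across hosts does not matter:

```yaml
- name: Fast connectivity check across all hosts
  hosts: all
  # No facts are referenced below, so skip the setup step
  gather_facts: false
  # Each host runs through its tasks independently of the others
  strategy: free
  tasks:
    - name: Check that each host is reachable and has a usable Python
      ansible.builtin.ping:

    - name: Report uptime
      ansible.builtin.command: uptime
      changed_when: false
```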
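Recommended workflow
```text
Ansible workflow:
1. Prepare the environment
   ├─ Install Ansible
   ├─ Configure SSH keys
   └─ Create the inventory
2. Develop playbooks
   ├─ Write tasks
   ├─ Create roles
   └─ Test and validate
3. Deploy
   ├─ Dry run (--check)
   ├─ Deploy in batches
   └─ Verify the results
4. Ongoing maintenance
   ├─ Version control
   ├─ Documentation updates
   └─ Regular audits
```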


