Presto Cross-Source Queries: Building a Unified SQL Access Layer


Introduction

In a large enterprise's data architecture, data is usually scattered across many different storage systems: data warehouses (Hive, Redshift), relational databases (MySQL, PostgreSQL), message queues (Kafka), NoSQL stores (HBase, Redis), and more.

The challenges of cross-source querying:

❌ Pain points of traditional solutions:
  • Multiple ETL jobs are needed to consolidate data onto one platform
  • Poor data freshness (T+1 latency)
  • Data duplication drives up storage cost
  • Complex maintenance and long development cycles
✅ Presto's approach:
  • A single SQL interface for every data source
  • Query in place, with no data movement
  • JOIN across multiple data sources in one query
  • High-performance distributed execution

Core strengths of Presto:

| Feature | Description |
|---|---|
| Unified SQL interface | JDBC/ODBC access with ANSI SQL support |
| Cross-source queries | One SQL statement can JOIN data across sources |
| High performance | Distributed, in-memory execution |
| Low latency | Interactive queries with second-level responses |
| Rich connectors | 40+ data sources supported |

Typical use cases:

  • 📊 Cross-system data correlation analysis
  • 🔄 Real-time data exploration and validation
  • 📈 Blended reporting and BI analysis
  • 🔍 Ad-hoc data exploration
  • 🛠️ Data governance and auditing

This tutorial walks through cross-source querying with Presto end to end: architecture, connector configuration, SQL queries, and performance tuning.

Intended audience: big-data engineers, DBAs, and data platform architects

Presto Core Architecture

1. Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                       Client Layer                          │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌───────────┐   │
│  │ JDBC/ODBC│  │   CLI    │  │   Hue    │  │ Superset  │   │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └─────┬─────┘   │
└───────┼─────────────┼─────────────┼──────────────┼─────────┘
        │             │             │              │
        └─────────────┴──────┬──────┴──────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────┐
│                        Coordinator                          │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  Query parsing and optimization                     │    │
│  │  - SQL Parser → AST                                 │    │
│  │  - Query optimizer (CBO/RBO)                        │    │
│  │  - Execution plan generation                        │    │
│  └─────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────┐
│                          Workers                            │
│  ┌─────────────────────────────────────────────────────┐    │
│  │  Execution cluster                                  │    │
│  │  - Parallel execution of plan fragments             │    │
│  │  - Data exchange and merging                        │    │
│  │  - Result delivery                                  │    │
│  └─────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘
                             │
        ┌────────────────────┼────────────────────┐
        ▼                    ▼                    ▼
┌──────────────┐   ┌──────────────┐   ┌──────────────┐
│ Connector 1  │   │ Connector 2  │   │ Connector 3  │
│     Hive     │   │    MySQL     │   │    Kafka     │
└──────────────┘   └──────────────┘   └──────────────┘

2. Core Components

Coordinator:

  • Accepts client query requests
  • Parses SQL and generates the execution plan
  • Manages cluster resources
  • Tracks query progress

Worker:

  • Executes query tasks
  • Reads data and performs computation
  • Returns results
  • Scales horizontally

Connector:

  • Adapter for a data source
  • Encapsulates the access logic of each storage system
  • Exposes a uniform read interface to the engine
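The connector layer is essentially the adapter pattern: the engine programs against one read interface, and each connector adapts a concrete store to it. A minimal Python sketch of that idea (class and method names here are illustrative, not the real Presto SPI):

```python
from abc import ABC, abstractmethod


class Connector(ABC):
    """The uniform read interface the engine codes against."""

    @abstractmethod
    def scan(self, table):
        """Return the rows of `table` as a list of dicts."""


class MySQLConnector(Connector):
    """Adapts a relational store (a dict stands in for a JDBC session)."""

    def __init__(self, tables):
        self._tables = tables

    def scan(self, table):
        return self._tables[table]


class HiveConnector(Connector):
    """Adapts a warehouse (a dict stands in for HDFS + metastore)."""

    def __init__(self, warehouse):
        self._warehouse = warehouse

    def scan(self, table):
        return self._warehouse[table]


def run_scan(catalogs, qualified_name):
    """Engine-side dispatch: 'catalog.table' -> rows, whatever the backend."""
    catalog, table = qualified_name.split(".", 1)
    return catalogs[catalog].scan(table)


catalogs = {
    "mysql": MySQLConnector({"users": [{"user_id": 1, "name": "ann"}]}),
    "hive": HiveConnector({"events": [{"user_id": 1, "amount": 9.5}]}),
}
```

Registering a new source is then just adding another entry to `catalogs`, which mirrors how Presto mounts one catalog per properties file.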

3. Connector Architecture

┌─────────────────────────────────────────────────────────────┐
│                       Presto Engine                         │
└──────────────────────────────┬──────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│  ┌─────────────────────────────────────────────────────┐    │
│  │                 Connector API (SPI)                 │    │
│  │  - ConnectorFactory                                 │    │
│  │  - ConnectorSplitManager                            │    │
│  │  - ConnectorPageSourceProvider                      │    │
│  └─────────────────────────────────────────────────────┘    │
│        │                     │                     │        │
│        ▼                     ▼                     ▼        │
│  ┌───────────┐         ┌───────────┐         ┌───────────┐  │
│  │   Hive    │         │   MySQL   │         │   Kafka   │  │
│  │ Connector │         │ Connector │         │ Connector │  │
│  └───────────┘         └───────────┘         └───────────┘  │
└─────────────────────────────────────────────────────────────┘

Querying the Hive Data Warehouse

1. Connector Configuration

```properties
# Hive connector configuration (etc/catalog/hive.properties)
# Exact property names vary across Presto/Trino versions; check your release docs.
connector.name=hive-hadoop2
hive.metastore.uri=thrift://metastore:9083
# Hadoop config files so Presto can reach HDFS (namenode:9000)
hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
hive.max-split-size=256MB
```

2. Basic Queries

```sql
-- Query a Hive table
SELECT
  user_id,
  COUNT(*) AS event_count,
  SUM(duration) AS total_duration
FROM hive.default.user_events
WHERE event_date >= '2024-01-01'
GROUP BY user_id
ORDER BY event_count DESC
LIMIT 100;

-- Multi-table JOIN (within Hive)
SELECT
  u.user_id,
  u.user_name,
  COUNT(DISTINCT e.event_id) AS event_count,
  SUM(e.amount) AS total_amount
FROM hive.default.users u
INNER JOIN hive.default.events e ON u.user_id = e.user_id
WHERE e.event_date >= '2024-01-01'
GROUP BY u.user_id, u.user_name;

-- Partitioned-table query (the event_date predicate prunes partitions)
SELECT
  event_date,
  device_type,
  COUNT(*) AS device_count
FROM hive.default.user_events
WHERE event_date >= '2024-01-01'
GROUP BY event_date, device_type
ORDER BY device_count DESC;
```


3. Cross-Source JOINs in Practice

```sql
-- Cross-source JOIN: Hive + MySQL
SELECT
  u.user_id,
  u.user_name,
  u.age,
  m.last_login_time,
  m.account_status,
  COUNT(e.event_id) AS event_count,
  SUM(e.amount) AS total_amount
FROM hive.default.user_events e
INNER JOIN mysql.default.users u ON e.user_id = u.user_id
INNER JOIN mysql.default.user_meta m ON u.user_id = m.user_id
WHERE e.event_date >= '2024-01-01'
  AND m.account_status = 'active'
GROUP BY u.user_id, u.user_name, u.age, m.last_login_time, m.account_status
ORDER BY total_amount DESC;

-- Performance tips:
-- 1. Filter early with WHERE to reduce the data scanned
-- 2. JOIN columns on the RDBMS side should be indexed
-- 3. Put the large table first (the Hive table as the probe side)
```
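The third tip comes down to hash-join mechanics: Presto builds an in-memory hash table on the build side (ideally the small MySQL table) and streams the large Hive table past it, so the build side bounds memory use. A toy Python hash join (table contents are made up) illustrates this:

```python
def hash_join(probe_rows, build_rows, key):
    """Hash join: build on the small side, stream the large side."""
    # Build phase: the hash table is sized by the SMALL table only.
    index = {}
    for row in build_rows:
        index.setdefault(row[key], []).append(row)
    # Probe phase: one pass over the large table, no extra memory.
    out = []
    for row in probe_rows:
        for match in index.get(row[key], []):
            out.append({**match, **row})
    return out


events = [{"user_id": i % 2, "amount": i} for i in range(6)]  # "large" Hive table
users = [{"user_id": 0, "name": "ann"},
         {"user_id": 1, "name": "bob"}]                       # "small" MySQL table
joined = hash_join(events, users, "user_id")
```

Swapping the arguments would still return the same rows, but the hash table would then be built over the large table, which is exactly what the join-order tip tells you to avoid.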


4. Advanced Hive Features

```sql
-- Window functions
SELECT
  user_id,
  event_date,
  event_amount,
  SUM(event_amount) OVER (PARTITION BY user_id ORDER BY event_date) AS running_total,
  AVG(event_amount) OVER (PARTITION BY user_id ORDER BY event_date
                          ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS avg_7d
FROM hive.default.transactions
WHERE event_date >= '2024-01-01';

-- Built-in functions
SELECT
  user_id,
  format_datetime(from_unixtime(event_timestamp), 'yyyy-MM-dd') AS event_date_formatted,
  regexp_extract(event_ip, '(\d+\.\d+\.\d+)\.\d+', 1) AS ip_prefix
FROM hive.default.user_logs
WHERE event_date >= '2024-01-01';

-- CTE
WITH user_stats AS (
  SELECT
    user_id,
    COUNT(*) AS total_events,
    SUM(amount) AS total_amount
  FROM hive.default.user_events
  WHERE event_date >= '2024-01-01'
  GROUP BY user_id
)
SELECT
  us.user_id,
  us.total_events,
  us.total_amount,
  COALESCE(CAST(us.total_amount AS DOUBLE) / NULLIF(us.total_events, 0), 0) AS avg_amount,
  CASE
    WHEN us.total_events > 100 THEN 'high'
    WHEN us.total_events > 50 THEN 'medium'
    ELSE 'low'
  END AS user_level
FROM user_stats us
ORDER BY us.total_amount DESC;
```
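The running total and the 7-row moving average above are easy to sanity-check offline. A small Python rendering of the same window logic, assuming the rows are already sorted by `event_date` within one user:

```python
def running_total(amounts):
    """SUM(...) OVER (ORDER BY ...): cumulative sum."""
    total, out = 0.0, []
    for a in amounts:
        total += a
        out.append(total)
    return out


def moving_avg(amounts, window=7):
    """AVG(...) OVER (ROWS BETWEEN window-1 PRECEDING AND CURRENT ROW)."""
    out = []
    for i in range(len(amounts)):
        lo = max(0, i - window + 1)   # frame start, clamped at the first row
        frame = amounts[lo:i + 1]
        out.append(sum(frame) / len(frame))
    return out


amounts = [10.0, 20.0, 30.0, 40.0]
```

Note that, like the SQL frame, early rows average over a shorter frame rather than returning NULL.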


---

Querying MySQL and Other Relational Databases

1. MySQL Connector Configuration

```properties
# MySQL connector configuration (etc/catalog/mysql.properties)
connector.name=mysql
connection-url=jdbc:mysql://localhost:3306
connection-user=presto
connection-password=xxxxx
# Connection-pool and timeout tuning keys vary by version; see your release docs.
```

2. Basic Queries

```sql
-- Query a MySQL table
SELECT
  product_id,
  product_name,
  price,
  stock_quantity
FROM mysql.shop.products
WHERE stock_quantity > 0
  AND price BETWEEN 100 AND 1000
ORDER BY price DESC;

-- Pagination
SELECT
  user_id,
  user_name,
  email
FROM mysql.shop.users
ORDER BY user_id
LIMIT 100 OFFSET 0;

-- Aggregation
SELECT
  category_id,
  COUNT(*) AS product_count,
  AVG(price) AS avg_price,
  SUM(stock_quantity) AS total_stock
FROM mysql.shop.products
GROUP BY category_id;

-- A more involved query
SELECT
  u.user_id,
  u.user_name,
  o.order_id,
  o.order_date,
  o.total_amount,
  COUNT(p.product_id) AS item_count
FROM mysql.shop.users u
LEFT JOIN mysql.shop.orders o ON u.user_id = o.user_id
LEFT JOIN mysql.shop.order_items oi ON o.order_id = oi.order_id
LEFT JOIN mysql.shop.products p ON oi.product_id = p.product_id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '30' DAY
GROUP BY u.user_id, u.user_name, o.order_id, o.order_date, o.total_amount;
```


3. Cross-Source JOIN: Hive + MySQL

```sql
-- Hive (behavioral events) + MySQL (product catalog)
SELECT
  h.user_id,
  h.event_type,
  h.event_date,
  p.product_name,
  p.category,
  p.price,
  CASE
    WHEN h.event_type = 'view' THEN 'Viewed'
    WHEN h.event_type = 'click' THEN 'Clicked'
    WHEN h.event_type = 'purchase' THEN 'Purchased'
  END AS event_desc
FROM hive.default.user_behavior h
LEFT JOIN mysql.shop.products p ON h.product_id = p.product_id
WHERE h.event_date >= '2024-01-01'
ORDER BY h.event_date DESC
LIMIT 1000;

-- Performance tips:
-- 1. Filter the Hive side early with WHERE
-- 2. Select only the columns you need from MySQL
-- 3. Use LIMIT to bound the result set
```


4. PostgreSQL Connector

```properties
# PostgreSQL connector configuration (etc/catalog/postgresql.properties)
connector.name=postgresql
# Include the database name in the JDBC URL
connection-url=jdbc:postgresql://localhost:5432/postgres
connection-user=presto
connection-password=xxxxx
```

```sql
-- Query a PostgreSQL table
SELECT
  id,
  name,
  email,
  created_at
FROM postgresql.analytics.users
WHERE created_at >= CURRENT_DATE - INTERVAL '7' DAY
ORDER BY created_at DESC;

-- Cross-source: PostgreSQL + Hive
SELECT
  p.customer_id,
  p.customer_name,
  h.transaction_count,
  h.total_amount,
  h.avg_amount
FROM postgresql.finance.customers p
INNER JOIN hive.data.transactions h ON p.customer_id = h.customer_id
WHERE h.transaction_date >= '2024-01-01';
```


---

Querying Kafka and Other Message Queues

1. Kafka Connector Configuration

```properties
# Kafka connector configuration (etc/catalog/kafka.properties)
connector.name=kafka
kafka.nodes=broker1:9092,broker2:9092,broker3:9092
kafka.default-schema=default
# Topics to expose as tables
kafka.table-names=user_events,application_logs
# Expose _partition_id, _partition_offset, _timestamp, etc.
kafka.hide-internal-columns=false
```

2. Basic Queries

```sql
-- List the topics exposed as tables
SHOW TABLES FROM kafka.default;

-- Query Kafka data (internal columns carry topic metadata)
SELECT
  _partition_id,
  _partition_offset,
  _key,
  _message,
  _timestamp
FROM kafka.default.user_events
LIMIT 1000;

-- Per-partition statistics
SELECT
  _partition_id,
  COUNT(*) AS message_count,
  MIN(_partition_offset) AS min_offset,
  MAX(_partition_offset) AS max_offset
FROM kafka.default.user_events
GROUP BY _partition_id;

-- Filtering
SELECT
  _key,
  _message
FROM kafka.default.user_events
WHERE _message LIKE '%user_id%'
  AND _timestamp >= from_unixtime(1704067200);
```

3. Streaming JOINs in Practice

```sql
-- Kafka (live user events) + Hive (user profiles)
SELECT
  ke.user_id,
  ke.event_type,
  ke.event_time,
  ke.event_data,
  up.segment,
  up.total_purchases,
  up.avg_order_value,
  up.last_purchase_date
FROM kafka.default.user_events ke
LEFT JOIN hive.default.user_profiles up ON ke.user_id = up.user_id
WHERE ke._timestamp >= now() - INTERVAL '1' HOUR
ORDER BY ke.event_time DESC
LIMIT 500;

-- Near-real-time aggregation
SELECT
  user_id,
  event_type,
  COUNT(*) AS event_count,
  MAX(event_time) AS last_event_time
FROM kafka.default.user_events
WHERE event_date >= CURRENT_DATE
GROUP BY user_id, event_type
HAVING COUNT(*) > 10;

-- Tumbling-window aggregation
-- (Presto has no TUMBLE; derive the 5-minute bucket from the timestamp)
SELECT
  date_trunc('hour', event_time)
    + (minute(event_time) / 5) * INTERVAL '5' MINUTE AS window_start,
  event_type,
  COUNT(*) AS event_count,
  COUNT(DISTINCT user_id) AS unique_users
FROM kafka.default.user_events
WHERE event_date >= CURRENT_DATE
GROUP BY 1, event_type
ORDER BY event_count DESC;
```
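The same bucketing idea can be checked with plain arithmetic: a timestamp's window start is the timestamp floored to a multiple of the window size. A Python sketch over epoch seconds:

```python
def window_start(epoch_seconds, window_seconds=300):
    """Floor a timestamp to the start of its tumbling window."""
    return epoch_seconds - epoch_seconds % window_seconds


def window_counts(timestamps, window_seconds=300):
    """GROUP BY window_start: events per tumbling window."""
    counts = {}
    for ts in timestamps:
        start = window_start(ts, window_seconds)
        counts[start] = counts.get(start, 0) + 1
    return counts


# 10:00:30, 10:03:00, and 10:06:10 fall into two 5-minute windows
stamps = [36030, 36180, 36370]
buckets = window_counts(stamps)
```

Every timestamp in a window maps to the same `window_start`, which is what makes the derived column safe to GROUP BY.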


4. Real-Time Monitoring Scenarios

```sql
-- Real-time error-rate monitoring
SELECT
  _partition_id,
  COUNT(*) AS total_messages,
  COUNT(*) FILTER (WHERE _message LIKE '%error%') AS error_count,
  ROUND(COUNT(*) FILTER (WHERE _message LIKE '%error%') * 100.0 / COUNT(*), 2) AS error_rate_pct
FROM kafka.default.application_logs
WHERE _timestamp >= CURRENT_TIMESTAMP - INTERVAL '10' MINUTE
GROUP BY _partition_id
ORDER BY error_rate_pct DESC;
```
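The `FILTER` clause above computes a conditional count, and the error rate is just that count over the total. The same arithmetic in Python, handy for validating alert thresholds offline (the `"error"` substring match mirrors the `LIKE '%error%'` pattern):

```python
def error_rate_pct(messages, needle="error"):
    """COUNT(*) FILTER (WHERE value LIKE '%error%') * 100.0 / COUNT(*)."""
    if not messages:
        return 0.0  # avoid division by zero on an empty window
    errors = sum(1 for m in messages if needle in m)
    return round(errors * 100.0 / len(messages), 2)


logs = ["ok", "error: timeout", "ok", "disk error"]
rate = error_rate_pct(logs)
```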

```sql
-- Real-time user-behavior statistics
SELECT
  date_format(event_time, '%Y-%m-%d %H:00') AS hour_bucket,
  event_type,
  COUNT(DISTINCT user_id) AS unique_users,
  COUNT(*) AS event_count
FROM kafka.default.user_events
WHERE event_time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
GROUP BY date_format(event_time, '%Y-%m-%d %H:00'), event_type
ORDER BY hour_bucket DESC, event_count DESC;
```

```sql
-- Real-time anomaly detection
WITH user_stats AS (
  SELECT
    user_id,
    COUNT(*) AS event_count,
    MIN(event_time) AS first_event_time,
    MAX(event_time) AS last_event_time
  FROM kafka.default.user_events
  WHERE event_time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
  GROUP BY user_id
)
SELECT
  user_id,
  event_count,
  date_diff('second', first_event_time, last_event_time) AS duration_seconds,
  event_count * 1.0 / NULLIF(date_diff('second', first_event_time, last_event_time), 0) AS events_per_sec
FROM user_stats
WHERE event_count > 100;
```
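The `NULLIF(..., 0)` guard matters: a user whose events all share one timestamp would otherwise trigger a division by zero. The same guard in Python:

```python
def events_per_sec(event_count, first_ts, last_ts):
    """event_count / NULLIF(duration, 0): None when the window has zero length."""
    duration = last_ts - first_ts
    if duration == 0:
        return None  # mirrors SQL NULL from dividing by NULLIF(0, 0)
    return event_count / duration


burst = events_per_sec(120, 0, 60)       # 120 events over 60 s
degenerate = events_per_sec(5, 10, 10)   # all events at the same second
```

Downstream alerting should treat the `None`/NULL case explicitly rather than as a zero rate.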


---

Querying HBase, Redis, and Other NoSQL Stores

1. HBase Connector Configuration

```properties
# HBase connector configuration (hbase.properties)
# Presto ships no built-in HBase connector; the keys below follow a typical
# community connector and vary by implementation.
connector.name=hbase
hbase.zookeeper.quorum=zookeeper1:2181,zookeeper2:2181,zookeeper3:2181
hbase.zookeeper.property.clientPort=2181
hbase.rootdir=hdfs://namenode:9000/hbase
hbase.cluster.distributed=true
```

2. Query Examples

```sql
-- Query an HBase table (column-family qualifiers are mapped to SQL columns,
-- e.g. cf1:column1 -> cf1_column1, per the connector's table mapping)
SELECT
  row_key,
  cf1_column1,
  cf1_column2,
  cf2_column3
FROM hbase.default.users_table
WHERE row_key = 'user_12345';

-- Row-key range scan
SELECT
  row_key,
  data
FROM hbase.default.orders_table
WHERE row_key >= 'order_20240101'
  AND row_key <= 'order_20240131';

-- Aggregation
SELECT
  COUNT(*) AS total_records,
  MAX(amount) AS max_amount,
  MIN(amount) AS min_amount
FROM hbase.default.transactions;

-- Multi-row lookup
SELECT
  row_key,
  user_id,
  event_type,
  event_data,
  timestamp
FROM hbase.default.events
WHERE row_key IN ('event_1', 'event_2', 'event_3');
```


3. Joining HBase with Hive

```sql
-- HBase (live data) + Hive (history)
SELECT
  h.row_key,
  h.user_id,
  h.event_type,
  h.timestamp,
  hist.total_history_events,
  hist.total_amount,
  hist.first_event_date,
  hist.last_event_date
FROM hbase.default.realtime_events h
LEFT JOIN hive.default.user_history hist ON h.user_id = hist.user_id
WHERE h.timestamp >= now() - INTERVAL '1' DAY
ORDER BY h.timestamp DESC;

-- HBase as a standalone source
SELECT
  device_id,
  device_type,
  COUNT(*) AS ping_count
FROM hbase.default.device_status
WHERE status = 'active'
GROUP BY device_id, device_type
ORDER BY ping_count DESC;
```


4. Redis Connector Configuration

```properties
# Redis connector configuration (etc/catalog/redis.properties)
connector.name=redis
redis.nodes=redis1:6379,redis2:6379
redis.default-schema=default
# Keys are exposed as tables via table description files
```


5. Redis Query Examples

```sql
-- Query a Redis hash (fields are mapped via the table description file)
SELECT
  key,
  hash_field,
  hash_value
FROM redis.default.user_sessions
WHERE key = 'user_12345';

-- Query Redis string values
SELECT
  key,
  value
FROM redis.default.cache
WHERE key LIKE 'product_%';

-- Query a Redis set
SELECT
  key,
  member
FROM redis.default.user_tags
WHERE key = 'user_12345';

-- Count keys matching a pattern
SELECT
  COUNT(*) AS key_count
FROM redis.default.user_sessions
WHERE key LIKE 'session:%';

-- Time-window aggregation (assumes an event_time field decoded from the value)
SELECT
  date_trunc('hour', event_time) AS hour_bucket,
  COUNT(*) AS event_count
FROM redis.default.user_events
WHERE event_time >= now() - INTERVAL '1' HOUR
GROUP BY date_trunc('hour', event_time)
ORDER BY hour_bucket DESC;
```


6. Joining Redis with MySQL

```sql
-- Cached data vs. database data
SELECT
  r.key,
  r.value AS cache_value,
  m.user_id,
  m.user_name,
  m.email,
  CASE
    WHEN r.value = m.email THEN 'cache_hit'
    ELSE 'cache_miss'
  END AS cache_status
FROM redis.default.email_cache r
LEFT JOIN mysql.default.users m ON r.key = CAST(m.user_id AS VARCHAR);
```


---

Query Optimization and Performance Tuning

1. Optimization Principles

```sql
-- ❌ Before: full scan with SELECT * and a non-partition predicate
SELECT * FROM hive.default.large_table
WHERE timestamp > '2024-01-01';

-- ✅ After: partition filtering, explicit columns, bounded output
SELECT
  user_id,
  event_type,
  COUNT(*) AS event_count
FROM hive.default.large_table
WHERE event_date >= '2024-01-01'
GROUP BY user_id, event_type
ORDER BY event_count DESC
LIMIT 1000;

-- ❌ Before: JOIN with no pre-filtering
SELECT * FROM mysql.shop.small_table a
JOIN hive.default.large_table b ON a.id = b.id;

-- ✅ After: filter both sides before the JOIN
SELECT *
FROM (
  SELECT id, name, value
  FROM mysql.shop.small_table
  WHERE status = 'active'
) a
JOIN hive.default.large_table b ON a.id = b.id
WHERE b.timestamp >= '2024-01-01';
```

2. Analyzing Execution Plans

```sql
-- Show the plan
EXPLAIN SELECT
  u.user_id,
  COUNT(e.event_id) AS event_count
FROM mysql.shop.users u
JOIN hive.default.events e ON u.user_id = e.user_id
WHERE e.event_date >= '2024-01-01'
GROUP BY u.user_id;

-- Simplified output sketch:
-- Query
-- ├─ Exchange
-- │   └─ Aggregate (GROUP BY user_id)
-- │       └─ Join
-- │           ├─ TableScan: hive.default.events
-- │           └─ TableScan: mysql.shop.users
-- └─ Output
```

3. Tuning Parameters

```properties
# Presto tuning (etc/config.properties; exact key names vary across
# Presto/Trino versions, so check the docs for your release)

# Memory (the JVM heap itself, e.g. -Xmx32g, is set in etc/jvm.config)
query.max-memory-per-node=16GB
query.max-total-memory-per-node=32GB
query.max-memory=128GB

# Worker parallelism
task.max-worker-threads=50

# Optimizer
optimizer.join-reordering-enabled=true
join-distribution-type=AUTOMATIC

# Concurrency and queueing are configured via resource groups
# (etc/resource-groups.properties)
```

4. Performance Comparison

Scenario: cross-source query, Hive (100GB) + MySQL (1GB)

Before optimization:

  • Query time: 180 s
  • Data scanned: 102GB
  • Memory used: 45GB

After optimization:

  • Query time: 25 s (7.2x faster)
  • Data scanned: 12GB (88% less)
  • Memory used: 8GB (82% less)
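The headline figures are plain ratios, and recomputing them is a good habit when you benchmark your own cluster:

```python
def speedup(before_s, after_s):
    """How many times faster the optimized run is."""
    return round(before_s / after_s, 1)


def pct_reduction(before, after):
    """Percentage reduction from `before` to `after`, rounded to whole percent."""
    return round((before - after) * 100.0 / before)


# The numbers from the comparison above
time_gain = speedup(180, 25)        # 7.2x
scan_cut = pct_reduction(102, 12)   # 88%
mem_cut = pct_reduction(45, 8)      # 82%
```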

Optimization measures:

  1. Partition filtering on the Hive side
  2. Early WHERE filtering on the MySQL side
  3. Columnar reads enabled
  4. Parallelism tuned
---

Security and Access Control

1. Authentication

```properties
# LDAP authentication (bind pattern shown as an example)
http-server.authentication.type=LDAP
authentication.ldap.url=ldaps://ldap-server:636
authentication.ldap.user-bind-pattern=uid=${USER},ou=people,dc=example,dc=com

# Or Kerberos
# http-server.authentication.type=KERBEROS
# http.server.authentication.krb5.keytab=/etc/security/kdc.keytab

# Or JWT
# http-server.authentication.type=JWT
```
    

2. Authorization

```sql
-- Show current grants
SHOW GRANTS;

-- Create roles first
CREATE ROLE analyst_role;
CREATE ROLE admin_role;
CREATE ROLE viewer_role;

-- Table-level grants
GRANT SELECT ON hive.default.user_events TO ROLE analyst_role;
GRANT SELECT, INSERT ON mysql.shop.orders TO ROLE analyst_role;

-- Schema-level grants (support depends on the connector)
GRANT ALL PRIVILEGES ON SCHEMA hive.default TO ROLE admin_role;
GRANT SELECT ON SCHEMA mysql.shop TO ROLE viewer_role;

-- Assign roles to users
GRANT analyst_role TO USER user_analyst;
GRANT admin_role TO USER user_admin;
GRANT viewer_role TO USER user_viewer;
```
    

3. Data Masking

```sql
-- Mask data through a view (note: the raw column is excluded from the view)
CREATE VIEW hive.default.user_events_masked AS
SELECT
  CASE
    WHEN length(user_id) > 6
      THEN concat(substr(user_id, 1, 3), '***', substr(user_id, length(user_id) - 2))
    ELSE user_id
  END AS user_id_masked,
  event_type,
  event_data
FROM hive.default.user_events;

-- Mask with functions
-- (assumes custom UDFs mask_email / mask_phone are registered)
SELECT
  user_id,
  mask_email(email),
  mask_phone(phone)
FROM mysql.default.users;
```
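The masking rule in the view is worth unit-testing before exposing it. The same logic in Python (the SQL view stays the source of truth; this only mirrors its CASE expression):

```python
def mask_user_id(user_id):
    """Keep the first and last 3 chars, mask the middle; short ids pass through."""
    if len(user_id) > 6:
        return user_id[:3] + "***" + user_id[-3:]
    return user_id


masked = mask_user_id("user_12345")   # "use***345"
passthrough = mask_user_id("abc123")  # unchanged: length is not > 6
```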

    
    

4. Audit Logging

```sql
-- Inspect recent queries (system.runtime.queries; column set varies by version)
SELECT
  query_id,
  user,
  state,
  query,
  created
FROM system.runtime.queries
ORDER BY created DESC
LIMIT 100;
```
    
---
    
    

Best-Practice Summary

1. Design Principles

✅ Query design:

  • Drive large tables with small tables
  • Filter with WHERE before selecting
  • Avoid SELECT *
  • Use partitioned tables
  • Bound the result size

✅ Performance:

  • Enable connection pooling
  • Tune parallelism sensibly
  • Use materialized views
  • Collect table statistics regularly
  • Monitor slow queries

✅ Security:

  • Principle of least privilege
  • Data masking
  • Audit logging
  • Encryption in transit

2. Troubleshooting Common Issues

```sql
-- Long-running queries (system.runtime.queries; column set varies by version)
SELECT
  query_id,
  user,
  query,
  state
FROM system.runtime.queries
WHERE state = 'RUNNING'
ORDER BY created
LIMIT 20;

-- Cluster nodes and their state
SELECT * FROM system.runtime.nodes;

-- Catalog, schema, and table discovery
SHOW CATALOGS;
SHOW SCHEMAS FROM hive;
SHOW TABLES FROM mysql.shop;
```
    

3. Deployment Recommendations

```yaml
# Suggested production sizing (illustrative)
coordinators:
  replicas: 3
  memory: 32GB
  cpu: 8 cores

workers:
  count: 5-10
  memory: 64GB
  cpu: 16 cores

network:
  query-network: 10Gbps
  internal-communication: 10Gbps

storage:
  hive-metastore: standalone deployment
  hdfs: highly available
```

*Last updated: 2026-04-28*
*Author: creator | Applies to Presto 3.x / Trino 3.x*
