diff --git a/deployments/observability/README.md b/deployments/observability/README.md
new file mode 100644
index 0000000..4d9602b
--- /dev/null
+++ b/deployments/observability/README.md
@@ -0,0 +1,174 @@
+# chat-deploy Observability Deployment Guide
+
+This directory contains the configuration files needed to connect chat-deploy to the itom-platform observability center.
+
+## Features
+
+- **Log collection**: Promtail collects Docker container logs and pushes them to the itom-platform Loki
+- **MongoDB monitoring**: MongoDB Exporter collects MongoDB metrics
+- **Redis monitoring** (optional): Redis Exporter collects Redis metrics
+- **System monitoring** (optional): Node Exporter collects system-level metrics (CPU/Memory/Disk/Network)
+
+## Quick Start
+
+### 1. Configure environment variables
+
+Copy the example configuration file and edit it:
+
+```bash
+cp config.env.example config.env
+```
+
+**Settings that must be changed**:
+
+| Setting | Description | Example |
+|--------|------|------|
+| `OBS_HOST` | Address of the itom-platform observability center | `192.168.1.100` or `obs.example.com` |
+| `OBS_AUTH_TOKEN` | Auth token (must match the center side) | `your-secret-token` |
+| `MONGODB_URI` | MongoDB connection URI | `mongodb://user:pass@host:27017/db` |
+
+### 2. Start log collection
+
+Enable log collection only:
+
+```bash
+docker compose --env-file config.env -f docker-compose-observability.yaml up -d
+```
+
+### 3. Start full observability (logs + metrics)
+
+Enable both log and metrics collection:
+
+```bash
+docker compose --env-file config.env -f docker-compose-observability.yaml --profile metrics up -d
+```
+
+### 4. Verify service status
+
+```bash
+# Check service status
+docker compose --env-file config.env -f docker-compose-observability.yaml ps
+
+# View Promtail logs
+docker logs chat-deploy-promtail
+
+# View MongoDB Exporter metrics
+curl http://localhost:9216/metrics | head -50
+```
+
+## Configuration
+
+### MongoDB configuration
+
+MongoDB Exporter needs a correct connection URI. Based on the settings in `config/mongodb.yml`:
+
+```env
+# Default configuration
+MONGODB_URI=mongodb://openIM:openIM123@localhost:37017/openim_v3?authSource=openim_v3
+```
+
+If MongoDB runs in a Docker container, use the container name or the host IP:
+
+```env
+# Use the container name (requires the same network)
+MONGODB_URI=mongodb://openIM:openIM123@mongo:27017/openim_v3?authSource=openim_v3
+
+# Use the host IP
+MONGODB_URI=mongodb://openIM:openIM123@host.docker.internal:37017/openim_v3?authSource=openim_v3
+```
+
+### Redis configuration (optional)
+
+To monitor Redis:
+
+```env
+REDIS_ADDR=localhost:6379
+REDIS_PASSWORD=your_redis_password
+```
+
+### Network configuration
+
+If the chat-deploy services run in a different Docker network, the observability components must join that network.
+
+Edit `docker-compose-observability.yaml` and add the external network:
+
+```yaml
+networks:
+  chat-deploy-obs:
+    driver: bridge
+  # Add the external network
+  chat-deploy-network:
+    external: true
+
+services:
+  mongodb-exporter:
+    networks:
+      - chat-deploy-obs
+      - chat-deploy-network  # Join the chat-deploy network
+```
+
+## Dashboards
+
+In the itom-platform Grafana, use `chat-deploy-dashboard` to view:
+
+- Service health
+- Go runtime metrics (CPU, memory, GC)
+- MongoDB operation statistics (query/insert/update/delete)
+- MongoDB connection count
+- Redis metrics (if configured)
+- Application logs
+
+## Troubleshooting
+
+### Logs are not collected
+
+1. Check that Promtail is running. The compose file sets Promtail's logging driver to `none`, so `docker logs` has nothing to read until that setting is removed:
+   ```bash
+   docker logs chat-deploy-promtail
+   ```
+
+2. Confirm that LOKI_URL is configured correctly:
+   ```bash
+   echo $LOKI_URL
+   # Expected output is similar to: http://192.168.1.100/loki/api/v1/push
+   ```
+
+3. Test network connectivity:
+   ```bash
+   curl -v http:///loki/api/v1/push
+   ```
+
+### MongoDB metrics are missing
+
+1. Check that MongoDB Exporter is running:
+   ```bash
+   docker logs chat-deploy-mongodb-exporter
+   ```
+
+2. Verify the MongoDB connection:
+   ```bash
+   curl http://localhost:9216/metrics | grep mongodb_up
+   # Expected output: mongodb_up 1
+   ```
+
+3. Confirm that the MongoDB URI is well-formed and the network is reachable
+
+### Metrics are not pushed to the center
+
+1. Check the Prometheus Agent logs:
+   ```bash
+   docker logs chat-deploy-prometheus-agent
+   ```
+
+2. Confirm that the remote_write URL is configured correctly
+
+3. Check that the network firewall allows access to the center's port
+
+## Updating the configuration
+
+After changing `config.env`, restart the services:
+
+```bash
+docker compose --env-file config.env -f docker-compose-observability.yaml down
+docker compose --env-file config.env -f docker-compose-observability.yaml --profile metrics up -d
+```
diff --git a/deployments/observability/config.env.example b/deployments/observability/config.env.example
new file mode 100644
index 0000000..1e523ab
--- /dev/null
+++ b/deployments/observability/config.env.example
@@ -0,0 +1,71 @@
+# ==============================
+# chat-deploy observability configuration
+# ==============================
+
+# Project identifiers (must match the itom-platform configuration)
+OBS_PROJECT=chat-deploy
+OBS_SERVICE=chat-deploy
+OBS_SERVICE_NAME=chat-deploy
+OBS_ENV=prod
+
+# ==============================
+# Center-side connection settings (must be changed)
+# ==============================
+# Address of the itom-platform observability center
+OBS_HOST=CHANGE_ME
+OBS_SCHEME=http
+
+# Auth token (must match the itom-platform center side)
+OBS_AUTH_TOKEN=CHANGE_ME
+
+# ==============================
+# Log collection settings (Promtail)
+# ==============================
+# Loki push URL (through the gateway)
+LOKI_URL=${OBS_SCHEME}://${OBS_HOST}/loki/api/v1/push
+
+# Promtail image version
+PROMTAIL_IMAGE=grafana/promtail:3.0.0
+
+# Docker API version (avoids protocol incompatibility with older Docker daemons)
+DOCKER_API_VERSION=1.44
+
+# ==============================
+# Metrics collection settings (Prometheus Agent)
+# ==============================
+# Center-side remote_write receiver address
+METRICS_REMOTE_WRITE_URL=${OBS_SCHEME}://${OBS_HOST}/api/v1/write
+
+# Whether the center side requires remote_write authentication
+OBS_AUTH_ENABLE=false
+
+# Scrape targets for business service metrics (optional, format: name=host:port,name2=host:port)
+METRICS_TARGETS=
+
+# ==============================
+# MongoDB settings (must be changed)
+# ==============================
+# MongoDB connection URI (used by mongodb-exporter)
+# Format: mongodb://username:password@host:port/database?authSource=admin
+MONGODB_URI=mongodb://openIM:openIM123@localhost:37017/openim_v3?authSource=openim_v3
+
+# MongoDB Exporter scrape targets (configured automatically, usually no change needed)
+MONGODB_EXPORTER_TARGETS=mongodb-exporter:9216
+MONGODB_EXPORTER_SERVICE=chat-deploy
+
+#
==============================
+# Redis settings (optional)
+# ==============================
+# Redis address (used by redis-exporter)
+REDIS_ADDR=localhost:6379
+REDIS_PASSWORD=
+
+# Redis Exporter scrape targets
+REDIS_EXPORTER_TARGETS=redis-exporter:9121
+REDIS_EXPORTER_SERVICE=chat-deploy
+
+# ==============================
+# Node Exporter settings (optional)
+# ==============================
+NODE_EXPORTER_TARGETS=node-exporter:9100
+NODE_EXPORTER_SERVICE=chat-deploy
diff --git a/deployments/observability/config/prometheus-agent-entrypoint.sh b/deployments/observability/config/prometheus-agent-entrypoint.sh
new file mode 100644
index 0000000..289482e
--- /dev/null
+++ b/deployments/observability/config/prometheus-agent-entrypoint.sh
@@ -0,0 +1,236 @@
+#!/usr/bin/env sh
+set -eu
+
+# ------------------------------
+# chat-deploy metrics collection: Prometheus Agent entrypoint
+# - Generates /prometheus/prometheus.yml and /prometheus/targets.json from environment variables
+# - Then starts Prometheus in agent mode (remote_write pushes to the itom-platform center)
+#
+# Key environment variables (from config.env):
+# - METRICS_REMOTE_WRITE_URL=http(s):///api/v1/write
+# - METRICS_TARGETS=name=host:port,name2=host:port
+# - OBS_AUTH_ENABLE=false/true (whether the center side requires authentication)
+# - OBS_AUTH_TOKEN=xxxxx (required when OBS_AUTH_ENABLE=true)
+# - OBS_PROJECT/OBS_ENV: written into labels so the center side can filter on them
+# ------------------------------
+
+METRICS_REMOTE_WRITE_URL="${METRICS_REMOTE_WRITE_URL:-}"
+OBS_AUTH_ENABLE="${OBS_AUTH_ENABLE:-false}"
+OBS_AUTH_TOKEN="${OBS_AUTH_TOKEN:-}"
+METRICS_TARGETS="${METRICS_TARGETS:-}"
+
+OBS_PROJECT="${OBS_PROJECT:-chat-deploy}"
+OBS_ENV="${OBS_ENV:-prod}"
+OBS_SERVICE="${OBS_SERVICE:-chat-deploy}"
+OBS_SERVICE_NAME="${OBS_SERVICE_NAME:-$OBS_SERVICE}"
+if [ -z "$OBS_SERVICE_NAME" ]; then
+  OBS_SERVICE_NAME="chat-deploy"
+fi
+
+is_truthy() { case "$1" in 1|true|TRUE|yes|YES|on|ON) return 0 ;; esac; return 1; }
+
+if [ -z "$METRICS_REMOTE_WRITE_URL" ]; then
+  echo "[prometheus-agent] FAIL: METRICS_REMOTE_WRITE_URL is empty" >&2
+  exit 2
+fi
+
+if is_truthy "$OBS_AUTH_ENABLE" && [ -z "$OBS_AUTH_TOKEN" ]; then
+  echo "[prometheus-agent] FAIL: OBS_AUTH_ENABLE=true but OBS_AUTH_TOKEN is empty" >&2
+  exit 2
+fi
+
+# Make sure the data directory exists and is writable
+mkdir -p /prometheus/data
+chmod -R 777 /prometheus 2>/dev/null || true
+
+TARGETS_JSON="/prometheus/targets.json"
+echo "[" > "$TARGETS_JSON"
+
+first=1
+IFS=','
+for item in $METRICS_TARGETS; do
+  item="$(printf "%s" "$item" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')"
+  [ -z "$item" ] && continue
+
+  name="$(printf "%s" "$item" | cut -d= -f1 | tr -d '[:space:]')"
+  target="$(printf "%s" "$item" | cut -d= -f2- | tr -d '[:space:]')"
+  [ -z "$name" ] && continue
+  [ -z "$target" ] && continue
+
+  target="$(printf "%s" "$target" | sed -E 's#^https?://##; s#/.*$##')"
+
+  if [ "$first" -eq 1 ]; then first=0; else echo "," >> "$TARGETS_JSON"; fi
+  cat >> "$TARGETS_JSON" <> "$TARGETS_JSON"
+
+if [ "$first" -eq 1 ]; then
+  echo "[prometheus-agent] WARN: METRICS_TARGETS is empty or malformed; only the prometheus-agent's own metrics and Exporter metrics will be scraped" >&2
+fi
+
+CONFIG="/prometheus/prometheus.yml"
+cat > "$CONFIG" <> "$CONFIG"
+echo "  # Auto-discovered / manually specified Exporter metrics" >> "$CONFIG"
+
+trim() {
+  printf "%s" "$1" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//'
+}
+
+normalize_target() {
+  raw="$1"
+  default_port="$2"
+  raw="$(printf "%s" "$raw" | sed -E 's#^https?://##; s#/.*$##')"
+  raw="$(trim "$raw")"
+  [ -z "$raw" ] && return 1
+  case "$raw" in
+    *:*) printf "%s" "$raw" ;;
+    *) printf "%s:%s" "$raw" "$default_port" ;;
+  esac
+}
+
+append_exporter_job() {
+  job="$1"
+  targets="$2"
+  default_port="$3"
+  service_label="$4"
+  targets_list=""
+
+  IFS=','
+  for item in $targets; do
+    item="$(normalize_target "$item" "$default_port" || true)"
+    [ -z "$item" ] && continue
+    if [ -z "$targets_list" ]; then
+      targets_list="$item"
+    else
+      targets_list="${targets_list},$item"
+    fi
+  done
+  unset IFS
+
+  if [ -z "$targets_list" ]; then
+    return 0
+  fi
+
+  echo "  - job_name: '$job'" >> "$CONFIG"
+  echo "    static_configs:" >> "$CONFIG"
+  echo "      - targets:" >> "$CONFIG"
+  IFS=','
+  for item in $targets_list; do
+    echo "          - '$item'" >> "$CONFIG"
+  done
+  unset IFS
+  cat >> "$CONFIG" </dev/null 2>&1; then
+  REDIS_EXPORTER_HOST="redis-exporter"
+elif getent hosts chat-deploy-redis-exporter >/dev/null 2>&1; then
+  REDIS_EXPORTER_HOST="chat-deploy-redis-exporter"
+fi
+if [ -n "$REDIS_EXPORTER_HOST" ]; then
+  if [ -n "$REDIS_EXPORTER_TARGETS" ]; then
+    REDIS_EXPORTER_TARGETS="${REDIS_EXPORTER_TARGETS},${REDIS_EXPORTER_HOST}:9121"
+  else
+    REDIS_EXPORTER_TARGETS="${REDIS_EXPORTER_HOST}:9121"
+  fi
+fi
+append_exporter_job "redis" "$REDIS_EXPORTER_TARGETS" "9121" "$REDIS_EXPORTER_SERVICE"
+if [ -n "$REDIS_EXPORTER_TARGETS" ]; then
+  echo "[prometheus-agent] Redis Exporter scrape targets configured (project=${OBS_PROJECT} service=${REDIS_EXPORTER_SERVICE})"
+fi
+
+# MongoDB Exporter (port 9216) - chat-deploy uses MongoDB
+MONGODB_EXPORTER_HOST=""
+MONGODB_EXPORTER_TARGETS="${MONGODB_EXPORTER_TARGETS:-}"
+MONGODB_EXPORTER_SERVICE="${MONGODB_EXPORTER_SERVICE:-$OBS_SERVICE_NAME}"
+if getent hosts mongodb-exporter >/dev/null 2>&1; then
+  MONGODB_EXPORTER_HOST="mongodb-exporter"
+elif getent hosts chat-deploy-mongodb-exporter >/dev/null 2>&1; then
+  MONGODB_EXPORTER_HOST="chat-deploy-mongodb-exporter"
+fi
+if [ -n "$MONGODB_EXPORTER_HOST" ]; then
+  if [ -n "$MONGODB_EXPORTER_TARGETS" ]; then
+    MONGODB_EXPORTER_TARGETS="${MONGODB_EXPORTER_TARGETS},${MONGODB_EXPORTER_HOST}:9216"
+  else
+    MONGODB_EXPORTER_TARGETS="${MONGODB_EXPORTER_HOST}:9216"
+  fi
+fi
+append_exporter_job "mongodb" "$MONGODB_EXPORTER_TARGETS" "9216" "$MONGODB_EXPORTER_SERVICE"
+if [ -n "$MONGODB_EXPORTER_TARGETS" ]; then
+  echo "[prometheus-agent] MongoDB Exporter scrape targets configured (project=${OBS_PROJECT} service=${MONGODB_EXPORTER_SERVICE})"
+fi
+
+# Node Exporter (port 9100) - system-level metrics (CPU/Memory/Disk/Network/IO)
+NODE_EXPORTER_HOST=""
+NODE_EXPORTER_TARGETS="${NODE_EXPORTER_TARGETS:-}"
+NODE_EXPORTER_SERVICE="${NODE_EXPORTER_SERVICE:-$OBS_SERVICE_NAME}"
+if getent hosts node-exporter >/dev/null 2>&1; then
+  NODE_EXPORTER_HOST="node-exporter"
+elif getent hosts chat-deploy-node-exporter >/dev/null 2>&1; then
+  NODE_EXPORTER_HOST="chat-deploy-node-exporter"
+fi
+if [ -n "$NODE_EXPORTER_HOST" ]; then
+  if [ -n "$NODE_EXPORTER_TARGETS" ]; then
+    NODE_EXPORTER_TARGETS="${NODE_EXPORTER_TARGETS},${NODE_EXPORTER_HOST}:9100"
+  else
+    NODE_EXPORTER_TARGETS="${NODE_EXPORTER_HOST}:9100"
+  fi
+fi
+append_exporter_job "node" "$NODE_EXPORTER_TARGETS" "9100" "$NODE_EXPORTER_SERVICE"
+if [ -n "$NODE_EXPORTER_TARGETS" ]; then
+  echo "[prometheus-agent] Node Exporter scrape targets configured (project=${OBS_PROJECT} service=${NODE_EXPORTER_SERVICE})"
+fi
+
+cat >> "$CONFIG" <> "$CONFIG"
+  echo "[prometheus-agent] remote_write authentication is enabled" >&2
+else
+  echo "[prometheus-agent] remote_write authentication is disabled" >&2
+fi
+
+echo "[prometheus-agent] Generated configuration file:"
+cat "$CONFIG"
+echo ""
+
+# Prometheus 3.x no longer needs --enable-feature=agent
+exec /bin/prometheus --config.file=/prometheus/prometheus.yml --storage.tsdb.path=/prometheus/data --web.enable-lifecycle
diff --git a/deployments/observability/config/promtail.yaml b/deployments/observability/config/promtail.yaml
new file mode 100644
index 0000000..d386ac5
--- /dev/null
+++ b/deployments/observability/config/promtail.yaml
@@ -0,0 +1,61 @@
+server:
+  http_listen_port: 9080
+  grpc_listen_port: 0
+
+positions:
+  filename: /tmp/positions.yaml
+
+clients:
+  - url: ${LOKI_URL}
+    bearer_token: ${OBS_AUTH_TOKEN}
+
+scrape_configs:
+  # ============================================
+  # chat-deploy business-layer log collection
+  # ============================================
+  - job_name: chat-deploy-logs
+    static_configs:
+      - targets:
+          - localhost
+        labels:
+          job: chat-deploy-logs
+          project: ${OBS_PROJECT}
+          service: ${OBS_SERVICE}
+          log_layer: business
+          __path__: /var/lib/docker/containers/*/*-json.log
+
+    pipeline_stages:
+      # Parse the Docker JSON log format
+      - docker: {}
+
+      # Extract the container ID from the file path
+      - regex:
+          source: filename
+          expression: '/var/lib/docker/containers/(?P[0-9a-f]{12})[0-9a-f]*/'
+      - labels:
+          container_id:
+
+      # ============================================
+      #
Filter rules: drop only infrastructure-layer logs
+      # Keep chat-deploy business-layer logs (including caller logs from Go services)
+      # ============================================
+
+      # 1. Drop internal logs of infrastructure components such as promtail/prometheus/grafana
+      - drop:
+          expression: 'caller=.*(promtail|prometheus|grafana|loki|exporter).*\.go:[0-9]+'
+          drop_counter_reason: infra_go_internal_log
+
+      # 2. Drop infrastructure component logs (prometheus/grafana/exporter, etc.)
+      - drop:
+          expression: '(component=|target=|scrape_pool=|instance=.*exporter)'
+          drop_counter_reason: infrastructure_component
+
+      # 3. Drop Docker service discovery and API error logs
+      - drop:
+          expression: '(docker_discovery|Unable to refresh target groups|client version.*is too old|Minimum supported API version)'
+          drop_counter_reason: docker_api_error
+
+      # 4. Drop file watcher event logs (promtail internal)
+      - drop:
+          expression: '(file watcher event|filetargetmanager|fsnotify)'
+          drop_counter_reason: file_watcher
diff --git a/deployments/observability/docker-compose-observability.yaml b/deployments/observability/docker-compose-observability.yaml
new file mode 100644
index 0000000..7db0c26
--- /dev/null
+++ b/deployments/observability/docker-compose-observability.yaml
@@ -0,0 +1,135 @@
+# chat-deploy observability component deployment configuration
+# Pushes logs and metrics to the itom-platform observability center
+#
+# Usage:
+#   1. Copy config.env.example to config.env and edit the settings
+#   2. Start the services: docker compose --env-file config.env -f docker-compose-observability.yaml up -d
+#   3. To enable metrics collection: docker compose --env-file config.env -f docker-compose-observability.yaml --profile metrics up -d
+
+services:
+  # ==============================
+  # Log collection (Promtail)
+  # Collects Docker container logs and pushes them to the itom-platform Loki
+  # ==============================
+  promtail:
+    image: "${PROMTAIL_IMAGE:-grafana/promtail:3.0.0}"
+    container_name: chat-deploy-promtail
+    restart: always
+    user: "0"
+    command: ["-config.file=/etc/promtail/config.yml", "-config.expand-env=true"]
+    # Disable Docker logging for promtail itself to avoid a log loop
+    logging:
+      driver: "none"
+    environment:
+      - LOKI_URL=${LOKI_URL}
+      - OBS_AUTH_TOKEN=${OBS_AUTH_TOKEN}
+      - OBS_PROJECT=${OBS_PROJECT}
+      - OBS_SERVICE=${OBS_SERVICE}
+      - DOCKER_API_VERSION=${DOCKER_API_VERSION:-1.44}
+    volumes:
+      - /var/lib/docker/containers:/var/lib/docker/containers:ro
+      - /var/run/docker.sock:/var/run/docker.sock:ro
+      - ./config/promtail.yaml:/etc/promtail/config.yml:ro
+    networks:
+      - chat-deploy-obs
+
+  # ==============================
+  # Metrics collection (Prometheus Agent remote write)
+  # Optional, enabled with --profile metrics
+  # ==============================
+  prometheus-agent:
+    profiles: ["metrics"]
+    image: prom/prometheus:latest
+    container_name: chat-deploy-prometheus-agent
+    restart: always
+    user: "0"
+    command:
+      - "--config.file=/prometheus/prometheus.yml"
+      - "--storage.tsdb.path=/prometheus"
+    environment:
+      - METRICS_REMOTE_WRITE_URL=${METRICS_REMOTE_WRITE_URL}
+      - METRICS_TARGETS=${METRICS_TARGETS}
+      - OBS_AUTH_ENABLE=${OBS_AUTH_ENABLE:-false}
+      - OBS_AUTH_TOKEN=${OBS_AUTH_TOKEN}
+      - OBS_PROJECT=${OBS_PROJECT}
+      - OBS_SERVICE=${OBS_SERVICE}
+      - OBS_SERVICE_NAME=${OBS_SERVICE_NAME}
+      - OBS_ENV=${OBS_ENV:-prod}
+      - REDIS_EXPORTER_TARGETS=${REDIS_EXPORTER_TARGETS}
+      - MONGODB_EXPORTER_TARGETS=${MONGODB_EXPORTER_TARGETS}
+      - NODE_EXPORTER_TARGETS=${NODE_EXPORTER_TARGETS}
+      - REDIS_EXPORTER_SERVICE=${REDIS_EXPORTER_SERVICE}
+      - MONGODB_EXPORTER_SERVICE=${MONGODB_EXPORTER_SERVICE}
+      - NODE_EXPORTER_SERVICE=${NODE_EXPORTER_SERVICE}
+    volumes:
+      - prometheus_agent_data:/prometheus
+      - ./config/prometheus-agent-entrypoint.sh:/etc/prometheus/entrypoint.sh:ro
+    entrypoint: ["/bin/sh", "/etc/prometheus/entrypoint.sh"]
+    networks:
+      - chat-deploy-obs
+    depends_on:
+      - mongodb-exporter
+
+  # ==============================
+  # MongoDB Exporter
+  # Collects MongoDB metrics, enabled with --profile metrics
+  # ==============================
+  mongodb-exporter:
+    profiles: ["metrics"]
+    image: percona/mongodb_exporter:0.40.0
+    container_name: chat-deploy-mongodb-exporter
+    restart: always
+    command:
+      - "--mongodb.uri=${MONGODB_URI}"
+      - "--compatible-mode"
+      - "--collect-all"
+    environment:
+      - MONGODB_URI=${MONGODB_URI}
+    ports:
+      - "9216:9216"
+    networks:
+      - chat-deploy-obs
+
+  # ==============================
+  # Redis Exporter (optional)
+  # Collects Redis metrics, enabled with --profile metrics
+  # ==============================
+  redis-exporter:
+    profiles: ["metrics"]
+    image: oliver006/redis_exporter:latest
+    container_name: chat-deploy-redis-exporter
+    restart: always
+    environment:
+      - REDIS_ADDR=${REDIS_ADDR}
+      - REDIS_PASSWORD=${REDIS_PASSWORD}
+    ports:
+      - "9121:9121"
+    networks:
+      - chat-deploy-obs
+
+  # ==============================
+  # Node Exporter (optional)
+  # Collects system-level metrics (CPU/Memory/Disk/Network)
+  # Enabled with --profile metrics
+  # ==============================
+  node-exporter:
+    profiles: ["metrics"]
+    image: prom/node-exporter:latest
+    container_name: chat-deploy-node-exporter
+    restart: always
+    command:
+      - "--path.rootfs=/host"
+      - "--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)"
+    volumes:
+      - /:/host:ro,rslave
+    ports:
+      - "9100:9100"
+    networks:
+      - chat-deploy-obs
+
+networks:
+  chat-deploy-obs:
+    driver: bridge
+
+volumes:
+  prometheus_agent_data: {}