feat: 添加日志文件输出功能和心跳故障排查工具
- 新增日志文件输出功能,支持配置日志文件路径和级别 - 添加心跳故障排查脚本 check-heartbeat.sh - 支持通过环境变量 LOG_FILE 设置日志文件路径 - 日志自动创建目录,支持相对路径和绝对路径 - 优化日志初始化逻辑,支持直接写入文件 - 改进配置加载,支持日志配置项 - 完善文档,添加故障排查章节和日志功能说明 - 更新版本号至 v1.1.0
This commit is contained in:
131
README.md
131
README.md
@@ -13,6 +13,8 @@ LinkMaster 节点服务,用于执行网络测试任务。
|
||||
- FindPing IP段批量ping检测
|
||||
- 持续 Ping/TCPing 测试
|
||||
- 心跳上报
|
||||
- 日志文件输出(支持配置日志文件路径和级别)
|
||||
- 心跳故障排查工具
|
||||
|
||||
## 安装
|
||||
|
||||
@@ -87,6 +89,7 @@ BACKEND_URL=http://your-backend-server:8080 ./run.sh start
|
||||
|
||||
- `BACKEND_URL`: 后端服务地址(必需,默认: http://localhost:8080)
|
||||
- `CONFIG_PATH`: 配置文件路径(可选,默认: config.yaml)
|
||||
- `LOG_FILE`: 日志文件路径(可选,默认: node.log)
|
||||
|
||||
### 配置文件(可选)
|
||||
|
||||
@@ -99,9 +102,18 @@ backend:
|
||||
url: http://your-backend-server:8080
|
||||
heartbeat:
|
||||
interval: 60
|
||||
log:
|
||||
file: node.log # 日志文件路径(默认: node.log,空则输出到标准错误)
|
||||
level: info # 日志级别: debug, info, warn, error(默认: info)
|
||||
debug: false
|
||||
```
|
||||
|
||||
**日志配置说明:**
|
||||
- `log.file`: 日志文件路径。如果为空,日志将输出到标准错误(stderr)
|
||||
- `log.level`: 日志级别,支持 `debug`、`info`、`warn`、`error`
|
||||
- 也可以通过环境变量 `LOG_FILE` 设置日志文件路径
|
||||
- 日志文件会自动创建,如果目录不存在会自动创建
|
||||
|
||||
## 运行脚本
|
||||
|
||||
使用 `run.sh` 脚本管理节点端。**每次启动时会自动拉取最新代码并重新编译**:
|
||||
@@ -464,3 +476,122 @@ go build -mod=vendor -o agent ./cmd/agent
|
||||
### GET /api/health
|
||||
|
||||
健康检查
|
||||
|
||||
## 故障排查
|
||||
|
||||
### 心跳同步问题排查
|
||||
|
||||
如果节点无法同步心跳,可以使用排查脚本进行诊断:
|
||||
|
||||
```bash
|
||||
# 运行心跳故障排查脚本
|
||||
./check-heartbeat.sh
|
||||
```
|
||||
|
||||
排查脚本会自动检查以下项目:
|
||||
|
||||
1. **进程状态** - 检查节点进程是否正在运行
|
||||
2. **配置文件** - 检查配置文件是否存在和正确
|
||||
3. **网络连接** - 检查能否连接到后端服务器
|
||||
4. **日志分析** - 分析日志中的心跳相关错误
|
||||
5. **手动测试** - 手动发送心跳测试连接
|
||||
6. **系统资源** - 检查磁盘空间和内存使用情况
|
||||
|
||||
**常见问题及解决方案:**
|
||||
|
||||
1. **进程未运行**
|
||||
```bash
|
||||
./run.sh start
|
||||
```
|
||||
|
||||
2. **网络连接失败**
|
||||
- 检查后端服务是否正常运行
|
||||
- 检查防火墙规则(确保可以访问后端端口)
|
||||
- 检查 BACKEND_URL 配置是否正确
|
||||
|
||||
3. **心跳发送失败**
|
||||
- 查看日志: `./run.sh logs`
|
||||
- 检查后端服务日志
|
||||
- 确认后端 `/api/node/heartbeat` 接口正常
|
||||
|
||||
4. **配置文件问题**
|
||||
- 检查 `config.yaml` 文件格式是否正确
|
||||
- 确认 `BACKEND_URL` 环境变量或配置文件中的 URL 正确
|
||||
|
||||
5. **查看详细日志**
|
||||
```bash
|
||||
# 实时查看日志
|
||||
./run.sh logs
|
||||
|
||||
# 查看完整日志
|
||||
./run.sh logs-all
|
||||
```
|
||||
|
||||
### 日志功能
|
||||
|
||||
节点端支持将日志直接写入文件,便于排查问题和监控运行状态。
|
||||
|
||||
**日志配置方式:**
|
||||
|
||||
1. **环境变量**(推荐)
|
||||
```bash
|
||||
LOG_FILE=/var/log/linkmaster-node.log ./run.sh start
|
||||
```
|
||||
|
||||
2. **配置文件**
|
||||
在 `config.yaml` 中配置:
|
||||
```yaml
|
||||
log:
|
||||
file: node.log # 日志文件路径
|
||||
level: info # 日志级别: debug, info, warn, error
|
||||
```
|
||||
|
||||
3. **默认行为**
|
||||
- 默认日志文件:`node.log`(当前目录)
|
||||
- 默认日志级别:`info`
|
||||
- 如果未设置日志文件,日志输出到标准错误(stderr)
|
||||
|
||||
**日志特性:**
|
||||
- ✅ 自动创建日志文件和目录
|
||||
- ✅ 追加模式,不会覆盖已有日志
|
||||
- ✅ JSON 格式,便于日志分析
|
||||
- ✅ 包含调用信息(文件名和行号)
|
||||
- ✅ Error 级别日志包含堆栈信息
|
||||
|
||||
**查看日志:**
|
||||
```bash
|
||||
# 实时查看日志
|
||||
tail -f node.log
|
||||
|
||||
# 查看心跳相关日志
|
||||
grep -i "心跳" node.log
|
||||
|
||||
# 查看错误日志
|
||||
grep -i "error" node.log
|
||||
|
||||
# 查看最后100行
|
||||
tail -n 100 node.log
|
||||
```
|
||||
|
||||
## 更新日志
|
||||
|
||||
### v1.1.0 (最新)
|
||||
|
||||
**新增功能:**
|
||||
- ✨ 添加日志文件输出功能,支持配置日志文件路径和级别
|
||||
- ✨ 添加心跳故障排查工具 `check-heartbeat.sh`
|
||||
- ✨ 支持通过环境变量 `LOG_FILE` 设置日志文件路径
|
||||
- ✨ 日志自动创建目录,支持相对路径和绝对路径
|
||||
|
||||
**改进:**
|
||||
- 🔧 优化日志初始化逻辑,支持直接写入文件
|
||||
- 🔧 改进配置加载,支持日志配置项
|
||||
- 📝 完善文档,添加故障排查章节
|
||||
|
||||
### v1.0.0
|
||||
|
||||
- 🎉 初始版本发布
|
||||
- ✅ 支持 HTTP GET/POST 测试
|
||||
- ✅ 支持 Ping、DNS、Traceroute 等网络测试
|
||||
- ✅ 支持持续 Ping/TCPing 测试
|
||||
- ✅ 支持心跳上报
|
||||
|
||||
512
check-heartbeat.sh
Executable file
512
check-heartbeat.sh
Executable file
@@ -0,0 +1,512 @@
|
||||
#!/bin/bash
|
||||
|
||||
# ============================================
|
||||
# LinkMaster 节点心跳故障排查脚本
|
||||
# 用途:诊断节点心跳同步问题
|
||||
# ============================================
|
||||
|
||||
set -e
|
||||
|
||||
# 颜色输出
|
||||
RED='\033[0;31m'
|
||||
GREEN='\033[0;32m'
|
||||
YELLOW='\033[1;33m'
|
||||
BLUE='\033[0;34m'
|
||||
CYAN='\033[0;36m'
|
||||
NC='\033[0m' # No Color
|
||||
|
||||
# 脚本目录
|
||||
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||
cd "$SCRIPT_DIR"
|
||||
|
||||
# 配置
|
||||
BINARY_NAME="agent"
|
||||
LOG_FILE="node.log"
|
||||
PID_FILE="node.pid"
|
||||
CONFIG_FILE="${CONFIG_PATH:-config.yaml}"
|
||||
|
||||
# 检查结果
|
||||
ISSUES=0
|
||||
WARNINGS=0
|
||||
|
||||
# 打印分隔线
|
||||
print_separator() {
|
||||
echo -e "${CYAN}========================================${NC}"
|
||||
}
|
||||
|
||||
# 打印检查项标题
|
||||
print_check_title() {
|
||||
echo -e "\n${BLUE}▶ $1${NC}"
|
||||
}
|
||||
|
||||
# 打印成功信息
|
||||
print_success() {
|
||||
echo -e "${GREEN}✓ $1${NC}"
|
||||
}
|
||||
|
||||
# 打印警告信息
|
||||
print_warning() {
|
||||
echo -e "${YELLOW}⚠ $1${NC}"
|
||||
((WARNINGS++))
|
||||
}
|
||||
|
||||
# 打印错误信息
|
||||
print_error() {
|
||||
echo -e "${RED}✗ $1${NC}"
|
||||
((ISSUES++))
|
||||
}
|
||||
|
||||
# 打印信息
|
||||
print_info() {
|
||||
echo -e "${CYAN}ℹ $1${NC}"
|
||||
}
|
||||
|
||||
# 获取PID
|
||||
get_pid() {
|
||||
if [ -f "$PID_FILE" ]; then
|
||||
PID=$(cat "$PID_FILE")
|
||||
if ps -p "$PID" > /dev/null 2>&1; then
|
||||
echo "$PID"
|
||||
else
|
||||
rm -f "$PID_FILE"
|
||||
echo ""
|
||||
fi
|
||||
else
|
||||
echo ""
|
||||
fi
|
||||
}
|
||||
|
||||
# 1. 检查进程状态
|
||||
check_process() {
|
||||
print_check_title "检查进程状态"
|
||||
|
||||
PID=$(get_pid)
|
||||
if [ -z "$PID" ]; then
|
||||
print_error "节点进程未运行"
|
||||
print_info "请使用 ./run.sh start 启动服务"
|
||||
return 1
|
||||
else
|
||||
print_success "节点进程正在运行 (PID: $PID)"
|
||||
|
||||
# 检查进程运行时间
|
||||
if command -v ps > /dev/null 2>&1; then
|
||||
RUNTIME=$(ps -o etime= -p "$PID" 2>/dev/null | tr -d ' ')
|
||||
if [ -n "$RUNTIME" ]; then
|
||||
print_info "进程运行时间: $RUNTIME"
|
||||
fi
|
||||
fi
|
||||
|
||||
# 检查进程资源使用
|
||||
if command -v ps > /dev/null 2>&1; then
|
||||
CPU_MEM=$(ps -o %cpu,%mem= -p "$PID" 2>/dev/null | tr -d ' ')
|
||||
if [ -n "$CPU_MEM" ]; then
|
||||
print_info "CPU/内存使用: $CPU_MEM"
|
||||
fi
|
||||
fi
|
||||
|
||||
return 0
|
||||
fi
|
||||
}
|
||||
|
||||
# 2. 检查配置文件
|
||||
check_config() {
|
||||
print_check_title "检查配置文件"
|
||||
|
||||
if [ ! -f "$CONFIG_FILE" ]; then
|
||||
print_warning "配置文件不存在: $CONFIG_FILE"
|
||||
print_info "将使用环境变量和默认配置"
|
||||
|
||||
# 检查环境变量
|
||||
if [ -n "$BACKEND_URL" ]; then
|
||||
print_info "使用环境变量 BACKEND_URL: $BACKEND_URL"
|
||||
else
|
||||
print_warning "未设置 BACKEND_URL 环境变量,将使用默认值: http://localhost:8080"
|
||||
fi
|
||||
return 0
|
||||
fi
|
||||
|
||||
print_success "配置文件存在: $CONFIG_FILE"
|
||||
|
||||
# 检查配置文件内容
|
||||
if command -v yq > /dev/null 2>&1; then
|
||||
BACKEND_URL_FROM_CONFIG=$(yq eval '.backend.url' "$CONFIG_FILE" 2>/dev/null || echo "")
|
||||
HEARTBEAT_INTERVAL=$(yq eval '.heartbeat.interval' "$CONFIG_FILE" 2>/dev/null || echo "")
|
||||
NODE_ID=$(yq eval '.node.id' "$CONFIG_FILE" 2>/dev/null || echo "")
|
||||
NODE_IP=$(yq eval '.node.ip' "$CONFIG_FILE" 2>/dev/null || echo "")
|
||||
else
|
||||
# 使用 grep 和 sed 简单解析
|
||||
BACKEND_URL_FROM_CONFIG=$(grep -E "^\s*url:" "$CONFIG_FILE" | head -1 | sed 's/.*url:\s*//' | tr -d '"' | tr -d "'" || echo "")
|
||||
HEARTBEAT_INTERVAL=$(grep -E "^\s*interval:" "$CONFIG_FILE" | head -1 | sed 's/.*interval:\s*//' | tr -d '"' | tr -d "'" || echo "")
|
||||
NODE_ID=$(grep -E "^\s*id:" "$CONFIG_FILE" | head -1 | sed 's/.*id:\s*//' | tr -d '"' | tr -d "'" || echo "")
|
||||
NODE_IP=$(grep -E "^\s*ip:" "$CONFIG_FILE" | head -1 | sed 's/.*ip:\s*//' | tr -d '"' | tr -d "'" || echo "")
|
||||
fi
|
||||
|
||||
# 确定使用的后端URL
|
||||
if [ -n "$BACKEND_URL" ]; then
|
||||
FINAL_BACKEND_URL="$BACKEND_URL"
|
||||
print_info "使用环境变量 BACKEND_URL: $FINAL_BACKEND_URL"
|
||||
elif [ -n "$BACKEND_URL_FROM_CONFIG" ]; then
|
||||
FINAL_BACKEND_URL="$BACKEND_URL_FROM_CONFIG"
|
||||
print_info "使用配置文件中的后端URL: $FINAL_BACKEND_URL"
|
||||
else
|
||||
FINAL_BACKEND_URL="http://localhost:8080"
|
||||
print_warning "未找到后端URL配置,使用默认值: $FINAL_BACKEND_URL"
|
||||
fi
|
||||
|
||||
if [ -n "$HEARTBEAT_INTERVAL" ]; then
|
||||
print_info "心跳间隔: ${HEARTBEAT_INTERVAL}秒"
|
||||
else
|
||||
print_info "心跳间隔: 60秒 (默认值)"
|
||||
fi
|
||||
|
||||
if [ -n "$NODE_ID" ] && [ "$NODE_ID" != "0" ] && [ "$NODE_ID" != "null" ]; then
|
||||
print_success "节点ID已配置: $NODE_ID"
|
||||
else
|
||||
print_warning "节点ID未配置或为0,将在首次心跳时获取"
|
||||
fi
|
||||
|
||||
if [ -n "$NODE_IP" ] && [ "$NODE_IP" != "null" ]; then
|
||||
print_success "节点IP已配置: $NODE_IP"
|
||||
else
|
||||
print_warning "节点IP未配置,将在首次心跳时获取"
|
||||
fi
|
||||
|
||||
export FINAL_BACKEND_URL
|
||||
}
|
||||
|
||||
# 3. 检查网络连接
|
||||
check_network() {
|
||||
print_check_title "检查网络连接"
|
||||
|
||||
if [ -z "$FINAL_BACKEND_URL" ]; then
|
||||
print_error "无法确定后端URL,跳过网络检查"
|
||||
return 1
|
||||
fi
|
||||
|
||||
# 提取主机和端口
|
||||
BACKEND_HOST=$(echo "$FINAL_BACKEND_URL" | sed -E 's|https?://||' | cut -d'/' -f1 | cut -d':' -f1)
|
||||
BACKEND_PORT=$(echo "$FINAL_BACKEND_URL" | sed -E 's|https?://||' | cut -d'/' -f1 | cut -d':' -f2)
|
||||
|
||||
if [ -z "$BACKEND_PORT" ]; then
|
||||
if echo "$FINAL_BACKEND_URL" | grep -q "https://"; then
|
||||
BACKEND_PORT=443
|
||||
else
|
||||
BACKEND_PORT=80
|
||||
fi
|
||||
fi
|
||||
|
||||
print_info "后端地址: $BACKEND_HOST:$BACKEND_PORT"
|
||||
|
||||
# 检查DNS解析
|
||||
if command -v nslookup > /dev/null 2>&1 || command -v host > /dev/null 2>&1; then
|
||||
if command -v nslookup > /dev/null 2>&1; then
|
||||
if nslookup "$BACKEND_HOST" > /dev/null 2>&1; then
|
||||
print_success "DNS解析成功: $BACKEND_HOST"
|
||||
else
|
||||
print_error "DNS解析失败: $BACKEND_HOST"
|
||||
return 1
|
||||
fi
|
||||
elif command -v host > /dev/null 2>&1; then
|
||||
if host "$BACKEND_HOST" > /dev/null 2>&1; then
|
||||
print_success "DNS解析成功: $BACKEND_HOST"
|
||||
else
|
||||
print_error "DNS解析失败: $BACKEND_HOST"
|
||||
return 1
|
||||
fi
|
||||
fi
|
||||
fi
|
||||
|
||||
# 检查端口连通性
|
||||
if command -v nc > /dev/null 2>&1; then
|
||||
if nc -z -w 3 "$BACKEND_HOST" "$BACKEND_PORT" 2>/dev/null; then
|
||||
print_success "端口连通性检查通过: $BACKEND_HOST:$BACKEND_PORT"
|
||||
else
|
||||
print_error "端口无法连接: $BACKEND_HOST:$BACKEND_PORT"
|
||||
print_info "可能原因: 防火墙阻止、后端服务未启动、网络不通"
|
||||
return 1
|
||||
fi
|
||||
elif command -v timeout > /dev/null 2>&1 && command -v bash > /dev/null 2>&1; then
|
||||
# 使用 bash 内置的 TCP 连接测试
|
||||
if timeout 3 bash -c "echo > /dev/tcp/$BACKEND_HOST/$BACKEND_PORT" 2>/dev/null; then
|
||||
print_success "端口连通性检查通过: $BACKEND_HOST:$BACKEND_PORT"
|
||||
else
|
||||
print_error "端口无法连接: $BACKEND_HOST:$BACKEND_PORT"
|
||||
print_info "可能原因: 防火墙阻止、后端服务未启动、网络不通"
|
||||
return 1
|
||||
fi
|
||||
else
|
||||
print_warning "无法检查端口连通性(需要 nc 或 timeout 命令)"
|
||||
fi
|
||||
|
||||
# 检查HTTP连接
|
||||
HEARTBEAT_URL="${FINAL_BACKEND_URL%/}/api/node/heartbeat"
|
||||
print_info "测试心跳接口: $HEARTBEAT_URL"
|
||||
|
||||
if command -v curl > /dev/null 2>&1; then
|
||||
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" --connect-timeout 5 --max-time 10 \
|
||||
-X POST \
|
||||
-H "Content-Type: application/x-www-form-urlencoded" \
|
||||
-d "type=pingServer" \
|
||||
"$HEARTBEAT_URL" 2>/dev/null || echo "000")
|
||||
|
||||
if [ "$HTTP_CODE" = "200" ]; then
|
||||
print_success "心跳接口响应正常 (HTTP 200)"
|
||||
elif [ "$HTTP_CODE" = "000" ]; then
|
||||
print_error "无法连接到心跳接口"
|
||||
print_info "可能原因: 网络不通、后端服务未启动、防火墙阻止"
|
||||
return 1
|
||||
else
|
||||
print_warning "心跳接口返回异常状态码: HTTP $HTTP_CODE"
|
||||
print_info "这可能是正常的,取决于后端实现"
|
||||
fi
|
||||
elif command -v wget > /dev/null 2>&1; then
|
||||
HTTP_CODE=$(wget --spider --server-response --timeout=5 --tries=1 \
|
||||
--post-data="type=pingServer" \
|
||||
--header="Content-Type: application/x-www-form-urlencoded" \
|
||||
"$HEARTBEAT_URL" 2>&1 | grep -E "HTTP/" | tail -1 | awk '{print $2}' || echo "000")
|
||||
|
||||
if [ "$HTTP_CODE" = "200" ]; then
|
||||
print_success "心跳接口响应正常 (HTTP 200)"
|
||||
elif [ "$HTTP_CODE" = "000" ]; then
|
||||
print_error "无法连接到心跳接口"
|
||||
return 1
|
||||
else
|
||||
print_warning "心跳接口返回异常状态码: HTTP $HTTP_CODE"
|
||||
fi
|
||||
else
|
||||
print_warning "无法测试HTTP连接(需要 curl 或 wget 命令)"
|
||||
fi
|
||||
|
||||
return 0
|
||||
}
|
||||
|
||||
# 4. 检查日志
|
||||
check_logs() {
|
||||
print_check_title "检查日志文件"
|
||||
|
||||
if [ ! -f "$LOG_FILE" ]; then
|
||||
print_warning "日志文件不存在: $LOG_FILE"
|
||||
print_info "如果服务刚启动,日志文件可能还未创建"
|
||||
return 0
|
||||
fi
|
||||
|
||||
print_success "日志文件存在: $LOG_FILE"
|
||||
|
||||
# 检查日志文件大小
|
||||
LOG_SIZE=$(stat -f%z "$LOG_FILE" 2>/dev/null || stat -c%s "$LOG_FILE" 2>/dev/null || echo "0")
|
||||
if [ "$LOG_SIZE" -gt 10485760 ]; then
|
||||
print_warning "日志文件较大: $(($LOG_SIZE / 1024 / 1024))MB"
|
||||
fi
|
||||
|
||||
# 检查最近的心跳记录
|
||||
print_info "查找最近的心跳记录..."
|
||||
|
||||
HEARTBEAT_SUCCESS=$(grep -i "心跳发送成功\|heartbeat.*success\|心跳响应" "$LOG_FILE" 2>/dev/null | tail -5 || true)
|
||||
HEARTBEAT_FAILED=$(grep -i "心跳发送失败\|heartbeat.*fail\|发送心跳失败" "$LOG_FILE" 2>/dev/null | tail -5 || true)
|
||||
HEARTBEAT_ERROR=$(grep -i "error.*heartbeat\|心跳.*error" "$LOG_FILE" 2>/dev/null | tail -5 || true)
|
||||
|
||||
if [ -n "$HEARTBEAT_SUCCESS" ]; then
|
||||
echo -e "${GREEN}最近成功的心跳记录:${NC}"
|
||||
echo "$HEARTBEAT_SUCCESS" | while IFS= read -r line; do
|
||||
echo " $line"
|
||||
done
|
||||
fi
|
||||
|
||||
if [ -n "$HEARTBEAT_FAILED" ]; then
|
||||
echo -e "${YELLOW}最近失败的心跳记录:${NC}"
|
||||
echo "$HEARTBEAT_FAILED" | while IFS= read -r line; do
|
||||
echo " $line"
|
||||
done
|
||||
((WARNINGS++))
|
||||
fi
|
||||
|
||||
if [ -n "$HEARTBEAT_ERROR" ]; then
|
||||
echo -e "${RED}最近的心跳错误记录:${NC}"
|
||||
echo "$HEARTBEAT_ERROR" | while IFS= read -r line; do
|
||||
echo " $line"
|
||||
done
|
||||
((ISSUES++))
|
||||
fi
|
||||
|
||||
# 检查最近的错误
|
||||
RECENT_ERRORS=$(grep -i "error\|fail\|panic" "$LOG_FILE" 2>/dev/null | tail -10 || true)
|
||||
if [ -n "$RECENT_ERRORS" ]; then
|
||||
echo -e "${YELLOW}最近的错误记录(最后10条):${NC}"
|
||||
echo "$RECENT_ERRORS" | while IFS= read -r line; do
|
||||
echo " $line"
|
||||
done
|
||||
fi
|
||||
|
||||
# 检查最后的心跳时间
|
||||
LAST_HEARTBEAT=$(grep -i "心跳" "$LOG_FILE" 2>/dev/null | tail -1 || true)
|
||||
if [ -n "$LAST_HEARTBEAT" ]; then
|
||||
print_info "最后的心跳日志: $LAST_HEARTBEAT"
|
||||
else
|
||||
print_warning "日志中未找到心跳记录"
|
||||
fi
|
||||
}
|
||||
|
||||
# 5. 手动测试心跳
|
||||
test_heartbeat() {
|
||||
print_check_title "手动测试心跳发送"
|
||||
|
||||
if [ -z "$FINAL_BACKEND_URL" ]; then
|
||||
print_error "无法确定后端URL,跳过心跳测试"
|
||||
return 1
|
||||
fi
|
||||
|
||||
HEARTBEAT_URL="${FINAL_BACKEND_URL%/}/api/node/heartbeat"
|
||||
print_info "发送测试心跳到: $HEARTBEAT_URL"
|
||||
|
||||
if command -v curl > /dev/null 2>&1; then
|
||||
RESPONSE=$(curl -s -w "\n%{http_code}" --connect-timeout 10 --max-time 15 \
|
||||
-X POST \
|
||||
-H "Content-Type: application/x-www-form-urlencoded" \
|
||||
-d "type=pingServer" \
|
||||
"$HEARTBEAT_URL" 2>&1)
|
||||
|
||||
HTTP_CODE=$(echo "$RESPONSE" | tail -1)
|
||||
BODY=$(echo "$RESPONSE" | sed '$d')
|
||||
|
||||
if [ "$HTTP_CODE" = "200" ]; then
|
||||
print_success "心跳发送成功 (HTTP 200)"
|
||||
if [ -n "$BODY" ]; then
|
||||
print_info "响应内容: $BODY"
|
||||
|
||||
# 尝试解析JSON响应
|
||||
if echo "$BODY" | grep -q "node_id\|node_ip"; then
|
||||
print_success "响应包含节点信息"
|
||||
echo "$BODY" | grep -o '"node_id":[0-9]*\|"node_ip":"[^"]*"' 2>/dev/null || true
|
||||
fi
|
||||
fi
|
||||
else
|
||||
print_error "心跳发送失败 (HTTP $HTTP_CODE)"
|
||||
if [ -n "$BODY" ]; then
|
||||
print_info "响应内容: $BODY"
|
||||
fi
|
||||
return 1
|
||||
fi
|
||||
elif command -v wget > /dev/null 2>&1; then
|
||||
RESPONSE=$(wget -qO- --post-data="type=pingServer" \
|
||||
--header="Content-Type: application/x-www-form-urlencoded" \
|
||||
--timeout=15 \
|
||||
"$HEARTBEAT_URL" 2>&1)
|
||||
|
||||
if [ $? -eq 0 ]; then
|
||||
print_success "心跳发送成功"
|
||||
if [ -n "$RESPONSE" ]; then
|
||||
print_info "响应内容: $RESPONSE"
|
||||
fi
|
||||
else
|
||||
print_error "心跳发送失败"
|
||||
return 1
|
||||
fi
|
||||
else
|
||||
print_warning "无法测试心跳(需要 curl 或 wget 命令)"
|
||||
return 1
|
||||
fi
|
||||
|
||||
return 0
|
||||
}
|
||||
|
||||
# 6. 检查系统资源
|
||||
check_resources() {
|
||||
print_check_title "检查系统资源"
|
||||
|
||||
# 检查磁盘空间
|
||||
if command -v df > /dev/null 2>&1; then
|
||||
DISK_USAGE=$(df -h . | tail -1 | awk '{print $5}' | sed 's/%//')
|
||||
if [ "$DISK_USAGE" -gt 90 ]; then
|
||||
print_error "磁盘空间不足: ${DISK_USAGE}%"
|
||||
elif [ "$DISK_USAGE" -gt 80 ]; then
|
||||
print_warning "磁盘空间紧张: ${DISK_USAGE}%"
|
||||
else
|
||||
print_success "磁盘空间充足: ${DISK_USAGE}%"
|
||||
fi
|
||||
fi
|
||||
|
||||
# 检查内存
|
||||
if command -v free > /dev/null 2>&1; then
|
||||
MEM_INFO=$(free -m | grep Mem)
|
||||
MEM_TOTAL=$(echo "$MEM_INFO" | awk '{print $2}')
|
||||
MEM_AVAIL=$(echo "$MEM_INFO" | awk '{print $7}')
|
||||
if [ -z "$MEM_AVAIL" ]; then
|
||||
MEM_AVAIL=$(echo "$MEM_INFO" | awk '{print $4}')
|
||||
fi
|
||||
|
||||
if [ -n "$MEM_TOTAL" ] && [ -n "$MEM_AVAIL" ]; then
|
||||
MEM_PERCENT=$((MEM_AVAIL * 100 / MEM_TOTAL))
|
||||
if [ "$MEM_PERCENT" -lt 10 ]; then
|
||||
print_error "可用内存不足: ${MEM_AVAIL}MB / ${MEM_TOTAL}MB (${MEM_PERCENT}%)"
|
||||
elif [ "$MEM_PERCENT" -lt 20 ]; then
|
||||
print_warning "可用内存紧张: ${MEM_AVAIL}MB / ${MEM_TOTAL}MB (${MEM_PERCENT}%)"
|
||||
else
|
||||
print_success "内存充足: ${MEM_AVAIL}MB / ${MEM_TOTAL}MB (${MEM_PERCENT}%)"
|
||||
fi
|
||||
fi
|
||||
fi
|
||||
}
|
||||
|
||||
# 主函数
|
||||
main() {
|
||||
echo -e "${CYAN}"
|
||||
echo "========================================"
|
||||
echo " LinkMaster 节点心跳故障排查工具"
|
||||
echo "========================================"
|
||||
echo -e "${NC}"
|
||||
|
||||
# 执行各项检查
|
||||
check_process
|
||||
PROCESS_OK=$?
|
||||
|
||||
check_config
|
||||
|
||||
if [ $PROCESS_OK -eq 0 ]; then
|
||||
check_network
|
||||
NETWORK_OK=$?
|
||||
|
||||
check_logs
|
||||
|
||||
if [ $NETWORK_OK -eq 0 ]; then
|
||||
echo ""
|
||||
read -p "是否执行手动心跳测试? (y/N): " -n 1 -r
|
||||
echo ""
|
||||
if [[ $REPLY =~ ^[Yy]$ ]]; then
|
||||
test_heartbeat
|
||||
fi
|
||||
fi
|
||||
fi
|
||||
|
||||
check_resources
|
||||
|
||||
# 总结
|
||||
print_separator
|
||||
echo -e "\n${BLUE}排查总结:${NC}"
|
||||
|
||||
if [ $ISSUES -eq 0 ] && [ $WARNINGS -eq 0 ]; then
|
||||
echo -e "${GREEN}✓ 未发现明显问题${NC}"
|
||||
echo -e "${CYAN}如果心跳仍然无法同步,请检查:${NC}"
|
||||
echo " 1. 后端服务是否正常运行"
|
||||
echo " 2. 后端数据库是否正常"
|
||||
echo " 3. 防火墙规则是否正确配置"
|
||||
echo " 4. 查看完整日志: ./run.sh logs-all"
|
||||
else
|
||||
if [ $ISSUES -gt 0 ]; then
|
||||
echo -e "${RED}发现 $ISSUES 个严重问题${NC}"
|
||||
fi
|
||||
if [ $WARNINGS -gt 0 ]; then
|
||||
echo -e "${YELLOW}发现 $WARNINGS 个警告${NC}"
|
||||
fi
|
||||
|
||||
echo -e "\n${CYAN}建议操作:${NC}"
|
||||
echo " 1. 根据上述检查结果修复问题"
|
||||
echo " 2. 重启服务: ./run.sh restart"
|
||||
echo " 3. 查看实时日志: ./run.sh logs"
|
||||
echo " 4. 查看完整日志: ./run.sh logs-all"
|
||||
fi
|
||||
|
||||
print_separator
|
||||
}
|
||||
|
||||
# 运行主函数
|
||||
main
|
||||
@@ -5,6 +5,7 @@ import (
|
||||
"fmt"
|
||||
"os"
|
||||
"os/signal"
|
||||
"path/filepath"
|
||||
"syscall"
|
||||
"time"
|
||||
|
||||
@@ -14,8 +15,11 @@ import (
|
||||
"linkmaster-node/internal/server"
|
||||
|
||||
"go.uber.org/zap"
|
||||
"go.uber.org/zap/zapcore"
|
||||
)
|
||||
|
||||
var version = "1.1.0" // 编译时通过 -ldflags "-X main.version=xxx" 设置
|
||||
|
||||
func main() {
|
||||
// 加载配置
|
||||
cfg, err := config.Load()
|
||||
@@ -32,7 +36,7 @@ func main() {
|
||||
}
|
||||
defer logger.Sync()
|
||||
|
||||
logger.Info("节点服务启动", zap.String("version", "1.0.0"))
|
||||
logger.Info("节点服务启动", zap.String("version", version))
|
||||
|
||||
// 初始化错误恢复
|
||||
recovery.Init()
|
||||
@@ -80,9 +84,69 @@ func main() {
|
||||
}
|
||||
|
||||
func initLogger(cfg *config.Config) (*zap.Logger, error) {
|
||||
// 确定日志级别
|
||||
var level zapcore.Level
|
||||
logLevel := cfg.Log.Level
|
||||
if logLevel == "" {
|
||||
if cfg.Debug {
|
||||
return zap.NewDevelopment()
|
||||
logLevel = "debug"
|
||||
} else {
|
||||
logLevel = "info"
|
||||
}
|
||||
}
|
||||
return zap.NewProduction()
|
||||
}
|
||||
|
||||
switch logLevel {
|
||||
case "debug":
|
||||
level = zapcore.DebugLevel
|
||||
case "info":
|
||||
level = zapcore.InfoLevel
|
||||
case "warn":
|
||||
level = zapcore.WarnLevel
|
||||
case "error":
|
||||
level = zapcore.ErrorLevel
|
||||
default:
|
||||
level = zapcore.InfoLevel
|
||||
}
|
||||
|
||||
// 编码器配置
|
||||
encoderConfig := zap.NewProductionEncoderConfig()
|
||||
if cfg.Debug {
|
||||
encoderConfig = zap.NewDevelopmentEncoderConfig()
|
||||
}
|
||||
encoderConfig.EncodeTime = zapcore.ISO8601TimeEncoder
|
||||
encoderConfig.EncodeLevel = zapcore.CapitalLevelEncoder
|
||||
|
||||
// 确定输出目标
|
||||
var writeSyncer zapcore.WriteSyncer
|
||||
if cfg.Log.File != "" {
|
||||
// 确保日志目录存在
|
||||
logDir := filepath.Dir(cfg.Log.File)
|
||||
if logDir != "." && logDir != "" {
|
||||
if err := os.MkdirAll(logDir, 0755); err != nil {
|
||||
return nil, fmt.Errorf("创建日志目录失败: %w", err)
|
||||
}
|
||||
}
|
||||
|
||||
// 打开日志文件(追加模式)
|
||||
logFile, err := os.OpenFile(cfg.Log.File, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("打开日志文件失败: %w", err)
|
||||
}
|
||||
writeSyncer = zapcore.AddSync(logFile)
|
||||
} else {
|
||||
// 输出到标准错误(兼容原有行为)
|
||||
writeSyncer = zapcore.AddSync(os.Stderr)
|
||||
}
|
||||
|
||||
// 创建核心
|
||||
core := zapcore.NewCore(
|
||||
zapcore.NewJSONEncoder(encoderConfig),
|
||||
writeSyncer,
|
||||
level,
|
||||
)
|
||||
|
||||
// 创建 logger
|
||||
logger := zap.New(core, zap.AddCaller(), zap.AddStacktrace(zapcore.ErrorLevel))
|
||||
|
||||
return logger, nil
|
||||
}
|
||||
|
||||
@@ -21,6 +21,11 @@ type Config struct {
|
||||
Interval int `yaml:"interval"` // 心跳间隔(秒)
|
||||
} `yaml:"heartbeat"`
|
||||
|
||||
Log struct {
|
||||
File string `yaml:"file"` // 日志文件路径(空则输出到标准错误)
|
||||
Level string `yaml:"level"` // 日志级别:debug, info, warn, error(默认: info)
|
||||
} `yaml:"log"`
|
||||
|
||||
Debug bool `yaml:"debug"`
|
||||
|
||||
// 节点信息(通过心跳获取并持久化)
|
||||
@@ -42,12 +47,13 @@ func Load() (*Config, error) {
|
||||
cfg.Heartbeat.Interval = 60
|
||||
cfg.Debug = false
|
||||
|
||||
// 从环境变量读取后端URL
|
||||
backendURL := os.Getenv("BACKEND_URL")
|
||||
if backendURL == "" {
|
||||
backendURL = "http://localhost:8080"
|
||||
// 默认日志配置
|
||||
logFile := os.Getenv("LOG_FILE")
|
||||
if logFile == "" {
|
||||
logFile = "node.log"
|
||||
}
|
||||
cfg.Backend.URL = backendURL
|
||||
cfg.Log.File = logFile
|
||||
cfg.Log.Level = "info"
|
||||
|
||||
// 尝试从配置文件读取
|
||||
configPath := os.Getenv("CONFIG_PATH")
|
||||
@@ -66,6 +72,24 @@ func Load() (*Config, error) {
|
||||
}
|
||||
}
|
||||
|
||||
// 如果配置文件中没有设置日志文件,使用环境变量或默认值
|
||||
if cfg.Log.File == "" {
|
||||
logFile := os.Getenv("LOG_FILE")
|
||||
if logFile == "" {
|
||||
logFile = "node.log"
|
||||
}
|
||||
cfg.Log.File = logFile
|
||||
}
|
||||
|
||||
// 如果配置文件中没有设置日志级别,使用默认值
|
||||
if cfg.Log.Level == "" {
|
||||
if cfg.Debug {
|
||||
cfg.Log.Level = "debug"
|
||||
} else {
|
||||
cfg.Log.Level = "info"
|
||||
}
|
||||
}
|
||||
|
||||
return cfg, nil
|
||||
}
|
||||
|
||||
@@ -102,4 +126,3 @@ func GetConfigPath() string {
|
||||
}
|
||||
return configPath
|
||||
}
|
||||
|
||||
|
||||
6
run.sh
6
run.sh
@@ -191,12 +191,14 @@ start() {
|
||||
|
||||
echo -e "${BLUE}启动节点端服务...${NC}"
|
||||
echo -e "${BLUE}后端地址: $BACKEND_URL${NC}"
|
||||
echo -e "${BLUE}日志文件: $LOG_FILE${NC}"
|
||||
|
||||
# 设置环境变量
|
||||
export BACKEND_URL="$BACKEND_URL"
|
||||
export LOG_FILE="$LOG_FILE"
|
||||
|
||||
# 后台运行
|
||||
nohup ./"$BINARY_NAME" > "$LOG_FILE" 2>&1 &
|
||||
# 后台运行(日志现在由程序直接写入文件,这里保留重定向作为备份)
|
||||
nohup ./"$BINARY_NAME" >> "$LOG_FILE" 2>&1 &
|
||||
NEW_PID=$!
|
||||
|
||||
# 保存PID
|
||||
|
||||
Reference in New Issue
Block a user