I. Core Dimensions of a Monitoring System

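graph TD
A[Front-end monitoring] --> B[Performance monitoring]
A --> C[Error monitoring]
A --> D[Behavior monitoring]
A --> E[Security monitoring]
B --> B1[FP/FCP/LCP]
B --> B2[CLS]
B --> B3[TTI/TBT]
C --> C1[JS exceptions]
C --> C2[Resource load failures]
C --> C3[API errors]
D --> D1[PV/UV]
D --> D2[Click heatmaps]
D --> D3[User paths]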

II. Comparison of Technology Options

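| Option | Integration cost | Extensibility | Data precision | Typical use case |
| --- | --- | --- | --- | --- |
| Sentry |  |  |  | Error monitoring |
| ELK Stack |  | Very strong | Customizable | Enterprise-grade full-link monitoring |
| Self-built SDK | Very high | Unlimited | Tunable | Deep customization needs |
| Google Analytics | Very low |  |  | Basic behavior analytics |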

III. Core Implementation of a Self-Built Monitoring SDK

1. Performance data collection
// Report every LCP candidate the browser emits.
// sendToServer is assumed to be defined elsewhere in the SDK.
const perfObserver = new PerformanceObserver((list) => {
  const entries = list.getEntries();
  entries.forEach((entry) => {
    if (entry.entryType === 'largest-contentful-paint') {
      sendToServer({
        type: 'LCP',
        value: entry.startTime,
        page: location.href
      });
    }
  });
});
perfObserver.observe({
  entryTypes: ['navigation', 'paint', 'largest-contentful-paint']
});
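
The observer above sends every LCP candidate, and the browser may emit several candidates as larger elements finish rendering. One option is to keep only the latest candidate and report it once, for example when the page is hidden. The sketch below illustrates that idea; `createLcpReporter`, its `send` callback, and the wiring shown in the comments are hypothetical names, not part of the original SDK.

```javascript
// Keeps only the most recent LCP candidate and reports it at most once.
// `send` is a hypothetical callback standing in for sendToServer.
function createLcpReporter(send) {
  let latest = null;
  let reported = false;

  return {
    // Call for each 'largest-contentful-paint' entry.
    onEntry(entry) {
      if (!reported) latest = entry;
    },
    // Call when the page is being hidden; returns true if a report was sent.
    finalize(page) {
      if (reported || latest === null) return false;
      reported = true;
      send({ type: 'LCP', value: latest.startTime, page });
      return true;
    },
  };
}

// Browser wiring (assumed, not executed here):
// const lcp = createLcpReporter(sendToServer);
// new PerformanceObserver((list) => {
//   list.getEntries().forEach((entry) => lcp.onEntry(entry));
// }).observe({ type: 'largest-contentful-paint', buffered: true });
// document.addEventListener('visibilitychange', () => {
//   if (document.visibilityState === 'hidden') lcp.finalize(location.href);
// });
```

Reporting once per page view keeps the LCP series on the server free of intermediate candidates, at the cost of losing the value if the page never reaches the hidden state before the process is killed.
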
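2. Error capture
// `tracker` is an instance of the Tracker class defined in item 3.
// Listening in the capture phase (third argument `true`) also receives
// resource load errors, which do not bubble and carry no message or position.
window.addEventListener('error', (event) => {
  const target = event.target;
  if (target && target !== window && (target.src || target.href)) {
    tracker.send('RESOURCE_ERROR', {
      tag: target.tagName,
      url: target.src || target.href
    });
    return;
  }
  const { filename, lineno, colno, message } = event;
  tracker.send('JS_ERROR', {
    msg: message,
    stack: `${filename}:${lineno}:${colno}`,
    ua: navigator.userAgent
  });
}, true);

// Promise rejections that no handler caught
window.addEventListener('unhandledrejection', (event) => {
  tracker.send('PROMISE_ERROR', {
    reason: event.reason?.message || String(event.reason)
  });
});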
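
The listeners above do not see failed API calls, which section I lists as a third error category. A minimal sketch of one way to cover them is to wrap `fetch`; `wrapFetch`, the `API_ERROR` event name, and the `report` callback are assumptions made for illustration.

```javascript
// Wraps a fetch-like function so failed requests are reported.
// `report` is a hypothetical callback, e.g. (type, payload) => tracker.send(type, payload).
function wrapFetch(fetchImpl, report) {
  return async function monitoredFetch(input, options) {
    // `input` may be a URL string, a URL object, or a Request-like object.
    const url = typeof input === 'string' ? input : (input && input.url) || String(input);
    const start = Date.now();
    try {
      const response = await fetchImpl(input, options);
      if (!response.ok) {
        // HTTP-level failure (4xx/5xx): fetch resolves, so check `ok` explicitly.
        report('API_ERROR', { url, status: response.status, duration: Date.now() - start });
      }
      return response;
    } catch (err) {
      // Network failure or abort: no HTTP status is available.
      report('API_ERROR', {
        url,
        status: 0,
        msg: err instanceof Error ? err.message : String(err),
        duration: Date.now() - start,
      });
      throw err; // keep the caller's error handling unchanged
    }
  };
}

// Browser wiring (assumed, not executed here):
// window.fetch = wrapFetch(window.fetch.bind(window), (t, p) => tracker.send(t, p));
```

The wrapper rethrows the original error and returns the original response, so application code behaves the same with or without monitoring.
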
3. Optimized reporting
class Tracker {
  constructor() {
    this.queue = [];
    this.timer = null;
    this.MAX_RETRY = 3;
  }

  send(eventType, payload) {
    this.queue.push({ eventType, timestamp: Date.now(), ...payload });
    if (!this.timer) {
      // Merge events for 1 second before reporting
      this.timer = setTimeout(() => {
        this.timer = null;
        this._flush();
      }, 1000);
    }
  }

  // Take the batch out of the queue first, so that a retry resends the same
  // events instead of an already emptied queue.
  _flush(batch = this.queue.splice(0), retryCount = 0) {
    if (batch.length === 0) return;
    const body = JSON.stringify(batch);
    if (navigator.sendBeacon) {
      const success = navigator.sendBeacon('/log-collector', body);
      if (!success && retryCount < this.MAX_RETRY) {
        setTimeout(() => this._flush(batch, retryCount + 1), 2000);
      }
    } else {
      // Fallback when sendBeacon is unavailable
      fetch('/log-collector', { method: 'POST', body })
        .catch((e) => console.error(e));
    }
  }
}
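
`sendBeacon` returns `false` when the browser cannot queue the payload, which can happen when the payload is too large (the limit is often cited as roughly 64 KiB, but treat that figure as an assumption to verify for your target browsers). A retry with the same oversized batch would then fail again, so one option is to split the queue by encoded size before sending. `chunkBySize` below is a hypothetical helper, not part of the original Tracker.

```javascript
// Splits events into batches whose JSON array encoding stays within maxBytes.
// An event that alone exceeds maxBytes is placed in a batch of its own.
function chunkBySize(events, maxBytes) {
  const encoder = new TextEncoder();
  const batches = [];
  let current = [];
  let currentBytes = 2; // the surrounding "[" and "]"

  for (const event of events) {
    const size = encoder.encode(JSON.stringify(event)).length;
    const separator = current.length > 0 ? 1 : 0; // the "," between items
    if (current.length > 0 && currentBytes + separator + size > maxBytes) {
      batches.push(current);
      current = [];
      currentBytes = 2;
    }
    currentBytes += size + (current.length > 0 ? 1 : 0);
    current.push(event);
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Each returned batch could then be passed to `sendBeacon` separately, for example `chunkBySize(batch, 60 * 1024).forEach(...)` inside `_flush`.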

IV. Data Processing Architecture

Front-end SDK → log collection service (Kafka) → stream processing (Flink) → real-time analytics (ClickHouse) → visualization (Grafana) → alerting (Prometheus Alertmanager)

V. Collection Strategies for Key Performance Metrics

  1. FCP (First Contentful Paint)
new PerformanceObserver((entryList) => {
  for (const entry of entryList.getEntriesByName('first-contentful-paint')) {
    console.log('FCP:', entry.startTime);
  }
}).observe({ type: 'paint', buffered: true });
  2. CLS (Cumulative Layout Shift)
let clsValue = 0;
new PerformanceObserver((entryList) => {
  for (const entry of entryList.getEntries()) {
    // Ignore shifts that directly follow user input
    if (!entry.hadRecentInput) {
      clsValue += entry.value;
    }
  }
}).observe({ type: 'layout-shift', buffered: true });
  3. Long Tasks monitoring
// reportLongTask is assumed to be defined elsewhere in the SDK.
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (entry.duration > 50) {
      reportLongTask(entry);
    }
  }
}).observe({ entryTypes: ['longtask'] });
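
The CLS observer above adds up every layout shift over the life of the page. The Web Vitals definition of CLS instead groups shifts into session windows (shifts less than 1 second apart, with each window capped at 5 seconds) and takes the largest window. The following sketch computes that value from a list of entries; `computeSessionCls` is a hypothetical name, and the entries are assumed to be sorted by `startTime`.

```javascript
// Returns the largest session-window sum of layout shift values.
// A window continues while the gap to the previous shift is < 1000 ms
// and the window spans < 5000 ms from its first shift.
function computeSessionCls(entries) {
  let maxWindow = 0;
  let currentWindow = 0;
  let windowOpen = false;
  let firstTime = 0;
  let lastTime = 0;

  for (const entry of entries) {
    if (entry.hadRecentInput) continue; // shifts caused by user input do not count
    const continues =
      windowOpen &&
      entry.startTime - lastTime < 1000 &&
      entry.startTime - firstTime < 5000;
    if (continues) {
      currentWindow += entry.value;
    } else {
      currentWindow = entry.value;
      firstTime = entry.startTime;
      windowOpen = true;
    }
    lastTime = entry.startTime;
    maxWindow = Math.max(maxWindow, currentWindow);
  }
  return maxWindow;
}
```

In a page, the entries collected by the `layout-shift` observer could be pushed into an array and passed to this function when the page is hidden.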

VI. Error Filtering and Aggregation Algorithms

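# Example error fingerprint algorithm
from hashlib import md5


def _digest(text):
    # hashlib works on bytes, so encode before hashing
    return md5(text.encode('utf-8')).hexdigest()


def generate_error_fingerprint(error):
    stack = error.get('stack', '')
    if not stack:
        return _digest(f"{error['msg']}:{error.get('filename', '')}")
    # Keep the top three stack frames
    frames = stack.split('\n')[:3]
    return _digest(':'.join([
        error['msg'],
        *[frame.split('@')[0] for frame in frames],
    ]))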
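
The fingerprint covers grouping; the filtering and aggregation steps named in the section title can be sketched in the same spirit. The example below drops a small list of noise messages and counts errors per group. The ignore list, the field names `msg`, `stack`, and `timestamp`, and the plain-string key (used here instead of an MD5 digest) are assumptions for illustration.

```javascript
// Messages treated as noise in this sketch (assumed list; tune per project).
const IGNORED_MESSAGES = [/^Script error\.?$/i, /ResizeObserver loop/i];

// Builds a grouping key from the message and the top stack frames, with
// line:column positions removed so small code shifts still group together.
function fingerprintKey(error, frameCount = 3) {
  const frames = (error.stack || '')
    .split('\n')
    .map((line) => line.trim())
    .filter(Boolean)
    .slice(0, frameCount)
    .map((line) => line.replace(/:\d+:\d+/g, ''));
  return [error.msg, ...frames].join('|');
}

// Filters noise, then groups the remaining errors by fingerprint key.
function aggregateErrors(errors) {
  const groups = new Map();
  for (const error of errors) {
    if (IGNORED_MESSAGES.some((re) => re.test(error.msg))) continue;
    const key = fingerprintKey(error);
    const group = groups.get(key) || {
      key,
      count: 0,
      firstSeen: error.timestamp,
      lastSeen: error.timestamp,
      sample: error,
    };
    group.count += 1;
    group.firstSeen = Math.min(group.firstSeen, error.timestamp);
    group.lastSeen = Math.max(group.lastSeen, error.timestamp);
    groups.set(key, group);
  }
  // Most frequent groups first
  return [...groups.values()].sort((a, b) => b.count - a.count);
}
```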

VII. Key Techniques for Visualization Dashboards

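  1. ECharts real-time rendering optimization
const chart = echarts.init(dom);
let data = [];
// Data pushed over WebSocket; each message is assumed to be a JSON array of points
socket.onmessage = (event) => {
  const newData = JSON.parse(event.data);
  data = [...data, ...newData].slice(-100); // sliding window of the latest 100 points
  chart.setOption({
    series: [{
      data: data,
      type: 'line'
    }]
  });
};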
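  2. Grafana custom plugin development
// Fragment of a backend data source plugin; plugin registration and main() are omitted.
package main

import (
	"context"
	"math/rand"
	"time"

	"github.com/grafana/grafana-plugin-sdk-go/backend"
	"github.com/grafana/grafana-plugin-sdk-go/data"
)

// Datasource is the plugin's data source type; its fields are omitted here.
type Datasource struct{}

// QueryData returns one frame per query containing the current time and a random value.
func (d *Datasource) QueryData(ctx context.Context, req *backend.QueryDataRequest) (*backend.QueryDataResponse, error) {
	response := backend.NewQueryDataResponse()
	for _, q := range req.Queries {
		frames := data.Frames{
			data.NewFrame("response",
				data.NewField("time", nil, []time.Time{time.Now()}),
				data.NewField("value", nil, []float64{rand.Float64()}),
			),
		}
		response.Responses[q.RefID] = backend.DataResponse{
			Frames: frames,
		}
	}
	return response, nil
}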
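
Returning to the ECharts handler in item 1: calling `setOption` on every WebSocket message can trigger more renders than the screen can show when messages arrive in bursts. A possible refinement is to buffer incoming points and render at most once per frame. In this sketch `createRenderBatcher` is a hypothetical helper, and the scheduler is injected so the logic can run outside a browser.

```javascript
// Buffers incoming points and calls `render` at most once per scheduled tick.
// `schedule` receives a callback; in a page it would typically wrap
// requestAnimationFrame. `limit` is the sliding-window size.
function createRenderBatcher(render, schedule, limit = 100) {
  let windowData = [];
  let pending = false;

  return function push(points) {
    // Keep only the latest `limit` points
    windowData = windowData.concat(points).slice(-limit);
    if (pending) return; // a render is already scheduled for this tick
    pending = true;
    schedule(() => {
      pending = false;
      render(windowData);
    });
  };
}

// Browser wiring (assumed, not executed here):
// const push = createRenderBatcher(
//   (data) => chart.setOption({ series: [{ type: 'line', data }] }),
//   (cb) => requestAnimationFrame(cb)
// );
// socket.onmessage = (event) => push(JSON.parse(event.data));
```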

VIII. Key Points for Production Deployment

  1. Nginx log receiver configuration
location /log-collector {
    access_log off;
    proxy_pass http://log-processor;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
}
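  2. Kafka topic partitioning strategy
// Partition the topic by hour.
// Note: partitioner.class must name a class that implements
// org.apache.kafka.clients.producer.Partitioner; verify that the class below
// is available in your distribution before relying on it.
// Note: the timestamp type is normally configured on the broker or topic
// (message.timestamp.type at topic level) rather than as a producer property.
Properties props = new Properties();
props.put("partitioner.class", "io.confluent.kafka.partitioner.HourlyPartitioner");
props.put("log.timestamp.type", "LogAppendTime");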
  3. ClickHouse table schema
CREATE TABLE frontend_logs (
    event_date Date,
    event_time DateTime,
    event_type String,
    session_id String,
    page_url String,
    metrics Nested(
        keys String,
        values Float32
    ),
    INDEX idx_session session_id TYPE bloom_filter GRANULARITY 3
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_type, event_time);
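
To connect the SDK events to this table, the `Nested` column is written as two parallel arrays, `metrics.keys` and `metrics.values`, of equal length. The sketch below maps one event to a row object suitable for a JSON-based insert format. The input fields `sessionId`, `page`, and `metrics` are assumptions about the event shape, and timestamps are formatted in UTC, which may need adjusting to the server's time zone.

```javascript
// Converts one SDK event into a row shaped for the frontend_logs table.
// Nested(keys, values) is represented as two parallel arrays.
function toClickHouseRow(event) {
  const iso = new Date(event.timestamp).toISOString(); // always UTC
  // Keep only finite numeric metrics so both arrays stay aligned and valid
  const metrics = Object.entries(event.metrics || {})
    .filter(([, value]) => typeof value === 'number' && Number.isFinite(value));

  return {
    event_date: iso.slice(0, 10),                   // YYYY-MM-DD
    event_time: iso.slice(0, 19).replace('T', ' '), // YYYY-MM-DD hh:mm:ss
    event_type: event.eventType,
    session_id: event.sessionId || '',
    page_url: event.page || '',
    'metrics.keys': metrics.map(([key]) => key),
    'metrics.values': metrics.map(([, value]) => value),
  };
}
```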

IX. Emerging Directions

  1. Web Vitals RUM (Real User Monitoring)
  2. Standardized collection with OpenTelemetry
  3. AI-based anomaly detection (LSTM models)
  4. Edge preprocessing (Cloudflare Workers)
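
Item 3 names an LSTM model, which is beyond a short example. As a far simpler statistical stand-in (explicitly not an LSTM), a rolling z-score can flag a metric sample that deviates sharply from recent history and can serve as a baseline to compare a learned model against. `detectAnomalies` and its default parameters are assumptions chosen for illustration.

```javascript
// Flags indexes whose value deviates from the mean of the preceding
// `windowSize` samples by more than `threshold` standard deviations.
function detectAnomalies(values, windowSize = 20, threshold = 3) {
  const anomalies = [];
  for (let i = windowSize; i < values.length; i++) {
    const recent = values.slice(i - windowSize, i);
    const mean = recent.reduce((sum, v) => sum + v, 0) / windowSize;
    const variance = recent.reduce((sum, v) => sum + (v - mean) ** 2, 0) / windowSize;
    const std = Math.sqrt(variance);
    if (std === 0) {
      // Flat history: any change at all is a deviation
      if (values[i] !== mean) anomalies.push(i);
      continue;
    }
    if (Math.abs(values[i] - mean) / std > threshold) anomalies.push(i);
  }
  return anomalies;
}
```

Applied to a per-minute series of a metric such as LCP, the returned indexes mark the minutes worth alerting on or inspecting.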

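Architect's note: the ultimate goal of a monitoring system is not to collect data but to build a closed loop of "sense, decide, act". A system becomes truly intelligent only when it can automatically recognize performance degradation patterns and trigger optimizations in response. Start with core business metrics and build up the "observability pyramid" step by step (metrics → logs → distributed tracing).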