I. Core Dimensions of a Monitoring System
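graph TD
    A[Frontend Monitoring] --> B[Performance Monitoring]
    A --> C[Error Monitoring]
    A --> D[Behavior Monitoring]
    A --> E[Security Monitoring]
    B --> B1[FP/FCP/LCP]
    B --> B2[CLS]
    B --> B3[TTI/TBT]
    C --> C1[JS exceptions]
    C --> C2[Resource load failures]
    C --> C3[API errors]
    D --> D1[PV/UV]
    D --> D2[Click heatmap]
    D --> D3[User paths]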
II. Comparison of Technology Options
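| Solution | Integration cost | Extensibility | Data accuracy | Best suited for |
| --- | --- | --- | --- | --- |
| Sentry | Low | Medium | High | Error monitoring |
| ELK Stack | High | Very strong | Customizable | Enterprise-grade end-to-end monitoring |
| Custom SDK | Very high | Unlimited | Tunable | Deep customization needs |
| Google Analytics | Very low | Weak | Medium | Basic behavior analytics |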
III. Core Implementation of a Custom Monitoring SDK
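1. Performance data collection
// sendToServer is assumed to be defined elsewhere, for example as a thin
// wrapper around the Tracker class shown later in this article.
const perfObserver = new PerformanceObserver((list) => {
  const entries = list.getEntries();
  entries.forEach((entry) => {
    // The browser may emit several LCP candidates; the last one reported
    // before the user interacts with the page is the final value.
    if (entry.entryType === 'largest-contentful-paint') {
      sendToServer({
        type: 'LCP',
        value: entry.startTime,
        page: location.href
      });
    }
  });
});
perfObserver.observe({
  entryTypes: ['navigation', 'paint', 'largest-contentful-paint']
});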
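The observer above subscribes to `navigation` and `paint` entries but only reports LCP. As a minimal sketch of how a navigation entry might be reduced to a few report fields, the helper below assumes the entry exposes the standard `PerformanceNavigationTiming` fields; the name `summarizeNavigation` and the output field names are hypothetical, and a plain object stands in for a real entry so the snippet also runs outside a browser.

```javascript
// Hypothetical helper: reduce a PerformanceNavigationTiming-like object
// to a few derived durations (all values in milliseconds).
function summarizeNavigation(entry) {
  return {
    type: 'NAVIGATION',
    // Time from navigation start until the first response byte arrived
    ttfb: entry.responseStart - entry.startTime,
    // Time from navigation start until DOMContentLoaded handlers finished
    domContentLoaded: entry.domContentLoadedEventEnd - entry.startTime,
    // Time from navigation start until the load event finished
    load: entry.loadEventEnd - entry.startTime
  };
}

// Plain object standing in for a real navigation entry
const sampleEntry = {
  startTime: 0,
  requestStart: 120,
  responseStart: 310,
  domContentLoadedEventEnd: 900,
  loadEventEnd: 1500
};
const navigationReport = summarizeNavigation(sampleEntry);
console.log(navigationReport);
```

In a page, the same helper could be called from the observer callback whenever `entry.entryType === 'navigation'`.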
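2. Error capture
// `tracker` is an instance of the Tracker class defined in the next subsection.
window.addEventListener('error', (event) => {
  const target = event.target;
  // Resource load failures (img/script/link) do not bubble; they reach
  // window only in the capture phase and carry no message or line info.
  if (target && target !== window && (target.src || target.href)) {
    tracker.send('RESOURCE_ERROR', {
      tag: target.tagName,
      url: target.src || target.href
    });
    return;
  }
  const { filename, lineno, colno, message, error } = event;
  tracker.send('JS_ERROR', {
    msg: message,
    filename,
    // Prefer the real stack trace; fall back to the error location
    stack: error?.stack || `${filename}:${lineno}:${colno}`,
    ua: navigator.userAgent
  });
}, true);

window.addEventListener('unhandledrejection', (event) => {
  tracker.send('PROMISE_ERROR', {
    reason: event.reason?.message || String(event.reason)
  });
});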
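A rejection reason can be any value, so `String(event.reason)` yields `[object Object]` for plain objects. The sketch below shows one possible way to normalize the reason before reporting; `normalizeReason` and its return shape are hypothetical names introduced for illustration.

```javascript
// Hypothetical helper: turn any rejection reason into { message, stack }.
function normalizeReason(reason) {
  if (reason instanceof Error) {
    return { message: reason.message, stack: reason.stack || '' };
  }
  if (typeof reason === 'string') {
    return { message: reason, stack: '' };
  }
  let message;
  try {
    // Plain objects would otherwise stringify to "[object Object]"
    message = JSON.stringify(reason);
  } catch (e) {
    // Circular structures cannot be serialized
    message = undefined;
  }
  // JSON.stringify returns undefined for undefined, functions and symbols
  return { message: message === undefined ? String(reason) : message, stack: '' };
}

console.log(normalizeReason({ code: 503 }).message);
```

The handler could then send `normalizeReason(event.reason)` instead of the raw string.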
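3. Reporting optimization
class Tracker {
  constructor() {
    this.queue = [];
    this.timer = null;
    this.MAX_RETRY = 3;
  }

  send(eventType, payload) {
    this.queue.push({ eventType, timestamp: Date.now(), ...payload });
    if (!this.timer) {
      // Merge everything queued within 1 second into a single request
      this.timer = setTimeout(() => {
        this.timer = null;
        this._flush();
      }, 1000);
    }
  }

  // The batch is detached from the queue up front, so a retry resends the
  // same events instead of whatever the queue holds at that moment.
  _flush(batch = this.queue.splice(0), retryCount = 0) {
    if (batch.length === 0) return;
    const body = JSON.stringify(batch);
    if (navigator.sendBeacon) {
      // sendBeacon returns false when the browser refuses to queue the request
      const success = navigator.sendBeacon('/log-collector', body);
      if (!success && retryCount < this.MAX_RETRY) {
        setTimeout(() => this._flush(batch, retryCount + 1), 2000);
      }
    } else {
      // Fallback; keepalive lets the request outlive page unload
      fetch('/log-collector', { method: 'POST', body, keepalive: true })
        .catch((e) => console.error(e));
    }
  }
}

// Shared instance used by the error handlers
const tracker = new Tracker();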
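`navigator.sendBeacon` returns `false` when the browser will not queue the payload, which typically happens when the body is too large. One way to reduce that risk is to split the queue into size-bounded batches before sending. The sketch below is an illustration only: `chunkBySize` is a hypothetical helper, and the 64 KiB default is an assumption about a common browser limit rather than a guaranteed value.

```javascript
// Hypothetical helper: split events into batches whose serialized JSON
// array stays under maxBytes. A single oversized event still gets its own batch.
function chunkBySize(events, maxBytes = 64 * 1024) {
  const encoder = new TextEncoder();
  const batches = [];
  let current = [];
  let currentBytes = 2; // the surrounding "[" and "]"

  for (const event of events) {
    const size = encoder.encode(JSON.stringify(event)).length;
    // +1 accounts for the "," that separates this event from the previous one
    if (current.length > 0 && currentBytes + 1 + size > maxBytes) {
      batches.push(current);
      current = [];
      currentBytes = 2;
    }
    currentBytes += (current.length > 0 ? 1 : 0) + size;
    current.push(event);
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Each sample event serializes to 18 bytes, so two fit into a 40-byte batch
const sampleEvents = [1, 2, 3].map(() => ({ a: 'x'.repeat(10) }));
const sampleBatches = chunkBySize(sampleEvents, 40);
console.log(sampleBatches.map((b) => b.length));
```

Each batch could then be passed to `sendBeacon` separately.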
IV. Data Processing Architecture
Frontend SDK → log ingestion (Kafka) → stream processing (Flink) → real-time analytics (ClickHouse) → visualization (Grafana) → alerting (Prometheus Alertmanager)
V. Collection Strategies for Key Performance Metrics
- FCP (First Contentful Paint)
new PerformanceObserver((entryList) => {
  for (const entry of entryList.getEntriesByName('first-contentful-paint')) {
    console.log('FCP:', entry.startTime);
  }
}).observe({ type: 'paint', buffered: true });
- CLS (Cumulative Layout Shift)
let clsValue = 0;
new PerformanceObserver((entryList) => {
  for (const entry of entryList.getEntries()) {
    // Shifts that follow recent user input are expected and do not count
    if (!entry.hadRecentInput) {
      clsValue += entry.value;
    }
  }
}).observe({ type: 'layout-shift', buffered: true });
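The snippet above sums every layout shift over the page's lifetime. The current Web Vitals definition of CLS instead takes the largest "session window": consecutive shifts less than 1 second apart, with each window capped at 5 seconds. The sketch below illustrates that windowing on plain objects; `computeCLS` is a hypothetical helper and assumes the entries are sorted by `startTime`.

```javascript
// Hypothetical helper: compute CLS from layout-shift-like entries
// ({ startTime, value, hadRecentInput }) using session windows.
function computeCLS(entries, maxGapMs = 1000, maxWindowMs = 5000) {
  let maxWindowValue = 0;
  let windowValue = 0;
  let windowStart = 0;
  let lastShiftTime = 0;
  let hasWindow = false;

  for (const entry of entries) {
    if (entry.hadRecentInput) continue; // shifts after user input are excluded
    const startsNewWindow =
      !hasWindow ||
      entry.startTime - lastShiftTime >= maxGapMs ||
      entry.startTime - windowStart >= maxWindowMs;
    if (startsNewWindow) {
      windowValue = 0;
      windowStart = entry.startTime;
      hasWindow = true;
    }
    windowValue += entry.value;
    lastShiftTime = entry.startTime;
    maxWindowValue = Math.max(maxWindowValue, windowValue);
  }
  return maxWindowValue;
}

const sampleShifts = [
  { startTime: 100, value: 0.05 },
  { startTime: 600, value: 0.1 },
  { startTime: 3000, value: 0.2 },
  { startTime: 3200, value: 0.02, hadRecentInput: true },
  { startTime: 3500, value: 0.01 }
];
const sampleCLS = computeCLS(sampleShifts);
console.log(sampleCLS.toFixed(2));
```

Here the first two shifts form one window and the last two counted shifts form another, so the result is the larger window rather than the sum of all shifts.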
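- Long Tasks monitoring
// reportLongTask is assumed to be defined elsewhere.
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    // longtask entries already last 50 ms or more; the check is a safeguard
    if (entry.duration > 50) {
      reportLongTask(entry);
    }
  }
}).observe({ entryTypes: ['longtask'] });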
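Long task entries are the raw input for Total Blocking Time (TBT): the sum of each task's duration beyond 50 ms within a measurement window, conventionally from FCP to TTI. A minimal sketch under simplifying assumptions follows: `totalBlockingTime` is a hypothetical helper, the window bounds are passed in by the caller, and tasks that straddle a boundary are not clipped the way lab tools do.

```javascript
// Hypothetical helper: sum the blocking portion (time beyond thresholdMs)
// of long tasks that start inside [windowStart, windowEnd).
function totalBlockingTime(tasks, windowStart, windowEnd, thresholdMs = 50) {
  let total = 0;
  for (const task of tasks) {
    if (task.startTime < windowStart || task.startTime >= windowEnd) continue;
    total += Math.max(0, task.duration - thresholdMs);
  }
  return total;
}

const sampleTasks = [
  { startTime: 500, duration: 120 },  // before the window, ignored
  { startTime: 1200, duration: 70 },  // contributes 20 ms
  { startTime: 2000, duration: 300 }, // contributes 250 ms
  { startTime: 5000, duration: 200 }  // after the window, ignored
];
const sampleTBT = totalBlockingTime(sampleTasks, 1000, 4000);
console.log(sampleTBT);
```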
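VI. Error Filtering and Aggregation
# Example error fingerprint algorithm
import hashlib


def md5(text):
    """Return the hex MD5 digest of a string."""
    return hashlib.md5(text.encode('utf-8')).hexdigest()


def generate_error_fingerprint(error):
    stack = error.get('stack', '')
    if not stack:
        return md5(f"{error['msg']}:{error.get('filename', '')}")
    # Keep the top three stack frames. Splitting on '@' extracts the function
    # name from Firefox/Safari-style frames ("fn@url:line:col").
    frames = stack.split('\n')[:3]
    return md5(':'.join([
        error['msg'],
        *[frame.split('@')[0] for frame in frames]
    ]))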
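To illustrate the aggregation half of this section, the sketch below groups errors by a fingerprint key built from the message and the top frame function names. It is an assumption-laden variant rather than a port of the function above: it also parses V8-style frames (`at fn (url:line:col)`), it returns the readable key instead of hashing it, and the names `frameFunctionName`, `fingerprintKey`, and `aggregateErrors` are hypothetical.

```javascript
// Hypothetical helper: extract the function name from one stack frame line.
// Handles "fn@url:line:col" (Firefox/Safari) and "at fn (url:line:col)" (V8).
function frameFunctionName(line) {
  const trimmed = line.trim();
  if (trimmed.startsWith('at ')) {
    const rest = trimmed.slice(3);
    const parenIndex = rest.indexOf(' (');
    // "at url:line:col" carries no function name
    return parenIndex === -1 ? '<anonymous>' : rest.slice(0, parenIndex);
  }
  const atIndex = trimmed.indexOf('@');
  return atIndex > 0 ? trimmed.slice(0, atIndex) : '<anonymous>';
}

function fingerprintKey(error, depth = 3) {
  // V8 stacks begin with a message line; keep only lines that look like frames
  const frames = (error.stack || '')
    .split('\n')
    .filter((line) => line.trim().startsWith('at ') || line.includes('@'));
  if (frames.length === 0) return `${error.msg}:${error.filename || ''}`;
  return [error.msg, ...frames.slice(0, depth).map(frameFunctionName)].join(':');
}

// Count occurrences per fingerprint key
function aggregateErrors(errors) {
  const counts = new Map();
  for (const error of errors) {
    const key = fingerprintKey(error);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

const sampleErrors = [
  { msg: 'x is undefined', stack: 'TypeError: x is undefined\n    at render (https://a.com/app.js:10:5)\n    at update (https://a.com/app.js:20:7)' },
  { msg: 'x is undefined', stack: 'render@https://a.com/app.js:11:9\nupdate@https://a.com/app.js:25:3' },
  { msg: 'Script error.', filename: '' }
];
const sampleCounts = aggregateErrors(sampleErrors);
console.log([...sampleCounts.entries()]);
```

Because line and column numbers are dropped, the same bug reported from different builds or browsers can land in one group; the key could still be hashed before storage.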
VII. Key Techniques for Dashboard Visualization
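- ECharts real-time rendering optimization
// `dom` (the chart container) and `socket` (an open WebSocket) are assumed
// to exist; each message is assumed to carry a JSON array of points.
const chart = echarts.init(dom);
let data = [];

// Data pushed over WebSocket
socket.onmessage = (event) => {
  const newData = JSON.parse(event.data);
  // Sliding window: keep only the newest 100 points after merging
  data = [...data, ...newData].slice(-100);
  chart.setOption({
    series: [{
      data: data,
      type: 'line'
    }]
  });
};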
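Calling `setOption` on every WebSocket message can trigger more renders than the screen can show. A possible refinement is to buffer incoming points and render once per frame. The sketch below is illustrative only: `createBatchedUpdater` is a hypothetical helper, and both the render function and the scheduler are injected so the snippet runs outside a browser; in a page the scheduler could be `requestAnimationFrame` and the render function could call `chart.setOption`.

```javascript
// Hypothetical helper: buffer incoming points and render at most once per tick.
function createBatchedUpdater(render, schedule, windowSize = 100) {
  let data = [];
  let pending = [];
  let scheduled = false;

  return function push(points) {
    pending.push(...points);
    if (scheduled) return; // a render is already queued for this tick
    scheduled = true;
    schedule(() => {
      scheduled = false;
      // Merge the buffer, then keep only the newest windowSize points
      data = data.concat(pending).slice(-windowSize);
      pending = [];
      render(data);
    });
  };
}

// Manual scheduler standing in for requestAnimationFrame
const queuedFrames = [];
const renderedSnapshots = [];
const pushPoints = createBatchedUpdater(
  (points) => renderedSnapshots.push(points.slice()),
  (callback) => queuedFrames.push(callback),
  3
);
pushPoints([1, 2]);
pushPoints([3]);
pushPoints([4, 5]);
queuedFrames.forEach((callback) => callback()); // simulate one animation frame
console.log(renderedSnapshots);
```

Three pushes result in a single render containing the three newest points.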
- Grafana custom plugin development
package main

import (
	"context"
	"math/rand"
	"time"

	"github.com/grafana/grafana-plugin-sdk-go/backend"
	"github.com/grafana/grafana-plugin-sdk-go/data"
)

// Datasource handles query requests coming from Grafana.
type Datasource struct{}

func (d *Datasource) QueryData(ctx context.Context, req *backend.QueryDataRequest) (*backend.QueryDataResponse, error) {
	response := backend.NewQueryDataResponse()
	for _, q := range req.Queries {
		// Placeholder frame: one point with the current time and a random value
		frames := data.Frames{
			data.NewFrame("response",
				data.NewField("time", nil, []time.Time{time.Now()}),
				data.NewField("value", nil, []float64{rand.Float64()}),
			),
		}
		response.Responses[q.RefID] = backend.DataResponse{
			Frames: frames,
		}
	}
	return response, nil
}
VIII. Production Deployment Essentials
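- Nginx configuration for receiving logs
location /log-collector {
    access_log off;
    # log-processor is an upstream group defined elsewhere in the config
    proxy_pass http://log-processor;
    proxy_set_header X-Real-IP $remote_addr;
    # HTTP/1.1 with an empty Connection header enables upstream keepalive
    proxy_http_version 1.1;
    proxy_set_header Connection "";
}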
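- Kafka topic partitioning strategy
import java.util.Properties;

// Partition the topic by hour
Properties props = new Properties();
// partitioner.class must name a class that implements
// org.apache.kafka.clients.producer.Partitioner and is on the producer's
// classpath; the class below is assumed to be supplied by the project.
props.put("partitioner.class", "io.confluent.kafka.partitioner.HourlyPartitioner");
// The timestamp type is not a producer property. Set it on the topic
// (message.timestamp.type=LogAppendTime) or on the broker
// (log.message.timestamp.type=LogAppendTime).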
- ClickHouse table schema
CREATE TABLE frontend_logs (
    event_date Date,
    event_time DateTime,
    event_type String,
    session_id String,
    page_url String,
    metrics Nested(
        keys String,
        values Float32
    ),
    INDEX idx_session session_id TYPE bloom_filter GRANULARITY 3
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_type, event_time);
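To connect the SDK events to this table, the sketch below maps one event to a row object with the same column names. It is an illustration under several assumptions: `toClickHouseRow` is a hypothetical helper, the `sessionId`, `page`, and `metrics` event fields are invented for the example, timestamps are rendered in UTC, and the `Nested` column is written as the parallel arrays `metrics.keys` and `metrics.values`, which matches ClickHouse's default flattened representation.

```javascript
// Hypothetical mapper from an SDK event to a row for frontend_logs.
function pad2(n) {
  return String(n).padStart(2, '0');
}

function toClickHouseRow(event) {
  const d = new Date(event.timestamp);
  const date = `${d.getUTCFullYear()}-${pad2(d.getUTCMonth() + 1)}-${pad2(d.getUTCDate())}`;
  const time = `${pad2(d.getUTCHours())}:${pad2(d.getUTCMinutes())}:${pad2(d.getUTCSeconds())}`;
  const metrics = event.metrics || {};
  return {
    event_date: date,
    event_time: `${date} ${time}`,
    event_type: event.eventType,
    session_id: event.sessionId || '',
    page_url: event.page || '',
    // A Nested column is written as parallel arrays, one per sub-column
    'metrics.keys': Object.keys(metrics),
    'metrics.values': Object.values(metrics)
  };
}

const sampleRow = toClickHouseRow({
  eventType: 'LCP',
  timestamp: Date.UTC(2024, 0, 15, 9, 5, 7),
  sessionId: 's-1',
  page: 'https://example.com/',
  metrics: { LCP: 2100, CLS: 0.08 }
});
console.log(JSON.stringify(sampleRow));
```

Rows in this shape could be serialized one JSON object per line for a stream processor or a bulk insert.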
IX. Emerging Directions
- Web Vitals RUM (Real User Monitoring)
- Standardized collection with OpenTelemetry
- AI-based anomaly detection (LSTM models)
- Edge-computing preprocessing (Cloudflare Workers)
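As a point of comparison for the anomaly detection item, a far simpler statistical baseline than an LSTM is a rolling z-score: flag a point when it deviates from the mean of the preceding window by more than a chosen number of standard deviations. The sketch below shows that swapped-in technique, not an LSTM; `detectAnomalies`, the window size, and the threshold are all hypothetical choices.

```javascript
// Hypothetical detector: flag points that deviate from the trailing window
// mean by more than `threshold` standard deviations.
function detectAnomalies(series, windowSize = 5, threshold = 3) {
  const anomalies = [];
  for (let i = windowSize; i < series.length; i++) {
    const recent = series.slice(i - windowSize, i);
    const mean = recent.reduce((sum, v) => sum + v, 0) / windowSize;
    const variance = recent.reduce((sum, v) => sum + (v - mean) ** 2, 0) / windowSize;
    const std = Math.sqrt(variance);
    // With a perfectly flat window, any change at all counts as a deviation
    const deviates = std === 0
      ? series[i] !== mean
      : Math.abs(series[i] - mean) / std > threshold;
    if (deviates) anomalies.push({ index: i, value: series[i] });
  }
  return anomalies;
}

// Example: per-interval LCP values in milliseconds with one spike
const lcpSeries = [2000, 2100, 1900, 2050, 1950, 2000, 4000, 2020];
const lcpAnomalies = detectAnomalies(lcpSeries);
console.log(lcpAnomalies);
```

A baseline like this is easy to run in a stream processor and gives a reference point for judging whether a learned model adds value.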
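Architect's note: The ultimate goal of a monitoring system is not to collect data but to close a "sense, decide, act" loop. A system becomes genuinely intelligent only once it can recognize performance regression patterns on its own and trigger a remediation. Start with the core business metrics, then build out the "observability pyramid" step by step (metrics → logs → traces).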