Monitor your ZeroClaw agents with comprehensive logging, metrics, and alerting.
Observability Stack
ZeroClaw provides multiple observability layers:
- Logs: structured logging with tracing
- Metrics: Prometheus metrics endpoint
- Health Checks: built-in diagnostics
- Tracing: OpenTelemetry support
Logging
Log Levels
Control verbosity with the RUST_LOG environment variable:
# Error only
export RUST_LOG=error
# Info (default)
export RUST_LOG=info
# Debug
export RUST_LOG=debug
# Trace (very verbose)
export RUST_LOG=trace
# Module-specific
export RUST_LOG=zeroclaw::agent=debug,zeroclaw::tools=trace
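To make the module-specific form concrete, here is a minimal sketch of how a comma-separated directive string can be resolved for a given log target. It is a simplified model written for illustration, not ZeroClaw's code: the function names `parse_directives` and `is_enabled` are hypothetical, and the real filter syntax used by Rust logging crates supports more than this (for example span and field filters).

```python
# Simplified model of RUST_LOG directive matching (illustrative assumption,
# not ZeroClaw code). Levels are ordered from least to most verbose.
LEVELS = ["error", "warn", "info", "debug", "trace"]


def parse_directives(spec):
    """Split 'a::b=debug,info' into {target_prefix: level}; '' is the global default."""
    directives = {}
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "=" in part:
            target, level = part.split("=", 1)
            directives[target] = level.lower()
        elif part.lower() in LEVELS:
            directives[""] = part.lower()
        else:
            # A bare target with no level enables everything for that target.
            directives[part] = "trace"
    return directives


def is_enabled(directives, target, level):
    """Pick the longest matching target prefix, then compare verbosity."""
    best = None
    for prefix in directives:
        matches = prefix == "" or target == prefix or target.startswith(prefix + "::")
        if matches and (best is None or len(prefix) > len(best)):
            best = prefix
    if best is None:
        return False  # no directive covers this target
    return LEVELS.index(level) <= LEVELS.index(directives[best])


d = parse_directives("zeroclaw::agent=debug,zeroclaw::tools=trace")
print(is_enabled(d, "zeroclaw::tools::shell", "trace"))  # covered by zeroclaw::tools
print(is_enabled(d, "zeroclaw::agent", "trace"))         # agent is capped at debug
```

Under this model, a target that no directive covers (such as `zeroclaw::security` in the example string) produces no output unless a global level is also listed, for example `RUST_LOG=info,zeroclaw::tools=trace`.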
Logs use a structured format with timestamps:
2026-03-03T12:00:00.000Z INFO zeroclaw::agent: Starting agent loop session_id="abc123"
2026-03-03T12:00:01.234Z DEBUG zeroclaw::tools::shell: Executing command cmd="ls -la"
2026-03-03T12:00:01.567Z WARN zeroclaw::security: Blocked command attempt cmd="rm -rf /"
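If you post-process these lines yourself, they can be split into timestamp, level, target, message, and key="value" fields. The sketch below assumes every line follows exactly the layout of the three samples above; the regular expressions and the `parse_line` helper are illustrative and do not handle escaped quotes or multi-line messages.

```python
import re

# Layout assumed from the sample lines: <timestamp> <LEVEL> <target>: <message> key="value"...
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<target>[\w:]+):\s+(?P<rest>.*)$"
)
FIELD_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_line(line):
    """Return a dict for one log line, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rest = m.group("rest")
    fields = dict(FIELD_RE.findall(rest))
    # Whatever remains after removing key="value" pairs is the message.
    message = FIELD_RE.sub("", rest).strip()
    return {
        "timestamp": m.group("ts"),
        "level": m.group("level"),
        "target": m.group("target"),
        "message": message,
        "fields": fields,
    }


sample = '2026-03-03T12:00:01.567Z WARN zeroclaw::security: Blocked command attempt cmd="rm -rf /"'
record = parse_line(sample)
print(record["level"], record["target"], record["fields"])
```

A filter such as `record["level"] == "WARN" and record["target"].startswith("zeroclaw::security")` is then enough to pull out blocked-command events.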
Log Aggregation
Ship logs to aggregation services:
Grafana Loki
# Install promtail
wget https://github.com/grafana/loki/releases/download/v2.9.0/promtail-linux-amd64.zip
unzip promtail-linux-amd64.zip
# Configure promtail.yaml
cat > promtail.yaml <<EOF
server:
  http_listen_port: 9080
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: zeroclaw
    static_configs:
      - targets:
          - localhost
        labels:
          job: zeroclaw
          __path__: /home/user/.zeroclaw/logs/*.log
EOF
# Start promtail (the archive extracts to promtail-linux-amd64)
./promtail-linux-amd64 -config.file=promtail.yaml
Elasticsearch
Use Filebeat to ship logs:
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /home/user/.zeroclaw/logs/*.log
    fields:
      service: zeroclaw
output.elasticsearch:
  hosts: ["localhost:9200"]
Metrics
Prometheus Endpoint
Enable metrics in config.toml:
[observability]
prometheus_enabled = true
prometheus_port = 9090
prometheus_path = "/metrics"
Metrics are then available at http://localhost:9090/metrics
Key Metrics
zeroclaw_requests_total{channel="telegram",status="success"} 1234
zeroclaw_request_duration_seconds{quantile="0.95"} 0.234
zeroclaw_provider_requests_total{provider="anthropic"} 890
zeroclaw_token_usage_total{provider="anthropic",type="input"} 456789
zeroclaw_token_usage_total{provider="anthropic",type="output"} 123456
zeroclaw_errors_total{type="provider_error"} 12
zeroclaw_errors_total{type="tool_failure"} 5
zeroclaw_errors_total{type="security_block"} 3
zeroclaw_memory_bytes 5242880
zeroclaw_cpu_usage_percent 12.5
zeroclaw_goroutines 42
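For quick scripts that read the endpoint directly, the samples above can be parsed and summed by label. This is a minimal sketch of the Prometheus text format that covers only simple `name{labels} value` lines like the ones shown; `parse_metrics` and `total` are hypothetical helper names, and label values containing escaped quotes or `}` are not handled. For anything beyond a quick check, a full parser such as the one shipped with the `prometheus_client` package is likely a better fit.

```python
import re

# Matches: metric_name{label="value",...} 123  (the label block is optional)
SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)"
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels_dict, value) tuples, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_RE.match(line)
        if m is None:
            continue
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        samples.append((m.group("name"), labels, float(m.group("value"))))
    return samples


def total(samples, name, **match):
    """Sum every sample of `name` whose labels include all of `match`."""
    return sum(
        value
        for sample_name, labels, value in samples
        if sample_name == name
        and all(labels.get(key) == wanted for key, wanted in match.items())
    )


text = """
zeroclaw_errors_total{type="provider_error"} 12
zeroclaw_errors_total{type="tool_failure"} 5
zeroclaw_errors_total{type="security_block"} 3
zeroclaw_token_usage_total{provider="anthropic",type="input"} 456789
zeroclaw_token_usage_total{provider="anthropic",type="output"} 123456
"""
samples = parse_metrics(text)
print(total(samples, "zeroclaw_errors_total"))
print(total(samples, "zeroclaw_token_usage_total", provider="anthropic"))
```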
Prometheus Configuration
Add ZeroClaw to Prometheus scrape config:
scrape_configs:
  - job_name: 'zeroclaw'
    static_configs:
      - targets: ['localhost:9090']
    scrape_interval: 15s
Grafana Dashboard
Import the ZeroClaw dashboard:
# Download dashboard JSON
wget https://raw.githubusercontent.com/zeroclaw-labs/zeroclaw/main/grafana-dashboard.json
# Import in Grafana UI:
# Dashboards > Import > Upload JSON file
The dashboard includes:
- Request rate and latency
- Tool execution statistics
- Token usage and costs
- Error rates
- System resources
OpenTelemetry
Export traces and metrics to OTLP collectors:
[observability]
otel_enabled = true
otel_endpoint = "http://localhost:4317"
otel_service_name = "zeroclaw-prod"
Build with OTLP support:
cargo build --features observability-otel
Trace Example
SPAN: agent_loop
  SPAN: provider_call provider=anthropic
    SPAN: http_request duration=234ms
  SPAN: tool_execution tool=shell
    SPAN: security_check duration=2ms
    SPAN: runtime_exec duration=1234ms
Health Checks
Liveness Probe
Check if the service is running:
curl http://localhost:3000/health
Response:
{
  "status": "healthy",
  "uptime_seconds": 86400,
  "version": "0.1.8"
}
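For a scripted check outside Kubernetes, the response body can be validated rather than relying on the HTTP status alone. The sketch below assumes the JSON shape shown above and treats any value other than "healthy" as a failure; `is_healthy` and `check` are hypothetical helper names, and the default URL is the one from the curl example.

```python
import json
import urllib.request


def is_healthy(body):
    """Return True only if the body is JSON with status == "healthy"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def check(url="http://localhost:3000/health", timeout=2.0):
    """Fetch the endpoint; network errors and timeouts count as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # URLError, HTTPError and socket timeouts are OSError subclasses;
        # ValueError covers undecodable response bodies.
        return False


print(is_healthy('{"status": "healthy", "uptime_seconds": 86400, "version": "0.1.8"}'))
```

Calling `check()` from a cron job or a container entrypoint and exiting non-zero when it returns False gives roughly the same signal as the liveness probe.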
Readiness Probe
Check if the service is ready to handle requests:
curl http://localhost:3000/ready
Checks:
- Provider connectivity
- Channel health
- Memory backend
Kubernetes Probes
livenessProbe:
  httpGet:
    path: /health
    port: 3000
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 5
Alerting
Prometheus Alerts
Create alerting rules:
groups:
  - name: zeroclaw
    rules:
      - alert: HighErrorRate
        expr: rate(zeroclaw_errors_total[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate detected"
      - alert: HighMemoryUsage
        expr: zeroclaw_memory_bytes > 500000000
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Memory usage above 500MB"
      - alert: ProviderDown
        expr: zeroclaw_provider_health == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Provider unreachable"
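The HighErrorRate threshold is easier to tune once the arithmetic is explicit: `rate(...[5m])` is the average per-second increase of the counter over the window, so 0.1 corresponds to roughly 30 errors in five minutes. Because the expression has no aggregation, it is evaluated separately for each `type` label. The sketch below is a simplified model with a hypothetical `per_second_rate` helper; Prometheus itself also extrapolates to the window boundaries, which is omitted here.

```python
def per_second_rate(samples):
    """Average per-second increase of a counter.

    samples: list of (unix_timestamp, counter_value), oldest first.
    A drop in value is treated as a counter reset (process restart).
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so the new value
        # itself is the increase since the previous sample.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


THRESHOLD = 0.1  # errors per second, from the HighErrorRate rule

# 45 new errors over a 300-second window
window = [(0, 100), (300, 145)]
rate = per_second_rate(window)
print(rate, rate > THRESHOLD)
```

With `for: 5m`, the condition then has to stay true for five further minutes before the alert fires.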
Notification Channels
Configure Alertmanager:
receivers:
  - name: 'slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
        channel: '#zeroclaw-alerts'
  - name: 'pagerduty'
    pagerduty_configs:
      - service_key: 'YOUR_SERVICE_KEY'
Cost Tracking
Monitor API costs:
Output:
Provider  | Input Tokens | Output Tokens | Estimated Cost
----------|--------------|---------------|---------------
Anthropic | 1,234,567    | 456,789       | $12.34
OpenAI    | 567,890      | 234,567       | $8.90
Cost metrics:
zeroclaw_cost_total{provider="anthropic"} 12.34
zeroclaw_cost_total{provider="openai"} 8.90
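An estimate like this is typically token counts multiplied by a per-token price. The sketch below shows that arithmetic only; the price table is a made-up placeholder, not real provider pricing and not the rates ZeroClaw uses, so it will not reproduce the figures above. `estimate_cost` is a hypothetical helper name.

```python
# Placeholder prices in USD per one million tokens (hypothetical values,
# chosen only to make the arithmetic easy to follow).
PRICES_PER_MILLION = {
    "anthropic": {"input": 2.00, "output": 10.00},
    "openai": {"input": 1.00, "output": 4.00},
}


def estimate_cost(provider, input_tokens, output_tokens, prices=PRICES_PER_MILLION):
    """Return the estimated cost in USD for one provider."""
    rate = prices[provider]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000


# 1M input tokens and 500k output tokens at the placeholder rates
print(round(estimate_cost("anthropic", 1_000_000, 500_000), 2))
```

Feeding the function with the `zeroclaw_token_usage_total` values per provider and `type` gives a figure you can compare against `zeroclaw_cost_total`.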
Query Latency
Track provider response times:
zeroclaw_provider_latency_seconds{provider="anthropic",quantile="0.5"} 0.234
zeroclaw_provider_latency_seconds{provider="anthropic",quantile="0.95"} 0.567
zeroclaw_provider_latency_seconds{provider="anthropic",quantile="0.99"} 0.890
Monitor tool performance:
zeroclaw_tool_duration_seconds{tool="shell"} histogram
zeroclaw_tool_duration_seconds{tool="file_read"} histogram
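Histogram metrics expose cumulative bucket counts rather than ready-made quantiles, so percentiles are estimated at query time (in PromQL, with `histogram_quantile`). The sketch below illustrates the linear-interpolation idea behind that estimate. The bucket bounds and counts are invented for the example, since the buckets ZeroClaw exports are not listed here, and the `histogram_quantile` function is a simplified model rather than Prometheus's exact implementation.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by bound,
    with the last bound equal to math.inf (the "+Inf" bucket).
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The quantile falls in the +Inf bucket: report the highest finite bound.
                return lower_bound
            if count == lower_count:
                return upper_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical buckets for tool="shell": 50 calls <= 0.1s, 90 <= 0.5s, 98 <= 1s, 100 total
shell_buckets = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
print(histogram_quantile(0.95, shell_buckets))
```

The estimate is only as fine-grained as the bucket layout: a quantile that lands in the +Inf bucket can only be reported as the largest finite bound.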