How It Works: Apache APISIX Prometheus Plugin

August 5, 2025

Overview

The Prometheus plugin in Apache APISIX provides comprehensive monitoring capabilities by collecting metrics about HTTP requests, system health, and upstream services. It exposes these metrics in Prometheus format for integration with monitoring and alerting systems.

Key Features

  • Request Metrics: HTTP status codes, latencies, bandwidth usage
  • System Metrics: NGINX connections, ETCD connectivity, shared dictionary usage
  • Upstream Metrics: Health check status of backend services
  • Custom Labels: Support for additional labels using NGINX variables
  • Configurable Export: Flexible endpoint configuration and metric prefixes

Architecture Diagram

  graph TB
    %% Client requests and initialization
    Client[Client Request] --> APISIX[APISIX Gateway]
    
    %% Plugin initialization
    subgraph "Plugin Initialization"
        Init[Plugin Init Phase] --> Schema[Load Schema & Config]
        Schema --> Metrics[Initialize Metrics Objects]
        Metrics --> Exporter[Setup Prometheus Exporter]
    end
    
    %% Request processing flow
    APISIX --> Router[Route Matching]
    Router --> Plugin[Prometheus Plugin]
    
    subgraph "Request Processing"
        Plugin --> Upstream[Forward to Upstream]
        Upstream --> Response[Receive Response]
        Response --> LogPhase[Log Phase]
    end
    
    %% Metrics collection during log phase
    subgraph "Metrics Collection (Log Phase)"
        LogPhase --> StatusMetric[HTTP Status Counter]
        LogPhase --> LatencyMetric[Latency Histogram]
        LogPhase --> BandwidthMetric[Bandwidth Counter]
        
        StatusMetric --> Labels1[Labels: code, route, service, consumer, node]
        LatencyMetric --> Labels2[Labels: type, route, service, consumer, node]
        BandwidthMetric --> Labels3[Labels: type, route, service, consumer, node]
    end
    
    %% System metrics collection
    subgraph "System Metrics Collection"
        SystemJob[Background Collection] --> NGINXStatus[NGINX Status]
        SystemJob --> ETCDStatus[ETCD Connectivity]
        SystemJob --> SharedDict[Shared Dict Usage]
        SystemJob --> UpstreamHealth[Upstream Health Status]
        SystemJob --> NodeInfo[Node Information]
    end
    
    %% Metrics exposure
    subgraph "Metrics Exposure"
        PrometheusEndpoint[Prometheus Metrics Endpoint] --> Collect[collect function]
        Collect --> SystemJob
        Collect --> FormatMetrics[Format Prometheus Text]
        FormatMetrics --> MetricsResponse[Return Metrics Data]
    end
    
    %% External monitoring
    PrometheusServer[Prometheus Server] --> PrometheusEndpoint
    Grafana[Grafana Dashboard] --> PrometheusServer
    AlertManager[Alert Manager] --> PrometheusServer
    
    %% Configuration
    subgraph "Configuration Options"
        Config[config.yaml] --> ExportURI[export_uri setting]
        Config --> MetricPrefix[metric_prefix setting]
        Config --> ExportServer[enable_export_server setting]
        Config --> ExtraLabels[extra_labels configuration]
        Config --> Buckets[default_buckets for histograms]
    end
    
    %% Styling with better contrast
    classDef clientStyle fill:#4CAF50,stroke:#333,stroke-width:2px,color:#fff
    classDef pluginStyle fill:#2196F3,stroke:#333,stroke-width:2px,color:#fff
    classDef metricsStyle fill:#FF9800,stroke:#333,stroke-width:2px,color:#fff
    classDef configStyle fill:#9C27B0,stroke:#333,stroke-width:2px,color:#fff
    classDef exposureStyle fill:#F44336,stroke:#333,stroke-width:2px,color:#fff
    
    class Client clientStyle
    class Plugin,LogPhase,StatusMetric,LatencyMetric,BandwidthMetric pluginStyle
    class SystemJob,NGINXStatus,ETCDStatus,SharedDict,UpstreamHealth,NodeInfo metricsStyle
    class Config,ExportURI,MetricPrefix,ExportServer,ExtraLabels,Buckets configStyle
    class PrometheusEndpoint,Collect,FormatMetrics,MetricsResponse,PrometheusServer,Grafana,AlertManager exposureStyle

Core Components

1. Plugin Structure

  • Main Plugin File: apisix/plugins/prometheus.lua
  • Exporter Module: apisix/plugins/prometheus/exporter.lua
  • Priority: 500 (medium priority in plugin execution order)
  • Execution Phase: Log phase (after response is sent)

2. Key Files Overview

  • apisix/plugins/prometheus.lua: Main plugin registration and schema
  • apisix/plugins/prometheus/exporter.lua: Metrics collection and exposition logic
  • docs/en/latest/plugins/prometheus.md: Documentation and examples

Step-by-Step Workflow

Step 1: Plugin Registration and Initialization

Location: apisix/plugins/prometheus.lua:33-40

local _M = {
    version = 0.2,
    priority = 500,
    name = plugin_name,
    log  = exporter.http_log,
    schema = schema,
    run_policy = "prefer_route",
}

What happens:

  1. Plugin registers with APISIX framework
  2. Sets priority to 500 (determines execution order)
  3. Defines log phase handler pointing to exporter.http_log
  4. Uses the "prefer_route" run policy (when the plugin is enabled both globally and on a route, only the route-level configuration runs)

Step 2: Metrics Objects Creation

Location: apisix/plugins/prometheus/exporter.lua:112-210

Process:

  1. Initialize Prometheus Library: Creates prometheus instance with metric prefix
  2. Create Metric Objects: Defines counters, histograms, and gauges
  3. Set Default Buckets: Configures histogram buckets for latency measurements

Key Metrics Created:

-- Request metrics
metrics.status = prometheus:counter("http_status", ...)
metrics.latency = prometheus:histogram("http_latency", ...)  
metrics.bandwidth = prometheus:counter("bandwidth", ...)

-- System metrics
metrics.connections = prometheus:gauge("nginx_http_current_connections", ...)
metrics.etcd_reachable = prometheus:gauge("etcd_reachable", ...)
metrics.upstream_status = prometheus:gauge("upstream_status", ...)

Step 3: Request Processing Flow

Sequence:

  1. Client Request → APISIX receives HTTP request
  2. Route Matching → APISIX finds matching route configuration
  3. Plugin Loading → Prometheus plugin is loaded if enabled on route
  4. Upstream Forwarding → Request forwarded to backend service
  5. Response Processing → Response received and sent back to client
  6. Log Phase Execution → Prometheus plugin collects metrics

Step 4: Metrics Collection (Log Phase)

Location: apisix/plugins/prometheus/exporter.lua:237-296

Detailed Process:

4.1 Extract Request Context

function _M.http_log(conf, ctx)
    local vars = ctx.var
    -- the matched route is populated in the request context during routing
    local matched_route = ctx.matched_route and ctx.matched_route.value
    local route_id = matched_route.id
    local service_id = matched_route.service_id
    local consumer_name = ctx.consumer_name or ""
    local balancer_ip = ctx.balancer_ip or ""

4.2 Record HTTP Status Metrics

metrics.status:inc(1,
    gen_arr(vars.status, route_id, matched_uri, matched_host,
            service_id, consumer_name, balancer_ip,
            unpack(extra_labels("http_status", ctx))))

4.3 Record Latency Metrics

local latency, upstream_latency, apisix_latency = latency_details(ctx)

-- Total request latency
metrics.latency:observe(latency,
    gen_arr("request", route_id, service_id, consumer_name, balancer_ip, ...))

-- Upstream response time
metrics.latency:observe(upstream_latency,
    gen_arr("upstream", route_id, service_id, consumer_name, balancer_ip, ...))

-- APISIX processing time
metrics.latency:observe(apisix_latency,
    gen_arr("apisix", route_id, service_id, consumer_name, balancer_ip, ...))

4.4 Record Bandwidth Metrics

-- Ingress traffic (request size)
metrics.bandwidth:inc(vars.request_length,
    gen_arr("ingress", route_id, service_id, consumer_name, balancer_ip, ...))

-- Egress traffic (response size)  
metrics.bandwidth:inc(vars.bytes_sent,
    gen_arr("egress", route_id, service_id, consumer_name, balancer_ip, ...))

Step 5: System Metrics Collection

Location: apisix/plugins/prometheus/exporter.lua:441-503

Background Collection Process:

5.1 NGINX Status Collection

local function nginx_status()
    local res = ngx_capture("/apisix/nginx_status")
    -- Parses: Active connections, server accepts/handled/requests, Reading/Writing/Waiting
    for _, name in ipairs(ngx_status_items) do
        -- "val" holds the number parsed from the status body for this item
        -- (the regex extraction is omitted from this excerpt)
        if name == "total" then
            metrics.requests:set(val[0])
        else
            metrics.connections:set(val[0], {name})
        end
    end
end

5.2 ETCD Connectivity Check

local function etcd_modify_index()
    -- Check routes, services, consumers, etc.
    local routes, routes_ver = get_routes()
    local services, services_ver = get_services()
    
    -- Set modify indexes for each resource type
    metrics.etcd_modify_indexes:set(max_idx, {key})
end

5.3 Shared Dictionary Monitoring

local function shared_dict_status()
    for shared_dict_name, shared_dict in pairs(ngx.shared) do
        metrics.shared_dict_capacity_bytes:set(shared_dict:capacity(), {shared_dict_name})
        metrics.shared_dict_free_space_bytes:set(shared_dict:free_space(), {shared_dict_name})
    end
end

5.4 Upstream Health Status

local stats = control.get_health_checkers()
for _, stat in ipairs(stats) do
    for _, node in ipairs(stat.nodes) do
        metrics.upstream_status:set(
            (node.status == "healthy" or node.status == "mostly_healthy") and 1 or 0,
            gen_arr(stat.name, node.ip, node.port)
        )
    end
end

Step 6: Metrics Exposition

Default Endpoint: http://127.0.0.1:9091/apisix/prometheus/metrics

Location: apisix/plugins/prometheus/exporter.lua:507-529

6.1 API Endpoint Definition

local function get_api(called_by_api_router)
    local export_uri = default_export_uri  -- "/apisix/prometheus/metrics"
    local attr = plugin.plugin_attr("prometheus")
    if attr and attr.export_uri then
        export_uri = attr.export_uri
    end

    local api = {
        methods = {"GET"},
        uri = export_uri,
        handler = collect  -- Main collection function
    }
    return {api}
end

6.2 Metrics Collection and Formatting

local function collect(ctx, stream_only)
    -- Collect all system metrics
    shared_dict_status()
    nginx_status()
    etcd_modify_index()
    
    -- Update upstream health metrics
    local stats = control.get_health_checkers()
    -- ... (update upstream_status metrics)
    
    -- Format and return metrics
    core.response.set_header("content_type", "text/plain")
    return 200, core.table.concat(prometheus:metric_data())
end

Step 7: External Integration

Prometheus Server Setup:

  1. Scrape Configuration: Add the APISIX endpoint to prometheus.yml (a minimal scrape job is shown after this list)
  2. Data Collection: Prometheus scrapes metrics at regular intervals
  3. Storage: Metrics stored in Prometheus time-series database
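
A minimal scrape job for this step could look like the following; the target and path assume the default export server address (127.0.0.1:9091) and the default export_uri:

# prometheus.yml -- minimal job assuming APISIX defaults
scrape_configs:
- job_name: 'apisix'
  metrics_path: /apisix/prometheus/metrics
  static_configs:
  - targets: ['127.0.0.1:9091']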

Grafana Dashboard:

  1. Data Source: Configure Prometheus as a data source (a provisioning sketch follows below)
  2. Visualization: Create charts and graphs using collected metrics
  3. Alerting: Set up alerts based on metric thresholds
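
If you provision the data source from a file rather than the Grafana UI, a minimal provisioning file might look like this; the Prometheus URL is an assumption for your environment:

# grafana/provisioning/datasources/apisix.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://127.0.0.1:9090    # adjust to where your Prometheus server runs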

Code Deep Dive

Plugin Schema Definition

Location: apisix/plugins/prometheus.lua:22-30

local schema = {
    type = "object",
    properties = {
        prefer_name = {
            type = "boolean",
            default = false
        }
    },
}

Purpose:

  • prefer_name: When true, uses route/service names instead of IDs in metrics labels

Label Generation Logic

Location: apisix/plugins/prometheus/exporter.lua:76-97

local function extra_labels(name, ctx)
    clear_tab(extra_labels_tbl)
    local attr = plugin.plugin_attr("prometheus")
    local metrics = attr.metrics

    if metrics and metrics[name] and metrics[name].extra_labels then
        local labels = metrics[name].extra_labels
        for _, kv in ipairs(labels) do
            -- each kv is a single-pair table such as { upstream_addr = "$upstream_addr" }
            local val, v = next(kv)
            if ctx then
                val = ctx.var[v:sub(2)]  -- Extract NGINX variable
                if val == nil then
                    val = ""
                end
            end
            core.table.insert(extra_labels_tbl, val)
        end
    end
    return extra_labels_tbl
end

Purpose: Adds custom labels to metrics using NGINX variables

Latency Calculation

Location: apisix/utils/log-util.lua, function latency_details_in_ms

local latency, upstream_latency, apisix_latency = latency_details(ctx)

Latency Types:

  • Request Latency: Total time from first byte read to last byte sent
  • Upstream Latency: Time waiting for upstream response
  • APISIX Latency: Processing time inside APISIX itself (total request latency minus upstream latency; see the sketch below)
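
The real computation lives in the log-util helper above; conceptually it reduces to something like this simplified sketch (not the literal library code):

-- Simplified sketch of the latency decomposition (not the exact log-util implementation)
local function latency_details(ctx)
    -- total time APISIX has spent on the request so far, in milliseconds
    local latency = (ngx.now() - ngx.req.start_time()) * 1000

    -- time spent waiting on the upstream, as reported by NGINX
    local upstream_latency = (tonumber(ctx.var.upstream_response_time) or 0) * 1000

    -- whatever remains is attributed to APISIX itself (never negative)
    local apisix_latency = math.max(latency - upstream_latency, 0)

    return latency, upstream_latency, apisix_latency
end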

Configuration Guide

Basic Configuration

File: conf/config.yaml

plugin_attr:
  prometheus:
    export_uri: /apisix/prometheus/metrics    # Metrics endpoint URI
    metric_prefix: apisix_                    # Prefix for all metrics
    enable_export_server: true               # Enable dedicated metrics server
    export_addr:                             # Metrics server address
      ip: 127.0.0.1                         # Bind IP
      port: 9091                             # Bind port

Advanced Configuration

plugin_attr:
  prometheus:
    metrics:                                 # Custom metric configurations
      http_status:                          # HTTP status metrics
        extra_labels:                       # Additional labels
          - upstream_addr: $upstream_addr   # Add upstream IP label
          - route_name: $route_name         # Add route name label
        expire: 0                           # Metric expiration (0 = never)
      http_latency:                         # Latency metrics
        extra_labels:
          - upstream_addr: $upstream_addr
        expire: 3600                        # Expire after 1 hour
      bandwidth:                            # Bandwidth metrics
        extra_labels:
          - upstream_addr: $upstream_addr
        expire: 0
    default_buckets:                        # Histogram buckets for latency
      - 1                                   # 1ms
      - 5                                   # 5ms
      - 10                                  # 10ms
      - 50                                  # 50ms
      - 100                                 # 100ms
      - 500                                 # 500ms
      - 1000                                # 1s
      - 5000                                # 5s
      - 10000                               # 10s

Route-Level Configuration

curl "http://127.0.0.1:9180/apisix/admin/routes/1" -X PUT \
  -H "X-API-KEY: edd1c9f034335f136f87ad84b625c8f1" \
  -d '{
    "uri": "/api/*",
    "plugins": {
      "prometheus": {
        "prefer_name": true
      }
    },
    "upstream": {
      "nodes": {
        "httpbin.org:80": 1
      }
    }
  }'

Metrics Reference

Core HTTP Metrics

  • apisix_http_status (Counter): HTTP response status codes. Labels: code, route, matched_uri, matched_host, service, consumer, node
  • apisix_http_latency (Histogram): Request latency in milliseconds. Labels: type, route, service, consumer, node
  • apisix_bandwidth (Counter): Traffic in bytes. Labels: type, route, service, consumer, node

System Metrics

  • apisix_nginx_http_current_connections (Gauge): Current HTTP connections. Labels: state
  • apisix_http_requests_total (Gauge): Total HTTP requests. Labels: none
  • apisix_etcd_reachable (Gauge): ETCD connectivity status. Labels: none
  • apisix_etcd_modify_indexes (Gauge): ETCD modification indexes. Labels: key
  • apisix_shared_dict_capacity_bytes (Gauge): Shared dictionary capacity. Labels: name
  • apisix_shared_dict_free_space_bytes (Gauge): Shared dictionary free space. Labels: name
  • apisix_upstream_status (Gauge): Upstream node health status. Labels: name, ip, port
  • apisix_node_info (Gauge): APISIX node information. Labels: hostname, version

Stream Metrics (L4 Proxy)

  • apisix_stream_connection_total (Counter): TCP/UDP connections handled. Labels: route

Metric Label Details

HTTP Status Labels

  • code: HTTP response code (200, 404, 500, etc.)
  • route: Route ID or name (based on prefer_name setting)
  • matched_uri: URI pattern that matched the request
  • matched_host: Host header that matched the request
  • service: Service ID or name
  • consumer: Consumer name (if authenticated)
  • node: Upstream node IP address

Latency Type Labels

  • request: Total request processing time
  • upstream: Time waiting for upstream response
  • apisix: APISIX internal processing time

Bandwidth Type Labels

  • ingress: Incoming traffic (request size)
  • egress: Outgoing traffic (response size)
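
These labels can be combined directly in PromQL; for example (metric names assume the default apisix_ prefix):

# average APISIX-internal latency per route over the last 5 minutes
sum(rate(apisix_http_latency_sum{type="apisix"}[5m])) by (route)
  / sum(rate(apisix_http_latency_count{type="apisix"}[5m])) by (route)

# egress bandwidth per route, in bytes per second
sum(rate(apisix_bandwidth{type="egress"}[5m])) by (route)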

Hands-on Examples

Example 1: Basic Metrics Collection

Step 1: Enable prometheus plugin on a route

curl "http://127.0.0.1:9180/apisix/admin/routes/1" -X PUT \
  -H "X-API-KEY: edd1c9f034335f136f87ad84b625c8f1" \
  -d '{
    "uri": "/get",
    "plugins": {
      "prometheus": {}
    },
    "upstream": {
      "nodes": {
        "httpbin.org:80": 1
      }
    }
  }'

Step 2: Send test requests

for i in {1..10}; do
  curl "http://127.0.0.1:9080/get"
done

Step 3: View metrics

curl "http://127.0.0.1:9091/apisix/prometheus/metrics"

Expected Output:

# HELP apisix_http_status HTTP status codes per service in APISIX
# TYPE apisix_http_status counter
apisix_http_status{code="200",route="1",matched_uri="/get",matched_host="",service="",consumer="",node="54.237.103.220"} 10

# HELP apisix_http_latency HTTP request latency in milliseconds per service in APISIX  
# TYPE apisix_http_latency histogram
apisix_http_latency_bucket{type="request",route="1",service="",consumer="",node="54.237.103.220",le="1"} 0
apisix_http_latency_bucket{type="request",route="1",service="",consumer="",node="54.237.103.220",le="2"} 0
apisix_http_latency_bucket{type="request",route="1",service="",consumer="",node="54.237.103.220",le="5"} 0
apisix_http_latency_bucket{type="request",route="1",service="",consumer="",node="54.237.103.220",le="10"} 2
apisix_http_latency_bucket{type="request",route="1",service="",consumer="",node="54.237.103.220",le="+Inf"} 10
apisix_http_latency_sum{type="request",route="1",service="",consumer="",node="54.237.103.220"} 847.234
apisix_http_latency_count{type="request",route="1",service="",consumer="",node="54.237.103.220"} 10

Example 2: Custom Labels Configuration

Step 1: Configure extra labels in config.yaml

plugin_attr:
  prometheus:
    metrics:
      http_status:
        extra_labels:
          - upstream_addr: $upstream_addr
          - route_name: $route_name

Step 2: Reload APISIX

apisix reload

Step 3: Create route with name

curl "http://127.0.0.1:9180/apisix/admin/routes/1" -X PUT \
  -H "X-API-KEY: edd1c9f034335f136f87ad84b625c8f1" \
  -d '{
    "name": "test-api",
    "uri": "/get",
    "plugins": {
      "prometheus": {
        "prefer_name": true
      }
    },
    "upstream": {
      "nodes": {
        "httpbin.org:80": 1
      }
    }
  }'

Step 4: Test and view metrics

curl "http://127.0.0.1:9080/get"
curl "http://127.0.0.1:9091/apisix/prometheus/metrics" | grep http_status

Expected Output:

apisix_http_status{code="200",route="test-api",matched_uri="/get",matched_host="",service="",consumer="",node="54.237.103.220",upstream_addr="54.237.103.220:80",route_name="test-api"} 1

Example 3: Upstream Health Monitoring

Step 1: Create route with health checks

curl "http://127.0.0.1:9180/apisix/admin/routes/1" -X PUT \
  -H "X-API-KEY: edd1c9f034335f136f87ad84b625c8f1" \
  -d '{
    "uri": "/get",
    "plugins": {
      "prometheus": {}
    },
    "upstream": {
      "type": "roundrobin",
      "nodes": {
        "httpbin.org:80": 1,
        "127.0.0.1:20001": 1
      },
      "checks": {
        "active": {
          "timeout": 5,
          "http_path": "/status",
          "healthy": {
            "interval": 2,
            "successes": 1
          },
          "unhealthy": {
            "interval": 1,
            "http_failures": 2
          }
        }
      }
    }
  }'

Step 2: Wait for health checks and view metrics

sleep 10
curl "http://127.0.0.1:9091/apisix/prometheus/metrics" | grep upstream_status

Expected Output:

# HELP apisix_upstream_status upstream status from health check
# TYPE apisix_upstream_status gauge  
apisix_upstream_status{name="/apisix/routes/1",ip="54.237.103.220",port="80"} 1
apisix_upstream_status{name="/apisix/routes/1",ip="127.0.0.1",port="20001"} 0

Example 4: Stream (L4) Metrics

Step 1: Configure stream proxy in config.yaml

apisix:
  proxy_mode: http&stream
  stream_proxy:
    tcp:
      - 9100
    udp:  
      - 9200

stream_plugins:
  - prometheus

Step 2: Reload APISIX

apisix reload

Step 3: Create stream route

curl "http://127.0.0.1:9180/apisix/admin/stream_routes/1" -X PUT \
  -H "X-API-KEY: edd1c9f034335f136f87ad84b625c8f1" \
  -d '{
    "plugins": {
      "prometheus": {}
    },
    "upstream": {
      "type": "roundrobin", 
      "nodes": {
        "httpbin.org:80": 1
      }
    }
  }'

Step 4: Test stream connection

curl "http://127.0.0.1:9100"

Step 5: View stream metrics

curl "http://127.0.0.1:9091/apisix/prometheus/metrics" | grep stream_connection

Expected Output:

# HELP apisix_stream_connection_total Total number of connections handled per Stream Route in APISIX
# TYPE apisix_stream_connection_total counter
apisix_stream_connection_total{route="1"} 1

Troubleshooting

Common Issues

1. Metrics Endpoint Not Accessible

Symptoms:

  • curl http://127.0.0.1:9091/apisix/prometheus/metrics returns connection refused
  • Prometheus cannot scrape APISIX metrics

Possible Causes:

  • Export server disabled in configuration
  • Wrong IP/port binding
  • Firewall blocking access

Solutions:

# Ensure export server is enabled
plugin_attr:
  prometheus:
    enable_export_server: true
    export_addr:
      ip: 0.0.0.0  # Bind to all interfaces for external access
      port: 9091

2. Missing Metrics Data

Symptoms:

  • Some metrics not appearing in output
  • Zero values for expected metrics

Possible Causes:

  • Plugin not enabled on routes
  • No traffic sent through routes
  • Metrics expired due to TTL

Solutions:

# Check if plugin is enabled on route
curl "http://127.0.0.1:9180/apisix/admin/routes/1" \
  -H "X-API-KEY: edd1c9f034335f136f87ad84b625c8f1"

# Send test traffic
curl "http://127.0.0.1:9080/your-endpoint"

# Check metric expiration settings

3. High Memory Usage

Symptoms:

  • APISIX consuming excessive memory
  • Prometheus metrics endpoint slow to respond

Possible Causes:

  • Too many unique label combinations
  • No metric expiration configured
  • High cardinality labels

Solutions:

# Configure metric expiration
plugin_attr:
  prometheus:
    metrics:
      http_status:
        expire: 3600  # Expire after 1 hour
      http_latency:
        expire: 3600
      bandwidth:
        expire: 3600

4. Incorrect Label Values

Symptoms:

  • Empty label values in metrics
  • Labels showing unexpected values

Possible Causes:

  • Incorrect NGINX variable names
  • Variables not available in context
  • Typos in configuration

Solutions:

# Use correct NGINX variable names
plugin_attr:
  prometheus:
    metrics:
      http_status:
        extra_labels:
          - upstream_addr: $upstream_addr  # Correct
          # - upstream_ip: $upstream_ip    # Incorrect variable name

Debugging Steps

1. Check Plugin Status

# Verify plugin is loaded
curl "http://127.0.0.1:9180/apisix/admin/plugins/prometheus" \
  -H "X-API-KEY: edd1c9f034335f136f87ad84b625c8f1"

2. Verify Configuration

# Check current configuration  
curl "http://127.0.0.1:9180/apisix/admin/routes" \
  -H "X-API-KEY: edd1c9f034335f136f87ad84b625c8f1"

3. Test Metrics Endpoint

# Test basic connectivity
curl -v "http://127.0.0.1:9091/apisix/prometheus/metrics"

# Check specific metrics
curl "http://127.0.0.1:9091/apisix/prometheus/metrics" | grep -E "(http_status|http_latency|bandwidth)"

4. Enable Debug Logging

# In config.yaml
nginx_config:
  error_log_level: debug

Performance Considerations

1. Label Cardinality

  • Problem: Too many unique label combinations cause memory issues
  • Solution: Limit dynamic labels, use metric expiration

2. Collection Frequency

  • Problem: High-frequency metric collection impacts performance
  • Solution: Adjust Prometheus scrape interval

3. Metric Storage

  • Problem: Large number of metrics stored in memory
  • Solution: Configure appropriate retention and expiration

Best Practices

1. Configuration Best Practices

Use Metric Expiration

plugin_attr:
  prometheus:
    metrics:
      http_status:
        expire: 3600  # 1 hour
      http_latency:
        expire: 3600
      bandwidth:
        expire: 3600

Limit Label Cardinality

# Good: Limited, predictable labels
extra_labels:
  - upstream_addr: $upstream_addr
  - route_name: $route_name

# Bad: High cardinality labels  
# - request_id: $request_id    # Unique per request
# - timestamp: $time_iso8601   # Always changing

Use Appropriate Histogram Buckets

plugin_attr:
  prometheus:
    default_buckets: [1, 5, 10, 50, 100, 500, 1000, 5000, 10000]  # Milliseconds

2. Monitoring Best Practices

Set Up Proper Alerting

# Example Prometheus alerting rules
groups:
- name: apisix
  rules:
  - alert: HighErrorRate
    expr: rate(apisix_http_status{code=~"5.."}[5m]) > 0.1
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "High error rate on APISIX"
      
  - alert: HighLatency
    expr: histogram_quantile(0.95, rate(apisix_http_latency_bucket{type="request"}[5m])) > 1000
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "High latency on APISIX"

Create Meaningful Dashboards

  • Request Rate: Requests per second by route/service
  • Error Rate: Error percentage by route/service
  • Latency: P50, P95, P99 latencies
  • Upstream Health: Health status of backend services
  • System Health: NGINX connections, ETCD connectivity
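
For the panels listed above, queries along these lines are a reasonable starting point (again assuming the default apisix_ metric prefix):

# Request rate: requests per second by route
sum(rate(apisix_http_status[1m])) by (route)

# Error rate: share of 5xx responses by route
sum(rate(apisix_http_status{code=~"5.."}[5m])) by (route)
  / sum(rate(apisix_http_status[5m])) by (route)

# Upstream health: nodes currently reported unhealthy
apisix_upstream_status == 0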

3. Security Best Practices

Restrict Metrics Access

# Use firewall or proxy to restrict access
# Only allow Prometheus server to access metrics endpoint

# Example nginx config for metrics endpoint
location /apisix/prometheus/metrics {
    allow 10.0.0.0/8;      # Internal network
    allow 172.16.0.0/12;   # Internal network  
    allow 192.168.0.0/16;  # Internal network
    deny all;
}

Avoid Sensitive Data in Labels

# Good: Non-sensitive labels
extra_labels:
  - route_name: $route_name
  - service_name: $service_name

# Bad: Potentially sensitive labels
# - authorization: $http_authorization
# - user_id: $arg_user_id

4. Performance Optimization

Optimize Scrape Intervals

# prometheus.yml
scrape_configs:
- job_name: 'apisix'
  metrics_path: /apisix/prometheus/metrics   # APISIX serves metrics here, not at the default /metrics
  static_configs:
  - targets: ['127.0.0.1:9091']
  scrape_interval: 15s      # Balance between data resolution and load
  scrape_timeout: 10s

Use Appropriate Metric Types

  • Counters: For values that only increase (requests, errors, bytes)
  • Gauges: For values that can go up/down (connections, memory usage)
  • Histograms: For distributions (latency, response size)

Monitor Resource Usage

# Monitor APISIX memory usage
ps aux | grep apisix

# Monitor metrics endpoint response time
time curl "http://127.0.0.1:9091/apisix/prometheus/metrics" > /dev/null

This guide walks through how the Prometheus plugin in Apache APISIX collects, labels, and exposes its metrics, and how to configure and operate it. Start with the basic examples and move to the more advanced configurations as your monitoring requirements grow.