How It Works: Apache APISIX Prometheus Plugin

August 5, 2025

Overview

The Prometheus plugin in Apache APISIX provides comprehensive monitoring capabilities by collecting metrics about HTTP requests, system health, and upstream services. It exposes these metrics in Prometheus format for integration with monitoring and alerting systems.

Key Features

  • Request Metrics: HTTP status codes, latencies, bandwidth usage
  • System Metrics: NGINX connections, ETCD connectivity, shared dictionary usage
  • Upstream Metrics: Health check status of backend services
  • Custom Labels: Support for additional labels using NGINX variables
  • Configurable Export: Flexible endpoint configuration and metric prefixes

Architecture Diagram

  graph TB
    %% Client requests and initialization
    Client[Client Request] --> APISIX[APISIX Gateway]
    
    %% Plugin initialization
    subgraph "Plugin Initialization"
        Init[Plugin Init Phase] --> Schema[Load Schema & Config]
        Schema --> Metrics[Initialize Metrics Objects]
        Metrics --> Exporter[Setup Prometheus Exporter]
    end
    
    %% Request processing flow
    APISIX --> Router[Route Matching]
    Router --> Plugin[Prometheus Plugin]
    
    subgraph "Request Processing"
        Plugin --> Upstream[Forward to Upstream]
        Upstream --> Response[Receive Response]
        Response --> LogPhase[Log Phase]
    end
    
    %% Metrics collection during log phase
    subgraph "Metrics Collection (Log Phase)"
        LogPhase --> StatusMetric[HTTP Status Counter]
        LogPhase --> LatencyMetric[Latency Histogram]
        LogPhase --> BandwidthMetric[Bandwidth Counter]
        
        StatusMetric --> Labels1[Labels: code, route, service, consumer, node]
        LatencyMetric --> Labels2[Labels: type, route, service, consumer, node]
        BandwidthMetric --> Labels3[Labels: type, route, service, consumer, node]
    end
    
    %% System metrics collection
    subgraph "System Metrics Collection"
        SystemJob[Background Collection] --> NGINXStatus[NGINX Status]
        SystemJob --> ETCDStatus[ETCD Connectivity]
        SystemJob --> SharedDict[Shared Dict Usage]
        SystemJob --> UpstreamHealth[Upstream Health Status]
        SystemJob --> NodeInfo[Node Information]
    end
    
    %% Metrics exposure
    subgraph "Metrics Exposure"
        PrometheusEndpoint[Prometheus Metrics Endpoint] --> Collect[collect function]
        Collect --> SystemJob
        Collect --> FormatMetrics[Format Prometheus Text]
        FormatMetrics --> MetricsResponse[Return Metrics Data]
    end
    
    %% External monitoring
    PrometheusServer[Prometheus Server] --> PrometheusEndpoint
    Grafana[Grafana Dashboard] --> PrometheusServer
    AlertManager[Alert Manager] --> PrometheusServer
    
    %% Configuration
    subgraph "Configuration Options"
        Config[config.yaml] --> ExportURI[export_uri setting]
        Config --> MetricPrefix[metric_prefix setting]
        Config --> ExportServer[enable_export_server setting]
        Config --> ExtraLabels[extra_labels configuration]
        Config --> Buckets[default_buckets for histograms]
    end
    
    %% Styling with better contrast
    classDef clientStyle fill:#4CAF50,stroke:#333,stroke-width:2px,color:#fff
    classDef pluginStyle fill:#2196F3,stroke:#333,stroke-width:2px,color:#fff
    classDef metricsStyle fill:#FF9800,stroke:#333,stroke-width:2px,color:#fff
    classDef configStyle fill:#9C27B0,stroke:#333,stroke-width:2px,color:#fff
    classDef exposureStyle fill:#F44336,stroke:#333,stroke-width:2px,color:#fff
    
    class Client clientStyle
    class Plugin,LogPhase,StatusMetric,LatencyMetric,BandwidthMetric pluginStyle
    class SystemJob,NGINXStatus,ETCDStatus,SharedDict,UpstreamHealth,NodeInfo metricsStyle
    class Config,ExportURI,MetricPrefix,ExportServer,ExtraLabels,Buckets configStyle
    class PrometheusEndpoint,Collect,FormatMetrics,MetricsResponse,PrometheusServer,Grafana,AlertManager exposureStyle

Core Components

1. Plugin Structure

  • Main Plugin File: apisix/plugins/prometheus.lua
  • Exporter Module: apisix/plugins/prometheus/exporter.lua
  • Priority: 500 (medium priority in plugin execution order)
  • Execution Phase: Log phase (after response is sent)

2. Key Files Overview

  • apisix/plugins/prometheus.lua: Main plugin registration and schema
  • apisix/plugins/prometheus/exporter.lua: Metrics collection and exposition logic
  • docs/en/latest/plugins/prometheus.md: Documentation and examples

Step-by-Step Workflow

Step 1: Plugin Registration and Initialization

Location: apisix/plugins/prometheus.lua:33-40

local _M = {
    version = 0.2,
    priority = 500,
    name = plugin_name,
    log  = exporter.http_log,
    schema = schema,
    run_policy = "prefer_route",
}

What happens:

  1. Plugin registers with APISIX framework
  2. Sets priority to 500 (determines execution order)
  3. Defines log phase handler pointing to exporter.http_log
  4. Uses the "prefer_route" run policy (when the plugin is enabled both globally and on a route, only the route-level configuration runs)

Step 2: Metrics Objects Creation

Location: apisix/plugins/prometheus/exporter.lua:112-210

Process:

  1. Initialize Prometheus Library: Creates prometheus instance with metric prefix
  2. Create Metric Objects: Defines counters, histograms, and gauges
  3. Set Default Buckets: Configures histogram buckets for latency measurements

Key Metrics Created:

-- Request metrics
metrics.status = prometheus:counter("http_status", ...)
metrics.latency = prometheus:histogram("http_latency", ...)  
metrics.bandwidth = prometheus:counter("bandwidth", ...)

-- System metrics
metrics.connections = prometheus:gauge("nginx_http_current_connections", ...)
metrics.etcd_reachable = prometheus:gauge("etcd_reachable", ...)
metrics.upstream_status = prometheus:gauge("upstream_status", ...)

Step 3: Request Processing Flow

Sequence:

  1. Client Request → APISIX receives HTTP request
  2. Route Matching → APISIX finds matching route configuration
  3. Plugin Loading → Prometheus plugin is loaded if enabled on route
  4. Upstream Forwarding → Request forwarded to backend service
  5. Response Processing → Response received and sent back to client
  6. Log Phase Execution → Prometheus plugin collects metrics

Step 4: Metrics Collection (Log Phase)

Location: apisix/plugins/prometheus/exporter.lua:237-296

Detailed Process:

4.1 Extract Request Context

function _M.http_log(conf, ctx)
    local vars = ctx.var
    -- the matched route is populated in the request context during routing
    local matched_route = ctx.matched_route and ctx.matched_route.value
    local route_id = matched_route.id
    local service_id = matched_route.service_id
    local consumer_name = ctx.consumer_name or ""
    local balancer_ip = ctx.balancer_ip or ""

4.2 Record HTTP Status Metrics

metrics.status:inc(1,
    gen_arr(vars.status, route_id, matched_uri, matched_host,
            service_id, consumer_name, balancer_ip,
            unpack(extra_labels("http_status", ctx))))

4.3 Record Latency Metrics

local latency, upstream_latency, apisix_latency = latency_details(ctx)

-- Total request latency
metrics.latency:observe(latency,
    gen_arr("request", route_id, service_id, consumer_name, balancer_ip, ...))

-- Upstream response time
metrics.latency:observe(upstream_latency,
    gen_arr("upstream", route_id, service_id, consumer_name, balancer_ip, ...))

-- APISIX processing time
metrics.latency:observe(apisix_latency,
    gen_arr("apisix", route_id, service_id, consumer_name, balancer_ip, ...))

4.4 Record Bandwidth Metrics

-- Ingress traffic (request size)
metrics.bandwidth:inc(vars.request_length,
    gen_arr("ingress", route_id, service_id, consumer_name, balancer_ip, ...))

-- Egress traffic (response size)  
metrics.bandwidth:inc(vars.bytes_sent,
    gen_arr("egress", route_id, service_id, consumer_name, balancer_ip, ...))

Step 5: System Metrics Collection

Location: apisix/plugins/prometheus/exporter.lua:441-503

Background Collection Process:

5.1 NGINX Status Collection

local function nginx_status()
    local res = ngx_capture("/apisix/nginx_status")
    -- Parses: Active connections, server accepts/handled/requests, Reading/Writing/Waiting
    for _, name in ipairs(ngx_status_items) do
        -- "val" holds the number parsed from the status body for this item
        -- (the regex extraction is omitted from this excerpt)
        if name == "total" then
            metrics.requests:set(val[0])
        else
            metrics.connections:set(val[0], {name})
        end
    end
end

5.2 ETCD Connectivity Check

local function etcd_modify_index()
    -- Check routes, services, consumers, etc.
    local routes, routes_ver = get_routes()
    local services, services_ver = get_services()
    
    -- Set modify indexes for each resource type
    metrics.etcd_modify_indexes:set(max_idx, {key})
end

5.3 Shared Dictionary Monitoring

local function shared_dict_status()
    for shared_dict_name, shared_dict in pairs(ngx.shared) do
        metrics.shared_dict_capacity_bytes:set(shared_dict:capacity(), {shared_dict_name})
        metrics.shared_dict_free_space_bytes:set(shared_dict:free_space(), {shared_dict_name})
    end
end

5.4 Upstream Health Status

local stats = control.get_health_checkers()
for _, stat in ipairs(stats) do
    for _, node in ipairs(stat.nodes) do
        metrics.upstream_status:set(
            (node.status == "healthy" or node.status == "mostly_healthy") and 1 or 0,
            gen_arr(stat.name, node.ip, node.port)
        )
    end
end

Step 6: Metrics Exposition

Default Endpoint: http://127.0.0.1:9091/apisix/prometheus/metrics

Location: apisix/plugins/prometheus/exporter.lua:507-529

6.1 API Endpoint Definition

local function get_api(called_by_api_router)
    local export_uri = default_export_uri  -- "/apisix/prometheus/metrics"
    local attr = plugin.plugin_attr("prometheus")
    if attr and attr.export_uri then
        export_uri = attr.export_uri
    end

    local api = {
        methods = {"GET"},
        uri = export_uri,
        handler = collect  -- Main collection function
    }
    return {api}
end

6.2 Metrics Collection and Formatting

local function collect(ctx, stream_only)
    -- Collect all system metrics
    shared_dict_status()
    nginx_status()
    etcd_modify_index()
    
    -- Update upstream health metrics
    local stats = control.get_health_checkers()
    -- ... (update upstream_status metrics)
    
    -- Format and return metrics
    core.response.set_header("content_type", "text/plain")
    return 200, core.table.concat(prometheus:metric_data())
end

Step 7: External Integration

Prometheus Server Setup:

  1. Scrape Configuration: Add the APISIX endpoint to prometheus.yml (a minimal scrape job is shown after this list)
  2. Data Collection: Prometheus scrapes metrics at regular intervals
  3. Storage: Metrics stored in Prometheus time-series database
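
A minimal scrape job for this step could look like the following; the target and path assume the default export server address (127.0.0.1:9091) and the default export_uri:

# prometheus.yml -- minimal job assuming APISIX defaults
scrape_configs:
- job_name: 'apisix'
  metrics_path: /apisix/prometheus/metrics
  static_configs:
  - targets: ['127.0.0.1:9091']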

Grafana Dashboard:

  1. Data Source: Configure Prometheus as a data source (a provisioning sketch follows below)
  2. Visualization: Create charts and graphs using collected metrics
  3. Alerting: Set up alerts based on metric thresholds
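
If you provision the data source from a file rather than the Grafana UI, a minimal provisioning file might look like this; the Prometheus URL is an assumption for your environment:

# grafana/provisioning/datasources/apisix.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://127.0.0.1:9090    # adjust to where your Prometheus server runs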

Code Deep Dive

Plugin Schema Definition

Location: apisix/plugins/prometheus.lua:22-30

local schema = {
    type = "object",
    properties = {
        prefer_name = {
            type = "boolean",
            default = false
        }
    },
}

Purpose:

  • prefer_name: When true, uses route/service names instead of IDs in metrics labels

Label Generation Logic

Location: apisix/plugins/prometheus/exporter.lua:76-97

local function extra_labels(name, ctx)
    clear_tab(extra_labels_tbl)
    local attr = plugin.plugin_attr("prometheus")
    local metrics = attr.metrics

    if metrics and metrics[name] and metrics[name].extra_labels then
        local labels = metrics[name].extra_labels
        for _, kv in ipairs(labels) do
            -- each kv is a single-pair table such as { upstream_addr = "$upstream_addr" }
            local val, v = next(kv)
            if ctx then
                val = ctx.var[v:sub(2)]  -- Extract NGINX variable
                if val == nil then
                    val = ""
                end
            end
            core.table.insert(extra_labels_tbl, val)
        end
    end
    return extra_labels_tbl
end

Purpose: Adds custom labels to metrics using NGINX variables

Latency Calculation

Location: apisix/utils/log-util.lua, function latency_details_in_ms

local latency, upstream_latency, apisix_latency = latency_details(ctx)

Latency Types:

  • Request Latency: Total time from first byte read to last byte sent
  • Upstream Latency: Time waiting for upstream response
  • APISIX Latency: Processing time inside APISIX itself (total request latency minus upstream latency; see the sketch below)
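
The real computation lives in the log-util helper above; conceptually it reduces to something like this simplified sketch (not the literal library code):

-- Simplified sketch of the latency decomposition (not the exact log-util implementation)
local function latency_details(ctx)
    -- total time APISIX has spent on the request so far, in milliseconds
    local latency = (ngx.now() - ngx.req.start_time()) * 1000

    -- time spent waiting on the upstream, as reported by NGINX
    local upstream_latency = (tonumber(ctx.var.upstream_response_time) or 0) * 1000

    -- whatever remains is attributed to APISIX itself (never negative)
    local apisix_latency = math.max(latency - upstream_latency, 0)

    return latency, upstream_latency, apisix_latency
end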

Configuration Guide

Basic Configuration

File: conf/config.yaml

plugin_attr:
  prometheus:
    export_uri: /apisix/prometheus/metrics    # Metrics endpoint URI
    metric_prefix: apisix_                    # Prefix for all metrics
    enable_export_server: true               # Enable dedicated metrics server
    export_addr:                             # Metrics server address
      ip: 127.0.0.1                         # Bind IP
      port: 9091                             # Bind port

Advanced Configuration

plugin_attr:
  prometheus:
    metrics:                                 # Custom metric configurations
      http_status:                          # HTTP status metrics
        extra_labels:                       # Additional labels
          - upstream_addr: $upstream_addr   # Add upstream IP label
          - route_name: $route_name         # Add route name label
        expire: 0                           # Metric expiration (0 = never)
      http_latency:                         # Latency metrics
        extra_labels:
          - upstream_addr: $upstream_addr
        expire: 3600                        # Expire after 1 hour
      bandwidth:                            # Bandwidth metrics
        extra_labels:
          - upstream_addr: $upstream_addr
        expire: 0
    default_buckets:                        # Histogram buckets for latency
      - 1                                   # 1ms
      - 5                                   # 5ms
      - 10                                  # 10ms
      - 50                                  # 50ms
      - 100                                 # 100ms
      - 500                                 # 500ms
      - 1000                                # 1s
      - 5000                                # 5s
      - 10000                               # 10s

Route-Level Configuration

curl "http://127.0.0.1:9180/apisix/admin/routes/1" -X PUT \
  -H "X-API-KEY: edd1c9f034335f136f87ad84b625c8f1" \
  -d '{
    "uri": "/api/*",
    "plugins": {
      "prometheus": {
        "prefer_name": true
      }
    },
    "upstream": {
      "nodes": {
        "httpbin.org:80": 1
      }
    }
  }'

Metrics Reference

Core HTTP Metrics

  • apisix_http_status (Counter): HTTP response status codes. Labels: code, route, matched_uri, matched_host, service, consumer, node
  • apisix_http_latency (Histogram): Request latency in milliseconds. Labels: type, route, service, consumer, node
  • apisix_bandwidth (Counter): Traffic in bytes. Labels: type, route, service, consumer, node

System Metrics

  • apisix_nginx_http_current_connections (Gauge): Current HTTP connections. Labels: state
  • apisix_http_requests_total (Gauge): Total HTTP requests. Labels: none
  • apisix_etcd_reachable (Gauge): ETCD connectivity status. Labels: none
  • apisix_etcd_modify_indexes (Gauge): ETCD modification indexes. Labels: key
  • apisix_shared_dict_capacity_bytes (Gauge): Shared dictionary capacity. Labels: name
  • apisix_shared_dict_free_space_bytes (Gauge): Shared dictionary free space. Labels: name
  • apisix_upstream_status (Gauge): Upstream node health status. Labels: name, ip, port
  • apisix_node_info (Gauge): APISIX node information. Labels: hostname, version

Stream Metrics (L4 Proxy)

  • apisix_stream_connection_total (Counter): TCP/UDP connections handled. Labels: route

Metric Label Details

HTTP Status Labels

  • code: HTTP response code (200, 404, 500, etc.)
  • route: Route ID or name (based on prefer_name setting)
  • matched_uri: URI pattern that matched the request
  • matched_host: Host header that matched the request
  • service: Service ID or name
  • consumer: Consumer name (if authenticated)
  • node: Upstream node IP address

Latency Type Labels

  • request: Total request processing time
  • upstream: Time waiting for upstream response
  • apisix: APISIX internal processing time

Bandwidth Type Labels

  • ingress: Incoming traffic (request size)
  • egress: Outgoing traffic (response size)
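
These labels can be combined directly in PromQL; for example (metric names assume the default apisix_ prefix):

# average APISIX-internal latency per route over the last 5 minutes
sum(rate(apisix_http_latency_sum{type="apisix"}[5m])) by (route)
  / sum(rate(apisix_http_latency_count{type="apisix"}[5m])) by (route)

# egress bandwidth per route, in bytes per second
sum(rate(apisix_bandwidth{type="egress"}[5m])) by (route)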

Hands-on Examples

Example 1: Basic Metrics Collection

Step 1: Enable prometheus plugin on a route

curl "http://127.0.0.1:9180/apisix/admin/routes/1" -X PUT \
  -H "X-API-KEY: edd1c9f034335f136f87ad84b625c8f1" \
  -d '{
    "uri": "/get",
    "plugins": {
      "prometheus": {}
    },
    "upstream": {
      "nodes": {
        "httpbin.org:80": 1
      }
    }
  }'

Step 2: Send test requests

for i in {1..10}; do
  curl "http://127.0.0.1:9080/get"
done

Step 3: View metrics

curl "http://127.0.0.1:9091/apisix/prometheus/metrics"

Expected Output:

# HELP apisix_http_status HTTP status codes per service in APISIX
# TYPE apisix_http_status counter
apisix_http_status{code="200",route="1",matched_uri="/get",matched_host="",service="",consumer="",node="54.237.103.220"} 10

# HELP apisix_http_latency HTTP request latency in milliseconds per service in APISIX  
# TYPE apisix_http_latency histogram
apisix_http_latency_bucket{type="request",route="1",service="",consumer="",node="54.237.103.220",le="1"} 0
apisix_http_latency_bucket{type="request",route="1",service="",consumer="",node="54.237.103.220",le="2"} 0
apisix_http_latency_bucket{type="request",route="1",service="",consumer="",node="54.237.103.220",le="5"} 0
apisix_http_latency_bucket{type="request",route="1",service="",consumer="",node="54.237.103.220",le="10"} 2
apisix_http_latency_bucket{type="request",route="1",service="",consumer="",node="54.237.103.220",le="+Inf"} 10
apisix_http_latency_sum{type="request",route="1",service="",consumer="",node="54.237.103.220"} 847.234
apisix_http_latency_count{type="request",route="1",service="",consumer="",node="54.237.103.220"} 10

Example 2: Custom Labels Configuration

Step 1: Configure extra labels in config.yaml

plugin_attr:
  prometheus:
    metrics:
      http_status:
        extra_labels:
          - upstream_addr: $upstream_addr
          - route_name: $route_name

Step 2: Reload APISIX

apisix reload

Step 3: Create route with name

curl "http://127.0.0.1:9180/apisix/admin/routes/1" -X PUT \
  -H "X-API-KEY: edd1c9f034335f136f87ad84b625c8f1" \
  -d '{
    "name": "test-api",
    "uri": "/get",
    "plugins": {
      "prometheus": {
        "prefer_name": true
      }
    },
    "upstream": {
      "nodes": {
        "httpbin.org:80": 1
      }
    }
  }'

Step 4: Test and view metrics

curl "http://127.0.0.1:9080/get"
curl "http://127.0.0.1:9091/apisix/prometheus/metrics" | grep http_status

Expected Output:

apisix_http_status{code="200",route="test-api",matched_uri="/get",matched_host="",service="",consumer="",node="54.237.103.220",upstream_addr="54.237.103.220:80",route_name="test-api"} 1

Example 3: Upstream Health Monitoring

Step 1: Create route with health checks

curl "http://127.0.0.1:9180/apisix/admin/routes/1" -X PUT \
  -H "X-API-KEY: edd1c9f034335f136f87ad84b625c8f1" \
  -d '{
    "uri": "/get",
    "plugins": {
      "prometheus": {}
    },
    "upstream": {
      "type": "roundrobin",
      "nodes": {
        "httpbin.org:80": 1,
        "127.0.0.1:20001": 1
      },
      "checks": {
        "active": {
          "timeout": 5,
          "http_path": "/status",
          "healthy": {
            "interval": 2,
            "successes": 1
          },
          "unhealthy": {
            "interval": 1,
            "http_failures": 2
          }
        }
      }
    }
  }'

Step 2: Wait for health checks and view metrics

sleep 10
curl "http://127.0.0.1:9091/apisix/prometheus/metrics" | grep upstream_status

Expected Output:

# HELP apisix_upstream_status upstream status from health check
# TYPE apisix_upstream_status gauge  
apisix_upstream_status{name="/apisix/routes/1",ip="54.237.103.220",port="80"} 1
apisix_upstream_status{name="/apisix/routes/1",ip="127.0.0.1",port="20001"} 0

Example 4: Stream (L4) Metrics

Step 1: Configure stream proxy in config.yaml

apisix:
  proxy_mode: http&stream
  stream_proxy:
    tcp:
      - 9100
    udp:  
      - 9200

stream_plugins:
  - prometheus

Step 2: Reload APISIX

apisix reload

Step 3: Create stream route

curl "http://127.0.0.1:9180/apisix/admin/stream_routes/1" -X PUT \
  -H "X-API-KEY: edd1c9f034335f136f87ad84b625c8f1" \
  -d '{
    "plugins": {
      "prometheus": {}
    },
    "upstream": {
      "type": "roundrobin", 
      "nodes": {
        "httpbin.org:80": 1
      }
    }
  }'

Step 4: Test stream connection

curl "http://127.0.0.1:9100"

Step 5: View stream metrics

curl "http://127.0.0.1:9091/apisix/prometheus/metrics" | grep stream_connection

Expected Output:

# HELP apisix_stream_connection_total Total number of connections handled per Stream Route in APISIX
# TYPE apisix_stream_connection_total counter
apisix_stream_connection_total{route="1"} 1

Troubleshooting

Common Issues

1. Metrics Endpoint Not Accessible

Symptoms:

  • curl http://127.0.0.1:9091/apisix/prometheus/metrics returns connection refused
  • Prometheus cannot scrape APISIX metrics

Possible Causes:

  • Export server disabled in configuration
  • Wrong IP/port binding
  • Firewall blocking access

Solutions:

# Ensure export server is enabled
plugin_attr:
  prometheus:
    enable_export_server: true
    export_addr:
      ip: 0.0.0.0  # Bind to all interfaces for external access
      port: 9091

2. Missing Metrics Data

Symptoms:

  • Some metrics not appearing in output
  • Zero values for expected metrics

Possible Causes:

  • Plugin not enabled on routes
  • No traffic sent through routes
  • Metrics expired due to TTL

Solutions:

# Check if plugin is enabled on route
curl "http://127.0.0.1:9180/apisix/admin/routes/1" \
  -H "X-API-KEY: edd1c9f034335f136f87ad84b625c8f1"

# Send test traffic
curl "http://127.0.0.1:9080/your-endpoint"

# Check metric expiration settings

3. High Memory Usage

Symptoms:

  • APISIX consuming excessive memory
  • Prometheus metrics endpoint slow to respond

Possible Causes:

  • Too many unique label combinations
  • No metric expiration configured
  • High cardinality labels

Solutions:

# Configure metric expiration
plugin_attr:
  prometheus:
    metrics:
      http_status:
        expire: 3600  # Expire after 1 hour
      http_latency:
        expire: 3600
      bandwidth:
        expire: 3600

4. Incorrect Label Values

Symptoms:

  • Empty label values in metrics
  • Labels showing unexpected values

Possible Causes:

  • Incorrect NGINX variable names
  • Variables not available in context
  • Typos in configuration

Solutions:

# Use correct NGINX variable names
plugin_attr:
  prometheus:
    metrics:
      http_status:
        extra_labels:
          - upstream_addr: $upstream_addr  # Correct
          # - upstream_ip: $upstream_ip    # Incorrect variable name

Debugging Steps

1. Check Plugin Status

# Verify plugin is loaded
curl "http://127.0.0.1:9180/apisix/admin/plugins/prometheus" \
  -H "X-API-KEY: edd1c9f034335f136f87ad84b625c8f1"

2. Verify Configuration

# Check current configuration  
curl "http://127.0.0.1:9180/apisix/admin/routes" \
  -H "X-API-KEY: edd1c9f034335f136f87ad84b625c8f1"

3. Test Metrics Endpoint

# Test basic connectivity
curl -v "http://127.0.0.1:9091/apisix/prometheus/metrics"

# Check specific metrics
curl "http://127.0.0.1:9091/apisix/prometheus/metrics" | grep -E "(http_status|http_latency|bandwidth)"

4. Enable Debug Logging

# In config.yaml
nginx_config:
  error_log_level: debug

Performance Considerations

1. Label Cardinality

  • Problem: Too many unique label combinations cause memory issues
  • Solution: Limit dynamic labels, use metric expiration

2. Collection Frequency

  • Problem: High-frequency metric collection impacts performance
  • Solution: Adjust Prometheus scrape interval

3. Metric Storage

  • Problem: Large number of metrics stored in memory
  • Solution: Configure appropriate retention and expiration

Best Practices

1. Configuration Best Practices

Use Metric Expiration

plugin_attr:
  prometheus:
    metrics:
      http_status:
        expire: 3600  # 1 hour
      http_latency:
        expire: 3600
      bandwidth:
        expire: 3600

Limit Label Cardinality

# Good: Limited, predictable labels
extra_labels:
  - upstream_addr: $upstream_addr
  - route_name: $route_name

# Bad: High cardinality labels  
# - request_id: $request_id    # Unique per request
# - timestamp: $time_iso8601   # Always changing

Use Appropriate Histogram Buckets

plugin_attr:
  prometheus:
    default_buckets: [1, 5, 10, 50, 100, 500, 1000, 5000, 10000]  # Milliseconds

2. Monitoring Best Practices

Set Up Proper Alerting

# Example Prometheus alerting rules
groups:
- name: apisix
  rules:
  - alert: HighErrorRate
    expr: rate(apisix_http_status{code=~"5.."}[5m]) > 0.1
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "High error rate on APISIX"
      
  - alert: HighLatency
    expr: histogram_quantile(0.95, rate(apisix_http_latency_bucket{type="request"}[5m])) > 1000
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "High latency on APISIX"

Create Meaningful Dashboards

  • Request Rate: Requests per second by route/service
  • Error Rate: Error percentage by route/service
  • Latency: P50, P95, P99 latencies
  • Upstream Health: Health status of backend services
  • System Health: NGINX connections, ETCD connectivity
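
For the panels listed above, queries along these lines are a reasonable starting point (again assuming the default apisix_ metric prefix):

# Request rate: requests per second by route
sum(rate(apisix_http_status[1m])) by (route)

# Error rate: share of 5xx responses by route
sum(rate(apisix_http_status{code=~"5.."}[5m])) by (route)
  / sum(rate(apisix_http_status[5m])) by (route)

# Upstream health: nodes currently reported unhealthy
apisix_upstream_status == 0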

3. Security Best Practices

Restrict Metrics Access

# Use firewall or proxy to restrict access
# Only allow Prometheus server to access metrics endpoint

# Example nginx config for metrics endpoint
location /apisix/prometheus/metrics {
    allow 10.0.0.0/8;      # Internal network
    allow 172.16.0.0/12;   # Internal network  
    allow 192.168.0.0/16;  # Internal network
    deny all;
}

Avoid Sensitive Data in Labels

# Good: Non-sensitive labels
extra_labels:
  - route_name: $route_name
  - service_name: $service_name

# Bad: Potentially sensitive labels
# - authorization: $http_authorization
# - user_id: $arg_user_id

4. Performance Optimization

Optimize Scrape Intervals

# prometheus.yml
scrape_configs:
- job_name: 'apisix'
  metrics_path: /apisix/prometheus/metrics   # APISIX serves metrics here, not at the default /metrics
  static_configs:
  - targets: ['127.0.0.1:9091']
  scrape_interval: 15s      # Balance between data resolution and load
  scrape_timeout: 10s

Use Appropriate Metric Types

  • Counters: For values that only increase (requests, errors, bytes)
  • Gauges: For values that can go up/down (connections, memory usage)
  • Histograms: For distributions (latency, response size)

Monitor Resource Usage

# Monitor APISIX memory usage
ps aux | grep apisix

# Monitor metrics endpoint response time
time curl "http://127.0.0.1:9091/apisix/prometheus/metrics" > /dev/null

This guide walks through how the Prometheus plugin in Apache APISIX collects, labels, and exposes its metrics, and how to configure and operate it. Start with the basic examples and move to the more advanced configurations as your monitoring requirements grow.