
DNS Security Monitoring with Grafana


You know that sinking feeling when you realize something’s wrong with your network? Maybe users can’t reach certain sites, or worse, they’re being redirected to suspicious domains without anyone noticing. That’s where DNS security monitoring comes in, and honestly, it’s one of those things you don’t appreciate until you really need it.

Let me walk you through how DNS security monitoring works with Grafana, why it matters more than ever in 2026, and how you can start protecting your network today.

Why DNS Security Matters More Than Ever

Here’s something that might surprise you: DNS threats increased by 30% between October 2024 and September 2025 according to recent industry data. The average internet user encounters 66 threats per day, up from 29 previously. That’s not just a statistic—that’s potentially 66 opportunities for attackers to compromise your network every single day.

The Domain Name System wasn’t designed with security in mind. Back when it was created, the internet was a much smaller, more trusted place. Fast forward to 2026, and DNS has become one of the most exploited protocols in cybersecurity. Why? Because it’s trusted by default. Firewalls let DNS traffic through, security teams often don’t monitor it closely, and attackers know this.

The DNS Threat Landscape in 2026

Let’s talk about what you’re actually up against. The threats have evolved significantly, and they’re getting more sophisticated every year.

DNS Tunneling: The Silent Data Thief

Imagine a hacker creating a secret tunnel through your network using something as innocent-looking as DNS queries. That’s DNS tunneling, and it’s terrifyingly effective. Recent research shows detection systems can achieve accuracy rates of 99.82% using machine learning, but only if you’re actually looking for it.

DNS tunneling works by encoding data—commands, stolen files, anything really—into DNS queries and responses. It looks like normal DNS traffic to most monitoring tools, which is exactly the point. Attackers use it for data exfiltration, command and control, and bypassing your security controls.

The telltale signs? Look for unusually long domain names (sometimes hitting that 255-character limit), high-entropy random-looking strings in subdomains, and excessive query volumes to single domains. Organizations should monitor for high query volumes and frequent queries to single domains as indicators.
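Those heuristics are easy to prototype. Here’s a minimal shell sketch that flags long or random-looking names; the 100-character and 3.5-bits-per-character thresholds are illustrative assumptions, not tuned values:

```shell
# Toy tunneling heuristic: flag names that are unusually long or whose
# leftmost label looks random (high Shannon entropy). Thresholds are
# illustrative only -- tune them against your own traffic.

entropy() {
  # Shannon entropy in bits per character of $1
  printf '%s' "$1" | fold -w1 | sort | uniq -c | awk -v n="${#1}" '
    { p = $1 / n; H -= p * log(p) / log(2) }
    END { printf "%.2f", H }'
}

check_domain() {
  name=$1
  label=${name%%.*}   # tunneled payload usually rides in the leftmost label
  H=$(entropy "$label")
  if [ "${#name}" -gt 100 ] || awk "BEGIN { exit !($H > 3.5) }"; then
    echo "SUSPECT $name (entropy=$H)"
  else
    echo "ok      $name (entropy=$H)"
  fi
}

check_domain "www.example.com"
check_domain "d41d8cd98f00b204e9800998ecf8427e1a2b3c4d.t.example.com"
```

Real detectors combine many more signals (query rate per domain, label length distribution, record types), but this captures the core idea behind the entropy indicator.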

DNS Cache Poisoning and Hijacking

This one’s particularly nasty. DNS cache poisoning involves injecting false information into DNS caches, redirecting users to malicious sites without them knowing. You think you’re going to your bank’s website, but you’re actually headed to a phishing site that looks identical.

DNS hijacking takes it further by actually compromising DNS records. Advanced persistent threats and botnets often manifest through unusual domain blocking frequencies, which is why continuous monitoring is critical.

Enter Grafana: Your DNS Security Command Center

So how do you fight back? This is where Grafana comes into play as your visualization and monitoring platform. Think of Grafana as your security dashboard—a single pane of glass where you can see everything happening in your DNS infrastructure in real-time.

Grafana doesn’t do the heavy lifting alone. It connects to data sources like Prometheus for metrics and Loki for logs, creating a powerful monitoring stack that gives you eyes on your DNS traffic.

What Makes Grafana Perfect for DNS Security?

Real-Time Visualization: Threats don’t wait, and neither should your monitoring. Grafana updates dashboards in real-time, so you see attacks as they happen, not hours later when reviewing logs.

Flexible Data Sources: Grafana connects to Prometheus for time-series metrics, Loki for log aggregation, and even specialized DNS monitoring tools. You’re not locked into a single vendor or data format.

Custom Dashboards: Every network is different. Grafana lets you build dashboards that show exactly what matters to your environment—whether that’s query volumes, response times, or threat indicators.

Alert Integration: When Grafana detects something suspicious, it can alert your team via Slack, email, PagerDuty, or whatever notification system you use. No more staring at dashboards 24/7.

The Bottom Line

DNS security monitoring isn’t optional anymore. With threats increasing 30% year-over-year and attackers using AI to automate attacks, you need visibility into your DNS traffic. Grafana provides that visibility in a flexible, powerful platform that scales from small homelabs to enterprise networks.

The best part? You don’t need a massive security team or unlimited budget to get started. With open-source tools like Grafana, Prometheus, and Loki, you can build enterprise-grade DNS security monitoring on a shoestring budget.

Your DNS infrastructure is the foundation of your network. Monitoring it isn’t just about detecting attacks—it’s about understanding your network, spotting issues before they become problems, and having the confidence that when something goes wrong, you’ll know about it immediately.

Tutorial: DNS Security Monitoring with Grafana – Complete Setup Guide

Welcome to this hands-on tutorial! We’ll build a complete DNS security monitoring system using Grafana, Prometheus, and Loki. By the end, you’ll have real-time dashboards showing DNS threats, automated alerts, and a solid foundation for protecting your network.

What You’ll Build

By the end of this tutorial, you’ll have:

  • ✅ A Grafana dashboard displaying DNS security metrics in real-time
  • ✅ Prometheus collecting DNS metrics every 5 minutes
  • ✅ Loki aggregating DNS logs for threat analysis
  • ✅ Automated alerts for suspicious DNS activity
  • ✅ Threat detection panels for tunneling, cache poisoning, and DDoS

Prerequisites

Required:

  • A DNS server (Unbound, BIND, or CoreDNS)
  • Linux system with sudo access (this tutorial uses Ubuntu/Debian)
  • 2GB+ RAM available
  • 10GB+ free disk space
  • Basic command line knowledge

Architecture Overview

Before we dive in, here’s what we’re building:

DNS Server (Unbound)
    ↓
    ├→ Metrics → Unbound Exporter → Prometheus → Grafana
    └→ Logs → Promtail → Loki → Grafana
  • DNS Server generates queries and logs
  • Unbound Exporter collects metrics (queries/sec, cache stats, etc.)
  • Prometheus stores time-series metrics
  • Promtail ships logs to Loki
  • Loki aggregates and indexes logs
  • Grafana visualizes everything in dashboards

Part 1: Install and Configure Grafana

Let’s start with our visualization platform.

Step 1.1: Install Grafana

bash

# Update system packages
sudo apt update

# Install required dependency
sudo apt install -y musl

# Download Grafana OSS (adjust for your architecture)
# For ARM64 (Raspberry Pi 4, etc.):
wget https://dl.grafana.com/oss/release/grafana_11.1.0_arm64.deb

# For AMD64/x86_64:
# wget https://dl.grafana.com/oss/release/grafana_11.1.0_amd64.deb

# Install Grafana
sudo dpkg -i grafana_11.1.0_*.deb

Step 1.2: Start Grafana

bash

# Enable Grafana to start on boot
sudo systemctl enable grafana-server

# Start Grafana
sudo systemctl start grafana-server

# Check status
sudo systemctl status grafana-server

You should see “active (running)” in green.

Step 1.3: Access Grafana UI

Open your browser and navigate to:

http://<your-server-ip>:3000

Default credentials:

  • Username: admin
  • Password: admin

You’ll be prompted to change the password on first login. Choose a strong password!

✓ Checkpoint: You should see the Grafana welcome screen.

Part 2: Install and Configure Prometheus

Prometheus will collect and store DNS metrics.

Step 2.1: Install Prometheus

bash

# Install Prometheus
sudo apt install -y prometheus

# Stop the service temporarily for configuration
sudo systemctl stop prometheus

Step 2.2: Configure Prometheus

Create a backup of the default config:

bash

sudo cp /etc/prometheus/prometheus.yml /etc/prometheus/prometheus.yml.backup

Edit the configuration:

bash

sudo nano /etc/prometheus/prometheus.yml

Replace the contents with:

yaml

# Global configuration
global:
  scrape_interval: 5m
  evaluation_interval: 5m

# Alerting configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: []

# Load rules once and periodically evaluate them
rule_files: []

# Scrape configurations
scrape_configs:
  # Prometheus self-monitoring
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Unbound DNS exporter
  - job_name: 'unbound'
    static_configs:
      - targets: ['localhost:9167']

Save and exit (Ctrl+X, then Y, then Enter).

Step 2.3: Start Prometheus

bash

# Start Prometheus
sudo systemctl start prometheus

# Enable on boot
sudo systemctl enable prometheus

# Verify it's running
sudo systemctl status prometheus

Step 2.4: Verify Prometheus

Open in your browser:

http://<your-server-ip>:9090

Go to Status → Targets. You should see both the prometheus and unbound jobs; prometheus will show as UP, while unbound stays DOWN until you set up the exporter in Part 4.

✓ Checkpoint: Prometheus UI is accessible and showing targets.

Part 3: Configure Unbound for Security Monitoring

Now we’ll enable detailed DNS logging and metrics.

Step 3.1: Enable Unbound Statistics

Edit Unbound configuration:

bash

sudo nano /etc/unbound/unbound.conf

Add these lines under the server: section:

yaml

server:
    # Existing configuration...
    
    # Enable extended statistics
    extended-statistics: yes
    
    # Log replies (general verbosity stays at 0 to keep the log lean)
    verbosity: 0
    log-queries: no
    log-replies: yes
    log-tag-queryreply: yes
    log-local-actions: yes
    logfile: /var/log/unbound/unbound.log
    
    # Log servfail messages
    log-servfail: yes

Add these lines under the remote-control: section:

yaml

remote-control:
    # Existing configuration...
    
    # Enable Unix socket for faster communication
    control-interface: "/var/run/unbound.sock"
    control-use-cert: no

Save and exit.

Step 3.2: Create Log Directory

bash

# Create log directory
sudo mkdir -p /var/log/unbound

# Set proper permissions
sudo chown unbound:unbound /var/log/unbound
sudo chmod 755 /var/log/unbound

Step 3.3: Configure Log Rotation

Prevent logs from filling your disk:

bash

sudo nano /etc/logrotate.d/unbound

Add this content:

/var/log/unbound/unbound.log {
    daily
    rotate 7
    missingok
    notifempty
    compress
    delaycompress
    postrotate
        /usr/sbin/unbound-control reload 2>/dev/null || true
    endscript
}

Save and exit.

Step 3.4: Restart Unbound

bash

# Test configuration
sudo unbound-checkconf

# If no errors, restart
sudo systemctl restart unbound

# Verify it's running
sudo systemctl status unbound

# Check logs are being written
sudo tail -f /var/log/unbound/unbound.log

You should see log entries appearing. Press Ctrl+C to stop watching.

✓ Checkpoint: Unbound is logging to /var/log/unbound/unbound.log

Part 4: Install Unbound Exporter

The exporter collects metrics from Unbound and exposes them for Prometheus.

Step 4.1: Download Unbound Exporter

For this tutorial, we’ll use the custom unbound-exporter from the ar51an/unbound-exporter project.

bash

# Create a temporary directory
cd /tmp

# Download the latest release (adjust URL for your architecture)
# For ARM64:
wget https://github.com/ar51an/unbound-exporter/releases/latest/download/unbound-exporter-arm64

# For AMD64:
# wget https://github.com/ar51an/unbound-exporter/releases/latest/download/unbound-exporter-amd64

# Make it executable
chmod +x unbound-exporter-*

# Move to system binary directory
sudo mv unbound-exporter-* /usr/local/bin/unbound-exporter

# Verify installation
/usr/local/bin/unbound-exporter -version

Step 4.2: Create Systemd Service

bash

sudo nano /etc/systemd/system/prometheus-unbound-exporter.service

Add this content:

ini

[Unit]
Description=Prometheus Unbound Exporter
After=network.target unbound.service
Requires=unbound.service

[Service]
Type=simple
User=unbound
Group=unbound
ExecStart=/usr/local/bin/unbound-exporter \
    -unbound.socket unix:///var/run/unbound.sock \
    -web.listen-address :9167
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target

Save and exit.

Step 4.3: Start the Exporter

bash

# Reload systemd
sudo systemctl daemon-reload

# Start the exporter
sudo systemctl start prometheus-unbound-exporter

# Enable on boot
sudo systemctl enable prometheus-unbound-exporter

# Check status
sudo systemctl status prometheus-unbound-exporter

Step 4.4: Verify Metrics

bash

# Check metrics endpoint
curl http://localhost:9167/metrics

You should see Prometheus-formatted metrics. Look for lines like:

  • unbound_queries_total
  • unbound_cache_hits_total
  • unbound_response_time_seconds

✓ Checkpoint: Exporter is running and exposing metrics.
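If you want to grab a single counter from that endpoint in scripts, a small parser over the Prometheus text format comes in handy. This is a sketch; the exact metric names depend on the exporter build, so check the `curl` output above for yours:

```shell
# Extract one counter from Prometheus exposition-format text on stdin.
# The metric name below is an example; verify it against your exporter.
metric_value() {
  awk -v m="$1" '$1 == m { print $2; exit }'
}

# e.g. current total query count (prints nothing if the exporter is down)
curl -s http://localhost:9167/metrics | metric_value unbound_queries_total
```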

Part 5: Install and Configure Loki

Loki will aggregate DNS logs for threat detection.

Step 5.1: Install Loki

bash

# Download Loki (adjust for your architecture)
# For ARM64:
curl -O -L "https://github.com/grafana/loki/releases/download/v3.1.0/loki_3.1.0_arm64.deb"

# For AMD64:
# curl -O -L "https://github.com/grafana/loki/releases/download/v3.1.0/loki_3.1.0_amd64.deb"

# Install
sudo dpkg -i loki_3.1.0_*.deb

Step 5.2: Configure Loki

Create a backup and edit the config:

bash

sudo cp /etc/loki/config.yml /etc/loki/config.yml.backup
sudo nano /etc/loki/config.yml

Replace with this optimized configuration:

yaml

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: warn

common:
  path_prefix: /var/lib/loki
  storage:
    filesystem:
      chunks_directory: /var/lib/loki/chunks
      rules_directory: /var/lib/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  tsdb_shipper:
    active_index_directory: /var/lib/loki/tsdb-index
    cache_location: /var/lib/loki/tsdb-cache

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  ingestion_rate_mb: 16
  ingestion_burst_size_mb: 32
  max_query_length: 721h
  max_query_parallelism: 32
  max_cache_freshness_per_query: 10m
  max_query_series: 5000
  max_query_lookback: 0
  max_streams_per_user: 10000
  max_entries_limit_per_query: 100000
  # 720h = 30 days of log retention, enforced by the compactor below
  retention_period: 720h

# Loki 3.x: table_manager, chunk_store_config, and enforce_metric_name
# were removed upstream; retention is handled by the compactor.
compactor:
  working_directory: /var/lib/loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
  delete_request_store: filesystem

query_range:
  align_queries_with_step: true
  max_retries: 5
  parallelise_shardable_queries: true
  cache_results: true

querier:
  max_concurrent: 20

Save and exit.

Step 5.3: Start Loki

bash

# Start Loki
sudo systemctl start loki

# Enable on boot
sudo systemctl enable loki

# Check status
sudo systemctl status loki

✓ Checkpoint: Loki is running on port 3100.

Part 6: Install and Configure Promtail

Promtail ships logs from Unbound to Loki.

Step 6.1: Install Promtail

bash

# Download Promtail (adjust for your architecture)
# For ARM64:
curl -O -L "https://github.com/grafana/loki/releases/download/v3.1.0/promtail_3.1.0_arm64.deb"

# For AMD64:
# curl -O -L "https://github.com/grafana/loki/releases/download/v3.1.0/promtail_3.1.0_amd64.deb"

# Install
sudo dpkg -i promtail_3.1.0_*.deb

Step 6.2: Configure Promtail

bash

sudo cp /etc/promtail/config.yml /etc/promtail/config.yml.backup
sudo nano /etc/promtail/config.yml

Replace with:

yaml

server:
  http_listen_port: 9080
  grpc_listen_port: 0
  log_level: warn

positions:
  filename: /var/lib/promtail/positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
  - job_name: unbound
    static_configs:
      - targets:
          - localhost
        labels:
          job: unbound
          __path__: /var/log/unbound/unbound.log

Save and exit.

Step 6.3: Grant Promtail Access to Logs

bash

# Add promtail user to unbound group
sudo usermod -a -G unbound promtail

# Or adjust permissions
sudo chmod 644 /var/log/unbound/unbound.log

Step 6.4: Start Promtail

bash

# Start Promtail
sudo systemctl start promtail

# Enable on boot
sudo systemctl enable promtail

# Check status
sudo systemctl status promtail

✓ Checkpoint: Promtail is shipping logs to Loki.

Part 7: Configure Grafana Data Sources

Now we’ll connect Grafana to Prometheus and Loki.

Step 7.1: Add Prometheus Data Source

  1. Open Grafana: http://<your-server-ip>:3000
  2. Click the hamburger menu (☰) → Connections → Data sources
  3. Click Add data source
  4. Select Prometheus
  5. Configure:
    • Name: Prometheus
    • Default: Toggle ON
    • URL: http://localhost:9090
    • Scrape interval: 5m
  6. Scroll down and click Save & test

You should see: ✓ Successfully queried the Prometheus API.

Step 7.2: Add Loki Data Source

  1. Click Add data source again
  2. Select Loki
  3. Configure:
    • Name: Loki
    • URL: http://localhost:3100
    • Maximum lines: 100000
  4. Scroll down and click Save & test

You should see: ✓ Data source connected and labels found.

✓ Checkpoint: Both data sources are connected and working.

Part 8: Create DNS Security Dashboard

Time to build your security monitoring dashboard!

Step 8.1: Create a New Dashboard

  1. Click the hamburger menu (☰) → Dashboards
  2. Click New → New Dashboard
  3. Click + Add visualization
  4. Select Prometheus as the data source

Step 8.2: Panel 1 – Query Rate Over Time

This panel shows DNS queries per second.

Configuration:

  • Title: DNS Queries per Second
  • Query:

promql

  rate(unbound_queries_total[5m])
  • Legend: {{job}}
  • Visualization: Time series (line chart)

Click Apply to save the panel.

Step 8.3: Panel 2 – Query Types Distribution

Shows the breakdown of DNS query types (A, AAAA, etc.).

Configuration:

  • Click Add → Visualization
  • Select Prometheus
  • Title: Query Types
  • Query:

promql

  sum by (type) (rate(unbound_queries_total{type!=""}[5m]))
  • Visualization: Pie chart or Bar chart
  • Legend: {{type}}

Click Apply.

Step 8.4: Panel 3 – Cache Performance

Monitors cache hit ratio.

Configuration:

  • Title: Cache Hit Ratio
  • Query:

promql

  (
    rate(unbound_cache_hits_total[5m])
    /
    (rate(unbound_cache_hits_total[5m]) + rate(unbound_cache_miss_total[5m]))
  ) * 100
  • Unit: Percent (0-100)
  • Visualization: Stat or Gauge
  • Thresholds:
    • Red: < 70
    • Yellow: 70-85
    • Green: > 85

Click Apply.
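The formula is just hits / (hits + misses); if you want to sanity-check where a given pair of counters lands against the thresholds, a throwaway helper works (illustrative, mirrors the PromQL arithmetic):

```shell
# Map hit/miss counts to the panel's percentage, as in the PromQL above
cache_hit_pct() {
  awk -v h="$1" -v m="$2" 'BEGIN {
    t = h + m
    printf "%.1f", (t ? 100 * h / t : 0)
  }'
}

cache_hit_pct 850 150   # 85.0 -> green zone
```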

Step 8.5: Panel 4 – NXDOMAIN Rate (Threat Indicator)

High NXDOMAIN rates suggest reconnaissance or tunneling.

Configuration:

  • Title: NXDOMAIN Rate (Suspicious if >20%)
  • Query:

promql

  (
    rate(unbound_answers_rcode_total{rcode="NXDOMAIN"}[5m])
    /
    rate(unbound_queries_total[5m])
  ) * 100
  • Unit: Percent (0-100)
  • Visualization: Time series
  • Thresholds:
    • Green: < 5
    • Yellow: 5-20
    • Red: > 20

Click Apply.

Step 8.6: Panel 5 – Top Queried Domains (Logs)

Uses Loki to show most frequently queried domains.

Configuration:

  • Title: Top Queried Domains
  • Select Loki as data source
  • Query (LogQL):

logql

  topk(10,
    sum by (domain) (
      count_over_time({job="unbound"} |~ "reply" | regexp "(?P<domain>[a-zA-Z0-9.-]+)" [1h]
    )
  ))
  • Visualization: Bar chart
  • Legend: {{domain}}

Click Apply.

Step 8.7: Panel 6 – Failed Queries (Security Anomaly)

Shows queries that failed, which might indicate attacks.

Configuration:

  • Title: Failed Queries
  • Data source: Loki
  • Query:

logql

  {job="unbound"} |~ "SERVFAIL|REFUSED|NXDOMAIN"
  • Visualization: Logs
  • Time range: Last 15 minutes

Click Apply.

Step 8.8: Panel 7 – Query Size Distribution

Large queries may indicate DNS tunneling.

Configuration:

  • Title: Query Size Distribution
  • Data source: Prometheus
  • Query:

promql

  histogram_quantile(0.99,
    sum by (le) (rate(unbound_request_size_bytes_bucket[5m]))
  )
  • Visualization: Time series
  • Legend: 99th Percentile Query Size
  • Threshold alert: Alert if > 200 bytes

Click Apply.

Step 8.9: Panel 8 – Geographic Anomalies (If using GeoIP)

If you have GeoIP data in your logs:

Configuration:

  • Title: Queries by Country
  • Data source: Loki
  • Query:

logql

  sum by (country) (
    count_over_time({job="unbound"} | json | country != "" [1h])
  )
  • Visualization: Geomap or Table

Click Apply.

Step 8.10: Save the Dashboard

  1. Click the save icon (💾) at the top right
  2. Name: DNS Security Monitoring
  3. Folder: Dashboards
  4. Click Save

✓ Checkpoint: You have a working DNS security dashboard!

Part 9: Create Security Alerts

Now let’s set up automated alerts for threats.

Step 9.1: Alert 1 – High NXDOMAIN Rate

Triggers when NXDOMAIN rate exceeds 20% (possible reconnaissance).

  1. Edit the NXDOMAIN Rate panel
  2. Click Alert tab
  3. Click Create alert rule from this panel
  4. Configure:
    • Alert rule name: High NXDOMAIN Rate
    • Condition:
      • WHEN: last()
      • IS ABOVE: 20
    • Evaluate every: 5m
    • For: 10m (must be true for 10 minutes)
    • Summary: NXDOMAIN rate is {{$value}}%, indicating possible DNS reconnaissance
  5. Click Save rule and exit

Step 9.2: Alert 2 – Unusual Query Volume

Triggers on sudden query spikes (possible DDoS).

  1. Create a new panel or edit Query Rate panel
  2. Click Alert tab
  3. Alert rule name: Unusual Query Volume
  4. Condition:
    • WHEN: avg()
    • IS ABOVE: Set based on your baseline (e.g., 150% of normal)
  5. For: 5m
  6. Summary: Query volume spike detected: {{$value}} qps
  7. Click Save rule and exit
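Before wiring a “150% of baseline” threshold into the alert, you can sanity-check it with a throwaway helper (illustrative numbers; your baseline comes from observing normal traffic):

```shell
# Flag a spike if current qps exceeds 150% of the baseline qps,
# matching the alert condition configured above
spike() {
  awk -v cur="$1" -v base="$2" \
    'BEGIN { if (cur > 1.5 * base) print "SPIKE"; else print "ok" }'
}

spike 200 100   # SPIKE
spike 120 100   # ok
```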

Step 9.3: Alert 3 – Cache Hit Ratio Drop

Triggers when cache performance degrades (possible cache poisoning).

  1. Edit the Cache Hit Ratio panel
  2. Alert rule name: Low Cache Hit Ratio
  3. Condition:
    • WHEN: last()
    • IS BELOW: 70
  4. For: 15m
  5. Summary: Cache hit ratio dropped to {{$value}}%
  6. Click Save rule and exit

Step 9.4: Configure Alert Notifications

To receive alerts via email, Slack, etc.:

  1. Go to Alerting → Contact points
  2. Click Add contact point
  3. Choose your notification method (email, Slack, webhook, etc.)
  4. Configure credentials/settings
  5. Click Test to verify
  6. Click Save

Then create a notification policy:

  1. Go to Alerting → Notification policies
  2. Click New policy
  3. Select your contact point
  4. Configure routing rules
  5. Click Save

✓ Checkpoint: Alerts are configured and will notify you of threats.

Part 10: Testing Your Setup

Let’s verify everything works.

Test 10.1: Generate Test Queries

bash

# Generate some DNS queries
for i in {1..100}; do
  dig @localhost google.com +short
  dig @localhost example.com +short
  dig @localhost github.com +short
done

Wait 1-2 minutes, then check your Grafana dashboard. You should see the query count increase.

Test 10.2: Simulate NXDOMAIN Reconnaissance

bash

# Query non-existent domains
for i in {1..50}; do
  dig @localhost random-nonexistent-domain-$RANDOM.com +short
done

Check the NXDOMAIN rate panel. It should show an increase.
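You can also measure the share directly from dig’s output: each response contains a `status:` header line, so a small filter over piped dig output gives the observed percentage (sketch):

```shell
# Percentage of NXDOMAIN answers among dig status lines on stdin
nxdomain_pct() {
  awk '/status:/ { total++ } /status: NXDOMAIN/ { nx++ }
       END { if (total) printf "%.0f", 100 * nx / total; else printf "0" }'
}

# Usage (after the loops above; requires dig and a local resolver):
#   for d in example.com bogus-$RANDOM.example; do dig @localhost "$d"; done | nxdomain_pct
```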

Test 10.3: Check Logs in Grafana

  1. Go to Explore (compass icon in sidebar)
  2. Select Loki data source
  3. Query: {job="unbound"}
  4. Click Run query

You should see your DNS logs appearing.

Test 10.4: Verify Metrics in Prometheus

  1. Go to Prometheus UI: http://<your-server-ip>:9090
  2. Query: unbound_queries_total
  3. Click Execute
  4. Switch to Graph tab

You should see query metrics over time.

✓ Checkpoint: All components are working and data is flowing!

Part 11: Advanced Threat Detection Queries

Now that your system is working, here are some advanced queries for specific threats.

Detecting DNS Tunneling

PromQL Query:

promql

# Queries with unusually long domain names
histogram_quantile(0.99, 
  sum(rate(unbound_request_size_bytes_bucket[5m])) by (le)
) > 150

LogQL Query:

logql

# Find domains with high entropy (random-looking subdomains)
{job="unbound"} 
| regexp "(?P<domain>[a-z0-9]{20,}\\..*)" 
| line_format "{{.domain}}"

Detecting DGA (Domain Generation Algorithm)

logql

# Domains with consecutive random characters
{job="unbound"} 
| regexp "(?P<domain>[a-z]{15,}\\.com)" 
| line_format "{{.domain}}"

Detecting DNS Amplification Attacks

PromQL Query:

promql

# Large responses relative to queries
(
  rate(unbound_response_size_bytes_total[5m])
  /
  rate(unbound_request_size_bytes_total[5m])
) > 10

Detecting Query Floods from Single Source

LogQL Query:

logql

# Count queries per source IP
topk(10,
  sum by (source_ip) (
    count_over_time(
      {job="unbound"} 
      | regexp "(?P<source_ip>\\d+\\.\\d+\\.\\d+\\.\\d+)" 
      [5m]
    )
  )
)
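The same idea works offline against the raw log file with standard tools. This is a sketch: Unbound’s reply-log line format varies with your logging settings, so it simply pulls every IPv4 address it can find:

```shell
# Top 10 IPv4 source addresses seen in a log file
top_talkers() {
  grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' "$1" \
    | sort | uniq -c | sort -rn | head -10
}

# Usage: top_talkers /var/log/unbound/unbound.log
```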

Add These as Dashboard Panels

Create new panels with these queries to enhance your threat detection!

Part 12: Optimization and Maintenance

12.1: Tune Data Retention

Adjust how long data is kept:

Prometheus:

bash

sudo nano /etc/default/prometheus

Add:

ARGS="--storage.tsdb.retention.time=30d"

Loki: Already configured in config.yml (720h = 30 days)

12.2: Monitor Resource Usage

bash

# Check disk usage
df -h /var/lib/prometheus
df -h /var/lib/loki

# Check memory usage
free -h

# Monitor processes
htop

12.3: Set Up Automated Backups

bash

# Create backup script
sudo nano /usr/local/bin/backup-monitoring.sh

Add:

bash

#!/bin/bash
BACKUP_DIR="/backup/monitoring"
DATE=$(date +%Y%m%d)

mkdir -p $BACKUP_DIR

# Backup Grafana database
cp /var/lib/grafana/grafana.db "$BACKUP_DIR/grafana-$DATE.db"

# Backup configurations
tar -czf "$BACKUP_DIR/configs-$DATE.tar.gz" \
  /etc/grafana/grafana.ini \
  /etc/prometheus/prometheus.yml \
  /etc/loki/config.yml \
  /etc/promtail/config.yml

# Keep only the last 7 days of backup files
find "$BACKUP_DIR" -type f -mtime +7 -delete

echo "Backup completed: $DATE"

Make executable:

bash

sudo chmod +x /usr/local/bin/backup-monitoring.sh

Add to crontab:

bash

sudo crontab -e

Add:

0 2 * * * /usr/local/bin/backup-monitoring.sh >> /var/log/backup-monitoring.log 2>&1

12.4: Regular Maintenance Tasks

Weekly:

  • Review alert history
  • Check disk usage
  • Update blocklists if using them

Monthly:

  • Update software packages
  • Review and adjust alert thresholds
  • Analyze long-term trends

Quarterly:

  • Full backup of all data
  • Security audit
  • Performance tuning

Troubleshooting Guide

Issue: No metrics appearing in Grafana

Check:

  1. Is Prometheus running?

bash

   sudo systemctl status prometheus
  2. Is the exporter running?

bash

   sudo systemctl status prometheus-unbound-exporter
   curl http://localhost:9167/metrics
  3. Are targets up in Prometheus?
    • Go to http://<server>:9090/targets
    • All should show “UP”

Issue: No logs appearing in Grafana

Check:

  1. Is Loki running?

bash

   sudo systemctl status loki
  2. Is Promtail running and shipping logs?

bash

   sudo systemctl status promtail
   sudo journalctl -u promtail -n 50
  3. Are logs being written?

bash

   sudo tail -f /var/log/unbound/unbound.log
  4. Check Promtail permissions:

bash

   sudo ls -l /var/log/unbound/unbound.log

Issue: High disk usage

Solution:

  1. Reduce retention periods in Prometheus and Loki configs
  2. Enable log compression
  3. Implement log rotation
  4. Archive old data

Issue: Alerts not firing

Check:

  1. Alert rules are correctly configured
  2. Evaluation interval is appropriate
  3. Thresholds match your environment
  4. Notification channels are configured
  5. Test with:

bash

   # Generate a burst of NXDOMAIN lookups to trip the High NXDOMAIN Rate alert
   for i in {1..50}; do dig @localhost bogus-$RANDOM.example.invalid +short; done

Issue: Slow dashboard performance

Solutions:

  1. Reduce time range displayed
  2. Increase scrape intervals
  3. Limit number of panels
  4. Use recording rules in Prometheus
  5. Add more resources (RAM)

Next Steps

Congratulations! You now have a working DNS security monitoring system. Here’s what to do next:

Immediate:

  • ✅ Monitor for 24-48 hours to establish baselines
  • ✅ Adjust alert thresholds based on your normal traffic
  • ✅ Document your specific setup and customizations
