You know that sinking feeling when you realize something’s wrong with your network? Maybe users can’t reach certain sites, or worse, they’re being redirected to suspicious domains without anyone noticing. That’s where DNS security monitoring comes in, and honestly, it’s one of those things you don’t appreciate until you really need it.
Let me walk you through how DNS security monitoring works with Grafana, why it matters more than ever in 2026, and how you can start protecting your network today.
Why DNS Security Matters More Than Ever
Here’s something that might surprise you: DNS threats increased by 30% between October 2024 and September 2025 according to recent industry data. The average internet user encounters 66 threats per day, up from 29 previously. That’s not just a statistic—that’s potentially 66 opportunities for attackers to compromise your network every single day.
The Domain Name System wasn’t designed with security in mind. Back when it was created, the internet was a much smaller, more trusted place. Fast forward to 2026, and DNS has become one of the most exploited protocols in cybersecurity. Why? Because it’s trusted by default. Firewalls let DNS traffic through, security teams often don’t monitor it closely, and attackers know this.
The DNS Threat Landscape in 2026
Let’s talk about what you’re actually up against. The threats have evolved significantly, and they’re getting more sophisticated every year.
DNS Tunneling: The Silent Data Thief
Imagine a hacker creating a secret tunnel through your network using something as innocent-looking as DNS queries. That’s DNS tunneling, and it’s terrifyingly effective. Recent research shows detection systems can achieve accuracy rates of 99.82% using machine learning, but only if you’re actually looking for it.
DNS tunneling works by encoding data—commands, stolen files, anything really—into DNS queries and responses. It looks like normal DNS traffic to most monitoring tools, which is exactly the point. Attackers use it for data exfiltration, command and control, and bypassing your security controls.
The telltale signs? Look for unusually long domain names (sometimes hitting that 255-character limit), high-entropy random-looking strings in subdomains, and excessive query volumes to single domains. Organizations should monitor for high query volumes and frequent queries to single domains as indicators.
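To make the "high-entropy" signal concrete, here's a minimal Python sketch of that heuristic: flag query names whose longest label is both long and close to random. The thresholds (40 characters, 3.5 bits per character) are illustrative assumptions, not tuned values — calibrate them against your own traffic before acting on them.

```python
import math
from collections import Counter

def shannon_entropy(label: str) -> float:
    """Bits of entropy per character in a DNS label."""
    if not label:
        return 0.0
    counts = Counter(label)
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def looks_like_tunnel(qname: str, max_label_len: int = 40,
                      entropy_threshold: float = 3.5) -> bool:
    """Flag a query name whose longest label is both long and high-entropy."""
    labels = qname.rstrip(".").split(".")
    longest = max(labels, key=len)
    return len(longest) >= max_label_len and shannon_entropy(longest) >= entropy_threshold

# Ordinary lookups pass; base32-looking exfiltration labels get flagged.
print(looks_like_tunnel("www.example.com"))                                        # False
print(looks_like_tunnel("mzxw6ytboi4dkmjqgu3tgnrvgyzdk5dbmu2gk4tf.evil.example"))  # True
```

Run this over your query logs and the handful of names it flags are exactly the ones worth eyeballing first.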
DNS Cache Poisoning and Hijacking
This one’s particularly nasty. DNS cache poisoning involves injecting false information into DNS caches, redirecting users to malicious sites without them knowing. You think you’re going to your bank’s website, but you’re actually headed to a phishing site that looks identical.
DNS hijacking takes it further by actually compromising DNS records. Advanced persistent threats and botnets often manifest through unusual domain blocking frequencies, which is why continuous monitoring is critical.
Enter Grafana: Your DNS Security Command Center
So how do you fight back? This is where Grafana comes into play as your visualization and monitoring platform. Think of Grafana as your security dashboard—a single pane of glass where you can see everything happening in your DNS infrastructure in real-time.
Grafana doesn’t do the heavy lifting alone. It connects to data sources like Prometheus for metrics and Loki for logs, creating a powerful monitoring stack that gives you eyes on your DNS traffic.
What Makes Grafana Perfect for DNS Security?
Real-Time Visualization: Threats don’t wait, and neither should your monitoring. Grafana updates dashboards in real-time, so you see attacks as they happen, not hours later when reviewing logs.
Flexible Data Sources: Grafana connects to Prometheus for time-series metrics, Loki for log aggregation, and even specialized DNS monitoring tools. You’re not locked into a single vendor or data format.
Custom Dashboards: Every network is different. Grafana lets you build dashboards that show exactly what matters to your environment—whether that’s query volumes, response times, or threat indicators.
Alert Integration: When Grafana detects something suspicious, it can alert your team via Slack, email, PagerDuty, or whatever notification system you use. No more staring at dashboards 24/7.
The Bottom Line
DNS security monitoring isn’t optional anymore. With threats increasing 30% year-over-year and attackers using AI to automate attacks, you need visibility into your DNS traffic. Grafana provides that visibility in a flexible, powerful platform that scales from small homelabs to enterprise networks.
The best part? You don’t need a massive security team or unlimited budget to get started. With open-source tools like Grafana, Prometheus, and Loki, you can build enterprise-grade DNS security monitoring on a shoestring budget.
Your DNS infrastructure is the foundation of your network. Monitoring it isn’t just about detecting attacks—it’s about understanding your network, spotting issues before they become problems, and having the confidence that when something goes wrong, you’ll know about it immediately.
Tutorial: DNS Security Monitoring with Grafana – Complete Setup Guide
Welcome to this hands-on tutorial! We’ll build a complete DNS security monitoring system using Grafana, Prometheus, and Loki. By the end, you’ll have real-time dashboards showing DNS threats, automated alerts, and a solid foundation for protecting your network.
What You’ll Build
By the end of this tutorial, you’ll have:
- ✅ A Grafana dashboard displaying DNS security metrics in real-time
- ✅ Prometheus collecting DNS metrics every 5 minutes
- ✅ Loki aggregating DNS logs for threat analysis
- ✅ Automated alerts for suspicious DNS activity
- ✅ Threat detection panels for tunneling, cache poisoning, and DDoS
Prerequisites
Required:
- A DNS server (Unbound, BIND, or CoreDNS)
- Linux system with sudo access (this tutorial uses Ubuntu/Debian)
- 2GB+ RAM available
- 10GB+ free disk space
- Basic command line knowledge
Architecture Overview
Before we dive in, here’s what we’re building:
```
DNS Server (Unbound)
  ├─ Metrics → Unbound Exporter → Prometheus → Grafana
  └─ Logs    → Promtail → Loki → Grafana
```
- DNS Server generates queries and logs
- Unbound Exporter collects metrics (queries/sec, cache stats, etc.)
- Prometheus stores time-series metrics
- Promtail ships logs to Loki
- Loki aggregates and indexes logs
- Grafana visualizes everything in dashboards
Part 1: Install and Configure Grafana
Let’s start with our visualization platform.
Step 1.1: Install Grafana
```bash
# Update system packages
sudo apt update

# Install required dependency
sudo apt install -y musl

# Download Grafana OSS (adjust for your architecture)
# For ARM64 (Raspberry Pi 4, etc.):
wget https://dl.grafana.com/oss/release/grafana_11.1.0_arm64.deb

# For AMD64/x86_64:
# wget https://dl.grafana.com/oss/release/grafana_11.1.0_amd64.deb

# Install Grafana
sudo dpkg -i grafana_11.1.0_*.deb
```
Step 1.2: Start Grafana
```bash
# Enable Grafana to start on boot
sudo systemctl enable grafana-server

# Start Grafana
sudo systemctl start grafana-server

# Check status
sudo systemctl status grafana-server
```
You should see “active (running)” in green.
Step 1.3: Access Grafana UI
Open your browser and navigate to `http://<your-server-ip>:3000`

Default credentials:
- Username: `admin`
- Password: `admin`
You’ll be prompted to change the password on first login. Choose a strong password!
✓ Checkpoint: You should see the Grafana welcome screen.
Part 2: Install and Configure Prometheus
Prometheus will collect and store DNS metrics.
Step 2.1: Install Prometheus
```bash
# Install Prometheus
sudo apt install -y prometheus

# Stop the service temporarily for configuration
sudo systemctl stop prometheus
```
Step 2.2: Configure Prometheus
Create a backup of the default config:
```bash
sudo cp /etc/prometheus/prometheus.yml /etc/prometheus/prometheus.yml.backup
```
Edit the configuration:
```bash
sudo nano /etc/prometheus/prometheus.yml
```
Replace the contents with:
```yaml
# Global configuration
global:
  scrape_interval: 5m
  evaluation_interval: 5m

# Alerting configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: []

# Load rules once and periodically evaluate them
rule_files: []

# Scrape configurations
scrape_configs:
  # Prometheus self-monitoring
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Unbound DNS exporter
  - job_name: 'unbound'
    static_configs:
      - targets: ['localhost:9167']
```
Save and exit (Ctrl+X, then Y, then Enter).
Step 2.3: Start Prometheus
```bash
# Start Prometheus
sudo systemctl start prometheus

# Enable on boot
sudo systemctl enable prometheus

# Verify it's running
sudo systemctl status prometheus
```
Step 2.4: Verify Prometheus
Open `http://<your-server-ip>:9090` in your browser.
Go to Status → Targets. You should see the Prometheus job (it will show as up).
✓ Checkpoint: Prometheus UI is accessible and showing targets.
Part 3: Configure Unbound for Security Monitoring
Now we’ll enable detailed DNS logging and metrics.
Step 3.1: Enable Unbound Statistics
Edit Unbound configuration:
```bash
sudo nano /etc/unbound/unbound.conf
```
Add these lines under the server: section:
```yaml
server:
    # Existing configuration...

    # Enable extended statistics
    extended-statistics: yes

    # Reply logging (log-replies works independently of verbosity)
    verbosity: 0
    log-queries: no
    log-replies: yes
    log-tag-queryreply: yes
    log-local-actions: yes
    logfile: "/var/log/unbound/unbound.log"

    # Log servfail messages
    log-servfail: yes
```
Add these lines under the remote-control: section:
```yaml
remote-control:
    # Existing configuration...

    # Enable Unix socket for faster communication
    control-interface: "/var/run/unbound.sock"
    control-use-cert: no
```
Save and exit.
Step 3.2: Create Log Directory
```bash
# Create log directory
sudo mkdir -p /var/log/unbound

# Set proper permissions
sudo chown unbound:unbound /var/log/unbound
sudo chmod 755 /var/log/unbound
```
Step 3.3: Configure Log Rotation
Prevent logs from filling your disk:
```bash
sudo nano /etc/logrotate.d/unbound
```
Add this content:
```
/var/log/unbound/unbound.log {
    daily
    rotate 7
    missingok
    notifempty
    compress
    delaycompress
    postrotate
        /usr/sbin/unbound-control reload 2>/dev/null || true
    endscript
}
```
Save and exit.
Step 3.4: Restart Unbound
```bash
# Test configuration
sudo unbound-checkconf

# If no errors, restart
sudo systemctl restart unbound

# Verify it's running
sudo systemctl status unbound

# Check logs are being written
sudo tail -f /var/log/unbound/unbound.log
```
You should see log entries appearing. Press Ctrl+C to stop watching.
✓ Checkpoint: Unbound is logging to /var/log/unbound/unbound.log
Part 4: Install Unbound Exporter
The exporter collects metrics from Unbound and exposes them for Prometheus.
Step 4.1: Download Unbound Exporter
For this tutorial, we’ll use the custom unbound-exporter from the ar51an/unbound-exporter project.
```bash
# Create a temporary directory
cd /tmp

# Download the latest release (adjust URL for your architecture)
# For ARM64:
wget https://github.com/ar51an/unbound-exporter/releases/latest/download/unbound-exporter-arm64

# For AMD64:
# wget https://github.com/ar51an/unbound-exporter/releases/latest/download/unbound-exporter-amd64

# Make it executable
chmod +x unbound-exporter-*

# Move to system binary directory
sudo mv unbound-exporter-* /usr/local/bin/unbound-exporter

# Verify installation
/usr/local/bin/unbound-exporter -version
```
Step 4.2: Create Systemd Service
```bash
sudo nano /etc/systemd/system/prometheus-unbound-exporter.service
```
Add this content:
```ini
[Unit]
Description=Prometheus Unbound Exporter
After=network.target unbound.service
Requires=unbound.service

[Service]
Type=simple
User=unbound
Group=unbound
ExecStart=/usr/local/bin/unbound-exporter \
    -unbound.socket unix:///var/run/unbound.sock \
    -web.listen-address :9167
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target
```
Save and exit.
Step 4.3: Start the Exporter
```bash
# Reload systemd
sudo systemctl daemon-reload

# Start the exporter
sudo systemctl start prometheus-unbound-exporter

# Enable on boot
sudo systemctl enable prometheus-unbound-exporter

# Check status
sudo systemctl status prometheus-unbound-exporter
```
Step 4.4: Verify Metrics
```bash
# Check metrics endpoint
curl http://localhost:9167/metrics
```
You should see Prometheus-formatted metrics. Look for lines like:
```
unbound_queries_total
unbound_cache_hits_total
unbound_response_time_seconds
```
✓ Checkpoint: Exporter is running and exposing metrics.
Part 5: Install and Configure Loki
Loki will aggregate DNS logs for threat detection.
Step 5.1: Install Loki
```bash
# Download Loki (adjust for your architecture)
# For ARM64:
curl -O -L "https://github.com/grafana/loki/releases/download/v3.1.0/loki_3.1.0_arm64.deb"

# For AMD64:
# curl -O -L "https://github.com/grafana/loki/releases/download/v3.1.0/loki_3.1.0_amd64.deb"

# Install
sudo dpkg -i loki_3.1.0_*.deb
```
Step 5.2: Configure Loki
Create a backup and edit the config:
bash
sudo cp /etc/loki/config.yml /etc/loki/config.yml.backup
sudo nano /etc/loki/config.yml
Replace with this optimized configuration:
```yaml
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: warn

common:
  path_prefix: /var/lib/loki
  storage:
    filesystem:
      chunks_directory: /var/lib/loki/chunks
      rules_directory: /var/lib/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  tsdb_shipper:
    active_index_directory: /var/lib/loki/tsdb-index
    cache_location: /var/lib/loki/tsdb-cache

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  ingestion_rate_mb: 16
  ingestion_burst_size_mb: 32
  max_query_length: 721h
  max_query_parallelism: 32
  max_cache_freshness_per_query: 10m
  max_query_series: 5000
  max_query_lookback: 0
  max_streams_per_user: 10000
  max_entries_limit_per_query: 100000
  retention_period: 720h

compactor:
  working_directory: /var/lib/loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
  delete_request_store: filesystem

query_range:
  align_queries_with_step: true
  max_retries: 5
  parallelise_shardable_queries: true
  cache_results: true

querier:
  max_concurrent: 20
```
Save and exit.
Step 5.3: Start Loki
```bash
# Start Loki
sudo systemctl start loki

# Enable on boot
sudo systemctl enable loki

# Check status
sudo systemctl status loki
```
✓ Checkpoint: Loki is running on port 3100.
Part 6: Install and Configure Promtail
Promtail ships logs from Unbound to Loki.
Step 6.1: Install Promtail
```bash
# Download Promtail (adjust for your architecture)
# For ARM64:
curl -O -L "https://github.com/grafana/loki/releases/download/v3.1.0/promtail_3.1.0_arm64.deb"

# For AMD64:
# curl -O -L "https://github.com/grafana/loki/releases/download/v3.1.0/promtail_3.1.0_amd64.deb"

# Install
sudo dpkg -i promtail_3.1.0_*.deb
```
Step 6.2: Configure Promtail
```bash
sudo cp /etc/promtail/config.yml /etc/promtail/config.yml.backup
sudo nano /etc/promtail/config.yml
```
Replace with:
```yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0
  log_level: warn

positions:
  filename: /var/lib/promtail/positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
  - job_name: unbound
    static_configs:
      - targets:
          - localhost
        labels:
          job: unbound
          __path__: /var/log/unbound/unbound.log
```
Save and exit.
Step 6.3: Grant Promtail Access to Logs
```bash
# Add promtail user to unbound group
sudo usermod -a -G unbound promtail

# Or adjust permissions
sudo chmod 644 /var/log/unbound/unbound.log
```
Step 6.4: Start Promtail
```bash
# Start Promtail
sudo systemctl start promtail

# Enable on boot
sudo systemctl enable promtail

# Check status
sudo systemctl status promtail
```
✓ Checkpoint: Promtail is shipping logs to Loki.
Part 7: Configure Grafana Data Sources
Now we’ll connect Grafana to Prometheus and Loki.
Step 7.1: Add Prometheus Data Source
- Open Grafana: `http://<your-server-ip>:3000`
- Click the hamburger menu (☰) → Connections → Data sources
- Click Add data source
- Select Prometheus
- Configure:
  - Name: `Prometheus`
  - Default: Toggle ON
  - URL: `http://localhost:9090`
  - Scrape interval: `5m`
- Scroll down and click Save & test
You should see: ✓ Successfully queried the Prometheus API.
Step 7.2: Add Loki Data Source
- Click Add data source again
- Select Loki
- Configure:
- Name:
Loki - URL:
http://localhost:3100 - Maximum lines:
100000
- Name:
- Scroll down and click Save & test
You should see: ✓ Data source connected and labels found.
✓ Checkpoint: Both data sources are connected and working.
Part 8: Create DNS Security Dashboard
Time to build your security monitoring dashboard!
Step 8.1: Create a New Dashboard
- Click the hamburger menu (☰) → Dashboards
- Click New → New Dashboard
- Click + Add visualization
- Select Prometheus as the data source
Step 8.2: Panel 1 – Query Rate Over Time
This panel shows DNS queries per second.
Configuration:
- Title: DNS Queries per Second
- Query:
  ```promql
  rate(unbound_queries_total[5m])
  ```
- Legend: `{{job}}`
- Visualization: Time series (line chart)

Click Apply to save the panel.
Step 8.3: Panel 2 – Query Types Distribution
Shows the breakdown of DNS query types (A, AAAA, etc.).
Configuration:
- Click Add → Visualization
- Select Prometheus
- Title: Query Types
- Query:
  ```promql
  sum by (type) (rate(unbound_queries_total{type!=""}[5m]))
  ```
- Visualization: Pie chart or Bar chart
- Legend: `{{type}}`
Click Apply.
Step 8.4: Panel 3 – Cache Performance
Monitors cache hit ratio.
Configuration:
- Title: Cache Hit Ratio
- Query:
  ```promql
  (
    rate(unbound_cache_hits_total[5m])
    /
    (rate(unbound_cache_hits_total[5m]) + rate(unbound_cache_miss_total[5m]))
  ) * 100
  ```
- Unit: Percent (0-100)
- Visualization: Stat or Gauge
- Thresholds:
  - Red: < 70
  - Yellow: 70-85
  - Green: > 85
Click Apply.
Step 8.5: Panel 4 – NXDOMAIN Rate (Threat Indicator)
High NXDOMAIN rates suggest reconnaissance or tunneling.
Configuration:
- Title: NXDOMAIN Rate (Suspicious if >20%)
- Query:
  ```promql
  (
    rate(unbound_answers_rcode_total{rcode="NXDOMAIN"}[5m])
    /
    rate(unbound_queries_total[5m])
  ) * 100
  ```
- Unit: Percent (0-100)
- Visualization: Time series
- Thresholds:
  - Green: < 5
  - Yellow: 5-20
  - Red: > 20
Click Apply.
Step 8.6: Panel 5 – Top Queried Domains (Logs)
Uses Loki to show most frequently queried domains.
Configuration:
- Title: Top Queried Domains
- Select Loki as data source
- Query (LogQL):
  ```logql
  topk(10,
    sum by (domain) (
      count_over_time(
        {job="unbound"} |~ "reply" | regexp "(?P<domain>[a-zA-Z0-9.-]+)" [1h]
      )
    )
  )
  ```
- Visualization: Bar chart
- Legend: `{{domain}}`
Click Apply.
Step 8.7: Panel 6 – Failed Queries (Security Anomaly)
Shows queries that failed, which might indicate attacks.
Configuration:
- Title: Failed Queries
- Data source: Loki
- Query:
  ```logql
  {job="unbound"} |~ "SERVFAIL|REFUSED|NXDOMAIN"
  ```
- Visualization: Logs
- Time range: Last 15 minutes
Click Apply.
Step 8.8: Panel 7 – Query Size Distribution
Large queries may indicate DNS tunneling.
Configuration:
- Title: Query Size Distribution
- Data source: Prometheus
- Query:
  ```promql
  histogram_quantile(0.99,
    sum by (le) (rate(unbound_request_size_bytes_bucket[5m]))
  )
  ```
- Visualization: Time series
- Legend: 99th Percentile Query Size
- Threshold alert: Alert if > 200 bytes
Click Apply.
Step 8.9: Panel 8 – Geographic Anomalies (If using GeoIP)
If you have GeoIP data in your logs:
Configuration:
- Title: Queries by Country
- Data source: Loki
- Query:
  ```logql
  sum by (country) (
    count_over_time({job="unbound"} | json | country != "" [1h])
  )
  ```
- Visualization: Geomap or Table
Click Apply.
Step 8.10: Save the Dashboard
- Click the save icon (💾) at the top right
- Name: `DNS Security Monitoring`
- Folder: Dashboards
- Click Save
✓ Checkpoint: You have a working DNS security dashboard!
Part 9: Create Security Alerts
Now let’s set up automated alerts for threats.
Step 9.1: Alert 1 – High NXDOMAIN Rate
Triggers when NXDOMAIN rate exceeds 20% (possible reconnaissance).
- Edit the NXDOMAIN Rate panel
- Click Alert tab
- Click Create alert rule from this panel
- Configure:
  - Alert rule name: High NXDOMAIN Rate
  - Condition:
    - WHEN: `last()`
    - IS ABOVE: `20`
  - Evaluate every: 5m
  - For: 10m (must be true for 10 minutes)
  - Summary: NXDOMAIN rate is {{$value}}%, indicating possible DNS reconnaissance
- Click Save rule and exit
Step 9.2: Alert 2 – Unusual Query Volume
Triggers on sudden query spikes (possible DDoS).
- Create a new panel or edit Query Rate panel
- Click Alert tab
- Alert rule name: Unusual Query Volume
- Condition:
  - WHEN: `avg()`
  - IS ABOVE: Set based on your baseline (e.g., 150% of normal)
- For: 5m
- Summary: Query volume spike detected: {{$value}} qps
- Click Save rule and exit
Step 9.3: Alert 3 – Cache Hit Ratio Drop
Triggers when cache performance degrades (possible cache poisoning).
- Edit the Cache Hit Ratio panel
- Alert rule name: Low Cache Hit Ratio
- Condition:
  - WHEN: `last()`
  - IS BELOW: `70`
- For: 15m
- Summary: Cache hit ratio dropped to {{$value}}%
- Click Save rule and exit
Step 9.4: Configure Alert Notifications
To receive alerts via email, Slack, etc.:
- Go to Alerting → Contact points
- Click Add contact point
- Choose your notification method (email, Slack, webhook, etc.)
- Configure credentials/settings
- Click Test to verify
- Click Save
Then create a notification policy:
- Go to Alerting → Notification policies
- Click New policy
- Select your contact point
- Configure routing rules
- Click Save
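If you manage Grafana as code, contact points can also be provisioned from a file instead of the UI. A minimal sketch using Grafana's file-based alerting provisioning — the contact point name, uid, and address below are illustrative assumptions; drop the file into `/etc/grafana/provisioning/alerting/` and restart Grafana:

```yaml
apiVersion: 1
contactPoints:
  - orgId: 1
    name: dns-security-team        # referenced by your notification policy
    receivers:
      - uid: dns-sec-email
        type: email
        settings:
          addresses: security@example.com   # hypothetical address
```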
✓ Checkpoint: Alerts are configured and will notify you of threats.
Part 10: Testing Your Setup
Let’s verify everything works.
Test 10.1: Generate Test Queries
```bash
# Generate some DNS queries
for i in {1..100}; do
  dig @localhost google.com +short
  dig @localhost example.com +short
  dig @localhost github.com +short
done
```
Wait 1-2 minutes, then check your Grafana dashboard. You should see the query count increase.
Test 10.2: Simulate NXDOMAIN Reconnaissance
```bash
# Query non-existent domains
for i in {1..50}; do
  dig @localhost random-nonexistent-domain-$RANDOM.com +short
done
```
Check the NXDOMAIN rate panel. It should show an increase.
Test 10.3: Check Logs in Grafana
- Go to Explore (compass icon in sidebar)
- Select Loki data source
- Query: `{job="unbound"}`
- Click Run query
You should see your DNS logs appearing.
Test 10.4: Verify Metrics in Prometheus
- Go to Prometheus UI: `http://<your-server-ip>:9090`
- Query: `unbound_queries_total`
- Click Execute
- Switch to Graph tab
You should see query metrics over time.
✓ Checkpoint: All components are working and data is flowing!
Part 11: Advanced Threat Detection Queries
Now that your system is working, here are some advanced queries for specific threats.
Detecting DNS Tunneling
PromQL Query:
```promql
# Queries with unusually long domain names
histogram_quantile(0.99,
  sum(rate(unbound_request_size_bytes_bucket[5m])) by (le)
) > 150
```
LogQL Query:
```logql
# Find domains with high entropy (random-looking subdomains)
{job="unbound"}
  | regexp "(?P<domain>[a-z0-9]{20,}\\..*)"
  | line_format "{{.domain}}"
```
Detecting DGA (Domain Generation Algorithm)
```logql
# Domains with consecutive random characters
{job="unbound"}
  | regexp "(?P<domain>[a-z]{15,}\\.com)"
  | line_format "{{.domain}}"
```
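The regex above only catches long all-lowercase labels; a complementary lexical check is the longest consonant run, since human-registered domains tend to be pronounceable while DGA output often isn't. A rough Python sketch — the run-length threshold of 5 is an illustrative assumption, not a tuned value:

```python
import re

def longest_consonant_run(domain: str) -> int:
    """Longest run of consecutive consonants in the second-level label."""
    labels = domain.rstrip(".").split(".")
    sld = labels[-2] if len(labels) >= 2 else labels[0]
    # Character class covers a-z minus the vowels (y counted as a consonant)
    runs = re.findall(r"[b-df-hj-np-tv-z]+", sld.lower())
    return max((len(r) for r in runs), default=0)

def looks_like_dga(domain: str, run_threshold: int = 5) -> bool:
    """Flag domains whose second-level label has an unpronounceable consonant run."""
    return longest_consonant_run(domain) >= run_threshold

print(looks_like_dga("grafana.com"))   # False: runs are "gr", "f", "n"
print(looks_like_dga("xkqzjwtb.com"))  # True: one 8-consonant run
```

Like the entropy check, this is a triage filter, not a verdict — expect false positives on abbreviations and non-English names.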
Detecting DNS Amplification Attacks
PromQL Query:
```promql
# Large responses relative to queries
(
  rate(unbound_response_size_bytes_total[5m])
  /
  rate(unbound_request_size_bytes_total[5m])
) > 10
```
Detecting Query Floods from Single Source
LogQL Query:
```logql
# Count queries per source IP
topk(10,
  sum by (source_ip) (
    count_over_time(
      {job="unbound"}
        | regexp "(?P<source_ip>\\d+\\.\\d+\\.\\d+\\.\\d+)"
      [5m]
    )
  )
)
```
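The `regexp` stage is doing the work here because Unbound's reply log doesn't expose the client address as a ready-made label. The same idea works offline over raw log files; here's a small Python sketch (the sample lines approximate Unbound's reply log format and are illustrative only):

```python
import re
from collections import Counter

# Matches the first dotted-quad IPv4 address on a line
IP_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")

def top_talkers(lines, k=10):
    """Count log lines per client IP (first IPv4 on each line), return top k."""
    counts = Counter()
    for line in lines:
        m = IP_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts.most_common(k)

sample = [
    "reply: 192.168.1.10 example.com. A IN NOERROR 0.012 0 45",
    "reply: 192.168.1.10 example.org. A IN NOERROR 0.010 0 45",
    "reply: 10.0.0.5 github.com. AAAA IN NOERROR 0.020 0 61",
]
print(top_talkers(sample, k=2))  # [('192.168.1.10', 2), ('10.0.0.5', 1)]
```

A single client dominating this list by an order of magnitude is exactly the flood signature the LogQL panel is looking for.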
Add These as Dashboard Panels
Create new panels with these queries to enhance your threat detection!
Part 12: Optimization and Maintenance
12.1: Tune Data Retention
Adjust how long data is kept:
Prometheus:
```bash
sudo nano /etc/default/prometheus
```
Add:
```
ARGS="--storage.tsdb.retention.time=30d"
```
Loki: Already configured in config.yml (720h = 30 days)
12.2: Monitor Resource Usage
```bash
# Check disk usage
df -h /var/lib/prometheus
df -h /var/lib/loki

# Check memory usage
free -h

# Monitor processes
htop
```
12.3: Set Up Automated Backups
```bash
# Create backup script
sudo nano /usr/local/bin/backup-monitoring.sh
```
Add:
```bash
#!/bin/bash
BACKUP_DIR="/backup/monitoring"
DATE=$(date +%Y%m%d)

mkdir -p "$BACKUP_DIR"

# Backup Grafana database
cp /var/lib/grafana/grafana.db "$BACKUP_DIR/grafana-$DATE.db"

# Backup configurations
tar -czf "$BACKUP_DIR/configs-$DATE.tar.gz" \
    /etc/grafana/grafana.ini \
    /etc/prometheus/prometheus.yml \
    /etc/loki/config.yml \
    /etc/promtail/config.yml

# Keep only the last 7 days of backups
find "$BACKUP_DIR" -type f -mtime +7 -delete

echo "Backup completed: $DATE"
```
Make executable:
```bash
sudo chmod +x /usr/local/bin/backup-monitoring.sh
```
Add to crontab:
```bash
sudo crontab -e
```
Add:
```
0 2 * * * /usr/local/bin/backup-monitoring.sh >> /var/log/backup-monitoring.log 2>&1
```
12.4: Regular Maintenance Tasks
Weekly:
- Review alert history
- Check disk usage
- Update blocklists if using them
Monthly:
- Update software packages
- Review and adjust alert thresholds
- Analyze long-term trends
Quarterly:
- Full backup of all data
- Security audit
- Performance tuning
Troubleshooting Guide
Issue: No metrics appearing in Grafana
Check:
- Is Prometheus running?
  ```bash
  sudo systemctl status prometheus
  ```
- Is the exporter running?
  ```bash
  sudo systemctl status prometheus-unbound-exporter
  curl http://localhost:9167/metrics
  ```
- Are targets up in Prometheus?
  - Go to `http://<server>:9090/targets`
  - All should show “UP”
Issue: No logs appearing in Grafana
Check:
- Is Loki running?
  ```bash
  sudo systemctl status loki
  ```
- Is Promtail running and shipping logs?
  ```bash
  sudo systemctl status promtail
  sudo journalctl -u promtail -n 50
  ```
- Are logs being written?
  ```bash
  sudo tail -f /var/log/unbound/unbound.log
  ```
- Check Promtail permissions:
  ```bash
  sudo ls -l /var/log/unbound/unbound.log
  ```
Issue: High disk usage
Solution:
- Reduce retention periods in Prometheus and Loki configs
- Enable log compression
- Implement log rotation
- Archive old data
Issue: Alerts not firing
Check:
- Alert rules are correctly configured
- Evaluation interval is appropriate
- Thresholds match your environment
- Notification channels are configured
- Test with:
  ```bash
  # Generate NXDOMAIN traffic to exercise the High NXDOMAIN Rate alert
  for i in {1..50}; do dig @localhost test-$RANDOM.invalid +short; done
  ```
Issue: Slow dashboard performance
Solutions:
- Reduce time range displayed
- Increase scrape intervals
- Limit number of panels
- Use recording rules in Prometheus
- Add more resources (RAM)
Next Steps
Congratulations! You now have a working DNS security monitoring system. Here’s what to do next:
Immediate:
- ✅ Monitor for 24-48 hours to establish baselines
- ✅ Adjust alert thresholds based on your normal traffic
- ✅ Document your specific setup and customizations



