You know that sinking feeling when you realize something’s wrong with your network? Maybe users can’t reach certain sites, or worse, they’re being redirected to suspicious domains without anyone noticing. That’s where DNS security monitoring comes in, and honestly, it’s one of those things you don’t appreciate until you really need it.
Let me walk you through how DNS security monitoring works with Grafana, why it matters more than ever in 2026, and how you can start protecting your network today.
Why DNS Security Matters More Than Ever
Here’s something that might surprise you: DNS threats increased by 30% between October 2024 and September 2025 according to recent industry data. The average internet user encounters 66 threats per day, up from 29 previously. That’s not just a statistic—that’s potentially 66 opportunities for attackers to compromise your network every single day.
The Domain Name System wasn’t designed with security in mind. Back when it was created, the internet was a much smaller, more trusted place. Fast forward to 2026, and DNS has become one of the most exploited protocols in cybersecurity. Why? Because it’s trusted by default. Firewalls let DNS traffic through, security teams often don’t monitor it closely, and attackers know this.
The DNS Threat Landscape in 2026
Let’s talk about what you’re actually up against. The threats have evolved significantly, and they’re getting more sophisticated every year.
DNS Tunneling: The Silent Data Thief
Imagine a hacker creating a secret tunnel through your network using something as innocent-looking as DNS queries. That’s DNS tunneling, and it’s terrifyingly effective. Recent research shows detection systems can achieve accuracy rates of 99.82% using machine learning, but only if you’re actually looking for it.
DNS tunneling works by encoding data—commands, stolen files, anything really—into DNS queries and responses. It looks like normal DNS traffic to most monitoring tools, which is exactly the point. Attackers use it for data exfiltration, command and control, and bypassing your security controls.
The telltale signs? Look for unusually long domain names (sometimes hitting that 255-character limit), high-entropy random-looking strings in subdomains, and excessive query volumes to single domains. Organizations should monitor for high query volumes and frequent queries to single domains as indicators.
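To make the "high-entropy" signal concrete, here's a minimal Python sketch of that heuristic: flag query names whose longest label is both long and close to random. The thresholds (40 characters, 3.5 bits per character) are illustrative assumptions, not tuned values — calibrate them against your own traffic before acting on them.

```python
import math
from collections import Counter

def shannon_entropy(label: str) -> float:
    """Bits of entropy per character in a DNS label."""
    if not label:
        return 0.0
    counts = Counter(label)
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def looks_like_tunnel(qname: str, max_label_len: int = 40,
                      entropy_threshold: float = 3.5) -> bool:
    """Flag a query name whose longest label is both long and high-entropy."""
    labels = qname.rstrip(".").split(".")
    longest = max(labels, key=len)
    return len(longest) >= max_label_len and shannon_entropy(longest) >= entropy_threshold

# Ordinary lookups pass; base32-looking exfiltration labels get flagged.
print(looks_like_tunnel("www.example.com"))                                        # False
print(looks_like_tunnel("mzxw6ytboi4dkmjqgu3tgnrvgyzdk5dbmu2gk4tf.evil.example"))  # True
```

Run this over your query logs and the handful of names it flags are exactly the ones worth eyeballing first.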
DNS Cache Poisoning and Hijacking
This one’s particularly nasty. DNS cache poisoning involves injecting false information into DNS caches, redirecting users to malicious sites without them knowing. You think you’re going to your bank’s website, but you’re actually headed to a phishing site that looks identical.
DNS hijacking takes it further by actually compromising DNS records. Advanced persistent threats and botnets often manifest through unusual domain blocking frequencies, which is why continuous monitoring is critical.
Enter Grafana: Your DNS Security Command Center
So how do you fight back? This is where Grafana comes into play as your visualization and monitoring platform. Think of Grafana as your security dashboard—a single pane of glass where you can see everything happening in your DNS infrastructure in real-time.
Grafana doesn’t do the heavy lifting alone. It connects to data sources like Prometheus for metrics and Loki for logs, creating a powerful monitoring stack that gives you eyes on your DNS traffic.
What Makes Grafana Perfect for DNS Security?
Real-Time Visualization: Threats don’t wait, and neither should your monitoring. Grafana updates dashboards in real-time, so you see attacks as they happen, not hours later when reviewing logs.
Flexible Data Sources: Grafana connects to Prometheus for time-series metrics, Loki for log aggregation, and even specialized DNS monitoring tools. You’re not locked into a single vendor or data format.
Custom Dashboards: Every network is different. Grafana lets you build dashboards that show exactly what matters to your environment—whether that’s query volumes, response times, or threat indicators.
Alert Integration: When Grafana detects something suspicious, it can alert your team via Slack, email, PagerDuty, or whatever notification system you use. No more staring at dashboards 24/7.
The Bottom Line
DNS security monitoring isn’t optional anymore. With threats increasing 30% year-over-year and attackers using AI to automate attacks, you need visibility into your DNS traffic. Grafana provides that visibility in a flexible, powerful platform that scales from small homelabs to enterprise networks.
The best part? You don’t need a massive security team or unlimited budget to get started. With open-source tools like Grafana, Prometheus, and Loki, you can build enterprise-grade DNS security monitoring on a shoestring budget.
Your DNS infrastructure is the foundation of your network. Monitoring it isn’t just about detecting attacks—it’s about understanding your network, spotting issues before they become problems, and having the confidence that when something goes wrong, you’ll know about it immediately.
Tutorial: DNS Security Monitoring with Grafana – Complete Setup Guide
Welcome to this hands-on tutorial! We’ll build a complete DNS security monitoring system using Grafana, Prometheus, and Loki. By the end, you’ll have real-time dashboards showing DNS threats, automated alerts, and a solid foundation for protecting your network.
What You’ll Build
By the end of this tutorial, you’ll have:
- ✅ A Grafana dashboard displaying DNS security metrics in real-time
- ✅ Prometheus collecting DNS metrics every 5 minutes
- ✅ Loki aggregating DNS logs for threat analysis
- ✅ Automated alerts for suspicious DNS activity
- ✅ Threat detection panels for tunneling, cache poisoning, and DDoS
Prerequisites
Required:
- A DNS server (Unbound, BIND, or CoreDNS)
- Linux system with sudo access (this tutorial uses Ubuntu/Debian)
- 2GB+ RAM available
- 10GB+ free disk space
- Basic command line knowledge
Architecture Overview
Before we dive in, here’s what we’re building:
```
DNS Server (Unbound)
  ├─ Metrics → Unbound Exporter → Prometheus → Grafana
  └─ Logs    → Promtail → Loki → Grafana
```
- DNS Server generates queries and logs
- Unbound Exporter collects metrics (queries/sec, cache stats, etc.)
- Prometheus stores time-series metrics
- Promtail ships logs to Loki
- Loki aggregates and indexes logs
- Grafana visualizes everything in dashboards
Part 1: Install and Configure Grafana
Let’s start with our visualization platform.
Step 1.1: Install Grafana
```bash
# Update system packages
sudo apt update

# Install required dependency
sudo apt install -y musl

# Download Grafana OSS (adjust for your architecture)
# For ARM64 (Raspberry Pi 4, etc.):
wget https://dl.grafana.com/oss/release/grafana_11.1.0_arm64.deb

# For AMD64/x86_64:
# wget https://dl.grafana.com/oss/release/grafana_11.1.0_amd64.deb

# Install Grafana
sudo dpkg -i grafana_11.1.0_*.deb
```
Step 1.2: Start Grafana
```bash
# Enable Grafana to start on boot
sudo systemctl enable grafana-server

# Start Grafana
sudo systemctl start grafana-server

# Check status
sudo systemctl status grafana-server
```
You should see “active (running)” in green.
Step 1.3: Access Grafana UI
Open your browser and navigate to `http://<your-server-ip>:3000`

Default credentials:
- Username: `admin`
- Password: `admin`
You’ll be prompted to change the password on first login. Choose a strong password!
✓ Checkpoint: You should see the Grafana welcome screen.
Part 2: Install and Configure Prometheus
Prometheus will collect and store DNS metrics.
Step 2.1: Install Prometheus
```bash
# Install Prometheus
sudo apt install -y prometheus

# Stop the service temporarily for configuration
sudo systemctl stop prometheus
```
Step 2.2: Configure Prometheus
Create a backup of the default config:
```bash
sudo cp /etc/prometheus/prometheus.yml /etc/prometheus/prometheus.yml.backup
```
Edit the configuration:
```bash
sudo nano /etc/prometheus/prometheus.yml
```
Replace the contents with:
```yaml
# Global configuration
global:
  scrape_interval: 5m
  evaluation_interval: 5m

# Alerting configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: []

# Load rules once and periodically evaluate them
rule_files: []

# Scrape configurations
scrape_configs:
  # Prometheus self-monitoring
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Unbound DNS exporter
  - job_name: 'unbound'
    static_configs:
      - targets: ['localhost:9167']
```
Save and exit (Ctrl+X, then Y, then Enter).
Step 2.3: Start Prometheus
```bash
# Start Prometheus
sudo systemctl start prometheus

# Enable on boot
sudo systemctl enable prometheus

# Verify it's running
sudo systemctl status prometheus
```
Step 2.4: Verify Prometheus
Open `http://<your-server-ip>:9090` in your browser.
Go to Status → Targets. You should see the Prometheus job (it will show as up).
✓ Checkpoint: Prometheus UI is accessible and showing targets.
Part 3: Configure Unbound for Security Monitoring
Now we’ll enable detailed DNS logging and metrics.
Step 3.1: Enable Unbound Statistics
Edit Unbound configuration:
```bash
sudo nano /etc/unbound/unbound.conf
```
Add these lines under the server: section:
```yaml
server:
    # Existing configuration...

    # Enable extended statistics
    extended-statistics: yes

    # Reply logging (log-replies works independently of verbosity)
    verbosity: 0
    log-queries: no
    log-replies: yes
    log-tag-queryreply: yes
    log-local-actions: yes
    logfile: "/var/log/unbound/unbound.log"

    # Log servfail messages
    log-servfail: yes
```
Add these lines under the remote-control: section:
```yaml
remote-control:
    # Existing configuration...

    # Enable Unix socket for faster communication
    control-interface: "/var/run/unbound.sock"
    control-use-cert: no
```
Save and exit.
Step 3.2: Create Log Directory
```bash
# Create log directory
sudo mkdir -p /var/log/unbound

# Set proper permissions
sudo chown unbound:unbound /var/log/unbound
sudo chmod 755 /var/log/unbound
```
Step 3.3: Configure Log Rotation
Prevent logs from filling your disk:
```bash
sudo nano /etc/logrotate.d/unbound
```
Add this content:
```
/var/log/unbound/unbound.log {
    daily
    rotate 7
    missingok
    notifempty
    compress
    delaycompress
    postrotate
        /usr/sbin/unbound-control reload 2>/dev/null || true
    endscript
}
```
Save and exit.
Step 3.4: Restart Unbound
```bash
# Test configuration
sudo unbound-checkconf

# If no errors, restart
sudo systemctl restart unbound

# Verify it's running
sudo systemctl status unbound

# Check logs are being written
sudo tail -f /var/log/unbound/unbound.log
```
You should see log entries appearing. Press Ctrl+C to stop watching.
✓ Checkpoint: Unbound is logging to /var/log/unbound/unbound.log
Part 4: Install Unbound Exporter
The exporter collects metrics from Unbound and exposes them for Prometheus.
Step 4.1: Download Unbound Exporter
For this tutorial, we’ll use the custom unbound-exporter from the ar51an/unbound-exporter project.
```bash
# Create a temporary directory
cd /tmp

# Download the latest release (adjust URL for your architecture)
# For ARM64:
wget https://github.com/ar51an/unbound-exporter/releases/latest/download/unbound-exporter-arm64

# For AMD64:
# wget https://github.com/ar51an/unbound-exporter/releases/latest/download/unbound-exporter-amd64

# Make it executable
chmod +x unbound-exporter-*

# Move to system binary directory
sudo mv unbound-exporter-* /usr/local/bin/unbound-exporter

# Verify installation
/usr/local/bin/unbound-exporter -version
```
Step 4.2: Create Systemd Service
```bash
sudo nano /etc/systemd/system/prometheus-unbound-exporter.service
```
Add this content:
```ini
[Unit]
Description=Prometheus Unbound Exporter
After=network.target unbound.service
Requires=unbound.service

[Service]
Type=simple
User=unbound
Group=unbound
ExecStart=/usr/local/bin/unbound-exporter \
    -unbound.socket unix:///var/run/unbound.sock \
    -web.listen-address :9167
Restart=on-failure
RestartSec=5s

[Install]
WantedBy=multi-user.target
```
Save and exit.
Step 4.3: Start the Exporter
```bash
# Reload systemd
sudo systemctl daemon-reload

# Start the exporter
sudo systemctl start prometheus-unbound-exporter

# Enable on boot
sudo systemctl enable prometheus-unbound-exporter

# Check status
sudo systemctl status prometheus-unbound-exporter
```
Step 4.4: Verify Metrics
```bash
# Check metrics endpoint
curl http://localhost:9167/metrics
```
You should see Prometheus-formatted metrics. Look for lines like:
```
unbound_queries_total
unbound_cache_hits_total
unbound_response_time_seconds
```
✓ Checkpoint: Exporter is running and exposing metrics.
Part 5: Install and Configure Loki
Loki will aggregate DNS logs for threat detection.
Step 5.1: Install Loki
```bash
# Download Loki (adjust for your architecture)
# For ARM64:
curl -O -L "https://github.com/grafana/loki/releases/download/v3.1.0/loki_3.1.0_arm64.deb"

# For AMD64:
# curl -O -L "https://github.com/grafana/loki/releases/download/v3.1.0/loki_3.1.0_amd64.deb"

# Install
sudo dpkg -i loki_3.1.0_*.deb
```
Step 5.2: Configure Loki
Create a backup and edit the config:
bash
sudo cp /etc/loki/config.yml /etc/loki/config.yml.backup
sudo nano /etc/loki/config.yml
Replace with this optimized configuration:
```yaml
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: warn

common:
  path_prefix: /var/lib/loki
  storage:
    filesystem:
      chunks_directory: /var/lib/loki/chunks
      rules_directory: /var/lib/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  tsdb_shipper:
    active_index_directory: /var/lib/loki/tsdb-index
    cache_location: /var/lib/loki/tsdb-cache

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  ingestion_rate_mb: 16
  ingestion_burst_size_mb: 32
  max_query_length: 721h
  max_query_parallelism: 32
  max_cache_freshness_per_query: 10m
  max_query_series: 5000
  max_query_lookback: 0
  max_streams_per_user: 10000
  max_entries_limit_per_query: 100000
  retention_period: 720h

compactor:
  working_directory: /var/lib/loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 150
  delete_request_store: filesystem

query_range:
  align_queries_with_step: true
  max_retries: 5
  parallelise_shardable_queries: true
  cache_results: true

querier:
  max_concurrent: 20
```
Save and exit.
Step 5.3: Start Loki
```bash
# Start Loki
sudo systemctl start loki

# Enable on boot
sudo systemctl enable loki

# Check status
sudo systemctl status loki
```
✓ Checkpoint: Loki is running on port 3100.
Part 6: Install and Configure Promtail
Promtail ships logs from Unbound to Loki.
Step 6.1: Install Promtail
```bash
# Download Promtail (adjust for your architecture)
# For ARM64:
curl -O -L "https://github.com/grafana/loki/releases/download/v3.1.0/promtail_3.1.0_arm64.deb"

# For AMD64:
# curl -O -L "https://github.com/grafana/loki/releases/download/v3.1.0/promtail_3.1.0_amd64.deb"

# Install
sudo dpkg -i promtail_3.1.0_*.deb
```
Step 6.2: Configure Promtail
```bash
sudo cp /etc/promtail/config.yml /etc/promtail/config.yml.backup
sudo nano /etc/promtail/config.yml
```
Replace with:
```yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0
  log_level: warn

positions:
  filename: /var/lib/promtail/positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
  - job_name: unbound
    static_configs:
      - targets:
          - localhost
        labels:
          job: unbound
          __path__: /var/log/unbound/unbound.log
```
Save and exit.
Step 6.3: Grant Promtail Access to Logs
```bash
# Add promtail user to unbound group
sudo usermod -a -G unbound promtail

# Or adjust permissions
sudo chmod 644 /var/log/unbound/unbound.log
```
Step 6.4: Start Promtail
```bash
# Start Promtail
sudo systemctl start promtail

# Enable on boot
sudo systemctl enable promtail

# Check status
sudo systemctl status promtail
```
✓ Checkpoint: Promtail is shipping logs to Loki.
Part 7: Configure Grafana Data Sources
Now we’ll connect Grafana to Prometheus and Loki.
Step 7.1: Add Prometheus Data Source
- Open Grafana: `http://<your-server-ip>:3000`
- Click the hamburger menu (☰) → Connections → Data sources
- Click Add data source
- Select Prometheus
- Configure:
  - Name: `Prometheus`
  - Default: Toggle ON
  - URL: `http://localhost:9090`
  - Scrape interval: `5m`
- Scroll down and click Save & test
You should see: ✓ Successfully queried the Prometheus API.
Step 7.2: Add Loki Data Source
- Click Add data source again
- Select Loki
- Configure:
- Name:
Loki - URL:
http://localhost:3100 - Maximum lines:
100000
- Name:
- Scroll down and click Save & test
You should see: ✓ Data source connected and labels found.
✓ Checkpoint: Both data sources are connected and working.
Part 8: Create DNS Security Dashboard
Time to build your security monitoring dashboard!
Step 8.1: Create a New Dashboard
- Click the hamburger menu (☰) → Dashboards
- Click New → New Dashboard
- Click + Add visualization
- Select Prometheus as the data source
Step 8.2: Panel 1 – Query Rate Over Time
This panel shows DNS queries per second.
Configuration:
- Title: DNS Queries per Second
- Query:
  ```promql
  rate(unbound_queries_total[5m])
  ```
- Legend: `{{job}}`
- Visualization: Time series (line chart)

Click Apply to save the panel.
Step 8.3: Panel 2 – Query Types Distribution
Shows the breakdown of DNS query types (A, AAAA, etc.).
Configuration:
- Click Add → Visualization
- Select Prometheus
- Title: Query Types
- Query:
  ```promql
  sum by (type) (rate(unbound_queries_total{type!=""}[5m]))
  ```
- Visualization: Pie chart or Bar chart
- Legend: `{{type}}`
Click Apply.
Step 8.4: Panel 3 – Cache Performance
Monitors cache hit ratio.
Configuration:
- Title: Cache Hit Ratio
- Query:
  ```promql
  (
    rate(unbound_cache_hits_total[5m])
    /
    (rate(unbound_cache_hits_total[5m]) + rate(unbound_cache_miss_total[5m]))
  ) * 100
  ```
- Unit: Percent (0-100)
- Visualization: Stat or Gauge
- Thresholds:
  - Red: < 70
  - Yellow: 70-85
  - Green: > 85
Click Apply.
Step 8.5: Panel 4 – NXDOMAIN Rate (Threat Indicator)
High NXDOMAIN rates suggest reconnaissance or tunneling.
Configuration:
- Title: NXDOMAIN Rate (Suspicious if >20%)
- Query:
  ```promql
  (
    rate(unbound_answers_rcode_total{rcode="NXDOMAIN"}[5m])
    /
    rate(unbound_queries_total[5m])
  ) * 100
  ```
- Unit: Percent (0-100)
- Visualization: Time series
- Thresholds:
  - Green: < 5
  - Yellow: 5-20
  - Red: > 20
Click Apply.
Step 8.6: Panel 5 – Top Queried Domains (Logs)
Uses Loki to show most frequently queried domains.
Configuration:
- Title: Top Queried Domains
- Select Loki as data source
- Query (LogQL):
  ```logql
  topk(10,
    sum by (domain) (
      count_over_time(
        {job="unbound"} |~ "reply" | regexp "(?P<domain>[a-zA-Z0-9.-]+)" [1h]
      )
    )
  )
  ```
- Visualization: Bar chart
- Legend: `{{domain}}`
Click Apply.
Step 8.7: Panel 6 – Failed Queries (Security Anomaly)
Shows queries that failed, which might indicate attacks.
Configuration:
- Title: Failed Queries
- Data source: Loki
- Query:
  ```logql
  {job="unbound"} |~ "SERVFAIL|REFUSED|NXDOMAIN"
  ```
- Visualization: Logs
- Time range: Last 15 minutes
Click Apply.
Step 8.8: Panel 7 – Query Size Distribution
Large queries may indicate DNS tunneling.
Configuration:
- Title: Query Size Distribution
- Data source: Prometheus
- Query:
  ```promql
  histogram_quantile(0.99,
    sum by (le) (rate(unbound_request_size_bytes_bucket[5m]))
  )
  ```
- Visualization: Time series
- Legend: 99th Percentile Query Size
- Threshold alert: Alert if > 200 bytes
Click Apply.
Step 8.9: Panel 8 – Geographic Anomalies (If using GeoIP)
If you have GeoIP data in your logs:
Configuration:
- Title: Queries by Country
- Data source: Loki
- Query:
  ```logql
  sum by (country) (
    count_over_time({job="unbound"} | json | country != "" [1h])
  )
  ```
- Visualization: Geomap or Table
Click Apply.
Step 8.10: Save the Dashboard
- Click the save icon (💾) at the top right
- Name: `DNS Security Monitoring`
- Folder: Dashboards
- Click Save
✓ Checkpoint: You have a working DNS security dashboard!
Part 9: Create Security Alerts
Now let’s set up automated alerts for threats.
Step 9.1: Alert 1 – High NXDOMAIN Rate
Triggers when NXDOMAIN rate exceeds 20% (possible reconnaissance).
- Edit the NXDOMAIN Rate panel
- Click Alert tab
- Click Create alert rule from this panel
- Configure:
  - Alert rule name: High NXDOMAIN Rate
  - Condition:
    - WHEN: `last()`
    - IS ABOVE: `20`
  - Evaluate every: 5m
  - For: 10m (must be true for 10 minutes)
  - Summary: NXDOMAIN rate is {{$value}}%, indicating possible DNS reconnaissance
- Click Save rule and exit
Step 9.2: Alert 2 – Unusual Query Volume
Triggers on sudden query spikes (possible DDoS).
- Create a new panel or edit Query Rate panel
- Click Alert tab
- Alert rule name: Unusual Query Volume
- Condition:
  - WHEN: `avg()`
  - IS ABOVE: Set based on your baseline (e.g., 150% of normal)
- For: 5m
- Summary: Query volume spike detected: {{$value}} qps
- Click Save rule and exit
Step 9.3: Alert 3 – Cache Hit Ratio Drop
Triggers when cache performance degrades (possible cache poisoning).
- Edit the Cache Hit Ratio panel
- Alert rule name: Low Cache Hit Ratio
- Condition:
  - WHEN: `last()`
  - IS BELOW: `70`
- For: 15m
- Summary: Cache hit ratio dropped to {{$value}}%
- Click Save rule and exit
Step 9.4: Configure Alert Notifications
To receive alerts via email, Slack, etc.:
- Go to Alerting → Contact points
- Click Add contact point
- Choose your notification method (email, Slack, webhook, etc.)
- Configure credentials/settings
- Click Test to verify
- Click Save
Then create a notification policy:
- Go to Alerting → Notification policies
- Click New policy
- Select your contact point
- Configure routing rules
- Click Save
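If you manage Grafana as code, contact points can also be provisioned from a file instead of the UI. A minimal sketch using Grafana's file-based alerting provisioning — the contact point name, uid, and address below are illustrative assumptions; drop the file into `/etc/grafana/provisioning/alerting/` and restart Grafana:

```yaml
apiVersion: 1
contactPoints:
  - orgId: 1
    name: dns-security-team        # referenced by your notification policy
    receivers:
      - uid: dns-sec-email
        type: email
        settings:
          addresses: security@example.com   # hypothetical address
```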
✓ Checkpoint: Alerts are configured and will notify you of threats.
Part 10: Testing Your Setup
Let’s verify everything works.
Test 10.1: Generate Test Queries
```bash
# Generate some DNS queries
for i in {1..100}; do
  dig @localhost google.com +short
  dig @localhost example.com +short
  dig @localhost github.com +short
done
```
Wait 1-2 minutes, then check your Grafana dashboard. You should see the query count increase.
Test 10.2: Simulate NXDOMAIN Reconnaissance
```bash
# Query non-existent domains
for i in {1..50}; do
  dig @localhost random-nonexistent-domain-$RANDOM.com +short
done
```
Check the NXDOMAIN rate panel. It should show an increase.
Test 10.3: Check Logs in Grafana
- Go to Explore (compass icon in sidebar)
- Select Loki data source
- Query: `{job="unbound"}`
- Click Run query
You should see your DNS logs appearing.
Test 10.4: Verify Metrics in Prometheus
- Go to Prometheus UI: `http://<your-server-ip>:9090`
- Query: `unbound_queries_total`
- Click Execute
- Switch to Graph tab
You should see query metrics over time.
✓ Checkpoint: All components are working and data is flowing!
Part 11: Advanced Threat Detection Queries
Now that your system is working, here are some advanced queries for specific threats.
Detecting DNS Tunneling
PromQL Query:
```promql
# Queries with unusually long domain names
histogram_quantile(0.99,
  sum(rate(unbound_request_size_bytes_bucket[5m])) by (le)
) > 150
```
LogQL Query:
```logql
# Find domains with high entropy (random-looking subdomains)
{job="unbound"}
  | regexp "(?P<domain>[a-z0-9]{20,}\\..*)"
  | line_format "{{.domain}}"
```
Detecting DGA (Domain Generation Algorithm)
```logql
# Domains with consecutive random characters
{job="unbound"}
  | regexp "(?P<domain>[a-z]{15,}\\.com)"
  | line_format "{{.domain}}"
```
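The regex above only catches long all-lowercase labels; a complementary lexical check is the longest consonant run, since human-registered domains tend to be pronounceable while DGA output often isn't. A rough Python sketch — the run-length threshold of 5 is an illustrative assumption, not a tuned value:

```python
import re

def longest_consonant_run(domain: str) -> int:
    """Longest run of consecutive consonants in the second-level label."""
    labels = domain.rstrip(".").split(".")
    sld = labels[-2] if len(labels) >= 2 else labels[0]
    # Character class covers a-z minus the vowels (y counted as a consonant)
    runs = re.findall(r"[b-df-hj-np-tv-z]+", sld.lower())
    return max((len(r) for r in runs), default=0)

def looks_like_dga(domain: str, run_threshold: int = 5) -> bool:
    """Flag domains whose second-level label has an unpronounceable consonant run."""
    return longest_consonant_run(domain) >= run_threshold

print(looks_like_dga("grafana.com"))   # False: runs are "gr", "f", "n"
print(looks_like_dga("xkqzjwtb.com"))  # True: one 8-consonant run
```

Like the entropy check, this is a triage filter, not a verdict — expect false positives on abbreviations and non-English names.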
Detecting DNS Amplification Attacks
PromQL Query:
```promql
# Large responses relative to queries
(
  rate(unbound_response_size_bytes_total[5m])
  /
  rate(unbound_request_size_bytes_total[5m])
) > 10
```
Detecting Query Floods from Single Source
LogQL Query:
```logql
# Count queries per source IP
topk(10,
  sum by (source_ip) (
    count_over_time(
      {job="unbound"}
        | regexp "(?P<source_ip>\\d+\\.\\d+\\.\\d+\\.\\d+)"
      [5m]
    )
  )
)
```
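The `regexp` stage is doing the work here because Unbound's reply log doesn't expose the client address as a ready-made label. The same idea works offline over raw log files; here's a small Python sketch (the sample lines approximate Unbound's reply log format and are illustrative only):

```python
import re
from collections import Counter

# Matches the first dotted-quad IPv4 address on a line
IP_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")

def top_talkers(lines, k=10):
    """Count log lines per client IP (first IPv4 on each line), return top k."""
    counts = Counter()
    for line in lines:
        m = IP_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts.most_common(k)

sample = [
    "reply: 192.168.1.10 example.com. A IN NOERROR 0.012 0 45",
    "reply: 192.168.1.10 example.org. A IN NOERROR 0.010 0 45",
    "reply: 10.0.0.5 github.com. AAAA IN NOERROR 0.020 0 61",
]
print(top_talkers(sample, k=2))  # [('192.168.1.10', 2), ('10.0.0.5', 1)]
```

A single client dominating this list by an order of magnitude is exactly the flood signature the LogQL panel is looking for.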
Add These as Dashboard Panels
Create new panels with these queries to enhance your threat detection!
Part 12: Optimization and Maintenance
12.1: Tune Data Retention
Adjust how long data is kept:
Prometheus:
```bash
sudo nano /etc/default/prometheus
```
Add:
```
ARGS="--storage.tsdb.retention.time=30d"
```
Loki: Already configured in config.yml (720h = 30 days)
12.2: Monitor Resource Usage
```bash
# Check disk usage
df -h /var/lib/prometheus
df -h /var/lib/loki

# Check memory usage
free -h

# Monitor processes
htop
```
12.3: Set Up Automated Backups
```bash
# Create backup script
sudo nano /usr/local/bin/backup-monitoring.sh
```
Add:
```bash
#!/bin/bash
BACKUP_DIR="/backup/monitoring"
DATE=$(date +%Y%m%d)

mkdir -p "$BACKUP_DIR"

# Backup Grafana database
cp /var/lib/grafana/grafana.db "$BACKUP_DIR/grafana-$DATE.db"

# Backup configurations
tar -czf "$BACKUP_DIR/configs-$DATE.tar.gz" \
    /etc/grafana/grafana.ini \
    /etc/prometheus/prometheus.yml \
    /etc/loki/config.yml \
    /etc/promtail/config.yml

# Keep only the last 7 days of backups
find "$BACKUP_DIR" -type f -mtime +7 -delete

echo "Backup completed: $DATE"
```
Make executable:
```bash
sudo chmod +x /usr/local/bin/backup-monitoring.sh
```
Add to crontab:
```bash
sudo crontab -e
```
Add:
```
0 2 * * * /usr/local/bin/backup-monitoring.sh >> /var/log/backup-monitoring.log 2>&1
```
12.4: Regular Maintenance Tasks
Weekly:
- Review alert history
- Check disk usage
- Update blocklists if using them
Monthly:
- Update software packages
- Review and adjust alert thresholds
- Analyze long-term trends
Quarterly:
- Full backup of all data
- Security audit
- Performance tuning
Troubleshooting Guide
Issue: No metrics appearing in Grafana
Check:
- Is Prometheus running?
  ```bash
  sudo systemctl status prometheus
  ```
- Is the exporter running?
  ```bash
  sudo systemctl status prometheus-unbound-exporter
  curl http://localhost:9167/metrics
  ```
- Are targets up in Prometheus?
  - Go to `http://<server>:9090/targets`
  - All should show “UP”
Issue: No logs appearing in Grafana
Check:
- Is Loki running?
  ```bash
  sudo systemctl status loki
  ```
- Is Promtail running and shipping logs?
  ```bash
  sudo systemctl status promtail
  sudo journalctl -u promtail -n 50
  ```
- Are logs being written?
  ```bash
  sudo tail -f /var/log/unbound/unbound.log
  ```
- Check Promtail permissions:
  ```bash
  sudo ls -l /var/log/unbound/unbound.log
  ```
Issue: High disk usage
Solution:
- Reduce retention periods in Prometheus and Loki configs
- Enable log compression
- Implement log rotation
- Archive old data
Issue: Alerts not firing
Check:
- Alert rules are correctly configured
- Evaluation interval is appropriate
- Thresholds match your environment
- Notification channels are configured
- Test with:
  ```bash
  # Generate NXDOMAIN traffic to exercise the High NXDOMAIN Rate alert
  for i in {1..50}; do dig @localhost test-$RANDOM.invalid +short; done
  ```
Issue: Slow dashboard performance
Solutions:
- Reduce time range displayed
- Increase scrape intervals
- Limit number of panels
- Use recording rules in Prometheus
- Add more resources (RAM)
Next Steps
Congratulations! You now have a working DNS security monitoring system. Here’s what to do next:
Immediate:
- ✅ Monitor for 24-48 hours to establish baselines
- ✅ Adjust alert thresholds based on your normal traffic
- ✅ Document your specific setup and customizations



