27 Jun, 2025
0 Comments
5 Mins Read

How I Set Up Archive Log Lag Monitoring in Oracle Data Guard?.. And Why You Should Too

Hey friends,

If you’ve been following my earlier posts, I promised to break down something super important in Oracle Data Guard that often gets ignored: archive log lag monitoring between your primary and standby databases.

Today, I’m fulfilling that promise.

I’ll walk you through exactly how I monitor lag, what kind of alerts I’ve set up, and the small things that saved me from big trouble in real-world situations. Whether you’re managing a single standby or an entire fleet of DR databases, this one’s for you.

Let’s get into it.

Why This Matters (And What Happened With Me)

A couple of years ago, I was handling a DR setup for a logistics client running round-the-clock operations. One night, the primary database went down, and we had to failover to the standby. But the standby was almost 2 hours behind because the MRP process had silently failed, and nobody noticed.

It was a mess.

After that, I made it a non-negotiable rule for myself: Always monitor log lag, transport delay, and apply lag, and alert yourself before your NOC does.

What Is Archive Log Lag, Again?

If you’re new to Oracle Data Guard or just never gave this much attention, let me quickly explain.

There are two types of lags that you should monitor:

Transport Lag: Time delay in sending archive logs from primary to standby
Apply Lag: Delay in applying those logs to the standby DB

For example, if your primary shipped archive log 1000 at 10:00 AM, and your standby applied it at 10:10 AM, that’s a 10-minute apply lag.

Sometimes it’s 2 minutes, sometimes 2 hours, and without monitoring, you’ll never know.

How You Can Check Lag Manually

If you want to see your current lag, log in to the standby database and run this:

sql
SELECT NAME, VALUE, UNIT 
FROM V$DATAGUARD_STATS 
WHERE NAME IN ('transport lag', 'apply lag');

ou’ll get output like:

sql
NAME VALUE UNIT -------------- ----------- --------------------- 
transport lag +00 00:03:00 day(2) to second(0) 
apply lag +00 00:07:00 day(2) to second(0)

That means your logs are 3 minutes late in arriving and 7 minutes late in being applied.

How I Monitor Archive Log Lag Using a Shell Script

Here’s the simple shell script I use in most projects. It runs via cron every 5 minutes and shoots out a mail if the apply lag crosses a set threshold (say 10 minutes).

Script: archive_lag_check.sh

bash
#!/bin/bash

# Set environment
. ~/.bash_profile

export ORACLE_SID=your_sid
export ORAENV_ASK=NO
. oraenv

THRESHOLD="00:10:00"

LAG=$(sqlplus -s / as sysdba <<EOF
SET pagesize 0 feedback off verify off heading off echo off
SELECT VALUE FROM V\\$DATAGUARD_STATS WHERE NAME='apply lag';
EOF
)

LAG=${LAG#+}
echo "Apply Lag: $LAG"

if [[ "$LAG" > "$THRESHOLD" ]]; then
  echo "ALERT: Apply Lag is $LAG" | mailx -s "Data Guard Apply Lag Alert" [email protected]
fi

This small script has saved me so many times, especially in high-load windows like year-end processing or sudden redo surges.

Automating It with Crontab

Add this to your crontab like:

bash
*/5 * * * * /u01/scripts/archive_lag_check.sh >> /u01/logs/lag_check.log 2>&1

It checks every 5 minutes and logs the result. You can set up SMS alerts, Teams notifications, or Grafana dashboards depending on your infra.

Real-Time Log Sequence Tracking

Sometimes, I just want to know the log difference between primary and standby.

Run this on both:

On Primary:

sql
SELECT MAX(SEQUENCE#) FROM V$ARCHIVED_LOG WHERE DEST_ID=1;

On Standby:

sql
SELECT MAX(SEQUENCE#) FROM V$ARCHIVED_LOG WHERE APPLIED='YES';
_ID=1;

If primary is at 1020 and standby is at 1018, you’ve got a 2 log lag.

Common Reasons for Lag (From My Experience)

I’ve seen several causes across client environments:

Slow network or firewall issues
Disk I/O bottlenecks on standby
Heavy DML activity causing redo spikes
MRP or LNS process stuck silently
Archivelog destination full or inaccessible

Sometimes it’s as silly as an OS patch that accidentally disabled the listener.

Always check alert logs and trace files when things feel off.

How Much Lag Is Okay?

Honestly, it depends on the business.

For banking or real-time trading, I’ve seen RPO < 30 seconds
For mid-level CRMs, 5–10 mins lag is tolerable
For daily batch systems, even 30–60 mins can be okay

But never assume, always align your alerting threshold with the business team’s recovery expectations.

Pro Tip: Add This to Your DR Checklist

Here’s what I always advise my juniors and even clients:

“Your standby is your safety net. If you’re not watching it, it’s just a false sense of security.”

Make archive lag checks part of your daily DBA checklist. Don’t leave it to chance.

Advanced Setup (For DevOps-Ready Teams)

If you’re using Prometheus + Grafana, you can integrate Oracle exporter and build dashboards that show:

Transport lag trends
Apply lag spikes
Log apply rate per hour
Real-time sequence difference

This kind of visibility is gold when you’re handling multi-TB databases and mission-critical workloads.

Summary

Here’s a quick wrap-up checklist:

Monitor transport & apply lag using V$DATAGUARD_STATS
Use shell script + cron + mail alerts
Align thresholds with RPO needs
Regularly compare sequence# on both ends
Investigate alert logs when things look slow
Never assume things are fine unless you’ve checked

Final Thoughts

Being an Oracle DBA isn’t just about keeping the database “up”, it’s about being proactive.

Archive log lag might sound like a small detail, but trust me, it can ruin your DR plan if ignored.

If you found this helpful, feel free to drop a message. I can also share a follow-up post on Data Guard broker-based monitoring or using OEM custom metrics.

Let’s keep learning, keep improving, and most importantly, keep our databases safe.

Conclusion

Monitoring archive log lag in Oracle Data Guard is not just a best practice, it’s a necessity. Whether you’re managing a single production environment or a complex DR setup, being proactive can save you from hours of recovery pain and potential data loss.

At Learnomate Technologies, we don’t just talk about these concepts, we train you to master them. Our Oracle DBA training programs are designed with real-time scenarios, production-level simulations, and hands-on monitoring setups, just like the one we discussed in this article.

📺 For more practical insights, walkthroughs, and demos, do check out our YouTube channel: www.youtube.com/@learnomate

🌐 Want to know more about our courses and services? Visit our website: www.learnomate.org

🔗 Stay connected with me on LinkedIn for regular DBA tips, career guidance, and real-world case studies: Ankush Thavali – LinkedIn

📝 If you want to read more about different technologies, feel free to explore our blog section: https://learnomate.org/blogs/

Let’s keep learning, keep sharing, and keep growing — together.

ANKUSH😎

How I Set Up Archive Log Lag Monitoring in Oracle Data Guard?.. And Why You Should Too

How I Set Up Archive Log Lag Monitoring in Oracle Data Guard?.. And Why You Should Too

Why This Matters (And What Happened With Me)

What Is Archive Log Lag, Again?

How You Can Check Lag Manually

How I Monitor Archive Log Lag Using a Shell Script

Script: archive_lag_check.sh

Automating It with Crontab

Real-Time Log Sequence Tracking

On Primary:

On Standby:

Common Reasons for Lag (From My Experience)

How Much Lag Is Okay?

Pro Tip: Add This to Your DR Checklist

Advanced Setup (For DevOps-Ready Teams)

Summary

Final Thoughts

Conclusion

Book a Free Demo