How Oracle RAC Handles Failover
In enterprise environments, database downtime is not an option. Applications are expected to run 24/7, even during hardware failures, OS crashes, or instance outages. Oracle Real Application Clusters (RAC) is designed to address this exact challenge by providing high availability and scalability.
One of the key strengths of Oracle RAC is its failover mechanism, which ensures that database services remain available even when a node or instance fails. In this blog, we’ll explore how Oracle RAC detects failures, reacts to them, and ensures minimal impact on applications.
What Is Failover in Oracle RAC?
Failover in Oracle RAC refers to the automatic relocation of database services and workloads from a failed instance or node to a surviving one without manual intervention.
Oracle RAC supports:
- Instance failover
- Node failover
- Service-level failover
- Transparent Application Failover (TAF)
- Fast Application Notification (FAN)
All of these work together to maintain application availability.
Key Components Involved in RAC Failover
Oracle RAC failover is not handled by a single component. It is a coordinated effort among several layers:
1. Clusterware (Oracle Grid Infrastructure)
- Monitors node health
- Manages cluster membership
- Restarts resources during failures
2. Cluster Synchronization Services (CSS)
- Detects node membership
- Prevents split-brain scenarios
- Initiates node eviction if needed
3. Oracle RAC Instances
- Each node runs its own database instance
- All instances access the same shared database
4. Services
- Logical workloads mapped to instances
- Automatically relocated during failover
How Oracle RAC Detects Failure
Node Failure Detection
Oracle RAC relies on two heartbeats:
- Network heartbeat over the private interconnect
- Disk heartbeat written to the voting disks (the OCR stores cluster configuration and is not part of the heartbeat)
If a node stops responding:
- CSS detects heartbeat loss
- Voting disks confirm node status
- Cluster decides whether the node is alive or dead
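The detection logic can be pictured with a small toy model in Python. This is only a sketch of the idea, not how CSS is implemented: the `NodeHeartbeat` class and `classify` function are hypothetical, and the thresholds (30 seconds for the network heartbeat, 200 seconds for the disk heartbeat) are assumed placeholder values, so check the actual settings on your own cluster.

```python
from dataclasses import dataclass


@dataclass
class NodeHeartbeat:
    name: str
    last_network_hb: float  # time (seconds) of the last interconnect heartbeat
    last_disk_hb: float     # time (seconds) of the last voting-disk heartbeat


def classify(node: NodeHeartbeat, now: float,
             misscount: float = 30.0, disktimeout: float = 200.0) -> str:
    """Return 'alive', 'suspect', or 'evict' for one node (toy model only).

    misscount and disktimeout are assumed thresholds, not values read
    from a real cluster.
    """
    network_silent = now - node.last_network_hb
    disk_silent = now - node.last_disk_hb
    # Either heartbeat being silent past its threshold makes the node
    # a candidate for eviction.
    if network_silent > misscount or disk_silent > disktimeout:
        return "evict"
    # Halfway to the network threshold we only flag the node as suspect.
    if network_silent > misscount / 2:
        return "suspect"
    return "alive"


if __name__ == "__main__":
    node = NodeHeartbeat("node1", last_network_hb=100.0, last_disk_hb=100.0)
    for t in (105.0, 120.0, 131.0):
        print(t, classify(node, now=t))
```

The point of the model is that a node is judged on both channels: a healthy interconnect does not save a node that has stopped writing to the voting disks, and vice versa.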
Instance Failure Detection
An instance failure may occur due to:
- ORA-600 / ORA-7445 errors
- PMON crash
- OS-level process termination
Clusterware immediately detects the failed instance and triggers failover actions.
What Happens During Node Failover?
When a RAC node fails, Oracle performs the following steps:
1. Failure Detection
   - Clusterware detects missing heartbeats
2. Node Eviction
   - Failed node is evicted to protect data integrity
3. Instance Termination
   - Instance on the failed node is marked as down
4. Resource Cleanup
   - Locks, enqueue resources, and memory structures are released
5. Service Relocation
   - Services running on the failed node are started on surviving nodes
6. Client Reconnection
   - Applications reconnect using SCAN listeners
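The service relocation step can be sketched as a simple placement calculation. The Python below is a hypothetical model, not Clusterware's actual algorithm: `relocate_services`, the node names, and the service names are all made up for illustration, and the rule "pick the first surviving candidate" is an assumption.

```python
from typing import Dict, List, Optional


def relocate_services(placement: Dict[str, str],
                      candidates: Dict[str, List[str]],
                      failed_node: str) -> Dict[str, Optional[str]]:
    """Return a new service -> node mapping after failed_node goes down.

    placement:  where each service currently runs
    candidates: nodes each service is allowed to move to, in order
    A service with no surviving candidate maps to None (offline).
    """
    new_placement: Dict[str, Optional[str]] = {}
    for service, node in placement.items():
        if node != failed_node:
            # Services on healthy nodes are left where they are.
            new_placement[service] = node
            continue
        survivors = [n for n in candidates.get(service, []) if n != failed_node]
        new_placement[service] = survivors[0] if survivors else None
    return new_placement


if __name__ == "__main__":
    current = {"oltp_svc": "node1", "report_svc": "node2"}
    allowed = {"oltp_svc": ["node2", "node3"], "report_svc": ["node1"]}
    print(relocate_services(current, allowed, failed_node="node1"))
```

A service that has nowhere to go stays offline in this model, which is why configuring enough candidate nodes for each service matters.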
What Happens During Instance Failover?
In instance failover:
- The node remains up
- Only the database instance crashes
Oracle RAC will:
- Restart the failed instance (if configured)
- Or relocate services to other running instances
- Roll back uncommitted transactions using UNDO
Committed transactions remain intact.
Role of Services in RAC Failover
Oracle strongly recommends using services instead of SID-based connections.
Services provide:
- Load balancing
- Failover control
- Performance management
During failover:
- Services are automatically relocated
- Applications reconnect to the new instance hosting the service
Example:
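As a minimal sketch of the difference between service-based and SID-based connections, the Python helpers below build the two kinds of Oracle Net connect descriptor as plain strings. The host name `rac-scan.example.com`, the service name `oltp_svc`, and the SID `orcl1` are hypothetical, and the helper functions are illustrative only.

```python
def service_descriptor(scan_host: str, service_name: str, port: int = 1521) -> str:
    """Connect descriptor that targets a service through the SCAN name."""
    return (
        "(DESCRIPTION="
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={scan_host})(PORT={port}))"
        f"(CONNECT_DATA=(SERVICE_NAME={service_name})))"
    )


def sid_descriptor(node_host: str, sid: str, port: int = 1521) -> str:
    """Connect descriptor pinned to one instance on one host (not recommended)."""
    return (
        "(DESCRIPTION="
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={node_host})(PORT={port}))"
        f"(CONNECT_DATA=(SID={sid})))"
    )


if __name__ == "__main__":
    # Follows the service wherever it is running after a failover.
    print(service_descriptor("rac-scan.example.com", "oltp_svc"))
    # Tied to a single instance; fails if that instance is down.
    print(sid_descriptor("node1.example.com", "orcl1"))
```

The service-based descriptor names what the application wants rather than where it runs, so the same string keeps working after the service moves to another instance.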
Fast Application Notification (FAN)
FAN allows applications to immediately know about failures.
Benefits:
- Faster reconnection
- Reduced connection timeouts
- Efficient resource usage
FAN events notify:
- Connection pool
- Mid-tier servers
- JDBC / OCI clients
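To illustrate why a notification helps, here is a toy connection pool in Python that reacts to a "down" event by discarding connections to the failed instance right away, instead of waiting for each one to time out. Everything here is hypothetical: `ToyPool`, `PooledConnection`, and the event dictionary are a simplified stand-in and do not reflect the real FAN event format or any Oracle driver API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PooledConnection:
    conn_id: int
    instance: str  # instance this connection is attached to


@dataclass
class ToyPool:
    connections: List[PooledConnection] = field(default_factory=list)

    def on_event(self, event: Dict[str, str]) -> int:
        """Drop connections to an instance reported down.

        Returns the number of connections removed. The event keys
        ('instance', 'status') are an assumed, simplified shape.
        """
        if event.get("status") != "down":
            return 0
        before = len(self.connections)
        self.connections = [
            c for c in self.connections if c.instance != event.get("instance")
        ]
        return before - len(self.connections)


if __name__ == "__main__":
    pool = ToyPool([
        PooledConnection(1, "orcl1"),
        PooledConnection(2, "orcl1"),
        PooledConnection(3, "orcl2"),
    ])
    removed = pool.on_event({"instance": "orcl1", "status": "down"})
    print(removed, [c.instance for c in pool.connections])
```

Without the event, the pool would keep handing out dead connections until each one failed on use, which is the timeout cost FAN is meant to avoid.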
Transparent Application Failover (TAF)
TAF enables automatic session reconnection.
TAF can:
- Re-establish connections
- Resume SELECT queries (not DML)
Limitations:
- Does not protect uncommitted transactions
- Best suited for read-only workloads
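On the client side, TAF is commonly requested through a `FAILOVER_MODE` clause in the connect descriptor. The Python helper below is a sketch that only assembles such a string; the function name, host, service name, and the retry and delay values are assumptions chosen for illustration, not recommended settings.

```python
def taf_descriptor(scan_host: str, service_name: str, port: int = 1521,
                   failover_type: str = "SELECT", method: str = "BASIC",
                   retries: int = 20, delay: int = 5) -> str:
    """Build a connect descriptor with a client-side FAILOVER_MODE clause."""
    if failover_type not in {"SESSION", "SELECT", "NONE"}:
        raise ValueError(f"unsupported TAF type: {failover_type}")
    if method not in {"BASIC", "PRECONNECT"}:
        raise ValueError(f"unsupported TAF method: {method}")
    return (
        "(DESCRIPTION="
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={scan_host})(PORT={port}))"
        "(CONNECT_DATA="
        f"(SERVICE_NAME={service_name})"
        f"(FAILOVER_MODE=(TYPE={failover_type})(METHOD={method})"
        f"(RETRIES={retries})(DELAY={delay}))))"
    )


if __name__ == "__main__":
    # TYPE=SELECT asks TAF to resume open queries; DML is still not replayed.
    print(taf_descriptor("rac-scan.example.com", "oltp_svc"))
```

`TYPE=SESSION` would only re-establish the session, while `TYPE=SELECT` also lets in-progress queries continue fetching, which matches the read-mostly use case described above.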
Application Continuity (AC)
Application Continuity improves upon TAF by:
- Replaying in-flight requests
- Supporting both SELECT and DML
- Minimizing application disruption
This is widely used in modern Oracle RAC environments.
Role of SCAN in Failover
Single Client Access Name (SCAN):
- Provides a single connection endpoint
- Redirects clients to available listeners
- Automatically handles node changes
SCAN ensures:
- No client configuration changes
- Seamless failover and load balancing
Example Failover Scenario
Scenario: Node 1 crashes suddenly.
Result:
- Node 1 is evicted
- Instance on Node 1 goes down
- Services move to Node 2
- Clients reconnect via SCAN
- Database remains available
Total downtime:
- Typically seconds, not minutes
Failover vs Switchover in RAC
| Feature | Failover | Switchover |
|---|---|---|
| Trigger | Unplanned failure | Planned activity |
| Automation | Automatic | Manual |
| Data Loss | No committed data lost; uncommitted work is rolled back | None |
| Use Case | Hardware/OS crash | Maintenance |
Best Practices for RAC Failover
- Always connect using services
- Use SCAN listeners
- Enable FAN and Application Continuity
- Monitor Clusterware logs
- Test failover scenarios periodically
- Configure service placement policies
Common RAC Failover Myths
❌ RAC eliminates all downtime
✅ RAC minimizes downtime; it does not eliminate it completely
❌ Failover is only for node crashes
✅ Failover also handles instance and service failures
❌ RAC replaces Data Guard
✅ RAC and Data Guard solve different problems
Conclusion
Oracle RAC handles failover through a well-orchestrated combination of Clusterware, services, SCAN, and application-aware technologies. By detecting failures quickly and relocating workloads automatically, RAC keeps database services available even in the face of unexpected failures.
For organizations running mission-critical workloads, understanding how RAC handles failover is essential for designing resilient, highly available database architectures.
And hey, I’d love to stay connected with you personally!
Let’s connect on LinkedIn: Ankush Thavali
Happy learning!
Ankush😎





