What Happens When One Node Goes Down?
In an Oracle RAC (Real Application Clusters) environment, node failure is not an exception; it’s an expected scenario. RAC is designed for high availability, meaning the database should continue running even if one node goes down.
This blog explains step-by-step what actually happens when one node goes down, from the moment of failure to full stabilization, in simple and practical DBA language.
What Does “One Node Goes Down” Mean?
A node can go down due to several reasons:
- Server crash or power failure
- OS hang or kernel panic
- Network failure
- Manual shutdown or reboot
- Hardware issues (CPU, RAM, disk)
In RAC terms, this means one instance and the services running on it are no longer available.
But the database itself is NOT down.
Immediate Detection by Oracle Clusterware
Oracle Clusterware continuously monitors all nodes using:
- Voting disks (disk heartbeat)
- Private interconnect (network heartbeat)
What happens first?
- Surviving nodes stop receiving heartbeat from the failed node
- Clusterware confirms the node is unreachable
- Node is declared evicted / down
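How quickly this happens depends on the CSS misscount setting (the network heartbeat timeout), which defaults to 30 seconds on most platforms, so detection usually takes well under a minute.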
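The detection logic can be pictured with a small toy model. This is only a sketch of timeout-based heartbeat monitoring, not how Clusterware is implemented: the class name `HeartbeatMonitor`, the node names, and the timeout value are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Assumed timeout for this toy model; real clusters read it from their CSS settings.
MISSCOUNT_SECONDS = 30.0


@dataclass
class HeartbeatMonitor:
    """Toy model: remembers when each node's heartbeat was last seen."""

    misscount: float = MISSCOUNT_SECONDS
    last_seen: dict = field(default_factory=dict)

    def record_heartbeat(self, node: str, now: float) -> None:
        """Note that a heartbeat from `node` arrived at time `now` (seconds)."""
        self.last_seen[node] = now

    def unreachable_nodes(self, now: float) -> list:
        """Return nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen > self.misscount
        )


monitor = HeartbeatMonitor()
monitor.record_heartbeat("node1", now=0.0)
monitor.record_heartbeat("node2", now=0.0)  # last heartbeat node2 ever sends
monitor.record_heartbeat("node1", now=20.0)
print(monitor.unreachable_nodes(now=20.0))  # [] : nobody has timed out yet

monitor.record_heartbeat("node1", now=30.0)
print(monitor.unreachable_nodes(now=31.0))  # ['node2'] : silent for more than 30s
```

The point of the model is that a node is not declared down on the first missed heartbeat, only after it has been silent for longer than the configured timeout.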
Instance on Failed Node Terminates
Once the node is marked down:
- The Oracle instance running on that node is terminated
- All background processes (PMON, SMON, DBWR, LGWR) stop
- Memory (SGA) on that node is lost
Any active sessions on that node are disconnected immediately.
What Happens to User Sessions?
Sessions on Failed Node
- All sessions connected to that instance are terminated
- Uncommitted transactions are rolled back
Sessions on Other Nodes
- Sessions on remaining nodes continue normally
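- Apart from a brief pause while the cluster reconfigures, there is no impact if applications are RAC-aware
If Application Continuity (AC) or Transparent Application Failover (TAF) is configured:
- Sessions may reconnect automatically to a surviving instance
- With AC, in-flight transactions may be replayed; TAF can resume queries but does not replay uncommitted transactions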
Global Cache Service (GCS) & Global Enqueue Service (GES)
These two services play a critical role after node failure.
GCS Actions
- Reassigns cache ownership
- Identifies blocks that were dirty in the failed instance’s cache and need recovery
- Ensures data consistency
GES Actions
- Releases locks held by failed instance
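- Ensures sessions waiting on those locks are not blocked indefinitely
This rebuilding of global resource information is called reconfiguration (remastering), and it is the first step of:
Instance Recovery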
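To make the idea of reassigning ownership concrete, here is a deliberately simplified sketch. It assumes each resource has exactly one mastering instance and hands orphaned resources to survivors round-robin; the function `remaster`, the resource names, and the round-robin policy are assumptions for illustration and do not describe Oracle's actual mastering algorithm.

```python
def remaster(masters: dict, failed: str, survivors: list) -> dict:
    """Reassign resources mastered by `failed` to the survivors, round-robin.

    `masters` maps a resource name to the instance that masters it.
    Resources mastered by surviving instances are left untouched.
    """
    if not survivors:
        raise ValueError("no surviving instances to take over resources")
    new_masters = dict(masters)
    orphaned = sorted(res for res, inst in masters.items() if inst == failed)
    for i, resource in enumerate(orphaned):
        new_masters[resource] = survivors[i % len(survivors)]
    return new_masters


# Hypothetical resources and instance names.
masters = {
    "block_101": "inst1",
    "block_102": "inst2",
    "block_103": "inst3",
    "block_104": "inst2",
}
after = remaster(masters, failed="inst2", survivors=["inst1", "inst3"])
print(after)
```

Only the resources that belonged to the failed instance move, which is why sessions on surviving nodes see a short pause rather than a full restart.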
Instance Recovery on Surviving Node(s)
One of the surviving instances automatically performs instance recovery.
During Instance Recovery:
- The redo thread of the failed instance is read
- Changes recorded in redo but not yet written to the datafiles are reapplied (roll forward)
- Uncommitted transactions are then rolled back
Once both phases complete, data consistency is fully restored.
Duration depends on:
- Number of active transactions
- Redo generated
- System load
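The roll-forward and rollback phases can be sketched with a toy redo log. This is a minimal model under stated assumptions: the record layout, the `recover` function, and storing the old value inside each change record are simplifications invented for the example (a real database keeps undo separately), so treat it as an illustration of the idea rather than of Oracle internals.

```python
def recover(datafile: dict, redo: list) -> dict:
    """Toy crash recovery: roll forward all redo, then roll back uncommitted work.

    Redo records are tuples:
      ("change", txn, key, old_value, new_value)
      ("commit", txn)
    """
    data = dict(datafile)
    committed = {rec[1] for rec in redo if rec[0] == "commit"}

    # Phase 1: roll forward. Reapply every change, committed or not.
    for rec in redo:
        if rec[0] == "change":
            _, _txn, key, _old, new = rec
            data[key] = new

    # Phase 2: roll back. Undo changes of uncommitted transactions, newest first.
    for rec in reversed(redo):
        if rec[0] == "change" and rec[1] not in committed:
            _, _txn, key, old, _new = rec
            data[key] = old

    return data


datafile = {"a": 1, "b": 10}  # state on disk when the instance died
redo = [
    ("change", "T1", "a", 1, 2),
    ("commit", "T1"),
    ("change", "T2", "b", 10, 20),  # T2 never committed
]
print(recover(datafile, redo))  # {'a': 2, 'b': 10}
```

T1's committed change survives even though it never reached the datafile, while T2's uncommitted change is undone. More redo and more open transactions mean more work in both loops, which matches the duration factors listed above.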
Services Failover
Oracle RAC services are configured with preferred and available instances.
When a node goes down:
- Services running on failed node are relocated
- They start on surviving node(s)
Applications that connect through the SCAN listeners by service name are routed to the surviving instances on their next connection.
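The preferred/available rule can be expressed as a small placement function. This is a sketch assuming a simple policy (run on every preferred instance that is up, then fill the gaps from the available list in order); `place_service` and the instance names are hypothetical, and real Clusterware placement has more inputs than this.

```python
def place_service(preferred: list, available: list, up_instances: set) -> list:
    """Return the instances a service should run on, given which are up.

    The service runs on every preferred instance that is up. For each
    preferred instance that is down, one up instance from the available
    list is used as a stand-in, in list order.
    """
    running = [inst for inst in preferred if inst in up_instances]
    needed = len(preferred) - len(running)
    for inst in available:
        if needed <= 0:
            break
        if inst in up_instances and inst not in running:
            running.append(inst)
            needed -= 1
    return running


# Hypothetical three-instance cluster.
preferred = ["orcl1", "orcl2"]
available = ["orcl3"]

print(place_service(preferred, available, {"orcl1", "orcl2", "orcl3"}))
# ['orcl1', 'orcl2']

print(place_service(preferred, available, {"orcl1", "orcl3"}))  # orcl2 is down
# ['orcl1', 'orcl3']
```

If no available instance is up, the service simply runs on fewer instances, which is why checking service placement after a failure is worth the effort.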
What Happens to SCAN and Listeners?
- SCAN listeners remain available; a SCAN listener that was running on the failed node is relocated to a surviving node
- Local listener on failed node stops
- SCAN redirects new connections to healthy nodes
End users usually don’t notice anything except a brief reconnect.
ASM Behavior During Node Failure
If ASM is used:
- ASM instance on failed node goes down
- ASM on surviving node continues
- Disks remain accessible
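Because the disk groups live on shared storage, a node failure by itself does not take any disk offline:
- No data loss, including with NORMAL/HIGH redundancy
- No ASM rebalance is triggered, because no disk has been dropped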
Alerts and Logs Generated
As a DBA, you’ll see alerts in:
- Clusterware alert log
- Database alert log
- CRS logs
- OEM alerts (if configured)
Typical messages:
- Node eviction detected
- Instance terminated
- Instance recovery completed
What DBA Should Check After Node Failure
Immediate Checks
- crsctl stat res -t
- olsnodes -n -s
- Database instance status
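If you run these checks often, a small wrapper can collect the output of the two commands above (`crsctl stat res -t` and `olsnodes -n -s`) in one go. The sketch below is an assumption about how you might script it, not an Oracle-provided tool: the names `CHECKS` and `run_check` are made up, and it expects the Grid Infrastructure binaries to be on the PATH of the user running it.

```python
import shutil
import subprocess

# The two cluster checks named in this section.
CHECKS = [
    ["crsctl", "stat", "res", "-t"],
    ["olsnodes", "-n", "-s"],
]


def run_check(command: list) -> str:
    """Run one command and return its output, or a note if it cannot be run."""
    if shutil.which(command[0]) is None:
        return f"{command[0]}: not found on PATH (is the Grid environment set?)"
    result = subprocess.run(command, capture_output=True, text=True, timeout=120)
    # Show stderr when the command fails so the error is not silently lost.
    return result.stdout if result.returncode == 0 else result.stderr


if __name__ == "__main__":
    for command in CHECKS:
        print("$ " + " ".join(command))
        print(run_check(command))
```

On a machine without Grid Infrastructure the script only prints the "not found" notes, so it is safe to try anywhere before using it on a cluster node.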
Logs to Review
- CRS alert log
- Database alert log
- OS logs
Recovery Actions
- Fix OS / network / hardware issue
- Restart node
- Verify services placement
Key Points to Remember (Interview Gold)
- RAC is designed to survive node failures
- Database remains available
- Only sessions on failed node are affected
- Instance recovery is automatic
- No manual DBA intervention required (usually)
Final Summary
When one node goes down in Oracle RAC:
- Clusterware detects failure
- Instance terminates
- Sessions on that node are lost
- Other nodes continue working
- Services fail over automatically
- Data consistency is maintained
This is true high availability in action.
Learning Oracle RAC doesn’t have to be complicated. At Learnomate Technologies we focus on clear explanations, hands-on learning, and real DBA scenarios that actually happen in production.





