Managing Cluster Nodes
1. Cluster Node Overview:
- Definition: A cluster node in Oracle RAC is an individual server that is part of the
Oracle RAC cluster. Each node runs its own Oracle instance, and all instances work
together to serve a single database.
- Node Components: Each node runs Oracle Clusterware, Automatic Storage
Management (ASM), and an Oracle Database instance.
2. Adding Nodes to the Cluster:
- Preparation: Ensure the new node meets hardware and software requirements, and
has proper network configuration (public, private, and VIP).
- Installation: Use the addnode.sh script or Oracle Universal Installer (OUI) to add a
new node to the existing cluster. This involves installing the Grid Infrastructure
software on the new node.
- Post-Installation: After adding the node, configure the Oracle Database on the new
node using the Database Configuration Assistant (DBCA) to extend the existing RAC
database.
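The installation step above can be sketched as a small Python helper that only assembles the command lines and does not run them. This is a minimal sketch under assumptions: the node name node3, the VIP name node3-vip, and the Grid home path are hypothetical, and the addnode.sh location shown (the addnode directory under the Grid home) matches recent releases, while older releases keep the script elsewhere. The DBCA step is left out because its options differ between releases.

```python
from typing import List


def add_node_commands(grid_home: str, node: str, vip: str) -> List[List[str]]:
    """Build (but do not run) the commands that extend the cluster to one node."""
    return [
        # Pre-check, run from an existing cluster node as the Grid owner.
        ["cluvfy", "stage", "-pre", "nodeadd", "-n", node, "-verbose"],
        # Silent node addition; the node lists use a brace-wrapped syntax.
        [
            f"{grid_home}/addnode/addnode.sh",  # assumed location, varies by release
            "-silent",
            f"CLUSTER_NEW_NODES={{{node}}}",
            f"CLUSTER_NEW_VIRTUAL_HOSTNAMES={{{vip}}}",
        ],
    ]


# Hypothetical names and path, for illustration only.
commands = add_node_commands("/u01/app/19.0.0/grid", "node3", "node3-vip")
for argv in commands:
    print(" ".join(argv))
```

Keeping each command as an argument list means it can later be handed to subprocess.run without shell quoting problems around the braces. After addnode.sh finishes, the root scripts it names still have to be run as root on the new node.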
3. Removing Nodes from the Cluster:
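- Preparation: Before removing a node, ensure that no active sessions or workloads
are running on it, and relocate its services to the remaining nodes.
- Removing: Deconfigure Oracle Clusterware on the departing node, then run
crsctl delete node -n <node_name> from a remaining node to remove it from the
cluster. Update the Oracle inventory with Oracle Universal Installer (OUI) so that
it lists only the remaining nodes.
- Post-Removal: Reconfigure the cluster to ensure the remaining nodes handle the
workload properly. Adjust services and load balancing configurations if necessary.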
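The removal sequence can be laid out as an ordered plan that records where each command runs and as which user. This is a rough sketch, not a complete procedure: it relies on the Clusterware commands crsctl delete node and cluvfy stage -post nodedel, assumes the deconfiguration script is rootcrs.sh under the Grid home (older releases ship it as rootcrs.pl), and omits deleting the database instance and updating the inventory. All host, database, and path names are hypothetical.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    run_on: str      # host that runs the command
    user: str        # OS user expected to run it
    argv: List[str]  # command line; nothing is executed here


def remove_node_plan(db: str, instance: str, leaving: str,
                     surviving: str, grid_home: str) -> List[Step]:
    """Return the removal steps in the order they would be carried out."""
    return [
        # Stop the instance on the departing node first.
        Step(surviving, "oracle",
             ["srvctl", "stop", "instance", "-db", db,
              "-instance", instance, "-stopoption", "immediate"]),
        # Deconfigure Clusterware locally (script name assumed, see lead-in).
        Step(leaving, "root",
             [f"{grid_home}/crs/install/rootcrs.sh", "-deconfig", "-force"]),
        # Remove the node from the cluster definition.
        Step(surviving, "root", ["crsctl", "delete", "node", "-n", leaving]),
        # Verify that the node was removed cleanly.
        Step(surviving, "grid",
             ["cluvfy", "stage", "-post", "nodedel", "-n", leaving, "-verbose"]),
    ]


plan = remove_node_plan("orcl", "orcl3", "node3", "node1", "/u01/app/19.0.0/grid")
for number, step in enumerate(plan, start=1):
    print(f"{number}. [{step.user}@{step.run_on}] {' '.join(step.argv)}")
```

Recording the host and user next to each command makes the ordering explicit: the deconfiguration runs on the node being removed, while the deletion and verification run on a node that stays in the cluster.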
4. Node Eviction:
- Concept: Node eviction occurs when Oracle Clusterware forcibly removes a node
from the cluster because the node has failed or lost contact with the other nodes,
which would otherwise risk a split-brain scenario. This is done to protect the
integrity of the database.
- Reasons for Eviction: Common reasons include network issues, hardware failures,
or resource starvation.
- Handling Evictions: Review logs (e.g., the Clusterware alert log, crsd.log, and
ocssd.log; recent releases write the daemon logs as trace files) to diagnose the
cause of the eviction and take corrective actions to prevent future evictions.
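When reviewing logs after an eviction, a first pass can simply count how often eviction-related message codes appear. The sketch below is an illustration under assumptions: the set of watched codes (CRS-1607 for the eviction itself and CRS-1610 to CRS-1612 for missed network heartbeats) is a starting point rather than a complete list, and the sample lines are synthetic placeholders, not real log output.

```python
import re
from collections import Counter
from typing import Iterable, Set

# Matches Clusterware message codes such as CRS-1607.
CRS_CODE = re.compile(r"\bCRS-(\d{4,5})\b")

# Assumption: codes worth flagging in a first pass over the alert log.
WATCHED_CODES = {"CRS-1607", "CRS-1610", "CRS-1611", "CRS-1612"}


def count_watched_codes(lines: Iterable[str],
                        watched: Set[str] = WATCHED_CODES) -> Counter:
    """Count occurrences of each watched CRS code across the given lines."""
    hits = Counter()
    for line in lines:
        for match in CRS_CODE.finditer(line):
            code = f"CRS-{match.group(1)}"
            if code in watched:
                hits[code] += 1
    return hits


# Synthetic lines with the message text omitted; real entries look different.
sample_lines = [
    "[cssd] CRS-1612: (message text omitted)",
    "[cssd] CRS-1611: (message text omitted)",
    "[cssd] CRS-1612: (message text omitted)",
    "[cssd] CRS-1607: (message text omitted)",
    "[crsd] CRS-2765: (message text omitted)",
]
summary = count_watched_codes(sample_lines)
print(dict(summary))
```

A rising count of heartbeat warnings before the eviction message points toward the interconnect, while an eviction with no preceding warnings suggests looking at node hangs or resource starvation instead.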
5. Node Maintenance:
- Planned Maintenance: If you need to perform maintenance on a node, you can stop
the database instance on that node using srvctl and then perform the necessary
tasks.
- Unplanned Maintenance: In case of unexpected failures, use Clusterware tools
(crsctl, srvctl) to relocate resources and ensure high availability.
- Rebooting Nodes: When rebooting a node, ensure proper shutdown and startup of
Oracle services using crsctl or srvctl to avoid cluster disruptions.
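For planned maintenance or a reboot, the shutdown and startup order can be written down as two short command lists. This is a minimal sketch assuming a database named orcl with an instance orcl1 on the node being serviced (both hypothetical) and the long-form srvctl options used by 12c and later; the commands are only assembled, not executed.

```python
from typing import List, Tuple

Command = Tuple[str, List[str]]  # (OS user, argument list)


def maintenance_commands(db: str, instance: str) -> Tuple[List[Command], List[Command]]:
    """Return the commands to run before and after maintenance on one node."""
    before = [
        # Stop the local instance cleanly so sessions are not cut off by the stack stop.
        ("oracle", ["srvctl", "stop", "instance", "-db", db,
                    "-instance", instance, "-stopoption", "immediate"]),
        # Stop the Clusterware stack on this node.
        ("root", ["crsctl", "stop", "crs"]),
    ]
    after = [
        # Bring the Clusterware stack back and confirm it is healthy.
        ("root", ["crsctl", "start", "crs"]),
        ("root", ["crsctl", "check", "crs"]),
        # Restart the instance that was stopped earlier and confirm its state.
        ("oracle", ["srvctl", "start", "instance", "-db", db, "-instance", instance]),
        ("oracle", ["srvctl", "status", "database", "-db", db]),
    ]
    return before, after


before, after = maintenance_commands("orcl", "orcl1")
for user, argv in before + after:
    print(f"[{user}] {' '.join(argv)}")
```

The instance is started explicitly afterwards because an instance stopped through srvctl is recorded as intentionally offline and may not come back on its own when the stack restarts.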
6. Monitoring Cluster Nodes:
- Monitoring Tools: Use Oracle Enterprise Manager (OEM), Cluster Health Monitor
(CHM), and tools like crsctl, srvctl, and oswatcher to monitor node performance,
resource usage, and availability.
- Key Metrics: Monitor CPU, memory, disk I/O, network latency, and interconnect
performance across all nodes to ensure optimal functioning.
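Availability checks are easy to script on top of the command-line tools. As one possible illustration, the sketch below parses text in the shape printed by olsnodes -s (node name followed by Active or Inactive) and reports the nodes that are down. The sample text and node names are made up, and in practice the input would come from running the command on a cluster node.

```python
from typing import List


def inactive_nodes(olsnodes_output: str) -> List[str]:
    """Return the names of nodes whose status column reads Inactive."""
    down = []
    for line in olsnodes_output.splitlines():
        fields = line.split()
        # First field is the node name; the status appears in a later column.
        if len(fields) >= 2 and "Inactive" in fields[1:]:
            down.append(fields[0])
    return down


# Made-up text standing in for the output of: olsnodes -s
sample_output = "node1\tActive\nnode2\tInactive\nnode3\tActive\n"
print(inactive_nodes(sample_output))
```

Searching every column after the node name, rather than only the last one, keeps the function usable when extra options add columns such as the node number or pinned state.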