Oracle RAC Interview Questions

  • Shripad Musale
  • 23 Jan, 2024


Why do we use a VIP in RAC?

Virtual IP (VIP) addresses were introduced in Oracle Database 10g; before that, clients in RAC environments had to connect directly to the physical IP addresses of individual nodes. A VIP in Real Application Clusters (RAC) is a node-level virtual address, managed by the clusterware, that clients use to connect to a node. It improves high availability and simplifies client connections by ensuring fast failure notification during node failures or relocations.

If a user connects to the instance using the physical IP and the node goes down, the client has no way of knowing whether the node is available; it must wait a long time, until the connection is timed out at the network layer.

However, if we use a logical VIP (on top of the physical IP), then when the node goes down, CRS fails the VIP over to a surviving node, and the user gets a connection error quickly (such as TNS: no listener) instead of hanging.

If I have an 8-node RAC, how many SCAN listeners are required?

For an 8-node RAC cluster, you typically need three SCAN listeners. It is not mandatory for a SCAN listener to run on every node; the three SCAN VIPs and their listeners are spread across three of the nodes.
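On a live cluster, the SCAN listener layout can be verified with srvctl (illustrative commands; they require a running Grid Infrastructure installation):

```shell
# Show how many SCAN listeners are configured and on which ports
srvctl config scan_listener

# Show which nodes the SCAN listeners are currently running on
srvctl status scan_listener
```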

 

How does SCAN know which node has the least load?

The SCAN (Single Client Access Name) listener does not measure node load itself. Each instance's LREG process registers with the SCAN listeners and periodically publishes workload statistics (the load balancing advisory), so the SCAN listener can hand each new connection to the local listener of the least-loaded node.

Explain how a client connection is established in a RAC database.

The LREG process on each instance registers the node's database services with the default local listener and the SCAN listeners. The listeners store workload information for each node.

So when client tries to connect using scan_name and port,

  • scan_name is resolved through DNS, which returns the SCAN IPs (typically three) in round-robin order, and the client tries the first one.
  • The client connects to the corresponding SCAN listener.
  • The SCAN listener compares the workload of the instances; if it determines that node1 has the least load, it sends the VIP address and port of node1's local listener back to the client.
  • The client now connects to that local listener, and a dedicated server process is created.
  • The client connection is established and the client starts accessing the database.
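A typical client-side entry driving this flow might look like the sketch below (hypothetical SCAN name prod-scan.example.com and service name orclsvc):

```
ORCL =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = prod-scan.example.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orclsvc)
    )
  )
```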

What are current block, CR block and PI in RAC?

Data block requests from the global cache are of two types.

Current block (CUR) – When we want to update data, Oracle must locate the most recent version of the block in the cache; this is known as the current block.

Consistent read (CR) – When we want to read data, only committed data is returned (reconstructed with the help of undo); this is known as a consistent read.

Past image (PI) – When node A wants to update a block that is present on node B, and node B has also updated that block, node B sends the current copy of the block to node A and keeps a past image (PI) of the block until the block is written to disk. Once the current version of the block is checkpointed to disk, the PI copies can be discarded.

There can be multiple CR copies of a block, but there is always only one current block.

There can be multiple SCUR (shared current) copies, but only one XCUR (exclusive current).
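These block states can be observed in the buffer cache views; for example, a rough count per state across instances (a sketch, assuming SELECT privilege on the GV$ views):

```sql
-- xcur/scur = current copies, cr = consistent-read copies, pi = past images
SELECT inst_id, status, COUNT(*) AS buffers
FROM   gv$bh
WHERE  status IN ('xcur', 'scur', 'cr', 'pi')
GROUP  BY inst_id, status
ORDER  BY inst_id, status;
```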

 

What is gc buffer busy wait?

It means a session is trying to access a buffer in the buffer cache, but that particular buffer is currently busy with a global cache operation. During that time the session waits on gc buffer busy.

Example –

  • Let’s say session A wants to access block id 100, but currently that block is in the buffer cache of instance B.
  • So session A requests instance B’s LMS process to transfer the block.
  • While the transfer is in progress, a session on instance B also tries to access that block. Because that buffer is already busy in a global cache operation, the session has to wait on the gc buffer busy wait event.

Reasons – concurrency on hot blocks, right-hand index growth (sequence-based indexes where all inserts hit the rightmost leaf block).
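One way to see whether this wait is significant is to look at recent session activity (an illustrative query; from 11g onwards the event is split into gc buffer busy acquire/release):

```sql
-- Count recent samples of gc buffer busy waits per instance
SELECT inst_id, event, COUNT(*) AS samples
FROM   gv$active_session_history
WHERE  event LIKE 'gc buffer busy%'
GROUP  BY inst_id, event
ORDER  BY samples DESC;
```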

 

What are some RAC-specific parameters?

  • undo_tablespace (one per instance)
  • cluster_database
  • cluster_interconnects
  • remote_listener
  • thread
  • cluster_database_instances

Why does RAC have a separate redo thread for each node?

In RAC, each instance has its own LGWR process, so there must be a separate set of online redo logs for each instance (called a thread), so that each LGWR writes to its own redo logs.
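For example, when adding an instance, a new redo thread is created and enabled roughly like this (a sketch; the group numbers and sizes are arbitrary):

```sql
-- Create redo log groups for the new instance's thread
ALTER DATABASE ADD LOGFILE THREAD 2 GROUP 5 SIZE 512M;
ALTER DATABASE ADD LOGFILE THREAD 2 GROUP 6 SIZE 512M;

-- Enable the thread so an instance can use it
ALTER DATABASE ENABLE PUBLIC THREAD 2;
```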

Why does RAC have a separate undo tablespace for each node?

In Oracle Real Application Clusters (RAC), each node has a separate undo tablespace to facilitate the management of undo data in a distributed and parallel processing environment. The primary reasons for having separate undo tablespaces for each node in a RAC setup include:

Isolation of undo segments: Each node in a RAC cluster has its own set of undo segments within its dedicated undo tablespace. This isolation helps prevent contention and potential performance issues that could arise if multiple nodes were contending for the same undo segments.

Parallel processing and scalability: RAC environments are designed to support parallel processing across multiple nodes for improved scalability and performance. Having separate undo tablespaces allows each node to manage its undo data independently, minimizing the need for coordination between nodes during undo operations.

Reduced global resource contention: The separation of undo tablespaces reduces the likelihood of global resource contention for undo space. Without separate undo tablespaces, multiple nodes might contend for the same undo segments, leading to contention and potential performance bottlenecks.

Enhanced concurrency: Separate undo tablespaces enhance concurrency by allowing each node to manage its undo transactions independently. This helps avoid serialization and contention for undo resources, which is crucial for maintaining high levels of concurrent activity in a RAC environment.

Improved high availability: In the event of a node failure or partitioning of the RAC cluster, having separate undo tablespaces ensures that each surviving node can continue managing its undo data independently. This improves the high availability of the database by reducing the impact of failures on undo operations.

What data we need to check in vmstat and iostat output?

vmstat:

  • CPU Utilization:
    Utilization percentages for user, system, and idle states.
    High CPU utilization might indicate processing bottlenecks.
  • Memory Utilization:
     Memory statistics including total, used, free, buffers, and cache.
    High memory usage might indicate memory contention or insufficient memory.
  • Virtual Memory (Swap):
    Swap-in and swap-out rates.
    Excessive swapping suggests memory pressure.
  • I/O Wait:
     Percentage of time the CPU spends waiting for I/O operations to complete.
     High I/O wait times suggest I/O bottlenecks.
  • System and Interrupts:
    System and hardware interrupts per second.
    High interrupt rates may indicate hardware issues or heavy system activity.

iostat:
  • Disk Utilization:
    Disk I/O statistics such as average I/O requests per second, kilobytes read and written per second, and average wait time.
    High disk utilization might indicate disk I/O bottlenecks.
  • Disk Service Time:
    Average time taken to service I/O requests.
    High service times suggest slow disk response.
  • I/O Wait:
    Percentage of time the CPU spends waiting for I/O operations to complete.
    Correlate with CPU utilization from vmstat to identify if high I/O wait times affect overall system performance.
  • Device and CPU Utilization:
    Device utilization percentage and CPU utilization percentage.
    Correlate with vmstat CPU utilization to understand how much of the CPU time is spent on disk I/O operations.
  • Transfer Rates:
    Average and instantaneous transfer rates in kilobytes per second.
    Monitor for sustained high or low transfer rates, which may indicate performance issues or changes in workload.
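Typical invocations for collecting these figures (5-second intervals; on Linux, iostat -x requires the sysstat package):

```shell
# CPU, memory, swap and I/O wait summary every 5 seconds
vmstat 5

# Extended per-device I/O statistics (utilization, await, service time)
iostat -x 5
```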

What is Flex Cluster introduced in oracle 12c?

Oracle Flex Cluster, introduced in Oracle Database 12c, is a clustering architecture in which nodes are organized as Hub Nodes and Leaf Nodes. Hub Nodes have direct access to shared storage and form the core of the cluster, while Leaf Nodes attach to the cluster through a Hub Node and do not require direct storage access. It provides flexible resource management, simplified administration, and enhanced scalability and high availability for Oracle RAC environments.

Key aspects:

  • Hub Node and Leaf Node architecture
  • Dynamic resource management
  • Enhanced high availability
  • Simplified administration

What is TAF?

Automatic Failover: TAF allows client connections to be automatically redirected to another available RAC node if the currently connected node or instance fails.

Transparent to Applications: TAF is transparent to the application layer, meaning that applications do not need to handle connection failover logic explicitly. The failover process is managed by Oracle client libraries and Oracle Net Services.

Session Persistence: TAF re-establishes the session on a surviving node during failover. Note that in-flight transactions are rolled back; with SELECT failover, open queries can resume fetching from where they left off.

Fast Recovery: TAF facilitates fast recovery by quickly redirecting client connections to an available node, minimizing downtime and disruption to application users.

Supported Failover Modes: TAF supports two failover types, SESSION and SELECT, and two failover methods, BASIC and PRECONNECT. These settings determine the behavior of client connections during failover events.
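TAF is configured on the client side in the connect descriptor. A minimal sketch (hypothetical SCAN name and service name) enabling SELECT failover with the BASIC method:

```
ORCL_TAF =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = prod-scan.example.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = orclsvc)
      (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 30)(DELAY = 5))
    )
  )
```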

 

Can we start CRS in exclusive mode? What is its purpose?

Yes, Oracle Clusterware (CRS) can be started in exclusive mode. It is used for maintenance tasks, troubleshooting, and ensuring isolation of a single node from the cluster to perform critical operations without affecting other nodes’ services.
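Illustrative commands for such a maintenance session on one node (run as root; the stack must be fully down on that node first):

```shell
# Start the clusterware in exclusive mode without starting CRSd resources
crsctl start crs -excl -nocrs

# ... perform maintenance, e.g. restoring the OCR or replacing voting disks ...

# Stop the exclusive-mode stack and restart normally
crsctl stop crs -f
crsctl start crs
```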

ASM is running, but the database is not coming up. What might be the issue?

If ASM (Automatic Storage Management) is running, but the database is not coming up, several potential issues could be causing this problem:

Listener Configuration: Ensure that the listener is running and properly configured to handle connections for the database instances. Check the listener.ora file for correct settings.

Database Initialization Parameters: Review the database initialization parameters (init.ora or spfile) to ensure they are correctly configured for ASM storage. Pay attention to parameters such as ASM_DISKGROUPS, ASM_DISKSTRING, and DB_CREATE_FILE_DEST.

ASM Disk Group Availability: Verify the availability and health of the ASM disk groups. If a disk group is offline or in a dismounted state, it can prevent the database from starting. Use ASM commands like asmcmd to check the status of disk groups.

ASM Instance Status: Check the status of the ASM instance. If the ASM instance is down or in a restricted mode, it may be preventing the database from accessing the required storage.

Disk or Storage Issues: Investigate any disk or storage-related problems that may be affecting ASM operations. This could include disk failures, storage connectivity issues, or insufficient space in ASM disk groups.

Network Configuration: Ensure that there are no network issues preventing communication between the database instances and the ASM instance. Check for firewall rules, network congestion, or DNS resolution problems.

Permission and Ownership: Verify that the Oracle processes have appropriate permissions and ownership to access ASM disks and files. Incorrect permissions can lead to startup failures.

Diagnostic Logs: Check the alert log, ASM alert log, and other diagnostic logs for error messages or warnings that may provide clues about the cause of the startup failure.

If crs is not coming up , then what are things you will start looking into?

If CRS is not coming up, I would start by checking logs, reviewing alert logs, verifying dependencies, checking disk space, reviewing configuration files, verifying permissions, checking cluster interconnect, reviewing OS logs, checking cluster health, and investigating hardware failures.

Explain about local_listener and remote_listener parameter in RAC?

Local_listener:

Definition: This parameter specifies the network address that an Oracle RAC instance uses to listen for local connection requests.

Purpose: The local listener is responsible for managing connection requests originating from the same node where the instance is running.

Configuration: The local_listener parameter is set to a TNS entry or a net service name that resolves to a network address (hostname and port) where the local listener is running.

ALTER SYSTEM SET LOCAL_LISTENER = 'mydb_listener' SCOPE=both;

Here, 'mydb_listener' is a TNS entry pointing to the local listener's address.

Remote_listener:

Definition: This parameter specifies one or more network addresses, typically the SCAN, with which an Oracle RAC instance registers its services so that listeners on other nodes know about them.

Purpose: Cross-registration with the remote (SCAN) listeners allows any listener in the cluster to redirect a connection to the best instance, enabling server-side load balancing and connection failover.

Configuration: The remote_listener parameter is set to a list of TNS entries or net service names that resolve to the listeners on the other nodes in the cluster; from 11gR2 onwards it is usually set to the SCAN name and port.

ALTER SYSTEM SET REMOTE_LISTENER = 'node1_listener, node2_listener' SCOPE=both;

Here, 'node1_listener' and 'node2_listener' are TNS entries pointing to the listeners on the other nodes. A more common 11gR2+ setting is the SCAN address, e.g. 'scan-name:1521'.

 

What are local registry and cluster registry?

Local Registry:
The local registry, also known as the local node registry, is specific to each node in the Oracle RAC cluster.
It stores configuration information relevant to the local instance, such as instance-specific parameters, initialization parameters, and certain cluster-related settings.
Each RAC node maintains its own local registry to manage its local instance’s configuration independently.

Cluster Registry:
The cluster registry is a shared repository that contains configuration information relevant to the entire RAC cluster.
It stores cluster-wide configuration settings, such as SCAN (Single Client Access Name) configurations, service configurations, and global resources like Oracle Clusterware resources.
The cluster registry is typically maintained in a shared location accessible by all nodes in the RAC cluster.
It facilitates centralized management of cluster-wide settings and resources, ensuring consistency and coherence across all nodes.

What is client side load balancing and server side load balancing?

Client-side load balancing: Clients distribute requests across multiple servers, deciding which server to connect to based on various algorithms such as round-robin or least connections.

Server-side load balancing: Load balancing is handled by a dedicated device or software component, such as a load balancer or proxy server, which distributes incoming requests among multiple servers based on predefined algorithms and server health checks.

What are the RAC related background processes?

LMSn (Global Cache Service Process): Handles Cache Fusion operations by transferring data blocks between RAC instances to maintain cache coherence.

LMD (Global Enqueue Service Daemon): Coordinates global enqueue and lock-related messaging among RAC instances and manages lock conversion requests.

LMON (Global Enqueue Service Monitor): Monitors global resources and manages global enqueue services, such as space allocation for global resources.

LCK0 (Instance Enqueue Process): Manages non-Cache Fusion resource requests such as library cache and row cache requests.

GCS (Global Cache Service): The service layer, implemented by the LMS processes, that manages the global buffer cache and performs Cache Fusion block transfers between instances.

GES (Global Enqueue Service): The service layer, implemented by LMON and LMD, that manages global enqueues and locks across instances.

RMSn (Resource Manager Server): Controls resource allocation and prioritization for database sessions based on resource management policies.

RBAL (Rebalance): Coordinates instance and resource rebalancing operations to maintain workload distribution across RAC instances.

MMON (Manageability Monitor): Monitors database and instance health, collecting statistics and managing advisory components.

MMNL (Manageability Monitor Light): Light version of MMON for low-priority monitoring tasks.

DIA0 (Diagnosability Process): Handles diagnostic data collection and monitoring for cluster-related issues.

LREG (Listener Registration): Registers the instance and its services with the local and SCAN listeners (this took over the registration work from PMON in 12c).

 

How instance recovery happens in oracle RAC?

When any one of the instances crashes in RAC, the node failure is detected by the surviving instances. The GRD resources are then redistributed across the surviving instances. The instance that first detects the crash reads the online redo log thread of the crashed instance, and its SMON process performs the recovery.

Sequence

  • Normal RAC operation, all nodes are available.
  • One or more RAC instances fail.
  • Node failure is detected.
  • Global Cache Service (GCS) reconfigures to distribute resource management to the surviving instances.
  • The SMON process in the instance that first discovers the failed instance(s) reads the failed instance(s) redo logs to determine which blocks have to be recovered.
  • SMON issues requests for all of the blocks it needs to recover.  Once all blocks are made available to the SMON process doing the recovery, all other database blocks are available for normal processing.
  • Oracle performs roll forward recovery against the blocks, applying all redo log recorded transactions.
  • Once redo transactions are applied, all undo records are applied, which eliminates non-committed transactions.
  • The database is now fully available to the surviving nodes. During recovery, SMON first reads the redo to roll forward (applying both committed and uncommitted changes), then rolls back the uncommitted transactions using the undo tablespace of the failed instance.

What are the TAF failover methods and types in Oracle RAC?

TAF is configured with a failover method and a failover type:

  • BASIC (method) – the connection to a surviving instance is established only at failover time.
  • PRECONNECT (method) – a backup connection is established in advance to speed up failover.
  • SELECT (type) – open cursors can continue fetching rows after failover.
  • SESSION (type) – only the session is re-created; open cursors are lost.

Can we have multiple SCAN names in a RAC?

From 12c onwards, we can have multiple SCANs, each on a different subnet. As part of the installation, only one SCAN is configured; post-installation we can configure another SCAN on a different subnet if required.

In RAC, where we define the SCAN?

We can define the SCAN in one of two ways:

Using the corporate DNS

Using Oracle GNS (Grid Naming Service)

What does the “g” stand for in views like gv$session, gv$sql, etc.?

In Oracle Database, the “g” in views like gv$session, gv$sql, and similar views stands for “Global.” These views provide a global or cluster-wide perspective of database activity and resources in Oracle Real Application Clusters (RAC) environments.
The “gv” prefix indicates that these views are “Global Views,” meaning they aggregate information from all instances in the RAC cluster. They allow administrators and users to monitor and manage database resources and activities across all nodes in the cluster from a single point of access.
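For example, a quick cluster-wide view of session distribution (a sketch):

```sql
-- Sessions per instance across the whole cluster
SELECT inst_id, COUNT(*) AS sessions
FROM   gv$session
GROUP  BY inst_id
ORDER  BY inst_id;
```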

What is load balancing advisory?

Load Balancing Advisory in Oracle Database is a feature designed to assist in distributing client connections across instances within a Real Application Clusters (RAC) environment. It provides recommendations to help determine the optimal instance to direct a new connection request based on the current workload and resource utilization across the cluster.
The Load Balancing Advisory collects and analyzes statistics related to instance performance, such as CPU utilization, I/O rates, and other metrics. Based on these statistics, it determines which instance is the least loaded and most suitable to handle incoming connection requests.
By leveraging the Load Balancing Advisory, Oracle Connection Managers can intelligently route new connection requests to the recommended instance, thereby achieving better load distribution, maximizing resource utilization, and improving overall system performance in Oracle RAC deployments.

What is gc cr 2 way and gc cr 3 way?

gc cr 2-way (Global Cache Consistent Read, 2-way):
This wait event occurs when an instance requests a consistent-read block and the resource master for that block is also the instance holding it.
Only two instances are involved: the requester sends the request to the master, and the master ships the block back.
The “2-way” refers to the two-instance, two-message round trip.

gc cr 3-way (Global Cache Consistent Read, 3-way):
Similar to gc cr 2-way, but here the resource master and the block holder are different instances.
Three instances are involved: the requester asks the master, the master forwards the request to the holding instance, and that instance ships the block to the requester.
A 3-way transfer therefore costs one extra network hop compared with a 2-way transfer.

 

What is the role of LMON background process?

The LMON (Global Enqueue Service Monitor) background process in Oracle RAC manages and monitors global resources, including global enqueue locks and resources. It helps ensure consistency and coordination across multiple instances in the RAC cluster by managing distributed lock management and resolving lock conflicts.

What are some RAC related wait events?

Some RAC-related wait events include:

gc buffer busy: Indicates contention for access to a global cache buffer in a RAC environment.

gc cr block busy: Denotes contention for access to a consistent read block in the global cache.

gc current block busy: Indicates contention for access to the current block in the global cache.

gc freelist: Denotes contention for access to the global cache freelist.

gc grant 2-way: Indicates contention for global cache resource grants between two RAC instances.

What is ACMS?

ACMS stands for Atomic Controlfile to Memory Service. It synchronizes the control file updates across Oracle Real Application Clusters (RAC) instances, ensuring consistency and high availability of database metadata.

Explain different ways to find master node in oracle rac?

  • Grep the ocssd log file: grep -i "master node" ocssd.log | tail -1
  • Grep the crsd log file: grep MASTER crsd.log | tail -1
  • Query the V$GES_RESOURCE view (MASTER_NODE column).
  • Run ocrconfig -showbackup: the node that stores the OCR backups is the master node.

 

Who updates OCR and how/when it gets updated?

The OCR is updated by client applications and utilities through the CRSd process:
1. Tools like DBCA, DBUA, NETCA, ASMCA, CRSCTL and SRVCTL, through the CRSd process.
2. CSSd during cluster setup.
3. CSS during node addition/deletion.
Each node maintains a copy of the OCR in memory. Only one CRSd (the master) performs reads and writes to the OCR file. Whenever some configuration is changed, the CRSd process refreshes the local and remote OCR caches and updates the OCR file on disk.
So whenever we fetch cluster information using srvctl or crsctl, the local OCR cache is used; but any modification goes through the master CRSd process, which updates the physical OCR file.

 

What is cache fusion in oracle RAC? and its benefits?

Cache Fusion in Oracle RAC (Real Application Clusters) is a feature that enables instances in a cluster to share data blocks directly from one instance’s memory cache to another. This minimizes disk I/O and inter-instance messaging, enhancing performance and scalability. Benefits include improved response times, efficient resource utilization, high availability, and simplified application design due to abstracted complexities of data sharing in a clustered environment.

What is OCR and what it contains?

The OCR is the central repository for CRS. It stores the metadata, configuration and state information for all cluster resources defined in the clusterware:

  • node membership information
  • status of cluster resources such as databases, instances, listeners and services
  • ASM disk group information
  • location and backups of the OCR and voting disk
  • VIP and SCAN VIP details

Explain split brain in oracle RAC.

In Oracle RAC, “split brain” refers to a situation where cluster nodes lose communication, leading to partitions in the cluster. This can cause data inconsistencies and corruption because nodes act independently. Oracle RAC uses mechanisms like heartbeat monitoring, quorum, and split-brain resolution to prevent and resolve such scenarios, ensuring data integrity and cluster reliability.

What is OLR and why it is required?

While starting, the clusterware needs to access the OCR to know which resources it must start. However, the OCR file is stored inside ASM, which is not accessible at that point (because the ASM resource itself is registered in the OCR file).
To avoid this chicken-and-egg problem, the resources that need to be started on a node are also stored in a file on the local operating system, called the OLR (Oracle Local Registry). Each node has its own OLR file.
So when we start the clusterware, this file is accessed first.

 

Difference between crsctl and srvctl?

crsctl (Cluster Ready Services Control):
crsctl is a utility used for managing the Oracle Clusterware software itself, which forms the foundation of Oracle RAC and Oracle Grid Infrastructure environments.
It allows administrators to perform various administrative tasks related to Oracle Clusterware, such as starting and stopping the clusterware stack, managing cluster resources, managing voting disks, managing Oracle Cluster Registry (OCR), and monitoring the health of cluster components.
crsctl commands are typically used for low-level cluster management and diagnostics.

srvctl (Server Control):
srvctl is a utility designed specifically for managing Oracle RAC (Real Application Clusters) instances and services.
It enables administrators to perform common administrative tasks related to Oracle RAC databases, instances, services, and node applications in a more user-friendly and database-centric manner.
With srvctl, administrators can create, delete, start, stop, relocate, and manage Oracle RAC instances and services easily.
srvctl commands are used to manage high-level database and service operations within an Oracle RAC environment.
In summary, while both crsctl and srvctl are important tools for managing Oracle cluster environments, crsctl focuses on managing the underlying Oracle Clusterware infrastructure, while srvctl is specifically tailored for managing Oracle RAC database instances and services.
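A few illustrative commands showing the split in responsibilities (the database name orcl is a hypothetical example):

```shell
# crsctl: clusterware-level operations (usually run as root)
crsctl check cluster -all        # health of the clusterware stack on all nodes
crsctl stat res -t               # state of all clusterware-managed resources
crsctl query css votedisk        # voting disk locations

# srvctl: database/service-level operations (run as the Oracle software owner)
srvctl status database -d orcl   # instance status for database "orcl"
srvctl start instance -d orcl -i orcl1
srvctl config scan               # SCAN configuration
```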

What are the storage structures of a clusterware?

Two shared storage structures – the OCR and the voting disk (VD).
Two local storage structures – the OLR and the GPnP profile.

 

My clusterware version is 11gR2. Can I install an Oracle 12c database? Is the vice versa possible (clusterware version 12c and database version 11g)?

The clusterware version must be the same as or higher than the database version. So an 11g database can run on 12c Grid Infrastructure, but a 12c database will not work on 11g Grid Infrastructure.

I want to run a parallel query in a RAC database, but I need to make sure the parallel slave processes run only on the node where I run the query and do not spread to other nodes. How?

We can set the parallel_force_local parameter to TRUE at the session level and then run the parallel query. All the PX processes will run only on that node.
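A sketch of the session-level setting (the table name is a hypothetical example):

```sql
-- Keep all PX slave processes on the local instance for this session
ALTER SESSION SET parallel_force_local = TRUE;

SELECT /*+ PARALLEL(t, 8) */ COUNT(*) FROM sales t;
```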

 

What is the purpose of Voting disk?

The voting disk stores node membership information and the heartbeat status of the nodes in the cluster; the clusterware uses it to decide which nodes survive during a communication failure (split brain).
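The voting disk configuration can be inspected with (illustrative; run on a cluster node):

```shell
# List the voting disks and the disk group they reside in
crsctl query css votedisk
```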

How instance recovery happens in oracle RAC?

In Oracle Real Application Clusters (RAC), instance recovery occurs when a database instance needs to recover from a failure or crash. Instance recovery is a crucial process in RAC environments to maintain data consistency and availability across multiple nodes.
Here’s an overview of how instance recovery happens in Oracle RAC:
Instance Failure Detection: When an Oracle RAC instance fails due to a hardware or software issue, the other instances in the cluster detect the failure. This detection can happen through mechanisms like cluster interconnect heartbeat checks or other monitoring processes.
Fencing Mechanism: Before initiating instance recovery, Oracle uses a fencing mechanism to isolate the failed instance from the cluster. This prevents the failed instance from accessing shared resources and ensures data integrity during recovery.
Roll Forward and Rollback: Once the failed instance is isolated, Oracle RAC performs instance recovery by applying changes from the redo log files to the data files. This process involves rolling forward committed transactions that were not yet written to disk at the time of the failure and rolling back uncommitted transactions.
Redo Log Application: Oracle RAC replays the changes recorded in the redo log files to bring the data files to a consistent state. This involves applying redo records to the data blocks affected by uncommitted transactions and transactions that were in progress during the failure.
Parallel Recovery: In Oracle RAC, instance recovery can occur in parallel across multiple instances to expedite the process and minimize downtime. Each surviving instance in the cluster contributes to the recovery process by applying redo records to its respective data files.
Resource Reintegration: Once instance recovery is complete and the failed instance is restored, it rejoins the cluster and resumes its role in serving database requests. The fencing mechanism is lifted, and the cluster resources are once again shared among all active instances.
Automatic Restart: Oracle RAC can be configured for automatic instance recovery, where Oracle Clusterware automatically restarts failed instances to minimize downtime and maintain high availability.
Overall, instance recovery in Oracle RAC involves the coordinated effort of multiple cluster nodes to restore the failed instance to a consistent state while ensuring data integrity and availability across the cluster.

Why we need voting disk?

We need a voting disk in Oracle Real Application Clusters (RAC) to determine cluster membership, establish quorum, prevent split-brain scenarios, coordinate resource ownership, and facilitate failure recovery.

What is dynamic remastering?

Mastering of a block means that the master instance keeps track of the state of the block until remastering happens due to one of a few scenarios, such as an instance crash.
The GRD stores useful information such as data block addresses, block status, lock information, SCNs, past images etc. Each instance holds part of the GRD in its SGA: the instance that is the master of a block or resource maintains the GRD entries for that resource in its SGA.
Mastership of a resource is decided based on demand. If a particular resource is mostly accessed from node 1, then node 1 becomes the master of that resource. If, after some time, node 2 starts accessing the same resource heavily, all the resource information is moved to node 2's GRD.
LMON, LMD and LMS are responsible for dynamic remastering.
Remastering can happen in the following scenarios:
Resource affinity – GCS keeps track of the number of GCS requests per instance and per object. If one instance is accessing an object's blocks much more heavily than the other nodes, GCS can decide to migrate all of that object's resources to the heavily accessing instance.
Manual remastering – we can manually remaster an object.
Instance crash – if an instance crashes, its GRD data is remastered across the surviving instances in the cluster.

What is GPNP profile?

The GPnP (Grid Plug and Play) profile is a small XML file (profile.xml) maintained by Oracle Grid Infrastructure on every cluster node, stored under $GRID_HOME/gpnp/&lt;hostname&gt;/profiles/peer/. It contains the bootstrap information a node needs in order to join the cluster: the cluster name, the public and private (interconnect) network interface definitions, the ASM discovery string, and the location of the ASM spfile.
The GPNPD daemon replicates the profile and keeps it consistent across all nodes, which is what allows nodes to be added to or removed from the cluster with minimal manual reconfiguration (the "plug and play" aspect of Grid Infrastructure).
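The profile can be inspected with the `gpnptool` utility shipped with Grid Infrastructure, for example:

```
# Dump the current GPnP profile XML (run from the Grid home as the grid owner)
gpnptool get
```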

What are the software stacks in oracle clusterware?

Oracle Clusterware includes several software stacks that provide essential functionality for high availability and scalability in Oracle’s clustered environments. These software stacks include:
Cluster Ready Services (CRS): Manages cluster node membership, monitors node health, and coordinates cluster-wide operations.
Cluster Synchronization Services (CSS): Ensures consistent cluster node membership and configuration
across all nodes in the cluster.
Event Manager (EVM): Monitors and responds to cluster events, such as node failures or resource state
changes.
Cluster Interconnect: Facilitates communication and data transfer between nodes in the cluster, typically through dedicated network interfaces or interconnects.
These software stacks work together to provide a reliable and robust foundation for Oracle’s clustered solutions, ensuring high availability and fault tolerance for critical enterprise applications.

What are the roles of CRSD, CSSD, CTSSD, EVMD, and GPNPD?

In Oracle Clusterware, each component plays a crucial role in maintaining the integrity, availability, and coordination of resources within the cluster. Here’s a brief overview of the roles of each component:
CRSD (Cluster Ready Services Daemon):

  • Manages the cluster resources and services.
  • Monitors the health and status of cluster resources.
  • Handles resource startup, shutdown, and failover.

CSSD (Cluster Synchronization Services Daemon):

  • Maintains the cluster configuration and membership information.
  • Synchronizes the cluster state across all nodes.
  • Facilitates communication and coordination between cluster nodes.

CTSSD (Cluster Time Synchronization Service Daemon):

  • Ensures that the system time is synchronized across all cluster nodes.
  • Maintains a consistent time reference for all cluster components.
  • Helps prevent issues related to time skew or inconsistencies in a clustered environment.

EVMD (Event Manager Daemon):

  • Monitors events and changes within the cluster.
  • Responds to critical events such as node failures, resource failures, or changes in cluster state.
  • Coordinates the execution of recovery actions and failover procedures in response to events.

GPNPD (Grid Plug and Play Daemon):

  • Manages the dynamic discovery and configuration of grid resources.
  • Facilitates the automatic registration and management of resources within the cluster.
  • Helps streamline the deployment and administration of grid computing environments.

These components work together to provide a robust and highly available infrastructure for Oracle clustered environments, ensuring that resources are properly managed, synchronized, and monitored to maintain continuous operation and minimize downtime.
 
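The state of these daemons can be checked with `crsctl`, for example:

```
# Lower-stack (OHASD-managed) resources such as cssd, ctssd, evmd, gpnpd
crsctl stat res -t -init

# Overall health of the CRS, CSS, and EVM stacks on the local node
crsctl check crs
```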

What is GES and GCS?

GES (Global Enqueue Service):
GES is responsible for managing and coordinating access to shared resources or data structures, known as enqueues, across the cluster.
Enqueues are used to enforce concurrency controls and prevent conflicting access to shared resources, such as database blocks, tables, or other critical data structures.
GES ensures that only one process or node can access a specific resource at any given time, thereby maintaining data integrity and preventing data corruption or inconsistency.
GCS (Global Cache Service):
GCS facilitates the distributed caching of data blocks across the cluster’s nodes.
It maintains a consistent and coherent cache of database blocks in memory across multiple instances of the Oracle Database running on different nodes.
GCS enables efficient data access and reduces contention by allowing database instances to share cached data blocks rather than accessing them from disk.
It coordinates cache fusion, a mechanism that allows database instances to directly access and modify data blocks in the local cache of other instances, enhancing performance and scalability in clustered environments.
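GCS and GES activity can be observed from the dynamic performance views, for instance:

```sql
-- Global cache (GCS) block transfer activity per instance
SELECT inst_id, name, value
FROM   gv$sysstat
WHERE  name IN ('gc cr blocks received', 'gc current blocks received');

-- Global enqueue (GES) statistics
SELECT inst_id, name, value
FROM   gv$ges_statistics;
```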

ASM spfile is stored inside an ASM diskgroup, so how does clusterware start the ASM instance (given that the ASM instance needs the spfile at startup)?

When Oracle Clusterware starts up the ASM instance, it follows a specific process to ensure that the ASM spfile (server parameter file) is accessible. Here’s how it typically works:
Grid Infrastructure Startup: Oracle Clusterware, also known as Grid Infrastructure, starts up first. This includes starting the Cluster Ready Services (CRS) and other necessary components.
ASM Instance Startup: After Grid Infrastructure is up and running, the ASM instance is started. The ASM instance is a special instance that manages the ASM disk groups where the Oracle database files are stored.
ASM Parameter File (SPFILE) Access: When starting the ASM instance, Oracle Clusterware resolves the SPFILE location from the GPnP profile, which records where it lives. The SPFILE can sit in an ASM disk group or in a regular file system.
If the SPFILE is stored in an ASM disk group, startup does not require the disk group to be mounted first: the ASM disk headers record the physical location of the SPFILE blocks, so they can be read directly from disk while the ASM instance starts. This is how the apparent chicken-and-egg problem is avoided.
If the SPFILE is stored in a file system, Oracle Clusterware reads it from there and provides it to the ASM instance during startup.
ASM Instance Startup Parameters: The ASM instance uses the parameters specified in the SPFILE (or PFILE if configured) to initialize itself during startup. These parameters include information about disk groups, redundancy levels, ASM disk locations, and other configuration settings.
Disk Group Mounting: Once the ASM instance is up and running, it mounts the ASM disk groups specified in its configuration. These disk groups contain the actual Oracle database files like datafiles, control files, and redo logs.
Database Startup: After the ASM instance is running and ASM disk groups are mounted, Oracle databases that rely on ASM for storage can be started. These databases reference the ASM disk groups for their files.
In summary, Oracle Clusterware coordinates the startup process for ASM instances, ensures that the necessary configuration files like the SPFILE are accessible, and manages the initialization of ASM disk groups where database files are stored. This coordination is crucial for the overall availability and reliability of the Oracle database environment.
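Once ASM is up, the registered spfile location can be confirmed with `asmcmd`:

```
# Show the ASM spfile location recorded in the GPnP profile
asmcmd spget
```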

What is HAIP?

HAIP stands for Highly Available IP. It is a feature of Oracle Grid Infrastructure (introduced in release 11.2.0.2) that provides redundancy and load balancing for the cluster interconnect, i.e. the private network between RAC nodes. Note that it serves interconnect traffic, not client connections.
Here's how HAIP works in Oracle RAC:
1. Link-local addresses: Grid Infrastructure assigns link-local addresses from the 169.254.0.0/16 range (up to four HAIPs per cluster node) on the network interfaces classified as private interconnects.
2. Interconnect traffic: ASM and database instances bind to these HAIP addresses, rather than to the physical private IPs, for Cache Fusion and other interconnect traffic.
3. Automatic failover: If a private network interface fails, the HAIP address it was hosting moves transparently to a surviving private interface, so interconnect traffic continues without interruption.
4. Load balancing: When multiple private interfaces are configured, interconnect traffic is spread across all of them, improving bandwidth utilization.
5. No OS bonding required: Because failover and load balancing are handled by the ora.cluster_interconnect.haip clusterware resource, OS-level NIC bonding on the interconnect is not required.
HAIP is distinct from the node VIPs and SCAN VIPs, which serve client connections on the public network. Overall, HAIP helps ensure uninterrupted private network communication between cluster nodes in the event of interface failures.
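The HAIP resource and the interconnect interfaces it manages can be checked like this:

```
# Status of the HAIP resource in the lower stack
crsctl stat res ora.cluster_interconnect.haip -init

# Interfaces classified as public vs. cluster_interconnect
oifcfg getif
```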

Why we need odd number of voting disks in RAC?

In Oracle Real Application Clusters (RAC) environments, voting disks are used to determine the status of each node in the cluster. They help prevent the split-brain scenario, where multiple nodes in a cluster lose communication with each other, potentially leading to data corruption and inconsistency.
Having an odd number of voting disks is recommended for several reasons:
Quorum Calculation: A node must be able to access a strict majority of the voting disks to remain in the cluster, i.e. floor(total voting disks / 2) + 1 of them. With an odd number of disks there is always a clear majority available, so the cluster can reach consensus on membership decisions.
Avoiding Wasted Disks: An even number of voting disks tolerates no more failures than the next lower odd number. With 3 disks a node survives the loss of 1 disk (2 of 3 is still a majority); with 4 disks the required majority rises to 3, so the node still survives the loss of only 1 disk. The fourth disk adds cost and another point of failure without adding any fault tolerance.
Enhanced Fault Tolerance: With only 2 voting disks the required majority is 2, so losing access to a single disk costs a node its quorum and can trigger eviction. Odd configurations such as 3 or 5 disks let the cluster keep operating through the loss of 1 or 2 disks respectively.
By ensuring an odd number of voting disks, Oracle RAC environments can maintain the integrity and stability of the cluster, minimize the risk of split-brain scenarios, and ensure that decisions regarding cluster membership and operations can be made effectively even in the presence of failures or network partitions.
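The majority arithmetic behind this recommendation can be sketched in a few lines (a plain illustration, not Oracle code):

```python
def majority(total_disks: int) -> int:
    """Minimum number of voting disks a node must access to stay in the cluster."""
    return total_disks // 2 + 1

def tolerated_failures(total_disks: int) -> int:
    """How many voting disks can fail while a majority is still reachable."""
    return total_disks - majority(total_disks)

# 3 disks: majority 2, tolerates 1 failure.
# 4 disks: majority 3, still tolerates only 1 failure -- no gain from the extra disk.
# 5 disks: majority 3, tolerates 2 failures.
for n in (1, 2, 3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
```

Note that every even count tolerates exactly as many failures as the odd count one below it, which is why the even configurations are never recommended.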

What is node eviction and in which scenarios node eviction happens?

Node eviction in Oracle Real Application Clusters (RAC) refers to the process by which a node is forcibly removed or evicted from the cluster due to various reasons such as failures, network issues, or administrative actions. Node eviction is a critical aspect of maintaining the integrity and stability of the cluster.
Here are some common scenarios in which node eviction may occur:
Node Failure and Missed Heartbeats: CSSD monitors every node through network heartbeats over the private interconnect and disk heartbeats to the voting disks. If a node suffers a hardware or software failure, hangs, or misses network heartbeats beyond the misscount threshold (30 seconds by default on Linux) or disk heartbeats beyond the disktimeout threshold, it is evicted from the cluster to prevent it from causing disruptions or compromising the overall availability of the cluster.
Network Partition: In the event of a network partition where communication between nodes is lost or degraded, the cluster may initiate node eviction to prevent split-brain scenarios. Split-brain occurs when nodes cannot communicate with each other but continue to operate independently, potentially leading to data corruption or inconsistency. Node eviction helps maintain cluster integrity by removing nodes that cannot participate in the cluster’s decision-making process due to network isolation.
Voting Disk Inaccessibility: If the voting disks, which are used to determine cluster membership and quorum, become inaccessible to a node, the cluster may initiate node eviction to ensure that the remaining nodes can establish consensus and maintain cluster stability.
Administrative Actions: Administrators may initiate node eviction for maintenance purposes, software upgrades, or other administrative tasks that require temporarily removing a node from the cluster. This allows administrators to perform maintenance activities without disrupting the overall availability of the cluster.
Cluster Health Checks: The clusterware continuously monitors the health and status of nodes in the cluster. If a node is determined to be in an unhealthy state or is unable to perform its required functions, the cluster may initiate node eviction to prevent the node from causing further issues and to maintain the overall health of the cluster.
In all of these scenarios, node eviction is a mechanism used by Oracle RAC to maintain the integrity, stability, and availability of the cluster. Node eviction helps ensure that only healthy and functional nodes participate in cluster operations, thereby minimizing the risk of data corruption, downtime, and disruptions to service availability.
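The heartbeat thresholds that drive evictions can be queried with `crsctl`:

```
# Network heartbeat timeout (seconds) before a node is evicted
crsctl get css misscount

# Voting-disk I/O timeout (seconds)
crsctl get css disktimeout
```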

Explain the backup frequency of OCR.

Oracle backs up the OCR automatically: the CRSD process on the OCR master node takes a backup every four hours and retains the last three four-hour backups, plus one backup from the previous day and one from the previous week. In addition, a manual backup should be taken before and after significant configuration changes (such as adding or deleting nodes, or modifying cluster resources), and OCR backups should be tested and integrated into the overall backup strategy.

ocrconfig -showbackup – > command to list the available automatic and manual OCR backups.
ocrconfig -manualbackup – > command to take a manual OCR backup (run as root).
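Typical usage (run as root from the Grid home; the backup path is illustrative):

```
# List automatic and manual OCR backups
ocrconfig -showbackup

# Take an on-demand OCR backup
ocrconfig -manualbackup

# Change the directory used for automatic backups
ocrconfig -backuploc /u01/app/grid/ocrbackup
```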