Kiran Dalvi
23 Jan, 2024
0 Comments
16 Mins Read

Oracle RAC Interview Questions

Why we are using vip in RAC?

Before Oracle Database 10g, there was no concept of VIP, and in RAC environments, clients had to connect directly to individual node IP addresses. Virtual IP (VIP) in Real Application Clusters (RAC) is used to provide a single, virtual IP address that can be used to connect to any node in the cluster. It helps ensure seamless connectivity during node failures or relocations. The concept of VIP was introduced in Oracle 10g to improve high availability and simplify client connections in RAC environments.

If user connected to the INSTANCE using physical ip, and if the node goes down, then there is no way for the user to know whether node is available or not. So it need to wait for a long time, till it gets timed out by the network.

However If i use logical vip( on top of physical vip), then if node goes down, then CRS will failover this vip to other surviving node. And the user will get the connection error quickly( like TNS no listener available ).

If I have a 8-node RAC, then how many scan listeners are required?

For an 8-node Real Application Clusters (RAC), you typically need three Scan Listeners.It is not mandatory for scan listener to run on all the nodes.

How SCAN knows which node has least load?

The SCAN (Single Client Access Name) does not inherently know the load on individual nodes. Load balancing is typically managed by the underlying clusterware or load balancing algorithms, which distribute client connections across the nodes in a balanced manner.

Explain how client connection is established in RAC database ?

LREG process on each instances registers the database service of the node with default local listener and scan listener. The listeners store the workload information of each node.

So when client tries to connect using scan_name and port,

scan_name will be resolved through DNS , which will redirect to 1st scan ip ( out of 3).
The client will connected to the respective scan listener.
The scan listener compares the work load of both the instances and if scan determines that node1 has least load , then scan listener send the vip address and port details of that particular nodes local listener to client.
Address and port details of that particular nodes local listener to client.
Now client connects to that local listeners and a dedicated server process is created.
Client connection becomes successful and it starts accessing the database.

What current block and CR block and PI in RAC?

Data block requests from global cache are of two types.

current block(cur) – When we want to update a data, oracle must locate the most recent version of the block in cache, it is known as current block

consistent read(CR) – When we want to read a data, then only committed data will be provided( with help of undo). that is known as consistent read.

Past image(PI) – When node A wants to updates the block, which is present on node B and node B has also updated the block , then node B will send the current copy of the block to Node A, it will keep a past image( PI) of the block , until it is written to the disk. Once commit happens on node B for that transaction or when checkpoint happens , the PI images will be flushed to disk.

There can be multiple CR blocks, But there will be always one Current block.

There can multiple scur(shared current) , But only xcur( exclusive current).

What is gc buffer busy wait?

Mean a session is trying to access a buffer in buffer cache, But that particular buffer is currently busy with global cache operation.So during that time gc buffer busy wait will happen.

Example –

Let’s say session A want to access block id 100 , But currently that block is in buffer cache of session B.
So session A requested session B LMS process to transfer the block.
While transfer is going on , session B also tried to access that block. But as already that block/buffer is already busy in global cache operation. Session B has to wait with wait event, gc buffer busy wait.

Reasons – Concurrency related, Right hand index growth.

What are some RAC specific parameters ?

undo_tablespaces
cluster_database
cluster_interconnects
remote_listener
thread
cluster_database_instances

Why RAC has separate redo thread for each node?

In RAC, each instance have their own lgwr process , So there has to be separate online redolog for each instance ( called as thread), So that lgwr will write to the respective redo log.

Why RAC has separate undo tablespace for each node?

In Oracle Real Application Clusters (RAC), each node has a separate undo tablespace to facilitate the management of undo data in a distributed and parallel processing environment. The primary reasons for having separate undo tablespaces for each node in a RAC setup include:

Isolation of Undo Segments: Each node in an RAC cluster has its own set of undo segments within the dedicated undo tablespace .This isolation helps prevent contention and potential performance issues that could arise if multiple nodes were contending for the same undo segments.

Parallel Processing and Scalability: RAC environments are designed to support parallel processing across multiple nodes for improved scalability and performance .Having separate undo tablespaces allows each node to manage its undo data independently, minimizing the need for coordination between nodes during undo operations.

Reduced Global Resource Contention: The separation of undo tablespaces reduces the likelihood of global resource contention for undo space .Without separate undo tablespaces, multiple nodes might contend for the same undo segments, leading to contention and potential performance bottlenecks.

Enhanced Concurrency: Separate undo tablespaces enhance concurrency by allowing each node to manage its undo transactions independently .This helps avoid serialization and contention for undo resources, which is crucial for maintaining high levels of concurrent activity in an RAC environment.

Improved High Availability: In the event of a node failure or partitioning of the RAC cluster, having separate undo tablespaces ensures that each surviving node can continue managing its undo data independently. This improves the high availability of the database by reducing the impact of failures on undo operations.

What data we need to check in vmstat and iostat output?

vmstat:

CPU Utilization:

Utilization percentages for user, system, and idle states.

High CPU utilization might indicate processing bottlenecks.

Memory Utilization:

Memory statistics including total, used, free, buffers, and cache.

High memory usage might indicate memory contention or insufficient memory.

Virtual Memory (Swap):

Swap-in and swap-out rates.

Excessive swapping suggests memory pressure.

I/O Wait:

Percentage of time the CPU spends waiting for I/O operations to complete.

High I/O wait times suggest I/O bottlenecks.

System and Interrupts:

System and hardware interrupts per second.

High interrupt rates may indicate hardware issues or heavy system activity.

iostat:

Disk Utilization:

Disk I/O statistics such as average I/O requests per second, kilobytes read and written per second, and average wait time.

High disk utilization might indicate disk I/O bottlenecks.

Disk Service Time:

Average time taken to service I/O requests.

High service times suggest slow disk response.

I/O Wait:

Percentage of time the CPU spends waiting for I/O operations to complete.

Correlate with CPU utilization from vmstat to identify if high I/O wait times affect overall system performance.

Device and CPU Utilization:

Device utilization percentage and CPU utilization percentage.

Correlate with vmstat CPU utilization to understand how much of the CPU time is spend on disk I/O operations.

Transfer Rates:

Average and instantaneous transfer rates in kilobytes per second.

Monitor for sustained high or low transfer rates, which may indicate performance issues or changes in workload.

What is Flex Cluster introduced in oracle 12c?

Oracle Flex Cluster, introduced in Oracle Database 12c, is a clustering architecture offering two types of server pools: Hub-and-Spoke and Uniform. It provides flexible resource management, simplified administration, and enhanced high availability for Oracle RAC environments.

Hub-and-Spoke Architecture

Uniform Cluster Architecture

Dynamic Resource Management

Enhanced High Availability

Simplified Administration

What is TAF?

Automatic Failover: TAF allows client connections to be automatically redirected to another available RAC node if the currently connected node or instance fails.

Transparent to Applications: TAF is transparent to the application layer, meaning that applications do not need to handle connection failover logic explicitly. The failover process is managed by Oracle client libraries and Oracle Net Services.

Session Persistence: TAF maintains session state during failover, ensuring that in-flight transactions and session data are preserved when switching to a new node.

Fast Recovery: TAF facilitates fast recovery by quickly redirecting client connections to an available node, minimizing downtime and disruption to application users.

Supported Failover Modes: TAF supports various failover modes, including Failover, Restart, and Session State. These modes determine the behavior of client connections during failover events.

Can we start crs in exclusive mode? and its purpose?

Yes, Oracle Clusterware (CRS) can be started in exclusive mode. It is used for maintenance tasks, troubleshooting, and ensuring isolation of a single node from the cluster to perform critical operations without affecting other nodes’ services.

ASM is running , but the database is not coming up? What might be the issue?

If ASM (Automatic Storage Management) is running, but the database is not coming up, several potential issues could be causing this problem:

Listener Configuration: Ensure that the listener is running and properly configured to handle connections for the database instances. Check the listener.ora file for correct settings.

Database Initialization Parameters: Review the database initialization parameters (init.ora or spfile) to ensure they are correctly configured for ASM storage. Pay attention to parameters such as ASM_DISKGROUPS, ASM_DISKSTRING, and DB_CREATE_FILE_DEST.

ASM Disk Group Availability: Verify the availability and health of the ASM disk groups. If a disk group is offline or in a dismounted state, it can prevent the database from starting. Use ASM commands like asmcmd to check the status of disk groups.

ASM Instance Status: Check the status of the ASM instance. If the ASM instance is down or in a restricted mode, it may be preventing the database from accessing the required storage.

Disk or Storage Issues: Investigate any disk or storage-related problems that may be affecting ASM operations. This could include disk failures, storage connectivity issues, or insufficient space in ASM disk groups.

Network Configuration: Ensure that there are no network issues preventing communication between the database instances and the ASM instance. Check for firewall rules, network congestion, or DNS resolution problems.

Permission and Ownership: Verify that the Oracle processes have appropriate permissions and ownership to access ASM disks and files. Incorrect permissions can lead to startup failures.

Diagnostic Logs: Check the alert log, ASM alert log, and other diagnostic logs for error messages or warnings that may provide clues about the cause of the startup failure.

If crs is not coming up , then what are things you will start looking into?

If CRS is not coming up, I would start by checking logs, reviewing alert logs, verifying dependencies, checking disk space, reviewing configuration files, verifying permissions, checking cluster interconnect, reviewing OS logs, checking cluster health, and investigating hardware failures.

Explain about local_listener and remote_listener parameter in RAC?

Local_listener:

Definition: This parameter specifies the network address that an Oracle RAC instance uses to listen for local connection requests.

Purpose: The local listener is responsible for managing connection requests originating from the same node where the instance is running.

Configuration: The local_listener parameter is set to a TNS entry or a net service name that resolves to a network address (hostname and port) where the local listener is running.

ALTER SYSTEM SET LOCAL_LISTENER = ‘mydb_listener’ SCOPE=both;

Here, ‘mydb_listener’ is a TNS entry pointing to the local listener’s address.

Remote_listener:

Definition: This parameter specifies one or more network addresses that an Oracle RAC instance uses to listen for remote connection requests from other instances in the cluster.

Purpose: The remote listener facilitates communication between instances on different nodes in the RAC cluster.

Configuration: The remote_listener parameter is set to a list of TNS entries or net service names that resolve to network addresses (hostnames and ports) where the remote listeners are running on other nodes in the cluster.

ALTER SYSTEM SET REMOTE_LISTENER = ‘node1_listener, node2_listener’ SCOPE=both;

Here, ‘node1_listener’ and ‘node2_listener’ are TNS entries pointing to the remote listeners on different nodes.

What are local registry and cluster registry?

Local Registry:

The local registry, also known as the local node registry, is specific to each node in the Oracle RAC cluster.

It stores configuration information relevant to the local instance, such as instance-specific parameters, initialization parameters, and certain cluster-related settings.

Each RAC node maintains its own local registry to manage its local instance’s configuration independently.

Cluster Registry:

The cluster registry is a shared repository that contains configuration information relevant to the entire RAC cluster.

It stores cluster-wide configuration settings, such as SCAN (Single Client Access Name) configurations, service configurations, and global resources like Oracle Clusterware resources.

The cluster registry is typically maintained in a shared location accessible by all nodes in the RAC cluster.

It facilitates centralized management of cluster-wide settings and resources, ensuring consistency and coherence across all nodes.

What is client side load balancing and server side load balancing?

Client-side load balancing: Clients distribute requests across multiple servers, deciding which server to connect to based on various algorithms such as round-robin or least connections.

Server-side load balancing: Load balancing is handled by a dedicated device or software component, such as a load balancer or proxy server, which distributes incoming requests among multiple servers based on predefined algorithms and server health checks.

What are the RAC related background processes?

LMSn (Lock Manager Server): Manages global resource coordination and distributed locks across RAC instances.

LMD (Lock Manager Daemon): Coordinates lock-related messaging among RAC instances and manages lock conversion requests.

LMON (Global Enqueue Service Monitor): Monitors global resources and manages global enqueue services, such as space allocation for global resources.

LMSn (Global Cache Service Process): Handles cache fusion operations by transferring data blocks between RAC instances to maintain cache coherence.

GCR (Global Cache Service Resources): Coordinates global cache resources and manages cache fusion and resource requests.

GCS (Global Cache Service): Manages the global cache and facilitates cache fusion operations between RAC instances.

RMSn (Resource Manager Server): Controls resource allocation and prioritization for database sessions based on resource management policies.

RBAL (Rebalance): Coordinates instance and resource rebalancing operations to maintain workload distribution across RAC instances.

MMON (Manageability Monitor): Monitors database and instance health, collecting statistics and managing advisory components.

MMNL (Manageability Monitor Light): Light version of MMON for low-priority monitoring tasks.

DIA0 (Diagnosability Process): Handles diagnostic data collection and monitoring for cluster-related issues.

LREG (Local Enqueue Service Registration): Registers local enqueue services with the Global Cache Service (GCS).

How instance recovery happens in oracle RAC?

When any one of the instance is crashed in RAC, then this node failure is detected by the surviving instances. Now the GRD resouces will be distributed across the existing instances. The instance which first detects the crash, will the start the online redo log thread of the crashed instance. The SMON of that instance.

Sequence

Normal RAC operation, all nodes are available.
One or more RAC instances fail.
Node failure is detected.
Global Cache Service (GCS) reconfigures to distribute resource management to the surviving instances.
The SMON process in the instance that first discovers the failed instance(s) reads the failed instance(s) redo logs to determine which blocks have to be recovered.
SMON issues requests for all of the blocks it needs to recover. Once all blocks are made available to the SMON process doing the recovery, all other database blocks are available for normal processing.
Oracle performs roll forward recovery against the blocks, applying all redo log recorded transactions.
Once redo transactions are applied, all undo records are applied, which eliminates non-committed transactions.
Database is now fully available to surviving nodes. will read the redo to do rollforward ( i.e to apply both committed and noncommited data). Once rollforward is done, it will rollback the uncommited transactions using UNDO tablespace of the failed instance.

What is TAF in oracle RAC?

BASIC

PRECONNECT

SELECT FAILOVER

SESSION FAILOVER

Can we have multiple SCAN(name) in a RAC?

From 12c onwards, We can have multiple scan with different subnets. As part of installation only scan will be configured. Post installation we need to configure another SCAN with different subnet( If required).

In RAC, where we define the SCAN?

We can define SCAN with below 2 option.

Using corporate DNS

Using Oracle GNS( Grid naming service)

What g stand for in views like gv$session , gv$sql etc.?

In Oracle Database, the “g” in views like gv$session, gv$sql, and similar views stands for “Global.” These views provide a global or cluster-wide perspective of database activity and resources in Oracle Real Application Clusters (RAC) environments.

The “gv” prefix indicates that these views are “Global Views,” meaning they aggregate information from all instances in the RAC cluster. They allow administrators and users to monitor and manage database resources and activities across all nodes in the cluster from a single point of access.

What is load balancing advisory?

Load Balancing Advisory in Oracle Database is a feature designed to assist in distributing client connections across instances within a Real Application Clusters (RAC) environment. It provides recommendations to help determine the optimal instance to direct a new connection request based on the current workload and resource utilization across the cluster.

The Load Balancing Advisory collects and analyzes statistics related to instance performance, such as CPU utilization, I/O rates, and other metrics. Based on these statistics, it determines which instance is the least loaded and most suitable to handle incoming connection requests.

By leveraging the Load Balancing Advisory, Oracle Connection Managers can intelligently route new connection requests to the recommended instance, thereby achieving better load distribution, maximizing resource utilization, and improving overall system performance in Oracle RAC deployments.

What is gc cr 2 way and gc cr 3 way?

gc cr 2-way (Global Cache Consistent Read 2-way):

This wait event occurs when an instance in the RAC environment is waiting for a consistent read block from another instance.

It indicates contention between two instances for access to the same consistent read block in the global cache.

The number “2-way” signifies that the contention is between two instances.

gc cr 3-way (Global Cache Consistent Read 3-way):

Similar to “gc cr 2-way”, this wait event occurs when an instance is waiting for a consistent read block from another instance.

However, “gc cr 3-way” indicates contention among three instances for access to the same consistent read block in the global cache.

It signifies contention between three instances.

What is the role of LMON background process?

The LMON (Global Enqueue Service Monitor) background process in Oracle RAC manages and monitors global resources, including global enqueue locks and resources. It helps ensure consistency and coordination across multiple instances in the RAC cluster by managing distributed lock management and resolving lock conflicts.

What are some RAC related wait events?

Some RAC-related wait events include:

gc buffer busy: Indicates contention for access to a global cache buffer in a RAC environment.

gc cr block busy: Denotes contention for access to a consistent read block in the global cache.

gc current block busy: Indicates contention for access to the current block in the global cache.

gc freelist: Denotes contention for access to the global cache freelist.

gc grant 2-way: Indicates contention for global cache resource grants between two RAC instances.

What is ACMS?

ACMS stands for Atomic Controlfile to Memory Service. It synchronizes the control file updates across Oracle Real Application Clusters (RAC) instances, ensuring consistency and high availability of database metadata.