Voting disks in RAC
- October 4, 2022
- Posted by: sayali@learnomate.org
- Category: Uncategorized
Voting Disks :
- Oracle Clusterware uses voting disk files to determine which nodes are members of a cluster. You can configure voting disks on Oracle ASM, or you can configure voting disks on shared storage.
- If you configure voting disks on Oracle ASM, then you do not need to manually configure the voting disks. Depending on the redundancy of your disk group, an appropriate number of voting disks are created.
- If you do not configure voting disks on Oracle ASM, then for high availability, Oracle recommends that you have a minimum of three voting disks on physically separate storage. This avoids having a single point of failure. If you configure a single voting disk, then you must use external mirroring to provide redundancy.
[root@<node name> ~]# crsctl query css votedisk
Main Voting Disk Function :
CSS is the service that determines which nodes in the cluster are available, and it provides cluster group membership and simple locking services to the other processes. CSS typically determines node availability through a dedicated private network, with a voting disk used as a secondary communication mechanism. This is done by sending heartbeat messages through both the network and the voting disk, as illustrated by the top graphic in the slide.
The voting disk is a shared raw disk partition or a file on a clustered file system that is accessible to all nodes in the cluster. Its primary purpose is to help when private network communication fails. When that happens, the cluster cannot keep all nodes available because they can no longer synchronize I/O to the shared disks, so some of the nodes must go offline. The voting disk is then used to communicate the node state information that determines which nodes go offline.
Without the voting disk, an isolated node (or group of nodes) may be unable to determine whether it is experiencing a network failure or whether the other nodes are no longer available. The cluster could then get into a state where multiple subclusters of nodes have unsynchronized access to the same database files. This situation is commonly referred to as the cluster split-brain problem.
The graphic at the bottom of the slide illustrates what happens when node3 can no longer send heartbeats to other members of the cluster. When others can no longer see node3’s heartbeats, they decide to evict that node by using the voting disk. When node3 reads the removal message, it generally reboots itself to make sure all outstanding write I/Os are lost.
Multiplexing Voting Disks :
- The voting disk is a vital resource for your cluster availability.
- Use one voting disk if it is stored on a reliable disk.
- Otherwise, use multiplexed voting disks:
- # There is no need to rely on multipathing solutions.
- # Multiplexed copies should be stored on independent devices.
- # Make sure that there is no I/O starvation for your voting disk devices.
- # Use at least three multiplexed copies.
- CSS uses a simple majority rule to decide whether voting disk reads are consistent: V = 2f + 1, where V is the number of voting disks and f is the number of disk failures that can be tolerated.
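The majority rule is easy to work through with a little arithmetic. Here is a minimal sketch, assuming f is the number of voting disk failures you want to tolerate and V is the number of voting disks. The function names are made up for illustration and are not Clusterware commands.

```shell
#!/bin/sh
# Illustrative arithmetic only; these helper names are hypothetical.

# Voting disks needed to tolerate f failed disks: V = 2f + 1.
votedisks_needed() {
    echo $(( 2 * $1 + 1 ))
}

# Smallest simple majority of V voting disks: floor(V / 2) + 1.
majority_of() {
    echo $(( $1 / 2 + 1 ))
}

# Failures tolerated with V disks: V minus the required majority.
failures_tolerated() {
    echo $(( $1 - ($1 / 2 + 1) ))
}
```

With three copies the majority is two, so one copy can fail. A fourth copy does not raise the tolerance (the majority becomes three), which is why odd counts are the usual choice.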
Change Voting Disk Configuration :
- Voting disk configuration can be changed dynamically.
- To add a new voting disk:
crsctl add css votedisk <new voting disk path>
- To remove a voting disk:
crsctl delete css votedisk <old voting disk path>
- If Oracle Clusterware is down on all nodes, use the force option:
crsctl add css votedisk -force <new voting disk path>
crsctl delete css votedisk -force <old voting disk path>
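When you swap a bad voting disk for a new one, the order of the two commands matters. The sketch below is a dry run that only prints the commands, adding the new copy before deleting the old one. The function name and the paths you pass in are hypothetical, and the position of the -force flag simply follows the commands shown above.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands instead of running them.
# The function name and both paths are hypothetical.

# Usage: votedisk_swap_plan <new path> <old path> [force]
votedisk_swap_plan() {
    new_path=$1
    old_path=$2
    flag=""
    # Pass "force" as the third argument when Clusterware is down on all nodes.
    if [ "${3:-}" = "force" ]; then
        flag=" -force"
    fi
    # Add the replacement first so the number of copies never drops.
    echo "crsctl add css votedisk${flag} ${new_path}"
    echo "crsctl delete css votedisk${flag} ${old_path}"
}
```

Review the printed commands and run them yourself as root once you are happy with the paths.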
Back Up and Recover Your Voting Disks
- Backing up should not be needed. Instead, add a new voting disk and remove the bad one.
- The recommendation is to use symbolic links for voting disk paths.
- Back up one voting disk by using the dd command:
- # After Oracle Clusterware installation
- # After node addition or deletion
- # Cannot be done online.
crsctl query css votedisk
dd if=<voting disk path> of=<backup path> bs=4k
- Recover voting disks by restoring the first one using the dd command, and then multiplex it if necessary.
- If no voting disk backup is available, reinstall Oracle Clusterware.
Back Up and Recover Your Voting Disks :
There should be no need to back up a voting disk. Simply add a new one and drop a bad one.
It is recommended to use symbolic links to specify your voting disk paths. This is because the voting disk paths are directly stored in OCR, and editing the OCR file directly is not supported. By using symbolic links to your voting disks, it becomes easier to restore your voting disks if their original locations can no longer be used as a restore location.
A new backup of one of your available voting disks should be taken any time a new node is added, or an existing node is removed. The recommended way to do that is to use the dd command (ocopy in Windows environments). As a general rule on most platforms, including Linux and Sun, the block size for the dd command should be at least 4 KB to ensure that the backup of the voting disk gets complete blocks.
Before backing up your voting disk with the dd command, make sure that you have stopped Oracle Clusterware on all nodes.
The crsctl query css votedisk command lists the voting disks currently used by CSS. This can help you to determine which voting disk to back up.
The slide shows you the procedure you can follow to back up and restore your voting disk.
Note: If you lose all your voting disks and you do not have any backup, you must reinstall Oracle Clusterware.
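The dd procedure described above can be sketched as a pair of small helpers. This is a sketch under assumptions: the function names and any paths are hypothetical, and it applies to the older releases this post describes. To my understanding, Oracle Clusterware 11g Release 2 and later back up voting disk data automatically and no longer support dd for this, so check the documentation for your release first.

```shell
#!/bin/sh
# Sketch only: function names and paths are hypothetical.
# Run as root with Oracle Clusterware stopped on all nodes.

# Copy a voting disk to a backup file in 4 KB blocks.
backup_votedisk() {
    dd if="$1" of="$2" bs=4k
}

# Restore is the same copy in the opposite direction.
restore_votedisk() {
    dd if="$1" of="$2" bs=4k
}

# Confirm that two files are byte-for-byte identical.
verify_copy() {
    cmp -s "$1" "$2"
}
```

After a restore, you can multiplex the recovered voting disk again by adding further copies with crsctl.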
OCR Architecture :
Cluster configuration information is maintained in Oracle Cluster Registry (OCR). OCR relies on a distributed shared-cache architecture for optimizing queries, and clusterwide atomic updates against the cluster repository. Each node in the cluster maintains an in-memory copy of OCR, along with the Cluster Ready Services Daemon (CRSD) that accesses its OCR cache. Only one of the CRS processes actually reads from and writes to the OCR file on shared storage. This process is responsible for refreshing its own local cache, as well as the OCR cache on other nodes in the cluster. For queries against the cluster repository, the OCR clients communicate directly with the local OCR process on the node from which they originate. When clients need to update OCR, they communicate through their local CRS process to the CRS process that is performing input/output (I/O) for writing to the repository on disk.
The main OCR client applications are the Oracle Universal Installer (OUI), SRVCTL, Enterprise Manager (EM), the Database Configuration Assistant (DBCA), the Database Upgrade Assistant (DBUA), NetCA, and the Virtual Internet Protocol Configuration Assistant (VIPCA). Furthermore, OCR maintains dependency and status information for application resources defined within Oracle Clusterware, specifically databases, instances, services, and node applications.
OCR Contents and Organization :
Every clustering technology requires a repository through which the clustering software and other cluster-aware application processes can share information. Oracle Clusterware uses Oracle Cluster Registry to store information about resources it manages. This information is stored in a treelike structure using key–value pairs.
The slide shows you the main branches composing the OCR structure:
The SYSTEM keys contain data related to the main Oracle Clusterware processes such as CSSD, CRSD, and EVMD. For example, CSSD keys contain information about the misscount parameter and voting disk paths.
The DATABASE keys contain data related to the RAC databases that you registered with Oracle Clusterware. As shown, you have information about instances, nodeapps, services, and so on.
The last category of keys that you can find in OCR relate to the resource profiles used by Oracle Clusterware to maintain availability of the additional application you registered. These resources include the additional application VIPs, the monitoring scripts, and the check interval values.
Note: The XML data on the right side of the slide were obtained by using the ocrdump -xml command.
Managing OCR Files and Locations:
Overview :
You use the ocrconfig tool (the main configuration tool for Oracle Cluster Registry) to:
Generate logical backups of OCR using the -export option, and use them later to restore your OCR information using the -import option
Upgrade or downgrade OCR
Use the -showbackup option to view the generated backups (by default, OCR is backed up on a regular basis). These backups are generated in a default location that you can change using the -backuploc option. If need be, you can then restore physical copies of your OCR using the -restore option. You can also manually create OCR backups using the -manualbackup option.
Use the -replace ocr or -replace ocrmirror options to add, remove, or replace the primary OCR files or the OCR mirror file
Use the -overwrite option under the guidance of Support Services because it allows you to overwrite some OCR protection mechanisms when one or more nodes in your cluster cannot start because of an OCR corruption
Use the -repair option to change the parameters listing the OCR and OCR mirror locations
The ocrcheck tool enables you to verify the OCR integrity of both OCR and its mirror. Use the ocrdump utility to write the OCR contents (or part of it) to a text or XML file.
Automatic OCR Backups :
- The OCR content is critical to Oracle Clusterware.
- OCR is automatically backed up physically:
o At the end of every day: CRS keeps the last two copies.
o At the end of every week: CRS keeps the last two copies.
o Every four hours: CRS keeps the last three copies.
$ cd $ORACLE_BASE/Crs/cdata/jfv_clus
$ ls -lrt
Automatic OCR Backups :
OCR contains important cluster and database configuration information for RAC and Oracle Clusterware. One of the Oracle Clusterware instances (CRSD master) in the cluster automatically creates OCR backups every four hours, and CRS retains the last three copies. That CRSD process also creates an OCR backup at the beginning of each day and of each week, and retains the last two copies. This is illustrated in the slide where you can see the content of the default backup directory of the CRSD master.
Although you cannot customize the backup frequencies or the number of retained copies, you have the possibility to identify the name and location of the automatically retained copies by using the ocrconfig -showbackup command.
The default target location of each automatically generated OCR backup file can be changed by using the ocrconfig -backuploc command, which takes the new directory as its argument:
# ocrconfig -backuploc /shared/bak
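The "keep the last three four-hourly copies" retention described above can be pictured as a three-slot rotation. The sketch below models that idea in a scratch directory. It is not how CRSD is implemented, and the function name and the backup00/01/02.ocr file names are assumptions made for the illustration.

```shell
#!/bin/sh
# Model of a three-slot rotation; not the actual CRSD implementation.
# The function name and file names are assumptions for illustration.

# Usage: rotate_backup <backup dir> <new backup file>
rotate_backup() {
    dir=$1
    new=$2
    # Shift older copies down one slot; the oldest copy is overwritten.
    if [ -f "$dir/backup01.ocr" ]; then
        mv "$dir/backup01.ocr" "$dir/backup02.ocr"
    fi
    if [ -f "$dir/backup00.ocr" ]; then
        mv "$dir/backup00.ocr" "$dir/backup01.ocr"
    fi
    # The newest copy always lands in slot 00.
    cp "$new" "$dir/backup00.ocr"
}
```

However many times you call it, the directory never holds more than three rotated copies, and the newest one is always in the first slot.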
Back Up OCR Manually :
• Daily backups of your automatic OCR backups to a different storage device:
• Use your favorite backup tool.
• On demand physical backups:
# ocrconfig -manualbackup
• Logical backups of your OCR before and after making significant changes:
# ocrconfig -export file_name
• Make sure that you restore OCR backups that match your current system configuration.
Back Up OCR Manually :
Because of the importance of OCR information, it is also recommended to manually create copies of the automatically generated physical backups. You can use any backup software to copy the automatically generated backup files, and it is recommended to do that at least once daily to a different device from where the primary OCR resides.
You can perform an OCR backup on demand using the -manualbackup option. The backup is generated in the location that you specify with the -backuploc option.
In addition, you should also export the OCR contents before and after making significant configuration changes such as adding or deleting nodes from your environment, modifying Oracle Clusterware resources, or creating a database. Use the ocrconfig -export command as the root user to generate OCR logical backups. You need to specify a file name as the argument of the command, and it generates a binary file that you should not try to edit.
Most configuration changes that you make not only change the OCR contents but also cause file and database object creation. Some of these changes are often not restored when you restore OCR. Do not perform an OCR restore as a correction to revert to previous configurations if some of these configuration changes fail. This may result in an OCR with contents that do not match the state of the rest of your system.
Note: If you try to export OCR while an OCR client is running, you get an error.
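Since exports are recommended both before and after a significant change, it helps to name the files so you can tell them apart later. Here is a small dry-run sketch that builds the export command with a label and a timestamp in the file name. The function name, the directory, and the naming scheme are all hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: builds the export command rather than running it.
# The function name, directory, and naming scheme are hypothetical.

# Usage: ocr_export_cmd <export dir> <label> <timestamp>
ocr_export_cmd() {
    echo "ocrconfig -export ${1}/ocr_${2}_${3}.dmp"
}
```

For example, `ocr_export_cmd /shared/export before_addnode "$(date +%Y%m%d_%H%M%S)"` prints a command that you can then run as the root user.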
Recover OCR Using Physical Backups :
1. Locate a physical backup:
$ ocrconfig -showbackup
2. Review its contents:
# ocrdump -backupfile file_name
3. Stop Oracle Clusterware on all nodes:
# crsctl stop crs
4. Restore the physical OCR backup:
# ocrconfig -restore <CRS HOME>/cdata/jfv_clus/day.ocr
5. Restart Oracle Clusterware on all nodes:
# crsctl start crs
6. Check OCR integrity:
$ cluvfy comp ocr -n all
Recover OCR Using Physical Backups
Use the following procedure to restore OCR on UNIX-based systems:
1. Identify the OCR backups by using the ocrconfig -showbackup command. You can execute this command from any node as user oracle. The output tells you on which node and which path to retrieve both automatically and manually generated backups. Use the auto or manual argument to display only one category.
2. Review the contents of the backup by using ocrdump -backupfile file_name, where file_name is the name of the backup file.
3. Stop Oracle Clusterware on all the nodes of your cluster by executing the crsctl stop crs command on all the nodes as the root user.
4. Perform the restore by applying an OCR backup file that you identified in step one using the following command as the root user, where file_name is the name of the OCR file that you want to restore. Make sure that the OCR devices that you specify in the OCR configuration file (/etc/oracle/ocr.loc) exist and that these OCR devices are valid before running this command: ocrconfig -restore file_name
5. Restart Oracle Clusterware on all the nodes in your cluster by restarting each node or by running the crsctl start crs command as the root user.
6. Run the following command to verify OCR integrity, where the -n all argument retrieves a listing of all the cluster nodes that are configured as part of your cluster:
cluvfy comp ocr -n all
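The restore steps above have to happen in a fixed order across all nodes, so it can help to print the plan before touching anything. The following is a dry-run sketch: it only echoes the commands from the procedure, tagged with the node they belong to. The function name and the node names you pass in are hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: prints the restore steps in order instead of executing them.
# The function name is hypothetical; node names are whatever you pass in.

# Usage: ocr_restore_plan <backup file> <node> [<node> ...]
ocr_restore_plan() {
    backup=$1
    shift
    echo "ocrdump -backupfile $backup"
    # Clusterware must be stopped on every node before the restore.
    for node in "$@"; do
        echo "[$node] crsctl stop crs"
    done
    echo "ocrconfig -restore $backup"
    for node in "$@"; do
        echo "[$node] crsctl start crs"
    done
    echo "cluvfy comp ocr -n all"
}
```

Calling it with a backup file and two node names prints seven lines: the dump, two stops, the restore, two starts, and the final integrity check.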
Recover OCR Using Logical Backups
1. Locate a logical backup created using an OCR export.
2. Stop Oracle Clusterware on all nodes:
# crsctl stop crs
3. Restore the logical OCR backup:
# ocrconfig -import /shared/export/ocrback.dmp
4. Restart Oracle Clusterware on all nodes:
# crsctl start crs
5. Check OCR integrity:
$ cluvfy comp ocr -n all
Recover OCR Using Logical Backups
Use the following procedure to import OCR on UNIX-based systems:
1. Identify the OCR export file that you want to import by identifying the OCR export file that you previously created using the ocrconfig -export file_name command.
2. Stop Oracle Clusterware on all the nodes in your RAC database by executing the crsctl stop crs command on all the nodes as the root user.
3. Perform the import by applying an OCR export file that you identified in step one using the following command, where file_name is the name of the OCR file from which you want to import OCR information: ocrconfig -import file_name
4. Restart Oracle Clusterware on all the nodes in your cluster by restarting each node or by running the crsctl start crs command as the root user.
5. Run the following Cluster Verification Utility (CVU) command to verify OCR integrity, where the -n all argument retrieves a listing of all the cluster nodes that are configured as part of your cluster:
cluvfy comp ocr -n all
Replace an OCR Mirror: Example
# ocrcheck
Status of Oracle Cluster Registry is as follows:
Version : 2
Total space (kbytes) : 200692
Used space (kbytes) : 3752
Available space (kbytes) : 196940
ID : 495185602
Device/File Name : /oradata/OCR1
Device/File integrity check succeeded
Device/File Name : /oradata/OCR2
Device/File needs to be synchronized with the other device
# ocrconfig -replace ocrmirror /oradata/OCR2
Replace, Add, or Remove an OCR File
The code example in the slide shows you how to replace the existing OCR mirror file. It is assumed that you already have an OCR mirror, and that this mirror is no longer working as expected. Such a reorganization can be triggered because you received an OCR failure alert in Enterprise Manager, or because you saw an alert directly in the Oracle Clusterware alert log file.
Using the ocrcheck command, you clearly see that the OCR mirror is no longer in sync with the primary OCR. You then issue the ocrconfig -replace ocrmirror filename command to replace the existing mirror with a copy of your primary OCR. In the example, filename can be a new file name if you decide to also relocate your OCR mirror file.
If it is the primary OCR file that is failing, and if your OCR mirror is still in good health, you can use the ocrconfig -replace ocr filename command instead.
Note: The example in the slide shows you a replace scenario. However, you can also use a similar command to add or remove either the primary or the mirror OCR file:
Executing ocrconfig -replace ocr|ocrmirror filename adds the primary or mirror OCR file to your environment if it does not already exist.
Executing ocrconfig -replace ocr|ocrmirror removes the primary or the mirror OCR file.
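If you want to spot an out-of-sync OCR file without reading the ocrcheck output by eye, a small filter can do it. This is a sketch that assumes the message wording shown in the example above; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: reads ocrcheck output on standard input and prints the devices
# reported as out of sync. The function name is hypothetical, and the
# matching relies on the message wording shown in the example.

ocr_devices_needing_sync() {
    awk '
        # Remember the most recent device name (the text after the colon).
        /Device\/File Name/ { sub(/^.*: */, ""); dev = $0; next }
        # Print that device when the following status line reports a sync problem.
        /needs to be synchronized/ { print dev }
    '
}
```

You would use it as `ocrcheck | ocr_devices_needing_sync`. For the output shown in the example it prints /oradata/OCR2, which is the file to pass to ocrconfig -replace ocrmirror.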
Repair OCR Configuration: Example
1. Stop Oracle Clusterware on Node2:
# crsctl stop crs
2. Add OCR mirror from Node1:
# ocrconfig -replace ocrmirror /OCRMirror
3. Repair OCR mirror location on Node2:
# ocrconfig -repair ocrmirror /OCRMirror
4. Start Oracle Clusterware on Node2:
# crsctl start crs
Repair OCR Configuration: Example
Use the ocrconfig -repair command to repair inconsistent OCR configuration information. The OCR configuration information is stored in:
/etc/oracle/ocr.loc on Linux and AIX
/var/opt/oracle/ocr.loc on Solaris and HP-UX
Registry key HKEY_LOCAL_MACHINE\SOFTWARE\Oracle\ocr on Windows
You may need to repair an OCR configuration on a particular node if your OCR configuration changes while that node is stopped. For example, you may need to repair the OCR on a node that was not up while you were adding, replacing, or removing an OCR.
The example in the slide illustrates the case where the OCR mirror file is added on the first node of your cluster while the second node is not running Oracle Clusterware.
You cannot perform this operation on a node on which Oracle Clusterware is running.
Note: This repairs the OCR configuration information only; it does not repair OCR itself.
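Because the ocr.loc file lives in a different place depending on the platform, a tiny lookup helps when you script checks across mixed environments. The sketch below maps the operating system name printed by `uname -s` to the paths listed above. The function name is hypothetical, and Windows is left out because it uses a registry key instead of a file.

```shell
#!/bin/sh
# Sketch: maps an operating system name, as printed by `uname -s`, to the
# ocr.loc path listed above. The function name is hypothetical.

ocr_loc_path() {
    case "$1" in
        Linux|AIX)    echo "/etc/oracle/ocr.loc" ;;
        SunOS|HP-UX)  echo "/var/opt/oracle/ocr.loc" ;;
        *)            echo "unknown platform: $1" >&2; return 1 ;;
    esac
}
```

For example, `cat "$(ocr_loc_path "$(uname -s)")"` shows the OCR locations a node currently has configured, which is handy before and after running ocrconfig -repair.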
OCR Considerations :
• If using raw devices to store OCR files, make sure they exist before add or replace operations.
• You must be the root user to be able to add, replace, or remove an OCR file while using ocrconfig.
• While adding or replacing an OCR file, its mirror needs to be online.
• If you remove a primary OCR file, the mirror OCR file becomes primary
• Never remove the last remaining OCR file.
Replacing OCR Considerations
Here is a list of important considerations when you use the ocrconfig -replace command:
If you are using raw devices, make sure that the file name exists before issuing an add or replace operation using ocrconfig.
To be able to execute an add, replace, or remove operation using ocrconfig, you must be logged in as the root user.
The OCR file that you are replacing can be either online or offline.
If you remove a primary OCR file, then the mirrored OCR file becomes the primary OCR file.
Do not perform an OCR removal operation unless there is at least one other active OCR file online.
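The last consideration is the one most worth guarding against in a script. Here is a dry-run sketch that only prints the removal command when at least one other OCR file would remain online. The function name is hypothetical, and you have to supply the count of active OCR files yourself (for example, from ocrcheck).

```shell
#!/bin/sh
# Dry-run sketch: prints the removal command only when it is safe to do so.
# The function name is hypothetical; the caller supplies the count of
# active OCR files currently online.

# Usage: ocr_remove_cmd <ocr|ocrmirror> <active OCR files online>
ocr_remove_cmd() {
    target=$1
    online=$2
    case "$target" in
        ocr|ocrmirror) ;;
        *) echo "target must be ocr or ocrmirror" >&2; return 1 ;;
    esac
    # Never remove the last remaining OCR file.
    if [ "$online" -lt 2 ]; then
        echo "refusing: no other active OCR file online" >&2
        return 1
    fi
    echo "ocrconfig -replace $target"
}
```

With two files online, `ocr_remove_cmd ocrmirror 2` prints the command; with only one, it refuses and returns a non-zero status.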
Hope it Helps!!!!!