RAC Assurance Support Team: RAC Starter Kit and Best Practices (Generic) [ID 810394.1]
Modified 03-FEB-2010 Type BULLETIN Status PUBLISHED
In this Document
Purpose
Scope and Application
RAC Assurance Support Team: RAC Starter Kit and Best Practices (Generic)
RAC Platform Specific Starter Kits and Best Practices
RAC Platform Generic Load Testing and System Test Plan Outline
RAC Platform Generic Highlighted Recommendations
RAC Platform Generic Best Practices
Getting Started - Preinstallation and Design Considerations
Clusterware Considerations
Networking Considerations
Storage Considerations
Installation Considerations
Patching Considerations
Upgrade Considerations
Oracle VM Considerations
Database Initialization Parameter Considerations
Performance Tuning Considerations
General Configuration Considerations
E-Business Suite (with RAC) Considerations
Peoplesoft (with RAC) Considerations
Tools/Utilities for Diagnosing and Working with Oracle Support
11gR2 Specific Considerations
RAC Platform Generic References
CRS / RAC Related References
RAC / RDBMS Related References
VIP References
ASM References
11.2 References
Infiniband References
MAA / Standby References
Patching References
Upgrade References
E-Business References
Unix References
Weblogic/RAC References
References Related to Working with Oracle Support
Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 11.2.0.1.0 - Release: 10.2 to 11.2
Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 11.1.0.7
Information in this document applies to any platform.
Purpose
The goal of the Oracle Real Application Clusters (RAC) Starter Kit is to provide you with the latest information on generic and platform-specific best practices for implementing an Oracle RAC cluster. This document is compiled and provided based on Oracle's experience with its global RAC customer base. This Starter Kit is not meant to replace or supplant the Oracle documentation set; rather, it is meant as a supplement to it. It is imperative that the Oracle documentation be read, understood, and referenced to provide answers to any questions that may not be clearly addressed by this Starter Kit.
All recommendations should be carefully reviewed by your own operations group and should only be implemented if the potential gain as measured against the associated risk warrants implementation. Risk assessments can only be made with a detailed knowledge of the system, application, and business environment.
As every customer environment is unique, the success of any Oracle Database implementation, including implementations of Oracle RAC, is predicated on a successful test environment. It is thus imperative that any recommendations from this Starter Kit be thoroughly tested and validated, using a testing environment that is a replica of the target production environment, before being implemented in production, to ensure that there is no negative impact associated with the recommendations made.
Scope and Application
This article is intended for use by all new (and existing) Oracle RAC implementers.
RAC Assurance Support Team: RAC Starter Kit and Best Practices (Generic)
RAC Platform Specific Starter Kits and Best Practices
While this note focuses on Generic RAC Best Practices, the following notes contain detailed platform specific best practices. Please refer to the notes below for more specifics, including example step-by-step install cookbooks and sample system test plans.
Note 811306.1 RAC Assurance Support Team: RAC Starter Kit and Best Practices (Linux)
Note 811280.1 RAC Assurance Support Team: RAC Starter Kit and Best Practices (Solaris)
Note 811271.1 RAC Assurance Support Team: RAC Starter Kit and Best Practices (Windows)
Note 811293.1 RAC Assurance Support Team: RAC Starter Kit and Best Practices (AIX)
Note 811303.1 RAC Assurance Support Team: RAC Starter Kit and Best Practices (HP-UX)
RAC Platform Generic Load Testing and System Test Plan Outline
A critical component of any successful implementation, particularly in the High Availability arena, is testing. For a RAC environment, testing should include both load generation, to monitor and measure how the system works under heavy load, and a system test plan, to understand how the system reacts to certain types of failures. To assist with this type of testing, this document contains links to documents to get you started in both of these areas.
Click here for a White Paper on available RAC System Load Testing Tools
Click here for a platform generic RAC System Test Plan Outline
Use these documents to validate your system setup and configuration, and also as a means to practice responses and establish procedures in case of certain types of failures.
RAC Platform Generic Highlighted Recommendations
Highlighted Recommendations are the recommendations thought to have the greatest impact, or to answer the most commonly raised questions or issues. In this case, the Generic Highlighted Recommendations address commonly asked or encountered issues that apply to RAC implementations across all platforms.
- Having a step-by-step plan for your RAC project implementation is invaluable. The following OTN article contains a sample project outline: http://www.oracle.com/technology/pub/articles/haskins-rac-project-guide.html
- To simplify the stack and vendor interactions, Oracle recommends avoiding third-party clusterware unless absolutely necessary.
- Automatic Storage Management (ASM) is recommended for datafile storage. The following is a link to the ASM collateral index: http://www.oracle.com/technology/products/database/asm/index.html. See also the ASM Overview and Technical Best Practices white paper: http://www.oracle.com/technology/products/database/asm/pdf/asm_10gr2_bestpractices 09-07.pdf
- The RAC Assurance Team recommends placement of Oracle Homes on local drives whenever possible. The following white paper contains an analysis of the pros and cons of shared versus local Oracle Homes: http://www.oracle.com/technology/products/database/clustering/pdf/oh_rac.pdf
- Having a system test plan to help plan for and practice unplanned outages is crucial. The following paper discusses Best Practices for Optimizing Availability During Unplanned Outages Using Oracle Clusterware and Oracle Real Application Clusters: http://www.oracle.com/technology/deploy/availability/pdf/MAA_WP_10gR2_FastRecoveryOracleClusterwareandRAC.pdf
In addition, this note has an attached sample System Test Plan Outline to guide your system testing and help prepare for potential unplanned failures.
- Develop a proactive patching strategy to stay ahead of the latest known issues. Keep current with the latest Patch Set Updates (as documented in Note 850471.1) and be aware of the most current recommended patches (as documented in Note 756671.1). Plan for periodic (for example, quarterly) maintenance windows to keep current with the latest recommended patch set updates and patches.
- Understanding how to minimize downtime while patching is a key piece of this strategy. The following paper discusses patching strategies geared towards minimizing downtime in a RAC/Clusterware environment: http://www.oracle.com/technology/deploy/availability/pdf/MAA_WP_10gR2_PlannedMaintwithClusterwareandRAC.pdf
- For all Unix platforms running Oracle version 11.1.0.6 or 11.1.0.7: take note of, and implement, the solution explained in Note 858279.1.
- When patching, be sure to use the latest version of OPatch, available for download from My Oracle Support as Patch 6880880.
RAC Platform Generic Best Practices
Beyond the Highlighted Recommendations above, the RAC Assurance Team has recommendations for the various parts/components of your RAC setup. These additional recommendations are broken into categories and listed below.
Getting Started - Preinstallation and Design Considerations
- Check with the storage vendor that the number of nodes, OS version, RAC version, CRS version, network fabric, and patches are certified, as some storage/SAN vendors may require special certification for a certain number of nodes.
- Use both external and Oracle-provided redundancy for the OCR and voting disks. Note 428681.1 explains how to add an OCR mirror and how to add additional voting disks.
- Check the support matrix to ensure supportability of product, version, and platform combinations, and to identify any extra steps that must be completed for some such combinations. Note 337737.1
- Avoid SSH and XAUTH warnings before RAC 10g installation. Reference Note 285070.1
- Consider configuring the system logger to log messages to one central server.
- For the CRS, ASM, and Oracle RDBMS installations, ensure that one unique user ID with a single name is in use across the cluster. Problems can occur accessing OCR keys when multiple O/S users share the same UID; this also results in logical corruptions and permission problems that are hard to diagnose.
- Make sure machine clocks are synchronized on all nodes to the same NTP source. Implementing NTP (Network Time Protocol) on all nodes prevents evictions and helps to facilitate problem diagnosis. Use the -x option (i.e. ntpd -x, xntpd -x) if available to prevent time from moving backwards in large amounts. Slewing breaks a large time change into multiple small changes so that it does not impact Oracle Clusterware. Note 759143.1
- Eliminate any single points of failure in the architecture. Examples include (but are not limited to): cluster interconnect redundancy (NIC bonding etc.), multiple access paths to storage using two or more HBAs or initiators with multipathing software, and disk mirroring/RAID.
- Plan and document capacity requirements. Work with your server vendor to produce a detailed capacity plan and system configuration, but consider: use your normal capacity planning process to estimate the number of CPUs required to run the workload. Both SMPs and RAC clusters have synchronization costs that grow as the number of CPUs increases. SMPs normally scale well for a small number of CPUs; RAC clusters normally scale better than SMPs for a large number of CPUs. Typical synchronization cost: 5-20%.
- Use proven high availability strategies. RAC is one component in a high availability architecture; make sure all parts are covered. Review Oracle's Maximum Availability Architecture recommendations and references further down in this document.
- It is strongly advised that a production RAC instance not share a node with a DEV, TEST, QA, or TRAINING instance. These extra instances can often introduce unexpected performance changes into a production environment.
- Configure Servers to boot from SAN disk, rather than local disk for easier repair, quick provisioning and consistency.
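The single-UID recommendation above can be checked mechanically. The sketch below is a minimal, hedged example (node names and the "oracle" user are placeholders): it reads "node uid" pairs and flags any node whose software owner has a different numeric UID. In practice you would feed it output collected over ssh, e.g. `for n in node1 node2; do echo "$n $(ssh "$n" id -u oracle)"; done | check_uids`.

```shell
#!/bin/sh
# Sketch: verify the Oracle software owner has the same numeric UID on
# every cluster node. Reads "node uid" pairs from stdin.
check_uids() {
    first=""
    while read node uid; do
        [ -z "$first" ] && first="$uid"
        if [ "$uid" != "$first" ]; then
            echo "MISMATCH: $node has UID $uid (expected $first)"
            return 1
        fi
    done
    echo "OK: all nodes use UID $first"
}
```

For example, `printf 'node1 501\nnode2 501\n' | check_uids` reports that all nodes agree.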
Clusterware Considerations
- Configure 3 or more voting disks (always an odd number). A node must be able to access more than half of the voting disks to stay in the cluster; losing half or more of them causes nodes to be evicted from the cluster, or to kick themselves out of the cluster.
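The odd-number rule follows from the majority requirement: with n voting disks, a node survives the loss of at most (n-1)/2 of them, so an even count tolerates no more failures than the odd count below it. A small sketch of the arithmetic:

```shell
#!/bin/sh
# With n voting disks, a strict majority must remain accessible,
# so up to (n-1)/2 disks may be lost.
tolerated() { echo $(( ($1 - 1) / 2 )); }
for n in 1 3 4 5; do
    echo "$n voting disk(s): survives $(tolerated "$n") failure(s)"
done
```

Note that 4 disks tolerate only 1 failure, the same as 3, which is why an odd count is recommended.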
Networking Considerations
- Underscores should not be used in a host name or domain name, according to RFC 952 (DoD Internet host table specification). The same applies to net, host, gateway, and domain names. Reference: http://www.faqs.org/rfcs/rfc952.html
- Ensure the default gateway is on the same subnet as the VIP. Otherwise this can cause problems with racgvip and cause the vip and listener to keep restarting.
- Make sure network interfaces have the same name on all nodes. This is required. To check, use ifconfig (on Unix) or ipconfig (on Windows).
- Use Jumbo Frames if supported and possible in the system. Reference: Note 341788.1
- Use non-routable network addresses for private interconnect; Class A: 10.0.0.0 to 10.255.255.255, Class B: 172.16.0.0 to 172.31.255.255, Class C: 192.168.0.0 to 192.168.255.255. Reference: http://www.faqs.org/rfcs/rfc1918.html and Note 338924.1
- Make sure network interfaces are configured correctly in terms of speed, duplex, etc. Various tools exist to monitor and test network: ethtool, iperf, netperf, spray and tcp. Note 563566.1
- Configure NICs for fault tolerance (bonding/link aggregation). Note 787420.1.
- Performance: check for faulty switches, bad HBAs, or ports that drop packets. Most network-related evictions occur either when there is too much traffic on the interconnect (its capacity is exhausted, which is where link aggregation or another hardware solution helps) or when the switch or network card is not configured properly. If the UDP protocol is used for RAC IPC, this is evident from "netstat -s | grep udp" output, which will register overflows (UDP buffer size configuration) or errors due to bad ports, switches, network cards, or network card settings. Review these counters in the context of errors reported for packets sent through the interface.
- For more predictable hardware discovery, place HBA and NIC cards in the same corresponding slot on each server in the grid.
- Ensure that all network cables are terminated in a grounded socket. A switch is required for the private network. Use dedicated redundant switches for private interconnect and VLAN considerations. RAC and Clusterware deployment best practices recommend that the interconnection be deployed on a stand-alone, physically separate, dedicated switch.
- Deploying the RAC/Clusterware interconnect on a shared switch, segmented VLAN may expose the interconnect links to congestion and instability in the larger IP network topology. If deploying the interconnect on a VLAN, there should be a 1:1 mapping of VLAN to non-routable subnet, and the interconnect VLAN should not span multiple VLANs (tagged) or multiple switches. Deployment concerns in this environment include Spanning Tree loops when the larger IP network topology changes, asymmetric routing that may cause packet flooding, and lack of fine-grained monitoring of the VLAN/port.
- Consider using Infiniband on the interconnect for workloads that have high volume requirements. Infiniband can also improve performance by lowering latency, particularly with Oracle 11g, with the RDS protocol. See Note 751343.1.
- Configure the IPC address first in the listener.ora address list. For databases upgraded from earlier versions to 10gR2, netca did not place the IPC address first in the listener.ora file. In 10gR2 this is the default, but an upgrade does not change it unless you do so manually. Failure to do so can adversely affect the time it takes the VIP to fail over if the public network interface fails. Therefore, check the 10gR1 and 10gR2 listener.ora files: the IPC address should not only be in the address list, it should be FIRST. Note 403743.1
- Increase the SDU (and, in older versions, the TDU as well) to a higher value (e.g. 4 KB, 8 KB, up to 32 KB), thus reducing round trips on the network, possibly decreasing response time and the overall perceived user responsiveness of the system. Note 44694.1
- To avoid ORA-12545 errors, ensure that client HOSTS files and/or DNS are furnished with both VIP and Public hostnames.
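The interconnect health checks above can be partly automated. The sketch below (sample input is illustrative) scans `netstat -s` UDP counters for receive errors, which often indicate undersized UDP buffers or a bad switch port or NIC setting; on a live node you would pipe in the real command: `netstat -s | check_udp`.

```shell
#!/bin/sh
# Sketch: flag non-zero UDP receive error counters in `netstat -s` output.
check_udp() {
    awk '/packet receive errors/ { errs = $1 }
         END {
             if (errs + 0 > 0) { print "WARN: " errs " UDP receive errors"; exit 1 }
             print "OK: no UDP receive errors"
         }'
}
# Demo on a sample of `netstat -s` output:
printf '%s\n' 'Udp:' \
    '    1530292 packets received' \
    '    0 packet receive errors' \
    '    1520929 packets sent' | check_udp
```

A non-zero count over time, correlated with interconnect waits, points at the buffer sizing and switch/NIC checks described above.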
Storage Considerations
- Ensure correct mount options for NFS disks when RAC is used with NFS. The documented mount options are detailed in Note 359515.1 for each platform.
- Implement multiple access paths to the storage array using two or more HBAs or initiators with multipathing software over these HBAs. Where possible, use the pseudo devices (multipath I/O) as the diskstring for ASM. Examples are: EMC PowerPath, Veritas DMP, Sun Traffic Manager, Hitachi HDLM, IBM SDDPC, Linux 2.6 Device Mapper. This is useful for I/O load balancing and failover. Reference: Note 294869.1 and Note 394956.1
- Adhere to ASM best practices. Reference: Note 265633.1 ASM Technical Best Practices
- ORA-15196 (ASM block corruption) can occur if LUNs larger than 2 TB are presented to an ASM diskgroup. After the fix for this issue, ORA-15099 is raised instead if a disk larger than 2 TB is specified. This is irrespective of the presence of asmlib. Workaround: do not add a disk larger than 2 TB to a diskgroup. Reference: Note 6453944.8
- On some platforms, repeated warnings about AIO limits may be seen in the alert log: "WARNING: Oracle process running out of OS kernel I/O resources." Apply Patch 6687381, available on many platforms. This issue affects 10.2.0.3, 10.2.0.4, and 11.1.0.6 and is fixed in 11.1.0.7. Note 6687381.8
- Create two ASM disk groups, one for the database area and one for the flash recovery area, on separate physical disks. RAID storage array LUNs can be used as ASM disks to minimize the number of LUNs presented to the OS. Place database and redo log files in the database area.
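As a sketch of the two-disk-group layout just described (disk group names, device paths, and the redundancy level are assumptions to be adapted to your array, which may already provide external redundancy):

```sql
-- Illustrative only: names, paths, and redundancy are placeholders.
CREATE DISKGROUP data EXTERNAL REDUNDANCY
    DISK '/dev/mapper/asmdisk1', '/dev/mapper/asmdisk2';
CREATE DISKGROUP fra EXTERNAL REDUNDANCY
    DISK '/dev/mapper/asmdisk3', '/dev/mapper/asmdisk4';

-- Then point the database at them via spfile parameters, e.g.:
--   db_create_file_dest   = '+DATA'
--   db_recovery_file_dest = '+FRA'
```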
Installation Considerations
- Check cluster prerequisites using cluvfy (Cluster Verification Utility). Use cluvfy at all stages prior to and during installation of Oracle software. Also, rather than using the version on the installation media, it is crucial to download the latest version of cluvfy from OTN: http://www.oracle.com/technology/products/database/clustering/cvu/cvu_download_homepage.html. Note 339939.1 and Note 316817.1 contain more relevant information on this topic.
- It is recommended to patch the Clusterware home to the desired level before installing any RDBMS or ASM home. For example, install Clusterware 10.2.0.1 and patch it to 10.2.0.4 before installing the 10.2.0.1 RDBMS.
- Install ASM in a separate ORACLE_HOME from the database for maintenance and availability reasons (e.g., to independently patch and upgrade).
- If you are installing Oracle Clusterware as a user that is a member of multiple operating system groups, the installer installs files on all nodes of the cluster with group ownership set to the user's current active (primary) group. Therefore, either ensure that the first group listed in /etc/group is the current active group, or invoke the Oracle Clusterware installation with the following additional command line option to force the installer to use the proper group when setting group ownership on all files: runInstaller s_usergroup=current_active_group (Bug 4433140)
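Typical cluvfy invocations for the stages above might look as follows (shown as an illustrative transcript, not a runnable script; node names are placeholders):

```
cluvfy stage -pre crsinst -n node1,node2 -verbose    # before Clusterware install
cluvfy stage -post crsinst -n node1,node2 -verbose   # after Clusterware install
cluvfy stage -pre dbinst -n node1,node2 -verbose     # before RDBMS install
```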
Patching Considerations
This section is targeted towards customers beginning a new implementation of Oracle Real Application Clusters, or customers who are developing a proactive patching strategy for an existing implementation. For new implementations, it is strongly recommended that the latest available patchset for your platform be applied at the outset of your testing. In cases where the latest version of the RDBMS cannot be used because of lags in internal or 3rd party application certification, or due to other limitations, it is still supported to have the CRS and ASM homes running at a later patch level than the RDBMS home; therefore, it may still be possible to run either the CRS or ASM home at the latest patchset level. As a best practice (with some exceptions, see the Note in the references section below), Oracle Support recommends that the following be true:
- The CRS_HOME must be at a patch level or version that is greater than or equal to the patch level or version of the ASM home, and also greater than or equal to the patch level or version of the RDBMS home.
- The ASM_HOME must be at a patch level or version that is greater than or equal to the patch level or version of the RDBMS home, and equal to but not greater than the patch level or version of the CRS_HOME.
- Before patching the database, ASM, or Clusterware homes using OPatch, check the available space on the filesystem, and use Note 550522.1 to estimate how much space will be needed and to handle the situation if the filesystem should fill up during the patching process.
- Review known issues specific to the 10.2.0.4.0 patchset: Note 555579.1.
For more detailed notes and references on patching in a RAC environment, see the patching section below, in the "RAC Platform Generic References" section at the end of this note.
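A pre-patch checklist along the lines above might be (illustrative transcript, not a runnable script; $ORACLE_HOME stands for whichever home — CRS, ASM, or RDBMS — is being patched):

```
$ORACLE_HOME/OPatch/opatch version       # confirm the latest OPatch (Patch 6880880)
$ORACLE_HOME/OPatch/opatch lsinventory   # record the current patch inventory
df -k $ORACLE_HOME                       # check free space against Note 550522.1 estimates
```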
Upgrade Considerations
- Begin with minimum version 10.2.0.3 when upgrading 10.2.0.x to 11.x.
- Use rolling upgrades where appropriate for Oracle Clusterware (CRS) Note 338706.1. For detailed upgrade assistance, refer to the appropriate Upgrade Companion for your release: Note 466181.1 10g Upgrade Companion and Note 601807.1 Oracle 11gR1 Upgrade Companion
- For information about upgrading a database using a transient logical standby, refer to: Note 949322.1 : Oracle11g Data Guard: Database Rolling Upgrade Shell Script
Oracle VM Considerations
- Oracle Real Application Clusters in Oracle VM Environments: http://www.oracle.com/technology/products/database/clusterware/pdf/oracle_rac_in_oracle_vm_environments.pdf
Database Initialization Parameter Considerations
- Set PRE_PAGE_SGA=false. If set to true, it can significantly increase the time required to establish database connections. In cases where clients complain that connections to the database are very slow, consider setting this parameter to false: doing so avoids mapping the whole SGA at process startup and thus saves connection time.
- Set PARALLEL_MIN_SERVERS to CPU_COUNT-1. This will pre-spawn recovery slaves at startup time and will avoid having to spawn them when recovery is required which could delay recovery due to the fact that slaves are started in serial. Note that SGA memory for PX msg pool will be allocated for all PARALLEL_MAX_SERVERS if you set PARALLEL_MIN_SERVERS.
- Tune PARALLEL_MAX_SERVERS to your hardware. Start with (2 * 2 threads * CPU_COUNT) = 4 x CPU count and repeat the test for higher values with test data.
- Consider setting FAST_START_PARALLEL_ROLLBACK. This parameter determines how many processes are used for transaction recovery, which is done after redo application. Optimizing transaction recovery is important to ensure an efficient workload after an unplanned failure. As long as the system is not CPU bound, setting this to a value of HIGH is a best practice. This causes Oracle to use four times the CPU count (4 X cpu_count) parallel processes for transaction recovery. The default for this parameter is LOW, or two times the CPU count (2 X cpu_count).
- Set FAST_START_MTTR_TARGET to a non-zero value in seconds. Crash recovery will complete within this desired time frame.
- In 10g and 11g databases, the init parameter ACTIVE_INSTANCE_COUNT should no longer be set, because the RACG layer does not take this parameter into account. As an alternative, create a service with one preferred instance.
- Increase PARALLEL_EXECUTION_MESSAGE_SIZE from the default (normally 2048) to 8192. This can be set higher for data warehousing systems where a lot of data is transferred through PQ.
- Set OPTIMIZER_DYNAMIC_SAMPLING = 1, or simply analyze your objects, because 10g dynamic sampling can generate extra CR buffers during execution of SQL statements.
- Tune Data Guard to avoid cluster related waits. Improperly tuned Data Guard settings can cause high LOG FILE SYNC WAIT and GLOBAL CACHE LOG FLUSH TIME. Reference: http://www.oracle.com/technology/deploy/availability/pdf/MAA_WP_10gR2_DataGuardNetworkBestPractices.pdf, http://www.oracle.com/technology/deploy/availability/pdf/MAA_WP_10gR2_RecoveryBestPractices.pdf, http://www.oracle.com/technology/deploy/availability/pdf/MAA_WP_10gR2_Roadmap.pdf
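Several of the settings above can be applied as shown in the sketch below. The values are examples only, to be validated in your own test environment; note that the static parameters take effect only after an instance restart.

```sql
-- Examples only; validate every value in a test environment first.
ALTER SYSTEM SET pre_page_sga = FALSE SCOPE = SPFILE;                   -- static
ALTER SYSTEM SET fast_start_parallel_rollback = HIGH SCOPE = BOTH SID = '*';
ALTER SYSTEM SET fast_start_mttr_target = 300 SCOPE = BOTH SID = '*';   -- 300s is an example
ALTER SYSTEM SET parallel_execution_message_size = 8192 SCOPE = SPFILE SID = '*';  -- static
```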
Performance Tuning Considerations
In any database system, RAC or single instance, the most significant performance gains are usually obtained from traditional application tuning techniques. The benefits of those techniques are even more remarkable in a RAC database.
- Many sites run with too few redo logs or with logs that are sized too small. With too few redo logs configured, there is the potential that the archiver process(es) cannot keep up, which could cause the database to stall. Small redo logs cause frequent log switches, which can put a high load on the buffer cache and I/O system. As a general practice, each thread should have at least three redo log groups with two members in each group.
Oracle Database 10g introduced the Redo Logfile Size Advisor which determines the optimal, smallest online redo log file size based on the current FAST_START_MTTR_TARGET setting and corresponding statistics. Thus, the Redo Logfile Size Advisor is enabled only if FAST_START_MTTR_TARGET is set.
A new column, OPTIMAL_LOGFILE_SIZE, was added to V$INSTANCE_RECOVERY. It shows the redo log file size (in megabytes) that is considered optimal based on the current FAST_START_MTTR_TARGET setting. It is recommended that you set all online redo log files to at least this value.
- Avoid and eliminate long full table scans in OLTP environments.
- Use Automatic Segment Space Management (ASSM); it is the default in 10gR2 and higher. All tablespaces except SYSTEM, TEMP, and UNDO should use ASSM.
- Increasing sequence caches in insert-intensive applications improves instance affinity to index keys deriving their values from sequences. Increase the cache for application sequences and some system sequences for better performance; use a large cache value, perhaps 10,000 or more. Additionally, use of the NOORDER attribute (the default) is most effective, but note that it does not guarantee sequence numbers are generated in order of request.
- The default cache setting for the SYS.AUDSES$ sequence is 20, which is too low for a RAC system where logins can occur concurrently from multiple nodes. Refer to Note 395314.1.
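The redo log and sequence advice above can be put into practice as follows (the sequence owner and name are placeholders; see Note 395314.1 before altering system sequences such as SYS.AUDSES$):

```sql
-- Redo Logfile Size Advisor: only populated when FAST_START_MTTR_TARGET
-- is set. Size all online redo logs to at least this value (in MB):
SELECT optimal_logfile_size FROM v$instance_recovery;

-- Raise the cache on an insert-intensive application sequence:
ALTER SEQUENCE app.order_seq CACHE 10000 NOORDER;
```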
General Configuration Considerations
- In 10gR2 and above, the LMS process is intended to run in the real time scheduling class. In some instances this has been prevented by incorrect ownership or permissions on the oradism executable, which is stored in the $ORACLE_HOME/bin directory. See Note 602419.1 for more details.
- Avoid setting the ORA_CRS_HOME environment variable. Setting this variable can cause problems for various Oracle components, and it is never necessary for CRS programs because they all have wrapper scripts.
- Use Enterprise Manager or Grid Control to create database services - all features are available in one tool. For 10.1 and 10.2, DBCA can also be used to create these services and define their preferred and available instances as part of database creation; however, in 11.1.0.6 this capability is only available in Enterprise Manager and has been removed from DBCA.
- Configure Oracle Net Services load balancing properly to distribute connections. Load balancing should be used in combination with 10g Workload Services to provide the highest availability. The CLB_GOAL attribute of 10g workload services should be configured appropriately depending upon application requirements. Different workloads might require different load balancing goals. Use separate services for each workload with different CLB_GOAL.
- Ensure the NUMA (Non Uniform Memory Architecture) feature is turned OFF unless explicitly required and tested, as there have been issues reported with NUMA enabled. Refer to Note 759565.1 for more details.
- Read and follow the Best Practices Guide for XA and RAC to avoid problems with XA transactions being split across RAC Instances. Reference: http://www.oracle.com/technology/products/database/clustering/pdf/bestpracticesforxaandrac.pdf
- Increase the retention period for AWR data from 7 days to at least one business cycle. Use the awrinfo.sql script to budget for the amount of information to be stored in the AWR and to size it accordingly.
- ONS can spin, consuming high CPU and/or memory. This is fixed in 10.2.0.4 and 11.1.0.6. Refer to Note 4417761.8 and Note 731370.1 for more details and a workaround.
- Use SRVCTL to register resources as the Oracle user (not as the root user). Registering resources (database, instances, ASM, listener, and services) as root can lead to inconsistent behavior. During the Clusterware install, nodeapps is created by the root user; only the VIP resource should be owned by root. Any other resources owned by root will need to be removed (as root) and then re-created as the oracle user. Check the OCRDUMP output for resource keys owned by root.
- For versions 10gR2 and 11gR1, it is a best practice on all platforms to set the CSS diagwait parameter to 13 in order to provide time for dumping diagnostics in case of node evictions. Note 559365.1 has more details on diagwait. In 11gR2 it is possible but should not be necessary to set diagwait.
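The diagwait change just described is typically applied as follows (illustrative transcript, not a runnable script; Clusterware must be down on ALL nodes first — see Note 559365.1 for the full procedure):

```
crsctl set css diagwait 13 -force
crsctl get css diagwait      # verify the new value before restarting
```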
E-Business Suite (with RAC) Considerations
- Patch against known issues Bug 6142040 (ICM does not update target node after failover) and Bug 6161806 (APPSRAP: PCP node failure is not working).
- Change RAC APPS default setting to avoid slow Purchase Order approval. Note 339508.1
- It is recommended to set the parameter max_commit_propagation_delay=0 in the init.ora or spfile for the E-Business Suite on RAC. Note 259454.1
- You can use Advanced Planning and Scheduling (APS) on a separate RAC cluster. Merging APS into the OLTP database and isolating its load to a separate RAC instance is supported. Refer to Knowledge Documents Note 279156.1 and Note 286729.1 for more details.
- You can run Email Center in a RAC environment. Reference Knowledge Document Note 272266.1 for RAC related specific instructions.
- You can run Oracle Financial Services Applications (OFSA) in a RAC environment. Refer to Knowledge Document Note 280294.1 for RAC related best practices.
- Activity Based Management (ABM) is supported in a RAC environment. Reference Knowledge Document Note 303542.1 for RAC related best practices.
- When using Oracle Application Tablespace Migration Utility (OATM) in a RAC environment, be sure to follow the instructions for RAC environments in Note 404954.1.
Peoplesoft (with RAC) Considerations
- Each instance and service must have its own row in the PSDBOWNER table. The PSDBOWNER table must contain as many rows as the number of database instances in the cluster plus the number of services in the database.
- If the batch servers run on database nodes, set UseLocalOracleDB=1. By default, the Process Scheduler connects to the database via SQL*Net using TCP/IP even when it is running locally. Setting UseLocalOracleDB=1 in the Process Scheduler domain configuration file (prcs.conf) makes it use a bequeath connection rather than TCP/IP, which improves performance. If you set UseLocalOracleDB=1, you must also set ORACLE_SID in the PeopleSoft user's profile; otherwise the Process Scheduler will not boot.
- For the REN (Remote Event Notification) server to work properly, the DB_NAME parameter must match between the Application Server domain and the Process Scheduler domain configuration used to run the report. With RAC, always use the service name as the database name for the application and batch servers: DB_NAME then matches so the REN server works, and the load is balanced across all instances.
- See Note 747587.1 regarding PeopleSoft Enterprise PeopleTools Certifications
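A sketch of the Process Scheduler setting described above (the instance name is a placeholder; adapt the file location and SID to your installation):

```
; Illustrative prcs.conf fragment for a batch server co-located with a
; database node:
UseLocalOracleDB=1
; The PeopleSoft user's profile must then also export ORACLE_SID, e.g.:
;   export ORACLE_SID=RACDB1
```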
Tools/Utilities for Diagnosing and Working with Oracle Support
- Install and run OSWatcher (OSW) proactively for OS resource utilization diagnosability. OSW is a collection of UNIX shell scripts that collect and archive operating system and network metrics to aid in diagnosing performance issues. It is designed to run continuously and to write the metrics to ASCII files saved in an archive directory; the amount of archived data saved and the frequency of collection are based on user parameters set when starting OSW. It is highly recommended that OSW be installed and run continuously on ALL cluster nodes, at all times. Note 301137.1. Be sure to use a separate directory per node for storing OSW output: in a RAC environment, each node must write its output files to a separate archive directory. Combining the output files under one archive (on shared storage) is not supported and causes the OSWg tool to crash. Shared storage is fine, but each node needs its own archive directory.
- Use the ASM command line utility (ASMCMD) to manage Automatic Storage Management (ASM). Oracle database 10gR2 provides two new options to access and manage ASM files and related information via command line interface - asmcmd and ASM ftp. Note 332180.1 discusses asmcmd and provides sample Linux shell script to demonstrate the asmcmd in action.
- If needed, use the cluster deinstall tool (clusterdeconfig) to remove a CRS installation. The clusterdeconfig tool removes and deconfigures all of the software and shared files associated with an Oracle Clusterware or Oracle RAC database installation, on all of the nodes in a cluster. Reference: http://www.oracle.com/technology/products/database/clustering/index.html
- Use diagcollection.pl for CRS diagnostic collections. Located in $ORA_CRS_HOME/bin as part of a default installation. Note 330358.1
- On Windows and Linux Platforms, the Cluster Health Monitor can be used to track OS resource consumption and collect and analyze data cluster-wide. For more information, and to download the tool, refer to the following link on OTN: http://www.oracle.com/technology/products/database/clustering/ipd_download_homepage.html
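The per-node archive recommendation for OSWatcher above can be sketched as follows (the directory layout is an assumption for illustration; startOSWbb.sh and its arguments follow the OSWatcher kit described in Note 301137.1):

```shell
# Each RAC node writes to its own archive directory, keyed by hostname,
# even when the parent path lives on shared storage.
NODE=$(hostname -s)
ARCHIVE=${TMPDIR:-/tmp}/oswatcher/archive/$NODE   # example path only
mkdir -p "$ARCHIVE"
# Start OSWatcher: 30-second snapshots, keep 48 hours, gzip old files.
# (Commented out here; run it from the OSWatcher install directory.)
# nohup ./startOSWbb.sh 30 48 gzip "$ARCHIVE" > /dev/null 2>&1 &
echo "OSWatcher archive for node $NODE: $ARCHIVE"
```

Running the same snippet on every node yields one archive directory per hostname, which is exactly the layout the OSWg graphing tool expects.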
11gR2 Specific Considerations
- Understanding SCAN VIP: http://www.oracle.com/technology/products/database/clustering/pdf/scan.pdf
- Review the attached presentation: Upgrade to Oracle Real Application Clusters 11g Release 2 - Key Success Factors
- Upgrading from Oracle Database 10g to 11g: What to expect from the Optimizer: http://www.oracle.com/technology/products/bi/db/11g/pdf/twp_upgrading_10g_to_11g_what_to_expect_from_optimizer.pdf
- Review the note: 11gR2 Clusterware and Grid Home - What You Need to Know - Note:1053147.1
RAC Platform Generic References
CRS / RAC Related References
- Note 220970.1 RAC: Frequently Asked Questions
- Note 293819.1 Placement of voting and OCR disk files in 10gRAC
- Note 239998.1 10g RAC How To Clean Up After a Failed CRS Install
- Note 399482.1 How to recreate OCR/Voting disk accidentally deleted
- Note 240001.1 10g RAC: Troubleshooting CRS Root.sh Problems
- Note 270512.1 Adding a Node to a 10g RAC Cluster
- Note 184875.1 How To Check The Certification Matrix for Real Application Clusters
- Note 289690.1 Data Gathering for Troubleshooting CRS Issues
- Note 556679.1 Data Gathering for Troubleshooting RAC Issues
- Note 135714.1 Script to Collect RAC Diagnostic Information (racdiag.sql)
- Note 272332.1 CRS 10g Diagnostic Collection Guide
- Note 428681.1 How to ADD/REMOVE/REPLACE/MOVE Oracle Cluster Registry (OCR) and Voting Disk
- Note 283684.1 How to Change Interconnect/Public Interface IP Subnet in a 10g Cluster
- Note 787420.1 Cluster Interconnect in Oracle 10g and 11g
- Note 276434.1 Modifying the VIP or VIP Hostname of a 10g Oracle Clusterware Node
- Note 403743.1 VIP Failover Take Long Time After Network Cable Pulled
- Note 453309.1 Fix for Bug 5454831 Can cause ORA-600 errors in RAC/CRS Environments
- Note 259301.1 CRS and 10g Real Application Clusters
- Note 294430.1 CSS Timeout Computation in RAC 10g (10g Release 1 and 10g Release 2)
- Note 265769.1 Troubleshooting CRS Reboots
- Note 401783.1 Changes in Oracle Clusterware after applying 10.2.0.3 Patchset
- Note 567730.1 Changes in Oracle Clusterware on Linux with the 10.2.0.4 Patchset
- Note 357808.1 Diagnosability for CRS / EVM / RACG
- Note 316817.1 CLUSTER VERIFICATION UTILITY FAQ
- Note 465840.1 Configuring Temporary Tablespaces for RAC Databases for Optimal Performance
- Note 181489.1 Tuning Inter-Instance Performance in RAC and OPS
- Note 559365.1 Using Diagwait as a diagnostic to get more information for diagnosing Oracle Clusterware Node evictions
- Note 754305.1 Announcement on using Raw devices with release 11.2
- Note 341788.1 Recommendation for the Real Application Cluster Interconnect and Jumbo Frames
- Note 279793.1 How to Restore a Lost Voting Disk in 10g
- Note 219361.1 Troubleshooting ORA-29740 in a RAC Environment
- Oracle Database Oracle Clusterware and Oracle Real Application Clusters Administration and Deployment Guide 10g Release 2 (10.2): http://download.oracle.com/docs/cd/B19306_01/rac.102/b14197/toc.htm
- Oracle Homes in an Oracle Real Application Clusters Environment An Oracle White Paper - January 2008: http://www.oracle.com/technology/products/database/clustering/pdf/oh_rac.pdf
- Real Application Clusters: http://www.oracle.com/technology/products/database/clustering/index.html
- Client Failover Best Practices for Highly Available Oracle Databases: Oracle Database 10g Release 2: http://www.oracle.com/technology/deploy/availability/pdf/MAA_WP_10gR2_ClientFailoverBestPractices.pdf
- Using Standard NFS To Support A Third Voting Disk On A Stretch Cluster Configuration: http://www.oracle.com/technology/products/database/clustering/pdf/thirdvoteonnfs.pdf
- Best Practices for Using XA with RAC: http://www.oracle.com/technology/products/database/clustering/pdf/bestpracticesforxaandrac.pdf
- Data Warehousing on Oracle RAC Best Practices: http://www.oracle.com/technology/products/database/clustering/pdf/bp_rac_dw.pdf
- Note 339939.1 Running Cluster Verification Utility to Diagnose Install
- Note 428681.1 OCR / Vote disk Maintenance Operations: (ADD/REMOVE/REPLACE/MOVE), including moving from RAW Devices to Block Devices
- Note 363254.1 Applying one-off Oracle Clusterware patches in a mixed version home environment
- Note 332257.1 Using Oracle Clusterware with Vendor Clusterware FAQ
- Note 759895.1 The ONS Daemon Explained In RAC/CRS environment
RAC / RDBMS Related References
- Note 359395.1 Remote Diagnostic Agent (RDA) 4 - RAC Cluster Guide
- Note 300548.1 How To Configure SSH for a RAC Installation
- Note 359515.1 Mount Options for Oracle files when used with NAS devices
- Note 77483.1 External Support FTP site: Information Sheet
- Note 316900.1 ALERT: Oracle 10g Release 2 (10.2) Support Status and Alerts
- Note 454507.1 Oracle 11g Release 1 (11.1) Support Status and Alerts
- Note 260986.1 Setting Listener Passwords With an Oracle 10g Listener
- Note 390483.1 DRM - Dynamic Resource management
- Note 188134.1 Tracing the Database Configuration Assistant (DBCA)
- Note 160178.1 How to set EVENTS in the SPFILE
- Note 331168.1 Oracle Clusterware consolidated logging in 10gR2/11
- Note 1051056.6 How to Set Multiple Events in INIT.ORA
- Note 460982.1 How To Configure Server Side Transparent Application Failover
- Note 466181.1 10g Upgrade Companion
- Note 601807.1 Oracle 11g Upgrade Companion
- Note 135063.1 How To Change the Listener Log Filename Without Stopping the Listener
- Note 300903.1 Load balancing with RAC
- Note 438452.1 Performance Tools Quick Reference Guide
- Note 394937.1 Statistics Package (STATSPACK) Guide
- Note 280939.1 Checklist for Performance Problems with Parallel Execution
- Note 181489.1 Tuning Inter-Instance Performance in RAC and OPS
- Note 359536.1 Systemstate dump when connection to the instance is not possible
- Note 736752.1 Introducing Oracle instantaneous Problem detection - OS tool (IPD/OS)
- Note 563566.1 gc lost blocks diagnostics
- Note 92602.1 How to Password Protect the Listener
- Note 453293.1 10g & 11g :Configuration of TAF(Transparent Application Failover) and Load Balancing
- Note 404644.1 Configuration of Transparent Application Failover(TAF) works with server side service
- Note 602419.1 LMS not running in RT (real time) mode in 10.2.0.3 RAC database
- Note 44694.1 SQL*Net Packet Sizes (SDU & TDU Parameters)
- Note 858279.1 ASM and Database Instance hang when exceeding around 1800 sessions
- Note 454506.1 11.1.0.6 Base Release - Availability and Known Issues
VIP References
- Note 298895.1 Modifying the default gateway address used by the Oracle 10g VIP
- Note 338924.1 CLUVFY Fails With Error: Could not find a suitable set of interfaces for VIPs
ASM References
- Note 351117.1 Information to gather when diagnosing ASM space issues
- Note 6453944.8 Bug 6453944 - ORA-15196 with ASM disks larger than 2TB
- Automatic Storage Management: http://www.oracle.com/technology/products/database/asm/index.html
- Note 265633.1 ASM Technical Best Practices
- Note 337737.1 Oracle Clusterware - ASM - Database Version Compatibility
11.2 References
- Note 1050693.1 Troubleshooting 11.2 Clusterware Node Evictions (Reboots)
- Note 1053147.1 11gR2 Clusterware and Grid Home - What You Need to Know
Infiniband References
- Note 751343.1 RAC Support for RDS Over Infiniband
- Note 761804.1 Oracle Reliable Datagram Sockets (RDS) and Infiniband (IB) Support (For Linux x86 and x86-64 Platforms)
- Vendor Paper from Voltaire on implementing RAC with Infiniband:
Voltaire_InfiniBand_for_Oracle_RAC_Starter_Kit_1.0
MAA / Standby References
Oracle's Maximum Availability Architecture (MAA) provides superior data protection and availability by minimizing or eliminating planned and unplanned downtime at all technology stack layers including hardware or software components. Data protection and high availability are achieved regardless of the scope of a failure event - whether from hardware failures that cause data corruptions or from catastrophic acts of nature that impact a broad geographic area.
MAA also eliminates guesswork and uncertainty when implementing a high availability architecture that utilizes the full complement of Oracle HA technologies. RAC is an integral component of the MAA architecture, but it is just one piece of the MAA strategy. The following references provide more background on the Oracle MAA strategy:
- 11g/10g MAA and HA information and articles: http://www.oracle.com/technology/deploy/availability/htdocs/maa.htm
- Rapid Oracle RAC Standby Deployment: Oracle Database 11g Release 2: http://www.oracle.com/technology/deploy/availability/pdf/maa_wp_11g_rac_standby.pdf
- Platform Migration Using Transportable Database Oracle Database 11g and 10g Release 2: http://www.oracle.com/technology/deploy/availability/pdf/MAA_WP_10gR2_PlatformMigrationTDB.pdf
- Coherence Planning: From Proof of Concept to Production: http://www.oracle.com/technology/products/coherence/pdf/Oracle_Coherence_Planning_WP.pdf
- The Right Choice for Disaster Recovery: Data Guard, Stretch Clusters or Remote Mirroring: http://www.oracle.com/technology/deploy/availability/pdf/DRChoices_TWP.pdf
- Oracle Real Application Clusters On Extended Distance Clusters: http://www.oracle.com/technology/products/database/clustering/pdf/ExtendedRAC10gR2.pdf
- Data Guard 11g Installation and Configuration On Oracle RAC Systems: http://www.oracle.com/technology/deploy/availability/pdf/dataguard11g_rac_maa.pdf
- Oracle Active Data Guard Oracle Data Guard 11g Release 1: http://www.oracle.com/technology/deploy/availability/pdf/maa_wp_11gr1_activedataguard.pdf
- Configuring Oracle BI EE Server with Oracle Active Data Guard: http://www.oracle.com/technology/deploy/availability/pdf/maa_wp_11g_biee_activedataguard.pdf
- Using Recovery Manager with Oracle Data Guard in Oracle Database 10g: http://www.oracle.com/technology/deploy/availability/pdf/RMAN_DataGuard_10g_wp.pdf
- Extended Datatype Support: SQL Apply and Streams: http://www.oracle.com/technology/deploy/availability/pdf/maa_edtsoverview.pdf
- Oracle Data Guard and Remote Mirroring Solutions: http://www.oracle.com/technology/deploy/availability/htdocs/DataGuardRemoteMirroring.html
- Fast-Start Failover Best Practices: Oracle Data Guard 10g Release 2: http://www.oracle.com/technology/deploy/availability/pdf/MAA_WP_10gR2_FastStartFailoverBestPractices.pdf
- SQL Apply Best Practices: Oracle Data Guard 10g Release 2: http://www.oracle.com/technology/deploy/availability/pdf/MAA_WP_10gR2_SQLApplyBestPractices.pdf
- MAA/Data Guard 10g Setup Guide: Creating a RAC Physical Standby for a RAC Primary: http://www.oracle.com/technology/deploy/availability/pdf/MAA_WP_10g_RACPrimaryRACPhysicalStandby.pdf
- Database Upgrade Using Transportable Tablespaces: Oracle Database 11g Release 1 http://www.oracle.com/technology/deploy/availability/pdf/maa_wp_11g_upgradetts.pdf
- MAA/Data Guard 10g Setup Guide: Creating a RAC Logical Standby for a RAC Primary (MAA_WP_10gR2_RACPrimaryRACLogicalStandby.pdf): http://download.oracle.com/docs/cd/B19306_01/server.102/b25159/migrate.htm
- Data Guard Redo Apply and Media Recovery Best Practices: http://www.oracle.com/technology/deploy/availability/pdf/MAA_WP_10gR2_RecoveryBestPractices.pdf
- Data Guard Redo Transport & Network Best Practices Oracle Database 10g Release 2: http://www.oracle.com/technology/deploy/availability/pdf/MAA_WP_10gR2_DataGuardNetworkBestPractices.pdf
- Oracle - Golden Gate Statement of Direction: http://www.oracle.com/technology/products/goldengate/htdocs/statement-of-direction-gg.pdf
- Note 239100.1 TRANSPORT: Data Guard Protection Mode
- Note 275977.1 Data Guard Broker High Availability
- Note 312434.1 Oracle10g Data Guard SQL Apply Troubleshooting
- Note 387450.1 MAA - SQL Apply Best Practices 10gR2
- Note 273015.1 Migrating to RAC using Data Guard
- Note 413484.1 Data Guard Support for Heterogeneous Primary and Standby Systems in Same Data Guard Configuration
- Note 414043.1 Role Transitions for Data Guard Configurations Using Mixed Oracle Binaries
- Note 751600.1 10.2 Data Guard Physical Standby Switchover
- Note 459411.1 Steps to recreate a Physical Standby Controlfile
- Note 858975.1 How To Create Standby Control File Placed In A Raw Device
Patching References
- Note 854428.1 Intro to Patch Set Updates (PSU)
- Note 850471.1 Oracle Announces First Patch Set Update For Oracle Database Release 10.2
- Note 756671.1 Oracle Recommended Patches -- Oracle Database
- Note 567631.1 How to Check if a Patch requires Downtime?
- Note 761111.1 Online Patches
- Note 438314.1 Critical Patch Update - Introduction to Database n-Apply CPUs
- Note 405820.1 10.2.0.X CRS Bundle Patch Information
- Note 810663.1 11.1.0.X CRS Bundle Patch Information
- Note 742060.1 Release Schedule of Current Database Patch Sets
- Note 363254.1 Applying one-off Oracle Clusterware patches in a mixed version home environment
- Note 550522.1 How To Avoid Disk Full Issues Because OPatch Backups Take Big Amount Of Disk Space.
- Note 555579.1 10.2.0.4 Patch Set - Availability and Known Issues
Upgrade References
- Database Rolling Upgrade Using Data Guard SQL Apply Oracle Database 11g and 10gR2: http://www.oracle.com/technology/deploy/availability/pdf/maa_wp_10gr2_rollingupgradebestpractices.pdf
- Database Rolling Upgrade Using Transient Logical Standby: Oracle Data Guard 11g: http://www.oracle.com/technology/deploy/availability/pdf/maa_wp_11g_transientlogicalrollingupgrade.pdf
E-Business References
- 11g E-business white papers: http://www.oracle.com/apps_benchmark/html/white-papers-e-business.html
- Note 455398.1 Using Oracle 11g Release 1 Real Application Clusters and Automatic Storage Management with Oracle E-Business Suite Release 11i (11.1.0.7)
- Note 388577.1 Using Oracle 10g Release 2 Real Application Clusters and Automatic Storage Management with Oracle E-Business Suite Release 12
- Note 559518.1 Cloning Oracle E-Business Suite Release 12 RAC-Enabled Systems with Rapid Clone
- Note 165195.1 Using AutoConfig to Manage System Configurations with Oracle Applications 11i
- Note 294652.1 E-Business Suite 11i on RAC : Configuring Database Load balancing & Failover
- Note 362135.1 Configuring Oracle Applications Release 11i with 10g R2 RAC and ASM
- Note 362203.1 Oracle Applications Release 11i with Oracle 10g Release 2 (10.2.0)
- Note 241370.1 Concurrent Manager Setup and Configuration Requirements in an 11i RAC Environment
- Note 240818.1 Concurrent Processing: Transaction Manager Setup and Configuration Requirement in an 11i RAC Environment
Unix References
- UNIX command reference covering most platforms, from unixguide: http://www.unixguide.net/unixguide.pdf
Weblogic/RAC References
- Using WebLogic Server with Oracle RAC: http://e-docs.bea.com/wls/docs92/jdbc_admin/oracle_rac.html
References Related to Working with Oracle Support
My Oracle Support (formerly MetaLink) Knowledge Documents
- Note 736737.1 My Oracle Support - The Next Generation Support Platform
- Note 730283.1 Get the most out of My Oracle Support
- Note 747242.1 My Oracle Support Configuration Management FAQ
- Note 209768.1 Database, FMW, Em Grid Control, and OCS Software Error Correction Support Policy
- Note 868955.1 My Oracle Support Health Checks Catalog
Process Oriented and Self Service Notes
- Note 374370.1 New Customers Start Here
- Note 166650.1 Working Effectively With Global Customer Support
- Note 199389.1 Escalating Service Requests with Oracle Support Services
- Note 77483.1 External Support FTP site: Information Sheet
Service Request Diagnostics
- Note 301137.1 OS Watcher User Guide
- Note 459694.1 Procwatcher: Script to Monitor and Examine Oracle and CRS
Modification History
[11-Aug-2009] created this Modification History section
[21-Aug-2009] added ORA-12545 suggestion
[16-Sep-2009] changed IPD/OS to new name: Cluster Health Monitor
[22-Sep-2009] added opatch patch number
[29-Sep-2009] clarified support of OATM in RAC environments
[09-Oct-2009] added odd # of voting disks recommendation and reference to Health Check catalog note
[23-Oct-2009] added reference to space considerations while patching and 11.1 CRS patch bundle reference
[10-Nov-2009] uploaded new version of RAC System Load Testing white paper
[12-Nov-2009] added 11gR2 specific section
[24-Nov-2009] added Infiniband References
[20-Nov-2009] added link to 11gR2 upgrade presentation and reference to 555579.1 and 454506.1
[09-Dec-2009] added 'REN' success factor
[21-Dec-2009] added reference to Rapid Oracle RAC Standby Deployment white paper, Golden Gate reference, created Oracle VM section, added optimizer reference to the 11gR2 section, added reference to PeopleSoft Enterprise PeopleTools Certifications
[7-Jan-2010] added some MAA/Standby reference links
[19-Jan-2010] added reference to Note 1050693.1
[27-Jan-2010] added reference to Note 1053147.1 11gR2 Clusterware and Grid Home - What You Need to Know
[28-Jan-2010] modified diagwait best practice to include information on 11gR2
[1-Feb-2010] added reference to Note 949322.1 Oracle11g Data Guard: Database Rolling Upgrade Shell Script
[3-Feb-2010] added reference to Database Upgrade Using Transportable Tablespaces
Attachments