Saturday, February 6, 2010

RAC Assurance Support Team: RAC Starter Kit and Best Practices (Generic)

RAC Assurance Support Team: RAC Starter Kit and Best Practices (Generic) [ID 810394.1]  

  Modified 03-FEB-2010     Type BULLETIN     Status PUBLISHED  

In this Document
  Purpose
  Scope and Application
  RAC Assurance Support Team: RAC Starter Kit and Best Practices (Generic)
     RAC Platform Specific Starter Kits and Best Practices
     
     RAC Platform Generic Load Testing  and System Test Plan Outline
     
     RAC Platform Generic Highlighted Recommendations
     
     RAC Platform Generic Best Practices
     Getting Started - Preinstallation and Design Considerations
     Clusterware Considerations
     Networking Considerations
     Storage Considerations
     Installation Considerations
     Patching Considerations
     Upgrade Considerations
     Oracle VM Considerations
     Database Initialization Parameter Considerations
     Performance Tuning Considerations
     General Configuration Considerations
     E-Business Suite (with RAC) Considerations
     Peoplesoft (with RAC) Considerations
     Tools/Utilities for Diagnosing and Working with Oracle Support
     11gR2 Specific Considerations
     RAC Platform Generic References
     CRS / RAC Related References
     RAC / RDBMS Related References
     VIP References
     ASM References
     11.2 References
     Infiniband References
     MAA / Standby References
     Patching References
     Upgrade References
     E-Business References
     Unix References
     Weblogic/RAC References
     References Related to Working with Oracle Support


Applies to:

Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 11.2.0.1.0 - Release: 10.2 to 11.2
Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 11.1.0.7
Information in this document applies to any platform.

Purpose

The goal of the Oracle Real Application Clusters (RAC) Starter Kit is to provide you with the latest information on generic and platform specific best practices for implementing an Oracle RAC cluster. This document is compiled and provided based on Oracle's experience with its global RAC customer base.

This Starter Kit is not meant to replace or supplant the Oracle Documentation set, but rather, it is meant as a supplement to the same. It is imperative that the Oracle Documentation be read, understood, and referenced to provide answers to any questions that may not be clearly addressed by this Starter Kit. 

All recommendations should be carefully reviewed by your own operations group and should only be implemented if the potential gain as measured against the associated risk warrants implementation. Risk assessments can only be made with a detailed knowledge of the system, application, and business environment.

As every customer environment is unique, the success of any Oracle Database implementation, including implementations of Oracle RAC, is predicated on a successful test environment. It is thus imperative that any recommendations from this Starter Kit are thoroughly tested and validated using a testing environment that is a replica of the target production environment before being implemented in the production environment to ensure that there is no negative impact associated with the recommendations that are made.

Scope and Application

This article is intended for use by all new (and existing) Oracle RAC implementers.

RAC Assurance Support Team: RAC Starter Kit and Best Practices (Generic)

RAC Platform Specific Starter Kits and Best Practices

While this note focuses on Generic RAC Best Practices, the following notes contain detailed platform specific best practices. Please refer to the below notes for more specifics, including example step-by-step install cookbooks, and sample system test plans.

Note 811306.1 RAC Assurance Support Team:   RAC Starter Kit and Best Practices (Linux)
Note 811280.1 RAC Assurance Support Team:   RAC Starter Kit and Best Practices (Solaris)
Note 811271.1 RAC Assurance Support Team:   RAC Starter Kit and Best Practices (Windows)
Note 811293.1 RAC Assurance Support Team:   RAC Starter Kit and Best Practices (AIX)
Note 811303.1 RAC Assurance Support Team:   RAC Starter Kit and Best Practices (HP-UX)


RAC Platform Generic Load Testing  and System Test Plan Outline

A critical component of any successful implementation, particularly in the High Availability arena, is testing.  For a RAC environment, testing should include both load generation, to monitor and measure how the system works under heavy load, and a system test plan, to understand how the system reacts to certain types of failures.   To assist with this type of testing, this document contains links to documents to get you started in both of these areas.

Click here for a White Paper on available RAC System Load Testing Tools
Click here for a platform generic RAC System Test Plan Outline

Use these documents to validate your system setup and configuration, and also as a means to practice responses and establish procedures in case of certain types of failures.


RAC Platform Generic Highlighted Recommendations

Highlighted Recommendations are recommendations that are thought to have the greatest impact, or answer most commonly addressed questions or issues. In this case, Generic Highlighted Recommendations talk about commonly asked or encountered issues that are generic to RAC implementations across all platforms.


RAC Platform Generic Best Practices

Beyond the Highlighted Recommendations above, the RAC Assurance Team has recommendations for various different parts/components of your RAC setup. These additional recommendations are broken into categories and listed below.

Getting Started - Preinstallation and Design Considerations

  • Check with the disk vendor that the number of nodes, OS version, RAC version, CRS version, network fabric, and patches are certified, as some storage/SAN vendors may require special certification for a certain number of nodes.
  • Use both external and Oracle provided redundancy for the OCR and Voting disks.  Note 428681.1 explains how to add OCR mirror and how to add additional voting disks.
  • Check the support matrix to confirm that your combination of product, version, and platform is supported, and to identify any extra steps that a particular combination requires.  Note 337737.1
  • Avoid SSH and XAUTH warnings before RAC 10g installation. Reference Note 285070.1
  • Consider configuring the system logger to log messages to one central server.
  • For the CRS, ASM, and Oracle software owners, ensure that each User ID (UID) is unique and maps to a single user name across the cluster. Problems can occur accessing OCR keys when multiple O/S users share the same UID. This also results in logical corruptions and permission problems that are hard to diagnose.
  • Make sure machine clocks are synchronized on all nodes to the same NTP source.
    Implementing NTP (Network Time Protocol) on all nodes prevents evictions and helps to facilitate problem diagnosis. Use the -x option (i.e. ntpd -x, xntp -x) if available to prevent time from moving backwards in large amounts. This slewing breaks time changes into multiple small changes, so that they will not impact Oracle Clusterware. Note 759143.1
  • Eliminate any single points of failure in the architecture. Examples include (but are not limited to):  cluster interconnect redundancy (NIC bonding etc.), multiple access paths to storage using 2 or more HBAs or initiators and multipathing software, and disk mirroring/RAID.
  • Plan and document capacity requirements.  Work with the server vendor to produce a detailed capacity plan and system configuration, and consider the following:  Use your normal capacity planning process to estimate the number of CPUs required to run the workload. Both SMP systems and RAC clusters incur synchronization costs as the number of CPUs increases. SMPs normally scale well for a small number of CPUs; RAC clusters normally scale better than SMPs for a large number of CPUs. Typical synchronization cost: 5-20%.
  • Use proven high availability strategies.  RAC is one component in a high availability architecture. Make sure all parts are covered.  Review Oracle's Maximum Availability Architecture recommendations and references further down in this document.
  • It is strongly advised that a production RAC instance does not share a node with a DEV, TEST, QA or TRAINING instance. These extra instances can often introduce unexpected performance changes into a production environment.
  • Configure Servers to boot from SAN disk, rather than local disk for easier repair, quick provisioning and consistency.

Clusterware Considerations

  • Configure 3 or more voting disks (always an odd number).  This is because losing 1/2 or more of all of your voting disks will cause nodes to get evicted from the cluster, or nodes to kick themselves out of the cluster.
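The arithmetic behind the odd-number rule can be sketched as follows. This is a minimal illustration in Python, assuming only what the bullet above states: losing half or more of the voting disks leads to eviction, so strictly more than half must survive. The function name is hypothetical and is not part of any Oracle tool.

```python
def tolerated_voting_disk_failures(total_disks: int) -> int:
    """Return how many voting disks can be lost while a strict majority survives.

    Assumption (taken from the text above): losing half or more of the
    voting disks causes node eviction, so more than half must remain.
    """
    if total_disks < 1:
        raise ValueError("at least one voting disk is required")
    return (total_disks - 1) // 2


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, "voting disk(s): tolerates",
              tolerated_voting_disk_failures(n), "failure(s)")
```

Under this assumption, 3 disks tolerate one failure and 4 disks still tolerate only one, which is why an even count adds no protection over the next lower odd count.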

Networking Considerations

  • Underscores should not be used in a host or domain name according to RFC 952 - DoD Internet host table specification. The same applies for Net, Host, Gateway, or Domain name. Reference: http://www.faqs.org/rfcs/rfc952.html
  • Ensure the default gateway is on the same subnet as the VIP. Otherwise this can cause problems with racgvip and cause the vip and listener to keep restarting.
  • Make sure network interfaces have the same name on all nodes. This is required. To check - use ifconfig (on Unix) or ipconfig (on Windows).
  • Use Jumbo Frames if supported and possible in the system. Reference: Note 341788.1
  • Use non-routable network addresses for private interconnect; Class A: 10.0.0.0 to 10.255.255.255, Class B: 172.16.0.0 to 172.31.255.255, Class C: 192.168.0.0 to 192.168.255.255.  Reference: http://www.faqs.org/rfcs/rfc1918.html and Note 338924.1
  • Make sure network interfaces are configured correctly in terms of speed, duplex, etc. Various tools exist to monitor and test network: ethtool, iperf, netperf, spray and tcp. Note 563566.1
  • Configure NICs for fault tolerance (bonding/link aggregation). Note 787420.1.
  • Performance: check for faulty switches and for bad HBAs or ports that drop packets. Most network-related evictions we see occur either when there is too much traffic on the interconnect (the interconnect capacity is exhausted, which is where aggregation or some other hardware solution helps) or when the switch or network card is not configured properly. The latter is evident from the output of "netstat -s | grep udp" (if using the UDP protocol for IPC for RAC), which will register underflows (UDP buffer size configuration) or errors due to bad ports, switches, network cards, or network card settings. Review these statistics in the context of errors reported for packets sent through the interface.
  • For more predictable hardware discovery, place HBA and NIC cards in the same corresponding slot on each server in the Grid.
  • Ensure that all network cables are terminated in a grounded socket. A switch is required for the private network. Use dedicated redundant switches for private interconnect and VLAN considerations. RAC and Clusterware deployment best practices recommend that the interconnection be deployed on a stand-alone, physically separate, dedicated switch.
  • Deploying the RAC/Clusterware interconnect on a shared switch, segmented VLAN may expose the interconnect links to congestion and instability in the larger IP network topology. If deploying the interconnect on a VLAN, there should be a 1:1 mapping of VLAN to non-routable subnet and the VLAN should not span multiple VLANs (tagged) or multiple switches. Deployment concerns in this environment include Spanning Tree loops when the larger IP network topology changes, Asymmetric routing that may cause packet flooding, and lack of fine grained monitoring of the VLAN/port.
  • Consider using Infiniband on the interconnect for workloads that have high volume requirements.   Infiniband can also improve performance by lowering latency, particularly with Oracle 11g, with the RDS protocol.  See Note 751343.1.
  • Configure IPC address first in listener.ora address list. For databases upgraded from earlier versions to 10gR2 the netca did not configure the IPC address first in the listener.ora file. In 10gR2 this is the default but if you upgrade this isn't changed unless you do it manually. Failure to do so can adversely impact the amount of time it takes the VIP to fail over if the public network interface should fail. Therefore, check the 10gR1 and 10gR2 listener.ora file. Not only should the IPC address be contained in the address list but it should be FIRST. Note 403743.1
  • Increase the SDU (and in older versions the TDU as well) to a higher value (e.g. 4KB, 8KB, up to 32KB), thus reducing round trips on the network, possibly decreasing response time and improving the overall perceived responsiveness of the system.  Note 44694.1
  • To avoid ORA-12545 errors, ensure that client HOSTS files and/or DNS are furnished with both VIP and Public hostnames.
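Two of the checks above (no underscores in host names, and non-routable addresses for the private interconnect) are easy to script. The following is a minimal Python sketch using only the standard library. The function names and the sample host names are hypothetical. The host name test is a simplified check in the spirit of RFC 952 (it also accepts a leading digit, as later relaxed by RFC 1123), and the address test covers only the three RFC 1918 ranges listed above.

```python
import ipaddress
import re

# RFC 1918 private address blocks, matching the ranges listed in the text above.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

# One DNS label: letters, digits and hyphens only; no underscore,
# and no leading or trailing hyphen.
_LABEL = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?$")


def is_valid_hostname(name: str) -> bool:
    """Return True if every dot-separated label passes the simplified check."""
    return all(_LABEL.match(label) is not None for label in name.split("."))


def is_rfc1918_address(address: str) -> bool:
    """Return True if the address falls inside one of the RFC 1918 blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


if __name__ == "__main__":
    # Hypothetical host names and addresses, for illustration only.
    for host in ("rac-node1.example.com", "rac_node1.example.com"):
        print(host, "valid" if is_valid_hostname(host) else "INVALID")
    for addr in ("192.168.10.5", "172.32.0.1"):
        print(addr, "private" if is_rfc1918_address(addr) else "ROUTABLE")
```

A check like this could be run against the planned host names and interconnect addresses before installation; it does not replace cluvfy.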

Storage Considerations

  • Ensure correct mount options for NFS disks when RAC is used with NFS. The documented mount options are detailed in Note 359515.1 for each platform.
  • Implement multiple access paths to storage array using two or more HBAs or initiators with multi-pathing software over these HBAs. Where possible, use the pseudo devices (multi-path I/O) as the diskstring for ASM. Examples are: EMC PowerPath, Veritas DMP, Sun Traffic Manager, Hitachi HDLM, IBM SDDPC, Linux 2.6 Device Mapper. This is useful for I/O loadbalancing and failover. Reference: Note 294869.1 and Note 394956.1
  • Adhere to ASM best practices. Reference: Note 265633.1 ASM Technical Best Practices
  • ORA-15196 (ASM block corruption) can occur if LUNs larger than 2TB are presented to an ASM diskgroup. As a result of the fix, ORA-15099 will be raised if a disk larger than 2TB is specified. This is irrespective of the presence of asmlib. Workaround: Do not add disks larger than 2TB to a diskgroup. Reference: Note 6453944.8
  • On some platforms repeated warnings about AIO limits may be seen in the alert log:
    "WARNING:Oracle process running out of OS kernel I/O resources." Apply patch 6687381, available on many platforms. This issue affects 10.2.0.3, 10.2.0.4, and 11.1.0.6. It is fixed in 11.1.0.7. Note 6687381.8
  • Create two ASM disk groups, one for the database area and one for the flash recovery area, on separate physical disks. RAID storage array LUNs can be used as ASM disks to minimize the number of LUNs presented to the OS. Place database and redo log files in the database area.
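The 2TB workaround above can be applied as a simple pre-check on candidate LUN sizes before they are added to a diskgroup. The sketch below is illustrative Python only. The function name and disk names are hypothetical, and it assumes that "2TB" means 2 * 2^40 bytes, which the note does not spell out.

```python
# Assumption: "2TB" is interpreted as 2 * 2**40 bytes; the note does not
# state whether binary or decimal units are meant.
TWO_TB_BYTES = 2 * 1024**4


def oversized_disks(disk_sizes_bytes: dict) -> list:
    """Return the names of candidate disks larger than 2TB, sorted by name."""
    return sorted(
        name for name, size in disk_sizes_bytes.items() if size > TWO_TB_BYTES
    )


if __name__ == "__main__":
    # Hypothetical candidate LUNs and their sizes in bytes.
    candidates = {
        "lun_data01": 1 * 1024**4,
        "lun_data02": 3 * 1024**4,
    }
    print("Do not add to a diskgroup:", oversized_disks(candidates))
```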

Installation Considerations

  • Check cluster prerequisites using cluvfy (Cluster Verification Utility). Use cluvfy at all stages prior to and during installation of Oracle software. Also, rather than using the version on the installation media, it is crucial to download the latest version of cluvfy from OTN: http://www.oracle.com/technology/products/database/clustering/cvu/cvu_download_homepage.html. Note 339939.1 and Note 316817.1 contain more relevant information on this topic.
  • It is recommended to patch the Clusterware Home to the desired level before doing any RDBMS or ASM home install.
    For example, install clusterware 10.2.0.1 and patch to 10.2.0.4 before installing 10.2.0.1 RDBMS.
  • Install ASM in a separate ORACLE_HOME from the database for maintenance and availability reasons (e.g., to independently patch and upgrade).
  • If you are installing Oracle Clusterware as a user that is a member of multiple operating system groups, the installer installs files on all nodes of the cluster with group ownership set to that of the user's current active or primary group.  Therefore:  ensure that the first group listed in the file /etc/group is the current active group OR invoke the Oracle Clusterware installation using the following additional command line option, to force the installer to use the proper group when setting group ownership on all files:  runInstaller s_usergroup=current_active_group (Bug 4433140)

Patching Considerations

This section is targeted towards customers beginning a new implementation of Oracle Real Application Clusters, or customers who are developing a proactive patching strategy for an existing implementation. For new implementations, it is strongly recommended that the latest available patchset for your platform be applied at the outset of your testing. In cases where that latest version of the RDBMS cannot be used because of lags in internal or 3rd party application certification or due to other limitations, it is still supported to have the CRS Home and ASM Homes running at a later patch level than the RDBMS Home, therefore, it may still be possible to run either the CRS or ASM Home at the latest patchset level. As a best practice (with some exceptions, see the Note in the references section below), Oracle Support recommends that the following be true:
  • The CRS_HOME must be at a patch level or version that is greater than or equal to the patch level or version of the ASM Home. The CRS_HOME must also be at a patch level or version that is greater than or equal to the patch level or version of the RDBMS home.
  • The ASM_HOME must be at a patch level or version that is greater than or equal to the patch level or version of the RDBMS Home. The ASM_HOME must be at a patch level or version that is less than or equal to the patch level or version of the CRS_HOME.
  • Before patching the database, ASM, or clusterware homes using opatch, check the available space on the filesystem and use Note 550522.1 to estimate how much space will be needed and how to handle the situation if the filesystem should fill up during the patching process.
  • Review known issues specific to the 10.2.0.4.0 patchset:  Note 555579.1.

    For more detailed notes and references on patching in a RAC environment, see the patching section below, in the "RAC Platform Generic References" section at the end of this note.
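The ordering rule for the three homes (CRS at or above ASM, and ASM at or above RDBMS) can be expressed as a small check. The following Python sketch is illustrative only: the function names are hypothetical, versions are assumed to be dotted numeric strings such as 10.2.0.4, and it does not model the exceptions mentioned in the referenced Note.

```python
def parse_version(version: str, width: int = 5) -> tuple:
    """Turn a dotted version such as '10.2.0.4' into a zero-padded tuple."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (width - len(parts))  # so 10.2.0.4 equals 10.2.0.4.0
    return tuple(parts)


def check_home_versions(crs: str, asm: str, rdbms: str) -> list:
    """Return a list of violations of the CRS >= ASM >= RDBMS ordering rule."""
    crs_v, asm_v, rdbms_v = (parse_version(v) for v in (crs, asm, rdbms))
    problems = []
    if crs_v < asm_v:
        problems.append(f"CRS home {crs} is below ASM home {asm}")
    if crs_v < rdbms_v:
        problems.append(f"CRS home {crs} is below RDBMS home {rdbms}")
    if asm_v < rdbms_v:
        problems.append(f"ASM home {asm} is below RDBMS home {rdbms}")
    return problems


if __name__ == "__main__":
    # Example from the Installation section: clusterware patched to
    # 10.2.0.4 while the RDBMS home is still at 10.2.0.1.
    print(check_home_versions("10.2.0.4", "10.2.0.4", "10.2.0.1"))
    # A configuration that breaks the rule.
    print(check_home_versions("10.2.0.1", "10.2.0.4", "10.2.0.4"))
```

An empty list means the three versions satisfy the ordering; each string in a non-empty list names one pair that is out of order.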

Upgrade Considerations

  • Begin with minimum version 10.2.0.3 when upgrading 10.2.0.X to 11.X
  • Use rolling upgrades where appropriate for Oracle Clusterware (CRS) Note 338706.1.  For detailed upgrade assistance, refer to the appropriate Upgrade Companion for your release:  Note 466181.1 10g Upgrade Companion and Note 601807.1 Oracle 11gR1 Upgrade Companion
  • For information about upgrading a database using a transient logical standby, refer to:  Note 949322.1 : Oracle11g Data Guard: Database Rolling Upgrade Shell Script

Oracle VM Considerations

Database Initialization Parameter Considerations

  • Set PRE_PAGE_SGA=false. If set to true, it can significantly increase the time required to establish database connections. If clients complain that connections to the database are very slow, consider setting this parameter to false; doing so avoids mapping the whole SGA at process startup and thus saves connection time.
  • Set PARALLEL_MIN_SERVERS to CPU_COUNT-1. This will pre-spawn recovery slaves at startup time and will avoid having to spawn them when recovery is required which could delay recovery due to the fact that slaves are started in serial. Note that SGA memory for PX msg pool will be allocated for all PARALLEL_MAX_SERVERS if you set PARALLEL_MIN_SERVERS.
  • Tune PARALLEL_MAX_SERVERS to your hardware. Start with (2 * ( 2 threads ) *(CPU_COUNT)) = 4 x CPU count and repeat test for higher values with test data.
  • Consider setting FAST_START_PARALLEL_ROLLBACK. This parameter determines how many processes are used for transaction recovery, which is done after redo application. Optimizing transaction recovery is important to ensure an efficient workload after an unplanned failure. As long as the system is not CPU bound, setting this to a value of HIGH is a best practice. This causes Oracle to use four times the CPU count (4 X cpu_count) parallel processes for transaction recovery. The default for this parameter is LOW, or two times the CPU count (2 X cpu_count).
  • Set FAST_START_MTTR_TARGET to a non-zero value in seconds. Crash recovery will complete within this desired time frame.
  • In 10g and 11g databases, init parameter ACTIVE_INSTANCE_COUNT should no longer be set. This is because the RACG layer doesn't take this parameter into account. As an alternative, you should create a service with one preferred instance.
  • Increase PARALLEL_EXECUTION_MESSAGE_SIZE from the default (normally 2048) to 8192. This can be set higher for data warehousing systems where a lot of data is transferred through PQ.
  • Set OPTIMIZER_DYNAMIC_SAMPLING = 1 or simply analyze your objects because 10g Dynamic sampling can generate extra CR buffers during execution of SQL statements.
  • Tune Data Guard to avoid cluster related waits. Improperly tuned Data Guard settings can cause high LOG FILE SYNC WAIT and GLOBAL CACHE LOG FLUSH TIME. Reference: http://www.oracle.com/technology/deploy/availability/pdf/MAA_WP_10gR2_DataGuardNetworkBestPractices.pdf, http://www.oracle.com/technology/deploy/availability/pdf/MAA_WP_10gR2_RecoveryBestPractices.pdf, http://www.oracle.com/technology/deploy/availability/pdf/MAA_WP_10gR2_Roadmap.pdf
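The CPU-based starting points quoted above (PARALLEL_MIN_SERVERS, PARALLEL_MAX_SERVERS, and the process counts implied by FAST_START_PARALLEL_ROLLBACK) reduce to simple arithmetic. The Python sketch below only restates those formulas. The function name and dictionary keys are hypothetical, and the results are starting values to be validated under test load, not fixed recommendations.

```python
def suggested_parallel_settings(cpu_count: int) -> dict:
    """Compute the starting values described in the text from CPU_COUNT."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return {
        # PARALLEL_MIN_SERVERS = CPU_COUNT - 1
        "PARALLEL_MIN_SERVERS": cpu_count - 1,
        # PARALLEL_MAX_SERVERS = 2 * (2 threads) * CPU_COUNT = 4 x CPU count
        "PARALLEL_MAX_SERVERS": 2 * 2 * cpu_count,
        # FAST_START_PARALLEL_ROLLBACK = LOW uses 2 x CPU count processes
        "rollback_processes_low": 2 * cpu_count,
        # FAST_START_PARALLEL_ROLLBACK = HIGH uses 4 x CPU count processes
        "rollback_processes_high": 4 * cpu_count,
    }


if __name__ == "__main__":
    for key, value in suggested_parallel_settings(8).items():
        print(f"{key} = {value}")
```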

Performance Tuning Considerations

In any database system, RAC or single instance, the most significant performance gains are usually obtained from traditional application tuning techniques. The benefits of those techniques are even more remarkable in a RAC database.
  • Many sites run with too few redo logs or with logs that are sized too small. With too few redo logs configured, there is the potential that the archiver process(es) cannot keep up which could cause the database to stall. Small redo logs cause frequent log switches, which can put a high load on the buffer cache and I/O system. As a general practice each thread should have at least three redo log groups with two members in each group.
    Oracle Database 10g introduced the Redo Logfile Size Advisor, which determines the optimal (smallest) online redo log file size based on the current FAST_START_MTTR_TARGET setting and corresponding statistics. Thus, the Redo Logfile Size Advisor is enabled only if FAST_START_MTTR_TARGET is set.
    The OPTIMAL_LOGFILE_SIZE column of V$INSTANCE_RECOVERY shows the redo log file size (in megabytes) that is considered to be optimal based on the current FAST_START_MTTR_TARGET setting. It is recommended that you set all online redo log files to at least this value.
  • Avoid and eliminate long full table scans in OLTP environments.
  • Use Automatic Segment Space Management (ASSM). Hard to avoid in 10gR2 and higher. All tablespaces except system, temp, and undo should use ASSM.
  • Increasing sequence caches in insert-intensive applications improves instance affinity to index keys deriving their values from sequences.  Increase the cache for application sequences and some system sequences for better performance. Use a large cache value of maybe 10,000 or more. Additionally, use of the NOORDER attribute is most effective, but it does not guarantee that sequence numbers are generated in order of request (NOORDER is actually the default).
  • The default setting for the SYS.AUDSES$ sequence is 20, this is too low for a RAC system where logins can occur concurrently from multiple nodes.  Refer to Note 395314.1.
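The redo log guideline above (at least three groups per thread, two members per group) can be checked mechanically. The following Python sketch is illustrative: the function name is hypothetical, and the input is assumed to be a list of (thread, group, member count) rows, such as might be gathered from the THREAD#, GROUP# and MEMBERS columns of V$LOG.

```python
from collections import defaultdict


def threads_with_thin_redo(groups, min_groups=3, min_members=2):
    """Return the redo threads that fall short of the guideline.

    groups: iterable of (thread, group_number, member_count) tuples.
    A thread is flagged if it has fewer than min_groups groups, or if
    any of its groups has fewer than min_members members.
    """
    members_per_thread = defaultdict(list)
    for thread, _group, members in groups:
        members_per_thread[thread].append(members)

    flagged = []
    for thread in sorted(members_per_thread):
        member_counts = members_per_thread[thread]
        too_few_groups = len(member_counts) < min_groups
        thin_group = any(count < min_members for count in member_counts)
        if too_few_groups or thin_group:
            flagged.append(thread)
    return flagged


if __name__ == "__main__":
    # Hypothetical two-node layout: thread 2 has only two groups.
    layout = [(1, 1, 2), (1, 2, 2), (1, 3, 2), (2, 4, 2), (2, 5, 2)]
    print("Threads needing attention:", threads_with_thin_redo(layout))
```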

General Configuration Considerations

  • In 10gR2 and above the LMS process is intended to run in the real time scheduling class. In some instances we have seen this prevented due to incorrect ownership or permissions for the oradism executable which is stored in the $ORACLE_HOME/bin directory. See Note 602419.1 for more details on this.
  • Avoid setting the ORA_CRS_HOME environment variable. Setting this variable can cause problems for various Oracle components, and it is never necessary for CRS programs because they all have wrapper scripts.
  • Use Enterprise Manager or Grid Control to create database services - all features available in one tool. For 10.2 and 10.1 one can use dbca to create these services and hence define the preferred and available instances for these services as part of database creation. However in 11.1.0.6 this is only available in Enterprise Manager and has been removed from DBCA.
  • Configure Oracle Net Services load balancing properly to distribute connections. Load balancing should be used in combination with 10g Workload Services to provide the highest availability. The CLB_GOAL attribute of 10g workload services should be configured appropriately depending upon application requirements. Different workloads might require different load balancing goals. Use separate services for each workload with different CLB_GOAL.
  • Ensure the NUMA (Non Uniform Memory Architecture) feature is turned OFF unless explicitly required and tested, as there have been issues reported with NUMA enabled.  Refer to Note 759565.1 for more details.
  • Read and follow the Best Practices Guide for XA and RAC to avoid problems with XA transactions being split across RAC Instances. Reference: http://www.oracle.com/technology/products/database/clustering/pdf/bestpracticesforxaandrac.pdf
  • Increase retention period for AWR data from 7 days to at least one business cycle. Use the awrinfo.sql script to budget for the amount of information required to be stored in the AWR and hence sizing the same.
  • ONS spins consuming high CPU and/or memory. This is fixed in 10.2.0.4 & 11.1.0.6. Refer to Note 4417761.8 and Note 731370.1 for more details and workaround.
  • Use SRVCTL to register resources as the Oracle user (not as root user). Registering resources (database, instances, ASM, listener, and services) as root can lead to inconsistent behavior. During clusterware install, nodeapps is created by the root user. Only the VIP resource should be owned by root. Any other resources owned by root will need to be removed (as root) and then re-created as the oracle user. Check the OCRDUMP output for resource keys owned by root.
  • For versions 10gR2 and 11gR1, it is a best practice on all platforms to set the CSS diagwait parameter to 13 in order to provide time for dumping diagnostics in case of node evictions. Note 559365.1 has more details on diagwait.  In 11gR2 it is possible but should not be necessary to set diagwait.

E-Business Suite (with RAC) Considerations

  • Patch against known issues Bug 6142040 :  ICM DOES NOT UPDATE TARGET NODE AFTER FAILOVER and Bug 6161806 : APPSRAP: PCP NODE FAILURE IS NOT WORKING 
  • Change RAC APPS default setting to avoid slow Purchase Order approval.  Note 339508.1
  • It is recommended to set the init.ora parameter max_commit_propagation_delay= 0 in the init.ora or spfile for the E-business Suite on RAC. Note 259454.1
  • You can use Advanced Planning and Scheduling (APS) on a separate RAC (clustered). Merging APS into OLTP database and isolating the load to a separate RAC instance is supported. Refer to Knowledge Documents Note 279156.1 and Note 286729.1 for more details.
  • You can run Email Center in a RAC environment. Reference Knowledge Document Note 272266.1 for RAC related specific instructions.
  • You can run Oracle Financial Services Applications (OFSA) in a RAC environment. Refer to Knowledge Document Note 280294.1 for RAC related best practices.
  • Activity Based Management (ABM) is supported in a RAC environment. Reference Knowledge Document Note 303542.1 for RAC related best practices.
  • When using Oracle Application Tablespace Migration Utility (OATM) in a RAC environment, be sure to follow the instructions for RAC environments in Note 404954.1.

Peoplesoft (with RAC) Considerations

  • Each instance and service must have its own row in the PSDBOWNER table.  PSDBOWNER table must have as many rows as the number of database instances in cluster plus number of services in database.  
  • If the batch servers are on database nodes, set UseLocalOracleDB=1.  By default the process scheduler connects to the database using SQL*Net over TCP/IP, even when it is running locally. If UseLocalOracleDB=1 is set in the process scheduler domain configuration file (prcs.conf), it will use bequeath rather than TCP/IP, which improves performance.  If UseLocalOracleDB=1 is set, ORACLE_SID must also be set in the PeopleSoft user's profile; otherwise the process scheduler will not boot.
  • For the REN (Remote Event Notification) server to work properly, the DB_NAME parameter should match in the Application server domain and the Process scheduler domain configuration used to run the report.  In the case of RAC, always use the service name as the database name for the App and batch servers, so that it matches the DB_NAME for the REN server to work and balances the load across all instances.
  • See Note 747587.1 regarding PeopleSoft Enterprise PeopleTools Certifications

Tools/Utilities for Diagnosing and Working with Oracle Support

    • Install and run OSWatcher (OSW) proactively for OS resource utilization diagnosability. OSW is a collection of UNIX shell scripts that collects and archives operating system and network metrics to aid in diagnosing performance issues. It is designed to run continuously and to write the metrics to ASCII files, which are saved to an archive directory. The amount of archived data saved and the frequency of collection are based on user parameters set when starting OSW. It is highly recommended that OSW be installed and run continuously on ALL cluster nodes, at all times. Note 301137.1. When using OSWatcher in a RAC environment, each node must write its output files to a separate archive directory. Combining the output files under one archive directory (on shared storage) is not supported and causes the OSWg tool to crash. Shared storage is fine, but each node needs a separate archive directory.
    • Use the ASM command line utility (ASMCMD) to manage Automatic Storage Management (ASM). Oracle database 10gR2 provides two new options to access and manage ASM files and related information via command line interface - asmcmd and ASM ftp. Note 332180.1 discusses asmcmd and provides sample Linux shell script to demonstrate the asmcmd in action.
    • Use the cluster deinstall tool to remove CRS install - if needed. The clusterdeconfig tool removes and deconfigures all of the software and shared files that are associated with an Oracle Clusterware or Oracle RAC Database installation. The clusterdeconfig tool removes the software and shared files from all of the nodes in a cluster. Reference: http://www.oracle.com/technology/products/database/clustering/index.html
    • Use diagcollection.pl for CRS diagnostic collections. Located in $ORA_CRS_HOME/bin as part of a default installation. Note 330358.1
    • On Windows and Linux Platforms, the Cluster Health Monitor can be used to track OS resource consumption and collect and analyze data cluster-wide. For more information, and to download the tool, refer to the following link on OTN:  http://www.oracle.com/technology/products/database/clustering/ipd_download_homepage.html

    11gR2 Specific Considerations

    RAC Platform Generic References

    CRS / RAC Related References

    RAC / RDBMS Related References

    VIP References

    • Note 298895.1 Modifying the default gateway address used by the Oracle 10g VIP
    • Note 338924.1 CLUVFY Fails With Error: Could not find a suitable set of interfaces for VIPs

    ASM References

    11.2 References

    • Note 1050693.1 Troubleshooting 11.2 Clusterware Node Evictions (Reboots)
    • Note 1053147.1 11gR2 Clusterware and Grid Home - What You Need to Know

    Infiniband References

    MAA / Standby References

    Oracle's Maximum Availability Architecture (MAA) provides superior data protection and availability by minimizing or eliminating planned and unplanned downtime at all technology stack layers including hardware or software components. Data protection and high availability are achieved regardless of the scope of a failure event - whether from hardware failures that cause data corruptions or from catastrophic acts of nature that impact a broad geographic area.

    MAA also eliminates guesswork and uncertainty when implementing a high availability architecture that uses the full complement of Oracle HA technologies. RAC is an integral component of MAA, but it is just one piece of the MAA strategy. The following references provide more background on the Oracle MAA strategy:

    Patching References

    • Note 854428.1 Intro to Patch Set Updates (PSU)
    • Note 850471.1 Oracle Announces First Patch Set Update For Oracle Database Release 10.2
    • Note 756671.1 Oracle Recommended Patches -- Oracle Database
    • Note 567631.1 How to Check if a Patch requires Downtime?
    • Note 761111.1 Online Patches
    • Note 438314.1 Critical Patch Update - Introduction to Database n-Apply CPUs
    • Note 405820.1 10.2.0.X CRS Bundle Patch Information
    • Note 810663.1 11.1.0.X CRS Bundle Patch Information
    • Note 742060.1 Release Schedule of Current Database Patch Sets
    • Note 363254.1 Applying one-off Oracle Clusterware patches in a mixed version home environment
    • Note 550522.1 How To Avoid Disk Full Issues Because OPatch Backups Take Big Amount Of Disk Space.
    • Note 555579.1  10.2.0.4 Patch Set - Availability and Known Issues

    Upgrade References

    E-Business References

    • 11g E-business white papers: http://www.oracle.com/apps_benchmark/html/white-papers-e-business.html
    • Note 455398.1 Using Oracle 11g Release 1 Real Application Clusters and Automatic Storage Management with Oracle E-Business Suite Release 11i (11.1.0.7)
    • Note 388577.1 Using Oracle 10g Release 2 Real Application Clusters and Automatic Storage Management with Oracle E-Business Suite Release 12
    • Note 559518.1 Cloning Oracle E-Business Suite Release 12 RAC-Enabled Systems with Rapid Clone
    • Note 165195.1 Using AutoConfig to Manage System Configurations with Oracle Applications 11i
    • Note 294652.1 E-Business Suite 11i on RAC : Configuring Database Load balancing & Failover
    • Note 362135.1 Configuring Oracle Applications Release 11i with 10g R2 RAC and ASM
    • Note 362203.1 Oracle Applications Release 11i with Oracle 10g Release 2 (10.2.0)
    • Note 241370.1 Concurrent Manager Setup and Configuration Requirements in an 11i RAC Environment
    • Note 240818.1 Concurrent Processing: Transaction Manager Setup and Configuration Requirement in an 11i RAC Environment

    Unix References

    Weblogic/RAC References

    References Related to Working with Oracle Support

    My Oracle Support (formerly MetaLink) Knowledge Documents
    • Note 736737.1 My Oracle Support - The Next Generation Support Platform
    • Note 730283.1 Get the most out of My Oracle Support
    • Note 747242.1 My Oracle Support Configuration Management FAQ
    • Note 209768.1 Database, FMW, Em Grid Control, and OCS Software Error Correction Support Policy
    • Note 868955.1 My Oracle Support Health Checks Catalog
    Process Oriented and Self Service Notes
    Service Request Diagnostics


            Modification History
            [11-Aug-2009] created this Modification History section

            [21-Aug-2009] added ORA-12545 suggestion

            [16-Sep-2009] changed IPD/OS to new name:  Cluster Health Monitor

            [22-Sep-2009] added opatch patch number

            [29-Sep-2009]  clarified support of OATM in RAC environments

            [09-Oct-2009]  added odd # of voting disks recommendation and reference to Health Check catalog note

            [23-Oct-2009]  added reference to space considerations while patching and 11.1 CRS patch bundle reference

            [10-Nov-2009]  uploaded new version of RAC System Load Testing white paper

            [12-Nov-2009]  added 11gR2 specific section

            [24-Nov-2009]  added Infiniband References

            [20-Nov-2009]  added link to 11gR2 upgrade presentation and reference to 555579.1 and 454506.1

            [09-Dec-2009]  added 'REN' success factor

            [21-Dec-2009]  added reference to Rapid Oracle RAC Standby Deployment white paper, Golden Gate reference, created Oracle VM section, added optimizer reference to the 11gR2 section, added reference to PeopleSoft Enterprise PeopleTools Certifications

            [7-Jan-2010]  added some MAA/Standby reference links

            [19-Jan-2010] added reference to Note 1050693.1

            [27-Jan-2010] added reference to Note 1053147.1 11gR2 Clusterware and Grid Home - What You Need to Know

            [28-Jan-2010] modified diagwait best practice to include information on 11gR2

            [1-Feb-2010]  added reference to Note 949322.1 Oracle11g Data Guard: Database Rolling Upgrade Shell Script

            [3-Feb-2010]  added reference to Database Upgrade Using Transportable Tablespaces



