CRS crash after upgrade to Oracle 12.2

vector illustration of Crazy man cartoon

Cai num bug “interessante” na última semana após a atualização para o Oracle 12.2

I have hit an interesting bug this week after upgrade to Oracle 12.2.

A nota Doc ID 2460394.1 tem os detalhes do bug mas basicamente sua base será reiniciada pelo CRS de tempos em tempos.

MOS note Bug 28298447: cluster crashed due to mellanox driver related issue (Doc ID 2460394.1) has the details but basically your cluster will bounce the DB from time to time.

Ou seja, se você tem um Exadata, atualize o kernel para a versão 4.1.12-94.8.5 antes de realizar o upgrade do banco para 12.2

So if you running an Exadata, upgrade the kernel to version 4.1.12-94.8.5 before upgrading db to 12.2.

Essa correção de kernel também está disponível nas versões 12.2.1.1.7 (ou superior) ou 18.1.5.0.0 (ou superior), ou seja, QFSDP abril/2018.

This kernel fix is also included on image version 12.2.1.1.7 (or higher) or 18.1.5.0.0 (or higher), or QFSDP from april/2018.

ODC Appreciation Day: Procwatcher

odc

The ODC Appreciation Day was proposed by Tim Hall and you can find more information about it here.

Procwatcher is a tool to examine and monitor Oracle database and/or clusterware processes at an interval. The tool will collect stack traces of these processes using Oracle tools like oradebug short_stack and/or OS debuggers like pstack, gdb, dbx, or ladebug and collect SQL data if specified.

You can download Procwatcher from MOS or run it from within the latest Trace File Analyzer.

Procwatcher is Ideal for:

  • Session level hangs or severe contention in the database/instance. See Note: 1352623.1
  • Severe performance issues. See Note: 1352623.1
  • Instance evictions and/or DRM timeouts.
  • Clusterware or DB processes stuck or consuming high CPU (must set EXAMINE_CLUSTER=true and run as root for clusterware processes)
  • ORA-4031 and SGA memory management issues. (Set sgamemwatch=diag or sgamemwatch=avoid4031 (not the default). See Note: 1355030.1
  • ORA-4030 and DB process memory issues. (Set USE_SQL=true and process_memory=y).
  • RMAN slowness/contention during a backup. (Set USE_SQL=true and rmanclient=y).

 

Procwatcher is Not Ideal for…

  • Node evictions/reboots. In order to troubleshoot these you would have to enable Procwatcher for a process(es) that are capable of rebooting the machine. If the OS debugger suspends the processs for too long *that* could cause a reboot of the machine. I would only use Procwatcher for a node eviction/reboot if the problem was reproducing on a test system and I didn’t care of the node got rebooted. Even in that case the INTERVAL would need to be set low (30) and many options would have to be turned off to get the cycle time low enough (EXAMINE_BG=false, USE_SQL=false, probably removing additional processes from the CLUSTERPROCS list).
  • Non-severe database performance issues. AWR/ADDM/statspack are better options for this…
  • Most installation or upgrade issues. We aren’t getting data for this unless we are at a stage of the installation/upgrade where key processes are already started.

 

For more details please check:

MOS note Procwatcher: Script to Monitor and Examine Oracle DB and Clusterware Processes [ID 459694.1]

And this blog from Tanel: https://blog.tanelpoder.com/2013/05/27/debugger-dangers-part-2/