How-To: Move and delete old Partitions made easy

Hello fellow DBAs,

it’s time to share another tidbit from my daily work.

Given, that we have a table in our schema, that has a range partiton by month, our application people wanted to have a procedure to move all partitions older than a month to an archive tablespace and all older than 90 Days to be deleted, I came up with this small bit of code to make their lives easier.

First, let’s take a look at how the table is created.

CREATE TABLE DB_USER.PART_TABLE_1 
   (	
   ID01 NUMBER(19,0) NOT NULL ENABLE, 
   DATA1 varchar2(4000), 
   DATA2 varchar2(4000), 
   CREATE_TIMESTAMP TIMESTAMP (6) NOT NULL ENABLE 
   ) TABLESPACE DATA 
   PARTITION BY RANGE (CREATE_TIMESTAMP) INTERVAL (NUMTOYMINTERVAL(1, 'MONTH')) 
   (
     PARTITION OLD_DATA VALUES LESS THAN (TIMESTAMP' 2021-03-01 00:00:00') SEGMENT CREATION IMMEDIATE TABLESPACE DATA
   ); 
ALTER TABLE DB_USER.PART_TABLE_1 ADD UNIQUE (ID01)
  TABLESPACE DATA  ENABLE;

As you can see, I partitioned, I created a unique Index which will give me an Index on the table.
Then we fill and fill and fill the Table with lots and lots of data, partitions will be created.

And now to the interesting part, the procedures.

First, let’s move all partitons that are older than one month to the Archive Tablespace.

create or replace procedure DB_USER.MOVE_PART_TABLE_1_PARTITIONS AS
begin
for i in
(
select table_name, partition_name,high_value_in_date_format from (
   select 
          table_name,
          partition_name,
          to_date (
          trim (
             '''' from regexp_substr (
                        extractvalue (
                          dbms_xmlgen.getxmltype (
                          'select high_value from user_tab_partitions where table_name='''
                                   || table_name
                                   || ''' and partition_name = '''
                                   || partition_name
                                   || ''''),
                                '//text()'),
                             '''.*?''')),
             'syyyy-mm-dd hh24:mi:ss')
             high_value_in_date_format
     FROM user_tab_partitions
    WHERE (table_name = 'PART_TABLE_1') AND TABLESPACE_NAME = 'DATA'
 )
WHERE high_value_in_date_format < trunc(sysdate, 'MONTH')
)
LOOP
    execute immediate 'alter table '||i.table_name||' move partition '||i.partition_name||' tablespace ARCHIVE online update indexes';
end loop;

end;
/

The ONLINE Keyword in the move command works only on Enterprise Editions, keep that in mind.
After we moved everything, we want to get rid of old data regularly, so let’s delete old partitions.

create or replace procedure DB_USER.DELETE_PART_TABLE_1_PARTITIONS AS
begin
for i in
(
select table_name, partition_name,high_value_in_date_format from (
   select 
          table_name,
          partition_name,
          to_date (
          trim (
             '''' from regexp_substr (
                        extractvalue (
                          dbms_xmlgen.getxmltype (
                          'select high_value from user_tab_partitions where table_name='''
                                   || table_name
                                   || ''' and partition_name = '''
                                   || partition_name
                                   || ''''),
                                '//text()'),
                             '''.*?''')),
             'syyyy-mm-dd hh24:mi:ss')
             high_value_in_date_format
     FROM user_tab_partitions
    WHERE (table_name = 'PART_TABLE_1')  AND TABLESPACE_NAME = 'ARCHIVE'
 )
WHERE high_value_in_date_format < trunc(sysdate-90, 'MONTH')
)
LOOP
    execute immediate 'alter table '||i.table_name||' drop partition '||i.partition_name||' online update indexes';
end loop;

end;
/

As with the former procedure, the online keywork only works in enterprise editions. And I have a little Failsave, that it only takes partitions in the ARCHIVE Tablespace into account.

By adjusting the Where clauses you can easily transform this to work with multiple tables, different high values, etc.

Now that you have these procedures, call them in a scheduler job, cronjob or whatever you desire.

I welcome your comments.

TDE Wallets – Best Practices and How-Tos

Hello fellow Oracle DBAs. In this Article I will give you a few established „Best Practices“ that have emerged in my work when using TDE (Transparent Data Encryption) Wallets. Also, I will give you a few commands for handling Wallets and Keys and a Guide for a little bit of Troubleshooting.

CAVE: On Premises TDE is a feature that needs to be licensed. You need to have "Advanced Security" licensed in order to use TDE. The license is included in Autonmous Databases and Database Cloud Services within the OCI (Oracle Cloud Infrastructure)

Versions that are covered by this Article: 18c, 19c

Best Practices

Naturally this list is not complete and probably never will be and some of these practices may not be for you. Nonetheless, let me know, if you have something that could be missing.

Use TDE Wallets only for TDE Masterkeys. No Credentials and no Certificates

Storing Credentials or Certificates in an TDE Wallet can lead to the database not being able to open the Wallet at startup and thus not being able to access TDE encrypted Data in the Database.

This is one possible reason for the error: „ORA-28365 Wallet not open“

Manage TDE Wallets only with SQL*Plus

When managing TDE Wallets and Keys use only the SQL*Plus „ADMINISTER KEY MANAGEMENT“ commands.

These command require a backup parameter whenever there is a change done to the wallet which then backs up the file.

Store Wallet in Filesystem

Store the Wallet in a Filesystem that is part of the Filesystembackup, thus we regularly have a backup. On RAC Systems this should be a shared ACFS Filesystem.

Don’t use ASM to store your TDE Wallets, or have a really good concept for backups whenver you make a change to the Wallet.

Store the Wallet Password securely in a Password Manager

Don’t lose the Password. Losing Access to the Wallet and Keys within can lead to not being able to access the Data

Use „wallet_root“ Parameter in Database

Don’t use the old sqlnet.ora Parameters to define the path to your Oracle TDE Wallet this is deprecated in future Database Versions.

Use the Database Parameter „wallet_root“, which overrides any old sqlnet.ora parameters.

How-Tos

Step by Step Instructions to Configure a Wallet in a Non-CDB or CDB Database

Create Wallet Directory in Non-Rac Environment and secure directory

[oracle@dbhost01:ORCL]/home/oracle: 
[oracle@dbhost01:ORCL]/home/oracle: mkdir /u01/tde_wallet
[oracle@dbhost01:ORCL]/home/oracle: cd /u01/tde_wallet
[oracle@dbhost01:ORCL]/u01/tde_wallet: ls -la .
total 0
drwxrwxr-x  2 oracle dba   6 Jan 11 14:14 .
drwxrwxr-x 13 oracle dba 140 Jan 11 14:14 ..
[oracle@dbhost01:ORCL]/u01/tde_wallet: chmod 750 .
[oracle@dbhost01:ORCL]/u01/tde_wallet: ls -la .
total 0
drwxr-x---  2 oracle dba   6 Jan 11 14:14 .
drwxrwxr-x 13 oracle dba 140 Jan 11 14:14 ..
[oracle@dbhost01:ORCL]/u01/tde_wallet:

Create shared ACFS Wallet Directory in RAC Environment and secure directory

Create ACFS Volume
[oracle@dbhost01:+ASM1]/u01/tde_wallet: asmcmd
ASMCMD> volcreate -G DATA -s 10G vol_wallets
ASMCMD> volinfo -G DATA vol_wallets
Diskgroup Name: DATA
 
         Volume Name: VOL_WALLETS
         Volume Device: /dev/asm/vol_wallets-377
         State: ENABLED
         Size (MB): 10240
         Resize Unit (MB): 64
         Redundancy: HIGH
         Stripe Columns: 8
         Stripe Width (K): 1024
         Usage:
         Mountpath:
ASMCMD> exit
[oracle@dbhost01:+ASM1]/u01/tde_wallet:
Create Filesystem on ACFS Volume
[oracle@dbhost01:+ASM1]/u01/tde_wallet: /sbin/mkfs -t acfs /dev/asm/vol_wallets-377
mkfs.acfs: version                   = 19.0.0.0.0
mkfs.acfs: on-disk version           = 46.0
mkfs.acfs: volume                    = /dev/asm/vol_wallets-377
mkfs.acfs: volume size               = 10737418240  (  10.00 GB )
mkfs.acfs: Format complete.
Register Filesystem and wait until it is mounted on both nodes
[root@dbhost01:+ASM1]/root:
[root@dbhost01:+ASM1]/root: /sbin/acfsutil registry -a /dev/asm/vol_wallets-377 /u01/tde_wallets
acfsutil registry: mount point /u01/tde_wallets successfully added to Oracle Registry
[root@dbhost01:+ASM1]/root: df -h /u01/tde_wallets
Filesystem                Size  Used Avail Use% Mounted on
/dev/asm/vol_wallets-377   10G  570M  9.5G   6% /u01/tde_wallets
[root@dbhost01:+ASM1]/root: chown oracle:dba /u01/tde_wallets
[root@dbhost01:+ASM1]/root: ls -la /u01/tde_wallets
total 100
drwxr-xr-x  4 oracle dba  32768 Jan 11 14:35 .
drwxr-xr-x 11 root   dba   4096 Jan 11 14:35 ..
drwx------  2 root   root 65536 Jan 11 14:35 lost+found
[root@dbhost01:+ASM1]/root:

Set Database Parameters for Wallet and restart Database

Set wallet_root Parameter
[oracle@dbhost01:ORCL]/u01/tde_wallet: sqlplus / as sysdba
SQL*Plus: Release 19.0.0.0.0 - Production on Wed Jan 11 15:04:43 2023
Version 19.16.0.0.0
 
Copyright (c) 1982, 2022, Oracle.  All rights reserved.
 
 
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.16.0.0.0
 
SQL> alter system set wallet_root='/u01/tde_wallets/ORCL' scope=spfile;
 
System altered.
 
SQL> exit
Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.16.0.0.0
[oracle@dbhost01:ORCL]/u01/tde_wallet:
Restart Database and create Wallet directory
[oracle@dbhost01:ORCL]/u01/tde_wallet: srvctl stop database -db ORCL
[oracle@dbhost01:ORCL]/u01/tde_wallet: mkdir -p /u01/tde_wallets/ORCL/tde
[oracle@dbhost01:ORCL]/u01/tde_wallet: srvctl start database -db ORCL
Set tde_configuration parameter
[oracle@dbhost01:ORCL]/u01/tde_wallet: sqlplus / as sysdba
SQL*Plus: Release 19.0.0.0.0 - Production on Wed Jan 11 15:22:46 2023
Version 19.16.0.0.0

Copyright (c) 1982, 2022, Oracle. All rights reserved.


Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.16.0.0.0

SQL> alter system set tde_configuration='KEYSTORE_CONFIGURATION=FILE' scope=spfile;

System altered.

SQL> exit
Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.16.0.0.0
Create Wallet and TDE Masterkey
  • Create Wallet
  • Open Wallet
  • Create and Set Masterkey
  • Create Autologin
  • Restart Database to test Autologin Wallet
[oracle@dbhost01:ORCL]/u01/tde_wallets/ORCL/tde: sqlplus / as sysdba
SQL*Plus: Release 19.0.0.0.0 - Production on Thu Jan 12 08:24:53 2023
Version 19.17.0.0.0
 
Copyright (c) 1982, 2022, Oracle.  All rights reserved.
 
 
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.17.0.0.0
 
SQL> ADMINISTER KEY MANAGEMENT CREATE KEYSTORE '/u01/tde_wallets/ORCL/tde' IDENTIFIED BY "SuperSecretPassword";
 
keystore altered.
 
SQL> ADMINISTER KEY MANAGEMENT SET KEYSTORE OPEN IDENTIFIED BY "SuperSecretPassword";
 
keystore altered.
 
SQL> ADMINISTER KEY MANAGEMENT SET KEY IDENTIFIED BY "SuperSecretPassword" with backup;
 
keystore altered.
 
SQL> ADMINISTER KEY MANAGEMENT CREATE AUTO_LOGIN KEYSTORE FROM KEYSTORE '/u01/tde_wallets/ORCL/tde' IDENTIFIED BY "SuperSecretPassword";
 
keystore altered.
 
SQL> exit
Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.17.0.0.0
[oracle@dbhost01:ORCL]/u01/tde_wallets/ORCL/tde: srvctl stop database -db ORCL
[oracle@dbhost01:ORCL]/u01/tde_wallets/ORCL/tde: srvctl start database -db ORCL
[oracle@dbhost01:ORCL]/u01/tde_wallets/ORCL/tde: sqlplus / as sysdba
SQL*Plus: Release 19.0.0.0.0 - Production on Thu Jan 12 08:28:03 2023
Version 19.17.0.0.0
 
Copyright (c) 1982, 2022, Oracle.  All rights reserved.
 
 
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.17.0.0.0
 
SQL> select * from gv$encryption_wallet;
 
   INST_ID WRL_TYPE
---------- --------------------
WRL_PARAMETER
--------------------------------------------------------------------------------
STATUS                         WALLET_TYPE          WALLET_OR KEYSTORE FULLY_BAC
------------------------------ -------------------- --------- -------- ---------
    CON_ID
----------
         2 FILE
/u01/tde_wallets/ORCL/tde/
OPEN                           AUTOLOGIN            SINGLE    NONE     NO
         1
 
 
         1 FILE
/u01/tde_wallets/ORCL/tde/
OPEN                           AUTOLOGIN            SINGLE    NONE     NO
         1
 
 
2 rows selected.
 
SQL> select inst_id, key_id, creation_time, CREATOR_DBNAME from gv$encryption_keys;
 
   INST_ID
----------
KEY_ID
------------------------------------------------------------------------------
CREATION_TIME
---------------------------------------------------------------------------
CREATOR_DBNAME
--------------------------------------------------------------------------------
         1
AUC/+0hT/E9pv4p+f5I7KQcAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
12-JAN-23 09.25.39.988752 AM +02:00
ORCL
 
         2
AUC/+0hT/E9pv4p+f5I7KQcAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
12-JAN-23 09.25.39.988752 AM +02:00
ORCL
 
 
SQL>
Open Wallet in Pluggable Database and Create/Set Masterkey

For this to work, you have to have the CDB Steps completed. Thus, a functioning Wallet already has to exist
Change into Pluggable Database and show Wallet Status

SQL> show pdbs
 
    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 ORCL_P01                 READ WRITE NO
SQL> alter session set container=ORCL_P01;
 
Session altered.
 
SQL> SELECT wrl_parameter, status, wallet_type FROM v$encryption_wallet;
 
WRL_PARAMETER
--------------------------------------------------------------------------------
STATUS                         WALLET_TYPE
------------------------------ --------------------
 
OPEN_NO_MASTER_KEY             AUTOLOGIN

If Wallet is Closed, open in CDB.

Create Masterkey in Pluggable Database and Check Key
[oracle@dbhost01:ORCL]/DBA/users/oracle: sqlplus / as sysdba
SQL*Plus: Release 19.0.0.0.0 - Production on Wed Jan 18 14:24:44 2023
Version 19.17.0.0.0
 
Copyright (c) 1982, 2022, Oracle.  All rights reserved.
 
 
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.17.0.0.0
 
SQL> alter session set container=ORCL_P01;
 
Session altered.
 
SQL> ADMINISTER KEY MANAGEMENT SET KEY USING TAG 'ORCL_P01' FORCE KEYSTORE IDENTIFIED BY "SuperSecretPassword" WITH BACKUP;
 
keystore altered.
 
SQL> SELECT wrl_parameter, status, wallet_type FROM v$encryption_wallet;
 
WRL_PARAMETER
--------------------------------------------------------------------------------
STATUS                         WALLET_TYPE
------------------------------ --------------------
 
OPEN                           AUTOLOGIN
 
 
SQL> set line window
SQL> set pagesize 50
SQL> select key_id, con_id, CREATOR_PDBNAME from gv$encryption_keys order by con_id,key_id;
 
KEY_ID                                                                             CON_ID
------------------------------------------------------------------------------ ----------
CREATOR_PDBNAME
--------------------------------------------------------------------------------------------------------------------------------
AefCYCk3qk+Hv+MPk9F/dJgAAAAAAAAAAAAAAAAAAAAAAAAAAAAA                                    3
ORCL_P01
 
AefCYCk3qk+Hv+MPk9F/dJgAAAAAAAAAAAAAAAAAAAAAAAAAAAAA                                    3
ORCL_P01
 
SQL>

Various Commands for Wallet Handling and Troubleshooting

Possible Troubleshooting for „Wallet not open“ Error

Various Steps can now be taken to check for the most common reasons that lead this error. I give you here a small checklist and the necessary commands to check your wallet.

  • Check Wallet Parameters and Keys
  • Check if Wallet and Autologin Wallet are existing and have correct Permissions
  • Try to show contents of Wallet as OS User oracle

Check Wallet Parameters and Keys

Here’s an example output of a correct open autologin wallet with various keys

[oracle@dbhost01:ORCL]/DBA/users/oracle: sqlplus / as sysdba
SQL*Plus: Release 19.0.0.0.0 - Production on Wed Jan 18 16:16:22 2023
Version 19.17.0.0.0
 
Copyright (c) 1982, 2022, Oracle.  All rights reserved.
 
 
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.17.0.0.0
 
SQL> set line 400
SQL> set pagesize 50
SQL> col STATUS for a20
SQL> col WRL_PARAMETER for a100
SQL> col STATUS for a20
SQL> col WALLET_TYPE for a10
SQL> select INST_ID,WRL_PARAMETER,STATUS, WALLET_TYPE, con_id from gv$encryption_wallet;
 
   INST_ID WRL_PARAMETER                                                                                        STATUS               WALLET_TYP     CON_ID
---------- ---------------------------------------------------------------------------------------------------- -------------------- ---------- ----------
         1 /u01/tde_wallets/ORCL/tde/                                                                     OPEN                 AUTOLOGIN           1
         1                                                                                                      OPEN                 AUTOLOGIN           2
         1                                                                                                      OPEN                 AUTOLOGIN           3
         2 /u01/tde_wallets/ORCL/tde/                                                                     OPEN                 AUTOLOGIN           1
         2                                                                                                      OPEN                 AUTOLOGIN           2
         2                                                                                                      OPEN                 AUTOLOGIN           3
 
6 rows selected.
 
SQL> select key_id, con_id, CREATOR_PDBNAME from gv$encryption_keys order by con_id,key_id;
 
KEY_ID                                                                             CON_ID CREATOR_PDBNAME
------------------------------------------------------------------------------ ---------- --------------------------------------------------------------------------------------------------------------------------------
AUC/+0hT/E9pv4p+f5I7KQcAAAAAAAAAAAAAAAAAAAAAAAAAAAAA                                    1 CDB$ROOT
AUC/+0hT/E9pv4p+f5I7KQcAAAAAAAAAAAAAAAAAAAAAAAAAAAAA                                    1 CDB$ROOT
AefCYCk3qk+Hv+MPk9F/dJgAAAAAAAAAAAAAAAAAAAAAAAAAAAAA                                    3 ORCL_P01
AefCYCk3qk+Hv+MPk9F/dJgAAAAAAAAAAAAAAAAAAAAAAAAAAAAA                                    3 ORCL_P01
 
SQL>

Check if Wallet and Autologin Wallet are existing and have correct Permissions

Files ewallet.p12 for Wallet and also cwallet.sso for autologin Wallet should be present in the Wallet directory. Owner oracle, permissions rw for Oracle and rwx on the directory

[oracle@dbhost01:ORCL]/DBA/users/oracle: ls -la /u01/tde_wallets/ORCL/tde/
total 136
drwxrwxr-x 2 oracle dba 20480 Jan 18 14:25 .
drwxrwxr-x 3 oracle dba 20480 Jan 11 15:20 ..
-rw------- 1 oracle dba  5704 Jan 18 14:25 cwallet.sso
-rw------- 1 oracle dba  2555 Jan 12 08:25 ewallet_2023011207253993.p12
-rw------- 1 oracle dba  3995 Jan 18 14:25 ewallet_2023011813250035.p12
-rw------- 1 oracle dba  5659 Jan 18 14:25 ewallet.p12

The files ewallet_BunchOfNumbers.p12 are Backups created by „ADMINISTER KEYSTORE“ Commands. Better keep them also and don’t delete them.

Try to show contents of Wallet as OS User oracle

You can show the Wallet contents with the following command as OS User „oracle“.

If a valid Autologin Wallet is missing, you will be prompted for the Wallet password.

[oracle@dbhost01:ORCL]/DBA/users/oracle: cd /u01/tde_wallets/ORCL/tde/
[oracle@dbhost01:ORCL]/u01/tde_wallets/ORCL/tde: orapki wallet display -wallet .
Oracle PKI Tool Release 19.0.0.0.0 - Production
Version 19.4.0.0.0
Copyright (c) 2004, 2022, Oracle and/or its affiliates. All rights reserved.
 
Requested Certificates:
Subject:        CN=oracle
User Certificates:
Oracle Secret Store entries:
ORACLE.SECURITY.DB.ENCRYPTION.AefCYCk3qk+Hv+MPk9F/dJgAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
ORACLE.SECURITY.DB.ENCRYPTION.AUC/+0hT/E9pv4p+f5I7KQcAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
ORACLE.SECURITY.DB.ENCRYPTION.MASTERKEY
ORACLE.SECURITY.DB.ENCRYPTION.MASTERKEY.E04A6D293EC456C4E0539618A10A9767
ORACLE.SECURITY.ID.ENCRYPTION.
ORACLE.SECURITY.KB.ENCRYPTION.
ORACLE.SECURITY.KM.ENCRYPTION.AefCYCk3qk+Hv+MPk9F/dJgAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
ORACLE.SECURITY.KM.ENCRYPTION.AUC/+0hT/E9pv4p+f5I7KQcAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
ORACLE.SECURITY.KT.ENCRYPTION.AefCYCk3qk+Hv+MPk9F/dJgAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Trusted Certificates:
[oracle@dbhost01:ORCL]/u01/tde_wallets/ORCL/tde:

Be aware: The Wallet must only contain TDE Keys in the „Oracle Secret Store entries“.

If there are other types of contents in the Wallet, this could also lead to the Error „Wallet not open“, because then the Database will not be able to open the TDE Wallet.

Things that happen on christmas morning

Hello friends of the Oracle Database Administration and Troubleshooting,

here’s another story for you. Once upon a time on christmas day (that would be December the 25th) I got a call from work because I was the Oracle DBA On-Call that day. Yay.

Usually On-Call is pretty calm with no calls on most days, but this christmas we had an unfortunate coincidence, that the Unix Patch Days in our company fell right upon Christmas, so I got a nice little Christmas Present which was a call, because on one database server the Oracle Grid/ASM Software didn’t start correctly.

So, my Christmas Present was to investigate as to why this happened. And boy did I have fun, as this was a mixture of something quite unusual and later on a small human error which lead to a second error that came up after I found and corrected the first one.

So after checking that some processes like the HAS Daemon where running, I took a look into the Alert Log of the ASM Instance and I found some quite interesting Error Messages in between some status messages.

2022-12-25T06:38:21.239192+01:00
Errors in file /u01/oracle/diag/asm/+asm/+ASM/trace/+ASM_rbal_11371.trc:
ORA-15183: ASMLIB initialization error [driver/agent not installed]
2022-12-25T06:38:21.239280+01:00
WARNING: FAILED to load library: /opt/oracle/extapi/64/asm/orcl/1/libasm.so

SQL> ALTER DISKGROUP ALL MOUNT /* asm agent call crs // {0:0:208} */
2022-12-25T06:38:21.326369+01:00
NOTE: Diskgroups listed in ASM_DISKGROUP are
ASMFR1

ASMDG1

2022-12-25T06:38:21.327622+01:00
NOTE: ASM instance +ASM is discoverable by local clients on node dbhost01
2022-12-25T06:38:21.336612+01:00
NOTE: cache registered group ASMDG1 1/0x41816392
NOTE: cache began mount (first) of group ASMDG1 1/0x41816392
NOTE: cache registered group ASMFR1 2/0x41816393
NOTE: cache began mount (first) of group ASMFR1 2/0x41816393
2022-12-25T06:38:21.371829+01:00
ERROR: no read quorum in group: required 2, found 0 disks
2022-12-25T06:38:21.372163+01:00
NOTE: cache dismounting (clean) group 1/0x41816392 (ASMDG1)
NOTE: messaging CKPT to quiesce pins Unix process pid: 11383, image: oracle@dbhost01 (TNS V1-V3)
NOTE: dbwr not being msg'd to dismount
NOTE: LGWR not being messaged to dismount
NOTE: cache dismounted group 1/0x41816392 (ASMDG1)
NOTE: cache ending mount (fail) of group ASMDG1 number=1 incarn=0x41816392
NOTE: cache deleting context for group ASMDG1 1/0x41816392
2022-12-25T06:38:21.408229+01:00
GMON dismounting group 1 at 2 for pid 24, osid 11383
2022-12-25T06:38:21.409859+01:00
ERROR: diskgroup ASMDG1 was not mounted

So, okay. Failed to load library. Let’s check if the library exists and how the permissions are set:

[oracle@dbhost01:+ASM]/home/oracle: ls -la /opt/oracle/extapi/64/asm/orcl/1/libasm.so
-rwxr-xr-x 1 root root 32992 Feb 25 2020 /opt/oracle/extapi/64/asm/orcl/1/libasm.so
[oracle@dbhost01:+ASM]/home/oracle: ldd /opt/oracle/extapi/64/asm/orcl/1/libasm.so
linux-vdso.so.1 (0x00007fff144c5000)
libc.so.6 => /lib64/libc.so.6 (0x00007f3a0c3fe000)
/lib64/ld-linux-x86-64.so.2 (0x00007f3a0c9cc000)
[oracle@dbhost01:+ASM]/home/oracle:

Well, this looks quite good. So, what else could it be?
I had no idea, so I asked the well known Senior DBA (okay…I typed the error into a well known search engine).
The results haven’t been good. Well, yes. I obviously checked file permissions already, but then there was a hint towards the oracleasm tool to show the actual configuration. So I did:

[oracle@dbhost01:+ASM]/home/oracle: /usr/sbin/oracleasm configure
ORACLEASM_ENABLED=true
ORACLEASM_UID=
ORACLEASM_GID=
ORACLEASM_SCANBOOT=true
ORACLEASM_SCANORDER="dm"
ORACLEASM_SCANEXCLUDE="sd"
ORACLEASM_SCAN_DIRECTORIES=""
ORACLEASM_USE_LOGICAL_BLOCK_SIZE="false"
[oracle@dbhost01:+ASM]/home/oracle:

So, let’s compare this with another ASM installation.

[oracle@dbhost02:+ASM]/home/oracle: /usr/sbin/oracleasm configure
ORACLEASM_ENABLED=true
ORACLEASM_UID=5000
ORACLEASM_GID=300
ORACLEASM_SCANBOOT=true
ORACLEASM_SCANORDER="dm"
ORACLEASM_SCANEXCLUDE="sd"
ORACLEASM_SCAN_DIRECTORIES=""
ORACLEASM_USE_LOGICAL_BLOCK_SIZE="false"

Well, there is a slight difference between these two. On the second server UID and GID have been set to the appropriate values for the oracle user and dba group.
So, by calling the command „/usr/sbin/oracleasm configure -i“ I was able to set the needed values.
ASM was now able to start and find all necessary devices. A quick reboot of the server to check wether this is reboot-proof showed the ASM looked good.

But now we had another problem, because the Database did not want to start. So, let’s take a look at what’s going on there.

The Alert Log showed the following errors:

Sys-V shared memory will be used for creating SGA


2022-12-25T06:57:35.084474+01:00
Error: ENOSPC while creating shared memory segment of size 23555211264 bytes
Check the system shared memory parameters
2022-12-25T06:57:35.084533+01:00
Error: ENOSPC while creating shared memory segment of size 23555211264 bytes
Check the system shared memory parameters
.
.
.
2022-12-25T06:57:35.084803+01:00
PAGESIZE AVAILABLE_PAGES EXPECTED_PAGES ALLOCATED_PAGES ERROR(s)
2022-12-25T06:57:35.084830+01:00
4K Configured 14 0 ORA-27125
2022-12-25T06:57:35.084874+01:00
2048K 11265 11265 5 ORA-27125
2022-12-25T06:57:35.084916+01:00

2022-12-25T06:57:35.084958+01:00
SGA: Realm creation failed

Okay. What is going on here? Aren’t these enough hugepages?

[oracle@dbhost01:ORCL]/u01/oracle/ORCL/logs: cat /proc/meminfo | grep Huge
AnonHugePages: 784384 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 11275
HugePages_Free: 31
HugePages_Rsvd: 21
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 23091200 kB

Well…something is using the databases hugepages.
Let’s check the parameters.
The Databases Alert Log says, it want’s about 20G.

Starting ORACLE instance (normal) (OS id: 17486)
2022-12-25T07:09:20.609947+01:00


Instance SGA_TARGET = 22528 MB and SGA_MAX_SIZE = 22528 MB


That’s fine. One Hugepage is 2M. So we should have enough if we look at the values above.
There is no other database running on the machine, so where have they gone?

Well…not database exactly, but we have an ASM Instance.
Let’s check the SGA Parameter there:

SQL> show parameter sga

NAME TYPE VALUE


sga_max_size big integer 22G
sga_target big integer 22G

Wow…22G for an ASM Instance, that’s a lot. Let’s scale that down to 2G and restart the ASM.

alter system set sga_max_size=2G scope=spfile;
alter system set sga_target=2G scope=spfile;
shutdown immediate;
startup;

After succesful restart of the ASM instance we have now been able to restart the Database.

But how did this happen?
Well, as often is the case, human error. The longer you are working in IT the more often things like these will happen.
It was planned to change the parameter in the Database, but the DBA changed into the ASM environment by accident without really registering what they did and changed the parameters there. After restarting the database and seeing that seemingly nothing changed they changed the parameters again, this time being in the correct environment and thus having succes after a database restart.

So, this was fine until the Server rebooted and thus the ASM Instance restarted with a bigger SGA.

I hope you like these kind of stories as they might give you hints how to investigate similar problems.

Things that happen on a friday afternoon.

This is a story of wrongfully removed ASMLIB devices, a second listener network that won’t network and crashed RAC Instances that perhaps could have been prevented (you tell me)

So, I try to retell this story as best as I can and try to get the order of what happened right. This is to show you how I investigage problems and try to solve them. Are all of my solutions perfect? Absolutely not. Could we have done something different? I’m quiet sure of it. Ideas are welcome so I may learn something.

How it began

I got a call from a colleague who told me: „Markus, I firetrucked up, I removed the wrong block devices and I already delete the headers.“
Well, it turned out the removed and cleared devices belonging to an ASM Diskgroup called REDO2 with EXTERNAL REDUNDANCY.

This was none of my primary customers, but I was on troubleshooting duty this afternoon and so I looked at it.
After connecting to the server I found a running ASM on both nodes and running database instances.

The ASM Alertlog showed me the following (and this quiet often for three disks):

2022-10-21T15:12:29.252796+02:00
Errors in file /oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_26929.trc:
ORA-15025: could not open disk "/dev/oracleasm/disks/ASMLIB_REDO_5"
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
2022-10-21T15:12:29.252893+02:00
WARNING: Read Failed. group:5 disk:4 AU:12 offset:0 size:4096
path:Unknown disk
         incarnation:0xf0f07718 synchronous result:'I/O error'
         subsys:Unknown library krq:0x7fce78793a20 bufp:0x7fce72e8e000 osderr1:0x434c5344 osderr2:0x0
         IO elapsed time: 0 usec Time waited on I/O: 0 usec
WARNING: cache failed reading from group=5(REDO2) fn=8 blk=0 count=1 from disk=4 (ASMLIB_REDO_5) mirror=0 kfkist=0x20 status=0x02 osderr=0x434c5344 file=kfc.c line=13461
ERROR: cache failed to read group=5(REDO2) fn=8 blk=0 from disk(s): 4(ASMLIB_REDO_5)
ORA-15080: synchronous I/O operation failed to read block 0 of disk 4 in disk group REDO2

So, the ASM was aware, that something strange is going on.
I saw that the Diskgroup REDO2 was kind of missing all of it’s disks.
Next, check the database instances which where still running

2022-10-21T15:15:40.789864+02:00
WARNING: Read Failed. group:5 disk:4 AU:364 offset:16384 size:16384
path:Unknown disk
         incarnation:0xf0f06811 synchronous result:'I/O error'
         subsys:Unknown library krq:0x7fc6dc3186d0 bufp:0x7fc6d6ce3000 osderr1:0x434c5344 osderr2:0x0
         IO elapsed time: 0 usec Time waited on I/O: 0 usec
WARNING: failed to read mirror side 1 of virtual extent 0 logical extent 0 of file 265 in group [5.574653800] from disk ASMLIB_REDO_5  allocation unit 364 reason error; if possible, will try another mirror side

So yeah, the databases also knew there was something going on, but have still been running.
So, what data was on this diskgroup? The Diskgroup Name lead me to believe, that it should have been Redologs.
But, guessing is not knowing, so I checked:

select * from ( select name from v$controlfile union select member from v$logfile union select name from v$datafile union select name from v$tempfile ) where name like '+REDO%'; +REDO1/ORCL/CONTROLFILE/current.265.970226155 +REDO1/ORCL/ONLINELOG/group_1.264.1118685065 +REDO1/ORCL/ONLINELOG/group_2.258.1118685067 +REDO1/ORCL/ONLINELOG/group_3.257.1118685127 +REDO1/ORCL/ONLINELOG/group_4.263.1118685069 +REDO1/ORCL/ONLINELOG/group_5.260.1118685123 +REDO1/ORCL/ONLINELOG/group_6.259.1118685137 +REDO1/ORCL/ONLINELOG/group_7.262.1118685071 +REDO1/ORCL/ONLINELOG/group_8.261.1118685071 +REDO2/ORCL/CONTROLFILE/current.256.1118683905 +REDO2/ORCL/ONLINELOG/group_1.257.1118685067 +REDO2/ORCL/ONLINELOG/group_2.258.1118685067 +REDO2/ORCL/ONLINELOG/group_3.263.1118685129 +REDO2/ORCL/ONLINELOG/group_4.259.1118685069 +REDO2/ORCL/ONLINELOG/group_5.262.1118685123 +REDO2/ORCL/ONLINELOG/group_6.264.1118685139 +REDO2/ORCL/ONLINELOG/group_7.260.1118685071 +REDO2/ORCL/ONLINELOG/group_8.261.1118685073

Okay, Redologs and Controlfiles. The same amount on both REDO1 and REDO2 Diskgroup. Let’s check that out.


                                TYPE        VALUE
------------------------------------ ----------- ------------------------------
control_files                        string     +REDO1/ORCL/CONTROLFILE/cur
                                                 rent.265.970226155, +REDO2/ORC
                                                 L/CONTROLFILE/current.256.1118
                                                 683905

Great, the controlfile really is a mirror. We can work with that. Now check the Redologs

GROUP# THREAD# MEMBERS ---------- ---------- ---------- 1 1 2 2 1 2 3 1 2 4 1 2 5 2 2 6 2 2 7 2 2 8 2 2

Okay. Eight Groups, two threads and two members. So we are at 16 Redologs, 8 per Diskgroup.

Next step?
Get Diskgroup REDO2 running again.

I wasn’t able to see the diskgroup correctly in our internal tooling, so I tried to mount it, which lead to the following:

2022-10-21T16:06:59.292016+02:00 ERROR: diskgroup REDO2 was not mounted ORA-15032: not all alterations performed ORA-15017: diskgroup "REDO2" cannot be mounted ORA-15040: diskgroup is incomplete

Okay, that was to be expected.

Then dismount. Drop, whatever. I did so and the database instances all crashed at once.
Remember, at this point the diskgroup was not accessible for the databases, files there couldn’t be accessed.
But now the instances are crashing. Okay Boomer!

Next, recreate the diskgroup:

create diskgroup REDO2 external redundancy disk '/dev/oracleasm/disks/ASMLIB_REDO_4' , '/dev/oracleasm/disks/ASMLIB_REDO_5', '/dev/oracleasm/disks/ASMLIB_REDO_6' attribute 'COMPATIBLE.ASM' ='12.2.0.1.0', 'COMPATIBLE.RDBMS'='10.1.0.0.0', 'AU_SIZE'='4M'

The diskgroup was succesfully created.

Next step, we know, the databases will be missing a controlfile, so we need to restart in NOMOUNT state, which lead to the following messages:

Errors in file /oracle/diag/rdbms/orcl/ORCL1/trace/ORCL1_ora_20996.trc (incident=624164): ORA-00600: internal error code, arguments: [ipc_initialize_3], [1], [16], [], [], [], [], [], [], [], [], [] Incident details in: /oracle/diag/rdbms/orcl/ORCL1/incident/incdir_624164/ORCL1_ora_20996_i624164.trc

FIRETRUCKING FANTASTIC

So, a search for the error showed, that perhaps I used the wrong OS User to start the database.
Nope, didn’t.
Then I read something of a Network not being correct?
Okay, what is going on with our networks? Let’s check the clusterware

[oracle@node1:+ASM1]/DBA/nest/oracle/ORCL1/logs: crsctl stat res -t -------------------------------------------------------------------------------- Name Target State Server State details -------------------------------------------------------------------------------- Local Resources -------------------------------------------------------------------------------- ora.LISTENER.lsnr ONLINE ONLINE node1 STABLE ONLINE ONLINE node2 STABLE ora.chad ONLINE ONLINE node1 STABLE ONLINE ONLINE node2 STABLE ora.net1.network ONLINE ONLINE node1 STABLE ONLINE ONLINE node2 STABLE ora.net2.network ONLINE OFFLINE node1 ONLINE OFFLINE node2 -------------------------------------------------------------------------------- Cluster Resources -------------------------------------------------------------------------------- ora.scan1.vip 1 ONLINE ONLINE node2 STABLE ora.scan2.vip 1 ONLINE ONLINE node1 STABLE ora.scan3.vip 1 ONLINE ONLINE node1 STABLE ora.node1.vip 1 ONLINE ONLINE node1 STABLE ora.node1_2.vip 1 ONLINE OFFLINE node1 ora.node2.vip 1 ONLINE ONLINE node2 STABLE ora.node2_2.vip 1 ONLINE OFFLINE node2 --------------------------------------------------------------------------------

Okay, I cropped the output a bit and left the interesing things there for you to see.
So, we have a second Network going on in our clusterware and the VIPs from this network are offline.
Let’s try to start them up.

CRS-5006: Unable to automatically select a network interface which has subnet mask 255.254.0.0 and subnet number 101.18.0.0

Say what now?
Okay, what interfaces are currently running on my machine?
I see various interfaces, but the interesting one is this:


2: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether xx:xx:xx:xx:xx:xx brd ff:ff:ff:ff:ff:ff
    inet 101.19.46.51/24 brd 101.19.46.255 scope global eth1
       valid_lft forever preferred_lft forever

You’Re asking why? Let me tell you.
Because this if we take a look at the network mentioned in our error message, this IP Adress falls into the range.
Just to let you know, the network with subnet number 101.18.0.0 and subnet mask 255.254.0.0 has the following range:

Host-IPs from: 101.18.0.1 to: 101.19.255.254

Okay…was there some kind of reconfiguration in the past? I don’t know, but let’s corect this one.
So, per documentation, the VIPs should have the following IP Adresses: 101.19.23.115 and 101.19.23.116

Let’s kick them out and create them anew

srvctl remove vip -vip node1_2
srvctl remove vip -vip node2_2
srvctl add vip -node node1 -address 101.19.23.115/255.255.255.0 -netnum 2
srvctl add vip -node node2 -address 101.19.23.116/255.255.255.0 -netnum 2

How is the current configuration in the clusterware?

srvctl config vip -vip node2_2
VIP exists: network number 2, hosting node node2
VIP IPv4 Address: 101.19.23.116
VIP IPv6 Address:
VIP is enabled.
VIP is individually enabled on nodes:
VIP is individually disabled on nodes:
srvctl config vip -vip node1_2
VIP exists: network number 2, hosting node node1
VIP IPv4 Address: 101.19.23.115
VIP IPv6 Address:
VIP is enabled.
VIP is individually enabled on nodes:
VIP is individually disabled on nodes:

Okay. Now it’s time to try and start one instance.

srvctl start instance -db ORCL -instance ORCL1 -startoption nomount

That worked and next let’s start rman and restore the REDO2 controlfile from REDO1

rman target /
restore controlfile from '+REDO1/ORCL/CONTROLFILE/current.265.970226155';
alter database mount;
alter database open;
exit

The last thing to do is clear the redo logs, just for funs sake.

Startup sqlplus and hit it:

ALTER DATABASE CLEAR LOGFILE GROUP 1;
ALTER DATABASE CLEAR LOGFILE GROUP 2;
ALTER DATABASE CLEAR LOGFILE GROUP 3;
ALTER DATABASE CLEAR LOGFILE GROUP 4;
ALTER DATABASE CLEAR LOGFILE GROUP 5;
ALTER DATABASE CLEAR LOGFILE GROUP 6;
ALTER DATABASE CLEAR LOGFILE GROUP 7;
ALTER DATABASE CLEAR LOGFILE GROUP 8;

If a redo log is still in use, set a checkpoint, switch the logfile an try again

alter system checkpoint;
alter system switch logfile;

And now you are back online from a wonderful journey through multiple errors.

Could I have done something better? Let me know.

Shrink your Tablespace

Hello fellow DBAs. As you all are probably aware, sometimes we need to shrink Tablespaces and we don’t have a fancy graphical doodad at hand all the time, so we need to get this done with our trusted command line SQL*Plus and doing so without using trial and error to find the maximum possible size, because there ususally is data somewhere in theses files.

If you try to make it too small, you’ll get the following fancy error:

ORA-03297: file contains used data beyond requested RESIZE value

To get around this, I found a script written by Franck Pachot who worked for DBI Services back in the day. (Click here for his original script)

Though I liked this script, I had two, let’s call it personal issues with it. First, where was the problem, that sometimes I got the ORA-03297 despite using the script output on databases which had a current workload and continously changes going on. Second, if autoextend is turned off for a datafile it would only give you a comment, not the resize command. Third, sometimes if you were able to reclaim really really much from a file it would give you only „#####“ instead of the actual number (this is only a beauty issue). So, here is Francks script with the few amendments for my daily work.

set linesize 1000 pagesize 0 feedback off trimspool on
with
 hwm as (
  -- get highest block id from each datafiles ( from x$ktfbue as we don't need all joins from dba_extents )
  select /*+ materialize */ ktfbuesegtsn ts#,ktfbuefno relative_fno,max(ktfbuebno+ktfbueblks-1) hwm_blocks
  from sys.x$ktfbue group by ktfbuefno,ktfbuesegtsn
 ),
 hwmts as (
  -- join ts# with tablespace_name
  select name tablespace_name,relative_fno,hwm_blocks
  from hwm join v$tablespace using(ts#)
 ),
 hwmdf as (
  -- join with datafiles, put 5M minimum for datafiles with no extents
  select file_name,nvl(hwm_blocks*(bytes/blocks),5*1024*1024) hwm_bytes,bytes,autoextensible,maxbytes
  from hwmts right join dba_data_files using(tablespace_name,relative_fno)
 )
select
  '/* reclaim '||to_char(ceil((bytes-hwm_bytes-102400)/1024/1024),999999999)
   ||'M from '||to_char(ceil(bytes/1024/1024),999999999)||'M */ '
   ||'alter database datafile '''||file_name||''' resize '||(ceil(hwm_bytes/1024/1024)+100)||'M;'
from hwmdf
where
 bytes-hwm_bytes-102400>1024*1024 -- resize only if at least 1MB can be reclaimed
order by bytes-hwm_bytes desc
/