73

ORA-04021: timeout occurred while waiting to lock object的解决办法

 5 years ago
source link: http://www.dboracle.com/archivers/ora-04021-timeout-occurred-while-waiting-to-lock-object的解决办法.html?amp%3Butm_medium=referral
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

版权声明:本文为Buddy Yuan原创文章,未经允许不得转载。原文地址: http://www.dboracle.com/archivers/ORA-04021: TIMEOUT OCCURRED WHILE WAITING TO LOCK OBJECT的解决办法.html

国庆都放假了,咱们还处理了一起小故障。先说说这个故障吧,Acticve DataGuard突然实例宕机。通过查看Alert的Trace发现是LGWR进程把实例宕掉了。具体alert trace如下:

Mon Oct 01 00:15:49 2018
Media Recovery Waiting for thread 2 sequence 309643 (in transit)
Recovery of Online Redo Log: Thread 2 Group 64 Seq 309643 Reading mem 0
  Mem# 0: +DG_DATA/dgskgj/onlinelog/group_64.807.942196417
  Mem# 1: +DG_DATA/dgskgj/onlinelog/group_64.717.936131839
Mon Oct 01 00:15:54 2018
Archived Log entry 186436 added for thread 2 sequence 309642 ID 0x2f689337 dest 1:
Mon Oct 01 00:16:17 2018
Errors in file /oracle/app/product/diag/rdbms/dgskgj/skgj1/trace/skgj_lgwr_14418510.trc:
ORA-04021: timeout occurred while waiting to lock object 
LGWR (ospid: 14418510): terminating the instance due to error 4021
Mon Oct 01 00:16:18 2018
System state dump requested by (instance=1, osid=14418510 (LGWR)), summary=[abnormal instance termination].

可以看到这里首先出现了ORA-04021: timeout occurred while waiting to lock object,紧接着LGWR就terminating实例。所以我们在这里要先看一下LGWR的Trace情况。

Trace file /oracle/app/product/diag/rdbms/dgskgj/skgj1/trace/skgj1_lgwr_14418510.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/app/product/11.2/db
System name:    AIX
Node name:      drskgj1
Release:        1
Version:        6
Machine:        00F96EC64C00
Instance name: skgj1
Redo thread mounted by this instance: 1
Oracle process number: 23
Unix process pid: 14418510, image: oracle@drskgj1 (LGWR)


*** 2018-10-01 00:16:17.890
*** SESSION ID:(4348.1) 2018-10-01 00:16:17.890
*** CLIENT ID:() 2018-10-01 00:16:17.890
*** SERVICE NAME:(SYS$BACKGROUND) 2018-10-01 00:16:17.890
*** MODULE NAME:() 2018-10-01 00:16:17.890
*** ACTION NAME:() 2018-10-01 00:16:17.890

*** TRACE FILE RECREATED AFTER BEING REMOVED ***

error 4021 detected in background process
ORA-04021: timeout occurred while waiting to lock object 
kjzduptcctx: Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+240<-kjzdssdmp()+240<-kjzduptcctx()+228<-kjzdicrshnfy()+120<-ksuitm()+5136<-ksbrdp()+4696<-opirip()+1620<-opidrv()+608<-sou2o()+136<-opimai_real()+188<-ssthrdmain()+276<-main()+204<-__start()+112 
----- End of Abridged Call Stack Trace -----

*** 2018-10-01 00:16:17.896
LGWR (ospid: 14418510): terminating the instance due to error 4021
ksuitm: waiting up to [5] seconds before killing DIAG(15991690)

这里可以看到LGWR给的信息也很有限,就是给了一串堆栈的信息,也没有什么特别的其他信息。所以这种情况下,我们就只能先借助MOS查一下ORA-04021: timeout occurred while waiting to lock object是什么问题。通过搜索我们发现文档ORA-04021: timeout occurred while waiting to lock object : DR Instance terminated by LGWR (文档 ID 2183882.1)和我们遇到的问题是一致的。首先怎么判断问题是一致的呢?第一,alter日志报错的方式相同,第二,lgwr的trace堆栈是一样的。那么这篇告诉我们,我们命中了Bug 16717701 – ADG SHOULD GET THE INSTANCE PARSE LOCK WITH A TIMEOUT或者是Bug 11712267 – ACTIVE DATA GUARD DATABASE HUNG ON ‘LIBRARY CACHE: MUTEX X’ WAIT EVENT。

该问题的原因是当通过ADG中的恢复,LGWR将DB INSTANCE状态对象锁定为独占模式。这样的结果是LGWR可以阻止SQL的解析,而SQL的解析也能阻止LGWR。这是非常糟糕的行为。我们可以通过查看下列视图查询这个行为。

SQL> select a.*,b.name from v$sesstat a , v$statname b
  2  where a.statistic#=b.statistic# 
  3  and a.sid=(select distinct sid from v$mystat)
  4  and b.name like '%parse%';
       SID STATISTIC#	   VALUE NAME
---------- ---------- ---------- ------------------------------
       117	  264	       0 ADG parselock X get attempts
       117	  265	       0 ADG parselock X get successes

所以防止这个问题的办法就是
1.先尝试使用选项步骤1,将cursor_sharing更改为force。减少SQL解析的时间。
2.如果再次发生该问题,将隐含参数”_adg_parselock_timeout”设置成500。这个隐含参数是可以动态修改的。这个参数是防止超时的。


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK