77

又遇ORA-04021: timeout occurred while waiting to lock object ,这次是DBRM进程

 5 years ago
source link: http://www.dboracle.com/archivers/又遇ora-04021-timeout-occurred-while-waiting-to-lock-object-,这次是dbrm进程.html?amp%3Butm_medium=referral
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

版权声明:本文为Buddy Yuan原创文章,未经允许不得转载。原文地址: 又遇ORA-04021: timeout occurred while waiting to lock object,这次是DBRM进程

第二次在ADG上遇到这个错误了,上一次遇到这个错误的传送门,但是这一次的错误并不是LGWR进程引起的,这一次是另外一套11.2.0.4的DG备库,是DBRM进程导致的实例宕机。我们先来看下Trace。

Errors in file /oracle/app/product/diag/rdbms/dghbyydba/hbyydba2/trace/hbyydba2_dbrm_12256282.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04021: timeout occurred while waiting to lock object 
DBRM (ospid: 12256282): terminating the instance due to error 604
Wed Oct 10 01:20:08 2018
System state dump requested by (instance=2, osid=12256282 (DBRM)), summary=[abnormal instance termination].
System State dumped to trace file /oracle/app/product/diag/rdbms/dghbyydba/hbyydba2/trace/hbyydba2_diag_11470468_20181010012008.trc
Dumping diagnostic data in directory=[cdmp_20181010012008], requested by (instance=2, osid=12256282 (DBRM)), summary=[abnormal instance termination].
Instance terminated by DBRM, pid = 12256282

Alert的trace报错很清楚,是DBRM进程最后终止了实例,我们在看一下DBRM进程的Trace。

*** 2018-10-10 00:35:41.489
PQQ: Active Services changed
PQQ: Old service table
SvcIdx  SvcId Active ActDop
PQQ: New service table
SvcIdx  SvcId Active ActDop
     1      1      1      0
     2      2      1      0
error 604 detected in background process

*** 2018-10-10 01:20:08.823
ORA-00604: error occurred at recursive SQL level 1
ORA-04021: timeout occurred while waiting to lock object
kjzduptcctx: Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+240<-kjzdssdmp()+240<-kjzduptcctx()+228<-kjzdicrshnfy()+120<-ksuitm()+1532<-ksbrdp()+4696<-opirip()+1620<-opidrv()+608<-sou2o()+136<-opimai_real()+188<-ssthrdmain()+276<-main()+20
4<-__start()+112
----- End of Abridged Call Stack Trace -----

*** 2018-10-10 01:20:08.824
DBRM (ospid: 12256282): terminating the instance due to error 604
ksuitm: waiting up to [5] seconds before killing DIAG(11470468)

这里有堆栈信息,根据上述错误,我们在MOS中很快就定位了这个问题。根据文档: Bug 18101696 Database Resource Manager Crashes Standby Instance with ORA-604 / ORA-1489 After Turning Apply On。 文档中描述下列信息。我们这里报了ORA-00604的递归错误,同时我们的堆栈也包含了” kjzdicrshnfy<-ksuitm<-ksbrdp “这三个。

Problem Description:
-------------------------
Database Resource Manager (DBRM) terminating with:
 
      ORA-604: error occurred at recursive SQL level 1
      ORA-1489: result of string concatenation is too long
Instance terrmination is likely to follow. The call stack would include:

kjzdicrshnfy<-ksuitm<-ksbrdp

但是疑问就是我们这里没有报ORA-1489,而是报了ORA-04021: timeout occurred while waiting to lock object,其实这个我们在上一篇文章介绍过,当通过ADG中的恢复,LGWR将DB INSTANCE状态对象锁定为独占模式。这样的结果是LGWR可以阻止SQL的解析,而SQL的解析也能阻止LGWR。而这里不是LGWR进程而是DBRM进程,这说明了另外一点,DBRM进程也会将一些对象锁定,从而导致超时问题。针对这个超时问题,还是建议将隐含参数”_adg_parselock_timeout”设置成500。

因此,我认为这个问题的终极solution就是:

1、设置参数”_adg_parselock_timeout”,防止lock object而导致的超时。

2、安装补丁25188350,阻止DBRM进程运行调用SYS.DBMS_RMIN”出现的递归错误引起的ORA-604。


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK