又遇ORA-04021: timeout occurred while waiting to lock object ,这次是DBRM进程

第二次在ADG上遇到这个错误了,上一次遇到这个错误的传送门，但是这一次的错误并不是LGWR进程引起的，这一次是另外一套11.2.0.4的DG备库，是DBRM进程导致的实例宕机。我们先来看下Trace。

Errors in file /oracle/app/product/diag/rdbms/dghbyydba/hbyydba2/trace/hbyydba2_dbrm_12256282.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-04021: timeout occurred while waiting to lock object 
DBRM (ospid: 12256282): terminating the instance due to error 604
Wed Oct 10 01:20:08 2018
System state dump requested by (instance=2, osid=12256282 (DBRM)), summary=[abnormal instance termination].
System State dumped to trace file /oracle/app/product/diag/rdbms/dghbyydba/hbyydba2/trace/hbyydba2_diag_11470468_20181010012008.trc
Dumping diagnostic data in directory=[cdmp_20181010012008], requested by (instance=2, osid=12256282 (DBRM)), summary=[abnormal instance termination].
Instance terminated by DBRM, pid = 12256282

Alert的trace报错很清楚，是DBRM进程最后终止了实例，我们在看一下DBRM进程的Trace。

*** 2018-10-10 00:35:41.489
PQQ: Active Services changed
PQQ: Old service table
SvcIdx  SvcId Active ActDop
PQQ: New service table
SvcIdx  SvcId Active ActDop
     1      1      1      0
     2      2      1      0
error 604 detected in background process

*** 2018-10-10 01:20:08.823
ORA-00604: error occurred at recursive SQL level 1
ORA-04021: timeout occurred while waiting to lock object
kjzduptcctx: Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+240<-kjzdssdmp()+240<-kjzduptcctx()+228<-kjzdicrshnfy()+120<-ksuitm()+1532<-ksbrdp()+4696<-opirip()+1620<-opidrv()+608<-sou2o()+136<-opimai_real()+188<-ssthrdmain()+276<-main()+20
4<-__start()+112
----- End of Abridged Call Stack Trace -----

*** 2018-10-10 01:20:08.824
DBRM (ospid: 12256282): terminating the instance due to error 604
ksuitm: waiting up to [5] seconds before killing DIAG(11470468)

这里有堆栈信息，根据上述错误，我们在MOS中很快就定位了这个问题。根据文档： Bug 18101696 Database Resource Manager Crashes Standby Instance with ORA-604 / ORA-1489 After Turning Apply On。文档中描述下列信息。我们这里报了ORA-00604的递归错误，同时我们的堆栈也包含了” kjzdicrshnfy<-ksuitm<-ksbrdp “这三个。

Problem Description:
-------------------------
Database Resource Manager (DBRM) terminating with:
 
      ORA-604: error occurred at recursive SQL level 1
      ORA-1489: result of string concatenation is too long
Instance terrmination is likely to follow. The call stack would include:

kjzdicrshnfy<-ksuitm<-ksbrdp

但是疑问就是我们这里没有报ORA-1489，而是报了ORA-04021: timeout occurred while waiting to lock object，其实这个我们在上一篇文章介绍过，当通过ADG中的恢复，LGWR将DB INSTANCE状态对象锁定为独占模式。这样的结果是LGWR可以阻止SQL的解析，而SQL的解析也能阻止LGWR。而这里不是LGWR进程而是DBRM进程，这说明了另外一点，DBRM进程也会将一些对象锁定，从而导致超时问题。针对这个超时问题，还是建议将隐含参数”_adg_parselock_timeout”设置成500。

因此，我认为这个问题的终极solution就是：

1、设置参数”_adg_parselock_timeout”，防止lock object而导致的超时。

2、安装补丁25188350，阻止DBRM进程运行调用SYS.DBMS_RMIN”出现的递归错误引起的ORA-604。

Recommend

TypeScript 3.1增加可映射元组和数组类型

持续交付会如何影响测试

第一行代码：以太坊（2）-使用Solidity语言开发和测试智能合约

机器人操作系统来到Windows

记一次安全应急响应中遇到的利用SSH日志触发的后门分析

Go语言打造分布式Crontab 轻松搞定高性能任务调度高清无密

大数据推荐系统实时架构和离线架构

出于性能考虑，请别使用pip安装Tensorflow！

苹果和换机党们的「灰色产业」战争，从未停止过

GitHub - magit/libegit2: Emacs bindings for libgit2

About Joyk