3

ADG无法同步:TT00进程报错 Error 12514 - AlfredZhao

 1 year ago
source link: https://www.cnblogs.com/jyzhao/p/17072924.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

环境: Oracle 19.16 ADG (Single Instance -> RAC)
在配置ADG的场景,发现ADG不能同步。

1.查看报错信息

1)主库alert日志:

2023-01-29T11:18:41.489164+08:00
TT00 (PID:27840): Error 12514 received logging on to the standby
TT00 (PID:27840): Attempting LAD:2 network reconnect (12514)
TT00 (PID:27840): LAD:2 network reconnect abandoned
2023-01-29T11:18:41.489623+08:00
Errors in file /u01/app/oracle/diag/rdbms/demo/demo/trace/demo_tt00_27840.trc:
ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
TT00 (PID:27840): krsg_check_connection: Error 12514 connecting to standby 'demorac'

2)ADG配置链路的error信息:

SQL> select error from v$archive_dest where dest_id = 2;

ERROR
-----------------------------------------------------------------
ORA-12514: TNS:listener does not currently know of service
requested in connect descriptor

错误号一致,均为12514。

2.oerr查看该错误说明

[oracle@bogon 2023_01_28]$ oerr ora 12514
12514, 00000, "TNS:listener does not currently know of service requested in connect descriptor"
// *Cause:  The listener received a request to establish a connection to a
// database or other service. The connect descriptor received by the listener
// specified a service name for a service (usually a database service)
// that either has not yet dynamically registered with the listener or has
// not been statically configured for the listener.  This may be a temporary
// condition such as after the listener has started, but before the database
// instance has registered with the listener.
// *Action:
//  - Wait a moment and try to connect a second time.
//  - Check which services are currently known by the listener by executing:
//    lsnrctl services <listener name>
//  - Check that the SERVICE_NAME parameter in the connect descriptor of the
//    net service name used specifies a service known by the listener.
//  - If an easy connect naming connect identifier was used, check that
//    the service name specified is a service known by the listener.
//  - Check for an event in the listener.log file.

3.尝试sqlplus连接到standby

[oracle@bogon 2023_01_28]$ sqlplus sys/oracle@demorac as sysdba
...
ERROR:
ORA-12514: TNS:listener does not currently know of service requested in connect
descriptor

分别连接两个节点:

[oracle@bogon 2023_01_28]$ sqlplus sys/oracle@demorac1 as sysdba
...
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.16.0.0.0

SQL> 

[oracle@bogon 2023_01_28]$ sqlplus sys/oracle@demorac2 as sysdba
...
ERROR:
ORA-12514: TNS:listener does not currently know of service requested in connect
descriptor

4.尝试relocate监听

看来是因为RAC的实例2连接有问题?
嗯,现在实例2还不存在,那直接把scan配置到节点1机器上试试吧:

[grid@db01rac1 ~]$ srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is running on node db01rac2
[grid@db01rac1 ~]$ srvctl relocate scan_listener -scannumber 1 -node db01rac1
[grid@db01rac1 ~]$ srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is running on node db01rac1

结果依然报错,说明并不是relocate的问题。
其实在relocate之前,就发现scan在节点2时的状态有问题,没有监听到任何服务:

[grid@db01rac2 ~]$ lsnrctl status LISTENER_SCAN1

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 28-JAN-2023 18:54:31

Copyright (c) 1991, 2022, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Start Date                28-JAN-2023 18:53:51
Uptime                    0 days 0 hr. 0 min. 40 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/19.3.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/db01rac2/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.15)(PORT=1521)))
The listener supports no services
The command completed successfully

5.继续排查发现是参数问题

为什么LISTENER_SCAN1监听不到任何服务呢?尝试使用手工alter system register;也不行。
与scan有关的数据库配置就是那个remote参数了,难道设置有问题,oracle用户登陆到数据中查看确认:

SQL> show parameter remote_listener

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
remote_listener 		     string

为什么会是空呢?实际是因为该环境搭建ADG,主库是单实例,参数中没有设置remote_listener参数,备库是RAC,虽然配置了其他RAC独有参数,但漏掉了这个参数。
知道原因后,解决很简单,直接根据当前环境在备库配置正确的 remote_listener 参数值:

alter system set remote_listener='db01rac-scan:1521';

再次查看LISTENER_SCAN1监听状态,已经正常有服务注册了:

[grid@db01rac1 admin]$ lsnrctl status LISTENER_SCAN1

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 28-JAN-2023 19:26:13

Copyright (c) 1991, 2022, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_SCAN1
Version                   TNSLSNR for Linux: Version 19.0.0.0.0 - Production
Start Date                28-JAN-2023 19:15:24
Uptime                    0 days 0 hr. 10 min. 49 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/19.3.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/db01rac1/listener_scan1/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.15)(PORT=1521)))
Services Summary...
Service "demorac" has 1 instance(s).
  Instance "jydb1", status READY, has 1 handler(s) for this service...
The command completed successfully

此时再次测试连接,已恢复正常,ADG链路的报错信息也已经随之解决。

[oracle@bogon 2023_01_28]$ sqlplus sys/oracle@demorac as sysdba
...
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.16.0.0.0

SQL>

6.总结和延伸

本次问题定位后发现很简单,只是因为备库的一个参数设置导致:

# primary 查看配置链路的错误信息:
SQL> select error from v$archive_dest where dest_id = 2;

ERROR
-----------------------------------------------------------------
ORA-12514: TNS:listener does not currently know of service
requested in connect descriptor

# standby 修正参数设置:
alter system set remote_listener='db01rac-scan:1521';

延伸:其实很多ADG同步问题,日志中都会抛出各类错误,除了按照错误排查之外,还有一些经验性的思路,可以在troubleshooting过程中选择尝试,下面给出示例,希望能对大家有所帮助。

1)ADG配置链路defer后再enable:

SQL> alter system set log_archive_dest_state_2=defer;
SQL> alter system set log_archive_dest_state_2=enable;

2)密码文件同步:

[oracle@bogon dbs]$ scp orapwdemo 192.168.1.11:/u01/app/oracle/product/19.3.0/db_1/dbs/orapwjydb1
[oracle@bogon dbs]$ scp orapwdemo 192.168.1.12:/u01/app/oracle/product/19.3.0/db_1/dbs/orapwjydb2

3)重启MRP进程:

RECOVER MANAGED STANDBY DATABASE CANCEL;
RECOVER MANAGED STANDBY DATABASE DISCONNECT;

注意:有时在备库open read only状态下无法启动MRP的情况,可以选择到mount状态下尝试,遇到过很多情况下都有效。

4)尝试重建SRLs:
遇到过个别情况,实时同步有问题是因为SRLs有问题,按照你的环境要求重建即可:

#recreate:drop and create SRLs:
SQL> recover managed standby database cancel;

alter database drop standby logfile group 14;
alter database drop standby logfile group 13;
alter database drop standby logfile group 12;
alter database drop standby logfile group 11;

#确认asm磁盘组中清理,注意千万核对清楚,别误删除其他的:
ASMCMD> rm standby_group_*
ASMCMD> pwd
+datadg/DEMORAC

#新建SRLs:
SQL> select * from v$standby_log;

alter database add standby logfile thread 1 group 11 '+DATADG' size 209715200;
alter database add standby logfile thread 1 group 12 '+DATADG' size 209715200;
alter database add standby logfile thread 1 group 13 '+DATADG' size 209715200;
alter database add standby logfile thread 1 group 14 '+DATADG' size 209715200;

SQL> recover managed standby database disconnect;

5)检查基本参数配置:

很多时候,环境有可能被人修改过,要检查基本参数,确认和你的环境要求匹配:

set linesize 500
col value for a70
col name for a50
 
select name, value
from v$parameter
where name in ('db_name','db_unique_name','log_archive_config', 'log_archive_dest_1','log_archive_dest_2',
               'log_archive_dest_state_1','log_archive_dest_state_2', 'remote_login_passwordfile',
               'log_archive_format','log_archive_max_processes','fal_server','db_file_name_convert',
                     'log_file_name_convert', 'standby_file_management');

6)了解主要进程:

遇到过给生产库配置ADG,但无论如何折腾备库也无法实现实时同步,最终发现是主库有异常,重启后就自动解决了。
但很多情况申请重启主库很难,其实可以先申请尝试重启相关进程,比如主库的arc,tt 等进程。
这些进程被杀掉通常不会导致主库crash,而是会重新启动新的进程。


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK