12

处理smon清理临时段导致数据库异常案例

 3 years ago
source link: https://blogread.cn/it/article/5651?f=hot1
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

处理smon清理临时段导致数据库异常案例

浏览:2389次  出处信息

    一个朋友的数据库在经过自己的千辛万苦终于open成功,但是几分钟就down掉,使得他想导出数据重建库的目标不能实现.让我帮忙处理

     alert日志报ORA-00600[kafspa:columnBuffer1]

Wed Aug  8 10:55:31 2012
Completed: ALTER DATABASE OPEN
Wed Aug  8 10:55:41 2012
Errors in file /oracle/ora10/admin/ora10g/udump/ora10g_ora_12160.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug  8 10:55:47 2012
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 2 out of maximum 100 non-fatal internal errors.
Wed Aug  8 10:55:47 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug  8 10:55:58 2012
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 3 out of maximum 100 non-fatal internal errors.
Wed Aug  8 10:55:59 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug  8 10:56:10 2012
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 4 out of maximum 100 non-fatal internal errors.
Wed Aug  8 10:56:11 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug  8 10:56:22 2012
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 5 out of maximum 100 non-fatal internal errors.
Wed Aug  8 10:56:32 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug  8 10:56:43 2012
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 6 out of maximum 100 non-fatal internal errors.
Wed Aug  8 10:56:53 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug  8 10:57:04 2012
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 7 out of maximum 100 non-fatal internal errors.
Wed Aug  8 10:57:14 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug  8 10:57:25 2012
Non-fatal internal error happenned while SMON was doing temporary segment drop.
SMON encountered 8 out of maximum 100 non-fatal internal errors.
Wed Aug  8 10:57:35 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug  8 10:57:38 2012
Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_pmon_8856.trc:
ORA-00474: SMON process terminated with error
Wed Aug  8 10:57:38 2012
PMON: terminating instance due to error 474
Instance terminated by PMON, pid = 8856

    这里出现ORA-00600[kafspa:columnBuffer1],一个未知的错误,但是根据相关的提示,可以大概猜出来是什么原因导致数据库异常

Non-fatal internal error happenned while SMON was doing temporary segment drop.

    出现这个错误,使得我们想到一个smon的功能,清理临时段.该数据库down掉很可能和smon清理临时段的过程发生失败有关系

SMON encountered 8 out of maximum 100 non-fatal internal errors.

    这个错误提示是因为smon内部最多允许发生100次错误,记录错误发生了8次,当然这次数据库down掉是smon还没有达到100次就直接abort掉

SQL> col name for a32
SQL> col value for a24
SQL> col description for a70
SQL> set linesize 150
SQL> select a.ksppinm name,b.ksppstvl value,a.ksppdesc description
2    from x$ksppi a,x$ksppcv b
3   where a.inst_id = USERENV (\'Instance\')
4     and b.inst_id = USERENV (\'Instance\')
5     and a.indx = b.indx
6     and upper(a.ksppinm) LIKE upper(\'%¶m%\')
order by name
8
SQL> /
Enter value for param: smon_internal_errlimit
old   6:    and upper(a.ksppinm) LIKE upper(\'%¶m%\')
new   6:    and upper(a.ksppinm) LIKE upper(\'%smon_internal_errlimit%\')
NAME                             VALUE                    DESCRIPTION
-------------------------------- ------------------------ -----------------------------------
_smon_internal_errlimit          100                      limit of SMON internal errors

    分析trace文件

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/ora10/product
System name:    Linux
Node name:  DBN-HLD-155-2
Release:    2.6.18-92.el5PAE
Version:    #1 SMP Tue Apr 29 13:31:02 EDT 2008
Machine:    i686
Instance name: ora10g
Redo thread mounted by this instance: 1
Oracle process number: 8
Unix process pid: 8941, image: oracle@DBN-HLD-155-2 (SMON)
*** SERVICE NAME:() 2012-08-08 10:55:20.208
*** SESSION ID:(274.1) 2012-08-08 10:55:20.208
*** 2012-08-08 10:55:20.208
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [25027], [0], [0], [], [], [], [], []
Current SQL statement for this session:
select file#, block#, ts# from seg$ where type# = 3

    果然是smon在查询type#=3的时候发现异常,出现ORA-00600[25027]错误.通过对seg$相关视图分析,可以知道type#=3表示临时段,也就是说数据库smon在查询哪些segment是临时段的时候发生意外,而结合alert日志,完整的错误应该就是:数据库启动后,smon进程为了清理临时段,需要通过select file#, block#, ts# from seg$ where type# = 3查询临时段,但是在查询的时候发生错误.而这个错误累积几次导致数据库异常关闭.

     解决问题

--修改两个参数
event=\'10061 trace name context forever, level 10\'
_smon_internal_errlimit=1000000
--启动数据库
SQL> startup pfile=\'/tmp/pfile\'
ORACLE instance started.
Total System Global Area 1577058304 bytes
Fixed Size                  1267716 bytes
Variable Size             385878012 bytes
Database Buffers         1174405120 bytes
Redo Buffers               15507456 bytes
Database mounted.
Database opened.

    因为屏蔽了smon回收临时段,数据库未出现开始时错误,观察数据库几个小时,运行正常,到此基本上解决了此次异常,通过exp可以顺利导出数据然后导入到新库中.

     导致该异常sql分析

SQL> select file#, block#, ts# from seg$ where type# = 3;
select file#, block#, ts# from seg$ where type# = 3
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [25027], [0], [0], [], [], [], [],
[]
SQL> set autot trace exp
SQL> select file#, block#, ts# from seg$ where type# = 3;
Execution Plan
----------------------------------------------------------
Plan hash value: 1605285479
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    10 |   150 |   389   (1)| 00:00:05 |
|*  1 |  TABLE ACCESS FULL| SEG$ |    10 |   150 |   389   (1)| 00:00:05 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("TYPE#"=3)
SQL> select count(rowid) from seg$;
Execution Plan
----------------------------------------------------------
Plan hash value: 763549841
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    12 |   389   (1)| 00:00:05 |
|   1 |  SORT AGGREGATE    |      |     1 |    12 |            |          |
|   2 |   TABLE ACCESS FULL| SEG$ | 45682 |   535K|   389   (1)| 00:00:05 |
---------------------------------------------------------------------------
SQL> select /*+ full(t) */ count(*) from seg$;
Execution Plan
----------------------------------------------------------
Plan hash value: 763549841
-------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |    29   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |            |          |
|   2 |   TABLE ACCESS FULL| SEG$ |  3223 |    29   (0)| 00:00:01 |
-------------------------------------------------------------------
SQL> select ts# from seg$;
Execution Plan
----------------------------------------------------------
Plan hash value: 1605285479
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  3223 |  9669 |    29   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| SEG$ |  3223 |  9669 |    29   (0)| 00:00:01 |
--------------------------------------------------------------------------
SQL> set autot off
SQL> select count(rowid) from seg$;
SQL> /
COUNT(ROWID)
------------
45727
SQL> select /*+ full(t) */ count(*) from seg$;
COUNT(*)
----------
45727
SQL> select  ts# from seg$;
0
0
……
7
7
ORA-00600: internal error code, arguments: [25027], [0], [0], [], [], [], [],
[]
SQL> !dbv file=\'/oracle/ora10/oradata/ora10g/system01.dbf\'
DBVERIFY: Release 10.2.0.4.0 - Production on Thu Aug 9 14:05:09 2012
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
DBVERIFY - Verification starting : FILE = /oracle/ora10/oradata/ora10g/system01.dbf
Block Checking: DBA = 4225673, Block Type = KTB-managed data block
data header at 0xb6d8225c
kdbchk: bad row offset slot 6 offs 3030 fseo 3752 dtl 8168 bhs 72
Page 31369 failed with check code 6135
Block Checking: DBA = 4236289, Block Type = KTB-managed data block
data header at 0xb6d7225c
kdbchk: tosp bad (-13399)
Page 41985 failed with check code 6127
DBVERIFY - Verification complete
Total Pages Examined         : 192000
Total Pages Processed (Data) : 47588
Total Pages Failing   (Data) : 2
Total Pages Processed (Index): 40929
Total Pages Failing   (Index): 0
Total Pages Processed (Other): 1784
Total Pages Processed (Seg)  : 0
Total Pages Failing   (Seg)  : 0
Total Pages Empty            : 101699
Total Pages Marked Corrupt   : 0
Total Pages Influx           : 0
Highest block SCN            : 1775671440 (0.1775671440)
SQL> ANALYZE TABLE sys.SEG$ VALIDATE STRUCTURE CASCADE;
ANALYZE TABLE sys.SEG$ VALIDATE STRUCTURE CASCADE
*
ERROR at line 1:
ORA-01498: block check failure - see trace file

    通过这里的分析大概可以确定是由于两块KTB-managed data block数据块异常,导致直接对seg$进行TABLE ACCESS FULL操作的时候发生异常.因为这个库已经破坏了数据一致性,先导出来数据,至于出现该错误的原因,后续继续关注分析

建议继续学习:

  1. SMON: recover undo segment 与 事务恢复    (阅读:1657)
QQ技术交流群:445447336,欢迎加入!
扫一扫订阅我的微信号:IT技术博客大学习

Recommend

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK