12

处理smon清理临时段导致数据库异常案例

3 years ago

source link: https://blogread.cn/it/article/5651?f=hot1
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

neoserver,ios ssh client

处理smon清理临时段导致数据库异常案例

浏览:2389次出处信息

一个朋友的数据库在经过自己的千辛万苦终于open成功,但是几分钟就down掉,使得他想导出数据重建库的目标不能实现.让我帮忙处理

alert日志报ORA-00600[kafspa:columnBuffer1]

Wed Aug 8 10:55:31 2012

Completed: ALTER DATABASE OPEN

Wed Aug 8 10:55:41 2012

Errors in file /oracle/ora10/admin/ora10g/udump/ora10g_ora_12160.trc:

ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []

Wed Aug 8 10:55:47 2012

Non-fatal internal error happenned while SMON was doing temporary segment drop.

SMON encountered 2 out of maximum 100 non-fatal internal errors.

Wed Aug 8 10:55:47 2012

Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:

ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []

Wed Aug 8 10:55:58 2012

Non-fatal internal error happenned while SMON was doing temporary segment drop.

SMON encountered 3 out of maximum 100 non-fatal internal errors.

Wed Aug 8 10:55:59 2012

Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:

ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []

Wed Aug 8 10:56:10 2012

Non-fatal internal error happenned while SMON was doing temporary segment drop.

SMON encountered 4 out of maximum 100 non-fatal internal errors.

Wed Aug 8 10:56:11 2012

Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:

ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []

Wed Aug 8 10:56:22 2012

Non-fatal internal error happenned while SMON was doing temporary segment drop.

SMON encountered 5 out of maximum 100 non-fatal internal errors.

Wed Aug 8 10:56:32 2012

Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:

ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []

Wed Aug 8 10:56:43 2012

Non-fatal internal error happenned while SMON was doing temporary segment drop.

SMON encountered 6 out of maximum 100 non-fatal internal errors.

Wed Aug 8 10:56:53 2012

Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:

ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []

Wed Aug 8 10:57:04 2012

Non-fatal internal error happenned while SMON was doing temporary segment drop.

SMON encountered 7 out of maximum 100 non-fatal internal errors.

Wed Aug 8 10:57:14 2012

Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:

ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []

Wed Aug 8 10:57:25 2012

Non-fatal internal error happenned while SMON was doing temporary segment drop.

SMON encountered 8 out of maximum 100 non-fatal internal errors.

Wed Aug 8 10:57:35 2012

Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941.trc:

ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []

Wed Aug 8 10:57:38 2012

Errors in file /oracle/ora10/admin/ora10g/bdump/ora10g_pmon_8856.trc:

ORA-00474: SMON process terminated with error

Wed Aug 8 10:57:38 2012

PMON: terminating instance due to error 474

Instance terminated by PMON, pid = 8856

这里出现ORA-00600[kafspa:columnBuffer1],一个未知的错误,但是根据相关的提示,可以大概猜出来是什么原因导致数据库异常

Non-fatal internal error happenned while SMON was doing temporary segment drop.

出现这个错误,使得我们想到一个smon的功能,清理临时段.该数据库down掉很可能和smon清理临时段的过程发生失败有关系

SMON encountered 8 out of maximum 100 non-fatal internal errors.

这个错误提示是因为smon内部最多允许发生100次错误,记录错误发生了8次,当然这次数据库down掉是smon还没有达到100次就直接abort掉

SQL> col name for a32

SQL> col value for a24

SQL> col description for a70

SQL> set linesize 150

SQL> select a.ksppinm name,b.ksppstvl value,a.ksppdesc description

2 from x$ksppi a,x$ksppcv b

3 where a.inst_id = USERENV (\'Instance\')

4 and b.inst_id = USERENV (\'Instance\')

5 and a.indx = b.indx

6 and upper(a.ksppinm) LIKE upper(\'%¶m%\')

7 order by name

8

SQL> /

Enter value for param: smon_internal_errlimit

old 6: and upper(a.ksppinm) LIKE upper(\'%¶m%\')

new 6: and upper(a.ksppinm) LIKE upper(\'%smon_internal_errlimit%\')

NAME VALUE DESCRIPTION

-------------------------------- ------------------------ -----------------------------------

_smon_internal_errlimit 100 limit of SMON internal errors

分析trace文件

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production

With the Partitioning, OLAP, Data Mining and Real Application Testing options

ORACLE_HOME = /oracle/ora10/product

System name: Linux

Node name: DBN-HLD-155-2

Release: 2.6.18-92.el5PAE

Version: #1 SMP Tue Apr 29 13:31:02 EDT 2008

Machine: i686

Instance name: ora10g

Redo thread mounted by this instance: 1

Oracle process number: 8

Unix process pid: 8941, image: oracle@DBN-HLD-155-2 (SMON)

*** SERVICE NAME:() 2012-08-08 10:55:20.208

*** SESSION ID:(274.1) 2012-08-08 10:55:20.208

*** 2012-08-08 10:55:20.208

ksedmp: internal or fatal error

ORA-00600: internal error code, arguments: [25027], [0], [0], [], [], [], [], []

Current SQL statement for this session:

select file#, block#, ts# from seg$ where type# = 3

果然是smon在查询type#=3的时候发现异常,出现ORA-00600[25027]错误.通过对seg$相关视图分析,可以知道type#=3表示临时段,也就是说数据库smon在查询哪些segment是临时段的时候发生意外,而结合alert日志,完整的错误应该就是:数据库启动后,smon进程为了清理临时段,需要通过select file#, block#, ts# from seg$ where type# = 3查询临时段,但是在查询的时候发生错误.而这个错误累积几次导致数据库异常关闭.

解决问题

--修改两个参数

event=\'10061 trace name context forever, level 10\'

_smon_internal_errlimit=1000000

--启动数据库

SQL> startup pfile=\'/tmp/pfile\'

ORACLE instance started.

Total System Global Area 1577058304 bytes

Fixed Size 1267716 bytes

Variable Size 385878012 bytes

Database Buffers 1174405120 bytes

Redo Buffers 15507456 bytes

Database mounted.

Database opened.

因为屏蔽了smon回收临时段,数据库未出现开始时错误，观察数据库几个小时,运行正常,到此基本上解决了此次异常,通过exp可以顺利导出数据然后导入到新库中.

导致该异常sql分析

SQL> select file#, block#, ts# from seg$ where type# = 3;

select file#, block#, ts# from seg$ where type# = 3

*

ERROR at line 1:

ORA-00600: internal error code, arguments: [25027], [0], [0], [], [], [], [],

[]

SQL> set autot trace exp

SQL> select file#, block#, ts# from seg$ where type# = 3;

Execution Plan

----------------------------------------------------------

Plan hash value: 1605285479

--------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

--------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 10 | 150 | 389 (1)| 00:00:05 |

|* 1 | TABLE ACCESS FULL| SEG$ | 10 | 150 | 389 (1)| 00:00:05 |

--------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

1 - filter("TYPE#"=3)

SQL> select count(rowid) from seg$;

Execution Plan

----------------------------------------------------------

Plan hash value: 763549841

---------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

---------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 12 | 389 (1)| 00:00:05 |

| 1 | SORT AGGREGATE | | 1 | 12 | | |

| 2 | TABLE ACCESS FULL| SEG$ | 45682 | 535K| 389 (1)| 00:00:05 |

---------------------------------------------------------------------------

SQL> select /*+ full(t) */ count(*) from seg$;

Execution Plan

----------------------------------------------------------

Plan hash value: 763549841

-------------------------------------------------------------------

| Id | Operation | Name | Rows | Cost (%CPU)| Time |

-------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 29 (0)| 00:00:01 |

| 1 | SORT AGGREGATE | | 1 | | |

| 2 | TABLE ACCESS FULL| SEG$ | 3223 | 29 (0)| 00:00:01 |

-------------------------------------------------------------------

SQL> select ts# from seg$;

Execution Plan

----------------------------------------------------------

Plan hash value: 1605285479

--------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |

--------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 3223 | 9669 | 29 (0)| 00:00:01 |

| 1 | TABLE ACCESS FULL| SEG$ | 3223 | 9669 | 29 (0)| 00:00:01 |

--------------------------------------------------------------------------

SQL> set autot off

SQL> select count(rowid) from seg$;

SQL> /

COUNT(ROWID)

------------

45727

SQL> select /*+ full(t) */ count(*) from seg$;

COUNT(*)

----------

45727

SQL> select ts# from seg$;

0

0

……

7

7

ORA-00600: internal error code, arguments: [25027], [0], [0], [], [], [], [],

[]

SQL> !dbv file=\'/oracle/ora10/oradata/ora10g/system01.dbf\'

DBVERIFY: Release 10.2.0.4.0 - Production on Thu Aug 9 14:05:09 2012

Copyright (c) 1982, 2007, Oracle. All rights reserved.

DBVERIFY - Verification starting : FILE = /oracle/ora10/oradata/ora10g/system01.dbf

Block Checking: DBA = 4225673, Block Type = KTB-managed data block

data header at 0xb6d8225c

kdbchk: bad row offset slot 6 offs 3030 fseo 3752 dtl 8168 bhs 72

Page 31369 failed with check code 6135

Block Checking: DBA = 4236289, Block Type = KTB-managed data block

data header at 0xb6d7225c

kdbchk: tosp bad (-13399)

Page 41985 failed with check code 6127

DBVERIFY - Verification complete

Total Pages Examined : 192000

Total Pages Processed (Data) : 47588

Total Pages Failing (Data) : 2

Total Pages Processed (Index): 40929

Total Pages Failing (Index): 0

Total Pages Processed (Other): 1784

Total Pages Processed (Seg) : 0

Total Pages Failing (Seg) : 0

Total Pages Empty : 101699

Total Pages Marked Corrupt : 0

Total Pages Influx : 0

Highest block SCN : 1775671440 (0.1775671440)

SQL> ANALYZE TABLE sys.SEG$ VALIDATE STRUCTURE CASCADE;

ANALYZE TABLE sys.SEG$ VALIDATE STRUCTURE CASCADE

*

ERROR at line 1:

ORA-01498: block check failure - see trace file

通过这里的分析大概可以确定是由于两块KTB-managed data block数据块异常,导致直接对seg$进行TABLE ACCESS FULL操作的时候发生异常.因为这个库已经破坏了数据一致性,先导出来数据,至于出现该错误的原因,后续继续关注分析

建议继续学习：

SMON: recover undo segment 与事务恢复 (阅读：1657)

QQ技术交流群：445447336，欢迎加入！
扫一扫订阅我的微信号：IT技术博客大学习

Recommend

82
- 新浪科技 tech.sina.com.cn 7 years ago
- Cache
12306最高峰抢票时段已过技术团队为自己打90分以上
12306最高峰抢票时段已平稳渡过技术团队为自己打90分以上央广网北京2月3日消息（记者郭淼）据中国之声《全国新闻联播》报道，铁路客服中心12306负责人今天表示，12306最高峰抢票时段已经平稳渡过，今年春运12306售票系统首次没有出现
45
- www.smzdm.com 7 years ago
- Cache
促销活动:天猫国际官方直营全球进口日每天4时段抢188-100、999-150、1899-300除奶粉...
天猫国际官方直营将于2月25日至26日开启全球进口日专场促销，活动商品涵盖母婴、个护、美妆、食品、数码、家电等众多品类，可选范围广。2月23日至24日为预...
39
- www.solidot.org 6 years ago
- Cache
广电总局计划黄金时段禁播境外视听节目
5
- article.mebtte.com 3 years ago
- Cache
如何处理 RESTFUL 数据异常导致的前端错误
如何处理 RESTFUL 数据异常导致的前端错误 2019-05-30最近一段时间监控平台收集了很多 JS 错误，大部分都是TypeError Uncaught TypeError:...
13
- bbs.csdn.net 3 years ago
- Cache
求时段分配的负责人
求时段分配的负责人-CSDN论坛人事面试问题回答思路_z清风与我的博客下面的一些问题能够考核出应聘者在这方面的能力.如果你是某事的负责人的话,你很容易让他人听你的;...
3
- chainb.com 3 years ago
- Cache
亚洲交易时段内，以太坊上的USDT活动出现减少
亚洲交易时段内，以太坊上的USDT活动出现减少作者：Frances Yue | 编译者：Maya | 来源：CoinDesk时间：2021-8-4 10:48:0 | 浏览： | ...
8
- www.beforweb.com 3 years ago
- Cache
大家处于相同起跑线的时段
大家处于相同起跑线的时段设计生涯 90% 的时间在利用成熟技术及设计模式来围绕着业务目标进行设计的大家，若遇到新技术发布且尚无任何产品/竞品可以参考时，该怎样设计...
4
- finance.jrj.com.cn 3 years ago
- Cache
GMV破2000万荣耀超品日首战斩获快手多时段品牌榜头名
GMV破2000万荣耀超品日首战斩获快手多时段品牌榜头名 0评论 2022-01-28 17:09:00 来源：中国网科技 ...
4
- blog.51cto.com 2 years ago
- Cache
Oracle故障处理：Ora-j000_hostname进程导致的异常处理
Oracle故障处理：Ora-j000_hostname进程导致的异常处理精选原创 1、数据库层...
7
- www.yimiaotui.com 1 year ago
- Cache
微博怎么清理异常关注？
微博怎么清理异常关注？微博推广发布日期：...

About Joyk

Aggregate valuable and interesting links.
Joyk means Joy of geeK