

处理smon清理临时段导致数据库异常案例
source link: https://blogread.cn/it/article/5651?f=hot1
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

处理smon清理临时段导致数据库异常案例
一个朋友的数据库在经过自己的千辛万苦终于open成功,但是几分钟就down掉,使得他想导出数据重建库的目标不能实现.让我帮忙处理
alert日志报ORA-00600[kafspa:columnBuffer1]
Wed Aug 8 10:55:31 2012
Completed: ALTER DATABASE OPEN
Wed Aug 8 10:55:41 2012
Errors
in
file
/oracle/ora10/admin/ora10g/udump/ora10g_ora_12160
.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug 8 10:55:47 2012
Non-fatal internal error happenned
while
SMON was doing temporary segment drop.
SMON encountered 2 out of maximum 100 non-fatal internal errors.
Wed Aug 8 10:55:47 2012
Errors
in
file
/oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941
.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug 8 10:55:58 2012
Non-fatal internal error happenned
while
SMON was doing temporary segment drop.
SMON encountered 3 out of maximum 100 non-fatal internal errors.
Wed Aug 8 10:55:59 2012
Errors
in
file
/oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941
.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug 8 10:56:10 2012
Non-fatal internal error happenned
while
SMON was doing temporary segment drop.
SMON encountered 4 out of maximum 100 non-fatal internal errors.
Wed Aug 8 10:56:11 2012
Errors
in
file
/oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941
.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug 8 10:56:22 2012
Non-fatal internal error happenned
while
SMON was doing temporary segment drop.
SMON encountered 5 out of maximum 100 non-fatal internal errors.
Wed Aug 8 10:56:32 2012
Errors
in
file
/oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941
.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug 8 10:56:43 2012
Non-fatal internal error happenned
while
SMON was doing temporary segment drop.
SMON encountered 6 out of maximum 100 non-fatal internal errors.
Wed Aug 8 10:56:53 2012
Errors
in
file
/oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941
.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug 8 10:57:04 2012
Non-fatal internal error happenned
while
SMON was doing temporary segment drop.
SMON encountered 7 out of maximum 100 non-fatal internal errors.
Wed Aug 8 10:57:14 2012
Errors
in
file
/oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941
.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug 8 10:57:25 2012
Non-fatal internal error happenned
while
SMON was doing temporary segment drop.
SMON encountered 8 out of maximum 100 non-fatal internal errors.
Wed Aug 8 10:57:35 2012
Errors
in
file
/oracle/ora10/admin/ora10g/bdump/ora10g_smon_8941
.trc:
ORA-00600: internal error code, arguments: [kafspa:columnBuffer1], [28], [22], [], [], [], [], []
Wed Aug 8 10:57:38 2012
Errors
in
file
/oracle/ora10/admin/ora10g/bdump/ora10g_pmon_8856
.trc:
ORA-00474: SMON process terminated with error
Wed Aug 8 10:57:38 2012
PMON: terminating instance due to error 474
Instance terminated by PMON, pid = 8856
这里出现ORA-00600[kafspa:columnBuffer1],一个未知的错误,但是根据相关的提示,可以大概猜出来是什么原因导致数据库异常
Non-fatal internal error happenned
while
SMON was doing temporary segment drop.
出现这个错误,使得我们想到一个smon的功能,清理临时段.该数据库down掉很可能和smon清理临时段的过程发生失败有关系
SMON encountered 8 out of maximum 100 non-fatal internal errors.
这个错误提示是因为smon内部最多允许发生100次错误,记录错误发生了8次,当然这次数据库down掉是smon还没有达到100次就直接abort掉
SQL> col
name
for
a32
SQL> col value
for
a24
SQL> col description
for
a70
SQL>
set
linesize 150
SQL>
select
a.ksppinm
name
,b.ksppstvl value,a.ksppdesc description
2
from
x$ksppi a,x$ksppcv b
3
where
a.inst_id = USERENV (\'Instance\')
4
and
b.inst_id = USERENV (\'Instance\')
5
and
a.indx = b.indx
6
and
upper
(a.ksppinm)
LIKE
upper
(\'%¶m%\')
7
order
by
name
8
SQL> /
Enter value
for
param: smon_internal_errlimit
old 6:
and
upper
(a.ksppinm)
LIKE
upper
(\'%¶m%\')
new 6:
and
upper
(a.ksppinm)
LIKE
upper
(\'%smon_internal_errlimit%\')
NAME
VALUE DESCRIPTION
-------------------------------- ------------------------ -----------------------------------
_smon_internal_errlimit 100 limit
of
SMON internal errors
分析trace文件
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME =
/oracle/ora10/product
System name: Linux
Node name: DBN-HLD-155-2
Release: 2.6.18-92.el5PAE
Version:
#1 SMP Tue Apr 29 13:31:02 EDT 2008
Machine: i686
Instance name: ora10g
Redo thread mounted by this instance: 1
Oracle process number: 8
Unix process pid: 8941, image: oracle@DBN-HLD-155-2 (SMON)
*** SERVICE NAME:() 2012-08-08 10:55:20.208
*** SESSION ID:(274.1) 2012-08-08 10:55:20.208
*** 2012-08-08 10:55:20.208
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [25027], [0], [0], [], [], [], [], []
Current SQL statement
for
this session:
select
file
#, block#, ts# from seg$ where type# = 3
果然是smon在查询type#=3的时候发现异常,出现ORA-00600[25027]错误.通过对seg$相关视图分析,可以知道type#=3表示临时段,也就是说数据库smon在查询哪些segment是临时段的时候发生意外,而结合alert日志,完整的错误应该就是:数据库启动后,smon进程为了清理临时段,需要通过select file#, block#, ts# from seg$ where type# = 3查询临时段,但是在查询的时候发生错误.而这个错误累积几次导致数据库异常关闭.
解决问题
--修改两个参数
event=\'10061 trace name context forever, level 10\'
_smon_internal_errlimit=1000000
--启动数据库
SQL> startup pfile=\'
/tmp/pfile
\'
ORACLE instance started.
Total System Global Area 1577058304 bytes
Fixed Size 1267716 bytes
Variable Size 385878012 bytes
Database Buffers 1174405120 bytes
Redo Buffers 15507456 bytes
Database mounted.
Database opened.
因为屏蔽了smon回收临时段,数据库未出现开始时错误,观察数据库几个小时,运行正常,到此基本上解决了此次异常,通过exp可以顺利导出数据然后导入到新库中.
导致该异常sql分析
SQL>
select
file#, block#, ts#
from
seg$
where
type# = 3;
select
file#, block#, ts#
from
seg$
where
type# = 3
*
ERROR
at
line 1:
ORA-00600: internal error code, arguments: [25027], [0], [0], [], [], [], [],
[]
SQL>
set
autot trace exp
SQL>
select
file#, block#, ts#
from
seg$
where
type# = 3;
Execution Plan
----------------------------------------------------------
Plan hash value: 1605285479
--------------------------------------------------------------------------
| Id | Operation |
Name
|
Rows
| Bytes | Cost (%CPU)|
Time
|
--------------------------------------------------------------------------
| 0 |
SELECT
STATEMENT | | 10 | 150 | 389 (1)| 00:00:05 |
|* 1 |
TABLE
ACCESS
FULL
| SEG$ | 10 | 150 | 389 (1)| 00:00:05 |
--------------------------------------------------------------------------
Predicate Information (identified
by
operation id):
---------------------------------------------------
1 - filter(
"TYPE#"
=3)
SQL>
select
count
(rowid)
from
seg$;
Execution Plan
----------------------------------------------------------
Plan hash value: 763549841
---------------------------------------------------------------------------
| Id | Operation |
Name
|
Rows
| Bytes | Cost (%CPU)|
Time
|
---------------------------------------------------------------------------
| 0 |
SELECT
STATEMENT | | 1 | 12 | 389 (1)| 00:00:05 |
| 1 | SORT AGGREGATE | | 1 | 12 | | |
| 2 |
TABLE
ACCESS
FULL
| SEG$ | 45682 | 535K| 389 (1)| 00:00:05 |
---------------------------------------------------------------------------
SQL>
select
/*+
full
(t) */
count
(*)
from
seg$;
Execution Plan
----------------------------------------------------------
Plan hash value: 763549841
-------------------------------------------------------------------
| Id | Operation |
Name
|
Rows
| Cost (%CPU)|
Time
|
-------------------------------------------------------------------
| 0 |
SELECT
STATEMENT | | 1 | 29 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 |
TABLE
ACCESS
FULL
| SEG$ | 3223 | 29 (0)| 00:00:01 |
-------------------------------------------------------------------
SQL>
select
ts#
from
seg$;
Execution Plan
----------------------------------------------------------
Plan hash value: 1605285479
--------------------------------------------------------------------------
| Id | Operation |
Name
|
Rows
| Bytes | Cost (%CPU)|
Time
|
--------------------------------------------------------------------------
| 0 |
SELECT
STATEMENT | | 3223 | 9669 | 29 (0)| 00:00:01 |
| 1 |
TABLE
ACCESS
FULL
| SEG$ | 3223 | 9669 | 29 (0)| 00:00:01 |
--------------------------------------------------------------------------
SQL>
set
autot
off
SQL>
select
count
(rowid)
from
seg$;
SQL> /
COUNT
(ROWID)
------------
45727
SQL>
select
/*+
full
(t) */
count
(*)
from
seg$;
COUNT
(*)
----------
45727
SQL>
select
ts#
from
seg$;
0
0
……
7
7
ORA-00600: internal error code, arguments: [25027], [0], [0], [], [], [], [],
[]
SQL> !dbv file=\'/oracle/ora10/oradata/ora10g/system01.dbf\'
DBVERIFY: Release 10.2.0.4.0 - Production
on
Thu Aug 9 14:05:09 2012
Copyright (c) 1982, 2007, Oracle.
All
rights reserved.
DBVERIFY - Verification starting : FILE = /oracle/ora10/oradata/ora10g/system01.dbf
Block Checking: DBA = 4225673, Block Type = KTB-managed data block
data header
at
0xb6d8225c
kdbchk: bad row offset slot 6 offs 3030 fseo 3752 dtl 8168 bhs 72
Page 31369 failed
with
check
code 6135
Block Checking: DBA = 4236289, Block Type = KTB-managed data block
data header
at
0xb6d7225c
kdbchk: tosp bad (-13399)
Page 41985 failed
with
check
code 6127
DBVERIFY - Verification complete
Total Pages Examined : 192000
Total Pages Processed (Data) : 47588
Total Pages Failing (Data) : 2
Total Pages Processed (
Index
): 40929
Total Pages Failing (
Index
): 0
Total Pages Processed (Other): 1784
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 101699
Total Pages Marked Corrupt : 0
Total Pages Influx : 0
Highest block SCN : 1775671440 (0.1775671440)
SQL> ANALYZE
TABLE
sys.SEG$ VALIDATE STRUCTURE
CASCADE
;
ANALYZE
TABLE
sys.SEG$ VALIDATE STRUCTURE
CASCADE
*
ERROR
at
line 1:
ORA-01498: block
check
failure - see trace file
通过这里的分析大概可以确定是由于两块KTB-managed data block数据块异常,导致直接对seg$进行TABLE ACCESS FULL操作的时候发生异常.因为这个库已经破坏了数据一致性,先导出来数据,至于出现该错误的原因,后续继续关注分析
建议继续学习:
扫一扫订阅我的微信号:IT技术博客大学习
Recommend
-
82
12306最高峰抢票时段已平稳渡过技术团队为自己打90分以上央广网北京2月3日消息(记者郭淼)据中国之声《全国新闻联播》报道,铁路客服中心12306负责人今天表示,12306最高峰抢票时段已经平稳渡过,今年春运12306售票系统首次没有出现
-
45
天猫国际官方直营将于2月25日至26日开启全球进口日专场促销,活动商品涵盖母婴、个护、美妆、食品、数码、家电等众多品类,可选范围广。2月23日至24日为预...
-
39
-
5
如何处理 RESTFUL 数据异常导致的前端错误 2019-05-30最近一段时间监控平台收集了很多 JS 错误,大部分都是TypeError Uncaught TypeError:...
-
13
求时段分配的负责人-CSDN论坛人事面试问题回答思路_z清风与我的博客下面的一些问题能够考核出应聘者在这方面的能力.如果你是某事的负责人的话,你很容易让他人听你的;...
-
3
亚洲交易时段内,以太坊上的USDT活动出现减少 作者:Frances Yue | 编译者:Maya | 来源:CoinDesk时间:2021-8-4 10:48:0 | 浏览: | ...
-
8
大家处于相同起跑线的时段 设计生涯 90% 的时间在利用成熟技术及设计模式来围绕着业务目标进行设计的大家,若遇到新技术发布且尚无任何产品/竞品可以参考时,该怎样设计...
-
4
GMV破2000万 荣耀超品日首战斩获快手多时段品牌榜头名 0评论 2022-01-28 17:09:00 来源:中国网科技 ...
-
4
Oracle故障处理:Ora-j000_hostname进程导致的异常处理 精选 原创 1、数据库层...
-
7
微博怎么清理异常关注? 微博推广 发布日期:...
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK