A Crash Analysis of a .NET Hospital Inpatient System
source link: https://www.cnblogs.com/huangxincheng/p/17248323.html
I. The Story
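I recently received two dumps from crashed programs. Both turned out to be caused by a classic double free, which is rather interesting. I've picked one of them to write up here, in the hope that it helps later readers avoid the same pitfall.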
II. WinDbg Analysis
1. Where Is the Crash Point?
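WinDbg ships with an automated analysis command, `!analyze -v`. It finds the instruction address and the faulting code at the moment of the crash, which makes it a very useful starting point: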
0:090> !analyze -v
*******************************************************************************
* *
* Exception Analysis *
* *
*******************************************************************************
CONTEXT: (.ecxr)
rax=00007ffec265d6e0 rbx=00000000c0000374 rcx=00000053653fe4f0
rdx=00007ffec1d3e9a0 rsi=0000000000000001 rdi=00007ffed7b827f0
rip=00007ffed7b1b349 rsp=00000053653fed10 rbp=000001c14fd20000
r8=000001c11957d9a0 r9=0000000000000033 r10=000001c453dbc7f0
r11=00007ffeb94db004 r12=0000000000000001 r13=000001c12e8526d0
r14=0000000000000000 r15=000001ce25531c60
iopl=0 nv up ei pl nz na po nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000206
ntdll!RtlReportFatalFailure+0x9:
00007ffe`d7b1b349 eb00 jmp ntdll!RtlReportFatalFailure+0xb (00007ffe`d7b1b34b)
Resetting default scope
EXCEPTION_RECORD: (.exr -1)
ExceptionAddress: 00007ffed7b1b349 (ntdll!RtlReportFatalFailure+0x0000000000000009)
ExceptionCode: c0000374
ExceptionFlags: 00000001
NumberParameters: 1
Parameter[0]: 00007ffed7b827f0
PROCESS_NAME: w3wp.exe
...
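Anyone familiar with the Windows NT heap will recognize `ExceptionCode: c0000374` as the classic heap-corruption status code (`STATUS_HEAP_CORRUPTION`). So who corrupted the heap?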
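It can help to see why `c0000374` reads as an error. An NTSTATUS value is a bit-packed 32-bit number, laid out as follows (per Microsoft's published NTSTATUS format):

- bits 31-30: severity
- bit 29: customer flag
- bit 28: reserved
- bits 27-16: facility
- bits 15-0: code

The Python sketch below only illustrates that decoding. The symbolic name `STATUS_HEAP_CORRUPTION` comes from the Windows headers (`ntstatus.h`), not from the bits themselves.

```python
from typing import NamedTuple

# Meaning of the two-bit NTSTATUS severity field.
SEVERITY_NAMES = {0: "success", 1: "informational", 2: "warning", 3: "error"}


class NtStatus(NamedTuple):
    severity: int   # bits 31-30
    customer: bool  # bit 29: set for customer-defined (non-Microsoft) codes
    facility: int   # bits 27-16
    code: int       # bits 15-0


def decode_ntstatus(status: int) -> NtStatus:
    """Split a 32-bit NTSTATUS value into its bit fields (bit 28 is reserved)."""
    status &= 0xFFFFFFFF
    return NtStatus(
        severity=(status >> 30) & 0x3,
        customer=bool((status >> 29) & 0x1),
        facility=(status >> 16) & 0xFFF,
        code=status & 0xFFFF,
    )


if __name__ == "__main__":
    # ExceptionCode from the dump above (STATUS_HEAP_CORRUPTION).
    s = decode_ntstatus(0xC0000374)
    print(SEVERITY_NAMES[s.severity], s.customer, hex(s.facility), hex(s.code))
```

For `c0000374`, both top bits are set, so the severity is "error". The facility is 0 and the code is `0x374`.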
2. Who Corrupted the NT Heap?
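Windows gives the NT heap strong debugging support. Corruption detection with Termination on corruption is enabled by default (note `Termination on corruption : ENABLED` in the output below). The heap therefore records the details of any corruption it detects, and `!heap -s` displays them: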
0:090> !heap -s
************************************************************************************************************************
NT HEAP STATS BELOW
************************************************************************************************************************
**************************************************************
* *
* HEAP ERROR DETECTED *
* *
**************************************************************
Details:
Heap address: 000001c14fd20000
Error address: 000001ce25531c50
Error type: HEAP_FAILURE_BLOCK_NOT_BUSY
Details: The caller performed an operation (such as a free
or a size check) that is illegal on a free block.
Follow-up: Check the error's stack trace to find the culprit.
Stack trace:
Stack trace at 0x00007ffed7b82848
00007ffed7abe109: ntdll!RtlpLogHeapFailure+0x45
00007ffed7acbb0e: ntdll!RtlFreeHeap+0x9d3ce
00007ffeb093276f: OraOps12!ssmem_free+0xf
00007ffeb0943077: OraOps12!OpsMetFreeValCtx+0xd7
00007ffeb093cdd8: OraOps12!OpsDacDispose+0x2b8
00007ffe655e4374: +0x655e4374
LFH Key : 0x5baf44f8068da60f
Termination on corruption : ENABLED
Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
(k) (k) (k) (k) length blocks cont. heap
-------------------------------------------------------------------------------------
000001c14fd20000 00000002 1021576 964388 1020020 19222 6063 166 2 82f LFH
000001c14fc70000 00008000 64 4 64 2 1 1 0 0
...
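`Error type: HEAP_FAILURE_BLOCK_NOT_BUSY` means the caller performed an operation (here, a free) on a block that was already free. In other words, this is a double free. The `Stack trace` shows the free was issued by `OraOps12!OpsDacDispose` (via `ssmem_free` and `RtlFreeHeap`). That points at Oracle, which was rather puzzling...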
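The check behind `HEAP_FAILURE_BLOCK_NOT_BUSY` can be illustrated with a toy allocator. It keeps a busy flag per block and refuses to free a block that is already free.

This is a deliberately simplified Python model. The starting address and minimum block size are hypothetical. The real NT heap keeps this state in its block metadata and reports the failure through `RtlpLogHeapFailure`, as the stack trace shows.

```python
class HeapError(Exception):
    """Raised when an operation is illegal for a block's current state."""


class ToyHeap:
    """Toy allocator that tracks a busy flag per block.

    NOT how the NT heap is implemented; it only models the idea that
    freeing a block which is not busy is detected and rejected.
    """

    def __init__(self) -> None:
        self._busy: dict[int, bool] = {}  # block address -> busy flag
        self._next_addr = 0x1000          # hypothetical starting address

    def alloc(self, size: int) -> int:
        addr = self._next_addr
        self._next_addr += max(size, 16)  # hypothetical 16-byte minimum block
        self._busy[addr] = True
        return addr

    def free(self, addr: int) -> None:
        if addr not in self._busy:
            raise HeapError(f"{addr:#x}: not a block from this heap")
        if not self._busy[addr]:
            # The block is already free: this is the double-free case.
            raise HeapError(f"{addr:#x}: HEAP_FAILURE_BLOCK_NOT_BUSY")
        self._busy[addr] = False


if __name__ == "__main__":
    heap = ToyHeap()
    p = heap.alloc(32)
    heap.free(p)          # first free succeeds
    try:
        heap.free(p)      # second free of the same block is rejected
    except HeapError as e:
        print("heap error:", e)
```

A real heap with Termination on corruption enabled does not raise a catchable error at this point. It reports the failure and terminates the process, which is exactly the crash seen in this dump.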
3. Was It Triggered from the Managed Layer?
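Was it triggered from the managed layer? Answering that requires understanding Windows' own SEH (Structured Exception Handling) mechanism: every Windows exception makes a round trip through kernel mode, as shown in the diagram below: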
![20230323165535.png](https://huangxincheng.oss-cn-hangzhou.aliyuncs.com/img/20230323165535.png)
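So all we need to do is locate the crash point at time t1 and inspect the thread stack there. `.ecxr` switches to the exception context record. After that, `k` shows the native stack and `!clrstack` shows the managed one: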
0:090> .ecxr
rax=00007ffec265d6e0 rbx=00000000c0000374 rcx=00000053653fe4f0
rdx=00007ffec1d3e9a0 rsi=0000000000000001 rdi=00007ffed7b827f0
rip=00007ffed7b1b349 rsp=00000053653fed10 rbp=000001c14fd20000
r8=000001c11957d9a0 r9=0000000000000033 r10=000001c453dbc7f0
r11=00007ffeb94db004 r12=0000000000000001 r13=000001c12e8526d0
r14=0000000000000000 r15=000001ce25531c60
iopl=0 nv up ei pl nz na po nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000206
ntdll!RtlReportFatalFailure+0x9:
00007ffe`d7b1b349 eb00 jmp ntdll!RtlReportFatalFailure+0xb (00007ffe`d7b1b34b)
0:090> k
*** Stack trace for last set context - .thread/.cxr resets it
# Child-SP RetAddr Call Site
00 00000053`653fed10 00007ffe`d7b1b313 ntdll!RtlReportFatalFailure+0x9
01 00000053`653fed60 00007ffe`d7b23b9e ntdll!RtlReportCriticalFailure+0x97
02 00000053`653fee50 00007ffe`d7b23eaa ntdll!RtlpHeapHandleError+0x12
03 00000053`653fee80 00007ffe`d7abe109 ntdll!RtlpHpHeapHandleError+0x7a
04 00000053`653feeb0 00007ffe`d7acbb0e ntdll!RtlpLogHeapFailure+0x45
05 00000053`653feee0 00007ffe`b093276f ntdll!RtlFreeHeap+0x9d3ce
06 00000053`653fef80 00007ffe`b0943077 OraOps12!ssmem_free+0xf
07 00000053`653fefb0 00007ffe`b093cdd8 OraOps12!OpsMetFreeValCtx+0xd7
08 00000053`653fefe0 00007ffe`655e4374 OraOps12!OpsDacDispose+0x2b8
09 00000053`653ff060 00007ffe`655e31cf 0x00007ffe`655e4374
0a 00000053`653ff150 00007ffe`6a80969d 0x00007ffe`655e31cf
0b 00000053`653ff1f0 00007ffe`c4b96d06 0x00007ffe`6a80969d
0c 00000053`653ff220 00007ffe`c4c30e81 clr!FastCallFinalizeWorker+0x6
0d 00000053`653ff250 00007ffe`c4c30e09 clr!FastCallFinalize+0x55
0e 00000053`653ff2a0 00007ffe`c4c30d3a clr!MethodTable::CallFinalizer+0xb5
0f 00000053`653ff2f0 00007ffe`c4c30bf5 clr!CallFinalizer+0x5e
10 00000053`653ff330 00007ffe`c4c304dc clr!FinalizerThread::DoOneFinalization+0x95
11 00000053`653ff410 00007ffe`c4c31777 clr!FinalizerThread::FinalizeAllObjects+0xbf
12 00000053`653ff450 00007ffe`c4b97d01 clr!FinalizerThread::FinalizeAllObjects_Wrapper+0x18
13 00000053`653ff480 00007ffe`c4b97c70 clr!ManagedThreadBase_DispatchInner+0x39
14 00000053`653ff4c0 00007ffe`c4b97bad clr!ManagedThreadBase_DispatchMiddle+0x6c
15 00000053`653ff5c0 00007ffe`c4b9ac34 clr!ManagedThreadBase_DispatchOuter+0x75
16 00000053`653ff650 00007ffe`c4bf5271 clr!ManagedThreadBase_DispatchInCorrectAD+0x15
17 00000053`653ff680 00007ffe`c4b9ac72 clr!Thread::DoADCallBack+0x109
18 00000053`653ff830 00007ffe`c4c3172a clr!ManagedThreadBase_DispatchInner+0x82
19 00000053`653ff870 00007ffe`c4c304dc clr!FinalizerThread::DoOneFinalization+0x1f1
1a 00000053`653ff950 00007ffe`c4c3062b clr!FinalizerThread::FinalizeAllObjects+0xbf
1b 00000053`653ff990 00007ffe`c4b97d01 clr!FinalizerThread::FinalizerThreadWorker+0xbb
1c 00000053`653ff9d0 00007ffe`c4b97c70 clr!ManagedThreadBase_DispatchInner+0x39
1d 00000053`653ffa10 00007ffe`c4b97bad clr!ManagedThreadBase_DispatchMiddle+0x6c
1e 00000053`653ffb10 00007ffe`c4cf4d4a clr!ManagedThreadBase_DispatchOuter+0x75
1f 00000053`653ffba0 00007ffe`c4d5044f clr!FinalizerThread::FinalizerThreadStart+0x126
20 00000053`653ffc40 00007ffe`d6157e94 clr!Thread::intermediateThreadProc+0x86
21 00000053`653ffd00 00007ffe`d7a87ad1 kernel32!BaseThreadInitThunk+0x14
22 00000053`653ffd30 00000000`00000000 ntdll!RtlUserThreadStart+0x21
0:090> !clrstack
OS Thread Id: 0x5634 (90)
Child SP IP Call Site
00000053653ff0b8 00007ffed7abf0e4 [InlinedCallFrame: 00000053653ff0b8] Oracle.DataAccess.Client.OpsDac.Dispose(IntPtr, IntPtr, IntPtr, IntPtr ByRef, Oracle.DataAccess.Client.OpoMetValCtx*, Oracle.DataAccess.Client.OpoDacValCtx* ByRef, Oracle.DataAccess.Client.OpoSqlValCtx*, Int32, Int32)
00000053653ff0b8 00007ffe655e4374 [InlinedCallFrame: 00000053653ff0b8] Oracle.DataAccess.Client.OpsDac.Dispose(IntPtr, IntPtr, IntPtr, IntPtr ByRef, Oracle.DataAccess.Client.OpoMetValCtx*, Oracle.DataAccess.Client.OpoDacValCtx* ByRef, Oracle.DataAccess.Client.OpoSqlValCtx*, Int32, Int32)
00000053653ff060 00007ffe655e4374 DomainNeutralILStubClass.IL_STUB_PInvoke(IntPtr, IntPtr, IntPtr, IntPtr ByRef, Oracle.DataAccess.Client.OpoMetValCtx*, Oracle.DataAccess.Client.OpoDacValCtx* ByRef, Oracle.DataAccess.Client.OpoSqlValCtx*, Int32, Int32)
00000053653ff150 00007ffe655e31cf Oracle.DataAccess.Client.OracleDataReader.Dispose(Boolean)
00000053653ff1f0 00007ffe6a80969d Oracle.DataAccess.Client.OracleDataReader.Finalize()
00000053653ff608 00007ffec4b96d06 [DebuggerU2MCatchHandlerFrame: 00000053653ff608]
00000053653ff788 00007ffec4b96d06 [ContextTransitionFrame: 00000053653ff788]
00000053653ff8d0 00007ffec4b96d06 [GCFrame: 00000053653ff8d0]
00000053653ffb58 00007ffec4b96d06 [DebuggerU2MCatchHandlerFrame: 00000053653ffb58]
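The call stack shows that the exception was raised on the finalizer thread, while `OracleDataReader.Finalize()` was calling `OracleDataReader.Dispose(Boolean)`. That was a surprise: the problem was not caused by user code but by Oracle's own OraOps12.dll...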
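A native wrapper such as a data reader has to ensure that its explicit dispose path and its finalizer path never both free the same native memory. In .NET the usual tools for this are `GC.SuppressFinalize` in `Dispose()` and `SafeHandle`, whose release logic runs at most once.

The Python sketch below illustrates that "release exactly once" guard. `NativeResourceOwner` and its `free_fn` callback are hypothetical stand-ins. This is not ODP.NET's code, and the dump does not show which path freed the block first.

```python
import threading
from typing import Callable


class NativeResourceOwner:
    """Hypothetical owner of a native allocation.

    Both dispose() (user code) and finalize() (standing in for the
    finalizer thread) call _release(); a lock-protected flag guarantees
    the native free routine runs only once.
    """

    def __init__(self, free_fn: Callable[[object], None]) -> None:
        self._free_fn = free_fn      # hypothetical native free routine
        self._handle = object()      # stands in for a native pointer
        self._lock = threading.Lock()
        self._released = False

    def _release(self) -> bool:
        with self._lock:
            if self._released:
                return False         # already released: do nothing
            self._released = True
            handle, self._handle = self._handle, None
        self._free_fn(handle)        # free exactly once, outside the lock
        return True

    def dispose(self) -> bool:
        """Explicit cleanup path (user code)."""
        return self._release()

    def finalize(self) -> bool:
        """Cleanup path a finalizer thread would take."""
        return self._release()


if __name__ == "__main__":
    freed = []
    owner = NativeResourceOwner(freed.append)
    t = threading.Thread(target=owner.finalize)
    t.start()
    owner.dispose()
    t.join()
    print(len(freed))  # the native free ran once
```

Whichever path wins the race performs the free. The loser sees the flag and returns without touching the native pointer, so there is no second free for the heap to flag.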
Next, use `lm` to look at the details of this DLL:
0:090> lmDvmOraOps12
Browse full module list
start end module name
00007ffe`b0920000 00007ffe`b098c000 OraOps12 C (export symbols) OraOps12.dll
Loaded symbol image file: OraOps12.dll
Image path: C:\ODAC\xxxx\OraOps12.dll
Image name: OraOps12.dll
Browse all global symbols functions data
Timestamp: Sat Sep 26 23:16:56 2015 (5606B6E8)
CheckSum: 00000000
ImageSize: 0006C000
File version: 2.121.2.0
Product version: 2.121.2.0
File flags: 0 (Mask 3F)
File OS: 4 Unknown Win32
File type: 2.0 Dll
File date: 00000000.00000000
Translations: 0409.04b0
Information from resource tables:
CompanyName: Oracle Corporation
ProductName: Oracle Data Provider for .NET
InternalName: OraOps
OriginalFilename: OraOps12.dll
ProductVersion: 2.121.2.0 ODAC RELEASE 4
FileVersion: 2.121.2.0
FileDescription: Oracle Provider Services
LegalCopyright: Copyright © 2014
I'm not very familiar with Oracle, but `Timestamp: Sat Sep 26 23:16:56 2015` suggests this is a fairly old DLL. My advice to my friend was therefore to upgrade OraOps12.dll.
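The `Timestamp` line is the PE header's `TimeDateStamp` field, which is the hex value `5606B6E8` shown in parentheses. It counts seconds since 1970-01-01 UTC, and the Python sketch below converts it.

The result is 15:16:56 UTC. The eight-hour gap to the `23:16:56` that WinDbg printed is consistent with the debugger showing local time on a UTC+8 machine; that is an inference, not something the dump states. Binaries built with reproducible-build options may store a hash rather than a real date in this field, so treat the timestamp as a hint rather than proof of age.

```python
from datetime import datetime, timedelta, timezone


def pe_timestamp_to_utc(stamp: int) -> datetime:
    """Interpret a PE TimeDateStamp as seconds since 1970-01-01 UTC."""
    return datetime.fromtimestamp(stamp, tz=timezone.utc)


if __name__ == "__main__":
    utc = pe_timestamp_to_utc(0x5606B6E8)  # value printed by lmDvm above
    print(utc.isoformat())
    # Shifting to UTC+8 reproduces the local time WinDbg displayed.
    utc8 = timezone(timedelta(hours=8))
    print(utc.astimezone(utc8).strftime("%a %b %d %H:%M:%S %Y"))
```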
4. Has Anyone Else Hit This?
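Simply telling my friend to upgrade a DLL felt a little short on evidence, so ideally I wanted to find others who had hit the same problem. After some searching I did find one, which made the advice more convincing: https://techcommunity.microsoft.com/t5/iis-support-blog/w3wp-exe-crash-exception-code-0xc0000005/ba-p/334351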
![20230323172126.png](https://huangxincheng.oss-cn-hangzhou.aliyuncs.com/img/20230323172126.png)
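Across the hundred-plus dumps I have analyzed, I have now run into three or more cases involving the Oracle SDK. Perhaps these SDKs are not yet especially robust in how they integrate with .NET. Keep them updated to the latest version where you can, and use them with care!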