
A Crash Analysis of a .NET Hospital Inpatient System

source link: https://www.cnblogs.com/huangxincheng/p/17248323.html

1. The Story

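Recently I received two dumps from crashed programs, and both turned out to be caused by the classic double free. They were quite interesting, so I picked one to write up and share, in the hope that it helps later readers avoid the same pitfall.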

II: WinDbg Analysis

1. Where Did It Crash

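WinDbg ships with an automated analysis command, !analyze -v, which finds the instruction address at the moment of the crash along with the faulting code. This is very helpful for our analysis.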


0:090> !analyze -v
*******************************************************************************
*                                                                             *
*                        Exception Analysis                                   *
*                                                                             *
*******************************************************************************
CONTEXT:  (.ecxr)
rax=00007ffec265d6e0 rbx=00000000c0000374 rcx=00000053653fe4f0
rdx=00007ffec1d3e9a0 rsi=0000000000000001 rdi=00007ffed7b827f0
rip=00007ffed7b1b349 rsp=00000053653fed10 rbp=000001c14fd20000
 r8=000001c11957d9a0  r9=0000000000000033 r10=000001c453dbc7f0
r11=00007ffeb94db004 r12=0000000000000001 r13=000001c12e8526d0
r14=0000000000000000 r15=000001ce25531c60
iopl=0         nv up ei pl nz na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000206
ntdll!RtlReportFatalFailure+0x9:
00007ffe`d7b1b349 eb00            jmp     ntdll!RtlReportFatalFailure+0xb (00007ffe`d7b1b34b)
Resetting default scope

EXCEPTION_RECORD:  (.exr -1)
ExceptionAddress: 00007ffed7b1b349 (ntdll!RtlReportFatalFailure+0x0000000000000009)
   ExceptionCode: c0000374
  ExceptionFlags: 00000001
NumberParameters: 1
   Parameter[0]: 00007ffed7b827f0

PROCESS_NAME:  w3wp.exe
...

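Anyone familiar with the Windows NT heap will recognize ExceptionCode: c0000374 (STATUS_HEAP_CORRUPTION) as the classic heap corruption status code. So who corrupted the heap?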

2. Who Corrupted the NT Heap

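Windows gives the NT heap strong debugging support. Termination on corruption is enabled by default, and the heap records details about the corruption it detects, which !heap -s then displays. The output is as follows: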


0:090> !heap -s


************************************************************************************************************************
                                              NT HEAP STATS BELOW
************************************************************************************************************************
**************************************************************
*                                                            *
*                  HEAP ERROR DETECTED                       *
*                                                            *
**************************************************************

Details:

Heap address:  000001c14fd20000
Error address: 000001ce25531c50
Error type: HEAP_FAILURE_BLOCK_NOT_BUSY
Details:    The caller performed an operation (such as a free
            or a size check) that is illegal on a free block.
Follow-up:  Check the error's stack trace to find the culprit.


Stack trace:
Stack trace at 0x00007ffed7b82848
    00007ffed7abe109: ntdll!RtlpLogHeapFailure+0x45
    00007ffed7acbb0e: ntdll!RtlFreeHeap+0x9d3ce
    00007ffeb093276f: OraOps12!ssmem_free+0xf
    00007ffeb0943077: OraOps12!OpsMetFreeValCtx+0xd7
    00007ffeb093cdd8: OraOps12!OpsDacDispose+0x2b8
    00007ffe655e4374: +0x655e4374

LFH Key                   : 0x5baf44f8068da60f
Termination on corruption : ENABLED
          Heap     Flags   Reserv  Commit  Virt   Free  List   UCR  Virt  Lock  Fast 
                            (k)     (k)    (k)     (k) length      blocks cont. heap 
-------------------------------------------------------------------------------------
000001c14fd20000 00000002 1021576 964388 1020020  19222  6063   166    2    82f   LFH
000001c14fc70000 00008000      64      4     64      2     1     1    0      0      
...

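The Error type HEAP_FAILURE_BLOCK_NOT_BUSY above means an operation such as a free was performed on a block that is already free, in other words a double free. The Stack trace shows it was triggered from OpsDacDispose, which looks Oracle-related, so this is rather puzzling.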

3. Was It Triggered by the Managed Layer?

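Was it triggered by the managed layer? To answer that, you need to understand Windows' own SEH (Structured Exception Handling) mechanism: every Windows exception takes a round trip through kernel mode. Here is a diagram: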

[Figure: SEH exception dispatch through kernel mode (20230323165535.png)]

All we need to do is find the crash point at time t1 and then inspect the thread stack. The commands are as follows:


0:090> .ecxr
rax=00007ffec265d6e0 rbx=00000000c0000374 rcx=00000053653fe4f0
rdx=00007ffec1d3e9a0 rsi=0000000000000001 rdi=00007ffed7b827f0
rip=00007ffed7b1b349 rsp=00000053653fed10 rbp=000001c14fd20000
 r8=000001c11957d9a0  r9=0000000000000033 r10=000001c453dbc7f0
r11=00007ffeb94db004 r12=0000000000000001 r13=000001c12e8526d0
r14=0000000000000000 r15=000001ce25531c60
iopl=0         nv up ei pl nz na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000206
ntdll!RtlReportFatalFailure+0x9:
00007ffe`d7b1b349 eb00            jmp     ntdll!RtlReportFatalFailure+0xb (00007ffe`d7b1b34b)
0:090> k
  *** Stack trace for last set context - .thread/.cxr resets it
 # Child-SP          RetAddr               Call Site
00 00000053`653fed10 00007ffe`d7b1b313     ntdll!RtlReportFatalFailure+0x9
01 00000053`653fed60 00007ffe`d7b23b9e     ntdll!RtlReportCriticalFailure+0x97
02 00000053`653fee50 00007ffe`d7b23eaa     ntdll!RtlpHeapHandleError+0x12
03 00000053`653fee80 00007ffe`d7abe109     ntdll!RtlpHpHeapHandleError+0x7a
04 00000053`653feeb0 00007ffe`d7acbb0e     ntdll!RtlpLogHeapFailure+0x45
05 00000053`653feee0 00007ffe`b093276f     ntdll!RtlFreeHeap+0x9d3ce
06 00000053`653fef80 00007ffe`b0943077     OraOps12!ssmem_free+0xf
07 00000053`653fefb0 00007ffe`b093cdd8     OraOps12!OpsMetFreeValCtx+0xd7
08 00000053`653fefe0 00007ffe`655e4374     OraOps12!OpsDacDispose+0x2b8
09 00000053`653ff060 00007ffe`655e31cf     0x00007ffe`655e4374
0a 00000053`653ff150 00007ffe`6a80969d     0x00007ffe`655e31cf
0b 00000053`653ff1f0 00007ffe`c4b96d06     0x00007ffe`6a80969d
0c 00000053`653ff220 00007ffe`c4c30e81     clr!FastCallFinalizeWorker+0x6
0d 00000053`653ff250 00007ffe`c4c30e09     clr!FastCallFinalize+0x55
0e 00000053`653ff2a0 00007ffe`c4c30d3a     clr!MethodTable::CallFinalizer+0xb5
0f 00000053`653ff2f0 00007ffe`c4c30bf5     clr!CallFinalizer+0x5e
10 00000053`653ff330 00007ffe`c4c304dc     clr!FinalizerThread::DoOneFinalization+0x95
11 00000053`653ff410 00007ffe`c4c31777     clr!FinalizerThread::FinalizeAllObjects+0xbf
12 00000053`653ff450 00007ffe`c4b97d01     clr!FinalizerThread::FinalizeAllObjects_Wrapper+0x18
13 00000053`653ff480 00007ffe`c4b97c70     clr!ManagedThreadBase_DispatchInner+0x39
14 00000053`653ff4c0 00007ffe`c4b97bad     clr!ManagedThreadBase_DispatchMiddle+0x6c
15 00000053`653ff5c0 00007ffe`c4b9ac34     clr!ManagedThreadBase_DispatchOuter+0x75
16 00000053`653ff650 00007ffe`c4bf5271     clr!ManagedThreadBase_DispatchInCorrectAD+0x15
17 00000053`653ff680 00007ffe`c4b9ac72     clr!Thread::DoADCallBack+0x109
18 00000053`653ff830 00007ffe`c4c3172a     clr!ManagedThreadBase_DispatchInner+0x82
19 00000053`653ff870 00007ffe`c4c304dc     clr!FinalizerThread::DoOneFinalization+0x1f1
1a 00000053`653ff950 00007ffe`c4c3062b     clr!FinalizerThread::FinalizeAllObjects+0xbf
1b 00000053`653ff990 00007ffe`c4b97d01     clr!FinalizerThread::FinalizerThreadWorker+0xbb
1c 00000053`653ff9d0 00007ffe`c4b97c70     clr!ManagedThreadBase_DispatchInner+0x39
1d 00000053`653ffa10 00007ffe`c4b97bad     clr!ManagedThreadBase_DispatchMiddle+0x6c
1e 00000053`653ffb10 00007ffe`c4cf4d4a     clr!ManagedThreadBase_DispatchOuter+0x75
1f 00000053`653ffba0 00007ffe`c4d5044f     clr!FinalizerThread::FinalizerThreadStart+0x126
20 00000053`653ffc40 00007ffe`d6157e94     clr!Thread::intermediateThreadProc+0x86
21 00000053`653ffd00 00007ffe`d7a87ad1     kernel32!BaseThreadInitThunk+0x14
22 00000053`653ffd30 00000000`00000000     ntdll!RtlUserThreadStart+0x21

0:090> !clrstack 
OS Thread Id: 0x5634 (90)
        Child SP               IP Call Site
00000053653ff0b8 00007ffed7abf0e4 [InlinedCallFrame: 00000053653ff0b8] Oracle.DataAccess.Client.OpsDac.Dispose(IntPtr, IntPtr, IntPtr, IntPtr ByRef, Oracle.DataAccess.Client.OpoMetValCtx*, Oracle.DataAccess.Client.OpoDacValCtx* ByRef, Oracle.DataAccess.Client.OpoSqlValCtx*, Int32, Int32)
00000053653ff0b8 00007ffe655e4374 [InlinedCallFrame: 00000053653ff0b8] Oracle.DataAccess.Client.OpsDac.Dispose(IntPtr, IntPtr, IntPtr, IntPtr ByRef, Oracle.DataAccess.Client.OpoMetValCtx*, Oracle.DataAccess.Client.OpoDacValCtx* ByRef, Oracle.DataAccess.Client.OpoSqlValCtx*, Int32, Int32)
00000053653ff060 00007ffe655e4374 DomainNeutralILStubClass.IL_STUB_PInvoke(IntPtr, IntPtr, IntPtr, IntPtr ByRef, Oracle.DataAccess.Client.OpoMetValCtx*, Oracle.DataAccess.Client.OpoDacValCtx* ByRef, Oracle.DataAccess.Client.OpoSqlValCtx*, Int32, Int32)
00000053653ff150 00007ffe655e31cf Oracle.DataAccess.Client.OracleDataReader.Dispose(Boolean)
00000053653ff1f0 00007ffe6a80969d Oracle.DataAccess.Client.OracleDataReader.Finalize()
00000053653ff608 00007ffec4b96d06 [DebuggerU2MCatchHandlerFrame: 00000053653ff608] 
00000053653ff788 00007ffec4b96d06 [ContextTransitionFrame: 00000053653ff788] 
00000053653ff8d0 00007ffec4b96d06 [GCFrame: 00000053653ff8d0] 
00000053653ffb58 00007ffec4b96d06 [DebuggerU2MCatchHandlerFrame: 00000053653ffb58] 

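The call stacks show that the exception was thrown while the finalizer thread was running OracleDataReader.Finalize(), which calls OracleDataReader.Dispose(Boolean). This came as a bit of a surprise: it means the problem was not caused by user code but by Oracle's own OraOps12.dll.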

Next, use lm to look at the details of this DLL. The output is as follows:


0:090> lmDvmOraOps12
Browse full module list
start             end                 module name
00007ffe`b0920000 00007ffe`b098c000   OraOps12 C (export symbols)       OraOps12.dll
    Loaded symbol image file: OraOps12.dll
    Image path: C:\ODAC\xxxx\OraOps12.dll
    Image name: OraOps12.dll
    Browse all global symbols  functions  data
    Timestamp:        Sat Sep 26 23:16:56 2015 (5606B6E8)
    CheckSum:         00000000
    ImageSize:        0006C000
    File version:     2.121.2.0
    Product version:  2.121.2.0
    File flags:       0 (Mask 3F)
    File OS:          4 Unknown Win32
    File type:        2.0 Dll
    File date:        00000000.00000000
    Translations:     0409.04b0
    Information from resource tables:
        CompanyName:      Oracle Corporation
        ProductName:      Oracle Data Provider for .NET
        InternalName:     OraOps
        OriginalFilename: OraOps12.dll
        ProductVersion:   2.121.2.0 ODAC RELEASE 4
        FileVersion:      2.121.2.0
        FileDescription:  Oracle Provider Services
        LegalCopyright:   Copyright © 2014

I'm not familiar with Oracle, but judging by Timestamp: Sat Sep 26 23:16:56 2015, this is a fairly old DLL, so my advice to my friend was to upgrade OraOps12.dll.

4. Has Anyone Else Hit This?

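Simply telling a friend to upgrade a DLL can feel a bit thin on evidence, so the best thing is to find others who have run into the same problem. After some searching I did find one, which made the case more convincing: https://techcommunity.microsoft.com/t5/iis-support-blog/w3wp-exe-crash-exception-code-0xc0000005/ba-p/334351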

[Image: 20230323172126.png]

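Across the 100+ dumps I have analyzed, 3 or more cases have involved the Oracle SDK. Perhaps these SDKs are not yet especially robust in their .NET integration, so try to stay on the latest version, and use them with care!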
