
Windows 10 April update can cause data corruption

 11 months ago
source link: https://aloiskraus.wordpress.com/2023/05/24/windows-10-april-update-can-cause-data-corruption/

Alois Kraus

Performance is everything. But correctness comes first.

Starting with Windows 10 KB 5025221, I had trouble attaching files to mails in Outlook. Drag and drop became unreliable and would sometimes silently do nothing. The Attach File dialog worked most of the time, but sometimes it failed with:

Update

Microsoft has listed a Known Issue Rollback (KIR) for this KB, which is an MSI that uninstalls the specific portions of the problematic update. The detailed description shows that only CopyFile is affected, not MoveFile. My tests confirm this. More details about KIRs can be found at https://techcommunity.microsoft.com/t5/windows-it-pro-blog/known-issue-rollback-helping-you-keep-windows-devices-protected/ba-p/217683.

[Screenshot of the Outlook error message]

That was annoying but not really problematic. More serious was that sometimes I could not open attached PDF files, which resulted in a file not found error box. The issue was sporadic, and an Outlook restart always fixed it. But sometimes I ended up with empty files

[Screenshot of an empty attachment file]

which was not so good. In some mails the attachments (PDFs) were even set to 0 bytes, which is real data corruption.

Reason enough to dig deeper into what was going on. The first thing I take is a procmon trace. If someone interferes (antivirus is always a good suspect), we should see access denied or file deletion events there. At first look I did not find a single failed IO operation in the procmon traces, which was strange.

At least I learned how Outlook works. When you double-click an attachment in Outlook, it copies the data from the mail database (.pst/.ost file) to a temp file located in

C:\Users\%username%\AppData\Local\Microsoft\Windows\INetCache\Content.Outlook\????\Src.pdf

Then it copies the file again to …\Src (002).pdf, which is the file that is opened by the actual PDF reader.

[Screenshot: grafik-4.png]

When the reader is not a reader but a writer, Outlook checks whether the second copy differs from the first one and asks if you want to save the changes in Outlook. If you do this manually, you can still go back by asking Outlook for an earlier version, which is still present. So far so good. But in the corrupted mails no button to go back to a previous version was present.

Dragging and dropping a file from Explorer into a new mail works similarly. First the file is copied to %temp%\xxxx.tmp, from there it is copied again to C:\Users\%username%\AppData\Local\Microsoft\Windows\INetCache\Content.Outlook\????\YourFile.pdf, and finally it is stored in your mail database (.pst/.ost file).

Outlook copies quite a few files around, and it uses the default Windows CopyFile method to do so. Looking deeper with ETW tracing, I could compare a good and a bad attach run. They differed in the duration of the CopyFile operation, which mysteriously failed and ran much shorter than in the good case:

|- OUTLOOK.EXE!HrSaveAttachmentsWorker
|    OUTLOOK.EXE!HrShowSaveAttachDialog
|    |- OUTLOOK.EXE!HrInvoke
|    |    OUTLOOK.EXE!HrInvokeHelper
|    |    OUTLOOK.EXE!StdDispatch::Invoke
|    |    OUTLOOK.EXE!_RenDispatchCall
|    |    OUTLOOK.EXE!AttachmentObject::HrSaveAsFile
|    |    |- OUTLOOK.EXE!AttachmentObject::HrCopyTempFileToLocation
|    |    |    KernelBase.dll!CopyFileW
|    |    |    KernelBase.dll!CopyFileExW
|    |    |    |- KernelBase.dll!BasepCopyFileExW
|    |    |    |- KernelBase.dll!CloseHandle
|    |    |- OUTLOOK.EXE!FDoesFileExistW
|    |- OUTLOOK.EXE!RenFFDialog::HrGetPath
|- OUTLOOK.EXE!DisplayErrorContext

The good thing is that for a few months Microsoft has been delivering public symbols for its Office products as well, which read much better than Ordinal+xxxx stack traces. Since ETW profiling cannot show more than this, one needs to debug Outlook. This is possible with the recently released new WinDbg, which lets you attach to a running process and capture the executed CPU instructions and the accessed memory. How this works is fascinating: at debugging start a full memory dump is taken, then CPU tracing is enabled, which captures the complete CPU instruction stream. The end result is not a static memory dump but one in which you can debug forwards and backwards! For more information see https://aloiskraus.wordpress.com/2017/09/26/it-is-time-for-time-travel/. With that I could record a failed attach incident and send it to MS support for further analysis of where it failed. The answer was that a regression had been introduced in the mentioned KB update, which kicks in when the following conditions are met:

  1. Windows 10 with KB 5025221 from April or May Update
  2. Process is 32 bit
  3. Process is Large Address Aware
  4. It calls into CopyFile / MoveFile
  5. A pointer with an address > 2 GB is passed to NtCreateFile
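Conditions 2 and 3 can be checked without a debugger, because both are recorded in the PE header of the executable. The following is a minimal sketch, not part of the original analysis: it reads the Machine and Characteristics fields of the IMAGE_FILE_HEADER. The helper names pe_flags and make_header are made up for illustration, and make_header only builds a synthetic header so the example runs without a real binary.

```python
import struct

IMAGE_FILE_MACHINE_I386 = 0x014C          # 32 bit x86 image
IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020   # Characteristics bit


def pe_flags(image):
    """Return (is_32bit_x86, is_large_address_aware) for the bytes of a PE image."""
    if image[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ signature")
    # e_lfanew at offset 0x3C holds the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image: missing PE signature")
    # The 20 byte IMAGE_FILE_HEADER follows the signature. Machine is its
    # first field, Characteristics its last one (18 bytes into the header).
    (machine,) = struct.unpack_from("<H", image, pe_offset + 4)
    (characteristics,) = struct.unpack_from("<H", image, pe_offset + 4 + 18)
    return (machine == IMAGE_FILE_MACHINE_I386,
            bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE))


def make_header(machine, characteristics):
    """Build a minimal synthetic PE header (demonstration data only)."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)
    file_header = struct.pack("<HHIIIHH", machine, 0, 0, 0, 0, 0, characteristics)
    return bytes(dos) + b"PE\0\0" + file_header


if __name__ == "__main__":
    print(pe_flags(make_header(0x014C, 0x0122)))
```

For a real check one would pass the first few kilobytes of the executable file to pe_flags instead of the synthetic header.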

When these conditions are met, the CreateFile call bails out early, even before procmon or other filter drivers are able to see the return code of the failed CreateFile call. Quite a nasty thing, which also explains why it took so long to find out.

With that knowledge we can examine the issue a bit further. We can write a small test application which calls CopyFile in a 32 bit process that has allocated more than 2 GB of memory. The easiest way to achieve this is a few lines of C# like this:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Runtime.InteropServices;

namespace FileCopyBug
{
    internal class Program
    {
        static void Main(string[] args)
        {
            if (args.Length < 2)
            {
                Console.WriteLine("FileCopyBug src dest [-noalloc]");
                return;
            }

            Queue<string> queue = new Queue<string>(args);
            bool bAlloc = true;
            string src = null;
            string dest = null;

            while (queue.Count > 0)
            {
                string arg = queue.Dequeue();
                if (arg == "-noalloc")
                {
                    Console.WriteLine("Skip allocations.");
                    bAlloc = false;
                }
                else if (src == null)
                {
                    src = arg;   // first non-switch argument is the source file
                }
                else if (dest == null)
                {
                    dest = arg;  // second non-switch argument is the destination file
                }
            }

            if (src == null || dest == null)
            {
                Console.WriteLine("FileCopyBug src dest [-noalloc]");
                return;
            }

            if (bAlloc) // allocate > 2 GB of memory to force pointers > 2 GB
            {
                int n = 0;
                const int ChunkBytes = 500;  // allocate small to not leave holes
                const int nMax = 4600000;
                while (++n < nMax)
                {
                    Marshal.AllocHGlobal(ChunkBytes); // intentionally never freed
                }
            }

            Console.WriteLine($"Allocated memory: {Process.GetCurrentProcess().PrivateMemorySize64:N0} bytes");

            // Call KernelBase.dll CopyFile which exhibits the bug when pointers > 2 GB are used for Extended Attributes to NtCreateFile which will bail out with an access violation c0000005 as NTSTATUS return code
            File.Copy(src, dest, true);
        }
    }
}
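As a quick sanity check of the constants in the test program, the requested payload can be compared against the 2 GB boundary. This is only a back-of-the-envelope sketch: it ignores heap bookkeeping overhead and the memory the process uses anyway, both of which push the real number higher.

```python
CHUNK_BYTES = 500
N_MAX = 4_600_000

# `while (++n < nMax)` with n starting at 0 executes its body nMax - 1 times.
allocations = N_MAX - 1
payload_bytes = allocations * CHUNK_BYTES
two_gb = 2 ** 31  # addresses at or above this value need the top bit of a 32 bit pointer

print(f"requested payload: {payload_bytes:,} bytes")
print(f"2 GB boundary:     {two_gb:,} bytes")
print("payload exceeds the boundary:", payload_bytes > two_gb)
```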

The compiled binary and code can be downloaded from my OneDrive here. If you are running Windows 10 with the latest updates, you can try to copy a file to a new location just like with any copy tool:

c:\tmp>FileCopyBug.exe src.pdf dsg.pdf
Allocated memory: 2,485,137,408 bytes

Unhandled Exception: System.IO.IOException: Invalid access to memory location.
   at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
   at System.IO.File.InternalCopy(String sourceFileName, String destFileName, Boolean overwrite, Boolean checkHost)
   at System.IO.File.Copy(String sourceFileName, String destFileName, Boolean overwrite)
   at FileCopyBug.Program.Main(String[] args)

That really is the access violation status returned by NtCreateFile, which CopyFile correctly passes on to the caller. When you add the parameter -noalloc, no additional memory is allocated and the copy operation works. I still have a machine with the January version of Windows 10 around, on which the file copy issue is not present. The test also works on Windows 11.
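To make the relation between the NTSTATUS value and the .NET exception text more tangible, here is a small sketch that splits an NTSTATUS into its documented bit fields. The function name decode_ntstatus is made up for illustration. To my knowledge STATUS_ACCESS_VIOLATION (0xC0000005) is translated to the Win32 error ERROR_NOACCESS (998), whose message text is "Invalid access to memory location."

```python
STATUS_ACCESS_VIOLATION = 0xC0000005
ERROR_NOACCESS = 998  # Win32 error text: "Invalid access to memory location."

SEVERITIES = ["success", "informational", "warning", "error"]


def decode_ntstatus(status):
    """Split a 32 bit NTSTATUS value into its bit fields."""
    return {
        "severity": SEVERITIES[(status >> 30) & 0x3],  # bits 31-30
        "customer": bool((status >> 29) & 0x1),        # bit 29
        "facility": (status >> 16) & 0xFFF,            # bits 27-16
        "code": status & 0xFFFF,                       # bits 15-0
    }


if __name__ == "__main__":
    print(decode_ntstatus(STATUS_ACCESS_VIOLATION))
```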

Having come this far, we can now debug our test application directly and see where this leads.

The steps are

  1. Create a breakpoint for CopyFile (bp kernelbase!CopyFileW)
  2. Run until we enter CopyFile (g)
  3. Create a breakpoint for NtCreateFile (bp ntdll!NtCreateFile)
  4. Run until we hit NtCreateFile for the source file (g)
  5. Print the file name (dt ole32!OBJECT_ATTRIBUTES poi(@esp+3*4))
  6. Leave the method to check the return code (gu). The return code is in eax, 0 is success.
  7. Run until we hit NtCreateFile for the destination file (g)
  8. Print the file name (dt ole32!OBJECT_ATTRIBUTES poi(@esp+3*4))
  9. Leave the method to check the return code (gu). The return code in eax is c0000005, which is the error we are after.

The recorded debugger session looks like this:

0:001> bp kernelbase!CopyFileW
0:001> g
Breakpoint 0 hit
Time Travel Position: FBCD:1ED4
eax=28c67690 ebx=02af91b4 ecx=82e96558 edx=02b50000 esi=5a13bdb0 edi=02afa810
eip=76a137f0 esp=02af5028 ebp=02af52b8 iopl=0         nv up ei pl nz na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00200206
KERNELBASE!CopyFileW:
76a137f0 8bff            mov     edi,edi
0:006> bp ntdll!NtCreateFile
0:006> g
Breakpoint 1 hit
Time Travel Position: FBCE:72E
eax=02af4900 ebx=00000000 ecx=00000005 edx=02af474c esi=80000000 edi=80000000
eip=77c13000 esp=02af45ac ebp=02af4f8c iopl=0         nv up ei ng nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00200282
ntdll!NtCreateFile:
Unable to load image C:\Program Files (x86)\Microsoft Office\root\Office16\AppVIsvSubsystems32.dll, Win32 error 0n2
77c13000 e96b11cfdf      jmp     AppVIsvSubsystems32!vfs_hooks::hooked_NtCreateFile (57904170)
0:006> dt ole32!OBJECT_ATTRIBUTES poi(@esp+3*4)
   +0x000 Length           : 0x18
   +0x004 RootDirectory    : (null) 
   +0x008 ObjectName       : 0x02af4864 _UNICODE_STRING "\??\C:\Users\...\AppData\Local\Microsoft\Windows\INetCache\Content.Word\src.pdf"
   +0x00c Attributes       : 0x40
   +0x010 SecurityDescriptor : (null) 
   +0x014 SecurityQualityOfService : (null) 

0:006> gu
Time Travel Position: FBD2:640
eax=00000000 ebx=00000000 ecx=579026f4 edx=02b50000 esi=80000000 edi=80000000
eip=76a1521e esp=02af45dc ebp=02af4f8c iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00200246
KERNELBASE!BasepCopyFileExW+0x187e:
76a1521e 898584f9ffff    mov     dword ptr [ebp-67Ch],eax ss:002b:02af4910=00000000
0:006> g
Breakpoint 1 hit
Time Travel Position: FBED:77
eax=02af2b8c ebx=00000000 ecx=00000005 edx=10000044 esi=c0150081 edi=00002020
eip=77c13000 esp=02af25d8 ebp=02af4598 iopl=0         nv up ei ng nz na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00200286
ntdll!NtCreateFile:
77c13000 e96b11cfdf      jmp     AppVIsvSubsystems32!vfs_hooks::hooked_NtCreateFile (57904170)
0:006> dt ole32!OBJECT_ATTRIBUTES poi(@esp+3*4)
   +0x000 Length           : 0x18
   +0x004 RootDirectory    : (null) 
   +0x008 ObjectName       : 0x02af27a8 _UNICODE_STRING "\??\C:\Users\...\AppData\Local\Microsoft\Windows\INetCache\Content.Outlook\XNKF02BJ\src (002).pdf"
   +0x00c Attributes       : 0x40
   +0x010 SecurityDescriptor : (null) 
   +0x014 SecurityQualityOfService : 0x02af3f18 Void
0:006> gu
Time Travel Position: FBFA:1B8
eax=c0000005 ebx=00000000 ecx=579026f4 edx=02b50000 esi=c0150081 edi=00002020
eip=76a178c2 esp=02af2608 ebp=02af4598 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00200246
KERNELBASE!BaseCopyStream+0xd01:
76a178c2 8985e0e5ffff    mov     dword ptr [ebp-1A20h],eax ss:002b:02af2b78=00000000

The nice thing about the x86 WinAPI calling convention is that all parameters are on the stack. NtCreateFile is an exported method whose signature has 11 arguments:

__kernel_entry NTSTATUS
NTAPI
NtCreateFile (
    OUT PHANDLE FileHandle,
    IN ACCESS_MASK DesiredAccess,
    IN POBJECT_ATTRIBUTES ObjectAttributes,
    OUT PIO_STATUS_BLOCK IoStatusBlock,
    IN PLARGE_INTEGER AllocationSize OPTIONAL,
    IN ULONG FileAttributes,
    IN ULONG ShareAccess,
    IN ULONG CreateDisposition,
    IN ULONG CreateOptions,
    IN PVOID EaBuffer OPTIONAL,
    IN ULONG EaLength
    );

When I break on method entry, I can dump the next 12 DWORDs from the stack pointer esp (the first value is the return address) and read the passed method arguments directly:

0:006> dd @esp L0n12
02af25d8  76a17b41 02af2b8c 40150080 02af27ec
02af25e8  02af2a00 02af4ef0 00002020 00000000
02af25f8  00000005 10000044 02af29b8 00000010

This translates to

NtCreateFile(
FileHandle Pointer = 02af2b8c,
DesiredAccess = 40150080
   GENERIC_WRITE|SYNCHRONIZE|WRITE_DAC|DELETE|FILE_READ_ATTRIBUTES,
ObjectAttributes Pointer = 02af27ec,
IoStatusBlock Pointer = 02af2a00,
AllocationSize Pointer = 02af4ef0,
FileAttributes = 00002020
   FILE_ATTRIBUTE_ARCHIVE|FILE_ATTRIBUTE_NOT_CONTENT_INDEXED,
ShareAccess = 00000000,
CreateDisposition = 5
   FILE_OVERWRITE_IF,
CreateOptions = 10000044
   FILE_SEQUENTIAL_ONLY|FILE_NON_DIRECTORY_FILE|FILE_CONTAINS_EXTENDED_CREATE_INFORMATION,
EaBuffer Pointer = 02af29b8,
EaLength = 0x10)
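The flag names can be cross-checked by decoding the bit masks with the constant values from the Windows SDK headers. The sketch below contains only the constants needed for this call, so the tables are deliberately incomplete, and the helper name decode_flags is made up. With these header values, 0x2020 decodes to FILE_ATTRIBUTE_ARCHIVE and FILE_ATTRIBUTE_NOT_CONTENT_INDEXED.

```python
ACCESS_MASK = {
    0x40000000: "GENERIC_WRITE",
    0x00100000: "SYNCHRONIZE",
    0x00040000: "WRITE_DAC",
    0x00010000: "DELETE",
    0x00000080: "FILE_READ_ATTRIBUTES",
}

FILE_ATTRIBUTES = {
    0x00000020: "FILE_ATTRIBUTE_ARCHIVE",
    0x00000080: "FILE_ATTRIBUTE_NORMAL",
    0x00002000: "FILE_ATTRIBUTE_NOT_CONTENT_INDEXED",
}

CREATE_OPTIONS = {
    0x00000004: "FILE_SEQUENTIAL_ONLY",
    0x00000040: "FILE_NON_DIRECTORY_FILE",
    0x10000000: "FILE_CONTAINS_EXTENDED_CREATE_INFORMATION",
}


def decode_flags(value, table):
    """Return the names of all set bits, highest bit first.

    Bits that are not in the table are reported as one hex remainder.
    """
    names = [name for bit, name in sorted(table.items(), reverse=True) if value & bit]
    known = 0
    for bit in table:
        known |= bit
    leftover = value & ~known
    if leftover:
        names.append(f"0x{leftover:08x}")
    return names


if __name__ == "__main__":
    print("DesiredAccess  =", "|".join(decode_flags(0x40150080, ACCESS_MASK)))
    print("FileAttributes =", "|".join(decode_flags(0x00002020, FILE_ATTRIBUTES)))
    print("CreateOptions  =", "|".join(decode_flags(0x10000044, CREATE_OPTIONS)))
```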

We are looking at a Windows 10 machine, which makes me wonder why I find the FILE_CONTAINS_EXTENDED_CREATE_INFORMATION flag, which according to the docs is a Windows 11 only feature. It appears that antivirus can scan less when it is clear that this is a local file. It looks like this performance feature was backported to Windows 10 with the April update. To check that, we can dump the EaBuffer as an EXTENDED_CREATE_INFORMATION structure.

0:006> dt combase!PEXTENDED_CREATE_INFORMATION @esp+0n10*4
0x02af29b8 
   +0x000 ExtendedCreateFlags : 0n2
   +0x008 EaBuffer         : 0x81811fe8 Void
   +0x00c EaLength         : 0x2382

That sure looks like a valid structure with the EX_CREATE_FLAG_FILE_DEST_OPEN_FOR_COPY flag set. Now we have also found the EaBuffer pointer above 2 GB (0x81811fe8 > 0x80000000), which causes NtCreateFile to (wrongly) return an access violation status.
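Note that the pointers passed directly on the stack are all below 2 GB; only the EaBuffer nested inside the EXTENDED_CREATE_INFORMATION structure is above it. A tiny sketch that filters the pointer values from the dumps makes this visible. The dictionary keys are labels chosen for this example only.

```python
TWO_GB = 0x80000000


def above_2gb(pointers):
    """Return the entries whose address needs the top bit of a 32 bit pointer."""
    return {name: addr for name, addr in pointers.items() if addr >= TWO_GB}


# Pointer values taken from the stack dump and the dt output.
pointers = {
    "FileHandle": 0x02AF2B8C,
    "ObjectAttributes": 0x02AF27EC,
    "IoStatusBlock": 0x02AF2A00,
    "AllocationSize": 0x02AF4EF0,
    "NtCreateFile.EaBuffer": 0x02AF29B8,
    "EXTENDED_CREATE_INFORMATION.EaBuffer": 0x81811FE8,
}

if __name__ == "__main__":
    for name, addr in above_2gb(pointers).items():
        print(f"{name} = 0x{addr:08x} is above 2 GB")
```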

Conclusions

The issue presented here is not specific to 32 bit Outlook. It can affect all 32 bit applications which make use of CopyFile (MoveFile was named as well in the initial analysis, but according to the Known Issue Rollback description it is not affected). I have not studied the effect on Word/PowerPoint/Excel, but similar issues might occur. At least you now know how to find out with a debugger whether you are affected. An alternative approach is to enable ETW SysCall tracing, which lets you see the return code of every called kernel method. Then you can scan for NtCreateFile with a return code of 0xC0000005, which should never happen.
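Scanning such a trace could look roughly like the sketch below. It assumes the syscall events were exported to CSV with the columns Process, Syscall and ReturnCode. That layout, the function name find_failed_creates and the sample rows are hypothetical, so the column names need to be adapted to whatever your tracing tool actually exports.

```python
import csv
import io

STATUS_ACCESS_VIOLATION = 0xC0000005


def find_failed_creates(csv_text):
    """Return (process, return code) for NtCreateFile calls that failed with c0000005."""
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Syscall"] != "NtCreateFile":
            continue
        if int(row["ReturnCode"], 16) == STATUS_ACCESS_VIOLATION:
            hits.append((row["Process"], row["ReturnCode"]))
    return hits


# Made-up sample rows in the assumed export format.
SAMPLE = """\
Process,Syscall,ReturnCode
OUTLOOK.EXE,NtCreateFile,0x00000000
OUTLOOK.EXE,NtReadFile,0x00000000
OUTLOOK.EXE,NtCreateFile,0xC0000005
FileCopyBug.exe,NtClose,0x00000000
"""

if __name__ == "__main__":
    for process, code in find_failed_creates(SAMPLE):
        print(f"{process}: NtCreateFile returned {code}")
```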

Let's hope that a fix is released soon to make these strange sporadic failures go away.
