Bug Hunting #1: Garbled Audio from End to End

posted by Craig Gidney on October 1, 2013

“Hello. Hello? Hello!”

“I don’t hear anything.”

“How about this?” dragging the bottom of the phone along the desk

“Whoah, that’s trippy. *Twangs* for days.”

“Yeah. I have no idea why it’s happening.”

Garbling

I was working on making sure our voip app could interop with another voip app. It was going pretty well, all things considered. It only took a couple hours to get to the point where we could establish a call with encrypted audio packets traveling back and forth. Except the audio data was being garbled somehow by the transition between the apps, and I couldn’t figure out why: it worked fine going from our app to our app, or from their app to their app, but not between our apps.

The received audio did have structure, and it did somehow relate to what was being sent. Silence was turned into silence with pops and crackles here and there, whereas rumbling sounds were played back on the other side as a cacophony of twangs. This indicated that the audio data was being somehow misinterpreted, instead of being totally destroyed by losing the data, decrypting with the wrong key, or something else catastrophic.

Analyzing

The art of debugging well is hard to describe. It’s a mix of figuring out what classes of problems could result in a bug like the one you’re seeing, figuring out how to quickly rule out (or confirm) each of those problems in order to narrow down where the bug could be, and efficiently cycling that process. Sort of a small scale scientific method.

My first guess about the cause of the bug, based on the fact that the audio was related-but-wrong, was a codec mismatch. We use Speex, and I worried that maybe the apps were specifying incompatible parameters. If one app was encoding with a different number of channels or on a different quality setting than expected by the other, maybe that would garble the audio without totally destroying it (i.e. what we were experiencing).

Since we have access to the source of the other app, I could compare their Speex setup to ours line by line. Unfortunately (or is that fortunately?), the setups matched. The only notable difference was that we were explicitly setting the sample rate to 8 kHz whereas they were getting 8 kHz by not setting the sample rate at all (the default is 8 kHz). As far as I could tell or test, the codecs were fine.

Next idea. Maybe we’re losing or introducing data during packet (de)serialization. We’d needed to hack around a few deviations from the packet spec, and those hacks could conceivably be introducing some extra zeroes at the start of packets’ payloads. Alternatively, maybe we’d missed a necessary hack.

I added logging to our app. Unfortunately, since we hadn’t bothered with being able to build the other app, I couldn’t add logging to the other side. Nevertheless, I compared their packing to our parsing and stepped through our deserialization of their packets line by line. Everything seemed to be in order, except of course the audio was steadfast in its insistence on being garbled.

This is basically where I was stuck for an entire day and a half, questioning more and more assumptions while trying out variations of ‘is it the codec?’/’is it the packet serialization?’ and adding more and more logging. One fruitful tweak was reflecting audio: send back what we received, instead of what we recorded (thus bypassing any decoding or encoding of the audio). The bug still manifested in that case, indicating it was almost definitely somewhere in the serialization code.

Logging to Victory

Eventually, it became obvious I needed logs from the other app if I was going to figure this out anytime soon. I asked Jazz (a coworker) to work out how to make a custom build of the other app (which took another half day of fiddling, because nothing ever just works and I wish I could make this sentence sound as cynical as I feel when thinking about it). Once we could make custom builds, we logged the data at each stage in its journey in both apps so I could see the whole process.

It didn’t take long to find the difference:

(Sender) After Speex Encoding, Before Encryption and Authentication:
1DDE5C700039CE70001CE7206F2A7940739078071BCF30959CA10AB3FEAE09FE83692BC579FABD53

(Receiver) After Authentication and Decryption, Before Speex Decoding:
1dde5c700039ce70001ce7206f2a7940e0fe5d82f664b96b2eafef2f549b704a47181e8a37f0cf44

What the heck. The first 32 bytes of encoded audio data are communicated correctly, but the rest has been garbled into random garbage. It almost looks like the encryption key is wrong, except that would make the whole thing random garbage instead of just the first 32 bytes… better check the crypto code anyways.

The payload of each packet is encrypted with AES in counter mode. That’s the mode where you get the ciphertext of a block by xoring the encryption of a counter into your plaintext. The next block will use the next counter value. We didn’t actually implement AES or CTR mode, of course. We just have a method that hides all the details of calling out to an external library:

+(NSData*)cipherCTR:(NSData*)data
             withIv:(NSData*)iv
            withKey:(NSData*)key
         withCipher:(const struct ltc_cipher_descriptor*)cipher
          operation:(enum CipherOperation)operation {
    require(data != nil);
    require(iv != nil);
    require(key != nil);
    require(operation == Decrypt || operation == Encrypt);
    symmetric_CTR ctr;
    
    int r = register_cipher(&aes_desc);
    checkCryptoRegisterOperation(r, @"register_cipher");
    
    int r2 = ctr_start(r, [iv bytes], [key bytes], (int)[key length], 0, CTR_COUNTER_LITTLE_ENDIAN, &ctr);
    checkCryptoOperation(r2, @"ctr_start");
    
    NSMutableData* output = [NSMutableData dataWithLength:[data length]];
    if (operation == Decrypt) {
        int r3 = ctr_decrypt([data bytesNotNull], [output mutableBytes], [data length], &ctr);
        checkCryptoOperation(r3, @"ctr_decrypt");
    } else {
        int r3 = ctr_encrypt([data bytesNotNull], [output mutableBytes], [data length], &ctr);
        checkCryptoOperation(r3, @"ctr_encrypt");
    }
    
    int r4 = ctr_done(&ctr);
    checkCryptoOperation(r4, @"ctr_done");
    
    return output;
}
...
-(NSData*)decryptWithAesInCounterModeWithKey:(NSData*)key andIv:(NSData*)iv {
    return [CryptoUtil cipherCTR:self withIv:iv withKey:key withCipher:&aes_desc operation:Decrypt];
}

At this point I realize I’ve been skipping over checking this method because I expect it to be correct. It’s well tested (using AES test vectors you can find online), and it’s in code that’s well exercised. But the fact that the garbage data starts after the first 32 bytes, at the start of the second AES block, means I’d better check really closely.

Wait. What’s that flag?

int r2 = ctr_start(r, [iv bytes], [key bytes], (int)[key length], 0, CTR_COUNTER_LITTLE_ENDIAN, &ctr);

Oh right, incrementing the counter requires knowing whether the counter bytes are big or little endian. Otherwise everything after the first block would use the wrong… Ugh.

CTR_COUNTER_LITTLEBIG_ENDIAN // (after checking method is not used anywhere else)

ENDIANNNNNNN BUUUUUGGGSSS!!!!!!!!!!!!!!!!!!!!!!

They’re advancing the counter by incrementing from the last byte and carrying towards the first byte. We were advancing the counter by incrementing the first byte and carrying towards the last byte. The initial values would agree, allowing a bit of correct data to make it through so it sounds like a codec issue, but then we would effectively turn 0x0000 + 1 into 0x0100 instead of 0x0001. Thus all data after the first block was being trashed instead of decrypted.

Just another example of the nastiest bugs being something trivial.

Lessons

This bug was entirely my fault, and could easily have been caught a lot earlier.

Write it Down: I remember noting the endian flag was a risk when I wrote the cipherCTR method. But I deferred investigating it and then forgot about it. I should have written it down somewhere. At the very least there should have been a TODO comment. Even better: the todo should have gone into a task manager.

Oddly, better testing would not have caught this bug. It did involve a case we failed to test (none of the unit tests of cipherCTR operated on more than a single block of data), but adding tests for that case wouldn’t have caught the bug. The tests would have just reflected the incorrect endianness.

Invest Sooner: Being able to log the entire pipeline was essential to discovering the problem. It was easy to log our half, but it took a significant time investment to make custom builds of the other app. That time investment was worth it, and being able to make custom builds will likely be essentially in the future anyways. We should have paid that cost sooner.

Discuss on Reddit

My Twitter: @CraigGidney

Comments are closed.

Twisted Oak Studios offers consulting and development on high-tech interactive projects. Check out our portfolio, or Give us a shout if you have anything you think some really rad engineers should help you with.

Bug Hunting #1: Garbled Audio from End to End