
Audio & Video Learning: H.264 Encoding

source link: https://www.tuicool.com/articles/EVVjm2B

With the groundwork laid in the earlier articles Audio & Video Learning: Basic Concepts and Audio & Video Learning: H.264 Structure and Bitstream Parsing, this article moves on to code. The AVFoundation capture pipeline covered previously is not repeated here; we start encoding video frames directly in the capture delegate method **captureOutput:didOutputSampleBuffer:fromConnection:**. The overall flow breaks down into four steps (the instance state the code relies on is sketched after the list):

  1. Prepare the encoder, i.e. create the session with VTCompressionSessionCreate and set its properties;
  2. Start encoding with VTCompressionSessionEncodeFrame;
  3. Process the data in the completion callback: prepend the start code **"\x00\x00\x00\x01"** and add the **SPS/PPS**, etc.;
  4. Finish encoding, clean up, and release resources.
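
The snippets below also refer to some instance state that is never declared in the article: a serial encode queue, a frame counter, the compression session, and a file handle for the raw stream (the callback casts outputCallbackRefCon back to a ViewController, so the code is assumed to live there). A minimal sketch of those declarations, using the names from the snippets, might look like this:

#import <AVFoundation/AVFoundation.h>
#import <VideoToolbox/VideoToolbox.h>

@interface ViewController () <AVCaptureVideoDataOutputSampleBufferDelegate>
{
    int frameID;                               //frame counter, reused as the presentation timestamp
    dispatch_queue_t cEncodeQueue;             //serial queue that all encoding work is dispatched onto
    VTCompressionSessionRef cEncodeingSession; //the VideoToolbox compression session
    NSFileHandle *fileHandele;                 //file handle the raw H.264 stream is written to
}
@end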

Preparing the encoder

  1. Create the session: VTCompressionSessionCreate
  2. Set properties with VTSessionSetProperty: real-time output, whether to allow B-frames, keyframe interval, expected frame rate, average bit rate, data-rate limits, and so on
  3. Get ready to encode: VTCompressionSessionPrepareToEncodeFrames
-(void)initVideoToolBox
{
    // cEncodeQueue is a serial queue
    dispatch_sync(cEncodeQueue, ^{

        frameID = 0;
        int width = 480, height = 640;
        
        //Create the compression session
        OSStatus status = VTCompressionSessionCreate(NULL, width, height, kCMVideoCodecType_H264, NULL, NULL, NULL, didCompressH264, (__bridge void *)(self), &cEncodeingSession);
        NSLog(@"H264:VTCompressionSessionCreate:%d",(int)status);
        
        if (status != 0) {
            NSLog(@"H264:Unable to create a H264 session");
            return ;
        }
        
        //Encode in real time (avoids latency)
        VTSessionSetProperty(cEncodeingSession, kVTCompressionPropertyKey_RealTime, kCFBooleanTrue);
        VTSessionSetProperty(cEncodeingSession, kVTCompressionPropertyKey_ProfileLevel,kVTProfileLevel_H264_Baseline_AutoLevel);
        
        //Disallow frame reordering, i.e. no B-frames (B-frames are not required for decoding and can be dropped)
        VTSessionSetProperty(cEncodeingSession, kVTCompressionPropertyKey_AllowFrameReordering, kCFBooleanFalse);
        
        //Set the keyframe interval (GOP size); if the GOP is too small the picture gets blurry
        int frameInterval = 10;
        CFNumberRef frameIntervalRaf = CFNumberCreate(kCFAllocatorDefault, kCFNumberIntType, &frameInterval);
        VTSessionSetProperty(cEncodeingSession, kVTCompressionPropertyKey_MaxKeyFrameInterval, frameIntervalRaf);
        
        //Set the expected frame rate (a hint, not the actual frame rate)
        int fps = 10;
        CFNumberRef fpsRef = CFNumberCreate(kCFAllocatorDefault, kCFNumberIntType, &fps);
        VTSessionSetProperty(cEncodeingSession, kVTCompressionPropertyKey_ExpectedFrameRate, fpsRef);
        
        //On bit rate: a higher bit rate gives a sharper picture but a larger file; a lower bit rate
        //can look blurry at times but is still watchable. (The author keeps a bit-rate formula in personal notes.)
        //Set the average bit rate, in bits per second
        int bitRate = width * height * 3 * 4 * 8;
        CFNumberRef bitRateRef = CFNumberCreate(kCFAllocatorDefault, kCFNumberSInt32Type, &bitRate);
        VTSessionSetProperty(cEncodeingSession, kVTCompressionPropertyKey_AverageBitRate, bitRateRef);
        
        //Set a hard data-rate limit; this property takes an array of (bytes, seconds) pairs
        int bytesLimit = width * height * 3 * 4;
        int oneSecond = 1;
        CFNumberRef bytesLimitRef = CFNumberCreate(kCFAllocatorDefault, kCFNumberSInt32Type, &bytesLimit);
        CFNumberRef oneSecondRef = CFNumberCreate(kCFAllocatorDefault, kCFNumberIntType, &oneSecond);
        const void *limitValues[] = { bytesLimitRef, oneSecondRef };
        CFArrayRef dataRateLimits = CFArrayCreate(kCFAllocatorDefault, limitValues, 2, &kCFTypeArrayCallBacks);
        VTSessionSetProperty(cEncodeingSession, kVTCompressionPropertyKey_DataRateLimits, dataRateLimits);
        
        //Get ready to start encoding
        VTCompressionSessionPrepareToEncodeFrames(cEncodeingSession);

    });
    
}

The VTCompressionSessionCreate parameters in detail (the function's declaration is sketched after the list):

  • allocator: session allocator; pass NULL for the default allocator
  • width: width
  • height: height
  • codecType: codec type, e.g. kCMVideoCodecType_H264
  • encoderSpecification: encoder specification; pass NULL to let VideoToolbox choose the encoder itself
  • sourceImageBufferAttributes: source pixel buffer attributes; pass NULL so VideoToolbox does not create a pixel buffer pool and we supply buffers ourselves
  • compressedDataAllocator: allocator for the compressed data; pass NULL for the default allocator
  • outputCallback: the encoding callback, invoked asynchronously after each call to VTCompressionSessionEncodeFrame; here it is the C function didCompressH264
  • outputCallbackRefCon: a client-defined reference value passed to the callback; we pass self here because the C callback cannot call methods on self directly otherwise
  • compressionSessionOut: receives the created compression session
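
For reference, the declaration in VideoToolbox's VTCompressionSession.h has roughly this shape (nullability and availability annotations omitted):

OSStatus VTCompressionSessionCreate(
    CFAllocatorRef                allocator,
    int32_t                       width,
    int32_t                       height,
    CMVideoCodecType              codecType,
    CFDictionaryRef               encoderSpecification,
    CFDictionaryRef               sourceImageBufferAttributes,
    CFAllocatorRef                compressedDataAllocator,
    VTCompressionOutputCallback   outputCallback,
    void                         *outputCallbackRefCon,
    VTCompressionSessionRef      *compressionSessionOut);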

Start encoding

  1. Get the unencoded video frame: CVImageBufferRef imageBuffer = (CVImageBufferRef)CMSampleBufferGetImageBuffer(sampleBuffer);
  2. Set the frame time: CMTime presentationTimeStamp = CMTimeMake(frameID++, 1000);
  3. Start encoding: call VTCompressionSessionEncodeFrame
- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection
{
    //Capture is running: each camera frame from the capture delegate is handed to the encode: method
    dispatch_sync(cEncodeQueue, ^{
        [self encode:sampleBuffer];
    });
}
- (void)encode:(CMSampleBufferRef)sampleBuffer
{
  //Get the unencoded pixel data for this frame
  CVImageBufferRef imageBuffer = (CVImageBufferRef)CMSampleBufferGetImageBuffer(sampleBuffer);

  //Set the frame's presentation timestamp
  CMTime presentationTimeStamp = CMTimeMake(frameID++, 1000);

  //Start encoding
  VTEncodeInfoFlags flags;
  OSStatus statusCode = VTCompressionSessionEncodeFrame(cEncodeingSession, imageBuffer, presentationTimeStamp, kCMTimeInvalid, NULL, NULL, &flags);

  if (statusCode != noErr) {
        //Encoding failed
        NSLog(@"H.264:VTCompressionSessionEncodeFrame failed with %d",(int)statusCode);
        
        //Release resources
        VTCompressionSessionInvalidate(cEncodeingSession);
        CFRelease(cEncodeingSession);
        cEncodeingSession = NULL;
        return;
    }
}


The VTCompressionSessionEncodeFrame parameters in detail (declaration sketched after the list):

  • session: the compression session
  • imageBuffer: the unencoded frame data
  • presentationTimeStamp: the presentation timestamp of this sample buffer; every timestamp passed to the session must be greater than the previous one
  • duration: the presentation duration of this frame; pass kCMTimeInvalid if no duration information is available
  • frameProperties: properties for this frame; changes here can affect subsequent encoded frames
  • sourceFrameRefcon: a per-frame reference value that is passed back to the callback
  • infoFlagsOut: points to a VTEncodeInfoFlags that receives information about the encode operation: kVTEncodeInfo_Asynchronous is set if the encode runs asynchronously, and kVTEncodeInfo_FrameDropped is set if the frame was dropped; pass NULL if you do not want this information
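
Again for reference, the declaration is roughly (annotations omitted):

OSStatus VTCompressionSessionEncodeFrame(
    VTCompressionSessionRef   session,
    CVImageBufferRef          imageBuffer,
    CMTime                    presentationTimeStamp,
    CMTime                    duration,
    CFDictionaryRef           frameProperties,
    void                     *sourceFrameRefcon,
    VTEncodeInfoFlags        *infoFlagsOut);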

Processing the data after encoding

  1. Check whether the frame is a keyframe: if it is, call CMVideoFormatDescriptionGetH264ParameterSetAtIndex to get the SPS and PPS, convert them to raw bytes, and write them to the file (or upload them).
  2. Assemble the NALU data: get the encoded H.264 stream with CMBlockBufferRef dataBuffer = CMSampleBufferGetDataBuffer(sampleBuffer), then walk it using the base pointer, per-NALU length, and total length returned by OSStatus statusCodeRet = CMBlockBufferGetDataPointer(dataBuffer, 0, &length, &totalLength, &dataPointer);. Mind the byte order when reading: the length prefixes are big-endian, as is usual for network transmission.
/*
    1. When hardware H.264 encoding of a frame completes, the VTCompressionOutputCallback is invoked.
    2. Convert the successfully encoded CMSampleBuffer into an H.264 byte stream that can be sent over the network.
    3. Extract the parameter sets (SPS & PPS) and prepend start codes to turn them into NALUs. Extract the video data, replace each length prefix with a start code to form a NALU, and send the NALUs out.
 */
void didCompressH264(void *outputCallbackRefCon, void *sourceFrameRefCon, OSStatus status, VTEncodeInfoFlags infoFlags, CMSampleBufferRef sampleBuffer)
{
    NSLog(@"didCompressH264 called with status %d infoFlags %d",(int)status,(int)infoFlags);
    //Bail out on an error status
    if (status != 0) {
        return;
    }
    
    //Data is not ready yet
    if (!CMSampleBufferDataIsReady(sampleBuffer)) {
        NSLog(@"didCompressH264 data is not ready");
        return;
    }
    
    ViewController *encoder = (__bridge ViewController *)outputCallbackRefCon;
    
    //Determine whether the current frame is a keyframe
    CFArrayRef array = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, true);
    CFDictionaryRef dic = (CFDictionaryRef)CFArrayGetValueAtIndex(array, 0);
    bool keyFrame = !CFDictionaryContainsKey(dic, kCMSampleAttachmentKey_NotSync);
    
    //For a keyframe, fetch the SPS & PPS. They only need to be fetched once and are written at the very start of the H.264 file.
    //SPS = Sequence Parameter Set, PPS = Picture Parameter Set (decoder configuration data, not "samples per second")
    if (keyFrame) {
        //Format description: image storage layout, codec type, and so on
        CMFormatDescriptionRef format = CMSampleBufferGetFormatDescription(sampleBuffer);
        
        //Get the SPS
        size_t sparameterSetSize,sparameterSetCount;
        const uint8_t *sparameterSet;
        OSStatus statusCode = CMVideoFormatDescriptionGetH264ParameterSetAtIndex(format, 0, &sparameterSet, &sparameterSetSize, &sparameterSetCount, 0);
        
        if (statusCode == noErr) {
            
            //Get the PPS
            size_t pparameterSetSize,pparameterSetCount;
            const uint8_t *pparameterSet;
            
            //Get the SPS & PPS from the first keyframe
            OSStatus statusCode = CMVideoFormatDescriptionGetH264ParameterSetAtIndex(format, 1, &pparameterSet, &pparameterSetSize, &pparameterSetCount, 0);
            
            //Both SPS and PPS were read from the H.264 parameter sets
            if (statusCode == noErr)
            {
                NSData *sps = [NSData dataWithBytes:sparameterSet length:sparameterSetSize];
                NSData *pps = [NSData dataWithBytes:pparameterSet length:pparameterSetSize];
                
                if(encoder)
                {
                    [encoder gotSpsPps:sps pps:pps];
                }
            }
        }
    }
    
    CMBlockBufferRef dataBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
    size_t length,totalLength;
    char *dataPointer;
    OSStatus statusCodeRet = CMBlockBufferGetDataPointer(dataBuffer, 0, &length, &totalLength, &dataPointer);
    if (statusCodeRet == noErr) {
        size_t bufferOffset = 0;
        static const int AVCCHeaderLength = 4; //the first 4 bytes of each returned NALU are not a 00 00 00 01 start code but a big-endian NALU length
        
        //Loop over the NALUs in the block buffer
        while (bufferOffset < totalLength - AVCCHeaderLength) {
            
            uint32_t NALUnitLength = 0;
            
            //Read this NALU's 4-byte length prefix
            memcpy(&NALUnitLength, dataPointer + bufferOffset, AVCCHeaderLength);
            
            //Convert from big-endian to host byte order
            NALUnitLength = CFSwapInt32BigToHost(NALUnitLength);
            
            //Copy out the NALU payload
            NSData *data = [[NSData alloc]initWithBytes:(dataPointer + bufferOffset + AVCCHeaderLength) length:NALUnitLength];
            
            //Write the NALU to the file
            [encoder gotEncodedData:data isKeyFrame:keyFrame];
            
            //Move to the next NAL unit in the block buffer; a single callback may contain several NALUs
            bufferOffset += AVCCHeaderLength + NALUnitLength;
        }
    }
}

//Write the SPS & PPS at the start of the file
- (void)gotSpsPps:(NSData*)sps pps:(NSData*)pps
{
    const char bytes[] = "\x00\x00\x00\x01";
    
    size_t length = (sizeof bytes) - 1;    // do not write the string literal's trailing \0 terminator
    
    NSData *ByteHeader = [NSData dataWithBytes:bytes length:length];
    
    [fileHandele writeData:ByteHeader];
    [fileHandele writeData:sps];
    [fileHandele writeData:ByteHeader];
    [fileHandele writeData:pps];
}

- (void)gotEncodedData:(NSData*)data isKeyFrame:(BOOL)isKeyFrame
{
    if (fileHandele != NULL) {
        //Prepend the 4-byte H.264 start code delimiter
        //The encoder's first output is normally the SPS & PPS
        //In H.264 Annex-B streams, the start code 0x00000001 is placed before each NAL; the decoder scans for start codes to find where the current NAL ends.
        const char bytes[] ="\x00\x00\x00\x01";
        //Length of the start code (minus the trailing \0)
        size_t length = (sizeof bytes) - 1;
        
        //Start-code header bytes
        NSData *ByteHeader = [NSData dataWithBytes:bytes length:length];
        //Write the start code
        [fileHandele writeData:ByteHeader];
        
        //Write the H.264 NALU data
        [fileHandele writeData:data];
    }
}
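
The article never shows how fileHandele is opened. A plausible setup, assuming the stream is dumped to a test.h264 file in the app's Documents directory (the path and the helper name setupFileHandle are illustrative, not from the original), could be:

//Hypothetical helper: create/truncate Documents/test.h264 and open it for writing
- (void)setupFileHandle
{
    NSString *documents = [NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES) firstObject];
    NSString *path = [documents stringByAppendingPathComponent:@"test.h264"];
    [[NSFileManager defaultManager] removeItemAtPath:path error:nil];
    [[NSFileManager defaultManager] createFileAtPath:path contents:nil attributes:nil];
    fileHandele = [NSFileHandle fileHandleForWritingAtPath:path];
}

Because start codes are written before the SPS, PPS, and every NALU, the resulting file is a raw Annex-B stream and can typically be played back directly, for example with ffplay test.h264.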

Ending the encoding session

-(void)endVideoToolBox
{
    //Flush any frames still pending in the encoder
    VTCompressionSessionCompleteFrames(cEncodeingSession, kCMTimeInvalid);
    //Tear down the session and release it
    VTCompressionSessionInvalidate(cEncodeingSession);
    CFRelease(cEncodeingSession);
    cEncodeingSession = NULL;
}
