Advanced iOS Audio/Video Programming: Playing FFmpeg-Decoded Audio with Audio Unit


This document describes the programming steps for playing FFmpeg-decoded audio on iOS, implemented with the Audio Session APIs of the Audio Toolbox framework and the interfaces of the Audio Unit framework. On iOS 7 and later, Audio Session is marked as deprecated; switch to AVAudioSession and the programming logic stays essentially the same. All test data in this article comes from physical iPhone 6 and iPhone 6 Plus devices.

1. Programming Flow for Playing FFmpeg-Decoded Audio on iOS

Playing audio differs slightly from rendering video: audio is somewhat passive. The system actively invokes a callback function we register, and inside that callback we copy the audio data to be played into a pointer the system hands us, whereas for video we actively push pixel data into the screen's frame buffer. To play FFmpeg-decoded audio data (for example AAC) on iOS, the steps are:

  1. Initialize the application's audio session with AudioSessionInitialize
  2. Configure the Audio Session
    • Configure properties
      • kAudioSessionCategory_MediaPlayback to declare audio playback
      • kAudioSessionProperty_PreferredHardwareIOBufferDuration to request lower I/O latency; usually unnecessary
    • Configure property-change listeners (an application of the observer pattern); not required for minimal functionality and may be omitted
      • kAudioSessionProperty_AudioRouteChange
      • kAudioSessionProperty_CurrentHardwareOutputVolume
    • Activate the audio session with AudioSessionSetActive
  3. Configure the Audio Unit
    • Describe the output unit with AudioComponentDescription
    • Obtain the AudioComponent
    • Verify the output stream format (AudioStreamBasicDescription)
    • Set up the render callback struct AURenderCallbackStruct and specify the callback function; this is where PCM data is actually handed to the audio device
  4. Feed unplayed audio data in the render callback
  5. Release resources
  6. FFmpeg decoding flow
  7. Audio resampling

Each step is described in detail below.

1.1 Initializing the Audio Session with AudioSessionInitialize

AudioSessionInitialize(NULL,
    kCFRunLoopCommonModes,
    sessionInterruptionListener,
    (__bridge void *) (self));
AudioSessionInitialize specifies the run loop and run loop mode in which the audio callbacks run, plus a user-defined value passed to the interruption listener. The documentation describes the callback as follows.

The interruption listener callback function. The application’s audio session object invokes the callback when the session is interrupted and (if the application is still running) when the interruption ends. Can be NULL. See AudioSessionInterruptionListener.

You must call this function before calling any other audio session service.

Your application must call this function before making any other Audio Session Services calls. You may activate and deactivate your audio session as needed (see AudioSessionSetActive), but should initialize it only once.

The callback AudioSessionInterruptionListener has the following signature:

// Invoked when an audio interruption in iOS begins or ends.
typedef void (*AudioSessionInterruptionListener)( void *inClientData, UInt32 inInterruptionState );
  • inClientData is the value specified in AudioSessionInitialize.
  • inInterruptionState indicates the interruption state.
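As a minimal sketch of such a listener (the body is an assumption; pausing and resuming a RemoteIO unit named _audioUnit is illustrative):

static void sessionInterruptionListener(void *inClientData, UInt32 inInterruptionState) {
    if (kAudioSessionBeginInterruption == inInterruptionState) {
        // Interruption began (e.g. an incoming call): the system has already
        // deactivated the session; stop pulling audio.
        // AudioOutputUnitStop(_audioUnit);
    } else if (kAudioSessionEndInterruption == inInterruptionState) {
        // Interruption ended: reactivate the session and resume output.
        AudioSessionSetActive(true);
        // AudioOutputUnitStart(_audioUnit);
    }
}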

After initialization, you can query session-related information with AudioSessionGetProperty, for example kAudioSessionProperty_AudioRouteDescription (before iOS 5, kAudioSessionProperty_AudioRoute) to get the audio input/output route, such as microphone in and speaker out. Sample code follows.

UInt32 propertySize = sizeof(CFStringRef);
CFStringRef route;
AudioSessionGetProperty(kAudioSessionProperty_AudioRoute,
        &propertySize,
        &route);
NSString *audioRoute = CFBridgingRelease(route);

kAudioSessionProperty_AudioRouteDescription outputs more information than kAudioSessionProperty_AudioRoute, as shown below.

  1. Output of kAudioSessionProperty_AudioRoute

    AudioRoute: Speaker
  2. Output of kAudioSessionProperty_AudioRouteDescription

    AudioRoute: 
    {
     "RouteDetailedDescription_Inputs" =     (
                 {
             "RouteDetailedDescription_ChannelDescriptions" =             (
                                 {
                     "ChannelDescription_Name" = "iPhone Microphone";
                 }
             );
             "RouteDetailedDescription_DataSources" =             (
                                 {
                     DataSourceID = 1835216945;
                     DataSourceName = Bottom;
                     MicrophoneOrientation = 1651799149;
                     MicrophonePolarPattern = 1869442665;
                     MicrophonePolarPatterns =                     (
                         1869442665
                     );
                     MicrophoneRegion = 1819244402;
                 },
                                 {
                     DataSourceID = 1835216946;
                     DataSourceName = Front;
                     MicrophoneOrientation = 1718775412;
                     MicrophonePolarPattern = 1668441188;
                     MicrophonePolarPatterns =                     (
                         1869442665,
                         1668441188
                     );
                     MicrophoneRegion = 1970303090;
                 },
                                 {
                     DataSourceID = 1835216947;
                     DataSourceName = Back;
                     MicrophoneOrientation = 1650549611;
                     MicrophonePolarPattern = 1869442665;
                     MicrophonePolarPatterns =                     (
                         1869442665,
                         1935827812
                     );
                     MicrophoneRegion = 1970303090;
                 }
             );
             "RouteDetailedDescription_HiddenDataSources" =             (
                                 {
                     DataSourceID = 1634495520;
                     DataSourceName = All;
                 }
             );
             "RouteDetailedDescription_ID" = 344;
             "RouteDetailedDescription_IsHeadphones" = 0;
             "RouteDetailedDescription_Name" = "iPhone Microphone";
             "RouteDetailedDescription_NumberOfChannels" = 1;
             "RouteDetailedDescription_PortType" = MicrophoneBuiltIn;
             "RouteDetailedDescription_SelectedDataSource" = 1835216946;
             "RouteDetailedDescription_UID" = "Built-In Microphone";
         }
     );
     "RouteDetailedDescription_Outputs" =     (
                 {
             "RouteDetailedDescription_ChannelDescriptions" =             (
                                 {
                     "ChannelDescription_Label" = "-1";
                     "ChannelDescription_Name" = Speaker;
                 }
             );
             "RouteDetailedDescription_ID" = 345;
             "RouteDetailedDescription_IsHeadphones" = 0;
             "RouteDetailedDescription_Name" = Speaker;
             "RouteDetailedDescription_NumberOfChannels" = 1;
             "RouteDetailedDescription_PortType" = Speaker;
             "RouteDetailedDescription_UID" = Speaker;
         }
     );
    }

1.2 Configuring the Audio Session

Audio session configuration consists of properties and property-change listeners; the listeners are optional.

1.2.1 Configuring Properties

Setting the audio category is mandatory; the I/O buffer duration can be left at its default.

1.2.1.1 Audio Playback

For music playback, set kAudioSessionProperty_AudioCategory to kAudioSessionCategory_MediaPlayback; other values cover audio processing, recording, and so on. All enumeration values are listed below.

/*!
 @enum           AudioSession audio categories states
 @abstract       These are used with as values for the kAudioSessionProperty_AudioCategory property
 to indicate the audio category of the AudioSession.
 @constant       kAudioSessionCategory_AmbientSound
 Use this category for background sounds such as rain, car engine noise, etc.
 Mixes with other music.
 @constant       kAudioSessionCategory_SoloAmbientSound
 Use this category for background sounds.  Other music will stop playing.
 @constant       kAudioSessionCategory_MediaPlayback
 Use this category for music tracks.
 @constant       kAudioSessionCategory_RecordAudio
 Use this category when recording audio.
 @constant       kAudioSessionCategory_PlayAndRecord
 Use this category when recording and playing back audio.
 @constant       kAudioSessionCategory_AudioProcessing
 Use this category when using a hardware codec or signal processor while
 not playing or recording audio.
 */
enum {
    kAudioSessionCategory_AmbientSound               = 'ambi',
    kAudioSessionCategory_SoloAmbientSound           = 'solo',
    kAudioSessionCategory_MediaPlayback              = 'medi',
    kAudioSessionCategory_RecordAudio                = 'reca',
    kAudioSessionCategory_PlayAndRecord              = 'plar',
    kAudioSessionCategory_AudioProcessing            = 'proc'
};

1.2.1.2 Configuring the Hardware I/O Buffer

Set this property when you need a smaller I/O buffer. A smaller buffer lowers audio latency but costs more CPU. The value you request may not be honored by the system; query kAudioSessionProperty_CurrentHardwareIOBufferDuration for the value actually in effect. The documentation says:

Your preferred hardware I/O buffer duration in seconds. Do not set this property unless you require lower I/O latency than is provided by default.

A read/write Float32 value.

The actual I/O buffer duration may be different from the value that you request, and can be obtained from the kAudioSessionProperty_CurrentHardwareIOBufferDuration property.

Set the buffer size, this will affect the number of samples that get rendered every time the audio callback is fired

A small number will get you lower latency audio, but will make your processor work harder

Reference code:

Float32 preferredBufferSize = 0.0232;
AudioSessionSetProperty(kAudioSessionProperty_PreferredHardwareIOBufferDuration,
        sizeof(preferredBufferSize),
        &preferredBufferSize);

0.0232 is about 23 milliseconds. At a 44100 Hz sample rate with 1024 samples per packet, one packet lasts

1024 / 44100 ≈ 0.0232 s

@暴走大牙 found in practice that for audio capture on Android, 3 × 23 ms works better.

@二流 offers another way to look at it:

44100 / 1024 ≈ 43 packets per second
1 / 43 ≈ 0.023 s
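Because the requested duration is only a preference, it is worth reading back what the system actually granted; a small sketch using the query property mentioned above:

Float32 grantedDuration = 0;
UInt32 size = sizeof(grantedDuration);
// Returns the I/O buffer duration actually in effect.
AudioSessionGetProperty(kAudioSessionProperty_CurrentHardwareIOBufferDuration,
        &size,
        &grantedDuration);
NSLog(@"granted I/O buffer duration: %f s", grantedDuration);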

1.2.2 Configuring Property-Change Listeners

1.2.2.1 Audio Route Changes

Registering for kAudioSessionProperty_AudioRouteChange tells us when the audio route changes, for example when headphones are plugged in. The documentation says:

A CFDictionaryRef object containing the reason the audio route changed along with details on the previous and current audio route.

The dictionary contains the keys and corresponding values described in Audio Route Change Dictionary Keys.

The kAudioSessionProperty_AudioRouteChange dictionary is available to your app only by way of the AudioSessionPropertyListener callback function.

Sample code:

AudioSessionAddPropertyListener(kAudioSessionProperty_AudioRouteChange,
            sessionPropertyListener,
            (__bridge void *) (self));

1.2.2.2 Output Volume Changes

Registering for kAudioSessionProperty_CurrentHardwareOutputVolume tells us when the output volume changes, for example when the user turns it up. The documentation says:

Indicates the current audio output volume as Float32 value between 0.0 and 1.0. Read-only. This value is available to your app by way of a property listener callback function. See AudioSessionAddPropertyListener.

Sample code:

AudioSessionAddPropertyListener(kAudioSessionProperty_CurrentHardwareOutputVolume,
            sessionPropertyListener,
            (__bridge void *) (self));
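Both registrations can share a single listener. A minimal sketch (the signature is the framework's AudioSessionPropertyListener; the body is an assumption):

static void sessionPropertyListener(void *inClientData,
                                    AudioSessionPropertyID inID,
                                    UInt32 inDataSize,
                                    const void *inData) {
    if (kAudioSessionProperty_AudioRouteChange == inID) {
        // inData is a CFDictionaryRef describing the change reason and the
        // previous/current routes; re-query the route here if needed.
    } else if (kAudioSessionProperty_CurrentHardwareOutputVolume == inID) {
        if (inData && sizeof(Float32) == inDataSize) {
            Float32 volume = *(const Float32 *) inData;
            NSLog(@"output volume: %f", volume);
        }
    }
}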

1.2.3 Activating the Audio Session

Passing true to AudioSessionSetActive activates the audio session; passing false deactivates it. The session may be activated and deactivated repeatedly.

Activating your audio session may interrupt audio sessions belonging to other applications running in the background, depending on categories and priorities. Deactivating your audio session allows other, interrupted audio sessions to resume.

When another active audio session does not allow mixing, attempting to activate your audio session may fail.

When active is true this call may fail if the currently active AudioSession has a higher priority.

AudioSessionSetActive(YES);
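Since activation can fail when a higher-priority, non-mixable session is active (per the quotes above), checking the result is prudent; a sketch:

OSStatus result = AudioSessionSetActive(true);
if (kAudioSessionNoError != result) {
    // A higher-priority, non-mixable session is active; retry later
    // or degrade gracefully.
    NSLog(@"AudioSessionSetActive failed: %d", (int) result);
}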

1.3 Configuring the Audio Unit

The Audio Unit is what actually performs the audio output.

1.3.1 Describing the Output Unit

AudioComponentDescription uniquely identifies an audio component and has these fields:

  • componentType: OSType, a unique 4-byte code identifying the general type of the audio component.
  • componentSubType: OSType, the specific type described by this audio component description.
  • componentManufacturer: OSType, the vendor identifier; only Apple's value may be set.
  • componentFlags: UInt32, must be set to 0 unless a known specific value is requested.
  • componentFlagsMask: UInt32, must be set to 0 unless a known specific value is requested.

componentType may take the following values:

  • kAudioUnitType_Output

    An output unit provides input, output, or both input and output simultaneously. It can be used as the head of an audio unit processing graph.

  • kAudioUnitType_MusicDevice

    An instrument unit can be used as a software musical instrument, such as a sampler or synthesizer. It responds to MIDI (Musical Instrument Digital Interface) control signals and can create notes.

  • kAudioUnitType_MusicEffect

    An effect unit that can respond to MIDI control messages, typically through a mapping of MIDI messages to parameters of the audio unit’s DSP algorithm.

  • kAudioUnitType_FormatConverter

    A format converter unit can transform audio formats, such as performing sample rate conversion. A format converter is also appropriate for deferred rendering and for effects such as varispeed. A format converter unit can ask for as much or as little audio input as it needs to produce a given output, while still completing its rendering within the time represented by the output buffer. For effect-like format converters, such as pitch shifters, it is common to provide both a realtime and an offline version. OS X, for example, includes Time-Pitch and Varispeed audio units in both realtime and offline versions.

  • kAudioUnitType_Effect

    An effect unit repeatedly processes a number of audio input samples to produce the same number of audio output samples. Most commonly, an effect unit has a single input and a single output. Some effects take side-chain inputs as well. Effect units can be run offline, such as to process a file without playing it, but are expected to run in realtime.

  • kAudioUnitType_Mixer

    A mixer unit takes a number of input channels and mixes them to provide one or more output channels. For example, the kAudioUnitSubType_StereoMixer audio unit in OS X takes multiple mono or stereo inputs and produce a single stereo output.

  • kAudioUnitType_Panner

    A panner unit is a specialized effect unit that distributes one or more channels in a single input to one or more channels in a single output. Panner units must support a set of standard audio unit parameters that specify panning coordinates.

  • kAudioUnitType_OfflineEffect

    An offline effect unit provides digital signal processing of a sort that cannot proceed in realtime. For example, level normalization requires examination of an entire sound, beginning to end, before the normalization factor can be calculated. As such, offline effect units also have a notion of a priming stage that can be performed before the actual rendering/processing phase is executed.

  • kAudioUnitType_Generator

    A generator unit provides audio output but has no audio input. This audio unit type is appropriate for a tone generator. Unlike an instrument unit, a generator unit does not have a control input.

componentSubType may take the following values:

  • kAudioUnitSubType_GenericOutput

    An audio unit that responds to start/stop calls and provides basic services for converting to and from linear PCM formats.

  • kAudioUnitSubType_RemoteIO

    An audio unit that interfaces to the audio inputs and outputs of iPhone OS devices. Bus 0 provides output to hardware and bus 1 accepts input from hardware. Called an I/O audio unit or sometimes a Remote I/O audio unit.

  • kAudioUnitSubType_VoiceProcessingIO

    An audio unit that interfaces to the audio inputs and outputs of iPhone OS devices and provides voice processing features. Bus 0 provides output to hardware and bus 1 accepts input from hardware. See the Voice-Processing I/O Audio Unit Properties enumeration for the identifiers for this audio unit’s properties.

Sample code follows.

AudioComponentDescription description = {0};
description.componentType = kAudioUnitType_Output;
description.componentSubType = kAudioUnitSubType_RemoteIO;
description.componentManufacturer = kAudioUnitManufacturer_Apple;

1.3.2 Obtaining the Component

Now, use the AudioComponentDescription configured above to search the system's list of audio processing plug-ins for a match; only if one exists can the audio data we are about to supply be handled. On success an AudioComponent is returned, from which an audio component instance is created.

AudioComponent component = AudioComponentFindNext(NULL, &description);
AudioComponentInstanceNew(component, &_audioUnit);

AudioComponentFindNext is documented as follows:

Finds the next component that matches a specified AudioComponentDescription structure after a specified audio component.

1.3.3 Verifying the Output Stream Format

The main purpose here is to set the AudioStreamBasicDescription to the currently configured sample rate.

AudioStreamBasicDescription _outputFormat;
UInt32 size = sizeof(AudioStreamBasicDescription);
AudioUnitGetProperty(_audioUnit,
        kAudioUnitProperty_StreamFormat,
        kAudioUnitScope_Input,
        0,
        &_outputFormat,
        &size);

_outputFormat.mSampleRate = _samplingRate;
AudioUnitSetProperty(_audioUnit,
        kAudioUnitProperty_StreamFormat,
        kAudioUnitScope_Input,
        0,
        &_outputFormat,
        size);

UInt32 _numBytesPerSample = _outputFormat.mBitsPerChannel / 8;
UInt32 _numOutputChannels = _outputFormat.mChannelsPerFrame;

_samplingRate is the device's current sample rate, queried as follows:

AudioSessionGetProperty(kAudioSessionProperty_CurrentHardwareSampleRate,
            &size,
            &_samplingRate);

The AudioStreamBasicDescription fields are documented as follows:

An audio data format specification for a stream of audio.

Fields

mSampleRate

The number of frames per second of the data in the stream, when the stream is played at normal speed. For compressed formats, this field indicates the number of frames per second of equivalent decompressed data.

The mSampleRate field must be nonzero, except when this structure is used in a listing of supported formats (see “kAudioStreamAnyRate”).

mFormatID

An identifier specifying the general audio data format in the stream. See “Audio Data Format Identifiers”. This value must be nonzero.

mFormatFlags

Format-specific flags to specify details of the format. Set to 0 to indicate no format flags. See “Audio Data Format Identifiers” for the flags that apply to each format.

mBytesPerPacket

The number of bytes in a packet of audio data. To indicate variable packet size, set this field to 0. For a format that uses variable packet size, specify the size of each packet using an AudioStreamPacketDescription structure.

mFramesPerPacket

The number of frames in a packet of audio data. For uncompressed audio, the value is 1. For variable bit-rate formats, the value is a larger fixed number, such as 1024 for AAC. For formats with a variable number of frames per packet, such as Ogg Vorbis, set this field to 0.

mBytesPerFrame

The number of bytes from the start of one frame to the start of the next frame in an audio buffer. Set this field to 0 for compressed formats.

For an audio buffer containing interleaved data for n channels, with each sample of type AudioSampleType, calculate the value for this field as follows:

mBytesPerFrame = n * sizeof (AudioSampleType);

For an audio buffer containing noninterleaved (monophonic) data, also using AudioSampleType samples, calculate the value for this field as follows:

mBytesPerFrame = sizeof (AudioSampleType);

mChannelsPerFrame

The number of channels in each frame of audio data. This value must be nonzero.

mBitsPerChannel

The number of bits for one audio sample. For example, for linear PCM audio using the kAudioFormatFlagsCanonical format flags, calculate the value for this field as follows:

mBitsPerChannel = 8 * sizeof (AudioSampleType);

Set this field to 0 for compressed formats.

mReserved

Pads the structure out to force an even 8-byte alignment. Must be set to 0.

You can configure an audio stream basic description (ASBD) to specify a linear PCM format or a constant bit rate (CBR) format that has channels of equal size. For variable bit rate (VBR) audio, and for CBR audio where the channels have unequal sizes, each packet must additionally be described by an AudioStreamPacketDescription structure.

A field value of 0 indicates that the value is either unknown or not applicable to the format.

Always initialize the fields of a new audio stream basic description structure to zero, as shown here:

AudioStreamBasicDescription myAudioDataFormat = {0};

To determine the duration represented by one packet, use the mSampleRate field with the mFramesPerPacket field, as follows:

duration = (1 / mSampleRate) * mFramesPerPacket

In Core Audio, the following definitions apply:

  • An audio stream is a continuous series of data that represents a sound, such as a song.
  • A channel is a discrete track of monophonic audio. A monophonic stream has one channel; a stereo stream has two channels.
  • A sample is single numerical value for a single audio channel in an audio stream.
  • A frame is a collection of time-coincident samples. For instance, a linear PCM stereo sound file has two samples per frame, one for the left channel and one for the right channel.
  • A packet is a collection of one or more contiguous frames. A packet defines the smallest meaningful set of frames for a given audio data format, and is the smallest data unit for which time can be measured. In linear PCM audio, a packet holds a single frame. In compressed formats, it typically holds more; in some formats, the number of frames per packet varies.
  • The sample rate for a stream is the number of frames per second of uncompressed (or, for compressed formats, the equivalent in decompressed) audio.
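Tying these definitions together, a sketch of a fully populated ASBD for interleaved signed 16-bit stereo linear PCM at 44.1 kHz (the values are illustrative):

AudioStreamBasicDescription asbd = {0};
asbd.mSampleRate       = 44100;
asbd.mFormatID         = kAudioFormatLinearPCM;
asbd.mFormatFlags      = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
asbd.mChannelsPerFrame = 2;   // stereo
asbd.mBitsPerChannel   = 16;
asbd.mBytesPerFrame    = asbd.mChannelsPerFrame * (asbd.mBitsPerChannel / 8); // 4
asbd.mFramesPerPacket  = 1;   // always 1 for linear PCM
asbd.mBytesPerPacket   = asbd.mBytesPerFrame * asbd.mFramesPerPacket;         // 4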

1.3.4 Setting the Audio Render Callback

AURenderCallbackStruct callbackStruct;
callbackStruct.inputProc = renderCallback;
callbackStruct.inputProcRefCon = (__bridge void *) (self);

AudioUnitSetProperty(_audioUnit,
        kAudioUnitProperty_SetRenderCallback,
        kAudioUnitScope_Input,
        0,
        &callbackStruct,
        sizeof(callbackStruct));
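With the callback in place, the unit still has to be initialized and started before any audio is heard. The article's listing stops at the callback setup, so the following two standard Audio Unit calls are shown as an assumed final step:

AudioUnitInitialize(_audioUnit);
AudioOutputUnitStart(_audioUnit);
// Later, AudioOutputUnitStop(_audioUnit) pauses output (see section 1.5).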

1.4 Feeding Unplayed Audio Data in the Render Callback

The callback passes us an AudioBufferList; first reset its buffers.

for (int iBuffer = 0; iBuffer < ioData->mNumberBuffers; ++iBuffer) {
    memset(ioData->mBuffers[iBuffer].mData, 0, ioData->mBuffers[iBuffer].mDataByteSize);
}

Then copy the decoded audio packets, one by one, into the callback's AudioBufferList parameter, as sketched below.
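A minimal sketch of such a render callback (the signature is the framework's AURenderCallback; fetchDecodedPCM is a hypothetical placeholder for however the app buffers its decoded audio):

static OSStatus renderCallback(void *inRefCon,
                               AudioUnitRenderActionFlags *ioActionFlags,
                               const AudioTimeStamp *inTimeStamp,
                               UInt32 inBusNumber,
                               UInt32 inNumberFrames,
                               AudioBufferList *ioData) {
    for (UInt32 i = 0; i < ioData->mNumberBuffers; ++i) {
        AudioBuffer *buf = &ioData->mBuffers[i];
        // Zero first so an underrun plays silence instead of stale data.
        memset(buf->mData, 0, buf->mDataByteSize);
        // Copy at most mDataByteSize bytes of pending PCM into the buffer.
        fetchDecodedPCM(inRefCon, buf->mData, buf->mDataByteSize); // hypothetical
    }
    return noErr;
}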

1.5 Releasing Resources

When ending audio output, stop the unit first, then uninitialize it, as shown below.

AudioOutputUnitStop(_audioUnit);
AudioUnitUninitialize(_audioUnit);
AudioComponentInstanceDispose(_audioUnit);
AudioSessionSetActive(NO);
AudioSessionRemovePropertyListenerWithUserData(kAudioSessionProperty_AudioRouteChange,
                sessionPropertyListener,
                (__bridge void *) (self));
AudioSessionRemovePropertyListenerWithUserData(kAudioSessionProperty_CurrentHardwareOutputVolume,
                sessionPropertyListener,
                (__bridge void *) (self));

1.6 FFmpeg Decoding Flow

Taking FFmpeg 3.0 as the example: each loop iteration reads one audio packet. Decoding may leave data unconsumed in the packet, so check whether the packet has been fully decoded and, if not, keep decoding the remainder; then resample the audio. Sample code:

av_register_all();
printf("%s Using FFmpeg: %s\n", __FUNCTION__, av_version_info());

AVFormatContext *context = avformat_alloc_context();
int ret;

NSString *path = [[NSBundle mainBundle] pathForResource:@"Forrest_Gump_IMAX.mp4" ofType:nil];
const char *url = path.UTF8String;
avformat_open_input(&context, url, NULL, NULL);
avformat_find_stream_info(context, NULL);
av_dump_format(context, 0, url, 0);
int audioStreamIndex = -1;
for (int i = 0; i < context->nb_streams; ++i) {
    if (AVMEDIA_TYPE_AUDIO == context->streams[i]->codec->codec_type) {
        audioStreamIndex = i;
        break;
    }
}
if (-1 == audioStreamIndex) {
    printf("%s audio stream not found.\n", __FUNCTION__);
    exit(-1);
}
AVStream *audioStream = context->streams[audioStreamIndex];
AVCodec *audioCodec = avcodec_find_decoder(audioStream->codec->codec_id);
avcodec_open2(audioStream->codec, audioCodec, NULL);

AVPacket packet, *pkt = &packet;
AVFrame *audioFrame = av_frame_alloc();
int gotFrame = 0;

while (0 == av_read_frame(context, pkt)) {
    if (audioStreamIndex == pkt->stream_index) {
        // Decode in a loop until no data remains in the current packet.
        while (pkt->size > 0) {
            int len = avcodec_decode_audio4(audioStream->codec, audioFrame, &gotFrame, pkt);
            if (len < 0)
                break;
            if (gotFrame) {
                // Resample the audio here.
            }
            pkt->data += len;
            pkt->size -= len;
        }
    }
    av_packet_unref(pkt);
}

1.7 Audio Resampling

According to Dr. 雷霄驊's blog, the audio decoded by FFmpeg 3.0's avcodec_decode_audio4 is single-precision floating point (planar fltp for AAC) with values in [-1.0, 1.0]. iOS can play float audio data, but its expected layout differs from the PCM that FFmpeg decodes, so resampling is required.

const int bufSize = av_samples_get_buffer_size(NULL,
        _audioCodecCtx->channels,
        _audioFrame->nb_samples,
        _audioCodecCtx->sample_fmt,
        1);
const NSUInteger sizeOfS16 = 2;
const NSUInteger numChannels = _audioCodecCtx->channels;
int numFrames = bufSize / (sizeOfS16 * numChannels);

SInt16 *s16p = (SInt16 *) _audioFrame->data[0];

if (_swrContext) {
    if (!_swrBuffer || _swrBufferSize < (bufSize * 2)) {
        _swrBufferSize = bufSize * 2;
        _swrBuffer = realloc(_swrBuffer, _swrBufferSize);
    }

    Byte *outbuf[2] = {_swrBuffer, 0};

    numFrames = swr_convert(_swrContext,
            outbuf,
            numFrames * 2,
            (const uint8_t **) _audioFrame->data,
            numFrames);

    if (numFrames < 0) {
        NSLog(@"fail resample audio");
        return nil;
    }

    s16p = _swrBuffer;
}

const NSUInteger numElements = numFrames * numChannels;
NSMutableData *data = [NSMutableData dataWithLength:numElements * sizeof(float)];
vDSP_vflt16(s16p, 1, data.mutableBytes, 1, numElements);
float scale = 1.0 / (float) INT16_MAX;
vDSP_vsmul(data.mutableBytes, 1, &scale, data.mutableBytes, 1, numElements);

_swrContext is the resampling context; it converts the source audio's channel count, channel layout, and sample rate to those of the output device, with the actual conversion performed by swr_convert. However, the code above contains a resampling bug: when playing fltp audio, the computed numFrames is larger than AVFrame.nb_samples. For AAC stereo, for example, numFrames comes out as 2048 while nb_samples is only 1024, which causes audible distortion during playback. The fix is given later in this document. _swrContext is initialized as follows.

_swrContext = swr_alloc_set_opts(NULL,
        av_get_default_channel_layout(hw.numOutputChannels),
        AV_SAMPLE_FMT_S16,
        hw.samplingRate,
        av_get_default_channel_layout(audioCodecCtx->channels),
        audioCodecCtx->sample_fmt,
        audioCodecCtx->sample_rate,
        0,
        NULL);

According to the FFmpeg comments, the following code converts planar float samples to interleaved signed 16-bit integers, downsamples from 48 kHz to 44.1 kHz, and downmixes 5.1 channels to stereo; the actual conversion, of course, is still done by calling swr_convert.

SwrContext *swr = swr_alloc();
av_opt_set_channel_layout(swr, "in_channel_layout",  AV_CH_LAYOUT_5POINT1, 0);
av_opt_set_channel_layout(swr, "out_channel_layout", AV_CH_LAYOUT_STEREO,  0);
av_opt_set_int(swr, "in_sample_rate",     48000,                0);
av_opt_set_int(swr, "out_sample_rate",    44100,                0);
av_opt_set_sample_fmt(swr, "in_sample_fmt",  AV_SAMPLE_FMT_FLTP, 0);
av_opt_set_sample_fmt(swr, "out_sample_fmt", AV_SAMPLE_FMT_S16,  0);

The equivalent code is:

SwrContext *swr = swr_alloc_set_opts(NULL,  // we're allocating a new context
                      AV_CH_LAYOUT_STEREO,  // out_ch_layout
                      AV_SAMPLE_FMT_S16,    // out_sample_fmt
                      44100,                // out_sample_rate
                      AV_CH_LAYOUT_5POINT1, // in_ch_layout
                      AV_SAMPLE_FMT_FLTP,   // in_sample_fmt
                      48000,                // in_sample_rate
                      0,                    // log_offset
                      NULL);                // log_ctx
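Either way, the context must be initialized before use, and each decoded frame is then pushed through swr_convert; a sketch of the remaining glue (out_data and max_out_samples are assumed to be allocated and sized by the caller):

// Initialize once after the options are set.
if (swr_init(swr) < 0) {
    // handle the error
}

// Per decoded AVFrame:
int out_samples = swr_convert(swr,
        out_data, max_out_samples,
        (const uint8_t **) frame->data, frame->nb_samples);

// When finished:
swr_free(&swr);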

1.8 Applying the Accelerate Framework During Resampling

After resampling, several vDSP_ functions are also called, as shown below. They belong to the Accelerate framework, which provides routines for audio, signal processing, image processing, and other applications.

vDSP_vflt16(s16p, 1, data.mutableBytes, 1, numElements);
float scale = 1.0 / (float) INT16_MAX;
vDSP_vsmul(data.mutableBytes, 1, &scale, data.mutableBytes, 1, numElements);

vDSP_vflt16 converts non-interleaved 16-bit signed integers into single-precision floats. Why 16-bit signed integers? Because it depends on AudioStreamBasicDescription.mBitsPerChannel: when mBitsPerChannel is 16, call vDSP_vflt16; when it is 32, call vDSP_vflt32.
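A sketch of that branch (src, dst, and numElements are illustrative names):

if (16 == _outputFormat.mBitsPerChannel) {
    // 16-bit signed integer samples -> float
    vDSP_vflt16((SInt16 *) src, 1, dst, 1, numElements);
} else if (32 == _outputFormat.mBitsPerChannel) {
    // 32-bit signed integer samples -> float
    vDSP_vflt32((SInt32 *) src, 1, dst, 1, numElements);
}
// Then scale into [-1.0, 1.0] with vDSP_vsmul as shown above.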

2. Problems Observed at Runtime

Some tests based on the implementation above.

2.1 MP3 Plays Correctly

Playback is normal; the file information follows.

Input #0, mp3, from '1A Hero s Sacrifice.mp3':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: isommp42
    encoder         : Lavf55.19.100
  Duration: 00:07:05.09, start: 0.025057, bitrate: 128 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p, 128 kb/s

The execution log follows.

AudioRoute: Speaker
We've got 1 output channels
Current sampling rate: 44100.000000
Current output volume: 0.687500
Current output bytes per sample: 4
Current output num channels: 2
audio codec smr: 44100 fmt: 6 chn: 2 tb: 0.000000 resample
audio device smr: 44100 fmt: 4 chn: 2

2.2 Audio from an MP4 Is Distorted

For simplicity, only the audio in the MP4 is played, and there is clearly audible distortion. The file information follows.

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'Forrest_Gump_IMAX.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf56.19.100
  Duration: 00:00:31.21, start: 0.036281, bitrate: 878 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 640x352, 748 kb/s, 23.98 fps, 23.98 tbr, 24k tbn, 47.95 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      handler_name    : SoundHandler

Execution log.

AudioRoute: Speaker
We've got 1 output channels
Current sampling rate: 44100.000000
Current output volume: 0.687500
Current output bytes per sample: 4
Current output num channels: 2
audio codec smr: 44100 fmt: 8 chn: 2 tb: 0.000023 resample
audio device smr: 44100 fmt: 4 chn: 2

What differs from the MP3 case is the audio track's codec information; the device-side information does not change. A few small experiments follow. The distortion is basically not caused by these factors, but to be safe, measurement beats assumption:

  • Replacing kCFRunLoopDefaultMode with kCFRunLoopCommonModes, still on the main run loop and without any screen interaction: the distortion is unchanged.
  • A non-main run loop with kCFRunLoopCommonModes: the distortion is unchanged.

With CPU contention ruled out, only one possibility remains: the resampling computation is wrong. Fixing the - [KxMovieDecoder handleAudioFrame] method as follows removes the distortion when playing fltp audio, and S16P MP3 playback also stays free of distortion.

- (KxAudioFrame *) handleAudioFrame {
// ...
        // Pass nb_samples (frames per channel) instead of a byte-derived count.
        numFrames = swr_convert(_swrContext,
                                outbuf,
                                _audioFrame->nb_samples * 2,
                                (const uint8_t **)_audioFrame->data,
                                _audioFrame->nb_samples);
// ...
}

Source: http://www.jianshu.com/p/0d5315bb81ee
