How to treat the audio device position from an IAudioCaptureClient::GetBuffer call when the client uses resampling (AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM)?

  • Question

  • Hi

    I would like to know how to synchronize audio when using a capture client and a stream switch occurs.

    If the client was initialized with the AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM flag, and our target format has a different sample rate than the engine's current mix format, then the device position advances with the wrong periodicity.

    Say the engine's current format for the device runs at 48000 Hz and our target/resampling format is 44100 Hz; the device position will then of course advance in 480-frame steps, while the buffers we are handed contain either 441 or 448 frames (depending on the device).
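
    For concreteness, these frame counts follow from the usual 10 ms default device period. A sketch of the arithmetic (plain Python, not WASAPI code; the 10 ms period and the 32-frame alignment are assumptions for illustration):

```python
# Frames per 10 ms default device period at each sample rate.
MIX_RATE = 48000       # engine mix format, in Hz
TARGET_RATE = 44100    # client / resampling target format, in Hz
PERIOD_MS = 10         # usual default device period (assumed)

mix_frames = MIX_RATE * PERIOD_MS // 1000        # frames the position advances by
target_frames = TARGET_RATE * PERIOD_MS // 1000  # frames actually delivered

# Some devices appear to round the resampled packet up to an
# alignment boundary; 32-frame alignment (hypothetical) yields
# the 448-frame case.
aligned_frames = -(-target_frames // 32) * 32

print(mix_frames, target_frames, aligned_frames)  # 480 441 448
```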

    How can we synchronize audio when resampling is part of the capture?

    I think we should dismiss the pu64DevicePosition returned from GetBuffer and track the position ourselves. We could increase the stream position by the number of frames we get from every successful GetBuffer call. The pu64QPCPosition parameter can be used as usual though, because the timestamp of the first sample in each packet remains the same.
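
    A minimal sketch of that bookkeeping (plain Python with simulated packets rather than real GetBuffer calls; all names are illustrative):

```python
class CapturePosition:
    """Track our own stream position by summing the frames of every
    successfully returned packet, instead of trusting the device
    position reported under AUTOCONVERTPCM."""

    def __init__(self):
        self.frames_captured = 0  # our running stream position

    def on_packet(self, num_frames, qpc_position_100ns):
        # qpc_position_100ns stamps the FIRST frame of the packet, so
        # the packet covers [frames_captured, frames_captured + num_frames).
        first_frame = self.frames_captured
        self.frames_captured += num_frames
        return first_frame, qpc_position_100ns

pos = CapturePosition()
for frames in (441, 448, 441):  # simulated 44.1 kHz packet sizes
    pos.on_packet(frames, qpc_position_100ns=0)
print(pos.frames_captured)  # 1330
```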

    Is that correct, or am I completely wrong?

    Regards,

    Francis


    Friday, June 28, 2019 4:44 PM

Answers

  • If I understand you correctly:

    • The mix format is 48 kHz
    • You're initializing with a client format of 44.1 kHz and specifying AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM
    • The device position is advancing at a rate of 48 kHz rather than 44.1 kHz?

    ...


    Matthew van Eerde

    Precisely.

    Somehow I knew you would call it a bug. I would not call it that though, because it all makes sense from a low-level perspective. Whoever made this flag available for the audio client objects simply hooked the device's current default format to a resampler object/pipe, and what we get from GetBuffer is the resampled data. Only the data was taken care of; the device position and QPC remain untouched. Kind of careless coding if you ask me, but hey…

    There is one very important thing I have to add. The audio client behavior I described only occurs when the client was created from an IMMDevice by calling Activate. All audio client objects created from ActivateAudioInterfaceAsync, default or non-default, work with the correct device position. So, for everyone reading this:

    If you want to resample audio data on an audio client with the AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM flag, and you plan to synchronize the audio to something else (or to itself, on a stream switch for example), then you have to create your client through ActivateAudioInterfaceAsync. Do NOT use IMMDevice::Activate, because every client activated by that method increments the device position by the number of frames of the default periodicity for the current default format.

    I knew that the default renderer and default capturer created from ActivateAudioInterfaceAsync are different implementations, as they handle a lot of stuff for us, like auto stream routing for example, but I was under the assumption that the non-default clients created from that method would be the same as the clients created from IMMDevice::Activate. That is not the case! It seems that the non-default clients created from ActivateAudioInterfaceAsync are also completely different implementations from the ones you get from IMMDevice::Activate.

    I might file that in the Feedback Hub and post it over on your WordPress site, but I will be off work for the next 3-5 days. Maybe early next week.

    Regards,

    Francis






    Wednesday, July 3, 2019 8:26 PM

All replies

  • https://docs.microsoft.com/en-us/windows/win32/api/audioclient/nf-audioclient-iaudiocaptureclient-getbuffer

    u64DevicePosition will advance according to the sampling rate in the format passed to IAudioClient::Initialize, regardless of whether there is any sample rate conversion in the audio engine. In the absence of glitches, this is the same as the sum of all the numFramesToRead values from previous calls to IAudioCaptureClient::GetBuffer. In the presence of glitches, it may be higher than that sum. So you should not ignore it, or you will lose synchronization in the face of glitches.
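
    The glitch case described above can be detected by comparing the reported device position against the running sum of frames read; a sketch of that check (plain Python, with made-up numbers):

```python
def dropped_frames(device_position, frames_read_so_far):
    """Frames lost to glitches: the reported device position minus
    the sum of all numFramesToRead values so far. Zero when the
    stream is clean; positive when capture data was dropped."""
    return device_position - frames_read_so_far

# Clean stream: the position equals the running sum of frames read.
print(dropped_frames(device_position=1323, frames_read_so_far=1323))  # 0
# Glitch: the position jumped ahead of what we actually read.
print(dropped_frames(device_position=1764, frames_read_so_far=1323))  # 441
```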

    u64QPCPosition is the QPC time - converted to tenths-of-a-microsecond - that the first sample in the received packet entered the microphone. This will be earlier than the QPC time when the IAudioCaptureClient::GetBuffer call happened, by the amount of latency in the audio engine. The latency will vary from audio device to audio device, but should be fixed for any given IAudioClient::Initialize instance.
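
    To relate u64QPCPosition to your own QueryPerformanceCounter readings, the raw ticks have to be converted to the same tenths-of-a-microsecond (100 ns) units; a sketch of the conversion (plain Python; the 10 MHz frequency and the timestamps are made up):

```python
def qpc_to_100ns(qpc_ticks, qpc_frequency):
    """Convert raw QueryPerformanceCounter ticks to 100 ns units,
    the scale used by u64QPCPosition (10,000,000 units per second)."""
    return qpc_ticks * 10_000_000 // qpc_frequency

# With a 10 MHz QPC frequency, ticks are already in 100 ns units.
print(qpc_to_100ns(12_345, 10_000_000))  # 12345

# Latency estimate: QPC time of the GetBuffer call minus the
# packet's first-sample timestamp, both in 100 ns units.
now_100ns = qpc_to_100ns(123_456_789, 10_000_000)
packet_100ns = 123_256_789
print((now_100ns - packet_100ns) / 10_000, "ms")  # 20.0 ms
```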


    Matthew van Eerde

    Tuesday, July 2, 2019 12:03 AM
    Moderator
  • https://docs.microsoft.com/en-us/windows/win32/api/audioclient/nf-audioclient-iaudiocaptureclient-getbuffer

    u64DevicePosition will advance according to the sampling rate in the format passed to IAudioClient::Initialize, regardless of whether there is any sample rate conversion in the audio engine.

    ...



    Matthew van Eerde

    Nope, that is not what currently happens in WASAPI when the client was initialized with AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM. As said, if your target format is 44100 Hz, then you get either 441 or 448 frames per default period, depending on the device. The device position, however, is incremented by the number of frames per period of the current Windows mix format for the device.

    I tested that with several microphones that have different alignment requirements; some give back 448 frames and some give 441 frames at 44100 samples per second. My problem is that the current format in Windows for the device is 48000 Hz and u64DevicePosition is incremented by 480 frames, while I am getting 441 or 448 frames from each packet. Maybe you should have tested that yourself before answering the question. I am running Windows 10 Pro, version 1809 (have not yet updated to 1903), in case you want equal conditions.
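
    One way to check for this behavior at runtime is to compare the device-position delta of each packet with that packet's frame count; a sketch of the check (plain Python, simulated values):

```python
def position_tracks_client_rate(position_deltas, packet_sizes):
    """True when the device position advances by exactly the number
    of frames delivered in each packet (the documented behavior);
    False when it advances at another rate, e.g. the mix format's
    480 frames per period while packets hold 441 or 448 frames."""
    return all(d == n for d, n in zip(position_deltas, packet_sizes))

# The behavior observed with IMMDevice::Activate clients:
print(position_tracks_client_rate([480, 480, 480], [441, 448, 441]))  # False
# The documented behavior (and ActivateAudioInterfaceAsync clients):
print(position_tracks_client_rate([441, 448, 441], [441, 448, 441]))  # True
```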

    Regards,

    Francis



    Tuesday, July 2, 2019 11:51 AM
  • If I understand you correctly:

    • The mix format is 48 kHz
    • You're initializing with a client format of 44.1 kHz and specifying AUDCLNT_STREAMFLAGS_AUTOCONVERTPCM
    • The device position is advancing at a rate of 48 kHz rather than 44.1 kHz?

    It would seem you have found a bug. Please file a problem report in Feedback Hub and include logs of the problem in action. Once filed, grab a link and post it here.

    https://matthewvaneerde.wordpress.com/2016/09/26/report-problems-with-logs-and-suggest-features-with-the-feedback-hub/


    Matthew van Eerde

    Wednesday, July 3, 2019 5:39 PM
    Moderator