WASAPI: Loopback capture and downmixing

  • Question

  • I'm recording audio via the WASAPI IAudioClient API. However, I want the output format to be 16-bit PCM rather than e.g. IEEE floating-point audio. I therefore craft a WAVEFORMATEX struct setting my desired parameters (such as format, sampling rate, bits per sample, and number of channels) and pass it to IAudioClient::Initialize().

    IAudioClient::Initialize() returns S_OK and everything is hunky-dory as the output format of IAudioClient::GetBuffer() is indeed in the expected format.

    However, problems arise when I'm capturing on a system where the output device has more than 2 channels. In that case, no audio is coming out.

    How do I properly downmix and/or downsample the captured audio frames with WASAPI?

    Monday, January 9, 2012 7:54 AM
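As a rough illustration of the "craft a WAVEFORMATEX" step described in the question, the sketch below fills in a 16-bit PCM format. `PcmFormat` and `MakePcmFormat` are hypothetical names; the struct is only a stand-in that mirrors the WAVEFORMATEX fields so the example builds without the Windows headers. The point is that nBlockAlign and nAvgBytesPerSec are derived fields, so they have to be recomputed whenever nChannels, nSamplesPerSec, or wBitsPerSample change.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in mirroring the fields of WAVEFORMATEX (hypothetical type, used here
// only so the sketch compiles without <mmreg.h>).
struct PcmFormat {
    std::uint16_t wFormatTag;
    std::uint16_t nChannels;
    std::uint32_t nSamplesPerSec;
    std::uint32_t nAvgBytesPerSec;
    std::uint16_t nBlockAlign;
    std::uint16_t wBitsPerSample;
    std::uint16_t cbSize;
};

// Build a plain PCM format description with consistent derived fields.
PcmFormat MakePcmFormat(std::uint16_t channels, std::uint32_t sampleRate,
                        std::uint16_t bitsPerSample)
{
    assert(channels > 0 && bitsPerSample > 0 && bitsPerSample % 8 == 0);
    PcmFormat fmt{};
    fmt.wFormatTag = 1;  // value of WAVE_FORMAT_PCM
    fmt.nChannels = channels;
    fmt.nSamplesPerSec = sampleRate;
    fmt.wBitsPerSample = bitsPerSample;
    // Bytes per frame: one sample per channel.
    fmt.nBlockAlign = static_cast<std::uint16_t>(channels * bitsPerSample / 8);
    // Bytes per second: frames per second times bytes per frame.
    fmt.nAvgBytesPerSec = sampleRate * fmt.nBlockAlign;
    fmt.cbSize = 0;      // no extra format bytes for plain PCM
    return fmt;
}
```

For example, `MakePcmFormat(2, 44100, 16)` gives a 4-byte frame and 176400 bytes per second.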

All replies

  • WASAPI will do a float-to-int or an int-to-float conversion for you but that's about it.  If you want to do more sophisticated format translation (like resampling or downmixing) you'll need to turn to a different API, such as Media Foundation which offers the resampler MFT.


    IAudioClient::Initialize should be returning a failing HRESULT if the format is not supported.

    Matthew van Eerde
    Monday, January 9, 2012 4:20 PM
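For reference, the float-to-int conversion mentioned above is simple enough to do by hand. The following is a minimal sketch, not a description of how WASAPI does it internally; `FloatToPcm16` is a hypothetical helper, and the scaling by 32767 with clamping is one common convention among several.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Convert normalized float samples (nominally in [-1.0, 1.0]) to 16-bit PCM.
// Out-of-range input is clamped instead of being allowed to wrap around.
std::vector<std::int16_t> FloatToPcm16(const std::vector<float>& in)
{
    std::vector<std::int16_t> out;
    out.reserve(in.size());
    for (float s : in) {
        const float clamped = std::max(-1.0f, std::min(1.0f, s));
        out.push_back(static_cast<std::int16_t>(std::lround(clamped * 32767.0f)));
    }
    return out;
}
```

The same loop works on an interleaved buffer, since the conversion is per sample and does not care about channel layout.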
  • "IAudioClient::Initialize should be returning a failing HRESULT if the format is not supported."

    That's the funny thing, though. I'm calling GetMixFormat(), then changing the WAVEFORMATEX to PCM audio and adjusting nChannels and nSamplesPerSec, and then passing the struct to Initialize(), which returns S_OK. I guess the audio client just ignores my values for nChannels and nSamplesPerSec and instead uses whatever format the capture device has?

    Out of curiosity, say I want to manually downmix from 8 channels to 2. I should be able to simply loop through the captured buffer and take the first two samples of each 8-sample frame, and those ought to be the left and right speakers (if the default ordering holds true: http://msdn.microsoft.com/en-us/windows/hardware/gg463006.aspx#E6C), right?

    Monday, January 9, 2012 5:33 PM
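The "take the first two samples of each frame" idea from the post above could be sketched like this. It assumes interleaved 16-bit samples with front left and front right as the first two channels of every frame; `TakeFrontLeftRight` is a hypothetical helper name. Note that this only extracts two channels and discards the rest, so it is not a real downmix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Keep only the first two channels of each interleaved frame.
// Assumes those two channels are front left and front right.
std::vector<std::int16_t> TakeFrontLeftRight(
    const std::vector<std::int16_t>& interleaved, std::size_t channels)
{
    assert(channels >= 2);
    assert(interleaved.size() % channels == 0);
    const std::size_t frames = interleaved.size() / channels;
    std::vector<std::int16_t> stereo;
    stereo.reserve(frames * 2);
    for (std::size_t f = 0; f < frames; ++f) {
        stereo.push_back(interleaved[f * channels + 0]);  // assumed front left
        stereo.push_back(interleaved[f * channels + 1]);  // assumed front right
    }
    return stereo;
}
```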
  • If IAudioClient::Initialize() is returning S_OK, but then you're getting garbage data, that's a bug in WASAPI.  Time permitting I will attempt to repro locally.  Can you send the mix format and the values you're passing to Initialize()?

    Downmixing 7.1 to stereo is more complicated than just extracting the left and right channels. Consider a movie soundtrack where there are explosions and whatnot across all eight channels, and dialog primarily on the center channel. If you extract only the L and R, you'll lose most of the dialog.

    There are various downmix equations that can be used to map 7.1 to 2.0, but they will all include at least some of the center channel on the output L and R.

    Matthew van Eerde
    Monday, January 9, 2012 10:05 PM
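To make the point about the center channel concrete, here is a minimal sketch of one possible 7.1-to-stereo downmix on float samples. Everything specific in it is an assumption rather than something stated in this thread: the channel order (FL, FR, FC, LFE, BL, BR, SL, SR), the coefficient of roughly 0.7071 (about -3 dB) for the center and surround channels, dropping the LFE, and the normalization factor. `Downmix71ToStereo` is a hypothetical name; other downmix equations use different weights.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed channel order within each interleaved frame.
enum Channel : std::size_t { FL, FR, FC, LFE, BL, BR, SL, SR, kChannels };

// Downmix interleaved 7.1 float samples to interleaved stereo.
std::vector<float> Downmix71ToStereo(const std::vector<float>& in)
{
    assert(in.size() % kChannels == 0);
    const float k = 0.7071f;                      // illustrative weight, about -3 dB
    const float norm = 1.0f / (1.0f + 3.0f * k);  // keeps full-scale input within [-1, 1]
    const std::size_t frames = in.size() / kChannels;
    std::vector<float> out;
    out.reserve(frames * 2);
    for (std::size_t f = 0; f < frames; ++f) {
        const float* s = &in[f * kChannels];
        // The center channel feeds both outputs, so dialog is preserved.
        // The LFE channel is dropped in this sketch.
        const float left  = s[FL] + k * (s[FC] + s[BL] + s[SL]);
        const float right = s[FR] + k * (s[FC] + s[BR] + s[SR]);
        out.push_back(left * norm);
        out.push_back(right * norm);
    }
    return out;
}
```

With this mapping, a frame that has signal only on the center channel still produces equal, non-zero output on both L and R, which is exactly what plain channel extraction would lose.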
  • If I want to capture multichannel (>= 4 channels) digital audio, I think I am out of luck.

    The problem is that when the audio device carries compressed digital audio, such as AC3 over an S/PDIF connection or HDMI audio, that audio will not be rendered at all, right? It goes directly to the cable.

    Virtual audio loopback on an analog audio device is possible because of the user-mode audio mixer engine. If the audio renderer never even renders (decodes) the audio, there will be nothing to capture.

    I have heard of virtual HDMI audio from one of the graphics companies; I wonder how it is done.

    The audio packets could be captured if the digital audio device has a hardware loopback. If we can capture the encrypted packets, we could decrypt them, re-encrypt them (HDCP), forward them to another HDCP device, and be done with it.

    • Edited by AVGuys Friday, January 20, 2012 1:28 AM
    Friday, January 20, 2012 1:26 AM