WASAPI WAVEFORMATEX rather than WAVEFORMATEXTENSIBLE

    Question


  • I would like to capture sound with 1 or 2 channels, and the format structure needed is WAVEFORMATEX rather than WAVEFORMATEXTENSIBLE.
    The following code worked for me on my system:

    WAVEFORMATEX *pwfx;
    hr = pAudioClient->GetMixFormat(&pwfx);


    //create a new or closest-match format
    WAVEFORMATEX *pwfxNew = (WAVEFORMATEX *)CoTaskMemAlloc(sizeof(WAVEFORMATEX));
    WAVEFORMATEX *pwfxClosestMatch = NULL;
    pwfxNew->cbSize = 0;
    pwfxNew->nAvgBytesPerSec = pwfx->nAvgBytesPerSec;
    pwfxNew->nBlockAlign = pwfx->nBlockAlign;
    pwfxNew->nSamplesPerSec = pwfx->nSamplesPerSec;
    pwfxNew->nChannels = (pwfx->nChannels == 1) ? 1 : 2;
    pwfxNew->wBitsPerSample = pwfx->wBitsPerSample;
    pwfxNew->wFormatTag = WAVE_FORMAT_PCM;
    hr = pAudioClient->IsFormatSupported(AUDCLNT_SHAREMODE_SHARED, pwfxNew, &pwfxClosestMatch);
    if( S_FALSE == hr && sizeof(pwfxNew) == sizeof(pwfxClosestMatch))
    {
     memcpy(pwfxNew, pwfxClosestMatch, sizeof(pwfxClosestMatch));
    }


    //Initialize audioclient with new format
    hr = pAudioClient->Initialize(
            AUDCLNT_SHAREMODE_SHARED,
            AUDCLNT_STREAMFLAGS_LOOPBACK,
            0, 0, pwfxNew, 0
        );

    Is there any chance this logic could fail on some other system?
    Actually I am trying to add sound to an existing capture project. Creating the video and adding the audio is done using "avifil32.dll". That is why I needed 2 channels and WAVEFORMATEX.

    thanks,

    Sreejith.


    Tuesday, September 11, 2012 11:14 AM

Answers

  • Here's the code from http://blogs.msdn.com/b/matthew_van_eerde/archive/2008/12/16/sample-wasapi-loopback-capture-record-what-you-hear.aspx to coerce a given mix format (typically 32-bit IEEE float) to the corresponding 16-bit integer PCM format:

        if (bInt16) {
            // coerce int-16 wave format
            // can do this in-place since we're not changing the size of the format
            // also, the engine will auto-convert from float to int for us
            switch (pwfx->wFormatTag) {
                case WAVE_FORMAT_IEEE_FLOAT:
                    pwfx->wFormatTag = WAVE_FORMAT_PCM;
                    pwfx->wBitsPerSample = 16;
                    pwfx->nBlockAlign = pwfx->nChannels * pwfx->wBitsPerSample / 8;
                    pwfx->nAvgBytesPerSec = pwfx->nBlockAlign * pwfx->nSamplesPerSec;
                    break;

                case WAVE_FORMAT_EXTENSIBLE:
                    {
                        // naked scope for case-local variable
                        PWAVEFORMATEXTENSIBLE pEx = reinterpret_cast<PWAVEFORMATEXTENSIBLE>(pwfx);
                        if (IsEqualGUID(KSDATAFORMAT_SUBTYPE_IEEE_FLOAT, pEx->SubFormat)) {
                            pEx->SubFormat = KSDATAFORMAT_SUBTYPE_PCM;
                            pEx->Samples.wValidBitsPerSample = 16;
                            pwfx->wBitsPerSample = 16;
                            pwfx->nBlockAlign = pwfx->nChannels * pwfx->wBitsPerSample / 8;
                            pwfx->nAvgBytesPerSec = pwfx->nBlockAlign * pwfx->nSamplesPerSec;
                        } else {
                            printf("Don't know how to coerce mix format to int-16\n");
                            CoTaskMemFree(pwfx);
                            pAudioClient->Release();
                            return E_UNEXPECTED;
                        }
                    }
                    break;

                default:
                    printf("Don't know how to coerce WAVEFORMATEX with wFormatTag = 0x%08x to int-16\n", pwfx->wFormatTag);
                    CoTaskMemFree(pwfx);
                    pAudioClient->Release();
                    return E_UNEXPECTED;
            }
        }

    The API to use probably depends on what you want to do with the data after you capture it.  For example, you could write a Media Foundation source that wrapped WASAPI loopback capture.  Then Media Foundation could use your source together with the built-in Resampler MFT audio effect to convert the audio sample rate and channel count to meet the needs of any Media Foundation sink.


    Matthew van Eerde

    Friday, September 14, 2012 3:54 PM

All replies

  • Are you capturing data that is coming into the system via the microphone, or are you recording what the system is playing to the speakers?  That is to say, where did you get the IAudioClient?

    Your WAVEFORMATEX manipulation code is not guaranteed to work because there are some dependencies between fields in a WAVEFORMATEX.

    wBitsPerSample is the number of bits in a single "sample" of audio - that is, a single reading of a single channel.

    nChannels is the number of samples in a single "frame" - for example, stereo has two readings, one for the left channel and one for the right.  5.1 has six samples in a single frame, one for each channel.

    nBlockAlign is the number of bytes in a single "frame".  So nBlockAlign = nChannels * wBitsPerSample / 8.

    nSamplesPerSec is the number of readings (for each channel) in a single second.  So 44.1 kHz-sampled audio has nSamplesPerSec = 44100.

    nAvgBytesPerSec is the number of bytes of data collected per second.  So nAvgBytesPerSec = nBlockAlign * nSamplesPerSec.
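    The field relationships above can be checked with a small sketch. (This uses a minimal stand-in struct with the same field meanings as WAVEFORMATEX so it compiles outside Windows; it is only an illustration of the formulas, not real capture code.)

    ```cpp
    #include <cassert>
    #include <cstdint>
    #include <cstdio>

    // Minimal stand-in for WAVEFORMATEX, portable for illustration.
    struct WaveFormat {
        uint16_t nChannels;       // samples per frame
        uint32_t nSamplesPerSec;  // frames per second
        uint16_t wBitsPerSample;  // bits per single sample
        uint16_t nBlockAlign;     // bytes per frame (derived)
        uint32_t nAvgBytesPerSec; // bytes per second (derived)
    };

    // Derive the dependent fields from the independent ones,
    // exactly per the formulas in the post above.
    void FixupDerivedFields(WaveFormat &wf) {
        wf.nBlockAlign = wf.nChannels * wf.wBitsPerSample / 8;
        wf.nAvgBytesPerSec = wf.nBlockAlign * wf.nSamplesPerSec;
    }

    int main() {
        WaveFormat wf = {};
        wf.nChannels = 2;           // stereo
        wf.nSamplesPerSec = 44100;  // 44.1 kHz
        wf.wBitsPerSample = 16;
        FixupDerivedFields(wf);
        printf("nBlockAlign=%u nAvgBytesPerSec=%u\n", wf.nBlockAlign, wf.nAvgBytesPerSec);
        assert(wf.nBlockAlign == 4);          // 2 channels * 16 bits / 8
        assert(wf.nAvgBytesPerSec == 176400); // 4 * 44100
        return 0;
    }
    ```

    This is why copying nBlockAlign and nAvgBytesPerSec from the mix format while changing nChannels, as in the code above, produces an inconsistent WAVEFORMATEX: the derived fields must be recomputed after any change to the independent ones.
    
    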

    This code looks incorrect.  The sizeof clause will always return true, since you're comparing the sizes of two pointers.  Pointers are always the same size as each other.  The memcpy will only copy 4 or 8 bytes.  WAVEFORMATEX is a variable-sized structure; if you want to make a deep copy, you'll need to check the WAVEFORMATEX.cbSize to see how much extra data to copy beyond sizeof(WAVEFORMATEX).

    if( S_FALSE == hr && sizeof(pwfxNew) == sizeof(pwfxClosestMatch))
    {
     memcpy(pwfxNew, pwfxClosestMatch, sizeof(pwfxClosestMatch));
    }
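    A correct deep copy has to account for the variable size. Here is a minimal sketch (again using a stand-in struct so it compiles anywhere; with a real WAVEFORMATEX you would typically allocate with CoTaskMemAlloc to match GetMixFormat's allocator):

    ```cpp
    #include <cassert>
    #include <cstdint>
    #include <cstdlib>
    #include <cstring>

    // Stand-in for the fixed part of WAVEFORMATEX; a real format may carry
    // cbSize extra bytes immediately after the struct.
    struct WaveFormat {
        uint16_t wFormatTag;
        uint16_t nChannels;
        uint32_t nSamplesPerSec;
        uint16_t cbSize; // number of extra format bytes that follow
    };

    // Deep-copy a variable-sized format: fixed part plus cbSize trailing bytes.
    WaveFormat *CloneFormat(const WaveFormat *src) {
        size_t total = sizeof(WaveFormat) + src->cbSize;
        WaveFormat *dst = (WaveFormat *)malloc(total);
        if (dst) memcpy(dst, src, total);
        return dst;
    }

    int main() {
        // Build a format with 4 extra bytes after the fixed struct
        size_t total = sizeof(WaveFormat) + 4;
        WaveFormat *src = (WaveFormat *)malloc(total);
        src->wFormatTag = 1; // WAVE_FORMAT_PCM
        src->nChannels = 2;
        src->nSamplesPerSec = 48000;
        src->cbSize = 4;
        memset((char *)src + sizeof(WaveFormat), 0xAB, 4);

        WaveFormat *copy = CloneFormat(src);
        assert(copy != NULL);
        assert(memcmp(copy, src, total) == 0); // fixed part AND trailing bytes match
        free(copy);
        free(src);
        return 0;
    }
    ```
    
    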

    Once you fix this, though, this still isn't guaranteed to work, because it's entirely possible the audio device doesn't support either stereo or mono. WASAPI does not provide any facilities to mix n-channel audio to stereo, so you'll need to use another API or provide your own mixing.
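    As a rough illustration of "provide your own mixing", here is a naive downmix that keeps the first two channels as left/right and folds every remaining channel into both sides at an arbitrary gain. Real downmix matrices weight center, surround, and LFE channels differently, so treat this strictly as a sketch:

    ```cpp
    #include <cassert>
    #include <cmath>
    #include <vector>

    // Naively fold interleaved n-channel float frames down to stereo:
    // channel 0 -> left, channel 1 -> right, every other channel is
    // added to both sides at reduced gain. Not a standards-based matrix.
    std::vector<float> DownmixToStereo(const std::vector<float> &in, int channels) {
        int frames = (int)in.size() / channels;
        std::vector<float> out(frames * 2, 0.0f);
        for (int f = 0; f < frames; ++f) {
            float l = in[f * channels + 0];
            float r = (channels > 1) ? in[f * channels + 1] : l;
            for (int c = 2; c < channels; ++c) {
                float s = in[f * channels + c] * 0.5f; // arbitrary fold-in gain
                l += s;
                r += s;
            }
            out[f * 2 + 0] = l;
            out[f * 2 + 1] = r;
        }
        return out;
    }

    int main() {
        // One 4-channel frame: FL=0.2, FR=0.4, C=0.6, LFE=0.0
        std::vector<float> quad = {0.2f, 0.4f, 0.6f, 0.0f};
        std::vector<float> st = DownmixToStereo(quad, 4);
        assert(st.size() == 2);
        assert(std::fabs(st[0] - 0.5f) < 1e-6f); // 0.2 + 0.6 * 0.5
        assert(std::fabs(st[1] - 0.7f) < 1e-6f); // 0.4 + 0.6 * 0.5
        return 0;
    }
    ```

    In practice a resampler/channel-converter component (such as the Media Foundation Resampler mentioned below) is usually the better choice, since it handles gain staging and clipping properly.
    
    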


    Matthew van Eerde

    Tuesday, September 11, 2012 5:10 PM
  • I am recording what the system is playing to the speakers.  I am trying with "loopback-capture" posted by you in your blog. I have combined the "play-silence" code with "loopback-capture".

    Thank you for showing me the errors in code.

    In loopback-capture there is a Float to Int ( 16 bit) conversion. And if the MixFormat is WAVE_FORMAT_IEEE_FLOAT you have changed it to WAVE_FORMAT_PCM (pwfx->wFormatTag = WAVE_FORMAT_PCM;).
     Can we do this in the case where the MixFormat is WAVE_FORMAT_EXTENSIBLE and the channel count is <= 2? (By creating a new WAVEFORMATEX instead of a WAVEFORMATEXTENSIBLE and calling IAudioClient::Initialize with the new struct.)

    Could you please suggest an API or code sample to mix n-channel audio to stereo?

    Friday, September 14, 2012 10:38 AM