Adaptive WASAPI capture buffer sampling and WinRT (ARM) considerations

    Question

  • Hey Everyone,

    So I am working on a C++ DLL to process audio coming from a recording input, and I have some questions about the proper way to read the audio samples and to adapt to Shared Mode and/or various formats. What I want is for each sample to range from -1 to 1, in a way that adapts to any format and still consistently stays between -1 and 1. In my app I am getting weird results after taking this data and processing it, so I believe I am not creating these samples the right way. Let's start with the format; I am running this in Exclusive Mode with the following WAVEFORMATEX set...

    pwfx = new WAVEFORMATEX();
    pwfx->wFormatTag = WAVE_FORMAT_PCM;
    pwfx->nSamplesPerSec = 44100;
    pwfx->wBitsPerSample = 16;
    pwfx->nChannels = 2;
    pwfx->nBlockAlign = pwfx->nChannels * pwfx->wBitsPerSample / 8;
    pwfx->nAvgBytesPerSec = pwfx->nBlockAlign * pwfx->nSamplesPerSec;
    pwfx->cbSize = 0;

    Since my current bit depth is 16, each sample for the left and right channel consists of 2 bytes. Please correct me if I am wrong, but I am assuming that in each block of 4 bytes in this case (2 for left and 2 for right), the samples are interleaved in little-endian form. If 4 bytes were to come from the recording buffer, they would be arranged like so...

    byte1, byte2, byte3, byte4

    Left Sample [a:byte1, b:byte2] : Right Sample [a:byte3, b:byte4]

    and I am using the following code to take these bytes and convert them to 16 bit ints (short)...

    short AudioSink::ConvertBytesToShort(BYTE a, BYTE b, bool FirstByteIsLow)
    {
    	// Math equation: high byte * 256 + low byte, truncated to a signed short (-32768 to 32767)
    	
    	if (FirstByteIsLow)
    		return a | (b << 8);
    	else
    		return (a << 8) | b;
    }

    1.) So my first question: for the WAVEFORMATEX listed above, is the byte arrangement for stereo I described below it the correct way to read the bytes coming from a WASAPI recording buffer, with the left channel made of bytes 1 and 2 and the right made of bytes 3 and 4, both in little-endian format?
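
For reference, the layout described above (interleaved frames, left channel first, each sample little-endian) can be sketched as follows. This is a minimal sketch, not code from the thread: `StereoFrame16`, `ReadInt16LE` and `DecodeStereo16` are hypothetical names, and the buffer length is assumed to be a multiple of the 4-byte block alignment.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: one decoded 16-bit stereo frame.
struct StereoFrame16 {
    std::int16_t left;
    std::int16_t right;
};

// Assemble a signed 16-bit sample from two little-endian bytes.
// The bytes are combined as an unsigned value first, then converted to signed.
inline std::int16_t ReadInt16LE(const std::uint8_t* p)
{
    const std::uint16_t u = static_cast<std::uint16_t>(p[0] | (p[1] << 8));
    return static_cast<std::int16_t>(u);
}

// Split an interleaved buffer (L0 R0 L1 R1 ...) into frames.
// Assumes 16-bit stereo, i.e. 4 bytes per frame; trailing partial frames are ignored.
inline std::vector<StereoFrame16> DecodeStereo16(const std::uint8_t* data, std::size_t byteCount)
{
    std::vector<StereoFrame16> frames;
    frames.reserve(byteCount / 4);
    for (std::size_t i = 0; i + 4 <= byteCount; i += 4) {
        frames.push_back({ ReadInt16LE(data + i), ReadInt16LE(data + i + 2) });
    }
    return frames;
}
```

Combining the bytes as unsigned before converting to a signed type keeps the sign handling in one place and avoids surprises from integer promotion.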

    2.) My follow up to that question is now that I have a 16 bit int ranging from -32,768 -> 32,767, how do I get that to be scaled from -1 -> +1? Normally if it was something like -32,768 -> 32,768, I would divide by 32,768 and expect that value. But the positive side never goes that high and I see a lot of people out there on the internet doing it that way. Now -32,768 would give us -1 and 0 would equally give us 0 but if you take the positive max 32,767 and divide by 32,768 you will never make it to 1. Am I wrong? This is a small bit of confusion I've had for a while. What I feel is the right way to get a value from -1 to 1 is do something like this...

    Take the short and add 32,768 making it like an unsigned short that now ranges from 0 - 65,535. Take that value and divide by 65,535 which would give me a number from 0 to 1 now. Multiply by 2 and subtract by 1 should now give a value ranging from -1 to +1.

    3.) Of the above, which way do you prefer and which way is the more correct answer?
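
The two scaling approaches from questions 2 and 3 can be put side by side. This is only an illustrative sketch with hypothetical function names. It shows the trade-off rather than settling which one is "correct": dividing by 32768 keeps digital silence at exactly 0 but never quite reaches +1, while the offset-and-rescale approach reaches both -1 and +1 but shifts silence slightly away from 0.

```cpp
#include <cstdint>

// Convention A: divide by 32768.
// Maps -32768 to -1.0 and 0 to 0.0 exactly; the positive peak 32767
// lands just under 1.0.
inline float NormalizeBy32768(std::int16_t s)
{
    return static_cast<float>(s) / 32768.0f;
}

// Convention B: the offset-and-rescale idea described in the post.
// Reaches both -1.0 and +1.0, but an input of 0 maps to 1/65535
// instead of exactly 0.0.
inline float NormalizeFullRange(std::int16_t s)
{
    const float unit = (static_cast<float>(s) + 32768.0f) / 65535.0f; // 0..1
    return unit * 2.0f - 1.0f;                                         // -1..1
}
```

If later processing assumes that silence is exactly zero (for example level detection or mixing), the small DC offset introduced by convention B may matter more than the missing top step of convention A.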

    4.) Another question about the format tag: I feel like the tag I want here is WAVE_FORMAT_PCM, but if I enter shared mode and it defaults to WAVE_FORMAT_EXTENSIBLE, what effect does this have on the data coming in from the recording buffer?

    Now what if I want to change to Shared Mode and let the computer determine a 32-bit floating-point format (my PC defaulted to WAVE_FORMAT_EXTENSIBLE, 48 kHz, 32-bit, 2 channels)? One thing I attempted was to take 8 bytes at a time, 4 for the left channel and 4 for the right channel, and convert those bytes into floats. This didn't seem to work as expected; maybe it had something to do with the fact that it was 5:30 in the morning and I wasn't fully awake, but that's beside the point :) This is the function I wrote then...

    float AudioSink::ConvertBytesToFloat(BYTE a, BYTE b, BYTE c, BYTE d, bool IsLowEndian)
    {
    	if (IsLowEndian)
    		return a | (b << 8) | (c << 16) | (d << 24);
    	else
    		return d | (c << 8) | (b << 16) | (a << 24);
    }
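
One likely reason the function above misbehaves: it assembles a 32-bit integer and returns it as a `float`, which converts the integer's numeric value instead of reinterpreting its bit pattern. A minimal sketch of the bit-pattern approach follows, assuming `float` is a 32-bit IEEE 754 type (true on the x86, x64 and ARM targets discussed here); `ReadFloat32LE` is a hypothetical name.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret four little-endian bytes as an IEEE 754 single-precision float.
inline float ReadFloat32LE(const std::uint8_t* p)
{
    // Build the 32-bit pattern from the bytes, lowest byte first.
    const std::uint32_t bits =
        static_cast<std::uint32_t>(p[0]) |
        (static_cast<std::uint32_t>(p[1]) << 8) |
        (static_cast<std::uint32_t>(p[2]) << 16) |
        (static_cast<std::uint32_t>(p[3]) << 24);

    float value;
    static_assert(sizeof(value) == sizeof(bits), "float must be 32 bits");
    // Copy the bit pattern; no numeric conversion takes place.
    std::memcpy(&value, &bits, sizeof(value));
    return value;
}
```

On a little-endian host the same result can be had by copying the four bytes straight into a `float`; building the integer first just makes the byte order explicit.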

    So my goal is to take a sample from a WASAPI recording buffer in any format, bit depth and channel count, and based on this info, produce samples that range from -1 to +1. If any of you have ideas on how to adapt to these changes at the start of the capture and during the processing of the samples, I'd love to hear your feedback.
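
One way to approach this goal is to decide the sample encoding once, when capture starts, and then run every buffer through a single decode routine. The sketch below is an assumption-laden illustration, not WASAPI API: `SampleKind`, `BytesPerSample`, `DecodeSample` and `DecodeInterleaved` are hypothetical names, all data is assumed little-endian, and integer formats use the divide-by-full-scale convention.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical tag describing how one sample is stored. In a real capture
// path it would be derived from the format the device reports.
enum class SampleKind { Pcm16, Pcm24, Pcm32, Float32 };

inline std::size_t BytesPerSample(SampleKind kind)
{
    switch (kind) {
    case SampleKind::Pcm16: return 2;
    case SampleKind::Pcm24: return 3;
    default:                return 4; // Pcm32 and Float32
    }
}

// Decode one little-endian sample to float. Integer formats land in [-1, 1);
// float data is passed through unchanged and may exceed that range.
inline float DecodeSample(const std::uint8_t* p, SampleKind kind)
{
    switch (kind) {
    case SampleKind::Pcm16: {
        const std::int16_t v = static_cast<std::int16_t>(
            static_cast<std::uint16_t>(p[0] | (p[1] << 8)));
        return static_cast<float>(v) / 32768.0f;
    }
    case SampleKind::Pcm24: {
        const std::int32_t u = p[0] | (p[1] << 8) | (p[2] << 16);
        const std::int32_t v = (u ^ 0x800000) - 0x800000; // sign-extend 24 bits
        return static_cast<float>(v) / 8388608.0f;
    }
    case SampleKind::Pcm32: {
        const std::uint32_t u = static_cast<std::uint32_t>(p[0]) |
            (static_cast<std::uint32_t>(p[1]) << 8) |
            (static_cast<std::uint32_t>(p[2]) << 16) |
            (static_cast<std::uint32_t>(p[3]) << 24);
        std::int32_t v;
        std::memcpy(&v, &u, sizeof(v));
        return static_cast<float>(v / 2147483648.0);
    }
    case SampleKind::Float32: {
        float f;
        std::memcpy(&f, p, sizeof(f)); // assumes a little-endian host
        return f;
    }
    }
    return 0.0f; // unreachable for valid enum values
}

// Convert a whole interleaved buffer; the output keeps the channel interleaving.
inline std::vector<float> DecodeInterleaved(const std::uint8_t* data,
                                            std::size_t byteCount, SampleKind kind)
{
    const std::size_t step = BytesPerSample(kind);
    std::vector<float> out;
    out.reserve(byteCount / step);
    for (std::size_t i = 0; i + step <= byteCount; i += step)
        out.push_back(DecodeSample(data + i, kind));
    return out;
}
```

Because the output stays interleaved, the channel count only matters afterwards, when stepping through the floats frame by frame.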

    Oh man, almost forgot my most important question or two in one that I wanted to ask....

    5.) In WinRT (ARM), will this data come in from WASAPI the same way? Does the endianness change to big-endian? What other potential considerations will I need to look out for, and what is the best way to package this DLL with my Metro app and make it available on the Marketplace while supporting ARM, x86 and x64?

    6.) One last question I almost forgot: how does the cbSize of the WAVEFORMATEX affect how I read in the samples?
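
On the format tag and cbSize questions: cbSize gives the number of extra format bytes that follow the base WAVEFORMATEX, so it describes the format header, not the layout of the sample data. With WAVE_FORMAT_EXTENSIBLE it is at least 22, and the actual encoding is carried in the SubFormat GUID. The sketch below is a simplified illustration with hypothetical names. It repeats the documented constant values so it compiles without the Windows headers, and it inspects only the first 32-bit field of SubFormat; real code should compare the full GUID against KSDATAFORMAT_SUBTYPE_PCM and KSDATAFORMAT_SUBTYPE_IEEE_FLOAT.

```cpp
#include <cstdint>

// Values of the documented Windows constants, repeated here so the sketch
// compiles without <mmreg.h>; real code should use the SDK definitions.
constexpr std::uint16_t kWaveFormatPcm        = 0x0001; // WAVE_FORMAT_PCM
constexpr std::uint16_t kWaveFormatIeeeFloat  = 0x0003; // WAVE_FORMAT_IEEE_FLOAT
constexpr std::uint16_t kWaveFormatExtensible = 0xFFFE; // WAVE_FORMAT_EXTENSIBLE
// sizeof(WAVEFORMATEXTENSIBLE) - sizeof(WAVEFORMATEX)
constexpr std::uint16_t kExtensibleExtraBytes = 22;

// Hypothetical classification used only in this sketch.
enum class Encoding { IntegerPcm, IeeeFloat, Unknown };

// subFormatData1 stands for the Data1 field of WAVEFORMATEXTENSIBLE::SubFormat.
// It is only consulted when the tag is WAVE_FORMAT_EXTENSIBLE and cbSize
// says the extended fields are actually present.
inline Encoding ClassifyEncoding(std::uint16_t formatTag,
                                 std::uint16_t cbSize,
                                 std::uint32_t subFormatData1)
{
    if (formatTag == kWaveFormatPcm)       return Encoding::IntegerPcm;
    if (formatTag == kWaveFormatIeeeFloat) return Encoding::IeeeFloat;
    if (formatTag == kWaveFormatExtensible && cbSize >= kExtensibleExtraBytes) {
        if (subFormatData1 == kWaveFormatPcm)       return Encoding::IntegerPcm;
        if (subFormatData1 == kWaveFormatIeeeFloat) return Encoding::IeeeFloat;
    }
    return Encoding::Unknown;
}
```

Once the encoding is known, wBitsPerSample, nChannels and nBlockAlign determine how to step through the buffer; the sample bytes themselves are laid out the same way whether the format was described with the plain or the extensible structure.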

    Thanks Everyone, You Rock!

    Matthew




    Monday, August 06, 2012 9:40 PM

Answers

  • Hello Matthew,

    Unfortunately most of your questions are out of scope for me to be able to answer. Maybe someone in the community can help you understand how the wave format works.

    However, you might want to look here for some potential guidance:

    WARNING this is an external link to a 3rd party site and is not warranted by Microsoft. Use this link at your own risk.

    http://mark-dot-net.blogspot.com/2010/09/convert-16-bit-pcm-to-ieee-float.html

    Here are the answers to your Windows RT questions:

    Q. In WinRT (ARM), will this data come in from WASAPI the same way?
    A. Yes

    Q. Does the endian-ness change to Big Endian?
    A. No, ARM is bi-endian and we only use little.

    I hope this helps,

    James


    Windows Media SDK Technologies - Microsoft Developer Services - http://blogs.msdn.com/mediasdkstuff/

    Tuesday, August 07, 2012 1:23 AM
    Moderator

All replies

  • Maybe you could try rendering a float stream with the values -1, 0 and +1 and capturing it from loopback as a 16-bit PCM stream to see how the values are handled in the Audio Engine. I remember there's a filter in Win7 that caps values that are near the limits. You can read more under "System Effects Audio Processing Objects".


    Wednesday, October 31, 2012 10:10 AM
  • What a coincidence, this thread. While not entirely related, you guys at MSFT could at least put the various issues and restrictions with ARM into the documentation. I've spent a while figuring out why my apps stopped working, just to find out that the Surface RT (and maybe other ARM tablets) is hard-locked to 48 kHz, and WASAPI won't convert the rate either.
    Wednesday, October 31, 2012 4:46 PM
  • Hello Tom,

    It's not locked to 48k per se. Most devices will support 48k and 44.1k. The actual sample rate may change depending on the OEM or user preference. I would agree that most of the time the default will be 48k. Unfortunately, you can't programmatically change the sample rate. You are also correct that WASAPI does not have a default sample rate converter. The Media Engine does support sample rate conversion, but its use is very limited and may be more difficult to implement than just creating your own sample rate converter.
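
As a rough idea of what "creating your own sample rate converter" can look like, here is a deliberately naive sketch that averages stereo to mono and resamples by linear interpolation. The function names are hypothetical, the input is assumed to be float samples already scaled to about -1 to +1, and no anti-aliasing filter is applied, so a production converter would need a low-pass stage before downsampling.

```cpp
#include <cstddef>
#include <vector>

// Average interleaved stereo frames (L R L R ...) into one mono channel.
inline std::vector<float> DownmixStereoToMono(const std::vector<float>& stereo)
{
    std::vector<float> mono;
    mono.reserve(stereo.size() / 2);
    for (std::size_t i = 0; i + 1 < stereo.size(); i += 2)
        mono.push_back(0.5f * (stereo[i] + stereo[i + 1]));
    return mono;
}

// Naive linear-interpolation resampler for a single channel.
// No low-pass filter is applied, so downsampling will alias any content
// above half the target rate.
inline std::vector<float> ResampleLinear(const std::vector<float>& in,
                                         double inRate, double outRate)
{
    std::vector<float> out;
    if (in.empty() || inRate <= 0.0 || outRate <= 0.0)
        return out;

    const double step = inRate / outRate; // input samples per output sample
    const std::size_t outCount = static_cast<std::size_t>(in.size() / step);
    out.reserve(outCount);

    for (std::size_t n = 0; n < outCount; ++n) {
        const double pos = n * step;
        const std::size_t i0 = static_cast<std::size_t>(pos);
        const std::size_t i1 = (i0 + 1 < in.size()) ? i0 + 1 : i0; // clamp at the end
        const float frac = static_cast<float>(pos - static_cast<double>(i0));
        out.push_back(in[i0] + frac * (in[i1] - in[i0]));
    }
    return out;
}
```

For example, calling `ResampleLinear(DownmixStereoToMono(captured), 48000.0, 16000.0)` on a decoded 48 kHz stereo buffer yields 16 kHz mono, one output sample for every three input frames.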

    -James

    Wednesday, October 31, 2012 6:44 PM
  • Any reason why there's no SRC (sample rate converter) in Windows RT? The desktop version of Windows does seem to be able to do this.
    Thursday, November 01, 2012 1:38 AM
  • Hi James,

    I want to change the sample rate, but I didn't find any way to do it. How should I do it? I want to change the sample rate to 16k and the channel count to one (mono). Thanks.

    -Wuxian

    Thursday, November 15, 2012 5:38 AM