WASAPI Resampling

  • Question

  • I'm writing a pro audio application using exclusive-mode WASAPI.  The application's audio stream uses both an input and an output device, each of which is set to its default format (retrieved via PKEY_AudioEngine_DeviceFormat).  The internal audio DSP pipeline uses a constant format of 48 kHz stereo floating point; however, the input and output devices can each have any number of channels, any sample rate, and either sample format (PCM or IEEE_FLOAT).  To compensate, I want to resample the input device's format to my own format, and then resample back to the output device's format when all DSP is complete.

    I'm trying to use the Audio Resampler DSP API found here.  I'm using the IMFTransform object, which is part of the Windows Media Foundation API.  I've run into a few issues.  One: I can't find any explanation of how this API works.  It's written as if it's asynchronous and sends data to a driver somewhere (command queues, locks on the buffers), but it's mentioned several times that all processing is done synchronously, i.e. within the same thread of my own process (which is what I want).  Naturally, this confusion makes me wonder whether this is the correct API for what I'm doing: real-time audio processing.

    The second is actually getting the resampler to obey: IMFTransform::ProcessOutput always returns MF_E_TRANSFORM_NEED_MORE_INPUT.  I've created the media types from the WAVEFORMATEXTENSIBLE structures for the devices and from my own custom WAVEFORMATEXTENSIBLE that best describes the internal format of the DSP pipeline, using MFInitMediaTypeFromWaveFormatEx.  The IMFTransform objects (two of them: one for input resampling, one for output resampling) accept both their input and output media types, and only have a problem when it's actually time to do the resampling (IMFTransform::ProcessOutput is the only call in the process that ever returns an error).
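    For reference, here's roughly what that setup looks like.  This is only a sketch: the name CreateResampler is mine, error handling is collapsed into a macro (and leaks interfaces on failure), and COM and Media Foundation are assumed to be initialized already (CoInitializeEx / MFStartup):

```cpp
#include <mfapi.h>
#include <mftransform.h>
#include <mferror.h>
#include <wmcodecdsp.h>   // CLSID_CResamplerMediaObject

#define CHECK(hr) do { if (FAILED(hr)) return (hr); } while (0)

// Create the Resampler DSP MFT and give it the device format on one
// side and the pipeline's internal format on the other.
HRESULT CreateResampler(const WAVEFORMATEX *pInputWfx,
                        const WAVEFORMATEX *pOutputWfx,
                        IMFTransform **ppResampler)
{
    IMFTransform *pResampler = nullptr;
    CHECK(CoCreateInstance(CLSID_CResamplerMediaObject, nullptr,
                           CLSCTX_INPROC_SERVER, IID_PPV_ARGS(&pResampler)));

    // Wrap each WAVEFORMATEX(TENSIBLE) in an IMFMediaType.
    IMFMediaType *pInputType = nullptr;
    CHECK(MFCreateMediaType(&pInputType));
    CHECK(MFInitMediaTypeFromWaveFormatEx(
        pInputType, pInputWfx, sizeof(WAVEFORMATEX) + pInputWfx->cbSize));
    CHECK(pResampler->SetInputType(0, pInputType, 0));

    IMFMediaType *pOutputType = nullptr;
    CHECK(MFCreateMediaType(&pOutputType));
    CHECK(MFInitMediaTypeFromWaveFormatEx(
        pOutputType, pOutputWfx, sizeof(WAVEFORMATEX) + pOutputWfx->cbSize));
    CHECK(pResampler->SetOutputType(0, pOutputType, 0));

    pInputType->Release();
    pOutputType->Release();
    *ppResampler = pResampler;
    return S_OK;
}
```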

    The third issue is that, in order to resample, a lot of buffer copying has to happen and, if I understand this example correctly, a lot of unnecessary memory allocations and deletions.

    Is this the correct API to use for this application's purpose?  What do I have to do to resolve the MF_E_TRANSFORM_NEED_MORE_INPUT error?  Can I use my own buffers (plain float arrays and the WASAPI client buffers) on the IMFTransform objects?





    Monday, April 28, 2014 5:18 AM


All replies

  • Here's an outline of how the Media Foundation Transform processing pipeline works:

    http://msdn.microsoft.com/en-us/library/windows/desktop/aa965264(v=vs.85).aspx

    You call IMFTransform::ProcessOutput to ask the resampler to give you output. MF_E_TRANSFORM_NEED_MORE_INPUT is the transform's way of telling you it needs more input before it can give you any more output. Call IMFTransform::ProcessInput to give it more input.
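    In rough code, the contract looks something like the sketch below.  The names (pResampler, pInSample, pOutSample) are placeholders; note that this particular MFT does not allocate its own output samples, so the caller supplies one in the MFT_OUTPUT_DATA_BUFFER:

```cpp
// Sketch: feed one input sample, then pull output until the transform
// asks for more input.  pOutSample is a caller-allocated IMFSample
// whose media buffer is large enough for the converted audio.
HRESULT Resample(IMFTransform *pResampler,
                 IMFSample *pInSample, IMFSample *pOutSample)
{
    HRESULT hr = pResampler->ProcessInput(0, pInSample, 0);
    if (FAILED(hr)) return hr;

    for (;;) {
        MFT_OUTPUT_DATA_BUFFER out = {};
        out.pSample = pOutSample;  // this MFT uses caller-provided samples
        DWORD status = 0;
        hr = pResampler->ProcessOutput(0, 1, &out, &status);
        if (hr == MF_E_TRANSFORM_NEED_MORE_INPUT) {
            // Not an error: the transform has drained its internal
            // buffer and wants another ProcessInput call.
            return S_OK;
        }
        if (FAILED(hr)) return hr;
        // ... consume the audio now sitting in pOutSample ...
    }
}
```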


    Matthew van Eerde

    Monday, April 28, 2014 4:04 PM
  • Can the transform take a buffer of 480 samples?  Is there a minimum buffer size it needs to do the conversion?

    I directly followed the example I linked: I called ProcessInput with an IMFSample that contained a single IMFMediaBuffer holding 480 sample frames, which didn't return an error.  Immediately after, I called ProcessOutput, which returned the error code.

    Monday, April 28, 2014 5:32 PM
  • The minimum size is up to the transform.

    This particular transform doesn't document its size requirements, so you have to keep feeding it data until it declares itself satisfied and willing to produce an output sample.

    You can call IMFTransform::GetOutputStatus and check the MFT_OUTPUT_STATUS_SAMPLE_READY flag to see if you've given it enough data.
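    As a sketch (pResampler being the hypothetical IMFTransform from earlier):

```cpp
// Poll whether the resampler has buffered enough input to produce
// output, before paying for a ProcessOutput round trip.
DWORD flags = 0;
HRESULT hr = pResampler->GetOutputStatus(&flags);
if (SUCCEEDED(hr) && (flags & MFT_OUTPUT_STATUS_SAMPLE_READY)) {
    // Safe to call ProcessOutput: it should not return
    // MF_E_TRANSFORM_NEED_MORE_INPUT.
}
```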


    Matthew van Eerde

    • Marked as answer by Austin Borger Monday, April 28, 2014 8:41 PM
    Monday, April 28, 2014 6:30 PM
  • Well, I can't give it more data than I already have because of latency requirements.  Is there an API that doesn't have this sort of requirement, or is this just a general limitation of a resampling DSP?
    Monday, April 28, 2014 7:00 PM
  • It's certainly possible to come up with resampler DSP code that has different latency characteristics than the Microsoft Resampler DSP MFT.

    You could give it more data than you already have, for example by priming the input with silence. You are correct that this will add latency. I don't know what your latency requirements are, so I can't comment on whether this is the best solution for your situation.


    Matthew van Eerde

    Monday, April 28, 2014 7:50 PM
  • Alright, I guess I'll have to try something else.
    Monday, April 28, 2014 8:33 PM
  • From a theoretical DSP standpoint:

    You can change the channel count at will with zero latency, since you're just multiplying each frame by a matrix.

    Changing the sample rate, though, requires adding a lowpass filter or you get aliasing. In particular, if you convert from X kHz to Y kHz, you need to apply a lowpass filter at min(X, Y)/2 kHz.

    The lowpass filter is what introduces the latency.

    A lowpass filter with lots of poles and zeros uses lots of CPU and outputs higher-quality audio, but has a longer impulse response, which means more latency.

    A lowpass filter with few poles and zeros uses less CPU and outputs lower-quality audio, but has a shorter impulse response, which means less latency.


    Matthew van Eerde

    Monday, April 28, 2014 9:33 PM
  • 10 ms should be plenty, though.

    How are you building the IMFSample? Are you setting the duration? It doesn't do any good to call ProcessInput with a zero-size buffer. Are you calling IMFMediaBuffer::SetCurrentLength(...) prior to IMFSample::AddBuffer()?
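    For reference, a sketch of building the sample correctly — pAudioData, cbBytes, hnsTimestamp, and hnsDuration are placeholders, and error handling is omitted:

```cpp
// Wrap cbBytes of audio from pAudioData in an IMFSample, setting the
// buffer's current length and the sample's timestamp and duration
// (both in 100-ns units).
IMFMediaBuffer *pBuffer = nullptr;
MFCreateMemoryBuffer(cbBytes, &pBuffer);

BYTE *pDest = nullptr;
pBuffer->Lock(&pDest, nullptr, nullptr);
memcpy(pDest, pAudioData, cbBytes);
pBuffer->Unlock();

// Without this, the buffer reports a current length of 0 and
// ProcessInput effectively receives no data.
pBuffer->SetCurrentLength(cbBytes);

IMFSample *pSample = nullptr;
MFCreateSample(&pSample);
pSample->AddBuffer(pBuffer);
pSample->SetSampleTime(hnsTimestamp);     // position, in hns
pSample->SetSampleDuration(hnsDuration);  // length, in hns
```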


    Matthew van Eerde

    Monday, April 28, 2014 9:44 PM
  • Perhaps not.  I'll try again and see where that gets me.

    On your point about the lowpass filter, though: that raises a good point, and I'm going to default to 44.1 kHz (and 441 buffer frames) so that no filtering needs to be done (I'm just not going to support anything lower than 44.1 kHz, which seems to be the lowest sample rate in use these days anyway).  However, since this is a pro audio application, I expect filters to be all over the place.  To my ears, FL Studio seems to run filters just fine at 512 buffer frames in ASIO.  Is 441 enough for a high-quality filter?

    Tuesday, April 29, 2014 10:51 PM
  • Be careful: IAudioCaptureClient::GetBuffer will give you a frame count, whereas IMFMediaBuffer needs a byte count. You can go from a frame count to a byte count by multiplying by pWfx->nBlockAlign.

    Also make sure to set the time and duration, both in hns.


    Matthew van Eerde

    Tuesday, April 29, 2014 11:32 PM