Which API is best for real-time audio streaming

    Question

  • Hello all,

    I want to find the optimal audio API for real-time audio recording and playback. I thought about DirectShow and Kernel Streaming, but I don't have any experience with those. My requirements are:

    1. capture and playback must not interfere with other applications' audio
    2. must be available on all standard soundcards
    3. lowest possible latency
    4. can capture and playback at the highest sample rate and resolution the soundcard supports
    5. should be available on Windows 2000 and higher

    So far I have been using MME waveIn functions for capture and DirectSound for playback. However, I have experienced some problems with these interfaces:
    1. The sample rate is very inaccurate on many soundcards. Even setting the same rate for capture and playback on the same device sometimes results in deviations of several hundred Hz between the two. I have read that this problem is caused by incorrect resampling code from an earlier Microsoft SDK example that is still used in many soundcard drivers. Does anyone know whether this problem still occurs with interfaces other than MME and DirectSound?
    2. In Vista, each connector on the soundcard is represented by a different audio device. After opening one device, it seems to be impossible to open another in the same application, even after the first device has been closed.
    3. Also in Vista, device notifications for plug-and-play detection do not work correctly with MME and DirectSound, because the device events are sent to the application some time before the devices actually become available/unavailable in the MME and DirectSound device lists.
    4. Also in Vista, opening an audio device sometimes fails if another application is playing a sound at that moment.

    I have no experience using DirectShow for audio streaming. I have read that DirectShow filters are only a wrapper for MME and DirectSound devices. Does this mean it is impossible to get any better results using DirectShow instead? Would there be any advantage in using DirectShow?
    Is Kernel Streaming related to DirectShow in any way? Does anyone know whether Kernel Streaming would be a choice for me considering the previous points? Are there any other APIs I should think about?

    Thanks a lot in advance.
    Saturday, June 27, 2009 10:04 AM

Answers

  • I have read that DirectShow filters are only a wrapper for MME and DirectSound devices. Does this mean it is impossible to get any better results using DirectShow instead?
    This is true.  The DirectShow capture and render filters simply pass the data off to the DirectSound and MME APIs.

    What are your latency requirements?  On 2K and XP you have a minimum output latency of 20 ms thanks to the kernel mixer, which sits in front of DirectSound and MME.  There is no way to get the latency below this while still being friendly with other applications using the device.

    [MVP] http://www.chrisnet.net/code.htm microsoft.public.win32.programmer.directx.audio
    Monday, June 29, 2009 1:31 PM
  • I have been using ASIO in the past. Its API would fit my needs perfectly; however, in this case it is not an option because it's not available on every system and it blocks other applications.
    Is there no way to use Kernel Streaming in a non-exclusive way?

    Some time ago I used the waveIn API for capture since it had much lower latency than DirectSoundCapture, which was only a wrapper around waveIn at that time. Does anyone know if this is still true for more current (pre-Vista) DirectX versions?
    Kernel Streaming isn't always exclusive; it seems to depend on the driver implementation.  There is no way to use it in a guaranteed non-exclusive manner, however.

    waveIn typically has higher latency than DirectSoundCapture.  DirectSound is your best option for low-latency audio I/O on XP/2K while still "getting along" with other apps.  Capture latency can be as low as 10 ms; playback output latency will be around 20-30 ms.
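
    To relate those latency figures to buffer sizes, the arithmetic can be sketched as below. This is a minimal, API-independent sketch: the helper names (`frames_for_latency`, `bytes_for_latency`, `latency_ms_for_frames`) are hypothetical, and interleaved PCM with a whole number of bytes per sample is an assumption, not something stated in the thread.

```python
import math


def frames_for_latency(sample_rate_hz: int, latency_ms: float) -> int:
    """Sample frames needed to hold latency_ms of audio, rounded up."""
    return math.ceil(sample_rate_hz * latency_ms / 1000)


def bytes_for_latency(sample_rate_hz: int, latency_ms: float,
                      channels: int, bits_per_sample: int) -> int:
    """Buffer size in bytes for interleaved PCM covering latency_ms.

    Assumes bits_per_sample is a multiple of 8 (e.g. 16, 24 or 32).
    """
    bytes_per_frame = channels * (bits_per_sample // 8)
    return frames_for_latency(sample_rate_hz, latency_ms) * bytes_per_frame


def latency_ms_for_frames(sample_rate_hz: int, frames: int) -> float:
    """Inverse direction: how many milliseconds a buffer of `frames` holds."""
    return 1000.0 * frames / sample_rate_hz
```

    For example, at 48 kHz 16-bit stereo, `frames_for_latency(48000, 20)` gives 960 frames and `bytes_for_latency(48000, 20, 2, 16)` gives 3840 bytes. These numbers only describe the application's own buffer; any latency added by the mixer or driver comes on top.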

    http://www.chrisnet.net/code.htm
    Wednesday, July 01, 2009 2:57 PM
  • A few points for you:

    • Starting with Vista, WASAPI is the native API.
    • Pre-Vista, the lowest latency should be with DirectSound; both waveIn and DirectShow's Audio Capture Filter are wrappers, and you are likely to have issues with low-latency capture (under 20 ms).
    • DirectShow's audio playback is available through both DirectSound (default) and waveOut, and DirectShow's DirectSound Renderer Filter is good and flexible enough to be used for playback without going too deep into the DirectSound API (DirectShow's advantage here is also the ease of using the audio decoders and effects available for this API).

    In my opinion, you will generally be fine going with DirectShow, provided that you can allow some latency in audio capture. To do better than that, you will need a DirectSoundCapture-based option for pre-Vista systems and a WASAPI option for Vista systems.
    http://alax.info/blog/tag/directshow
    Saturday, June 27, 2009 10:53 AM

All replies

  • 1. capture and playback must not interfere with other applications' audio

    3. lowest possible latency

    Your requirement not to interfere with other applications' audio capture and playback competes with your requirement for low latency.

    1. The sample rate is very inaccurate on many soundcards. Even setting the same rate for capture and playback on the same device sometimes results in deviations of several hundred Hz between the two. I have read that this problem is caused by incorrect resampling code from an earlier Microsoft SDK example that is still used in many soundcard drivers.

    The sample rate issue was in kmixer (a Microsoft component), but it was fixed in Vista.  It’s not the manufacturer’s fault.  The ratio for 44.1k to 48k conversion is 160/147.  Evidently the astute professionals on the Microsoft sound team were confident that strict adherence to the laws of mathematics was not a prerequisite in order for Microsoft customers to enjoy a rich user experience. :D
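
    The 160/147 figure can be checked with exact rational arithmetic, as in the sketch below. It also shows how even a slightly rounded ratio turns into a rate error. Note that `ROUNDED_RATIO` is a purely hypothetical approximation chosen for illustration; the thread does not say what value kmixer actually used, and the names here are assumptions.

```python
from fractions import Fraction

IN_RATE_HZ = 44_100
OUT_RATE_HZ = 48_000

# Fraction reduces automatically; gcd(48000, 44100) is 300,
# so 48000/44100 becomes 160/147.
EXACT_RATIO = Fraction(OUT_RATE_HZ, IN_RATE_HZ)

# HYPOTHETICAL approximation (160/147 rounded to three decimals).
# For illustration only; this is NOT a claim about kmixer's code.
ROUNDED_RATIO = Fraction(1088, 1000)


def output_frames_per_second(ratio: Fraction) -> Fraction:
    """Output frames produced per second of 44.1 kHz input for a given ratio."""
    return IN_RATE_HZ * ratio


def rate_error_hz(ratio: Fraction) -> Fraction:
    """Difference between the produced output rate and the 48 kHz target."""
    return output_frames_per_second(ratio) - OUT_RATE_HZ
```

    With the exact ratio, `rate_error_hz(EXACT_RATIO)` is 0. With the hypothetical rounded ratio, it is -96/5, that is, 19.2 frames per second short of 48 kHz.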



    I have read that DirectShow filters are only a wrapper for MME and DirectSound devices.

    Nope.  That was only true for Windows 9X VXDs.


    Are there any other APIs I should think about?

    Look at the DirectKS Sample Application.  It works on everything from Windows 98 to Windows 7.  It has the lowest latency but you can’t share the device with other applications.

    http://www.microsoft.com/whdc/archive/directks.mspx

    Sunday, June 28, 2009 6:57 PM
  • I have read that DirectShow filters are only a wrapper for MME and DirectSound devices.

    Oops.  Wtf was I thinking?  The DirectShow filters are a wrapper for MME or DirectSound devices.

    CLSID_AudioRender uses the waveOut API and CLSID_DSoundRender is/was a client of the SysAudio driver.

    Are there any other APIs I should think about?

    ASIO.

    http://en.wikipedia.org/wiki/Audio_Stream_Input/Output

    Monday, June 29, 2009 7:20 PM
  • ASIO.

    http://en.wikipedia.org/wiki/Audio_Stream_Input/Output

    ASIO is the lowest-latency interface there is.  Unfortunately, few vendors of consumer hardware provide ASIO drivers, so you're stuck using ASIO4ALL as a go-between for ASIO and kernel streaming.  Either way, neither tends to play nicely with other applications using the audio hardware at the same time.  Both ASIO and kernel streaming run in an exclusive mode, meaning only one application can use the hardware at any given time.

    [MVP] http://www.chrisnet.net/code.htm microsoft.public.win32.programmer.directx.audio
    Monday, June 29, 2009 8:15 PM
  • Thank you for your replies so far.

    Here is some more info on my requirements:

    Latencies around 20 ms would be fine for me. I'm thinking about offering exclusive access as an option to get lower latencies if the hardware supports it.
    I want to stream the audio as raw as possible without using any effects or unnecessary conversions.
    For Vista I have decided to use WASAPI which seems to provide what I need, so I only need to consider Win2k/XP now.

    I have been using ASIO in the past. Its API would fit my needs perfectly; however, in this case it is not an option because it's not available on every system and it blocks other applications.
    Is there no way to use Kernel Streaming in a non-exclusive way?

    Some time ago I used the waveIn API for capture since it had much lower latency than DirectSoundCapture, which was only a wrapper around waveIn at that time. Does anyone know if this is still true for more current (pre-Vista) DirectX versions?
    Wednesday, July 01, 2009 1:38 PM
  • The sample rate issue was in kmixer (a Microsoft component), but it was fixed in Vista.  It’s not the manufacturer’s fault.  The ratio for 44.1k to 48k conversion is 160/147.  Evidently the astute professionals on the Microsoft sound team were confident that strict adherence to the laws of mathematics was not a prerequisite in order for Microsoft customers to enjoy a rich user experience. :D
    160/147 would be the correct ratio, so what did kmixer actually use? Is there any way to bypass the incorrect sample rate conversion, for example by selecting kmixer's internal sample rate? If so, how could I find out this rate?
    Or is there at least a way to detect the exact deviation between the rate I select and the rate I actually get?
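
    One API-independent way to estimate that deviation is to count the frames a device actually delivers over a long interval and divide by the elapsed time from a monotonic clock. The sketch below only shows the arithmetic; the function names are hypothetical, and it assumes the application supplies the frame count (for example, summed over its capture buffers) and the elapsed time itself. The thread does not confirm this as the recommended method.

```python
def effective_rate_hz(frames: int, elapsed_s: float) -> float:
    """Measured sample rate: frames delivered divided by elapsed seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return frames / elapsed_s


def deviation_hz(nominal_hz: float, frames: int, elapsed_s: float) -> float:
    """Measured rate minus the rate that was requested."""
    return effective_rate_hz(frames, elapsed_s) - nominal_hz


def deviation_ppm(nominal_hz: float, frames: int, elapsed_s: float) -> float:
    """Same deviation expressed in parts per million of the nominal rate."""
    return deviation_hz(nominal_hz, frames, elapsed_s) * 1_000_000 / nominal_hz
```

    For instance, 4,801,200 frames counted over 100 seconds at a requested 48 kHz gives a measured rate of 48,012 Hz, a deviation of 12 Hz or 250 ppm. Longer measurement intervals reduce the influence of buffer granularity and timer jitter, and running the same measurement on the capture and playback sides gives the drift between them.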
    Wednesday, July 01, 2009 4:11 PM