format of raw audio data

  • Question

  • Hello,

    I want to use the raw audio data of the Kinect directly in a real-time C++ application. I used the code from the AudioCaptureRaw console sample without the resampler and the WMA converter. Can anyone tell me which format the buffer of the CaptureClient has?

    CHECKHR(pCaptureClient->GetBuffer(&buffer, &numFramesAvailable, &flags, NULL, NULL));

    I know the output format of the AudioCaptureRaw console sample: the first 4 bytes are the first frame of the 1st channel, the next 4 bytes are the first frame of the 2nd channel, and so on. But is the buffer of the CaptureClient already in the same format? If I try this I only get white noise as the output signal.

    thanks, Anna

    Wednesday, April 17, 2013 12:52 PM

All replies

  • Hi Anna,

    The data format of a wave audio stream is defined by the WAVEFORMATEX structure:

    typedef struct {
        WORD  wFormatTag;
        WORD  nChannels;
        DWORD nSamplesPerSec;
        DWORD nAvgBytesPerSec;
        WORD  nBlockAlign;
        WORD  wBitsPerSample;
        WORD  cbSize;
    } WAVEFORMATEX;
    More details on the structure's members are explained in the Microsoft reference documentation.
    A complete list of WAVE_FORMAT_XXX formats (WAVE_FORMAT_PCM for one- or two-channel PCM data) can be found in the Mmreg.h header file.

    For the Kinect sensor:

    { wFormatTag = 1,
      nChannels = 1,
      nSamplesPerSec = 16000,
      nAvgBytesPerSec = 32000,
      nBlockAlign = 2,
      wBitsPerSample = 16,
      cbSize = 0 }

    Share your code so we can help you ;)

    Friday, April 26, 2013 12:32 AM
    The WAVEFORMATEX structure of the raw audio data is:

    { nChannels = 4,
      nSamplesPerSec = 16000,
      nAvgBytesPerSec = 256000,
      nBlockAlign = 16,
      wBitsPerSample = 32,
      cbSize = 22 }


    I want to use the raw data of the 4 microphones. For this I used the AudioClient and the CaptureClient the way they are presented in the AudioCaptureRaw console sample. The problem is: how do I know how the data in the byte stream I get from the CaptureClient is interleaved, and which bytes belong to which sample of which channel?

    Monday, April 29, 2013 1:56 PM
    A note:
    the Kinect's audio stream is 16-bit PCM, sampled at 16 kHz => wBitsPerSample = 16.
    As for nBlockAlign ( = nChannels * wBitsPerSample / 8 ), the right value should then be 8.
    Therefore, for 4 channels, nAvgBytesPerSec = 128000 ( = nSamplesPerSec * nBlockAlign ) => buffer = 16k.

    Try first wFormatTag = 1 (WAVE_FORMAT_PCM) instead of WAVE_FORMAT_EXTENSIBLE, and cbSize = 0 instead of 22. (Today I can't check the AudioCaptureRaw console sample.)

    Let me know whether with these parameters you get the right recording, or you still
    "get a white noise as output signal."

    The channel data should be interleaved in that order within each block.

    Monday, April 29, 2013 3:33 PM
    I can't change the parameters. I initialise the Kinect like this (where CreateFirstConnected() and GetMatchingAudioDevice() are the same functions as in the AudioCaptureRaw console sample):

    INuiSensor* pNuiSensor;
    IMMDevice *device;
    IAudioClient *pAudioClient;
    IAudioCaptureClient *pCaptureClient;
    WAVEFORMATEX *pwfx = NULL;
    hr2 = CreateFirstConnected(&pNuiSensor);
    CHECKHR(this->device->Activate(__uuidof(IAudioClient), CLSCTX_INPROC_SERVER, NULL, reinterpret_cast<void **>(&this->pAudioClient)));
    CHECKHR(pAudioClient->GetMixFormat(&pwfx));  // pwfx now describes the capture format
    CHECKHR(pAudioClient->Initialize(AUDCLNT_SHAREMODE_SHARED, AUDCLNT_STREAMFLAGS_NOPERSIST, hnsRequestedDuration, 0, pwfx, NULL));
    // Get the size of the allocated buffer.
    CHECKHR(pAudioClient->GetBufferSize(&bufferFrameCount));

    so the pwfx format is given by the AudioClient of the Kinect. I get the right bit stream, but I don't know how to read it.

    Wednesday, May 8, 2013 9:37 AM