Answered How to Make Sample in AudioCaptureRaw(C++)?

  • Wednesday, March 28, 2012 12:19 PM
     
     

    The AudioCaptureRaw sample captures sound from the microphone array.

    The input data type is BYTE[].

    How do I make samples from the BYTE[]?


    plz plz plz X 100000000000000^1000000^10000^10^10^1000^999

    • Edited by _yi Wednesday, March 28, 2012 12:20 PM
    • Edited by _yi Wednesday, March 28, 2012 12:21 PM
    • Edited by _yi Wednesday, March 28, 2012 12:29 PM

All Replies

  • Wednesday, March 28, 2012 4:25 PM
    Owner
     
      Has Code

    The byte stream is PCM format. 

    You can look at the WAVEFORMATEX structure to see the details (in this sample it's stored in _MixFormat in the CWASAPICapture class).

    If you're capturing from the Kinect audio device in this sample, you're capturing the raw audio from the array mic, so you'll see that there are 4 channels, 16000 samples per second, bits per sample is 32. 

    So, 2 bytes for each sample, samples are interleaved in sets of 4, giving us 8 total bytes and 4 samples for each time quantum.

    The samples themselves are stored basically as a short, but the 1st byte is the low order bits.

    To get an individual sample, you can use the following code:

    short audioSample = static_cast<short>(pBuffer[i] | (pBuffer[i+1] << 8));

    This assumes that pBuffer is pointing at the byte array, and that i is even.
    To access the ith sample in the mth channel, you would want:

    short audioSample = static_cast<short>(pBuffer[ i * 8 + m * 2] | (pBuffer[i * 8 + m * 2 + 1] << 8));
    Note, if you're picking up raw data as demoed in that sample, you are not getting the BeamForming, noise suppression, or AEC. If you want those pieces of advanced functionality, you need to use the DMO.

    For lots of audio processing you actually want a float rather than a short... If you need that, you simply cast the sample value to a float and divide by the range (65535).
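
A minimal sketch of the decoding described in this reply, assuming the buffer really does hold interleaved, little-endian, 16-bit PCM. The helper names `ReadPcm16` and `Pcm16ToFloat` are hypothetical and are not part of the AudioCaptureRaw sample. Note that dividing by 32768 is one common convention that maps samples to [-1.0, 1.0); dividing by 65535 as suggested above would instead give values in roughly [-0.5, 0.5].

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns the sample for `frame` and `channel` from an
// interleaved, little-endian, 16-bit PCM byte buffer.
inline std::int16_t ReadPcm16(const std::uint8_t* buffer, std::size_t bufferSize,
                              std::size_t frame, std::size_t channel,
                              std::size_t channelCount)
{
    assert(channel < channelCount);
    // Each frame holds one 2-byte sample per channel.
    const std::size_t offset = (frame * channelCount + channel) * 2;
    assert(offset + 1 < bufferSize);
    // Low-order byte comes first, high-order byte second.
    const std::uint16_t raw = static_cast<std::uint16_t>(
        buffer[offset] | (buffer[offset + 1] << 8));
    return static_cast<std::int16_t>(raw);
}

// Hypothetical helper: scales a 16-bit sample to a float in [-1.0, 1.0).
inline float Pcm16ToFloat(std::int16_t sample)
{
    return static_cast<float>(sample) / 32768.0f;
}
```

With 4 channels, `ReadPcm16(pBuffer, size, i, m, 4)` reads the same bytes as the `i * 8 + m * 2` expression above. Using an unsigned byte type matters here: if the buffer were read through a signed `char`, the low byte could sign-extend and corrupt the result.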


  • Thursday, March 29, 2012 5:47 AM
     
      Has Code

    I tried running the code below, before captureBuffer is freed.

    In the main function:

      int m =0; //select channel
      for(int i=0; i<captureBufferSize/8;i=i+2)
      {
       
           short audioSample = static_cast<short>(captureBuffer[ i * 4 + m * 2] | (captureBuffer[i * 4 + m * 2 + 1] << 8));
           printf("%d ", audioSample);
      }  
    
      delete []captureBuffer;

    but the result doesn't look like audio samples.

    What is the Microsoft employee's opinion about this?

     





    • Edited by _yi Thursday, March 29, 2012 6:14 AM
    • Edited by _yi Thursday, March 29, 2012 6:15 AM
    • Edited by _yi Thursday, March 29, 2012 6:19 AM
    • Edited by _yi Thursday, March 29, 2012 6:20 AM
  • Friday, March 30, 2012 5:49 AM
    Owner
     
     

    FYI, I had a slight bug in one of the code samples... I had a *4, where I should have had a *8. 

    What do you mean when you say they don't look like samples?  Are you concerned that the data stream isn't matching what you believe should be there, or something about formatting, output, presentation, interpretation, etc... ? :)

  • Monday, April 02, 2012 5:25 AM
     
      Has Code
    int m = 0; // select channel
    for (int i = 0; i < 3000; i = i + 1)
    {
        short audioSample = static_cast<short>(captureBuffer[i * 8 + m * 2] | (captureBuffer[i * 8 + m * 2 + 1] << 8));
        printf("%d ", audioSample);
    }

    delete[] captureBuffer;


    result :

    Code in this post is related to the second picture.

    The two bytes at the start make ch0 sample 0, the next two bytes make ch1 sample 0, ..., the next two bytes make ch3 sample 0, the next two bytes make ch0 sample 1, and so on. This follows the Microsoft employee's reply.

    Theoretically, I calculate that the samples should be low values near 0, but the samples are -29282, 30147, ...

    The result is unexpected.

    I imagine the samples must be something like "12 200 300 230 20 -49 -50 -100 -20 40 30 5 120 130 60 30 -10 -40 -50" because the recorded surroundings are silent.

    Or are the samples -29282, 30147 correct?

    What do you think?





    So any help you can give me is really appreciated!

    Greetings

    • Edited by _yi Monday, April 02, 2012 5:47 AM
    • Edited by _yi Monday, April 02, 2012 5:50 AM
    • Edited by _yi Monday, April 02, 2012 5:54 AM
    • Edited by _yi Monday, April 02, 2012 5:56 AM
    • Edited by _yi Monday, April 02, 2012 5:56 AM
    • Edited by _yi Monday, April 02, 2012 6:08 AM
    • Edited by _yi Monday, April 02, 2012 6:08 AM
    • Edited by _yi Monday, April 02, 2012 6:22 AM
  • Monday, April 02, 2012 5:34 PM
    Owner
     
     
    One thing to double check... Make sure that your volume on the mic is set to "3".  This will result in a 0 dB gain, which is what you want...  You could easily be seeing noise amplified by the operating system.
  • Tuesday, April 03, 2012 9:38 AM
     
     


    Sorry about my limited knowledge.

    I opened out.wav in MATLAB.

    The data is different from the data I printed.

    I read out.wav using MATLAB.

    The graph never goes above 0.05, which is very low,

    but my data is usually near the max value.

    I know this data is converted to double type, but consider that the +max value is 0.5;

    32000 is almost 0.5 in double type.

    Can you give me code to make samples as double type, the way MATLAB does?





    • Edited by _yi Tuesday, April 03, 2012 9:39 AM
    • Edited by _yi Tuesday, April 03, 2012 9:42 AM
    • Edited by _yi Tuesday, April 03, 2012 9:44 AM
    • Edited by _yi Tuesday, April 03, 2012 9:46 AM
  • Tuesday, April 03, 2012 4:16 PM
    Owner
     
     Answered Has Code

    Okay, figured it out...  The IAudioClient interface is actually using a slightly different format than the native format you get back from the Kinect if you're using the beam forming functionality.

    If you examine the WAVEFORMATEX that is returned by the client, you'll see that it's returning 32 bit samples rather than 16, which is the source of the confusion.

    int m = 0; // select channel
    for (int i = 0; i < BufferSize / 16; i += 1)
    {
      int iSample = i * 16 + m * 4; // 16 bytes per frame: 4 channels * 4 bytes
      long audioSample = static_cast<long>(
                         CaptureBuffer[iSample]
                       | (CaptureBuffer[iSample+1] << 8)
                       | (CaptureBuffer[iSample+2] << 16)
                       | (CaptureBuffer[iSample+3] << 24)
                       );
      // %ld matches the long argument
      printf("%ld, %f\n", audioSample, (double) audioSample/MAXLONG);
    }
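
The change from `* 8` to `* 16` in the loops above follows from the frame size: one frame holds one sample per channel, so its size in bytes is the channel count times the bytes per sample. The following is a rough sketch of that arithmetic. `StreamFormat` is a hypothetical stand-in for the `nChannels` and `wBitsPerSample` fields of WAVEFORMATEX so that the example compiles without the Windows headers, and `BytesPerFrame` and `SampleOffset` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the two WAVEFORMATEX fields used here.
struct StreamFormat {
    unsigned channels;       // corresponds to nChannels
    unsigned bitsPerSample;  // corresponds to wBitsPerSample
};

// Size in bytes of one frame (one sample for every channel).
inline std::size_t BytesPerFrame(const StreamFormat& format)
{
    assert(format.bitsPerSample % 8 == 0);
    return static_cast<std::size_t>(format.channels) * (format.bitsPerSample / 8);
}

// Byte offset of the sample for `frame` and `channel` in an interleaved buffer.
inline std::size_t SampleOffset(const StreamFormat& format,
                                std::size_t frame, unsigned channel)
{
    assert(channel < format.channels);
    return frame * BytesPerFrame(format) + channel * (format.bitsPerSample / 8);
}
```

For 4 channels this gives 8 bytes per frame at 16 bits per sample and 16 bytes per frame at 32 bits per sample. Deriving the stride from the format, rather than hard-coding it, would avoid this class of mistake.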

  • Wednesday, April 04, 2012 2:31 PM
     
     

    I have tested the code you wrote,

    but the result is the same: I get values near 0.5 (or -0.5).

    I already mentioned in my previous post that the values are near 0.5.

    There is no difference in the signal between a silent and a noisy environment;

    the value is always near 0.5 (or -0.5) whether it is noisy or silent.

    The signal should be low in a silent environment,

    and large in a noisy environment.

    Ideal processing,



    Have you considered 32-bit floating point (IEEE standard)?

    Actually, I don't know much about 32-bit floating point (IEEE).

    But Adobe Audition (a commercial sound analysis program) says that the sample type is 32-bit floating point (IEEE).




    Thank you so much for your continued replies.
    • Edited by _yi Wednesday, April 04, 2012 2:42 PM
    • Edited by _yi Wednesday, April 04, 2012 3:00 PM
    • Edited by _yi Wednesday, April 04, 2012 3:04 PM
    • Edited by _yi Wednesday, April 04, 2012 3:06 PM
    • Edited by _yi Wednesday, April 04, 2012 3:07 PM
    • Edited by _yi Wednesday, April 04, 2012 3:12 PM
    • Edited by _yi Wednesday, April 04, 2012 3:14 PM
  • Wednesday, April 04, 2012 6:14 PM
    Owner
     
     Answered Has Code

    Tracked it down... Once again, an assumption led me astray.  The AudioCaptureRaw sample actually stores stuff in WAVE_FORMAT_IEEE_FLOAT.

    This can be seen by casting the WAVEFORMATEX structure to WAVEFORMATEXTENSIBLE (which is valid because the wFormatTag is WAVE_FORMAT_EXTENSIBLE), and examining the SubFormat GUID.  In this case, it's {00000003-0000-0010-8000-00AA00389B71}.  These GUIDs are defined in mmreg.h, and looking it up there shows us that this is WAVE_FORMAT_IEEE_FLOAT.  That means that the easiest (and correct :) ) way of accessing the nth sample in the mth stream is:

    float * pFloatSample = ((float *) CaptureBuffer) + n * 4 + m;
    printf("%f", *pFloatSample);
    

    Note, we do the cast of CaptureBuffer BEFORE doing any arithmetic, so that the pointer arithmetic moves at the correct cadence.
    _yi, please double check/verify that you see this work correctly.
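
As a rough sketch of the same idea with bounds checks, assuming the buffer holds interleaved 32-bit IEEE floats in the machine's native byte order (little-endian on the usual Windows targets). The helper names `ReadFloatSample` and `ExtractChannel` are hypothetical and not part of the sample. This version copies the bytes with `std::memcpy` instead of casting the pointer, which gives the same result as the cast above on a suitably aligned buffer while avoiding alignment and aliasing concerns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: reads the float for `frame` and `channel` from an
// interleaved buffer of 32-bit IEEE floats.
inline float ReadFloatSample(const std::uint8_t* buffer, std::size_t bufferSize,
                             std::size_t frame, std::size_t channel,
                             std::size_t channelCount)
{
    assert(channel < channelCount);
    const std::size_t offset = (frame * channelCount + channel) * sizeof(float);
    assert(offset + sizeof(float) <= bufferSize);
    float value;
    std::memcpy(&value, buffer + offset, sizeof(float));  // byte-wise copy
    return value;
}

// Hypothetical helper: collects every sample of one channel into a vector.
inline std::vector<float> ExtractChannel(const std::uint8_t* buffer,
                                         std::size_t bufferSize,
                                         std::size_t channel,
                                         std::size_t channelCount)
{
    const std::size_t frameCount = bufferSize / (channelCount * sizeof(float));
    std::vector<float> samples;
    samples.reserve(frameCount);
    for (std::size_t frame = 0; frame < frameCount; ++frame) {
        samples.push_back(
            ReadFloatSample(buffer, bufferSize, frame, channel, channelCount));
    }
    return samples;
}
```

For the 4-channel array mic, `ReadFloatSample(CaptureBuffer, size, n, m, 4)` should return the same value as `*pFloatSample` above, and `ExtractChannel` yields one channel as a contiguous array that can be compared directly with data loaded from the .wav file.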

    • Marked As Answer by _yi Thursday, April 05, 2012 6:29 AM
  • Thursday, April 05, 2012 6:37 AM
     
     

    I got correct samples; I have finished comparing them with the MATLAB data.

    The result matches perfectly.

    thank you a lot. (     '______')


    • Edited by _yi Thursday, April 05, 2012 6:44 AM
  • Thursday, April 05, 2012 6:38 AM
    Owner
     
     
    Thx for sticking with us. :)