How to Extract Samples in AudioCaptureRaw (C++)?
-
Wednesday, March 28, 2012 12:19 PM
AudioCaptureRaw can capture sound from the microphone array.
The input data is a BYTE[].
How do I get individual samples out of the BYTE[]?
Please, please, please help!
All Replies
-
Wednesday, March 28, 2012 4:25 PM (Owner)
The byte stream is in PCM format.
You can look at the WAVEFORMATEX structure to see the details (in this sample it's stored in _MixFormat in the CWASAPICapture class).
If you're capturing from the Kinect audio device in this sample, you're capturing the raw audio from the array mic, so you'll see 4 channels, 16000 samples per second, and 32 bits per sample.
So there are 2 bytes per sample, and samples are interleaved in sets of 4, giving 8 bytes and 4 samples for each time quantum.
The samples themselves are stored essentially as a short, with the first byte holding the low-order bits (little-endian).
To get an individual sample, you can use the following code:
short audioSample = static_cast<short>(pBuffer[i] | (pBuffer[i+1] << 8));
This assumes that pBuffer is pointing at the byte array, and that i is even.
To access the ith sample in the mth channel, you would want:
short audioSample = static_cast<short>(pBuffer[i * 8 + m * 2] | (pBuffer[i * 8 + m * 2 + 1] << 8));
Note: if you're picking up raw data as demonstrated in that sample, you are not getting beamforming, noise suppression, or acoustic echo cancellation (AEC). If you want that advanced functionality, you need to use the DMO.
For a lot of audio processing you actually want a float rather than a short... If you need that, cast the sample value to a float and divide by 32768, which maps the 16-bit range onto [-1.0, 1.0).
- Proposed As Answer by Chris White _MSFT_ (Microsoft Employee, Owner) Wednesday, March 28, 2012 4:26 PM
- Edited by Chris White _MSFT_ (Microsoft Employee, Owner) Friday, March 30, 2012 5:43 AM (had a slight error in the code)
- Unproposed As Answer by Chris White _MSFT_ (Microsoft Employee, Owner) Tuesday, April 03, 2012 4:17 PM
-
Thursday, March 29, 2012 5:47 AM
I tried running the code below in the main function, just before captureBuffer is deleted:

    int m = 0; // select channel
    for (int i = 0; i < captureBufferSize / 8; i = i + 2)
    {
        short audioSample = static_cast<short>(captureBuffer[i * 4 + m * 2] | (captureBuffer[i * 4 + m * 2 + 1] << 8));
        printf("%d ", audioSample);
    }
    delete [] captureBuffer;

but the results don't look like audio samples.
Could a Microsoft employee comment on this?
-
Friday, March 30, 2012 5:49 AM (Owner)
FYI, I had a slight bug in one of the code samples... I had a *4, where I should have had a *8.
What do you mean when you say they don't look like samples? Are you concerned that the data stream isn't matching what you believe should be there, or something about formatting, output, presentation, interpretation, etc... ? :)
-
Monday, April 02, 2012 5:25 AM
    int m = 0; // select channel
    for (int i = 0; i < 3000; i = i + 1)
    {
        short audioSample = static_cast<short>(captureBuffer[i * 8 + m * 2] | (captureBuffer[i * 8 + m * 2 + 1] << 8));
        printf("%d ", audioSample);
    }
    delete [] captureBuffer;

Result:
The code in this post corresponds to the second picture.
The first two bytes form sample 0 of channel 0, the next two bytes form sample 0 of channel 1, ..., the next two bytes form sample 0 of channel 3, then the next two bytes form sample 1 of channel 0, and so on. I took this layout from the Microsoft employee's reply.
In theory, I expected the samples to be small values near 0, but they are -29282, 30147, ....
The result is unexpected.
I imagined the samples would look like "12 200 300 230 20 -49 -50 -100 -20 40 30 5 120 130 60 30 -10 -40 -50", because the recorded surroundings were silent.
Or are the samples -29282 and 30147 actually correct?
What do you think?
So any help you can give me is really appreciated!
Greetings
- Edited by _yi Monday, April 02, 2012 5:47 AM
- Edited by _yi Monday, April 02, 2012 5:50 AM
- Edited by _yi Monday, April 02, 2012 5:54 AM
- Edited by _yi Monday, April 02, 2012 5:56 AM
- Edited by _yi Monday, April 02, 2012 6:08 AM
- Edited by _yi Monday, April 02, 2012 6:22 AM
-
Monday, April 02, 2012 5:34 PM (Owner)
One thing to double-check... Make sure that the volume on the mic is set to "3". This results in 0 dB gain, which is what you want... Otherwise you could easily be seeing noise amplified by the operating system.
-
Tuesday, April 03, 2012 9:38 AM
Sorry, my knowledge here is limited. I opened out.wav in MATLAB,
and the data is different from the data I printed.
In MATLAB's plot of out.wav, the values never exceed 0.05, which seems too low,
but my printed data is usually near the maximum value.
I know MATLAB converts the data to double, but consider that the positive maximum value is 0.5;
32000 is almost 0.5 as a double.
Could you give me code that converts the samples to double, the way MATLAB does?
-
Tuesday, April 03, 2012 4:16 PM (Owner)
Okay, figured it out... The IAudioClient interface is actually using a slightly different format than the native format you get back from the Kinect if you're using the beam forming functionality.
If you examine the WAVEFORMATEX that is returned by the client, you'll see that it's returning 32 bit samples rather than 16, which is the source of the confusion.
    int m = 0; // select channel
    for (int i = 0; i < BufferSize / 16; i += 1)
    {
        int iSample = i * 16 + m * 4;
        long audioSample = static_cast<long>(CaptureBuffer[iSample] | (CaptureBuffer[iSample + 1] << 8) | (CaptureBuffer[iSample + 2] << 16) | (CaptureBuffer[iSample + 3] << 24));
        printf("%ld, %f\n", audioSample, (double) audioSample / MAXLONG);
    }

- Marked As Answer by Chris White _MSFT_ (Microsoft Employee, Owner) Tuesday, April 03, 2012 4:16 PM
-
Wednesday, April 04, 2012 2:31 PM
I have tested the code you posted,
but the result is the same: I get values near 0.5 (or -0.5).
As I mentioned in my previous post, the values are always near 0.5.
There is no difference in the signal between a silent and a noisy environment;
the value is always near 0.5 (or -0.5), whether it is noisy or silent.
The signal should have low values in a silent environment
and large values in a noisy environment.
Ideal processing:
Have you considered 32-bit floating point (IEEE standard)?
I don't actually know much about 32-bit IEEE floating point,
but Adobe Audition (a commercial sound analysis program) says the sample type is 32-bit floating point (IEEE).
Thank you so much for your continued replies.
- Edited by _yi Wednesday, April 04, 2012 2:42 PM
- Edited by _yi Wednesday, April 04, 2012 3:00 PM
- Edited by _yi Wednesday, April 04, 2012 3:04 PM
- Edited by _yi Wednesday, April 04, 2012 3:06 PM
- Edited by _yi Wednesday, April 04, 2012 3:07 PM
- Edited by _yi Wednesday, April 04, 2012 3:12 PM
- Edited by _yi Wednesday, April 04, 2012 3:14 PM
-
Wednesday, April 04, 2012 6:14 PM (Owner)
Tracked it down... Once again, an assumption led me astray. The AudioCaptureRaw sample actually stores stuff in WAVE_FORMAT_IEEE_FLOAT.
This can be seen by casting the WAVEFORMATEX structure to WAVEFORMATEXTENSIBLE (which is valid because wFormatTag is WAVE_FORMAT_EXTENSIBLE) and examining the SubFormat GUID. In this case, it's {00000003-0000-0010-8000-00AA00389B71}. These GUIDs are defined in mmreg.h, and looking it up there shows that this is WAVE_FORMAT_IEEE_FLOAT. That means that the easiest (and correct :) ) way of accessing the nth sample in the mth channel is:
    float * pFloatSample = ((float *) CaptureBuffer) + n * 4 + m;
    printf("%f", *pFloatSample);

Note that we cast CaptureBuffer BEFORE doing any arithmetic, so that the pointer arithmetic advances in whole-float (4-byte) steps.
_yi, please double-check that this works correctly for you.
- Marked As Answer by _yi Thursday, April 05, 2012 6:29 AM
-
Thursday, April 05, 2012 6:37 AM
I got correct samples and compared them with the MATLAB data.
The results match perfectly.
Thank you a lot. ( '______')
- Edited by _yi Thursday, April 05, 2012 6:44 AM
-
Thursday, April 05, 2012 6:38 AM (Owner)
Thx for sticking with us. :)

