Advice needed for microphone recording and simple processing

    Question

  • I have a scenario where I need to record audio with the microphone and then perform some simple processing. The MediaCapture API provides functionality to record audio in a certain codec into a stream, file or a custom sink. WMA and other codecs are supported, but I'm missing good old PCM (WAV) as an available MediaEncodingProfile. So is there a way to get the raw PCM data from the MediaCapture API for post-processing, or should I look into XAudio2 and write a custom WinMD component for this?

    What I need to do: record from the mic, change the pitch while playing back the recording, and render out a file with a chosen pitch. I saw that XAudio2 has some nice built-in effects, but I'm wondering if it's possible to render the output into a file / buffer instead of a device.

    Thanks.

    Monday, May 7, 2012 7:11 PM

All replies

  • Hello Rene,

    At this time we have not released guidance on how to create a custom profile, and we do not include PCM as an in-box audio profile for MediaCapture. This is a very highly requested feature that we are considering, but we have yet to make a determination.

    If you need to get raw PCM data from an audio device and process it in near real time, I would highly recommend that you take a look at the WASAPI APIs (core audio) for use in a WinRT environment (link below). You also have the option of using Media Foundation and writing a custom sink or MFT that can process your audio in the standard Media Foundation pipeline. I would recommend that you study both options and carefully determine which is going to work best for your business needs.

    Keep in mind that you should never try to do near real time audio processing from within the context of a managed application. This is due to the issues surrounding nondeterministic finalization inherent in all managed languages. A link to my blog post on the subject is listed below.

    Real-time communication sample
    http://code.msdn.microsoft.com/windowsapps/Simple-Communication-Sample-eac73290

    Calling the Format SDK, DirectShow, Media Foundation or the WASAPI from managed code (C#, VB.net)
    http://blogs.msdn.com/b/mediasdkstuff/archive/2009/04/01/calling-the-format-sdk-directshow-media-foundation-or-the-wasapi-from-managed-code-c-vb-net.aspx

    Win32 and COM for Metro style apps (multimedia)
    http://msdn.microsoft.com/en-us/library/windows/apps/hh452756.aspx
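
    For readers who do obtain raw PCM through one of the native routes above, wrapping it in a WAV container is mostly a matter of writing a 44-byte RIFF header in front of the samples. A minimal sketch in C++, assuming interleaved 16-bit samples; make_wav, put_le and put_tag are illustrative names, not part of WASAPI or Media Foundation:

```cpp
#include <cstdint>
#include <vector>

// Append the low `bytes` bytes of `value` in little-endian order.
static void put_le(std::vector<uint8_t>& out, uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i) {
        out.push_back(static_cast<uint8_t>((value >> (8 * i)) & 0xFF));
    }
}

// Append a four-character chunk tag such as "RIFF" or "data".
static void put_tag(std::vector<uint8_t>& out, const char* tag) {
    out.insert(out.end(), tag, tag + 4);
}

// Wrap interleaved 16-bit PCM samples in a canonical 44-byte RIFF/WAVE header.
// Assumption: the caller already holds the captured samples in memory.
std::vector<uint8_t> make_wav(const std::vector<int16_t>& samples,
                              uint32_t sample_rate, uint16_t channels) {
    const uint16_t bits_per_sample = 16;
    const uint16_t block_align =
        static_cast<uint16_t>(channels * (bits_per_sample / 8));
    const uint32_t byte_rate = sample_rate * block_align;
    const uint32_t data_bytes = static_cast<uint32_t>(samples.size() * 2);

    std::vector<uint8_t> out;
    out.reserve(44 + data_bytes);
    put_tag(out, "RIFF");
    put_le(out, 36 + data_bytes, 4);   // size of everything after this field
    put_tag(out, "WAVE");
    put_tag(out, "fmt ");
    put_le(out, 16, 4);                // fmt chunk size for plain PCM
    put_le(out, 1, 2);                 // format tag 1 = PCM
    put_le(out, channels, 2);
    put_le(out, sample_rate, 4);
    put_le(out, byte_rate, 4);
    put_le(out, block_align, 2);
    put_le(out, bits_per_sample, 2);
    put_tag(out, "data");
    put_le(out, data_bytes, 4);
    for (int16_t s : samples) {
        put_le(out, static_cast<uint16_t>(s), 2);
    }
    return out;
}
```

    The returned bytes can be written to a file as-is. Capturing the samples themselves is outside the scope of this sketch.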

    I hope this helps,

    James


    Windows Media SDK Technologies - Microsoft Developer Services - http://blogs.msdn.com/mediasdkstuff/

    Monday, May 7, 2012 10:46 PM
    Moderator
  • Thanks for the quick answer James! 

    If I had $100 to vote for any WinRT Media feature, I'd probably spend $80 for a PCM EncodingProfile. :)

    I don't need real real-time processing and I'm aware of the implications of managed code, but I don't have a problem using native libs directly. It's all good.

    So do you think XAudio2 isn't a solution? I think it has a nice API and I could easily change the pitch. My only question is if I can render a recording out.

    From what I see, I think the best solution currently is a custom MFT which applies the pitch modification. With that I can manipulate the pitch of the recording while playing. But how do I render it out to a file with a custom pitch _after_ it was recorded, and not while recording with MediaCapture? The user needs to be able to change the pitch after the voice was recorded and then save the final result to disk. I see how this can be done during playback: a custom MFT is added to the MediaElement, and the pitch value is passed to the MFT via a parameter (hopefully this parameter passing is fast enough for real-time adjustments). I also see how the pitch shift can be achieved with the MediaCapture API while recording to a file / stream. But the recording itself needs to stay neutral, without any pitch change. The pitch change has to be a post-processing step, and I don't see how to achieve this with the WinRT Media API.
    Does this make sense? :)
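
    To make the post-processing idea concrete: the simplest form of pitch change is resampling the recorded PCM. Reading the samples faster raises the pitch and shortens the clip; reading them slower does the opposite. A minimal sketch under that assumption, for mono 16-bit samples with linear interpolation; resample_pitch is a hypothetical helper, not an XAudio2 or Media Foundation call, and a pitch shift that preserves duration would need a different technique:

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Resample mono 16-bit PCM by `ratio` using linear interpolation.
// Played back at the original sample rate, ratio > 1 raises the pitch
// (and shortens the clip), ratio < 1 lowers it (and lengthens the clip).
std::vector<int16_t> resample_pitch(const std::vector<int16_t>& in, double ratio) {
    std::vector<int16_t> out;
    if (in.empty() || ratio <= 0.0) {
        return out;  // nothing to do, or an invalid ratio
    }
    const std::size_t out_len = static_cast<std::size_t>(in.size() / ratio);
    out.reserve(out_len);
    for (std::size_t i = 0; i < out_len; ++i) {
        const double pos = static_cast<double>(i) * ratio;  // read position in the input
        const std::size_t i0 = static_cast<std::size_t>(pos);
        if (i0 >= in.size()) {
            break;  // guard against floating-point rounding at the very end
        }
        // Hold the last sample when there is no right-hand neighbour.
        const std::size_t i1 = (i0 + 1 < in.size()) ? i0 + 1 : i0;
        const double frac = pos - static_cast<double>(i0);
        const double v = in[i0] * (1.0 - frac) + in[i1] * frac;
        out.push_back(static_cast<int16_t>(std::lround(v)));
    }
    return out;
}
```

    With a ratio of 2.0 the output has half as many samples and sounds an octave higher; with 0.5 it has twice as many and sounds an octave lower.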

    Tuesday, May 8, 2012 9:32 AM
  • Hello Rene,

    I apologize, but I'm not sure I follow. If you want to transcode the file, you can use the transcoding APIs with a custom MFT to modify the pitch and write the file out. Again, we don't have a default PCM encoding profile, so this might not work as you expect. Please take a look at the links below and let me know if I'm missing your question. I will certainly do what I can to help you get this figured out.

    MediaTranscoder
    http://msdn.microsoft.com/en-us/library/windows/apps/windows.media.transcoding.mediatranscoder.aspx

    Transcoding media sample
    http://code.msdn.microsoft.com/windowsapps/Media-Transcode-Sample-f7ba5269

    I hope this helps,

    James



    Tuesday, May 8, 2012 11:07 PM
    Moderator
  • Thanks for the answer James. 

    I think this approach works. I write a custom MFT PitchEffect with a parameter to control the pitch. I record using a MediaCapture to WMA (or other codec) without any effect. Then for playback I use a MediaElement with the PitchEffect applied and for rendering out a file with the PitchEffect applied I use the MediaTranscoder.

    The pitch should be controlled with a Slider control, which would trigger a change of the pitch parameter of the PitchEffect. Is such an MFT parameter value change possible in real time using WinRT? Scenario: the recording is playing and the user changes the pitch with a Slider while it plays.


    Wednesday, May 9, 2012 9:35 AM
  • Q. Is such an MFT parameter value change possible in real time using WinRT?

    A. Yes, you should be able to add a custom public property to the MFT to modify the pitch in near real time. Keep in mind that you will need to use MoCOM to do this and it may be more complicated than you think. The following post might help to get you started.

    How to pass IMFAttributes/IPropertySet to MediaCapture.AddEffectAsync() or MediaElement.AddVideoEffect()?
    http://social.msdn.microsoft.com/Forums/en-US/winappswithcsharp/thread/f140c786-032a-4892-b502-baae0127a5cb
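
    On the native side, one way such a property could hand the Slider value to the processing code is through an atomic variable, so the UI thread never blocks the audio thread. This is a sketch only; PitchParameter and its 0.5 to 2.0 range are assumptions for illustration, and the COM/WinRT plumbing that exposes the property is omitted:

```cpp
#include <atomic>

// Hypothetical holder for a pitch ratio shared between a UI thread
// (writer) and an audio processing thread (reader).
class PitchParameter {
public:
    explicit PitchParameter(float initial = 1.0f) : ratio_(clamp(initial)) {}

    // Called from the UI thread, e.g. from a Slider value-changed handler.
    void set(float ratio) {
        ratio_.store(clamp(ratio), std::memory_order_relaxed);
    }

    // Called from the processing thread, typically once per audio buffer.
    float get() const {
        return ratio_.load(std::memory_order_relaxed);
    }

private:
    // Keep the ratio inside an assumed safe range; NaN falls back to kMin.
    static float clamp(float r) {
        if (!(r >= kMin)) return kMin;
        if (r > kMax) return kMax;
        return r;
    }

    static constexpr float kMin = 0.5f;  // assumed lower bound
    static constexpr float kMax = 2.0f;  // assumed upper bound
    std::atomic<float> ratio_;
};
```

    Reading the value once per buffer keeps the ratio constant within a buffer while still letting the Slider take effect on the next one.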

    I hope this helps,

    James



    Wednesday, May 9, 2012 8:44 PM
    Moderator
  • Thanks. Helps a lot. :)
    Tuesday, May 15, 2012 1:47 PM
  • Hey Rene!

    I have a similar scenario, but I am getting an exception while capturing audio only with the MediaCapture API. The CaptureElement does not seem to work with audio-only capture. If you were able to capture audio only, can you please share a piece of code with me?

    Currently I am using the following code, but I get an exception (HRESULT: 0xC00D36D5).

    MediaCapture captureMgr = new MediaCapture();

    MediaCaptureInitializationSettings captureSettings = new MediaCaptureInitializationSettings();
    captureSettings.StreamingCaptureMode = StreamingCaptureMode.Audio;
    await captureMgr.InitializeAsync(captureSettings);

    this.CaptureVideoElement.Source = captureMgr; // exception is thrown here...
    await captureMgr.StartPreviewAsync();

    Wednesday, September 12, 2012 1:57 PM
  • Hi Fastian,

    I have the same problem. I actually switched from C# to C++ development because of this issue. I think WASAPI is the only hope for capturing raw data in C++.

    Thursday, November 8, 2012 1:23 PM