Can the WMA encoder encode one IMFSample?

  • Question

  • I am trying to write a live video conferencing project, and I am currently working on the audio capture and playback part on the local computer. If I just play back the IMFSamples obtained from the source reader without a codec, it works well. However, when I try to add the codec, a problem arises with the audio encoder.

    Let me first outline the general flow of my program:

    Microphone -> media source -> source reader -> sample -> encode -> decode -> playback

    For the media source, I call SourceReader::ReadSample() asynchronously and handle each sample in the callback OnReadSample(). For the codec, I use a Media Foundation Transform (MFT). I referenced a WMA encoder sample on http://msdn.microsoft.com/en-us/library/dd206741(VS.85).aspx.

    For each sample obtained from the source reader, I pass it to IMFTransform::ProcessInput(IMFSample *sample), which returns S_OK. Then I try to obtain the encoded output sample, but the call to IMFTransform::ProcessOutput() always returns MF_E_TRANSFORM_NEED_MORE_INPUT. This problem has puzzled me for several days; could you be so kind as to give me some help? Let me summarize my questions as follows:

    1. For the media type negotiation, I set the media type obtained from the source reader as the input type of the encoder, then call IMFTransform::GetOutputAvailableType() and set the returned type as the output type of the encoder without altering anything. Is this correct? I am wondering how the encoder knows the output bitrate, since I didn't configure it. Could this cause the encoder to fail?

    2. My idea is very simple: I just want the encoder to encode the data in one IMFSample, the decoder to decode that IMFSample, and then send the decoded data to playback. Is my idea too naïve? Can the WMA encoder encode one IMFSample independently, or should I set some attribute to enable this?

    3. I also saw some documentation about the leaky bucket. Is this related to the failure of the encoder, since I didn't use it?

    Please give me your advice, or references I can study further on this issue. Thank you in advance.

    Monday, August 30, 2010 6:18 AM

Answers

  • I haven't done any work with audio yet, but I'm assuming the encoder behaviour is fairly similar to video. Instead of dealing with keyframes, you have to provide the minimum number of audio samples required by the encoder to produce an output. It is possibly a function of the audio sampling frequency.
    Monday, August 30, 2010 7:49 AM

All replies

  • Encoded video is processed through a finite state machine. In order to generate encoded output, at least two keyframes must have passed through the encoder; this allows the encoder to encode the frames between keyframes. I had the same problem when I was decoding H.264: I expected a frame of output after a single frame of input.

    Try putting 30 or so more frames into the encoder and keep calling ProcessOutput. If your keyframe distance is more than 30 frames, put even more in.

    Monday, August 30, 2010 6:46 AM
  • Thanks so much for the reply. But the problem here is with an audio encoder, not a video encoder. I am using the WMA encoder; can you give me some help?

    Monday, August 30, 2010 6:51 AM
  • I haven't done any work with audio yet, but I'm assuming the encoder behaviour is fairly similar to video. Instead of dealing with keyframes, you have to provide the minimum number of audio samples required by the encoder to produce an output. It is possibly a function of the audio sampling frequency.
    Monday, August 30, 2010 7:49 AM
  • Actually, I have completed the video part: I encode each sample into WMV format and decode each sample back into RGB format, and it works! I thought this would also work for audio, but unfortunately it doesn't, and I don't know what is wrong.

    As you mentioned, maybe I should input more samples, but where can I find the exact number of samples required? I looked through the MSDN documents and found nothing helpful.

    Monday, August 30, 2010 8:19 AM
  • Your sample duration/distance is related to the digital audio sampling frequency. For example, CD-quality audio is sampled at 44.1 kHz with 16-bit precision. That's 44,100 samples per second, or a sample duration of roughly 22.7 microseconds. Media Foundation's time scale is measured in 100-nanosecond units, so the duration of a single 44.1 kHz sample is about 227 units.

    Like I said, I haven't done audio with MF yet. I'm guessing that an audio IMFSample isn't a single sample point; the sample buffer probably contains a stretch of continuous audio that could be several milliseconds long. I would recommend reading through the audio/video capture programming guide so you can get a better idea of sample processing: http://msdn.microsoft.com/en-us/library/dd317912.aspx.

    Monday, August 30, 2010 9:54 PM