ColorImageFormat.RawYuvResolution640x480Fps15 is not 32 but 16 bits per pixel


  • Exploring the Kinect for Windows SDK, I have discovered that the C# tooltip for ColorImageFormat.RawYuvResolution640x480Fps15 states:

    "YUV data (32 bits per pixel, layout corresponding to D3DFMT_LIN_UYVY). Resolution of 640 by 480 at 15 Frames per second."

    If this were true, then the size of the image data would be 640*480*(4 bytes) = 1228800 bytes.

    Indeed, the size of the image data for all of the other ColorImageFormat values at 640 by 480 is exactly that (1228800 bytes).

    However, when we collect the data under RawYuvResolution640x480Fps15, the size of the data is actually 614400 bytes.

    This indicates that ColorImageFormat.RawYuvResolution640x480Fps15 is not 32 but 16 bits per pixel.
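    The arithmetic above can be double-checked with a quick Python sketch (the resolution and observed buffer size are the numbers from this post):

    ```python
    width, height = 640, 480

    # 32 bits (4 bytes) per pixel would give this buffer size:
    size_32bpp = width * height * 4   # 1228800 bytes

    # The buffer actually observed for RawYuvResolution640x480Fps15:
    observed = 614400

    # 614400 / (640 * 480) = 2 bytes, i.e. 16 bits per pixel
    bytes_per_pixel = observed // (width * height)
    print(size_32bpp, bytes_per_pixel)  # 1228800 2
    ```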

    Is there any explanation for the discrepancy?

    As UYVY is one of the YUV 4:2:2 formats, could someone clarify how the data is laid out?

    • Edited by tom.anderson Tuesday, July 24, 2012 10:35 AM added image
    Tuesday, July 24, 2012 10:28 AM

All replies

  • Hello,

    The raw YUV is encoded in "macro pixels" of 32 bits. Each macro pixel holds the data for 2 pixels, which is why the math works out to 16 bits per pixel (even though it is 32 bits per macro pixel).
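    As a sketch of the byte layout this implies (assuming the D3DFMT_LIN_UYVY byte ordering U0 Y0 V0 Y1): pixel i has its own Y byte at offset 2*i + 1, while the shared U and V bytes sit at the start of its 4-byte macro pixel.

    ```python
    def uyvy_offsets(i):
        """Byte offsets of (Y, U, V) for pixel i in a UYVY stream."""
        macro = (i // 2) * 4      # start of the 4-byte macro pixel
        y = 2 * i + 1             # U0 Y0 V0 Y1 -> Y bytes are the odd offsets
        return y, macro, macro + 2

    # Pixels 0 and 1 share the same U/V bytes (offsets 0 and 2):
    print(uyvy_offsets(0))  # (1, 0, 2)
    print(uyvy_offsets(1))  # (3, 0, 2)
    ```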

    A good description of the format can be found here:

    I hope this helps.

    -- Jon

    Friday, July 27, 2012 6:57 AM
  • Not to cause trouble, but the article you posted says: "Effective bits per pixel : 16". Although there are 32 bits per macropixel, each macropixel carries two Y samples (one per pixel) while its single U and V are shared by both pixels, which works out to 16 effective bits per pixel.

    I've tried the conversions listed there, but I'm not getting it right somewhere.

    Both articles on YUV 4:2:2 to RGB conversion (the longer one and the brief one) say: "To convert... 4:2:2 YUV to RGB, convert the YUV data to 4:4:4 YUV, and then convert from 4:4:4 YUV to RGB." Any hints on the conversion from 4:2:2 to 4:4:4?

    By the way, I noticed that RgbResolution640x480Fps30 seems to have a bit of a color issue around edges, such as a white edge: there is red and blue striping, which suggests speed was favored over quality. RawYuvResolution640x480Fps15, on the other hand, provides a clearer image but at a lower frame rate.

    Friday, July 27, 2012 1:41 PM
  • Although it seemed to have a bug previously, I got the YUV444 to RGB888 conversion working. I'll post it here for anyone interested.

            // Integer approximation; constants from the Wikipedia YUV article
            private byte[] YUV444toRGB888(byte y, byte u, byte v)
            {
                var c = y - 16;
                var d = u - 128;
                var e = v - 128;
                byte r = clamp((298 * c + 409 * e + 128) >> 8);
                byte g = clamp((298 * c - 100 * d - 208 * e + 128) >> 8);
                byte b = clamp((298 * c + 516 * d + 128) >> 8);
                byte a = (byte)255;
                return new byte[] { r, g, b, a };
            }

    And the code for clamp is something like:

            private byte clamp(int p)
            {
                return p < 0 ? (byte)0 : p > 255 ? (byte)255 : (byte)p;
            }

    • Edited by twocs Friday, July 27, 2012 5:44 PM Code block cleanup
    Friday, July 27, 2012 5:39 PM
  • Hi Tom,

    Here's a diagram of the data stream and an explanation:

    The stream is a sequence of UYVY | UYVY | UYVY chunks, where U, Y, and V are individual 8-bit values.

    When you want to convert to RGB, you process the stream in 4-byte values, each covering 2 pixels. There is a "Y" value for each pixel; the "U" and "V" values are shared, so you use the same U and V for both pixels.

    Once you've duplicated the U and V terms, you now have a YUV value for each of the 2 pixels which can be converted to RGB using the formula:

    r = clamp(y + 1.402 * (v - 128));
    g = clamp(y - 0.344 * (u - 128) - 0.714 * (v - 128));
    b = clamp(y + 1.772 * (u - 128));
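    The two steps (duplicate the shared U and V, then apply the formula above) might look like this Python sketch, where clamp limits results to 0-255:

    ```python
    def clamp(p):
        return 0 if p < 0 else 255 if p > 255 else int(p)

    def yuv_to_rgb(y, u, v):
        # The floating-point formula from the reply above
        r = clamp(y + 1.402 * (v - 128))
        g = clamp(y - 0.344 * (u - 128) - 0.714 * (v - 128))
        b = clamp(y + 1.772 * (u - 128))
        return r, g, b

    def uyvy_macro_to_rgb(u, y0, v, y1):
        """One 4-byte UYVY macro pixel -> RGB for its two pixels."""
        return yuv_to_rgb(y0, u, v), yuv_to_rgb(y1, u, v)

    # Neutral chroma (u = v = 128) reduces to grayscale: r = g = b = y
    print(uyvy_macro_to_rgb(128, 90, 128, 200))  # ((90, 90, 90), (200, 200, 200))
    ```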

    If you're familiar with YUV or you investigate online you'll come across the terms YCbCr.  The formula above is for YCbCr, most times when a digital format is described as "YUV" it's actually YCbCr.

    The idea behind this pixel format is to store a "Y" term (luminance) for each pixel and share the "U" and "V" terms across both pixels. This is done because the human eye can distinguish the luminance or brightness of a pixel a little better than its color, so the color terms are shared across 2 pixels. If you set the color terms of the formula to 0, you'll notice it turns into r = g = b = y, which yields a grayscale image.

    To answer your other question about quality of the RGB feed, yes, at the core it's an issue of speed over quality. However, like most things in life the answer isn't quite that simple.  I've provided details in another thread here:

    I hope this helps,

    -- Jon

    Friday, July 27, 2012 6:20 PM
  • Wouldn't you also use the interpolation to convert from 4:2:2 to 4:4:4, since the pixels in the Y2 position are not actually lined up with the UV in the macropixels but slightly offset?
    Saturday, July 28, 2012 7:04 AM
  • I'm not sure I understand.  The U and V values for the macropixel are most likely averages of the resulting 2 pixel values after the camera hardware has converted from Bayer to RGB and then to YUV.  It could be that the camera hardware is doing something tricky here, but at that level anything is possible and is most likely proprietary. :)

    I just realized you could mean that the U and V values correspond to the left-most pixel in the macro pixel, so linearly interpolating the U and V values of the current and next macro pixels may give a better result. I don't think that is how the format is specified; I have always interpreted it to mean the U and V values are averages for the 2 pixels in each macro pixel. However, I suppose you'd get slightly better values that way, since every other value would be a "true" value instead of every value being an average.
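    The difference between the two interpretations can be sketched like this (hypothetical helper names; stream is a list of (u, y0, v, y1) macro pixels):

    ```python
    def chroma_shared(stream, i):
        """Interpretation 1: both pixels of a macro pixel reuse its U/V."""
        u, _, v, _ = stream[i // 2]
        return u, v

    def chroma_lerp(stream, i):
        """Interpretation 2: U/V sit on the even pixel; odd pixels take the
        average of the current and next macro pixel's chroma."""
        m = i // 2
        u0, _, v0, _ = stream[m]
        if i % 2 == 0 or m + 1 >= len(stream):
            return u0, v0
        u1, _, v1, _ = stream[m + 1]
        return (u0 + u1) // 2, (v0 + v1) // 2

    stream = [(100, 50, 200, 60), (120, 70, 180, 80)]
    print(chroma_shared(stream, 1))  # (100, 200)
    print(chroma_lerp(stream, 1))    # (110, 190)
    ```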

    -- Jon

    Tuesday, July 31, 2012 4:04 AM
  • Like I pointed out, the MSDN article says we first need to get to 4:4:4 before applying the formulas to get to RGB: "Converting 4:2:2 YUV to 4:4:4 YUV requires horizontal upconversion by a factor of two."

    In 4:2:2, the data is arranged like U0 Y0 V0 Y1 - U1 Y2 V1 Y3 - U2 Y4 V2 Y5 - etc...

    For upconversion, the values of the even pixels are calculated using the even Y values with the surrounding U and V. However, the values of the odd pixels should be calculated similarly to the vertical upconversion:

    clip((9 * (Cin[0] + Cin[1]) - (Cin[-1] + Cin[2]) + 8) >> 4);

    But the part that confuses me is that the vertical upconversion is described for the luma values (Y). I'm thinking the upconversion from 4:2:2 to 4:4:4 should be for the chroma values, but the MSDN article doesn't give those formulas.

    Does this mean we will calculate something like this:

    U0out = U0in

    U1out = clip((9 * (U0in + U1in) - (U0in + U2in) + 8) >> 4);

    U2out = U1in
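    Under that reading (even output pixels copy the input chroma; odd output pixels use the 4-tap filter, replicating the edge samples where neighbors are missing), a Python sketch of the chroma upconversion might look like:

    ```python
    def clip(p):
        return 0 if p < 0 else 255 if p > 255 else p

    def upconvert_chroma(cin):
        """4:2:2 -> 4:4:4 horizontal chroma upconversion (sketch).

        Even outputs copy cin[k]; odd outputs apply the [-1, 9, 9, -1]/16
        filter, replicating the edge samples where neighbors are missing.
        """
        n = len(cin)
        out = []
        for k in range(n):
            out.append(cin[k])                   # Cout[2k] = Cin[k]
            if k + 1 < n:
                prev = cin[k - 1] if k > 0 else cin[0]
                nxt = cin[k + 2] if k + 2 < n else cin[n - 1]
                out.append(clip((9 * (cin[k] + cin[k + 1]) - (prev + nxt) + 8) >> 4))
            else:
                out.append(cin[k])               # replicate at the end
        return out

    # Constant chroma stays constant after upconversion:
    print(upconvert_chroma([100, 100, 100]))  # [100, 100, 100, 100, 100, 100]
    ```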

    • Edited by twocs Saturday, August 04, 2012 2:10 AM
    Saturday, August 04, 2012 1:25 AM