Seeking advice about the best approach to storing the data from the Kinect 2.0 camera to disk.

  • General discussion

  • I am seeking advice about the best (quickest performance) approach to storing the data from the Kinect 2.0 (public preview SDK) camera to disk.


    I am interested in storing the raw image data (compressed), the ushort data for the depth and data about the first (only) tracked body (namely about the 25 joints – orientation/position).
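
To make the body data described above concrete, here is a rough sketch of a plain, fixed-size record for one tracked body with 25 joints. It is written in C++ rather than C#. All names (`JointRecord`, `BodyRecord`, `setJoint`, `toBytes`, `fromBytes`) are hypothetical and not part of the Kinect SDK, and the field choices (numeric joint type and tracking state codes, a 3-float position, a 4-float orientation quaternion) are assumptions about what one would copy out of the SDK objects.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

constexpr int kJointCount = 25;  // joints per tracked body, as stated in the post

// Hypothetical plain-data copy of one joint (not an SDK type).
struct JointRecord {
    std::int32_t jointType;      // numeric joint identifier (assumed encoding)
    std::int32_t trackingState;  // numeric tracking state (assumed encoding)
    float position[3];           // x, y, z
    float orientation[4];        // quaternion x, y, z, w
};

// Hypothetical plain-data copy of one tracked body.
struct BodyRecord {
    std::uint64_t trackingId;
    std::int64_t relativeTime;   // frame timestamp in whatever unit you choose
    std::array<JointRecord, kJointCount> joints;
};

// Raw byte copies are only valid for trivially copyable types.
static_assert(std::is_trivially_copyable<BodyRecord>::value,
              "BodyRecord must be trivially copyable");

// Store one joint, checking the index in debug builds.
inline void setJoint(BodyRecord& body, int index, const JointRecord& joint) {
    assert(index >= 0 && index < kJointCount);
    body.joints[static_cast<std::size_t>(index)] = joint;
}

// Copy the record into a byte buffer that can be appended to a file.
inline std::vector<std::uint8_t> toBytes(const BodyRecord& body) {
    std::vector<std::uint8_t> bytes(sizeof(BodyRecord));
    std::memcpy(bytes.data(), &body, sizeof(BodyRecord));
    return bytes;
}

// Rebuild a record from a buffer; returns false if the size does not match.
inline bool fromBytes(const std::vector<std::uint8_t>& bytes, BodyRecord& body) {
    if (bytes.size() != sizeof(BodyRecord)) return false;
    std::memcpy(&body, bytes.data(), sizeof(BodyRecord));
    return true;
}
```

A raw struct dump like this is fast but ties the file to one compiler's struct layout and byte order, so a field-by-field writer would be the safer choice if the recordings need to be read elsewhere.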


    My questions are at the bottom of this mail but as background, here are a few approaches I have tried or investigated for recording:

    1. Create a file and write to it using a binary writer.  I flush the writer to disk every 60 seconds.  I write out the byte[] for the color image and the ushort[] for the depth data, and I serialize the Kinect body information and then write it with the binary writer.
      • I have to compress the image in memory before writing the bytes, so that the smaller image is written to the file.
      • As the Kinect Body is NOT serializable, I had to copy the Kinect Body class into my own custom class, which is serializable.
      • I am not sure how to compress the depth data so that all of the data is preserved.
      • Writing to the file happens in real time, not in post-processing.
      • Advantage: binary data seems to write to disk fast and is smaller in size than text.  Disadvantage: no async possible.
    2. My next approach was to serialize everything at once to disk using JSON.
      • Writing to the file happens in a similar manner, except that I use a stream writer instead of a binary writer.  So it all happens in real time.  When I started to add the data for the color and depth, it slowed things way down.
      • I use Newtonsoft to serialize to JSON.
      • Advantage: human readable, and Newtonsoft performs serialization fast.  Disadvantage: the file is text and so larger than binary, and serialization is relatively expensive time-wise.  No async.  Very slow when depth and color are added to the mix in real time.
    3. The next approach I am considering is to use async programming and store each image / depth / body to disk, one by one, in an asynchronous manner to keep performance high, then at the end of a recording session carry out some post-processing to bring all the information together.
      • Using async would keep the JPEG conversion from slowing things down up front.
      • As each frame arrives, I would store a) a JPEG image for each color frame, b) a binary file of the depth data, and c) a binary file of the body info.
      • I would then iterate over each "frame", which would have a sequence number linking the color/depth/body as a single "frame", and put it in my final output file.  Again, this could be done asynchronously.
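
The idea behind approach 3, keeping disk writes off the thread that receives frames, can be sketched as a queue drained by a background writer thread. This is a minimal C++ illustration, not how Kinect Studio or the SDK does it. `FrameRecord` and `AsyncFrameWriter` are hypothetical names, and the on-disk layout (sequence number, payload size, payload bytes, in native byte order) is an assumption made for the example.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <fstream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// One already-encoded chunk (JPEG bytes, depth bytes, or body bytes).
struct FrameRecord {
    std::uint32_t sequence;             // links color/depth/body of one frame
    std::vector<std::uint8_t> payload;  // bytes to store
};

class AsyncFrameWriter {
public:
    explicit AsyncFrameWriter(const std::string& path)
        : out_(path, std::ios::binary), worker_([this] { run(); }) {}

    ~AsyncFrameWriter() { close(); }

    // Called from the capture thread; only pushes onto the queue.
    void enqueue(FrameRecord record) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!done_ && "enqueue called after close");
            queue_.push(std::move(record));
        }
        ready_.notify_one();
    }

    // Drains the queue, stops the worker, and closes the file.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (done_) return;
            done_ = true;
        }
        ready_.notify_one();
        worker_.join();
        out_.close();
    }

private:
    void run() {
        for (;;) {
            std::unique_lock<std::mutex> lock(mutex_);
            ready_.wait(lock, [this] { return done_ || !queue_.empty(); });
            if (queue_.empty()) return;  // closed and fully drained
            FrameRecord record = std::move(queue_.front());
            queue_.pop();
            lock.unlock();  // do the slow disk write without holding the lock

            const std::uint32_t size =
                static_cast<std::uint32_t>(record.payload.size());
            out_.write(reinterpret_cast<const char*>(&record.sequence),
                       sizeof record.sequence);
            out_.write(reinterpret_cast<const char*>(&size), sizeof size);
            out_.write(reinterpret_cast<const char*>(record.payload.data()),
                       size);
        }
    }

    std::ofstream out_;
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<FrameRecord> queue_;
    bool done_ = false;
    std::thread worker_;  // declared last so it starts after the other members
};
```

The capture callback would call `enqueue` once per frame and `close` at the end of the session. The queue is unbounded here, so if the disk cannot keep up, memory use grows; a real recorder would probably cap the queue or drop frames.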


    I work in a managed world right now, so I am looking for a solution in C# that is almost as fast as C++.


    How do these approaches compare to how Kinect Studio works?  Does it write data to file in real time or post-process it?  Does it compress the images?  Can the depth data be compressed/decompressed?  Why is the body not serializable?  Is a binary format the best for all types of data (color/depth/body)?


    Thanks in advance!


    Friday, September 5, 2014 3:05 PM

All replies

  • With any managed language, there are going to be trade-offs between GC and performance. With the amount of data that will be consumed, unmanaged C++ is the only way to ensure the GC does not interfere and introduce more latency than what will already happen.

    KStudio uses managed wrappers that call into native C++ functions. These DLLs are in the KStudio tools folder of the SDK. Note that there are no redist versions of these binaries; they are available for development purposes only. If you need to record clips publicly, then you will need to write your own wrapper that will record the data as you have outlined.

    To address some of your questions, KS does things in real time. There is minimal compression (it is a trade-off between compression latency and file space). Slower machines (CPU, hard drives, RAM) will find it significantly harder to meet the performance requirements. Depth can be compressed using lossless compression; each value is a 4-byte value and so should decompress to the same value. Body is not a structure; you want to use the Joint/JointOrientation data objects.
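
As an illustration of lossless depth compression, the sketch below delta-encodes neighbouring depth samples and stores each difference as a variable-length integer, so smooth regions take about one byte per pixel and decoding returns exactly the original values. This is one simple scheme chosen for the example, not what Kinect Studio uses. `compressDepth` and `decompressDepth` are hypothetical names, and the code assumes 16-bit samples, as in the `ushort[]` described in the question; widen the types if your values are stored differently. The decoder also assumes well-formed input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode depth samples as zigzag-mapped deltas in base-128 varints.
std::vector<std::uint8_t> compressDepth(const std::vector<std::uint16_t>& depth) {
    std::vector<std::uint8_t> out;
    out.reserve(depth.size());
    std::uint16_t previous = 0;
    for (std::uint16_t value : depth) {
        const std::int32_t delta =
            static_cast<std::int32_t>(value) - static_cast<std::int32_t>(previous);
        previous = value;
        // Zigzag: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ... so small deltas stay small.
        std::uint32_t zigzag =
            delta >= 0 ? 2u * static_cast<std::uint32_t>(delta)
                       : 2u * static_cast<std::uint32_t>(-delta) - 1u;
        // Varint: 7 data bits per byte, high bit set when more bytes follow.
        while (zigzag >= 0x80u) {
            out.push_back(static_cast<std::uint8_t>((zigzag & 0x7Fu) | 0x80u));
            zigzag >>= 7;
        }
        out.push_back(static_cast<std::uint8_t>(zigzag));
    }
    return out;
}

// Inverse of compressDepth; assumes the input was produced by it.
std::vector<std::uint16_t> decompressDepth(const std::vector<std::uint8_t>& data) {
    std::vector<std::uint16_t> depth;
    std::uint16_t previous = 0;
    std::size_t i = 0;
    while (i < data.size()) {
        std::uint32_t zigzag = 0;
        int shift = 0;
        while (i < data.size()) {
            const std::uint8_t byte = data[i++];
            assert(shift < 32);  // guards against malformed input in debug builds
            zigzag |= static_cast<std::uint32_t>(byte & 0x7Fu) << shift;
            if ((byte & 0x80u) == 0) break;
            shift += 7;
        }
        const std::int32_t delta =
            (zigzag & 1u) ? -static_cast<std::int32_t>((zigzag + 1u) / 2u)
                          : static_cast<std::int32_t>(zigzag / 2u);
        previous = static_cast<std::uint16_t>(previous + delta);
        depth.push_back(previous);
    }
    return depth;
}
```

Calling `decompressDepth(compressDepth(frame))` should give back `frame` unchanged. How much space is saved depends on the scene, and noisy depth data may compress poorly with a scheme this simple.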

    Carmine Sirignano - MSFT

    Tuesday, September 9, 2014 6:52 PM
  • Ok - so from what I'm reading, developing a recorder in C++ seems to be the best way to get performance gains.

    I would rather develop my own wrapper anyway, but just as an idea: what sort of things were you managing in C++ that you exposed in your wrapper?  Things like writing the information to disk?  I will open it and take a look.  Is the source for Kinect Studio available to download anywhere, so I can understand how you guys went about the same task of recording?

    Ok - so there is obviously a lot that needs to be handled in real time.  Can you elaborate on your recorder and how it handles the data?  I suggested a couple of approaches, but my main approach was to write compressed image/depth and body information to disk on EVERY frame, then bring it all together at the end.  Do you think this approach would be slower than writing everything to one file for every frame and then flushing it to disk at various points?

    Thanks for your advice.

    Wednesday, September 10, 2014 3:33 PM