Face features algorithm and PCA

  • Question

  • Hello,

    I'm doing a master's degree project about facial recognition and I'm using the Kinect 2.

    I'm extracting the 94 shape units from the HD face source and using them as features to train a classifier for face recognition.

    For my project, I need to know which algorithm the Kinect uses to extract the shape units.

    I've noticed that the first 10 elements of the shape units are PCA coefficients. How does the Kinect choose the 10 strongest features for the PCA?
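    For reference, the standard way to rank PCA features is by explained variance. A minimal scikit-learn sketch, using a made-up matrix of stacked shape-unit vectors; this is only an illustration, not necessarily what the Kinect does internally:

    ```python
    import numpy as np
    from sklearn.decomposition import PCA

    # Hypothetical data: one 94-dimensional shape-unit vector per captured face.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 94))

    # Keep the 10 components with the largest explained variance.
    pca = PCA(n_components=10)
    X_reduced = pca.fit_transform(X)  # shape (200, 10)

    # Each component's "strength" is the fraction of variance it explains.
    print(pca.explained_variance_ratio_)
    ```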

    thanks,

    Dror.

    Wednesday, August 5, 2015 5:25 PM

All replies

  • This is interesting.

    The Shape Units/deformations are created when you go through the FaceModel building process. In general, the face-model process takes snapshots of your face at various angles by having you rotate your frontal view left, right, up and down. These snapshots are then run through the algorithm and matched against a "mean" face; the differences from the "mean" face become the deformations/Shape Units.
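    To picture that concretely, the differencing against the mean face can be framed as a least-squares fit. A toy numpy sketch, in which the mean mesh and the identity basis are invented stand-ins for the SDK's unpublished model:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    n_vertices, n_units = 1347, 94  # HD face mesh size and shape-unit count

    # Invented stand-ins for the internal model: a mean face and a linear
    # basis mapping shape-unit coefficients to vertex offsets.
    mean_face = rng.normal(size=n_vertices * 3)
    basis = rng.normal(size=(n_vertices * 3, n_units))

    # A captured snapshot, flattened into one long xyz vector.
    captured = mean_face + basis @ rng.normal(size=n_units)

    # Differencing against the mean face then amounts to finding the
    # coefficients that best explain the deviation from the mean.
    units, *_ = np.linalg.lstsq(basis, captured - mean_face, rcond=None)
    ```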

    Once the algorithm has run, the deformations are the differential results from the FaceModel that was built. AFAIK, the algorithm the Kinect team used for this has not been released to the public. Matthew Simari was one of the engineers who worked on the FaceModel; he's also the one speaking in the HD Face section of the jumpstart videos. Maybe you can reach out to him and ask.

    It's interesting that you're going to use these coefficients to do your own training. I would think that the 1300 face vertices would give you more data and be more efficient for building your own model, but I guess that depends on your situation and the model you're trying to build. May I ask why you need to know how the 10 strongest PCA features were chosen?


    Sr. Enterprise Architect | Trainer | Consultant | MCT | MCSD | MCPD | SharePoint TS | MS Virtual TS | Windows 8 App Store Developer | Linux Gentoo Geek | Raspberry Pi Owner | Micro .Net Developer | Kinect For Windows Device Developer | blog: http://dgoins.wordpress.com


    Friday, August 7, 2015 8:01 PM
  • Thanks for your help!

    My project is about identifying people using facial recognition algorithms based on machine learning techniques.

    I need to record a database of different people with the Kinect and save a unique feature vector for each person. Then I want to use only the strongest features to try to distinguish between different people (I need to choose the best features to get the best results, which is why I need PCA). When I later record a person who is in the database, my system needs to tell which person it is. All of this will be based on machine learning techniques (k-means, SVM, NN, etc.).

    Therefore, I think I'd prefer to use the Shape Units/deformations as the feature vector instead of the 1300 face vertices (which would first require extracting features from the vertices).
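    To make that pipeline concrete, here is a minimal scikit-learn sketch; the person labels and the 94-dimensional feature matrix are placeholders, and the component count and kernel are arbitrary choices:

    ```python
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.decomposition import PCA
    from sklearn.svm import SVC

    # Placeholder database: 20 people, 30 shape-unit vectors each.
    rng = np.random.default_rng(2)
    X = rng.normal(size=(600, 94))
    y = np.repeat(np.arange(20), 30)  # person IDs

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    # PCA keeps the strongest directions; the SVM separates the identities.
    model = make_pipeline(PCA(n_components=10), SVC(kernel="rbf"))
    model.fit(X_tr, y_tr)
    print("test accuracy:", model.score(X_te, y_te))
    ```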

    I hope you understand what I'm saying...

    How do I contact Matthew Simari?

    thanks again,

    Dror.


    Sunday, August 9, 2015 6:26 PM
  • Yes, I more than follow what you're saying.

    I actually did a presentation on this for the MVP V-Conf earlier this year. I'll post the details once I find it online...

    Have you looked into Project Oxford? It does very similar tasks:

    https://www.projectoxford.ai/

    While this does not use the Kinect, you can take pictures with the Kinect, upload them to Project Oxford, and use the face recognition mechanism along with other Azure ML mechanisms. Examples are the twins and how-old-are-you demo websites.




    Sunday, August 9, 2015 6:41 PM
  • Matthew's email address is in the link I sent earlier.

    To add a little more to this: yes, you can use the Shape Units/deformations as features. You may need to find the best classifiers yourself using k-NN or clustering, and then run those results through an SVM or ANN to get a more accurate model for identifying differences between people.

    I was suggesting the 1300+ HD face points because they give you more than the 94 points: you get all 94 plus more distinct areas around the eyes, nose, eyebrows, mouth, cheeks and chin, and you can see how all the points around those 94 are affected, which gives you more precise data. One simple way to turn the vertices into features is sketched below.
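    A toy numpy example of deriving geometric features from the mesh; the landmark index pairs are invented (in the SDK, the HighDetailFacePoints enum names real landmarks such as eye corners and the nose tip):

    ```python
    import numpy as np

    # Placeholder for the (1347, 3) array of camera-space points that
    # CalculateVerticesForAlignment returns for the HD face mesh.
    rng = np.random.default_rng(3)
    vertices = rng.normal(size=(1347, 3))

    landmark_pairs = [(328, 883), (151, 412), (18, 1320)]  # hypothetical indices

    # Inter-landmark distances make simple, pose-robust geometric features.
    features = np.array([np.linalg.norm(vertices[i] - vertices[j])
                         for i, j in landmark_pairs])
    ```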

    I assume Project Oxford uses edge detection plus its database of different people to do its pattern recognition. Using depth with computer vision and databases can give you far more detail and accuracy.

    In my presentation I show how you can do this with the Kinect and Azure ML with R modules to detect facial expressions.



    Sunday, August 9, 2015 6:57 PM
  • Thank you very much!

    Project Oxford looks great, but I need to do the analysis in MATLAB. Anyway, I'd like to see your presentation.

    I don't want to use a lot of data, so I don't want to use the 1300 HD face points. It's true that I would get more information, but I need to keep the project as simple as possible; remember, it's a master's degree project, not a commercial one.

    I didn't find Matthew's email; can you send it again?

    thanks again,

    Dror.

    Monday, August 10, 2015 5:23 AM
  • Although the internal algorithm of the HD Face tracker is not published, it looks to me like those 1300 points are not actual tracking points.

    My guess is that the tracker internally outputs the Animation Units per frame after referencing 2D tracking points (not exposed in the API) and the depth buffer (and, once calibration has run, it also outputs the set of Shape Units). These values are then fed through a blendshape/morphable model, which blends the individual shapes into the 1300 vertices of the mesh.

    The fact that MS has said not to rely on the mesh directly for rendering, as it may change in the future, also seems consistent with this.

    In other words, the AUs are the most raw and minimal set of data available for your machine learning training. The points are derived from the AUs and, apart from costing more processing time, will not give you more data.
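    That derivation would look something like the standard blendshape evaluation below; the neutral mesh and per-AU deltas are invented stand-ins for the SDK's internal model:

    ```python
    import numpy as np

    rng = np.random.default_rng(4)
    n_vertices, n_aus = 1347, 17  # mesh size; Kinect v2 exposes 17 animation units

    # Invented neutral mesh and per-AU vertex deltas.
    neutral = rng.normal(size=(n_vertices, 3))
    deltas = rng.normal(size=(n_aus, n_vertices, 3))

    def blend(au_weights):
        """Blend AU weights into the final mesh. Every vertex is a fixed
        linear combination, so the points carry no information beyond the
        AUs (plus the constant model)."""
        return neutral + np.tensordot(au_weights, deltas, axes=1)

    mesh = blend(rng.uniform(0.0, 1.0, size=n_aus))  # (1347, 3)
    ```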


    Then again this is just my observation and speculation, but I have been in the (facial) motion capture industry for quite some years, using many solutions and algorithms :)


    Brekel

    Monday, August 10, 2015 8:50 AM
    Moderator
  • Yes, Brekel, I somewhat agree with you.

    When I refer to the 1342 vertices, I'm specifically talking about the points in camera space. Once you obtain these, you can get the depth values and then determine facial features, expressions and, in Dror's case, the slight differences between people needed for identification.

    Last year we tried using the AnimationUnits and a combination of the ShapeUnits, and learned they were designed specifically for rigging and 3-D animation modelling, not really for classifying facial features or pattern matching with ML. I'm sure you could figure out a way to do it, but depth was far easier.

    That's my technical opinion based on the results we saw.



    Tuesday, August 11, 2015 5:49 PM
  • See here for the presentation on Kinect with AzureML



    Tuesday, August 18, 2015 6:34 PM
  • Thank you very much!

    MAML looks like a very useful tool.

    Dror. 

    Thursday, August 20, 2015 5:56 PM
  • Is it possible to detect several faces simultaneously using the HD Face source? I mean, is it possible to record the 1300+ vertices of the faces of several people at the same time?

    thanks,

    Dror.

    Monday, August 24, 2015 3:57 PM