Automatic Feature Selection on Machine Learning Algorithms

  • Question

  • Hi,

    I was trying to understand things about how the machine learning algorithms available for example in

    Initialize Model>Classification>Two-Class Logistic Regression

    together with

    Train Model

    work to make the Features Selection. I ve worked with traditional statistics, and made a handmade features analysis,  p-values, Wald test, Factor Analysis, Correlations, and so on...

    I've seen the Features Selection Modules, which make it possible to analyze the features, and it's fine.

    But would like to know if it's possible to know what is done in the leaning processes.

    In this page

    https://msdn.microsoft.com/en-us/library/azure/dn905912.aspx

    I've read:

    "...some machine learning algorithms use some kind of feature selection or dimensionality reduction as part of the learning process. When you use these learners, you can skip the feature selection process and let the algorithm decide the best inputs."

    My Question is:

    My question is: can the user always trust that the best choice is made? (OK, I'm suspicious.) How could the user find out what is done, in order to try something else and improve the results?

    Thanks!!!


    rciani



    Thursday, June 30, 2016 10:07 PM


All replies

  • There is no automatic feature selection for these algorithms by default. You can use the following module to reduce the number of features via dimensionality reduction:

    https://msdn.microsoft.com/library/azure/8be18eb5-ddd8-4d12-8573-7ae10d5f72fb

      
    Friday, July 1, 2016 6:14 AM
    Moderator
  • Thanks for your reply!

    I've been exploring those methods, and the page I linked above left me a little confused.

    Is it possible to know which methods apply feature selection automatically?

    Tks!!!

    rciani


    Thursday, July 7, 2016 3:09 PM
  • No worries.  Principal Component Analysis attempts to find the features that are most important in explaining the variation in your dataset, so that the unimportant features can be disregarded and the model can be simplified.

    You might want to check out some of the videos posted on the topic.  It looks like there are several strong candidates returned by this search

    https://www.youtube.com/results?search_query=principal+component+analysis

    That would probably help in understanding what the module's doing and why the parameters are significant.
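    To make the idea concrete, here is a minimal sketch using scikit-learn's PCA implementation (my assumption as a stand-in for the Azure ML module, which works on the same principle):

```python
# A minimal sketch, assuming scikit-learn's PCA as a stand-in for the
# Azure ML PCA module: the leading principal components capture most of
# the variance, so near-constant columns can be dropped.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
informative = rng.normal(size=(100, 2))          # two informative columns
noise = rng.normal(scale=0.01, size=(100, 2))    # two near-constant columns
X = np.hstack([informative, noise])

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# The two retained components explain nearly all of the variance,
# so the noisy columns add almost nothing to the model.
print(pca.explained_variance_ratio_.sum())       # close to 1.0
```

    The choice of two components here is hand-picked for the toy data; in practice you would inspect the explained-variance ratios to decide how many components to keep.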

    jmp

    • Marked as answer by Renato_rciani Monday, July 11, 2016 4:36 PM
    Sunday, July 10, 2016 5:05 PM
    Moderator
  • Ok!!! 

    I'm experienced with PCA, doing analysis with SAS, SPSS, R... but that wasn't exactly my question.

    I think I didn't make myself clear.

    My question is: is there any method implemented in Azure Machine Learning that performs feature selection automatically (in some way that I can't figure out, by the way)? This excerpt

     "...When you use these learners, you can skip the feature selection process and let the algorithm decide the best inputs"

    from 

    https://msdn.microsoft.com/en-us/library/azure/dn905912.aspx 

    raised that doubt.

    Thank you!!!



    rciani



    Monday, July 11, 2016 4:47 PM
  • Thought you knew this stuff - thanks for the clarification/confirmation.  From the page you linked above, "these learners" which enable you to "skip the feature selection process" are documented a little further down in that same page.  If you use one of them, you can skip feature selection as a discrete step in your experiment.

    The salient text is below my signature.

    jmp

    Machine Learning Methods that Use Feature Selection

    Some learners in Azure Machine Learning Studio also provide parameters that can be used to optimize feature selection when training. If you are using a method that has its own heuristic for choosing features, it is often better to rely on that heuristic rather than pre-selecting features.

    Boosted Decision Tree Classification Models and Boosted Decision Tree Regression Models

    In these modules, a feature summary is created internally, and features with a weight of 0 are not used by any tree splits.

    When you review the results of the model, make a note of these unused columns as they are likely candidates for removal.

    Parameter sweeping is recommended to optimize selection.
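    As a rough illustration of the behavior described above, here is a sketch assuming scikit-learn's GradientBoostingClassifier (not the Azure module itself): a feature that no tree ever splits on gets an importance of exactly 0.

```python
# Sketch, assuming scikit-learn's GradientBoostingClassifier as a stand-in
# for the Azure boosted decision tree modules: a feature that is never used
# in a split gets importance 0 and is a candidate for removal.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
X[:, 3] = 0.0                        # a constant column can never split a node
y = (X[:, 0] > 0).astype(int)        # the label depends only on column 0

gb = GradientBoostingClassifier(n_estimators=20, random_state=0).fit(X, y)
print(gb.feature_importances_)       # importance of the constant column is 0.0
```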

    Logistic Regression Models

    The modules for multiclass and binary logistic regression support L1 and L2 regularization.

    Regularization is a way of adding constraints when training to manually specify some aspect of the learned model. Regularization is generally used to avoid overfitting.

    Machine Learning Studio supports regularization for the L1 or L2 norms of the weight vector in linear classification algorithms.

    • L1 regularization is useful if the goal is to have a model that is as sparse as possible.

    • L2 regularization prevents any single coordinate in the weight vector from growing too much in magnitude, so it is useful if the goal is to have a model with small overall weights.

    • L1-regularized logistic regression is more aggressive about assigning a weight of 0 to features, and therefore useful in identifying features that can be removed.
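    A minimal sketch of the L1 behavior described in the last bullet, assuming scikit-learn's LogisticRegression (not the Azure module itself):

```python
# Sketch, assuming scikit-learn's LogisticRegression as a stand-in for the
# Azure logistic regression modules: with an L1 penalty, the weights of
# uninformative features are driven to exactly zero.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # only columns 0 and 1 matter

l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.05).fit(X, y)
zeroed = np.flatnonzero(l1.coef_[0] == 0)
print("zero-weight features:", zeroed)     # typically the noise columns
```

    The zero-weight columns are exactly the "likely candidates for removal" the documentation refers to; the strength of the effect depends on the regularization weight (C here, a parameter worth sweeping).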

    Tuesday, July 12, 2016 12:17 PM
    Moderator