Latent Dirichlet Allocation - Feature Topic Matrix - Top words - Unable to interpret topics - LDA

  • Question

  • Hi,

    I am using LDA to extract 50 topics from a text column of a dataset (200,000 rows, already preprocessed). Everything seems to work, but when I take the Feature Topic Matrix and sort the words with a Python script, the top words are not representative of any topic: they are odd words, some of them misspelled.

    Does anyone have the same problem? Am I missing something?

    Many thanks in advance.

    Tuesday, August 13, 2019 11:27 AM
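
For readers following along, the sorting step described in the question could look roughly like the sketch below. It is not the poster's actual script. It assumes the Feature Topic Matrix has been loaded into a pandas DataFrame with one row per word, a word column named "Feature", and one weight column per topic; the function name, the column names, and the toy data are all hypothetical.

```python
import pandas as pd


def top_words_per_topic(ftm, feature_col="Feature", n=10):
    """Return the n highest-weighted words for each topic column.

    Assumes `ftm` has one row per word, the word itself in `feature_col`,
    and every other column holding that word's weight for one topic.
    """
    topic_cols = [c for c in ftm.columns if c != feature_col]
    result = {}
    for topic in topic_cols:
        # nlargest sorts by this topic's weight, descending, and keeps n rows
        top_rows = ftm.nlargest(n, topic)
        result[topic] = list(top_rows[feature_col])
    return result


# Toy matrix (made-up values) standing in for the real module output
toy = pd.DataFrame({
    "Feature": ["price", "cheap", "plot", "author", "teh"],
    "Topic 1": [0.9, 0.7, 0.1, 0.0, 0.2],
    "Topic 2": [0.0, 0.1, 0.8, 0.6, 0.3],
})

top = top_words_per_topic(toy, n=2)
for topic, words in top.items():
    print(topic, words)
```

Running the same ranking on the raw matrix, before any other processing, is one way to check whether odd top words come from the sorting step or from the model output itself.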

All replies

  • Hello Eduardo,

    Could you provide the dataset used in your experiment, along with the settings used in the LDA module, so that we can replicate the issue?

    Also, do the spelling mistakes show up in the feature topic matrix when you do not use the Python script?

    With the default LDA settings and the sample "Book Reviews from Amazon" dataset, the output does not show any discrepancy.

    -Rohit

    Friday, August 16, 2019 8:24 AM
    Moderator
  • Hello Rohit,

    Yes, I can provide you with the dataset and the settings. Should I share the experiment with you, or send you the dataset?

    Many thanks,

    Eduardo.

    Tuesday, August 20, 2019 12:25 PM
  • Hello Eduardo,

    You can send the sample dataset and settings to us at AzCommunity[at]microsoft[dot]com so that we can replicate the issue.

    -Rohit

    Thursday, August 22, 2019 5:05 AM
    Moderator