Context Sensitive Search Engine with Classification

  • Tuesday, December 27, 2011 2:43
     
     

    Problem description: We have developed a context-sensitive search module that uses a thesaurus and a dictionary. Because of keyword indexing, it displays the results classified by context.

     

    Q1. The dictionary and thesaurus we are using are very small and experimental. Where can we get an exhaustive dictionary and thesaurus that we can maintain inside SQL Server, or does SQL Server already provide such a facility?

    Q2. The module can index readable files. How can we use audio-to-text conversion modules or other existing Microsoft tools to index video and audio files?

    Q3. Is there a known algorithm for classifying documents with the help of keywords and/or context? (We are currently using keyword density and relevance.)
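
    For readers unfamiliar with the term, classification by keyword density can be sketched roughly as follows. This is only an illustration of the general idea, not the poster's actual module: the category names, the keyword lists, and the function names are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical category keyword lists; a real system would load these
# from its own dictionary and thesaurus tables.
CATEGORY_KEYWORDS = {
    "finance": {"bank", "loan", "interest", "credit"},
    "geography": {"river", "bank", "valley", "mountain"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_density(text, keywords):
    """Fraction of the tokens in the text that belong to the keyword set."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hits = sum(counts[word] for word in keywords)
    return hits / len(tokens)


def classify(text):
    """Return the category with the highest keyword density, plus all densities."""
    densities = {
        category: keyword_density(text, keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(densities, key=densities.get)
    return best, densities
```

    Note that an ambiguous word such as "bank" counts toward both categories here, which is exactly where context (the other keywords in the document) decides the outcome.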

    Please also tell us what current research is going on in this area.

     

    Thanks in advance

    Sabya

    www.goace.com

All replies

  • Tuesday, December 27, 2011 5:01
     
     Proposed answer

    Q3. Is there a known algorithm for classifying documents with the help of keywords and/or context? (We are currently using keyword density and relevance.)

     

    I'm not so sure about the first two questions, but neural networks (rather expensive in terms of training, prediction, and understanding) and logistic regression (less expensive than a neural network, since it has no hidden layer) work very well for text classification. We usually work with SVM (support vector machine), which in our experience beats both of these algorithms in accuracy for text classification. Sadly, Microsoft doesn't offer SVM, so you need to use a third-party tool or build one in-house.
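
    To give a feel for what building it in-house could involve, here is a minimal sketch of a binary linear SVM over bag-of-words counts, trained by sub-gradient descent on the hinge loss. It is an assumption-laden toy, not a production SVM: there is no kernel, no feature scaling, and no multi-class support, and the function names and hyperparameters (`epochs`, `lr`, `reg`) are made up for this example.

```python
import re


def featurize(text):
    """Bag-of-words term counts for one document."""
    counts = {}
    for token in re.findall(r"[a-z]+", text.lower()):
        counts[token] = counts.get(token, 0.0) + 1.0
    return counts


def score(features, weights, bias):
    """Linear decision value w.x + b for a sparse feature dict."""
    return sum(weights.get(t, 0.0) * v for t, v in features.items()) + bias


def train_linear_svm(docs, labels, epochs=50, lr=0.1, reg=0.01):
    """Fit a binary linear SVM; labels must be +1 or -1."""
    weights = {}
    bias = 0.0
    samples = [(featurize(doc), y) for doc, y in zip(docs, labels)]
    for _ in range(epochs):
        for features, y in samples:
            margin = y * score(features, weights, bias)
            # L2 regularization: shrink every weight a little on each step.
            for t in list(weights):
                weights[t] *= 1.0 - lr * reg
            # Hinge loss: update only if the sample is misclassified
            # or lies inside the margin.
            if margin < 1.0:
                for t, v in features.items():
                    weights[t] = weights.get(t, 0.0) + lr * y * v
                bias += lr * y
    return weights, bias


def predict(text, weights, bias):
    """Return +1 or -1 for a new document."""
    return 1 if score(featurize(text), weights, bias) >= 0 else -1
```

    Training on a handful of labeled documents (+1 for one class, -1 for the other) and then calling `predict` on new text is enough to experiment with. A third-party library will be faster and better tested for real workloads.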

    Also, in a text classification exercise you cannot rely on a single algorithm or classification method to give you the desired output. Basically, you have to follow the classic waterfall approach: start by stemming the text, combine several classification methods and algorithms, rank the results by adjusted probability, and leave some room for human review.
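
    The staged approach described above might look something like the sketch below: stem the tokens, run several classifiers, average their probabilities, rank the categories, and flag low-confidence documents for human review. Everything here is hypothetical and deliberately simplistic: the crude stemmer, the two toy classifiers, the keyword sets, and the 0.7 review threshold are stand-ins for real components.

```python
import re

# Hypothetical stemmed keyword sets for two categories.
SPORT = {"match", "goal", "team"}
FINANCE = {"loan", "bank", "credit"}


def stem(token):
    """Very crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize, and stem the text."""
    return [stem(t) for t in re.findall(r"[a-z]+", text.lower())]


def keyword_votes(tokens):
    """Toy classifier 1: add-one smoothed share of keyword hits."""
    sport = sum(t in SPORT for t in tokens) + 1
    finance = sum(t in FINANCE for t in tokens) + 1
    total = sport + finance
    return {"sport": sport / total, "finance": finance / total}


def first_hit(tokens):
    """Toy classifier 2: trusts the first keyword it encounters."""
    for t in tokens:
        if t in SPORT:
            return {"sport": 0.9, "finance": 0.1}
        if t in FINANCE:
            return {"sport": 0.1, "finance": 0.9}
    return {"sport": 0.5, "finance": 0.5}


def classify(text, classifiers=(keyword_votes, first_hit), threshold=0.7):
    """Average the classifiers' probabilities, rank, and flag weak results."""
    tokens = preprocess(text)
    outputs = [clf(tokens) for clf in classifiers]
    combined = {
        category: sum(out[category] for out in outputs) / len(outputs)
        for category in outputs[0]
    }
    ranking = sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
    label, probability = ranking[0]
    needs_review = probability < threshold  # route to a human reviewer
    return label, ranking, needs_review
```

    Averaging is only one way to combine classifiers; weighted voting or a second-stage model are common alternatives, and the review threshold would normally be tuned on held-out data.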

    hth,

    Rok

     

     



    Please remember to mark as answered if the post helped resolve the issue.



    • Edited by rok1 Tuesday, December 27, 2011 5:11
    • Edited by rok1 Tuesday, December 27, 2011 5:12
    • Edited by rok1 Tuesday, December 27, 2011 5:14
    • Proposed as answer by Jerry Nee (Moderator) Tuesday, December 27, 2011 9:44
    • Marked as answer by Jerry Nee (Moderator) Thursday, January 5, 2012 10:12
    • Unmarked as answer by Tatyana Yakushev (Editor) Friday, January 6, 2012 17:01
  • Tuesday, December 27, 2011 5:11
     
     

    Q3. Thanks, Rok1, for replying. What about NLP (natural language processing) algorithms? With SVM, can you process text as natural language?

    Our long-term objective is to build a "software robot" that learns like a human child from the millions of documents available on the internet. The software should be able to link them and reach a conclusion, then get it verified through repeated mentions of the same thing in different documents, and so on. Does SVM help in building these capabilities?

     

    I am not familiar with SVM, but I will come back with more questions once I have studied it.

    Sabya

  • Tuesday, December 27, 2011 9:59
     
     

    Thanks, Rok1, for the valuable additional note.

    I liked your answer but could not mark it as the answer, as it addresses only one of the three questions. Sorry about that. Also, you have not yet replied to my question on NLP and the "learning software robot".

    Regards

    Sabya

  • Tuesday, December 27, 2011 17:10
     
     

    What about NLP (natural language processing) algorithms? With SVM, can you process text as natural language?

    - Yes.

     

    I liked your answer but could not mark it as the answer, as it addresses only one of the three questions. Sorry about that. Also, you have not yet replied to my question on NLP and the "learning software robot".

    - No problem, I'm just trying to help with whatever knowledge I have.

     

    Good luck!

    Rok


    • Edited by rok1 Tuesday, December 27, 2011 17:12
  • Wednesday, December 28, 2011 3:13
     
     

    Great.

    Thanks a lot.

  • Friday, January 6, 2012 7:06
     
     

    Hello All,

     

    I find that the question is not getting attention because it is marked as "Answered". However, the following two questions are still unanswered.

    Problem description: We have developed a context-sensitive search module that uses a thesaurus and a dictionary. Because of keyword indexing, it displays the results classified by context.

     

    Q1. The dictionary and thesaurus we are using are very small and experimental. Where can we get an exhaustive dictionary and thesaurus that we can maintain inside SQL Server, or does SQL Server already provide such a facility?

    Q2. The module can index readable files. How can we use audio-to-text conversion modules or other existing Microsoft tools to index video and audio files?

    Any help will be appreciated.

    Regards

    Sabya

  • Tuesday, January 10, 2012 10:18
     
     

    Hello Sabya,

    Answer 2: AFAIK, SQL Server does not support audio-to-text or video-to-text conversion.

    You might want to consider an out-of-the-box solution.


    Please vote as helpful or mark as answer if it helps.