Context Sensitive Search Engine with Classification


  • 27 December 2011 2:43
     
     

    Problem description: We have developed a context-sensitive search module that uses a thesaurus and a dictionary. Because of keyword indexing, it displays the results classified by context.

     

    Q1. The dictionary and thesaurus we are using are very small and experimental. Where can we get an exhaustive dictionary and thesaurus that we can maintain inside SQL Server, or does SQL Server already provide such a facility?

    Q2. The module can index readable files. How can we use audio-to-text conversion modules or other existing Microsoft tools to index video and audio files?

    Q3. Is there a known algorithm for classifying documents by keywords and/or context? (We are using keyword density and relevance.)

    Please also tell us what current research is going on in this area.

     

    Thanks in advance

    Sabya

    www.goace.com

All Replies

  • 27 December 2011 5:01
     
     Proposed Answer

    Q3. Is there a known algorithm for classifying documents by keywords and/or context? (We are using keyword density and relevance.)

     

    I'm not sure about the first two questions, but neural networks (rather expensive in terms of training, prediction, and interpretation) and logistic regression (less expensive than a neural network, as it has no hidden layer) both work very well for text classification. We usually work with SVMs (support vector machines), which in our experience beat both of these algorithms on accuracy for text classification. Sadly, MS doesn't offer an SVM algorithm, so you would need to use a third-party tool or build one in-house.
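    As a rough illustration of the "build it in-house" option, here is a minimal sketch of a linear SVM for two-class text classification, trained with the Pegasos sub-gradient method on bag-of-words counts. Everything in it (the function names, the tiny training set, the stopword list, the `lam` and `epochs` values) is made up for illustration; a real system would more likely rely on an established library such as LIBSVM.

```python
import random
from collections import Counter

# Hypothetical stopword list; real systems use a much larger one.
STOPWORDS = {"the", "a", "and", "in", "as", "after"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop a few stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def train_linear_svm(docs, labels, epochs=200, lam=0.01, seed=0):
    """Train a linear SVM (hinge loss + L2 penalty) with Pegasos.

    docs: list of strings; labels: list of +1 / -1.
    Returns a dict mapping each token to its learned weight.
    """
    rng = random.Random(seed)
    features = [Counter(tokenize(d)) for d in docs]
    weights = {}
    order = list(range(len(docs)))
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)  # decaying learning rate
            x, y = features[i], labels[i]
            margin = y * sum(weights.get(t, 0.0) * c for t, c in x.items())
            # Regularisation: shrink every weight towards zero.
            for t in weights:
                weights[t] *= 1.0 - eta * lam
            # Hinge loss only contributes when the margin is below 1.
            if margin < 1.0:
                for t, c in x.items():
                    weights[t] = weights.get(t, 0.0) + eta * y * c
    return weights


def score(weights, text):
    """Signed distance-like score: positive means class +1."""
    return sum(weights.get(t, 0.0) for t in tokenize(text))


def classify(weights, text):
    return "sports" if score(weights, text) > 0 else "finance"


# Toy training set (invented): +1 = sports, -1 = finance.
TRAIN_DOCS = [
    "the team won the football match",
    "the striker scored a late goal",
    "the coach praised the team after the match",
    "the bank raised interest rates",
    "shares fell as the market closed",
    "investors sold shares in the bank",
]
TRAIN_LABELS = [1, 1, 1, -1, -1, -1]

weights = train_linear_svm(TRAIN_DOCS, TRAIN_LABELS)
print(classify(weights, "the striker and the team"))
print(classify(weights, "bank shares fell"))
```

    The sketch has no bias term and handles only two classes; multi-class problems are usually handled by training one such classifier per category.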

    Also, in a text classification exercise, you cannot rely on one algorithm or one classification method to give you the desired output. Basically, you have to follow the classic waterfall approach: start by stemming the text, then combine various classification methods and algorithms, rank the results by $adjusted probability, and even leave some room for human review.
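    The staged approach described above could be sketched roughly as follows: stem the tokens, run more than one classifier, average their probabilities, rank the categories, and hand low-confidence documents to a human. This is only a toy outline under stated assumptions: the suffix-stripping stemmer, the keyword lists, the two keyword-based classifiers, and the 0.75 review threshold are all invented for illustration, and plain averaging stands in for whatever probability adjustment a real tool applies.

```python
def stem(word):
    """Crude suffix stripping, for illustration only.

    A real pipeline would use a proper stemmer (e.g. the Porter algorithm).
    """
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical per-category keyword lists, already in stemmed form.
KEYWORDS = {
    "sports": {"team", "goal", "match", "scor"},
    "finance": {"bank", "market", "rate", "invest"},
}


def keyword_density(tokens, keywords=KEYWORDS):
    """Classifier 1: keyword hits per category, normalised to sum to 1."""
    hits = {cat: sum(t in kws for t in tokens) for cat, kws in keywords.items()}
    total = sum(hits.values())
    if total == 0:
        # No evidence at all: fall back to a uniform distribution.
        return {cat: 1.0 / len(keywords) for cat in keywords}
    return {cat: h / total for cat, h in hits.items()}


def keyword_presence(tokens, keywords=KEYWORDS):
    """Classifier 2: like classifier 1, but repeated words count once."""
    return keyword_density(set(tokens), keywords)


def combine(predictions):
    """Average the per-category probabilities of several classifiers."""
    cats = predictions[0].keys()
    return {c: sum(p[c] for p in predictions) / len(predictions) for c in cats}


def classify(text, review_threshold=0.75):
    """Return (decision, ranked categories) for one document."""
    tokens = [stem(w) for w in text.lower().split()]
    combined = combine([keyword_density(tokens), keyword_presence(tokens)])
    ranked = sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
    best_cat, best_p = ranked[0]
    if best_p < review_threshold:
        # Not confident enough: route the document to a person.
        return "needs human review", ranked
    return best_cat, ranked


print(classify("The team scored two goals in the match"))
print(classify("The bank team"))
```

    In practice the two toy classifiers would be replaced by trained models (logistic regression, neural network, SVM), and the threshold would be tuned against a set of documents that people have already labelled.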

    hth,

    Rok

     

     



    please remember to mark as answered if the post helped resolve the issue.



    • Edited by rok1 27 December 2011 5:11
    • Edited by rok1 27 December 2011 5:12
    • Edited by rok1 27 December 2011 5:14
    • Proposed as answer by Jerry Nee (Moderator) 27 December 2011 9:44
    • Marked as answer by Jerry Nee (Moderator) 5 January 2012 10:12
    • Unmarked as answer by Tatyana Yakushev (Editor) 6 January 2012 17:01
  • 27 December 2011 5:11
     
     

    Q3. Thanks, Rok1, for replying. What about NLP (natural language processing) algorithms? Can an SVM process text like a natural language?

    Our long-term objective is to build a "software robot" that learns like a human child from the millions of documents available on the internet. The software should be able to link them, draw a conclusion, verify it through repeated mentions of the same thing in different documents, and so on. Does SVM help in building these capabilities?

     

    I am not familiar with SVM, but I will come back with more questions once I have studied it.

    Sabya

  • 27 December 2011 9:59
     
     

    Thanks, Rok1, for the valuable added note.

    I liked your answer but could not mark it as THE answer, as it answers only one of the three questions. Sorry about that. Also, you have not yet commented on my question about NLP and the "learning software robot".

    Regards

    Sabya

  • 27 December 2011 17:10
     
     

    What about NLP (natural language processing) algorithms? Can an SVM process text like a natural language?

    - Yes.

     

    I liked your answer but could not mark it as THE answer, as it answers only one of the three questions. Sorry about that. Also, you have not yet commented on my question about NLP and the "learning software robot".

    - No problem, I'm just trying to help with whatever knowledge I have.

     

    Good luck!

    Rok


    please remember to mark as answered if the post helped resolve the issue.
    • Edited by rok1 27 December 2011 17:12
  • 28 December 2011 3:13
     
     

    Great.

    Thanks a lot.

  • 6 January 2012 7:06
     
     

    Hello All,

     

    I find that the question is not getting attention because it is marked as "Answered". However, the following two questions are still unanswered.

    Problem description: We have developed a context-sensitive search module that uses a thesaurus and a dictionary. Because of keyword indexing, it displays the results classified by context.

     

    Q1. The dictionary and thesaurus we are using are very small and experimental. Where can we get an exhaustive dictionary and thesaurus that we can maintain inside SQL Server, or does SQL Server already provide such a facility?

    Q2. The module can index readable files. How can we use audio-to-text conversion modules or other existing Microsoft tools to index video and audio files?

    Any help will be appreciated.

    Regards

    Sabya

  • 10 January 2012 10:18
     
     

    Hello Sabya,

    Answer 2: AFAIK, SQL Server does not support audio-to-text or video-to-text conversion.

    You might want to consider an out-of-the-box solution.


    Please vote as helpful or mark as answer, if it helps