Clustering Vectors of Different Lengths

  • Tuesday, August 07, 2012 10:30 PM
     
     
    Hello everybody, 

    I have a set of material histories, each recorded as a vector of events in the order they happened, so the vectors have different lengths. I want to cluster them, but all the clustering methods I have worked with require the same number of attributes for every record, so I am not sure how to proceed. There are also other characteristics that should be used for the clustering.

    Thanks, 
    Ivan

All Replies

  • Wednesday, August 08, 2012 5:41 AM
    Answerer
     
     

    Can you give us examples of what you are storing in the vectors?


    Tatyana Yakushev [PredixionSoftware.com]

  • Wednesday, August 08, 2012 6:18 AM
     
     
    I have a set of materials to be tested. There are about 100 families of these materials, but within each family there are various conditions that have not been characterized at this stage. Instead, I want to run a set of experiments on these materials to see how they react. The experiments are continuous, and every time there is a change, the sensors detect it and store the material condition in a vector. So for each material I have the family information and the history of the experiment. Since the vectors have different lengths, I cannot use conventional clustering methods. Also, because the experimental characteristics are categorical (not numeric), I cannot easily transform the vectors into numbers. I was thinking about using the Levenshtein distance, but I wasn't sure about the efficiency of the approach.
    • Edited by ivan65 Wednesday, August 08, 2012 6:49 AM
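The Levenshtein idea mentioned above can be sketched in a few lines of Python. This is only an illustration under assumptions: the event names and the `histories` list are hypothetical, and dividing by the longer length is one possible normalization, not something stated in the thread. On efficiency, each comparison costs time proportional to the product of the two history lengths, and a full distance matrix needs one comparison per pair of materials.

```python
from typing import Hashable, List, Sequence


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence in the inner loop
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            current.append(min(previous[j] + 1,          # drop x
                               current[j - 1] + 1,       # add y
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1]


def normalized_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Edit distance scaled to [0, 1] by the longer history (an assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def distance_matrix(histories: List[Sequence[Hashable]]) -> List[List[float]]:
    """Pairwise normalized distances between all histories."""
    return [[normalized_distance(h1, h2) for h2 in histories] for h1 in histories]


# Hypothetical categorical histories of different lengths.
histories = [
    ["heated", "cracked", "cooled"],
    ["heated", "cooled"],
    ["oxidized", "heated", "cracked", "cooled", "cracked"],
]
matrix = distance_matrix(histories)
```

A matrix like this can be passed to clustering methods that accept precomputed distances, such as hierarchical clustering or k-medoids, so the histories never have to be padded to a common length.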
  • Wednesday, August 08, 2012 4:36 PM
    Answerer
     
     

    I still don't understand what you have in your data.

    Many data mining algorithms work on rows with the same number of columns because they assume that values in the same column can be compared (e.g. in column Age, you only have Age information).

    It looks like column 1 might store one kind of value in one row and a very different kind of value in another row. Is that correct?

    What common information can you extract from all of your rows?


    Tatyana Yakushev [PredixionSoftware.com]

  • Thursday, August 09, 2012 5:21 PM
     
     
    I have a number of columns that hold the specifications of these materials, such as hardness and density, and the number of columns is the same for all materials. Besides this information, I have a history for each material collected over time, for example which specific condition changed steeply because of uncontrollable factors. The histories therefore differ from one material to another, and almost all of their values are categorical. We know the history of the materials is important for clustering, but we don't know how. So I want to use these histories together with the material specifications to cluster them. I can apply a clustering method to the specification part because every material has the same number of columns, but the history of a material can have a different length. I am not sure how to break down the history and convert it to the same number of columns for all materials. Or is there a better way to handle this problem, such as clustering methods that can handle a different number of columns per record?
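One possible way to break a history down into the same number of columns for every material is to count how often each event occurs and append those counts to the specification columns. The Python sketch below assumes this "bag of events" encoding; the `materials` records, their field names and the event labels are all hypothetical. Note that counting discards the order of events, which may or may not matter for these materials.

```python
from collections import Counter
from typing import Dict, List, Tuple


def history_to_counts(history: List[str], vocabulary: List[str]) -> List[int]:
    """Count how often each event in the vocabulary occurs in one history."""
    counts = Counter(history)
    return [counts[event] for event in vocabulary]


def build_rows(materials: List[Dict]) -> Tuple[List[str], List[List[float]]]:
    """Return the event vocabulary and one fixed-length row per material."""
    # Every distinct event seen in any history becomes one column.
    vocabulary = sorted({event for m in materials for event in m["history"]})
    rows = [list(m["specs"]) + history_to_counts(m["history"], vocabulary)
            for m in materials]
    return vocabulary, rows


# Hypothetical materials: specs are (hardness, density); histories vary in length.
materials = [
    {"specs": [7.5, 2.70], "history": ["heated", "cracked", "cooled"]},
    {"specs": [6.0, 7.85], "history": ["heated", "cooled"]},
    {"specs": [7.1, 2.65],
     "history": ["oxidized", "heated", "cracked", "cooled", "cracked"]},
]
vocabulary, rows = build_rows(materials)
```

Every row now has the same length, so a conventional clustering method can be applied to it. The numeric columns would normally need scaling first so that the specifications and the event counts carry comparable weight.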
  • Thursday, August 09, 2012 11:33 PM
    Answerer
     
     Answered

    The way to do something like this with Microsoft SQL Server is by using nested tables.

    Nested tables are often used for association rules (shopping basket analysis) and time series. I've never used nested tables for segmentation.


    Tatyana Yakushev [PredixionSoftware.com]

    • Marked As Answer by ivan65 Friday, August 10, 2012 5:52 PM