Clustering Vectors of Different Lengths
-
7 August 2012 22:30
Hello everybody,
I have a set of material histories, each recorded as a vector of events in the order they happened, so the vectors have different lengths. I want to cluster them, but every clustering method I have worked with expects the same number of attributes for each record, so I am not sure how to proceed. There are also other characteristics that should be used for the clustering.
Thanks,
Ivan
All replies
-
8 August 2012 5:41 Answerer
Can you give us examples of what you are storing in the vectors?
Tatyana Yakushev [PredixionSoftware.com]
-
8 August 2012 6:18
I have a set of different materials to be tested. There are about 100 families of these materials, but within each family there are various conditions that have not been characterized at this stage. Instead, I want to run a set of experiments on these materials to see how they react. The experiments are continuous, and every time there is a change the sensors detect it and store the material condition in a vector. So for each material there is the family information and the history of the experiment. Since the vectors have different lengths, I cannot use the conventional clustering methods. Also, because the experimental characteristics are categorical (not numeric), I cannot easily transform the vectors into numbers. I was thinking about using the Levenshtein distance, but I wasn't sure about the efficiency of that approach.
- Edited by ivan65 8 August 2012 6:49
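To make the Levenshtein idea above concrete, here is a minimal Python sketch, assuming each history is a list of categorical event labels. The labels in the usage note and the helper names `normalized_distance` and `distance_matrix` are hypothetical, not taken from the poster's data.

```python
from typing import Hashable, List, Sequence


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter sequence in the inner loop
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # delete item_a
                               current[j - 1] + 1,       # insert item_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def normalized_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Edit distance divided by the longer length, so long histories do not dominate."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def distance_matrix(histories: List[Sequence[Hashable]]) -> List[List[float]]:
    """Symmetric matrix of pairwise normalized distances between histories."""
    n = len(histories)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = normalized_distance(histories[i], histories[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

For example, `levenshtein(["heat", "crack", "cool"], ["heat", "cool"])` is 1, because one deletion is enough. On efficiency: each pair costs time proportional to the product of the two history lengths, and n histories need n(n-1)/2 pairs, so this is practical for short histories and moderate n. The resulting matrix can be passed to a clustering method that accepts precomputed distances, such as hierarchical clustering or k-medoids.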
-
8 August 2012 16:36 Answerer
I still don't understand what you have in your data.
Many data mining algorithms work on rows with the same number of columns because they assume that values in the same column can be compared (e.g. in column Age, you only have Age information).
It looks like column 1 might store one kind of value in one row and a very different kind of value in another row. Is that correct?
What common information can you extract from all of your rows?
Tatyana Yakushev [PredixionSoftware.com]
-
9 August 2012 17:21
I have a number of columns that hold the specifications of these materials, such as hardness and density, and the number of columns is the same for all materials. Besides this information, I have a history for each material that is collected over time, such as which specific condition changed sharply because of uncontrollable factors. The histories therefore differ from one material to another, and nearly all of the history data is categorical. We know the history of a material is important for clustering, but we don't know how. So I want to use these histories along with the material specifications to cluster the materials. I can cluster on the specification part because every material has the same number of columns, but the history of a material can have a different length. I am not sure how to break down the history and convert it to the same number of columns for every material. Or is there a better way to handle this problem, such as a clustering method that can handle a different number of columns?
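One simple way to turn variable-length histories into the same number of columns is to count how often each event type occurs. The Python sketch below assumes each history is a list of categorical event labels. The function names are hypothetical, and the encoding discards the order of events, which may or may not be acceptable for this data.

```python
from collections import Counter
from typing import List, Sequence


def build_vocabulary(histories: List[Sequence[str]]) -> List[str]:
    """Sorted list of every distinct event label seen in any history."""
    return sorted({event for history in histories for event in history})


def encode_history(history: Sequence[str], vocabulary: List[str]) -> List[int]:
    """One count per vocabulary entry, so every history gets the same length."""
    counts = Counter(history)
    return [counts[event] for event in vocabulary]


def encode_all(histories: List[Sequence[str]]) -> List[List[int]]:
    """Encode every history against a vocabulary built from the whole set."""
    vocabulary = build_vocabulary(histories)
    return [encode_history(history, vocabulary) for history in histories]
```

Each encoded row can then be appended to that material's specification columns (hardness, density, and so on) before clustering. If order matters, counting adjacent pairs of events instead of single events keeps some of it, at the cost of more columns.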
-
9 August 2012 23:33 Answerer
The way to do something like this with Microsoft SQL Server is by using nested tables.
Nested tables are often used for association rules (shopping basket analysis) and time series. I've never used nested tables for segmentation.
Tatyana Yakushev [PredixionSoftware.com]
- Marked as answer by ivan65 10 August 2012 17:52