I'm exploring text mining to make sense of some legal documents. I built a dictionary of stemmed words, created term vectors from it, and used those vectors as a nested table, with my training data as the case table. I then created clusters using both scalable and non-scalable EM (because I believe text attributes can only be discrete).
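To make the setup concrete, here is a minimal Python sketch of the kind of vectors I mean. It is only an illustration: the documents, the stems, and the `to_vector` helper are hypothetical, and my actual data lives in case and nested tables rather than Python lists.

```python
# Hypothetical, already-stemmed documents (illustrative only, not my real data).
docs = {
    "doc1": ["contract", "breach", "damag"],
    "doc2": ["patent", "infring", "damag"],
    "doc3": ["contract", "termin"],
}

# The dictionary is the sorted list of distinct stems across all documents.
dictionary = sorted({stem for stems in docs.values() for stem in stems})


def to_vector(stems, dictionary):
    """Return a discrete (0/1) vector: 1 if the term occurs in the document."""
    present = set(stems)
    return [1 if term in present else 0 for term in dictionary]


# One sparse, discrete vector per document (the "nested table" rows).
vectors = {doc_id: to_vector(stems, dictionary) for doc_id, stems in docs.items()}
```

Each document becomes a row of present/absent flags, which is why the attributes end up discrete and mostly zero.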
I can see how the variables are distributed across the clusters, along with their probabilities, but I cannot tell what each cluster actually represents. What I want to find out is what makes each cluster different from the others. I cannot run k-means on discrete attributes, and the attributes are sparsely distributed, so I don't know what my next step in mining should be.
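To show what I mean by "what makes a cluster different", here is a rough Python sketch of one possible comparison: ranking terms by how much more often they occur inside a cluster than in the collection as a whole (often called lift). This is my own assumption about a reasonable measure, not something the clustering tool provides. The function name `distinguishing_terms`, the toy vectors, and the hard cluster labels are all hypothetical; EM actually gives soft memberships, so each document would first have to be assigned to its most probable cluster.

```python
from collections import defaultdict


def distinguishing_terms(vectors, labels, dictionary, top_n=3):
    """Rank terms per cluster by lift = P(term | cluster) / P(term overall)."""
    n_docs = len(vectors)
    # Fraction of all documents that contain each term.
    overall = [sum(col) / n_docs for col in zip(*vectors)]

    # Group document vectors by their (hard) cluster label.
    by_cluster = defaultdict(list)
    for vec, label in zip(vectors, labels):
        by_cluster[label].append(vec)

    result = {}
    for label, members in by_cluster.items():
        # Fraction of this cluster's documents that contain each term.
        in_cluster = [sum(col) / len(members) for col in zip(*members)]
        lifts = [
            (term, p_c / p_all)
            for term, p_c, p_all in zip(dictionary, in_cluster, overall)
            if p_all > 0 and p_c > 0
        ]
        lifts.sort(key=lambda pair: pair[1], reverse=True)
        result[label] = lifts[:top_n]
    return result


# Toy example: four documents, three terms, two clusters.
dictionary = ["contract", "damag", "patent"]
vectors = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]]
labels = [0, 0, 1, 1]
top_terms = distinguishing_terms(vectors, labels, dictionary)
```

In this toy data, "contract" comes out on top for cluster 0 and "patent" for cluster 1, while "damag" has a lift of 1.0 in both because it is equally common everywhere. That is the kind of per-cluster description I am hoping to get from the real model.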
I'd appreciate any advice or suggestions that can help me make sense of it.