I've got a dilemma which I hope someone has a solution to.

Let's say we're building a data mining model to predict aircraft reliability. In the training table we've got a column (among many others) with a unique aircraft ID, and then a column for the type (737,747) and then a column for the series (100,200,300). I.E. A 737-800 series would be "737" and "800".

There is in essence a parent-child relationship between these 2 columns. 737's should share a common set of reliability factors, and then those factors might be further defined by the series number (for instance, the 737 might have very reliable radar except for the 500 series). The series is analogous to what model year a car is. What I want to make sure doesn't happen is for the system to correlate a 747-400 and a 737-400 because they are the same series. They are totally independent if the model number is different.

My only idea was to merge the columns and have a single value "737-100". But it would seem then that the model won't have any idea that a "737-100" and "737-200" should have a lot more in common than a "737-100" because the values will be completely different.

I was hoping to find some sort of parent-child hint in the column properties but found none.

What solutions have other people tried? It sure seems that there should be an elegant solution for something like, but I'm missing it.

Geof