How to model nested objects in an Azure Search Index (Was: How to represent a list of code and descriptions in search documents)

Question
-
I have a material that represents a document; each material can be included in one or more module collections.
Modules are represented by a Module Code and have a Description. While I can represent the codes as a list, I was wondering if there is a way to keep the correlation between each code and its description intact.
The JSON (other fields omitted) for the field looks like this:
"tags": [
  {
    "moduleCode": "GBM48",
    "moduleDescription": "Construction In General"
  },
  {
    "moduleCode": "GBM51",
    "moduleDescription": "Civil Engineering"
  }
]
This is being rejected on upload (400); any guidance appreciated (it may be way out of scope).
Regards,
Mark
- Edited by Bruce Johnston - MSFT (Moderator), Tuesday, April 5, 2016 8:57 PM: Making title more precise since this is a FAQ
Wednesday, November 26, 2014 4:19 PM
Answers
-
Hi Mark,
You're seeing the 400 error because Azure Search does not support fields containing arrays of objects. The only collection type we support currently is Collection(Edm.String).
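To make the rule concrete, here is a minimal Python sketch of a client-side pre-upload check. The `unsupported_fields` helper is hypothetical (it is not part of any Azure SDK and does not reproduce the service's full validation); it only flags fields whose value is a list containing something other than strings, which is why the `tags` array of objects above is rejected.

```python
from typing import Any, Dict, List


def unsupported_fields(document: Dict[str, Any]) -> List[str]:
    """Names of fields holding a list that is not a list of strings.

    Reflects the rule above: Collection(Edm.String) is the only supported
    collection type, so an array of objects such as "tags" is rejected.
    """
    return [
        name
        for name, value in document.items()
        if isinstance(value, list) and not all(isinstance(item, str) for item in value)
    ]


material = {
    "id": "m1",
    "tags": [
        {"moduleCode": "GBM48", "moduleDescription": "Construction In General"},
        {"moduleCode": "GBM51", "moduleDescription": "Civil Engineering"},
    ],
}
print(unsupported_fields(material))  # ['tags']
```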
Assuming you want to be able to search for documents associated with a given moduleDescription, there are two approaches we would recommend, depending on the frequency of updates to your index and the structure of your data:
- If your index is updated very rarely (in particular, if the moduleDescription for a given moduleCode changes rarely if ever), AND if there are relatively few documents per moduleCode and few moduleCodes per document, then you should consider duplicating all the other document properties for each unique moduleCode. In other words, denormalize documents and modules into a single index. In this case you'd have two string fields, one for moduleCode and one for moduleDescription.
- If either of the above conditions does not hold (i.e., module data is updated frequently or the relationship between documents and modules has high cardinality), then you should consider having two separate indexes for documents and modules. The documents index would have a Collection(Edm.String) field containing all the moduleCodes associated with each document, while the modules index would have a string field for moduleCode and another for moduleDescription. Although Azure Search doesn't support joins, you can achieve the same effect in this case by issuing two queries and joining the results by moduleCode on the client side.
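Both approaches can be sketched in Python, assuming each material has a key field named `id` and the `tags` array from the question. The helpers `denormalize`, `module_code_filter` and `join_on_module_code`, the composite key format, and the field names `materialId` and `moduleCodes` are hypothetical; the actual index upload and query calls are omitted. The sketch only shows how the data would be reshaped and how the two result sets could be joined by moduleCode.

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]


def denormalize(material: Doc, key_field: str = "id") -> List[Doc]:
    """Approach 1: one flat search document per (material, module) pair.

    All other material properties are copied into every row, and the nested
    tags are replaced by two plain string fields. A material without tags
    produces no rows.
    """
    base = {k: v for k, v in material.items() if k not in (key_field, "tags")}
    rows = []
    for tag in material.get("tags", []):
        row = dict(base)
        # Hypothetical composite key: material key plus module code.
        row[key_field] = f"{material[key_field]}_{tag['moduleCode']}"
        row["materialId"] = material[key_field]
        row["moduleCode"] = tag["moduleCode"]
        row["moduleDescription"] = tag["moduleDescription"]
        rows.append(row)
    return rows


def module_code_filter(codes: List[str], field: str = "moduleCodes") -> str:
    """Approach 2, step 1: OData filter for documents containing any of the codes."""
    if not codes:
        raise ValueError("need at least one module code")
    # Single quotes inside OData string literals are escaped by doubling them.
    clauses = " or ".join("c eq '{}'".format(code.replace("'", "''")) for code in codes)
    return f"{field}/any(c: {clauses})"


def join_on_module_code(documents: List[Doc], modules: List[Doc]) -> List[Doc]:
    """Approach 2, step 2: attach module descriptions to document hits client-side."""
    descriptions = {m["moduleCode"]: m["moduleDescription"] for m in modules}
    joined = []
    for doc in documents:
        tags = [
            {"moduleCode": code, "moduleDescription": descriptions[code]}
            for code in doc.get("moduleCodes", [])
            if code in descriptions  # codes absent from the modules hits are dropped
        ]
        joined.append({**doc, "tags": tags})
    return joined
```

For the second approach, a client would first search the modules index on moduleDescription, collect the matching moduleCodes, pass them to `module_code_filter` to build the filter for the documents query, and finally call `join_on_module_code` on the two result lists.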
The first solution trades index update speed and simplicity for query speed. The second solution trades some query performance for ease of index updates. Which you choose depends on your circumstances. We would advise testing both approaches with realistic data first, especially if you have a lot of data.
Hope this helps,
-Bruce
- Marked as answer by Bruce Johnston - MSFT (Moderator), Thursday, November 27, 2014 2:17 AM
Thursday, November 27, 2014 2:17 AM (Moderator)
All replies
-
Well, this is a very serious problem in Azure Search, I believe, and I think that nested objects, at least, should be supported out of the box.
I am evaluating Azure Search (I am currently an Elasticsearch user) and I cannot find a solution for situations where I need the simplest relation between two entities: one being the main entity and the other holding some sort of categorization fields, e.g. the categories the main entity belongs to.
It can be "solved" by using multiple indexes as you mentioned, but this transfers all the responsibility for sorting, faceting, etc. (meaning all the actions that the search infrastructure should perform) to the client. That causes rather huge problems and makes the solution non-viable from day one (furthermore, just imagine having two clients in your system using the same query).
I would like to see Microsoft very seriously consider supporting this feature.
A flat index schema, or joining results on the client, is suitable only for a small number of problems: those where a flat data structure is the only one that exists.
Thank you
Stelios
Saturday, December 6, 2014 8:56 PM -
Thanks for the feedback. This feature is not currently planned for GA, but we may consider it after GA since the scenario is not uncommon. There is an item tracking it on the Azure Search feedback forum:
Please help us prioritize it by voting for it.
Thanks,
-Bruce
Monday, December 8, 2014 1:41 AM (Moderator) -
Almost 4 years later and we still don't have this feature...
There is indeed a harmony to the universe.
Friday, July 20, 2018 4:52 PM -
The feature is available as of May 2019: https://docs.microsoft.com/en-us/azure/search/search-howto-complex-data-types
Wednesday, May 29, 2019 12:49 AM
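With complex types, the original `tags` payload could be indexed without reshaping it. Below is a minimal sketch, assuming an API version that supports complex types (2019-05-06 or later); the index name `materials`, the key field `id`, and the helper names are hypothetical. It builds an index definition as a Python dict in which `moduleCode` and `moduleDescription` are sub-fields of a `Collection(Edm.ComplexType)` field, so each code stays paired with its description, plus an OData filter that uses `any` to match documents containing a given module code.

```python
import json
from typing import Any, Dict


def materials_index(name: str = "materials") -> Dict[str, Any]:
    """Index definition keeping each moduleCode paired with its moduleDescription.

    The index name and the "id" key field are hypothetical placeholders.
    """
    return {
        "name": name,
        "fields": [
            {"name": "id", "type": "Edm.String", "key": True},
            {
                "name": "tags",
                # Each element of the array is an object with the sub-fields below.
                "type": "Collection(Edm.ComplexType)",
                "fields": [
                    {"name": "moduleCode", "type": "Edm.String", "filterable": True},
                    {"name": "moduleDescription", "type": "Edm.String", "searchable": True},
                ],
            },
        ],
    }


def tag_filter(module_code: str) -> str:
    """OData filter matching documents with at least one tag whose moduleCode matches."""
    escaped = module_code.replace("'", "''")  # single quotes are doubled in OData literals
    return f"tags/any(t: t/moduleCode eq '{escaped}')"


if __name__ == "__main__":
    # Body for a create-index request, and a filter for one module.
    print(json.dumps(materials_index(), indent=2))
    print(tag_filter("GBM48"))
```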