How to model nested objects in an Azure Search Index (Was: How to represent a list of code and descriptions in search documents)

    Question

  • I have a material that represents a document; each material can be included in one or more module collections.

    Modules are represented by a Module Code and have a Description. I can represent the codes as a list, but I was wondering if there is a way to keep the correlation between each code and its description intact.

    The json (other fields omitted) for the field representation looks like this:

    "tags": [
        {
          "moduleCode": "GBM48",
          "moduleDescription": "Construction In General"
        },
        {
          "moduleCode": "GBM51",
          "moduleDescription": "Civil Engineering"
        }
      ]

    This is being rejected on upload (400). Any guidance is appreciated (it may be way out of scope).

    Regards,

    Mark


    Wednesday, November 26, 2014 4:19 PM

Answers

  • Hi Mark,

    You're seeing the 400 error because Azure Search does not support fields containing arrays of objects. The only collection type we support currently is Collection(Edm.String).

    Assuming you want to be able to search for documents associated with a given moduleDescription, there are two approaches we would recommend, depending on the frequency of updates to your index and the structure of your data:

    1. If your index is updated very rarely (in particular, if the moduleDescription for a given moduleCode changes rarely if ever), AND if there are relatively few documents per moduleCode and few moduleCodes per document, then you should consider duplicating all the other document properties for each unique moduleCode. In other words, denormalize documents and modules into a single index. In this case you'd have two string fields, one for moduleCode and one for moduleDescription.
    2. If either of the above conditions does not hold (i.e., module data is updated frequently or the relationship between documents and modules has high cardinality), then you should consider having two separate indexes for documents and modules. The documents index would have a Collection(Edm.String) field containing all the moduleCodes associated with each document, while the modules index would have a string field for moduleCode and another for moduleDescription. Although Azure Search doesn't support joins, you can achieve the same effect in this case by issuing two queries and joining the results by moduleCode on the client side.
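
    As a rough illustration of the first approach, the flattening step could look something like the sketch below. It is plain Python that only reshapes the data before upload and makes no Azure Search calls. The `id` and `title` fields, the `materialId` field, and the composite key format are assumptions made for the example; they do not come from the original schema.

```python
from typing import Any


def denormalize(material: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one flat search document per (material, module) pair."""
    # Every property except the key and the nested list is copied to each flat document.
    shared = {k: v for k, v in material.items() if k not in ("id", "tags")}
    flat_docs = []
    for tag in material.get("tags", []):
        flat_docs.append({
            # Assumed composite key so that each flattened copy stays unique.
            "id": f"{material['id']}_{tag['moduleCode']}",
            # Assumed extra field that lets the copies be grouped back together.
            "materialId": material["id"],
            "moduleCode": tag["moduleCode"],
            "moduleDescription": tag["moduleDescription"],
            **shared,
        })
    return flat_docs


# Hypothetical material; only the "tags" structure is taken from the question.
material = {
    "id": "1001",
    "title": "Site safety handbook",
    "tags": [
        {"moduleCode": "GBM48", "moduleDescription": "Construction In General"},
        {"moduleCode": "GBM51", "moduleDescription": "Civil Engineering"},
    ],
}

flat = denormalize(material)
for doc in flat:
    print(doc)
```

    Each flattened document then contains only simple string fields, at the cost of repeating the shared properties once per module.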

    The first solution is trading off index update speed and complexity in favor of query speed. The second solution trades off some query performance in favor of ease of index updates. Which you choose depends on your circumstances. We would advise testing both approaches with realistic data first, especially if you have a lot of data.
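
    For the second approach, the client-side join by moduleCode might be sketched as follows. The two lists stand in for the results of the two queries (one against the documents index, one against the modules index); the function name, the `moduleCodes` field name, and the sample documents are hypothetical, and no Azure Search client calls are shown.

```python
from typing import Any


def join_by_module_code(
    document_hits: list[dict[str, Any]],
    module_hits: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Attach module descriptions to each document hit, matching on moduleCode."""
    # Build a lookup table from the modules query so the join is one pass per document.
    descriptions = {m["moduleCode"]: m["moduleDescription"] for m in module_hits}
    joined = []
    for doc in document_hits:
        modules = [
            # A code with no match in the modules results gets a description of None.
            {"moduleCode": code, "moduleDescription": descriptions.get(code)}
            for code in doc.get("moduleCodes", [])
        ]
        joined.append({**doc, "modules": modules})
    return joined


# Stand-ins for the two query results (hypothetical data).
document_hits = [
    {"id": "1001", "title": "Site safety handbook", "moduleCodes": ["GBM48", "GBM51"]},
    {"id": "1002", "title": "Bridge inspection notes", "moduleCodes": ["GBM51", "GBM99"]},
]
module_hits = [
    {"moduleCode": "GBM48", "moduleDescription": "Construction In General"},
    {"moduleCode": "GBM51", "moduleDescription": "Civil Engineering"},
]

results = join_by_module_code(document_hits, module_hits)
for doc in results:
    print(doc["id"], doc["modules"])
```

    Because the join happens on the client, any sorting or faceting that depends on moduleDescription also has to be done there.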

    Hope this helps,

    -Bruce

    Thursday, November 27, 2014 2:17 AM
    Moderator

All replies

  • Well, this is a very serious problem in Azure Search, I believe, and I think that nested objects, at least, should be supported out of the box.

    I am evaluating Azure Search (I am currently an ElasticSearch user) and I cannot find a solution for situations where I need the simplest relation between two entities, one being the main entity and the other holding some sort of categorization fields, e.g. the categories that the main entity belongs to.

    It can be "solved" by using multiple indexes as you mentioned, but this transfers all the responsibility for sorting, faceting, etc. (that is, all the actions that the search infrastructure should perform) to the client, causing rather huge problems and making the solution not viable from day one. Furthermore, just imagine having two clients in your system using the same query.

    I would like to see Microsoft very seriously consider supporting this feature.

    A flat index schema or joining results on the client is suitable only for a small number of problems where a flat data structure is the only one that exists.

    Thank you

    Stelios

    Saturday, December 6, 2014 8:56 PM
  • Thanks for the feedback. This feature is not currently planned for GA, but we may consider it after GA since the scenario is not uncommon. There is an item tracking it on the Azure Search feedback forum:

    http://feedback.azure.com/forums/263029-azure-search/suggestions/6670910-modelling-complex-types-in-indexes

    Please help us prioritize it by voting for it.

    Thanks,

    -Bruce

    Monday, December 8, 2014 1:41 AM
    Moderator
  • Almost 4 years later and we still don't have this feature...

    There is indeed a harmony to the universe.

    Friday, July 20, 2018 4:52 PM