Error OutputDataConversionError.RequiredColumnMissing with Cosmos DB output

  • Question

  • Hello,

    For a while now, whenever I restart my Azure Stream Analytics job, I have been getting the following error:

    [10:50:27 AM] Source 'devicedata' had 1 occurrences of kind 'OutputDataConversionError.RequiredColumnMissing' between processing times '2019-09-23T02:50:27.4415797Z' and '2019-09-23T02:50:27.4415797Z'.

    I'm not sure whether this is related to the compatibility level, since it did not happen before. Note that it happens with both 1.1 and 1.2.

    I noticed the error occurs when I stop the job and then start it again. When I first create the job, the issue does not occur; but from the moment I stop and restart it, the error happens every time.

    Here is the query:

    SELECT * INTO [devicedata] FROM [an-iothub]

    For some context: I have devices sending "weight" and "GPS" data to an Azure IoT hub, and the Azure Stream Analytics job processes that input and outputs it to a Cosmos DB container. The partition key of this container is deviceId.

    How can I avoid this error and get my data correctly written to Cosmos DB?

    Also, I only want telemetry data written to Cosmos DB, not device twin change events, etc.

    Thanks a lot in advance.




    • Edited by NOA-Louis Monday, September 23, 2019 6:24 AM
    Monday, September 23, 2019 6:20 AM

Answers

  • Hello all,

    The issue is now solved. Here is a follow-up on what happened and how it was resolved.

    Part of the answer from Azure support:

    -- Start of the Azure support answer

    "Typically this kind of issue will happen if the input schema do not contains the partition column in cosmosDB side. For example,if the “deviceID” is the partition key in the cosmosDB side. Then from the input side, a column with same name “deviceID” is required(Case sensitive)

    Please see the result from my lab below. [Screenshot of the sample input data omitted.] If I use a query like yours:

    SELECT * INTO cosmos FROM input

    I get the same error as yours. Based on the error message, the cause is that my query did not specify a column for the partition key. So I modified my query as below, using compatibility level 1.2:

    SELECT IoTHub.ConnectionDeviceId as deviceID, IoTHub.MessageId, data INTO cosmos FROM input

    The error no longer happens and the data is copied to the target Cosmos DB successfully."

    -- End of the Azure support answer

    Following that answer, I changed my compatibility level to 1.2 and edited the query like so:

    SELECT IoTHub.ConnectionDeviceId as deviceId, gps, weight INTO [devicedata] FROM [an-iothub]

    I was now able to get the data written into my Cosmos DB; however, the job was also writing device twin events, which is not what I want. The initial "SELECT *" query used to work perfectly: it handled only the data without "null" values and excluded the device twin events, so I believe Azure has some abstraction layer on their side for the "SELECT *" statement. But somehow that query, without any partition key column, no longer works after you restart the Stream Analytics job, even though it worked for me for months, so I suppose Azure has changed something in their infrastructure.

    Having understood that, I rewrote my Stream Analytics query with the partition key column required by my data store (Cosmos DB), and added a WHERE clause to keep only the event data and filter out the device twin events. In my case, I know that every one of my devices sends a "factoryResetId" in its event data, so I simply need to check that this value is greater than or equal to 0.

    SELECT IoTHub.ConnectionDeviceId as deviceId, EventProcessedUtcTime, EventEnqueuedUtcTime, factoryResetId, gps, weight INTO [devicedata] FROM [an-iothub] WHERE factoryResetId >= 0

    Finally, I edited my back-end service queries that retrieve those telemetry values (GPS and weight) to ensure that the weight value is a number and the GPS value is an object. For the curious, they look like this:

    SELECT TOP 15 c.id, c._ts, c.weight FROM c WHERE IS_NUMBER(c.weight) AND c.deviceId = @deviceId AND c.factoryResetId = @resetFactoryId ORDER BY c._ts DESC
    SELECT TOP 15 c.id, c._ts, c.gps FROM c WHERE IS_OBJECT(c.gps) AND c.deviceId = @deviceId AND c.factoryResetId = @resetFactoryId ORDER BY c._ts DESC

    Thanks to the Microsoft Azure support team for providing such helpful guidance and helping to solve this issue.

    I hope this follow-up will help others who end up in the same situation.




    • Edited by NOA-Louis Sunday, October 6, 2019 2:11 AM
    • Marked as answer by NOA-Louis Sunday, October 6, 2019 2:11 AM
    Sunday, October 6, 2019 2:07 AM

All replies

  • Hello,

    This article outlines the different error types, causes, and diagnostic log details for input and output data errors.

    OutputDataConversionError.RequiredColumnMissing

    • Cause: The column required for the output doesn't exist. For example, a column defined as Azure Table PartitionKey doesn't exist (see the sketch after this list).
    • Portal notification provided: Yes
    • Diagnostic log level: Warning
    • Impact: All output data conversion errors including missing required column are handled according to the Output Data Policy setting.
    • Log details: Name of the column and either the record identifier or part of the record.
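
    To make this concrete with the setup from this thread (a minimal sketch, not the poster's actual job: it assumes a Cosmos DB output [devicedata] partitioned on /deviceId and an IoT Hub input [an-iothub]), the two alternative job queries below show a projection that triggers this error and one that avoids it:

    -- Omits deviceId: every row lacks the column the Cosmos DB output
    -- requires, producing OutputDataConversionError.RequiredColumnMissing.
    SELECT gps, weight INTO [devicedata] FROM [an-iothub]

    -- Projects the device identifier under the exact (case-sensitive)
    -- partition key name, so the required column is present.
    SELECT IoTHub.ConnectionDeviceId AS deviceId, gps, weight INTO [devicedata] FROM [an-iothub]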

    Hope this helps.      

    ----------------------------------------------------------------------------------------

    Do click "Mark as Answer" and upvote the post that helps you; this can be beneficial to other community members.

    Monday, September 23, 2019 6:51 AM
  • Hi Louis,

     I found a related issue on Stack Overflow where the problem was caused by case sensitivity in column names. Does the following apply to you?

    If you're specifying fields in your query (i.e. SELECT Name, ModelNumber ...) rather than just using SELECT * ..., the field names are converted to lowercase by default under compatibility level 1.0, which throws off Cosmos DB. In the portal, if you open your Stream Analytics job, go to 'Compatibility level' under the 'Configure' section, and select v1.1 or higher, that should fix the issue. You can read more about compatibility levels in the Stream Analytics documentation here: https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-compatibility-level
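
    As an illustration of that point, here is a hedged sketch reusing the names from this thread (not the poster's exact job):

    -- Cosmos DB container partitioned on the case-sensitive path /deviceId.
    -- Under compatibility level 1.0, output field names are lowercased, so
    -- the alias below is emitted as "deviceid" and never matches the
    -- partition key "deviceId". Under 1.1 or 1.2, the alias keeps its exact
    -- casing and the required partition key column is present.
    SELECT IoTHub.ConnectionDeviceId AS deviceId, gps, weight INTO [devicedata] FROM [an-iothub]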

    Hope this helps.

    Monday, September 23, 2019 6:52 AM
  • Hello,

    Thanks for the answer. However, I've already read that Stack Overflow thread, as well as this thread.

    Also, I've already been through that documentation on the different types of errors. That is how I came to suspect the partition key, which is why I mentioned in my first message that my partition key is "deviceId".

    Here I'm using "*" in my SELECT. Also, I've already tried compatibility levels 1.1 and 1.2, and neither changes anything.



    • Edited by NOA-Louis Monday, September 23, 2019 8:43 AM
    Monday, September 23, 2019 7:47 AM
  • Hello everyone,

    The issue still persists. Can anyone help to solve it, please?

    Sunday, September 29, 2019 11:49 PM
  • Hello,

    This issue looks strange. For a deeper investigation and immediate assistance, if you have a support plan you may file a support ticket; otherwise, please send an email to AzCommunity@Microsoft.com with your subscription ID and a link to this thread, and I will enable a one-time free support request for your subscription.

    Please reference this forum thread in the subject: “Error OutputDataConversionError.RequiredColumnMissing with Cosmos DB output”. Thank you for your persistence.

    Monday, September 30, 2019 5:00 AM
  • Hello,

    Thank you. I just sent an email to "AzCommunity@Microsoft.com" with the required information.

    Looking forward to hearing from you.

    Monday, September 30, 2019 6:03 AM
  • Hi Louis,

    Thank you for reaching out.

    We have enabled a one-time free technical support request for your subscription.

    I hope you have received one-on-one support to work towards a resolution on this matter.

    Monday, September 30, 2019 6:08 AM
    Thanks a lot. The process is underway.

    Monday, September 30, 2019 6:23 AM
  • Hi Louis,

    Thanks for the update. Once the issue is sorted out with support, please do share the resolution, which may benefit other community members reading this thread.

    Monday, September 30, 2019 6:25 AM
  • Hi Louis,

    Glad to know that your issue has been resolved. And thanks for sharing the solution, which may benefit other community members reading this thread.

    Monday, October 7, 2019 5:19 AM