extracting JSON message from avro file


  • I have a 3rd party application that I have configured to send messages in json with gzip encoding to an Azure Event Hub.  I have the Event Hub capturing those messages to a Data Lake Store.  Event Hub stores these as .avro files.

    What I'm struggling with is how to extract that information out of the .avro files.  There is a usql sample solution for .avro that I was able to get to work with a sample .avro file, however it's not structured the same as the one from Even Hub.  The json message I'm trying to pull out is in the Body tag of the avro file, but the Body field is defined as bytes.  When I open the file in notepad ++ I can actually see the json messages all on one line wrapped in the .avro tags.  I tried to use xmlspy avro reader, but it leaves the json message in digits:

    I'm looking for some direction on how to get the messages back out of the .avro file.  Below is what xmlspy shows.

    Friday, June 8, 2018 1:16 PM

All replies

  • If it is possible, can you please give us a sample of your avro file for us to reproduce the error?

    Wednesday, June 20, 2018 6:56 PM
  • I don't see a way to upload a small file so:

    Objavro.codecnullavro.schemaì{"type":"record","name":"EventData","namespace":"Microsoft.ServiceBus.Messaging","fields":[{"name":"SequenceNumber","type":"long"},{"name":"Offset","type":"string"},{"name":"EnqueuedTimeUtc","type":"string"},{"name":"SystemProperties","type":{"type":"map","values":["long","double","string","bytes"]}},{"name":"Properties","type":{"type":"map","values":["long","double","string","bytes","null"]}},{"name":"Body","type":["null","bytes"]}]} l˜21Ò®C¢öT­x6̨ñ²Žô4314224760&6/6/2018 3:50:14 PM  ì{"notifications":[{"notificationType":"locationupdate","subscriptionName":"azureIP","eventId":1,"locationMapHierarchy":"XXXX","locationCoordinate":{"x":137.8636,"y":79.42174,"z":0.0,"unit":"FEET"},"geoCoordinate":{"latitude":-999.0,"longitude":-999.0,"unit":"DEGREES"},"confidenceFactor":32.0,"apMacAddress":"00:00:00:00:00:00","associated":true,"username":"","ipAddress":["","0000:0000:0000:0000:0000:0000:0000:0000"],"ssid":"display","band":"IEEE_802_11_B","floorId":746482908230907215,"floorRefId":"746482908230907215","entity":"WIRELESS_CLIENTS","deviceId":"00:00:00:00:00:00","lastSeen":"2018-06-06T10:50:14.879-0500","rawLocation":{"rawX":-999.0,"rawY":-999.0,"unit":"FEET"},"tagVendorData":null,"locComputeType":"RSSI","manufacturer":"Intel Corporate","maxDetectedRssi":{"apMacAddress":"00:00:00:00:00:00","band":"IEEE_802_11_B","slot":0,"rssi":-45,"antennaIndex":0,"lastHeardInSeconds":3},"timestamp":1528300214879}]}ô4314225832&6/6/2018 3:50:14 PM  ¬

    Wednesday, June 20, 2018 8:05 PM
  • Thank you for sending it. I will update when I have an answer
    Thursday, June 21, 2018 6:21 PM
  • SO, I was able to get the Body portion of my message out by using Encoding.  The Body field from Avro contains my json.  When I wrote this to a file, it seemed to add an extra " next to each " that was in the json portion which means I can't read this yet as a json file.  I am next hoping to select this json in the @cnt field below, run it through a json extractor and query just the fields I need (see code below), but I'm running into an error on @jsonnodes "Expression cannot be constant folded with a resolution of Use a constant expression or a CONST parameter.

    1) Is this even the right way to go or should I land the json and figure out how to remove the extra "

    2) If I can go this way, any direction on how to get this to work would be great.

    @cnt =
        Encoding.UTF8.GetString(Body) AS Json
    FROM @rs
        //GROUP BY topic
    @jsonnodes = 
            notificationType string
            ,subscriptionName string
            ,eventId long
    FROM @cnt
    USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();
    @jsondata =
       SELECT notificationType,
         FROM @jsonnodes;

    Friday, June 22, 2018 1:37 PM
  • Have you already looked at this page which explains about the error you are getting and how to resolve it?
    Monday, June 25, 2018 7:08 PM