Hierarchical JSON Queries on Twitter Data Stream

  • Question

  • Hi,

    The ability to query hierarchical JSON data and the Azure Table Storage output option are great additions to Stream Analytics.

    I'm currently experimenting with querying into data streams from Twitter.

    Stuff like this works fine:

    -- Get statistics for location
    select
        min (id) as Id,
        [user].location as Location,
        count([user].location) as Total,
        avg([user].followers_count) as AvgFollowers,
        avg([user].favourites_count) as AvgFavourites,
        avg([user].friends_count) as AvgFriends,
        avg([user].listed_count) as AvgListed
    from tweetstream
    where [user].location is not null and [user].location != ''
    group by [user].location, TumblingWindow (minute, 1)
    having Total > 10
    
    -- Get number of tweets by name
    select [user].screen_name, count (id) as Tweets
    from tweetstream
    group by [user].screen_name, TumblingWindow (minute, 1)
    having Tweets > 1
    
    -- Select based on text in tweet
    select text
    from tweetstream
    where text like '%food%'

    What I am having issues with is repeating data, such as selecting the hashtags from a tweet.

    There can be zero or more hashtags in a tweet. The JSON looks like this:

    {
    	"created_at":"Fri Nov 28 21:42:41 +0000 2014",
    	"id":538447914268639232,
    	// Deleted
    	"entities":
    	{
    		"hashtags":[
    		{
    			"text":"fall",
    			"indices":[33,38]
    		},
    		{
    			"text":"confessionsofaprchic",
    			"indices":[39,60]
    		}],
    		"trends":[],
    		"urls":[],
    		"user_mentions":[],
    		"symbols":[]
    	},
    	// Deleted
    }

    Is there a way to select the hashtags for a tweet?

    If not, is it something that will be possible in the future?

    Regards,

    Alan



    Wednesday, February 25, 2015 12:59 PM

Answers

  • Today you can't access the contents of an array in a query. We will be extending the query language to allow arrays to be flattened for further processing.

    For example, in your scenario, you will be able to transform each row holding a tweet with an array of hashtags into multiple rows, one per hashtag.

    This should be available soon.
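
    To make the intended transformation concrete, here is a minimal Python sketch of what "flattening" means for the sample tweet above: one input row with an array of hashtags becomes one output row per hashtag. This is only an illustration of the row semantics, not Stream Analytics syntax. The function name `flatten_hashtags` and the output field names `id` and `hashtag` are assumptions made for the example.

```python
import json


def flatten_hashtags(tweet):
    """Yield one row per hashtag in a tweet (hypothetical helper).

    A tweet with an empty or missing hashtags array yields no rows.
    """
    entities = tweet.get("entities") or {}
    for tag in entities.get("hashtags") or []:
        # Carry the tweet id along so each row can be traced back to its tweet.
        yield {"id": tweet["id"], "hashtag": tag["text"]}


# Sample input trimmed to the fields shown in the question.
sample = json.loads("""
{
    "id": 538447914268639232,
    "entities": {
        "hashtags": [
            {"text": "fall", "indices": [33, 38]},
            {"text": "confessionsofaprchic", "indices": [39, 60]}
        ],
        "urls": []
    }
}
""")

rows = list(flatten_hashtags(sample))
for row in rows:
    print(row)
```

    For the sample tweet this produces two rows, one for "fall" and one for "confessionsofaprchic". A tweet with no hashtags produces none, so it drops out of any later per-hashtag grouping.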

    • Marked as answer by Zafar Abbas Tuesday, March 10, 2015 5:45 AM
    Thursday, March 5, 2015 12:26 AM