Need many specific PK/RK rows. Possible to batch them without incurring a table scan?

  • Question

  • I have many specific entries in Azure Table storage that I'd like to retrieve. Each entry is identified by a unique PK/RK pair, and many (but not all) of them share the same PK.

    Is it possible to fetch all of the specified RKs within a partition in a single batch transaction, without incurring a table scan?

    Is it more efficient to query and fetch each PK/RK independently?

    Friday, October 5, 2012 5:37 PM

All replies

  • Nope. If you specify an "or" query with multiple row keys, it will result in a partition scan, and if you specify an "or" query with multiple partition keys, it will result in a full table scan.
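
To make the distinction concrete, here is a small sketch of the two filter shapes being compared. The helper names (`odata_quote`, `point_filter`, `or_filter`) are made up for illustration and are not part of any Azure SDK; the sketch only builds the OData filter strings and does not send any request.

```python
from typing import Sequence


def odata_quote(value: str) -> str:
    """Quote a string literal for an OData filter (single quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"


def point_filter(pk: str, rk: str) -> str:
    """Filter naming exactly one PK/RK pair: a point query."""
    return f"PartitionKey eq {odata_quote(pk)} and RowKey eq {odata_quote(rk)}"


def or_filter(pk: str, rks: Sequence[str]) -> str:
    """Filter OR-ing several row keys in one partition.

    Per the reply above, this shape results in a partition scan
    rather than a set of point lookups.
    """
    alternatives = " or ".join(f"RowKey eq {odata_quote(rk)}" for rk in rks)
    return f"PartitionKey eq {odata_quote(pk)} and ({alternatives})"


if __name__ == "__main__":
    print(point_filter("p1", "r1"))
    print(or_filter("p1", ["r1", "r2"]))
```

The first shape is what you would issue once per key; the second looks like a batch but is the "or" query described above.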

    Issuing a separate query for each PK/RK pair, in parallel, is generally the most efficient way to do this, but I can imagine scenarios in which a partition is small enough that a partition scan is better overall. (There's generally a limit to how many queries your client can execute in parallel, so combining some of them into a partition scan may work out better for small partitions.)
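
A minimal sketch of the parallel approach, assuming you already have some function that fetches one entity by PK/RK. Here `fetch_one` is a hypothetical callable standing in for whatever point-lookup call your storage client exposes, and `max_workers` is an arbitrary cap on how many lookups run at once, not a recommended value.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable, Tuple

Key = Tuple[str, str]  # (PartitionKey, RowKey)


def fetch_many(
    keys: Iterable[Key],
    fetch_one: Callable[[str, str], Any],
    max_workers: int = 16,
) -> Dict[Key, Any]:
    """Run one point lookup per key, at most `max_workers` at a time.

    `fetch_one(pk, rk)` is an assumed placeholder for the real lookup.
    Any exception it raises propagates to the caller.
    """
    keys = list(keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with keys.
        results = pool.map(lambda key: fetch_one(*key), keys)
        return dict(zip(keys, results))


if __name__ == "__main__":
    # In-memory stand-in for a table, used only to exercise the sketch.
    fake_table = {("p1", "r1"): {"value": 1}, ("p1", "r2"): {"value": 2}}
    wanted = [("p1", "r1"), ("p1", "r2"), ("p2", "missing")]
    print(fetch_many(wanted, lambda pk, rk: fake_table.get((pk, rk)), max_workers=4))
```

The worker cap is the knob that stands in for the client-side limit mentioned above: extra lookups simply wait in the pool's queue until a worker is free.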

    Friday, October 5, 2012 5:54 PM
  • Thanks Steve! I see you're at Aditi now... I'm excited to follow your career.

    Do you have any guidance on what I should look for with respect to the limits you mention?

    .. Where are the limits set? (In the Azure datacenter or on my client?)

    .. Are the limitations removed as soon as the operation completes? (Or do I have to wait for a cleanup?)

    I'd rather pre-emptively defer my I/O operations than bombard the system with queries that will time out or error out.

    Any ideas or code on how to accomplish this?

    Friday, October 5, 2012 6:12 PM
  • Sorry, the limits I was talking about are simply the limits of what your development environment can handle. There's nothing hardcoded (though make sure your ServicePointManager.DefaultConnectionLimit is set high enough, if you're using .NET), but there are practical limits in terms of open sockets, threads, file descriptors (depending on the OS), and different runtimes have their own limitations. There's no substitute for actually doing the experiment, so I'd recommend just trying it.
    • Marked as answer by ChrisLaMont Saturday, October 6, 2012 4:01 PM
    Friday, October 5, 2012 8:21 PM
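
One way to "just try it" is to time the same batch of lookups at several concurrency levels and see where the gains flatten out. The sketch below is only a harness: `simulated_lookup` is a stand-in that sleeps instead of calling storage, and the worker counts and latency are arbitrary. The `ServicePointManager.DefaultConnectionLimit` setting mentioned above is specific to .NET and has no counterpart in this sketch.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Sequence, Tuple


def simulated_lookup(key: Tuple[str, str], latency_s: float = 0.005) -> Tuple[str, str]:
    """Stand-in for a real point query: sleeps, then echoes the key."""
    time.sleep(latency_s)
    return key


def time_batch(
    keys: Sequence[Any], lookup: Callable[[Any], Any], workers: int
) -> Tuple[float, List[Any]]:
    """Run every lookup with `workers` threads; return (elapsed seconds, results)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lookup, keys))
    return time.perf_counter() - start, results


def sweep(
    keys: Sequence[Any], lookup: Callable[[Any], Any], worker_counts: Sequence[int]
) -> Dict[int, float]:
    """Time the same batch once per worker count."""
    return {w: time_batch(keys, lookup, w)[0] for w in worker_counts}


if __name__ == "__main__":
    keys = [("p1", f"r{i}") for i in range(100)]
    for workers, seconds in sweep(keys, simulated_lookup, [1, 4, 16, 64]).items():
        print(f"{workers:>3} workers: {seconds:.3f} s")
```

Swapping `simulated_lookup` for a real lookup against your own table would show the practical limit of your environment, which is the experiment recommended above.
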
  • Thank you.  I will report back with the results.

    Also, I just posted a new question you might be interested in: Is it possible to use IF-MATCH to make Azure conditionally write a value to storage?

    http://social.msdn.microsoft.com/Forums/en-US/windowsazuredata/thread/630175f9-8553-4354-8273-ba0748e3b77d
    Saturday, October 6, 2012 4:06 PM