none
Finding Missing Date Ranges RRS feed

  • Question

  • I have a table that is for utility bills. Simplified, each line has an identifier for the property the bill refers to, has a from date and a to date defining the period covered by the bill and a bill amount. Sometimes the bill amount is null. My problem is that there are some periods which are not covered by a line with a bill amount. That is if I group the table by property number and date ranges ascending by from date using only lines which have a non-null bill amount then I get a nice table with no duplicated periods, however there are (a few) gaps.

    Now, sometimes the gap is covered exactly or partially by a from to date row in the original data that has a null bill amount. These are legitimate rows and the data is structured in such a way that where this is the case then an adjacent row with a bill amount will include the charge for the period that has a null bill amount. So, I need a table that accounts for all days that are covered in the data without duplication. However, there will sometimes be a real gap in the data, that is a period that is not covered at all and I need to identify those real gaps too.

    Here's what seems to me a possible approach. Step Two is the part I need help with,I think everything else more or less follows after that.

    Step One: First create a table using a grouping query which consolidates all lines with a bill amount by the from to date. I have this.


    SELECT Bill_Summary.[Property No], Bill_Summary.[Bill Account No], Bill_Summary.[Bill Date], Bill_Summary.[Property Address], Bill_Summary.[Bill From Date], Bill_Summary.[Bill To Date], [Bill_Summary]![Bill To Date]-[Bill_Summary]![Bill From Date] AS [Days USe], Bill_Summary.Consumption, Bill_Summary.[Total Milk Charge], Bill_Summary.[Total Honey Charge], Bill_Summary.[Total Milk Standing Charge], Bill_Summary.[Total Honey Standing Charge], Bill_Summary.[Bill Amount]
    FROM Bill_Summary
    GROUP BY Bill_Summary.[Property No], Bill_Summary.[Bill Account No], Bill_Summary.[Bill Date], Bill_Summary.[Property Address], Bill_Summary.[Bill From Date], Bill_Summary.[Bill To Date], [Bill_Summary]![Bill To Date]-[Bill_Summary]![Bill From Date], Bill_Summary.Consumption, Bill_Summary.[Total Milk Charge], Bill_Summary.[Total Honey Charge], Bill_Summary.[Total Milk Standing Charge], Bill_Summary.[Total Honey Standing Charge], Bill_Summary.[Bill Amount]
    HAVING (((Bill_Summary.[Bill Amount]) Is Not Null))
    ORDER BY Bill_Summary.[Property No], Bill_Summary.[Bill From Date];


    Step Two: Then identify from the created table for each property where there are gaps (So result would be a list, Property No, Start date of Gap, End Date of Gap). How do I do that?

    Step Three: Then go back to the original data and look for lines (or possibly more than a single line) that exactly fills that gap (or partially fill its, trickier, but never overfills - there can be no overlaps). If I knew how to do Step Two I think I have some clue how to do this step - at least for exact matches for the gaps.

    Step Four: Union query the results of steps one and three to create the table that accounts for all days that are covered in the data without duplication.

    Step Five: Like step two, identify the real gaps in the final table. Get a list of them by Property No and From Date - To date.

    The above uses Access which is the tool I have available to me in the work environment I have to do this.

    That's it.

    Thanks in advance for any help.

    androo2351


    • Edited by androo2351 Wednesday, March 9, 2016 4:49 PM emphasis
    Wednesday, March 9, 2016 1:55 PM

Answers

  • These 2 queries will find the gaps.   The first gives a list of all the TO & FROM dates.   The 2nd finds where there is a single date.

        qryBillingDates --

    SELECT Bill_Summary.[Property No], Bill_Summary.[Bill Account No],  Bill_Summary.[Bill From Date] AS [BillingDate]

    FROM Bill_Summary

    UNION ALL SELECT Bill_Summary.[Property No], Bill_Summary.[Bill Account No],  Bill_Summary.[Bill To Date] AS [BillingDate]

    FROM Bill_Summary;

    SELECT qryBillingDates.[Property No], qryBillingDates.[Bill Account No],  [BillingDate], Count([BillingDate]) AS DateCount

    FROM qryBillingDates

    GROUP BY qryBillingDates.[Property No], qryBillingDates.[Bill Account No],  [BillingDate]

    HAVING Count([BillingDate]) = 1;

    EDIT -- The above is based upon the data you posted where you repeat the same date that is [Bill To Date] as the following [Bill From Date] so that each date show be used twice.


    Build a little, test a little


    Wednesday, March 9, 2016 9:33 PM
  • You can detect discontinuous ranges by left outer joining two instances of a table and returning rows where any required column to the right of the join in the result set is Null, and the row is not the latest in each subset.  The following is an example which returns rows where the subsequent row for the property in question has a BillFromDate which does not equate to the current row's BillToDate or the BillAmount in the subsequent row is Null, using a simplified version of the result table of your first query, named Bills and with columns PropertyNo, BillFromDate,BillToDate and BillAmount.

    SELECT B1.*
    FROM Bills As B1
    LEFT JOIN
        (SELECT PropertyNo, BillFromDate AS NextFromDate
         FROM Bills
         WHERE BillAmount IS NOT NULL) AS B2
    ON B1.PropertyNo = B2.PropertyNo
    AND B1.BillToDate = B2.NextFromDate
    WHERE B1.BillAmount IS NOT NULL
    AND B2.PropertyNo IS NULL
    AND B1.BillToDate <>
        (SELECT MAX(BillToDate)
         FROM Bills AS B3
         WHERE B3.PropertyNo = B1.PropertyNo
         AND BillAmount IS NOT NULL);

    However, a fundamental problem with your Bill_Summary table is that it encodes data as column headings by having columns for each type of charge.  A fundamental principle of the database relational model is the Information Principle (Codd's Rule #1). This requires that all data be stored as values at column positions in rows in tables, and in no other way.  The table should really be decomposed into three related tables, Bills, Charges and, to model the binary many-to-many relationship type between them by resolving it into two unary one-to-many relationship types, BillCharges.

    Also, distinct ranges are not normally defined where the end date equates with the start of the subsequent range.  The start of the subsequent range is normally the date following the end date of the previous range.  This avoids the ambiguity of a date falling simultaneously within two ranges.

    Ken Sheridan, Stafford, England

    • Marked as answer by David_JunFeng Sunday, March 20, 2016 2:21 PM
    Thursday, March 10, 2016 12:47 AM

All replies

  • Hi. See if this article applies to your situation. Hope it helps...
    Wednesday, March 9, 2016 4:02 PM
  • Thanks, yes it looks like that may help - though of course I'm looking for what's not there rather than what is there but I'll see if I can adapt it.

    androo2351

    Wednesday, March 9, 2016 4:48 PM
  • Hi. Good luck. The purpose of using a Cartesian product was to create a set of dates without any gaps. You should be able to use the result to compare with your data to find the gaps. Cheers!
    Wednesday, March 9, 2016 5:13 PM
  • Sorry, I don't understand that last comment - the table created at Step One will have and does have gaps - some will be "real gaps" (to be identified at Step 5) some will be, lets say, "pseudo gaps" to be identified at step 3 based on the list of all gaps created at step two.

    Above is an extract of the output of step one - what I want from step two is a list of missing date ranges. So here for example it would be

    716319137 (Prop Ref) 4/8/14 (missing from date) 29/8/14 (missing to date).

    When I've got that list I can then check to see if I have that date range with that property ref in a line with a null Bill Amount in my original data. When I do get a list of those I can use it at step 4 to create my "accounted for days" list. I can then check that list, step five, to see what is really missing.

    I don't know what you mean by or what the purpose would be in creating "a set of dates without any gaps". However, I will try this out the code you linked to and see if I can adapt it. I'll report back how I get on.

    Thanks


    androo2351

    Wednesday, March 9, 2016 6:44 PM
  • Hi. What I am saying is if you have a table with dates as follows:

    1/1/16
    1/2/16
    1/4/16
    1/5/16

    And you want to find the gap, which would be 1/3/16, then the Cartesian query will create a data set for you with the following dates:

    1/1/16
    1/2/16
    1/3/16
    1/4/16
    1/5/16

    Which you can then use to compare with your table's data to find the gap, which is 1/3/16.

    Hope that makes sense...

    Wednesday, March 9, 2016 6:58 PM
  • These 2 queries will find the gaps.   The first gives a list of all the TO & FROM dates.   The 2nd finds where there is a single date.

        qryBillingDates --

    SELECT Bill_Summary.[Property No], Bill_Summary.[Bill Account No],  Bill_Summary.[Bill From Date] AS [BillingDate]

    FROM Bill_Summary

    UNION ALL SELECT Bill_Summary.[Property No], Bill_Summary.[Bill Account No],  Bill_Summary.[Bill To Date] AS [BillingDate]

    FROM Bill_Summary;

    SELECT qryBillingDates.[Property No], qryBillingDates.[Bill Account No],  [BillingDate], Count([BillingDate]) AS DateCount

    FROM qryBillingDates

    GROUP BY qryBillingDates.[Property No], qryBillingDates.[Bill Account No],  [BillingDate]

    HAVING Count([BillingDate]) = 1;

    EDIT -- The above is based upon the data you posted where you repeat the same date that is [Bill To Date] as the following [Bill From Date] so that each date show be used twice.


    Build a little, test a little


    Wednesday, March 9, 2016 9:33 PM
  • You can detect discontinuous ranges by left outer joining two instances of a table and returning rows where any required column to the right of the join in the result set is Null, and the row is not the latest in each subset.  The following is an example which returns rows where the subsequent row for the property in question has a BillFromDate which does not equate to the current row's BillToDate or the BillAmount in the subsequent row is Null, using a simplified version of the result table of your first query, named Bills and with columns PropertyNo, BillFromDate,BillToDate and BillAmount.

    SELECT B1.*
    FROM Bills As B1
    LEFT JOIN
        (SELECT PropertyNo, BillFromDate AS NextFromDate
         FROM Bills
         WHERE BillAmount IS NOT NULL) AS B2
    ON B1.PropertyNo = B2.PropertyNo
    AND B1.BillToDate = B2.NextFromDate
    WHERE B1.BillAmount IS NOT NULL
    AND B2.PropertyNo IS NULL
    AND B1.BillToDate <>
        (SELECT MAX(BillToDate)
         FROM Bills AS B3
         WHERE B3.PropertyNo = B1.PropertyNo
         AND BillAmount IS NOT NULL);

    However, a fundamental problem with your Bill_Summary table is that it encodes data as column headings by having columns for each type of charge.  A fundamental principle of the database relational model is the Information Principle (Codd's Rule #1). This requires that all data be stored as values at column positions in rows in tables, and in no other way.  The table should really be decomposed into three related tables, Bills, Charges and, to model the binary many-to-many relationship type between them by resolving it into two unary one-to-many relationship types, BillCharges.

    Also, distinct ranges are not normally defined where the end date equates with the start of the subsequent range.  The start of the subsequent range is normally the date following the end date of the previous range.  This avoids the ambiguity of a date falling simultaneously within two ranges.

    Ken Sheridan, Stafford, England

    • Marked as answer by David_JunFeng Sunday, March 20, 2016 2:21 PM
    Thursday, March 10, 2016 12:47 AM
  • Thanks for all the responses. I did manage to get a result in the end which may well have used many of the suggestions above, and, I'm sure, there are things in the above I can use to improve my solution. Unfortunately solving the step two problem has exposed another problem with the data which will likely need a different approach to solve. I will report back - but no time now. Thanks again.

    androo2351

    Saturday, March 12, 2016 10:38 AM
  • Hi androo. Good luck!
    Saturday, March 12, 2016 1:36 PM