locked
Get lastest row for duplicates RRS feed

  • Question

  • I have an Excel spreadsheet that has multiple rows for a specific customer.  I want to only keep the row for each customer that has the earliest shipping date but I want all the columns.  Can I do that with PowerPivot and if so, how?
    Wednesday, March 19, 2014 7:01 PM

Answers

  • Hi RACAND,

    Thanks for providing the Excel workbook. I've taken a look at the data and the rows that have been highlighted appear to be the rows with the latest shipment dates as opposed to the earliest ones.

    If you are indeed trying to return values for the rows with the earliest shipment date then you can use the following DAX formula to create a calculated column that will flag the rows for each Customer ID that has the earliest shipment date:

    = 
    'Order'[SHIP_DATE] = 
    CALCULATE(
      MIN('Order'[SHIP_DATE]), 
      ALLEXCEPT(
        'Order',
        'Order'[CUSTOMER_ID]
      )
    )

    Note: If you want the opposite effect (i.e. flagging the latest shipment date per customer) then you can swap the 'MIN' function with the 'MAX' one.

    This formula will return a 'true' or 'false' value per row. In the below example I have called the column 'FIRST_SHIP_DATE'.

    On your customer table, you can then use the following pattern to pull back the values from the Order table for each Customer where the Order[FIRST_SHIP_DATE] has a true value.

    =
    CALCULATE(
      LASTNONBLANK(
        <<Put the order table column name here>>, 
        1
      ), 
      'Order'[FIRST_SHIP_DATE] = TRUE()
    )

    An example of this pattern being used to add the 'SHIP_TO_ADDRESS' column from the Order table onto the Customer table is...

    =
    CALCULATE(
      LASTNONBLANK(
        'Order'[SHIP_ST_ADDRESS], 
        1
      ), 
      'Order'[FIRST_SHIP_DATE] = TRUE()
    )

    After applying this pattern for two other columns I got the following result...

    Hope this helps but if I have misunderstood part or all of your requirement, let me know ;)

    EDIT: You can replace where I have used:

    'Order'[FIRST_SHIP_DATE] = TRUE()


    With just:

    'Order'[FIRST_SHIP_DATE]


    For example, we can update the following formula:

    =
    CALCULATE(
      LASTNONBLANK(
        'Order'[SHIP_ST_ADDRESS], 
        1
      ), 
      'Order'[FIRST_SHIP_DATE] = TRUE()
    )

    to be written as:

    =
    CALCULATE(
      LASTNONBLANK(
        'Order'[SHIP_ST_ADDRESS], 
        1
      ), 
      'Order'[FIRST_SHIP_DATE]
    )

    This is because the column already returns a true/false value.


    Regards,

    Michael

    Please remember to mark a post that answers your question as an answer...If a post doesn't answer your question but you've found it helpful, please remember to vote it as helpful :)

    Website: nimblelearn.com, Blog: nimblelearn.com/blog



    • Edited by Michael Amadi Friday, March 21, 2014 10:46 AM
    • Marked as answer by RACAND Friday, March 21, 2014 4:50 PM
    Thursday, March 20, 2014 10:11 PM

All replies

  • Hi RACAND,

    Could you kindly provide an example of how the output should look (a screenshot of a mock-up in Excel would be fine), let us know what columns are in the table or tables that would need to be involved in this calculation and what the relationships are?


    Regards,

    Michael

    Please remember to mark a post that answers your question as an answer...If a post doesn't answer your question but you've found it helpful, please remember to vote it as helpful :)

    Website: nimblelearn.com, Blog: nimblelearn.com/blog

    Thursday, March 20, 2014 6:27 AM
  • Below is a link to an Excel 2013 spreadsheet with two linked tables one for Customer and one for Order.  I put them in Powerpivot and created the link.  If you look at the sheet with the Order I have highlighted the order it should pick for the two customers.  What I want is to add columns on the Customer for the ship date, order date, ship state, ship city and order amount.  Let me know if you have any issues getting to it.  Thanks.

    https://onedrive.live.com/redir?resid=73ED18EEC8BA7424!498&authkey=!AEbqDyg31H66ogw&ithint=file%2c.xlsx

    Thursday, March 20, 2014 3:02 PM
  • Hi RACAND,

    Thanks for providing the Excel workbook. I've taken a look at the data and the rows that have been highlighted appear to be the rows with the latest shipment dates as opposed to the earliest ones.

    If you are indeed trying to return values for the rows with the earliest shipment date then you can use the following DAX formula to create a calculated column that will flag the rows for each Customer ID that has the earliest shipment date:

    = 
    'Order'[SHIP_DATE] = 
    CALCULATE(
      MIN('Order'[SHIP_DATE]), 
      ALLEXCEPT(
        'Order',
        'Order'[CUSTOMER_ID]
      )
    )

    Note: If you want the opposite effect (i.e. flagging the latest shipment date per customer) then you can swap the 'MIN' function with the 'MAX' one.

    This formula will return a 'true' or 'false' value per row. In the below example I have called the column 'FIRST_SHIP_DATE'.

    On your customer table, you can then use the following pattern to pull back the values from the Order table for each Customer where the Order[FIRST_SHIP_DATE] has a true value.

    =
    CALCULATE(
      LASTNONBLANK(
        <<Put the order table column name here>>, 
        1
      ), 
      'Order'[FIRST_SHIP_DATE] = TRUE()
    )

    An example of this pattern being used to add the 'SHIP_TO_ADDRESS' column from the Order table onto the Customer table is...

    =
    CALCULATE(
      LASTNONBLANK(
        'Order'[SHIP_ST_ADDRESS], 
        1
      ), 
      'Order'[FIRST_SHIP_DATE] = TRUE()
    )

    After applying this pattern for two other columns I got the following result...

    Hope this helps but if I have misunderstood part or all of your requirement, let me know ;)

    EDIT: You can replace where I have used:

    'Order'[FIRST_SHIP_DATE] = TRUE()


    With just:

    'Order'[FIRST_SHIP_DATE]


    For example, we can update the following formula:

    =
    CALCULATE(
      LASTNONBLANK(
        'Order'[SHIP_ST_ADDRESS], 
        1
      ), 
      'Order'[FIRST_SHIP_DATE] = TRUE()
    )

    to be written as:

    =
    CALCULATE(
      LASTNONBLANK(
        'Order'[SHIP_ST_ADDRESS], 
        1
      ), 
      'Order'[FIRST_SHIP_DATE]
    )

    This is because the column already returns a true/false value.


    Regards,

    Michael

    Please remember to mark a post that answers your question as an answer...If a post doesn't answer your question but you've found it helpful, please remember to vote it as helpful :)

    Website: nimblelearn.com, Blog: nimblelearn.com/blog



    • Edited by Michael Amadi Friday, March 21, 2014 10:46 AM
    • Marked as answer by RACAND Friday, March 21, 2014 4:50 PM
    Thursday, March 20, 2014 10:11 PM
  • Thanks very much!  This was very helpful.
    Friday, March 21, 2014 4:52 PM
  • Glad it helped ;)

    Regards,

    Michael

    Please remember to mark a post that answers your question as an answer...If a post doesn't answer your question but you've found it helpful, please remember to vote it as helpful :)

    Website: nimblelearn.com, Blog: nimblelearn.com/blog

    Friday, March 21, 2014 5:21 PM