locked
Populate Blank Rows, How? RRS feed

  • Question

  • I need help on how to solve this predicament that I’m facing. The customer data that I’m dealing with comes in Excel files. The customers are divided according to location where they are but that is not populated in all the rows. Before I explain my question further, please refer to the below table to understand.

    Location

    Customer_ID

    Location_A

    1234

     

    1235

     

    1236

     

    1237

     

    1238

    Location_B

    7991

     

    7992

     

    7993

     

    7994

     

    7995

    What I would like to achieve is to populate the blank rows under the column Location. Any advice on how I can do this. Everything which comes after “Location_A” should be “Location_A” and everything which comes after “Location_B” should be “Location_B”. I would like to do this either in SQL or in SSIS.

    Thank you for your assistance in advance,


    Mpumelelo

    • Edited by Mpumelelo S Monday, October 16, 2017 4:45 PM
    Monday, October 16, 2017 4:44 PM

Answers

  • I have cracked this challenge by use of SSIS and stored procedure. It’s the best that I could get but I believe it can be improved. This is how I did it. I will be as basic in my explanation as possible for the benefit of those new to SSIS who may be facing a similar challenge.

    1. I first created an Identity column called Location_Row_ID on the landing table. This will ensure that the order in which the rows are populated on the table is the same as it was on the file (well, that is what I noticed when I crossed checked manually).
    2. I then loaded my landing table with the Location names missing, as the case was on the source file.

    è EEverything after this is controlled from SSIS using the following setup.

    Using a SQL Execute Task (SQL Get Location List Leverage), I selected a list of Location_Row_IDs and Location_Names where they are populated. Using the example that I supplied in this question, the script will be,

    SELECT Location_Row_ID

          ,Location

      FROM staging.My_Table

      WHERE Location IS NOT NULL

    The result which is loaded to an SSIS Object variable is displayed on the table below.

    Location_Row_ID

    Location

    1

    Location_A

    6

    Location_B

     

    On the ForEach Loop Container (FELC Populate Location Details), using the Execute SQL Task (SQL Update Table With Relevant Location) along with the associated parameters I ran the following stored procedure,

    ALTER PROCEDURE usp_PopulateLocationDetails

    @Location_Row_ID bigint,

    @Location_Name varchar(250)

    WITH RECOMPILE

    AS

    BEGIN

           SET NOCOUNT ON;

    UPDATE [staging].[My_Table]

    SET Location = @Location_Name

    WHERE Row_ID >= @Location_Row_ID

    END

    Many thanks,


    Mpumelelo

    • Edited by Mpumelelo S Tuesday, October 17, 2017 10:57 AM
    • Proposed as answer by Xi Jin Thursday, October 19, 2017 7:03 AM
    • Marked as answer by Mpumelelo S Thursday, October 19, 2017 8:34 AM
    Tuesday, October 17, 2017 10:53 AM

All replies

  • It can be done through T-SQL, but is there any other column to sort rows in specific order  ?

    Also, what is the SQL Server version you are using ? 


    Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.


    • Edited by Sarat (SS) Monday, October 16, 2017 4:53 PM
    Monday, October 16, 2017 4:52 PM
  • Using SSIS you can easily do this using a Script component. You would add a readwrite column (Location). In your script you need an instance variable (this is a variable that you declare outside of any method). In your ProcessInputRow method, you look for empty Location
    string location = "UNKNOWN";
    
    public override void Input0_ProcessInputRow(Input0Buffer Row)
        {
       if (string.IsNullOrEmpty(Row.Location) {
    Row.Location = location;
    } else {
    location = Row.Location;
    }
    }


    Russel Loski, MCSE Data Platform/Business Intelligence Twitter: @sqlmovers; blog: www.sqlmovers.com

    Monday, October 16, 2017 5:08 PM
  • Sarat,

    Yes, there is a column that I can use to sort the rows in order. It is the CustomerID column. But the challenge is that a given customer can appear in both locations as the Location column refers to a place where a customer gets a service.

    Russ,

    Thanks for the C# advice. I will see if I can be able to work through this as my C# is not the best in the world.

    Many thanks,


    Mpumelelo

    Monday, October 16, 2017 6:30 PM
  • Sarat,

    Yes, there is a column that I can use to sort the rows in order. It is the CustomerID column. But the challenge is that a given customer can appear in both locations as the Location column refers to a place where a customer gets a service.

    Russ,

    Thanks for the C# advice. I will see if I can be able to work through this as my C# is not the best in the world.

    Many thanks,


    Mpumelelo

    Reason why order required is to know if missing location corresponds to correct location or not.

    Data retried from table cannot be guarantee order unless order is specified. If information is retrieved from table as below , then it would be difficult to know 7994 belongs to Location_A or Location_B.

    Location

    Customer_ID

    Location_A

    1234

     

    1235

     

    7994 instead of 1236

     

    1237

     

    1238

    Location_B

    7991

     

    7992

     

    7993

     

    1236 instead of 7994

     

    7995


    Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.

    Monday, October 16, 2017 7:27 PM
  • Thanks Sarat. I don't think there is any simple way to order the data according to the Location groups where the rows belong especially after the data has been loaded to the table. That is where my challenge is. I thought it will be easier to handle this situation in SSIS as the rows are moved in the SSIS pipeline in the order that they appear on the file (well, that is what I noticed when I used a Data Viewer on SSIS). I will try to give Russ's suggestion a go. If there is any other non C# workaround, I would gladly welcome that.

    Thanks,


    Mpumelelo

    Monday, October 16, 2017 8:33 PM
  • I too think so , to better handle it at source . If it doesn't work , try adding an identity column to result in SSIS , so that identity could get order.

    Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.

    Monday, October 16, 2017 8:37 PM
  • I too think so , to better handle it at source . If it doesn't work , try adding an identity column to result in SSIS , so that identity could get order.

    Please use Marked as Answer if my post solved your problem and use Vote As Helpful if a post was useful.

    Thanks. I will review and see the best option.

    Mpumelelo

    Monday, October 16, 2017 8:44 PM
  • I have cracked this challenge by use of SSIS and stored procedure. It’s the best that I could get but I believe it can be improved. This is how I did it. I will be as basic in my explanation as possible for the benefit of those new to SSIS who may be facing a similar challenge.

    1. I first created an Identity column called Location_Row_ID on the landing table. This will ensure that the order in which the rows are populated on the table is the same as it was on the file (well, that is what I noticed when I crossed checked manually).
    2. I then loaded my landing table with the Location names missing, as the case was on the source file.

    è EEverything after this is controlled from SSIS using the following setup.

    Using a SQL Execute Task (SQL Get Location List Leverage), I selected a list of Location_Row_IDs and Location_Names where they are populated. Using the example that I supplied in this question, the script will be,

    SELECT Location_Row_ID

          ,Location

      FROM staging.My_Table

      WHERE Location IS NOT NULL

    The result which is loaded to an SSIS Object variable is displayed on the table below.

    Location_Row_ID

    Location

    1

    Location_A

    6

    Location_B

     

    On the ForEach Loop Container (FELC Populate Location Details), using the Execute SQL Task (SQL Update Table With Relevant Location) along with the associated parameters I ran the following stored procedure,

    ALTER PROCEDURE usp_PopulateLocationDetails

    @Location_Row_ID bigint,

    @Location_Name varchar(250)

    WITH RECOMPILE

    AS

    BEGIN

           SET NOCOUNT ON;

    UPDATE [staging].[My_Table]

    SET Location = @Location_Name

    WHERE Row_ID >= @Location_Row_ID

    END

    Many thanks,


    Mpumelelo

    • Edited by Mpumelelo S Tuesday, October 17, 2017 10:57 AM
    • Proposed as answer by Xi Jin Thursday, October 19, 2017 7:03 AM
    • Marked as answer by Mpumelelo S Thursday, October 19, 2017 8:34 AM
    Tuesday, October 17, 2017 10:53 AM
  • Here you go

    CREATE TABLE #C (X CHAR(1) ,Y INT)
    INSERT INTO #C values('A', 1)
    INSERT INTO #C values('', 2)
    INSERT INTO #C values('', 3)
    INSERT INTO #C values('', 4)


    INSERT INTO #C values('B', 1)
    INSERT INTO #C values('', 2)
    INSERT INTO #C values('', 3)

    SELECT * FROM #C

    ALTER TABLE #C ADD ID INT IDENTITY(1,1)

    SELECT CASE WHEN X >''
                THEN X
                ELSE (SELECT max(X)
                      FROM #C
                      WHERE ID <= t.ID)
           END AS X,
           Y
    FROM #C t


    Best Regards,Uri Dimant SQL Server MVP, http://sqlblog.com/blogs/uri_dimant/

    MS SQL optimization: MS SQL Development and Optimization
    MS SQL Consulting: Large scale of database and data cleansing
    Remote DBA Services: Improves MS SQL Database Performance
    SQL Server Integration Services: Business Intelligence


    Tuesday, October 17, 2017 11:27 AM
    Answerer
  • Hi Mpumelelo,

    I'm glad to hear that you have resolved your issue. And thanks for sharing your solution.

    Thanks,
    Xi Jin.


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Thursday, October 19, 2017 7:04 AM