none
Split datatable based on row count RRS feed

  • Question

  • Hi.

    I am using VS 2010, C# and ASP.NET Web Application.

    I am trying to split a large data table into smaller data tables . Each containing 10000 rows. I have tried the below code:

    // Calling function
    getDataSet(con);
     
    
    public DataSet getDataSet(OracleConnection con)
            {          
                DataSet ds = new DataSet();
    
                ds.Tables.Add(getInfo("select * from Names", con, "Names"));
    
                ds.Tables.Add(getInfo("select * from Address", con, "Address"));
    
                ds.Tables.Add(getInfo("select * from City", con, "City"));
    
                ds.Tables.Add(getInfo("select * from Country", con, "Country"));
    
                return ds;
            }
    
    
    private List<DataTable> getInfo(string strQuery, OracleConnection con, string tblName)
            {
    List<DataTable> tables = new List<DataTable>();
                System.Data.DataTable dt = new System.Data.DataTable();
    
                using (OracleCommand cmd = new OracleCommand(strQuery))
                {
                    using (OracleDataAdapter da = new OracleDataAdapter())
                    {
                        cmd.CommandType = CommandType.Text;
                        cmd.Connection = con;
                        da.SelectCommand = cmd;
                        da.Fill(dt);
    
                    }
                }
    
                dt.TableName = tblName;
                int cnt = dt.Rows.Count;
       
                if (dt.Rows.Count > 10000)
                {
                    tables = SplitTable(dt, 10000, tblName);
                }
    
                 return tables;
            }
    
    
    private static List<DataTable> SplitTable(DataTable originalTable, int batchSize, string tblName)
            {         
                List<DataTable> tables = new List<DataTable>();
                DataTable new_table = new DataTable();
                new_table = originalTable.Clone();
                int j = 0;
                int k = 1;
                if (originalTable.Rows.Count < batchSize)
                {
                    new_table.TableName = tblName + "_" + j;
                    new_table = originalTable.Copy();
                    tables.Add(new_table.Copy());
                }
                else
                {
                    for (int i = 0; i < originalTable.Rows.Count; i++)
                    {
                        new_table.NewRow();
                        new_table.ImportRow(originalTable.Rows[i]);
                        if ((i + 1) == originalTable.Rows.Count)
                        {
                            new_table.TableName = tblName + "_" + j;
                            tables.Add(new_table.Copy());
                            new_table.Rows.Clear(); 
                            k++;
                        }
                        else if (++j == batchSize)
                        {
                            new_table.TableName = tblName + "_" + j;
                            tables.Add(new_table.Copy());
                            new_table.Rows.Clear();
                            k++; 
                            j = 0;
                        }
                    }
                }
                return tables;
            }

    Few issues here:

    1) If the first datatable contains 53140 rows, after execution of the split code I have datatables 0 to 6 with each datatable containing 10000 rows and last one containing 3139. There is one row missing. Not sure, where is the issue.

    2) Each datatable is having tablename as Table_10000 and the last one as Table_3139. I want the TableName as Table_1 , Table_2.. like this

    Thanks

    Monday, June 24, 2019 12:11 PM

Answers

  • The way you have written the code, it has a problem: you first bring in all of the rows into a DataTable, and then split the DataTable into several. This means that you are holding the data twice in memory and moving around large amounts of data. Ok, with 53000 rows it may not be too bad, but it can become an issue as the volume of data increases.

    As an alternative, I suggest paging the data at the server side. Instead of sending a query such as "select * from Names", you would instead send several queries similar to this one:

    "select * from Names ORDER BY primaryKeyName OFFSET 20000 ROWS FETCH NEXT 10000 ROWS ONLY"

    where of course the "20000" would change on every iteration (0, 10000, 20000, ...).

    In this way, every time you send the query you get one of the small DataTables. The data are only moved once and only stored once in memory.

    Note: The Offset...Fetch clause works for modern versions of SQL Sever. I presume that Oracle has something similar; after all this clause is standard in the latest ANSI SQL. If it doesn't, the same behavior can be simulated by adding a Where that compares with the last value read from the primary key.

    • Marked as answer by Venkatzeus Thursday, July 11, 2019 6:58 AM
    Thursday, June 27, 2019 10:26 AM
    Moderator
  • Hello,

    If the Oracle database is version 12 x consider using OFFSET x ROWS FETCH NEXT Y ROWS ONLY or LIMIT X,Y

    I'm on Oracle 11 so can not write a code sample but since SQL-Server has OFFSET ... here is an example that hits the table fours time (and is super fast, would expect Oracle to be fast too) to create four DataTable. In the table there are 91 records and all get returned.

    Note in the gist below I'm using a higher version of Visual Studio so you would need to adjust a tad.

    https://gist.github.com/karenpayneoregon/42b9f859b89758db60fca33f0ec5b880

    Oracle docs

    https://oracle-base.com/articles/12c/row-limiting-clause-for-top-n-queries-12cr1


    Please remember to mark the replies as answers if they help and unmarked them if they provide no help, this will help others who are looking for solutions to the same or similar problem. Contact via my Twitter (Karen Payne) or Facebook (Karen Payne) via my MSDN profile but will not answer coding question on either.

    NuGet BaseConnectionLibrary for database connections.

    StackOverFlow
    profile for Karen Payne on Stack Exchange


    Friday, June 28, 2019 2:16 PM
    Moderator

All replies

  • Hello,

    Take a look at the following page, first response will not work for VS2010 as it requires Framework 4.x and does work as I tool 4118 row DataTable from a database table and got four tables of 1000 and one of 118. So take a look at reply two which seems solid.


    Please remember to mark the replies as answers if they help and unmarked them if they provide no help, this will help others who are looking for solutions to the same or similar problem. Contact via my Twitter (Karen Payne) or Facebook (Karen Payne) via my MSDN profile but will not answer coding question on either.

    NuGet BaseConnectionLibrary for database connections.

    StackOverFlow
    profile for Karen Payne on Stack Exchange

    Monday, June 24, 2019 1:33 PM
    Moderator
  • Hi Venkatzeus, 

    Thank you for posting here.

    For your question, I create a table with sixteen data and make a test, you could have a look.

    code:

            public static DataSet getDataSet(OracleConnection con)
            {
                DataSet ds = new DataSet();
    
                foreach (DataTable dataTable in getInfo("select * from Names", con, "Names"))
                {
                    ds.Tables.Add(dataTable);
                }
                //foreach (DataTable dataTable in getInfo("select * from Address", con, "Address"))
                //{
                //    ds.Tables.Add(dataTable);
                //}
                //foreach (DataTable dataTable in getInfo("select * from City", con, "City"))
                //{
                //    ds.Tables.Add(dataTable);
                //}
                //foreach (DataTable dataTable in getInfo("select * from Country", con, "Country"))
                //{
                //    ds.Tables.Add(dataTable);
                //}
                return ds;
            }
    
            private static List<DataTable> getInfo(string strQuery, OracleConnection con, string tblName)
            {
                List<DataTable> tables = new List<DataTable>();
                System.Data.DataTable dt1 = new System.Data.DataTable();
                using (OracleCommand cmd = new OracleCommand(strQuery))
                {
                    using (OracleDataAdapter da = new OracleDataAdapter())
                    {
                        cmd.CommandType = CommandType.Text;
                        cmd.Connection = con;
                        da.SelectCommand = cmd;
                        da.Fill(dt1);
                    }
                }         
                dt1.TableName = tblName;
                int cnt = dt1.Rows.Count;
                if (dt1.Rows.Count > 5)
                {
                    tables = SplitTable(dt1, 5, tblName);
                }
                return tables;
            }
            private static List<DataTable> SplitTable(DataTable originalTable, int batchSize, string tblName)
            {
                List<DataTable> tables = new List<DataTable>();
                DataTable new_table = new DataTable();
                new_table = originalTable.Clone();
                int j = 0;
                int k = 1;
                // add 'count' to record names of sheets.
                int count = 1; 
                if (originalTable.Rows.Count < batchSize)
                {
                    new_table.TableName = tblName + "-" + count;
                    new_table = originalTable.Copy();
                    tables.Add(new_table.Copy());
                }
                else
                {
                    for (int i = 0; i < originalTable.Rows.Count; i++)
                    {
                        new_table.NewRow();
                        new_table.ImportRow(originalTable.Rows[i]);
                        if (++j == batchSize)
                        {
                            new_table.TableName = tblName + "_" + count;
                            tables.Add(new_table.Copy());
                            new_table.Rows.Clear();
                            count++;
                            k++;
                            j = 0;
                        }
                        if ((i + 1) == originalTable.Rows.Count)
                        {
                            new_table.TableName = tblName + "_" + count;
                            tables.Add(new_table.Copy());
                            count = 0;
                            k++;
                        }
                    }
                }
                return tables;
            }

    Result:

    Best Regards,

    Xingyu Zhao


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Tuesday, June 25, 2019 8:51 AM
    Moderator
  • Thank you very much for the reply.

    In the code provided, you have commented few lines in the method - getDataSet. I want the split method to be run for those methods as well.

    At the end , one list which contains all the split tables after running all the items in getDataset method. How to achieve this

    Thanks

    Thursday, June 27, 2019 9:29 AM
  • The way you have written the code, it has a problem: you first bring in all of the rows into a DataTable, and then split the DataTable into several. This means that you are holding the data twice in memory and moving around large amounts of data. Ok, with 53000 rows it may not be too bad, but it can become an issue as the volume of data increases.

    As an alternative, I suggest paging the data at the server side. Instead of sending a query such as "select * from Names", you would instead send several queries similar to this one:

    "select * from Names ORDER BY primaryKeyName OFFSET 20000 ROWS FETCH NEXT 10000 ROWS ONLY"

    where of course the "20000" would change on every iteration (0, 10000, 20000, ...).

    In this way, every time you send the query you get one of the small DataTables. The data are only moved once and only stored once in memory.

    Note: The Offset...Fetch clause works for modern versions of SQL Sever. I presume that Oracle has something similar; after all this clause is standard in the latest ANSI SQL. If it doesn't, the same behavior can be simulated by adding a Where that compares with the last value read from the primary key.

    • Marked as answer by Venkatzeus Thursday, July 11, 2019 6:58 AM
    Thursday, June 27, 2019 10:26 AM
    Moderator
  • Hi Venkatzeus,

    Thanks for your feedback,

    I have updated my code based on your description, you can have a look.

            DataTable Table_name = new DataTable();
            public Form1()
            {
                InitializeComponent();
            }
            public DataSet getDataSet(OracleConnection con)
            {
                DataSet ds = new DataSet();
                Table_name.TableName = "TableName";
                Table_name.Columns.Add("TableNames");
                foreach (DataTable dataTable in getInfo("select * from Names", con, "Names"))
                {    
                    ds.Tables.Add(dataTable);
                }
                foreach (DataTable dataTable in getInfo("select * from Address", con, "Address"))
                {
                    ds.Tables.Add(dataTable);
                }
                foreach (DataTable dataTable in getInfo("select * from City", con, "City"))
                {
                    ds.Tables.Add(dataTable);
                }
                foreach (DataTable dataTable in getInfo("select * from Country", con, "Country"))
                {
                    ds.Tables.Add(dataTable);
                }
                ds.Tables.Add(Table_name);
                return ds;
            }
    
            private List<DataTable> getInfo(string strQuery, OracleConnection con, string tblName)
            {
                List<DataTable> tables = new List<DataTable>();
                System.Data.DataTable dt1 = new System.Data.DataTable();
                using (OracleCommand cmd = new OracleCommand(strQuery))
                {
                    using (OracleDataAdapter da = new OracleDataAdapter())
                    {
                        cmd.CommandType = CommandType.Text;
                        cmd.Connection = con;
                        da.SelectCommand = cmd;
                        da.Fill(dt1);
                    }
                }         
                dt1.TableName = tblName;
                int cnt = dt1.Rows.Count;
                if (dt1.Rows.Count > 3)
                {
                    tables = SplitTable(dt1, 3, tblName);
                }
                return tables;
            }
            private List<DataTable> SplitTable(DataTable originalTable, int batchSize, string tblName)
            {
    
                List<DataTable> tables = new List<DataTable>();
                DataTable new_table = new DataTable();
                
                new_table = originalTable.Clone();
                int j = 0;
                int k = 1;
                // add 'count' to record names of sheets.
                int count = 1; 
                if (originalTable.Rows.Count < batchSize)
                {
                    new_table.TableName = tblName + "-" + count;
                    Table_name.Rows.Add(new_table.TableName);
                    new_table = originalTable.Copy();
                    tables.Add(new_table.Copy());
                }
                else
                {
                    for (int i = 0; i < originalTable.Rows.Count; i++)
                    {
                        new_table.NewRow();
                        new_table.ImportRow(originalTable.Rows[i]);
                        if (++j == batchSize)
                        {
                            new_table.TableName = tblName + "_" + count;
                            Table_name.Rows.Add(new_table.TableName);
                            tables.Add(new_table.Copy());
                            new_table.Rows.Clear();
                            count++;
                            k++;
                            j = 0;
                        }
                        if ((i + 1) == originalTable.Rows.Count)
                        {
                            new_table.TableName = tblName + "_" + count;
                            Table_name.Rows.Add(new_table.TableName);
                            tables.Add(new_table.Copy());
                            count = 0;
                            k++;
                        }
                    }
                }
                return tables;
            }
            private void button1_Click(object sender, EventArgs e)
            {
                string connstring = @"...";
                using (OracleConnection Conn = new OracleConnection(connstring))
                {
                    Conn.Open();
                    var m = getDataSet(Conn);
                    MessageBox.Show("Success");
                } 
            }

    Result:

    Hope it can help you.

    Best Regards,

    Xingyu Zhao


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.



    Friday, June 28, 2019 10:05 AM
    Moderator
  • Hello,

    If the Oracle database is version 12 x consider using OFFSET x ROWS FETCH NEXT Y ROWS ONLY or LIMIT X,Y

    I'm on Oracle 11 so can not write a code sample but since SQL-Server has OFFSET ... here is an example that hits the table fours time (and is super fast, would expect Oracle to be fast too) to create four DataTable. In the table there are 91 records and all get returned.

    Note in the gist below I'm using a higher version of Visual Studio so you would need to adjust a tad.

    https://gist.github.com/karenpayneoregon/42b9f859b89758db60fca33f0ec5b880

    Oracle docs

    https://oracle-base.com/articles/12c/row-limiting-clause-for-top-n-queries-12cr1


    Please remember to mark the replies as answers if they help and unmarked them if they provide no help, this will help others who are looking for solutions to the same or similar problem. Contact via my Twitter (Karen Payne) or Facebook (Karen Payne) via my MSDN profile but will not answer coding question on either.

    NuGet BaseConnectionLibrary for database connections.

    StackOverFlow
    profile for Karen Payne on Stack Exchange


    Friday, June 28, 2019 2:16 PM
    Moderator
  • Thank you so much
    Thursday, July 11, 2019 6:58 AM