none
Update Table1 based on calculated values from Table2 RRS feed

  • Question

  • HOW can I update Table1 based on calculated values from Table2?  Sample tables:

    Table1:
    TAXOR      SNAME      SAND
    Alfisols          ?             ?
    Andisols        ?             ?
     
    Table2:
    TAXOR      SNAME     SAND
    Alfisols        A             1.10
    Alfisols        B             2.23
    Alfisols        A            3.90
    Alfisols        A            1.80
    Alfisols        B           4.00
    Alfisols        C           2.00
    Andisols      E           1.10
    Andisols      E           2.30
    Andisols      F           1.00

    DESIRED RESULT FOR TABLE1:
    TAXOR      SNAME     SAND
    Alfisols         A           2.505
    Andisols       E           1.466

    Steps for calculating SNAME:

     Foreach unique TAXOR:
      - Count all unique values in SNAME
      - Get the largest among the values
      - e.g. A = 3; B = 2; C = 1 for Alfisols; so Alfisols' SNAME = A
        Andisols' SNAME:  E = 2; F = 1;       so Andisols' SNAME = E


    Steps for calculating SAND:

    FOR EACH UNIQUE TAXOR:
      - Alfisols average: (1.10 + 2.23 + 3.90 + 1.80 + 4.00 + 2.00) / number of rows of Alfisols
      - Andisols average: (1.10 + 2.30 + 1.00) / number of rows of Andisols

    I'm thinking of creating an array to store each unique value for SNAME.  But there are over

    2,000 unique values! while total number of table2 rows is over half a million.

    I'm currently doing this in T-SQL since the tables are stored in SQL Server 2008. But there

    are over 100 fields to calculate; quite tedious to individual update the fields for each TAXOR.

    Is there a better way to update Table1 using C# and ADO.NET? I still have several tables to

    update having the same scenario.  Any suggestion is very much appreciated.

    Friday, May 7, 2010 2:04 AM

Answers

  • Hi Marilyn,

     

    Glad to see you again!  J

     

    If you are using VS2008, I would recommend you to use LINQ to DataSet to handle the issue.   However, since there are 2000 unique values for the SNAME column, do we have some rules to differentiate which value is largest?   I have small sample for your references:

    =============================================================================

            private void button1_Click(object sender, EventArgs e)

            {

                DataTable table1 = new DataTable();

                table1.Columns.Add("TAXOR", typeof(string));

                table1.Columns.Add("SNAME", typeof(string));

                table1.Columns.Add("SAND", typeof(decimal));

     

                DataTable table2 = table1.Clone();

                table2.Rows.Add("Alfisols", "A", 1.10);

                table2.Rows.Add("Alfisols", "B", 2.23);

                table2.Rows.Add("Alfisols", "A", 3.90);

                table2.Rows.Add("Alfisols", "A", 1.80);

                table2.Rows.Add("Alfisols", "B", 4.00);

                table2.Rows.Add("Alfisols", "C", 2.00);

                table2.Rows.Add("Andisols", "E", 1.10);

                table2.Rows.Add("Andisols", "E", 2.30);

                table2.Rows.Add("Andisols", "F", 1.00);

     

                var query = from t2 in table2.AsEnumerable()

                            group t2 by t2.Field<string>("TAXOR") into list

                            select new

                            {

                                TAXOR = list.Key,

                                SNAME = GetValue(list.Select(l => l.Field<string>("SNAME"))),

                                SAND = list.Average(l => l.Field<decimal>("SAND"))

                            };

     

                foreach (var q in query)

                {

                    table1.Rows.Add(q.TAXOR, q.SNAME, q.SAND);

                }

            }

     

            private string GetValue(IEnumerable<string> list)

            {

                Dictionary<string, int> dic = new Dictionary<string, int>();

                dic.Add("A", 3);

                dic.Add("B", 2);

                dic.Add("C", 1);

                dic.Add("E", 2);

                dic.Add("F", 1);

     

                return list.First(l => dic[l] == dic.Where(d => list.Contains(d.Key)).Max(d => d.Value));

            }

        }

    =============================================================================

    The GetValue method will return the max SNAME value.    In my sample, we create a Dictionary to store the key and every unique value.   Such a Dictionary can be a global variable and be initialized only once to improve the performance.  

     

    If you have any questions, please feel free to let me know.

     

    Have a nice weekend!

     

     

    Best Regards,
    Lingzhi Sun

    MSDN Subscriber Support in Forum

    If you have any feedback on our support, please contact msdnmg@microsoft.com


    Please remember to mark the replies as answers if they help and unmark them if they provide no help.
    Welcome to the All-In-One Code Framework! If you have any feedback, please tell us.
    Friday, May 7, 2010 8:45 AM
    Moderator
  • Hi,

    The SNAME column have distinct values for each TAXOR (eg., 'Alfisols' have about 2,447 distinct SNAMES, 'Andisols' have about 367, etc).  At this point, the only way I'm determining the "largest" or most dominant soil (and for that matter, for any of the text fields), is by counting the number of times a particular SNAME (text field) shows up in the rows for each of the unique TAXOR value (at least that's my boss's solution).

    The following SQL command works at this time for text fields (my alternate method loops through for each  order in "orders" array):

    string

     

    cmd = "UPDATE TaxOrder SET SNAME = (SELECT TOP 1 SNAME FROM " +

     

    "ConusSolLyr WHERE TAXCL <> '' AND TAXOR = '" + order + "'" +

     

    "GROUP BY SNAME ORDER BY COUNT(*) DESC) WHERE TAXOR = '" + order + "'";

    For numeric fields, I'm using the following sample SQL Command:

    string

     

    cmd = "UPDATE TaxOrder SET SAND  = (SELECT SUM(SAND )/COUNT(*) " +

     

    "FROM (SELECT SAND FROM ConusSolLyr " +

     

    "WHERE TAXCL <> '' AND TAXOR = '" + order + "' " +

     

    "GROUP BY SAND )X)WHERE TAXOR = '" + order + "'";

    I'm just wondering if there's more efficient way of doing this.

    Thanks for your attention.

    Marilyn

    Texas AgriLife Research Center

     

    Monday, May 10, 2010 1:37 PM
  • Hi Marilyn,

     

    I modified the GetValue method to fit to your scenario.  

    ========================================================================

            private string GetValue(IEnumerable<string> list)

            {

                var query = from l in list

                            group l by l into groups

                            select new { Value = groups.Key, Count = groups.Count() };

     

                var value = query.First(q => q.Count == query.Max(l => l.Count)).Value;

     

                return value;

            }

    ========================================================================

    Then all the operation can be put at the client side (inside DataSet).   However, it can be less efficient than working at the database side via T-SQL commands.   If the operation can be done at the database side, I would recommend you let the database to deal with it.   My solution is for your references if you want to use LINQ to DataSet.  J  

     

    If you want to improve the SQL statements that you are using, I would suggest you post the question at the T-SQL forum because many SQL experts may have better ideas on this problem, http://social.technet.microsoft.com/Forums/en-us/transactsql/threads.

     

    Have a great day!

     

    Best Regards,
    Lingzhi Sun

    MSDN Subscriber Support in Forum

    If you have any feedback on our support, please contact msdnmg@microsoft.com


    Please remember to mark the replies as answers if they help and unmark them if they provide no help.
    Welcome to the All-In-One Code Framework! If you have any feedback, please tell us.
    Tuesday, May 11, 2010 7:21 AM
    Moderator

All replies

  • Hi Marilyn,

     

    Glad to see you again!  J

     

    If you are using VS2008, I would recommend you to use LINQ to DataSet to handle the issue.   However, since there are 2000 unique values for the SNAME column, do we have some rules to differentiate which value is largest?   I have small sample for your references:

    =============================================================================

            private void button1_Click(object sender, EventArgs e)

            {

                DataTable table1 = new DataTable();

                table1.Columns.Add("TAXOR", typeof(string));

                table1.Columns.Add("SNAME", typeof(string));

                table1.Columns.Add("SAND", typeof(decimal));

     

                DataTable table2 = table1.Clone();

                table2.Rows.Add("Alfisols", "A", 1.10);

                table2.Rows.Add("Alfisols", "B", 2.23);

                table2.Rows.Add("Alfisols", "A", 3.90);

                table2.Rows.Add("Alfisols", "A", 1.80);

                table2.Rows.Add("Alfisols", "B", 4.00);

                table2.Rows.Add("Alfisols", "C", 2.00);

                table2.Rows.Add("Andisols", "E", 1.10);

                table2.Rows.Add("Andisols", "E", 2.30);

                table2.Rows.Add("Andisols", "F", 1.00);

     

                var query = from t2 in table2.AsEnumerable()

                            group t2 by t2.Field<string>("TAXOR") into list

                            select new

                            {

                                TAXOR = list.Key,

                                SNAME = GetValue(list.Select(l => l.Field<string>("SNAME"))),

                                SAND = list.Average(l => l.Field<decimal>("SAND"))

                            };

     

                foreach (var q in query)

                {

                    table1.Rows.Add(q.TAXOR, q.SNAME, q.SAND);

                }

            }

     

            private string GetValue(IEnumerable<string> list)

            {

                Dictionary<string, int> dic = new Dictionary<string, int>();

                dic.Add("A", 3);

                dic.Add("B", 2);

                dic.Add("C", 1);

                dic.Add("E", 2);

                dic.Add("F", 1);

     

                return list.First(l => dic[l] == dic.Where(d => list.Contains(d.Key)).Max(d => d.Value));

            }

        }

    =============================================================================

    The GetValue method will return the max SNAME value.    In my sample, we create a Dictionary to store the key and every unique value.   Such a Dictionary can be a global variable and be initialized only once to improve the performance.  

     

    If you have any questions, please feel free to let me know.

     

    Have a nice weekend!

     

     

    Best Regards,
    Lingzhi Sun

    MSDN Subscriber Support in Forum

    If you have any feedback on our support, please contact msdnmg@microsoft.com


    Please remember to mark the replies as answers if they help and unmark them if they provide no help.
    Welcome to the All-In-One Code Framework! If you have any feedback, please tell us.
    Friday, May 7, 2010 8:45 AM
    Moderator
  • Hi Again!

    Thanks for your reply.  I'm interested in the dictionary.  However, the values are not constant.  They are merely counters for each unique SNAME (Soil names).  I do have a table containing 13,299 unique US soil names. 

    What I'm trying to do is:

    1)  From one big contiguous US table, extract hierarchical tables representing soil taxonomic ORDER, SUBORDER, GREATGROUP, SUBGROUP, PARTICLE-SIZE, CLASS (with ORDER being at the top level).  I'm starting with ORDER table right now (which has the fewest number of rows).  If this works, I can apply the logic to the rest of the hierarchy--I hope.

    2) The contiguous US table consists of over 500,000 records.  ORDER has 12 distinct TAXOR values (Alfisols, Andisols, Aridisols, Entisols, Gelisols, Histosols, Inceptisols, Mollisols, Oxisols, Spodosols, Ultisols, Vertisols).  I think we can make an array for this.

      There are about 80+ fields in the US table and I need to count (text fields)/get averages (for numeric fields).  An array is also possible to hold these fields.

    3.  So, I'm thinking of looping through the order array; and then looping through each field; then apply the formulas for counting/getting averages for each field. 

    The numeric fields are straightforward: just get the average for each field within each order.

    The text fields are the tricky ones:  I need a counter for each unique value (impossible to create a dictionary because the numeric value is simply based on actual count, which we can't know beforehand, unless there's a way to do this).  The one with the largest count is then assigned to that order.

    I came across a neat code snippet for determining the maximum value; and I wonder if I can work it into the code:

    //Get the largest number in an array

     

    int[] snameCount = { 12, 7, 15, 2 }; // assuming I'm able to count each unique SNAME 

    int mxm = snameCount [0];  

    for (int i = 0; i < snameCount.Length; i++)  

    if (snameCount[i] > mxm)  

    {

    mxm = snameCount[i];

    }

    {

     

     

    I'm really grateful for your guidance.  Thanks so much,

    Marilyn

    Friday, May 7, 2010 4:27 PM
  • Hi Marilyn,

     

    Thank you for following up!   So the ORDER table can have 12 distinct values for TAXOR, then how about the SNAME column?   How can we determine which value in the SNAME is the largest?   I don’t quite understand how to calculate its value.  Could you please explain it more, maybe with some samples?  :-)  Thanks a lot!   I will do my best to help you!

     

    Have a nice day!

     

     

    Best Regards,
    Lingzhi Sun

    MSDN Subscriber Support in Forum

    If you have any feedback on our support, please contact msdnmg@microsoft.com


    Please remember to mark the replies as answers if they help and unmark them if they provide no help.
    Welcome to the All-In-One Code Framework! If you have any feedback, please tell us.
    Monday, May 10, 2010 8:34 AM
    Moderator
  • Hi,

    The SNAME column have distinct values for each TAXOR (eg., 'Alfisols' have about 2,447 distinct SNAMES, 'Andisols' have about 367, etc).  At this point, the only way I'm determining the "largest" or most dominant soil (and for that matter, for any of the text fields), is by counting the number of times a particular SNAME (text field) shows up in the rows for each of the unique TAXOR value (at least that's my boss's solution).

    The following SQL command works at this time for text fields (my alternate method loops through for each  order in "orders" array):

    string

     

    cmd = "UPDATE TaxOrder SET SNAME = (SELECT TOP 1 SNAME FROM " +

     

    "ConusSolLyr WHERE TAXCL <> '' AND TAXOR = '" + order + "'" +

     

    "GROUP BY SNAME ORDER BY COUNT(*) DESC) WHERE TAXOR = '" + order + "'";

    For numeric fields, I'm using the following sample SQL Command:

    string

     

    cmd = "UPDATE TaxOrder SET SAND  = (SELECT SUM(SAND )/COUNT(*) " +

     

    "FROM (SELECT SAND FROM ConusSolLyr " +

     

    "WHERE TAXCL <> '' AND TAXOR = '" + order + "' " +

     

    "GROUP BY SAND )X)WHERE TAXOR = '" + order + "'";

    I'm just wondering if there's more efficient way of doing this.

    Thanks for your attention.

    Marilyn

    Texas AgriLife Research Center

     

    Monday, May 10, 2010 1:37 PM
  • Hi Marilyn,

     

    I modified the GetValue method to fit to your scenario.  

    ========================================================================

            private string GetValue(IEnumerable<string> list)

            {

                var query = from l in list

                            group l by l into groups

                            select new { Value = groups.Key, Count = groups.Count() };

     

                var value = query.First(q => q.Count == query.Max(l => l.Count)).Value;

     

                return value;

            }

    ========================================================================

    Then all the operation can be put at the client side (inside DataSet).   However, it can be less efficient than working at the database side via T-SQL commands.   If the operation can be done at the database side, I would recommend you let the database to deal with it.   My solution is for your references if you want to use LINQ to DataSet.  J  

     

    If you want to improve the SQL statements that you are using, I would suggest you post the question at the T-SQL forum because many SQL experts may have better ideas on this problem, http://social.technet.microsoft.com/Forums/en-us/transactsql/threads.

     

    Have a great day!

     

    Best Regards,
    Lingzhi Sun

    MSDN Subscriber Support in Forum

    If you have any feedback on our support, please contact msdnmg@microsoft.com


    Please remember to mark the replies as answers if they help and unmark them if they provide no help.
    Welcome to the All-In-One Code Framework! If you have any feedback, please tell us.
    Tuesday, May 11, 2010 7:21 AM
    Moderator
  • Hi,

    Thanks for your insight.  I have my T-SQL codes working properly.  Just needed to know from experts if this is an efficient way of doing things.  Apparently, it is :)

    Again, many thanks.

     

    Marilyn

     

    Tuesday, May 11, 2010 4:33 PM
  • My pleasure!  :-)

     

    Best Regards,
    Lingzhi Sun

    MSDN Subscriber Support in Forum

    If you have any feedback on our support, please contact msdnmg@microsoft.com


    Please remember to mark the replies as answers if they help and unmark them if they provide no help.
    Welcome to the All-In-One Code Framework! If you have any feedback, please tell us.
    Wednesday, May 12, 2010 1:24 AM
    Moderator