locked
Not able to organize query results to show the top 3 values for each grouping RRS feed

  • Question

  • Here are some tables with a simple PK/FK relationship:

    [REGION]
    region_id
    region
    [LANGUAGE]
    language_id
    region_id
    language
    [LEVEL]
    level_id
    region_id
    level

    I would like to display the top 3 languages spoken in each region where the level of understanding = 5.  This query gets me all the values I need, but I don't know how to organize them so they appear as a summarized grouping.

    select r.region, l.langauge, count(l.language) as lcount
    from Region r inner join Language l
    on r.region_id = l.region_id
    inner join Level v on r.region_id = v.region_id
    where v.level = 5
    group by r.region, l.language
    order by r.region, lcount desc

    I plan to display the results in a grid-style control (and not in a report), so I would have each region listed across the top of the grid with 3 rows beneath displaying the top 3 languages spoken in each of the individual regions (there can be several different languages, but I just want to pull out the most popular 3).

    region1 region2 region3 etc
    lang1   lang1   lang1
    lang2   lang2   lang2
    lang3   lang3   lang3
    Any suggestions on how to better write this query and organize the resulting data?

    Saturday, November 24, 2012 2:15 AM

Answers

    • Edited by Kalman Toth Saturday, November 24, 2012 9:01 AM
    • Marked as answer by jabberpunch Monday, November 26, 2012 3:08 PM
    Saturday, November 24, 2012 8:58 AM
  • Hi,

    Try with the below query

    WITH CTE AS(select r.region, l.langauge, count(l.language) as lcount, ROW_NUMBER() OVER(PARTITION BY r.region order by r.region) rn from Region r inner join Language l on r.region_id = l.region_id inner join Level v on r.region_id = v.region_id where v.level = 5

    )

    SELECT * FROM CTE WHERE rn=1


    PS.Shakeer Hussain

    • Marked as answer by jabberpunch Monday, November 26, 2012 3:08 PM
    Saturday, November 24, 2012 1:33 PM
  • Actually, if you need to show 3 top used languages, I think you need

    ;WITH CTE AS(select r.region, l.language, count(l.language) OVER (partition by r.region) as lCount
    from Region r inner join Language l
    on r.region_id = l.region_id
    inner join Level v on r.region_id = v.region_id
    where v.level = 5
    
    ), cte1 AS (select * from cte
    , ROW_NUMBER() OVER(PARTITION BY region order by lCount DESC) AS rn)
    
    SELECT * FROM CTE WHERE rn<=3 -- get 3 top used languages per region

    So, in the first cte we took count of each language per region, then in the second cte we ordered them by the count desc, so in the final result we took 3 top used languages per region.  For your final result you will need to PIVOT them, but let's first see if you get correct results using the above and if yes, post how the result looks like, I may help to pivot (although it's a simple static PIVOT).


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog


    • Edited by Naomi N Sunday, November 25, 2012 8:07 PM
    Sunday, November 25, 2012 7:47 PM

All replies

    • Edited by Kalman Toth Saturday, November 24, 2012 9:01 AM
    • Marked as answer by jabberpunch Monday, November 26, 2012 3:08 PM
    Saturday, November 24, 2012 8:58 AM
  • Hi,

    Try with the below query

    WITH CTE AS(select r.region, l.langauge, count(l.language) as lcount, ROW_NUMBER() OVER(PARTITION BY r.region order by r.region) rn from Region r inner join Language l on r.region_id = l.region_id inner join Level v on r.region_id = v.region_id where v.level = 5

    )

    SELECT * FROM CTE WHERE rn=1


    PS.Shakeer Hussain

    • Marked as answer by jabberpunch Monday, November 26, 2012 3:08 PM
    Saturday, November 24, 2012 1:33 PM
  • for your case of senarion you need to use Connect by prior clause along with pivot function... u need to convert all the regions as columns and display top 3 langaues spoken.

    http://stackoverflow.com/questions/959804/simulation-of-connect-by-prior-of-oracle-in-sql-server

    SQL Champ
    Database Consultants NY

    Sunday, November 25, 2012 5:04 AM
  • Have you result pointed desired answer?
    Sunday, November 25, 2012 7:42 AM
  • Wow, some really great information there and now I've jsut started reading about CTE's and ROW_NUMBER() OVER PARTITION BY.  It's going to take a bit to wrap my head around these concepts but should be really useful to me in the future!

    I also tried Syed's query straight up but received the following error:

    Column 'Region.region' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

    I'm not sure in which part of the overall query I should be placing the Group By??

    Sunday, November 25, 2012 4:05 PM
  • I think in his query you should change count(l.Language) as lCount to

    COUNT(l.Language) OVER (partition by r.region) as lCount

    for the above query to work. Also, if you need 3 languages per region change this to

    select * from cte where Rn <=3

    Once you get that result, you will need to PIVOT to get your final result.


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog

    Sunday, November 25, 2012 4:34 PM
  • Hi Naomi,

    Would I change it to SELECT * FROM CTE WHERE lcount <=3 instead of "rn" since that is what you have aliased the OVER PARTITION BY with??

    This is all new to me, so just trying to figure out what's going on in this query exactly.  Going to look into Pivot function now to see how that is supposed to work.  Thanks!

    Sunday, November 25, 2012 6:54 PM
  • Actually, if you need to show 3 top used languages, I think you need

    ;WITH CTE AS(select r.region, l.language, count(l.language) OVER (partition by r.region) as lCount
    from Region r inner join Language l
    on r.region_id = l.region_id
    inner join Level v on r.region_id = v.region_id
    where v.level = 5
    
    ), cte1 AS (select * from cte
    , ROW_NUMBER() OVER(PARTITION BY region order by lCount DESC) AS rn)
    
    SELECT * FROM CTE WHERE rn<=3 -- get 3 top used languages per region

    So, in the first cte we took count of each language per region, then in the second cte we ordered them by the count desc, so in the final result we took 3 top used languages per region.  For your final result you will need to PIVOT them, but let's first see if you get correct results using the above and if yes, post how the result looks like, I may help to pivot (although it's a simple static PIVOT).


    For every expert, there is an equal and opposite expert. - Becker's Law


    My blog


    • Edited by Naomi N Sunday, November 25, 2012 8:07 PM
    Sunday, November 25, 2012 7:47 PM
  • >> Here are some tables with a simple PK/FK relationship: <<

    No there are no tables! There is a list of names and not one line of DDL. A table has to have a key, by definition. Posting DDL and sample data is minimal netiquette. Even the list of names our posted does not follow ISO-11179 rules. And you did not bother with sample data. 

    Guessing at what a polite person might have done, this data would be in one table. Language is an attribute of a region, not an entity in itself. 

    Did you even bother to research ISO codes for these attributes? Look up http://www.dmoz.org/guidelines/regional/taxonomy.html. What is the language_proficiency scale? I have seen a 0 (none) to 5 (native speaker). Thre is an ISO languagee code. 

    CREATE TABLE Regions
    (region_code CHAR(5) NOT NULL,
     language_code CHAR(3) NOT NULL
     CHECK (language_code LIKE '[a-z][a-z][a-z]'), --ISO 639-2 Code 
     PRIMARY KEY (region_code, language_code),
     language_proficiency SMALLINT DEFAULT 0 NOT NULL
     CHECK (language_proficiency BETWEEN 0 AND 5));

    >> I would like to display the top 3 languages spoken in each region where the level of understanding = 5. This query gets me all the values I need, but I don't know how to organize them so they appear as a summarized grouping. <<

    This specification useless. How you want to handle ties? What if there is only one language in a region at any proficiency level? What does “top” mean? Alphabetical order? Speaker population size? Your narrative implies speaker population, but there is no such data element in the vague narrative. If you had a properly designed normalized table, the query would have been simply:

    SELECT region_code, language_code 
      FROM Regions 
     WHERE language_proficiency = 5;

    If you need to get the region and language name, you can do it in the presentation layer, or with auxiliary tables which appear in REFERENCES .. ON UPDATE CASCADE clauses in the Regions table.  I tend to translate them in the front end so I can adjust the display language. 

    >> I plan to display the results in a grid-style control (and not in a report), so I would have each region listed across the top of the grid with 3 rows beneath displaying the top 3 languages spoken in each of the individual regions (there can be several different languages, but I just want to pull out the most popular 3).<<

    What is a “grid-style control” in SQL? In a tiered architecture, the database does the “data stuff” and throws it over the wall. Period. No display. That is a presentation layer. We do not do presentation layers here. 



    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL

    • Proposed as answer by Naomi N Sunday, November 25, 2012 8:09 PM
    • Unproposed as answer by jabberpunch Monday, November 26, 2012 3:09 PM
    Sunday, November 25, 2012 7:56 PM