[SQLite]Group By -> Expand again RRS feed

  • Question

  • I know it is not quite good place for SQLite question though,

    I'm using it with C#

    So, I'm posting a question 

    Sorry, for this.

    I'm handling some duplication issues.

    My query goes like this;

    myQuery = "SELECT *, COUNT(myName) AS myDuplicateCount  FROM myTable GROUP BY myName HAVING COUNT(*) > 1" ;  


    myQuery Result

    Now, I'd like to expand it again.

    So, my goal is like this

    How can I achieve it ?





    [EDIT 1]

    I've found some more tips.
    So, I almost there.

    Current status..

    -Used query like this;

    myQuery = "SELECT * FROM myTable WHERE myName IN (SELECT myName FROM myTable GROUP BY myName HAVING COUNT(*) > 1)";

    The last wall is..  "ORDER BY"
    Final should be like this;
    (odered.. apple > grape  and  blue>green>red....)

    Tips please !!!


    [EDIT 2]


    myQuere = "SELECT * FROM translation_units WHERE source_segment IN (SELECT source_segment FROM translation_units GROUP BY source_segment HAVING COUNT(*) > 1) ORDER BY source_segment, target_segment";

    Friday, September 7, 2018 12:59 PM


All replies

  • If you don't want the grouping then remove the group by clause. If you want all the records in C# then retrieve them all. You can always use LINQ on the C# side to group the data if you need to do that as well.

    Michael Taylor

    Friday, September 7, 2018 1:57 PM
  • Yes, you are right.

    C# can do all of them.

    But, I thought, If I finish (almost) all possible procedures within SQLite that should be faster.


    [I have Edited the main post]
    Friday, September 7, 2018 2:29 PM
  • Yes it would be faster if you could do it all in the DB but it really depends upon how many rows we're talking about. In your code you're making a full pass over the table for the grouping and then another pass to select the entire rows. If the bulk of the rows have more than 1 entry you're actually doing double duty, ignoring any database caching and whatnot that only perf monitoring will determine.

    If the bulk of the rows do have more than one then returning them all and filtering in C# will be more efficient. If the vast majority don't have dups then grouping first and then selecting would be sufficient in the DB. Of course if you're talking about millions of rows then you wouldn't want to send all that to C# anyway.

    Since this is the C# forum the answers will be related to using C#. If you want to do this in the DB then you should post in the SQLite forum. They can show you how to combine your 2 queries into a JOIN which would be much faster than a subquery. Something along these lines is how I'm leaning but I don't know the SQLite equivalent.

    -- Really shouldn't select all columns, that is really inefficient
    SELECT * FROM myTable t1
    (SELECT myName FROM myTable GROUP BY myName HAVING COUNT(*) > 1) t2 ON t1.myName = t2.myName

    Michael Taylor

    Friday, September 7, 2018 2:47 PM
  • Got it !!

    I have added ORDER BY phrase at the end of it.

    Thanks !!

    Friday, September 7, 2018 2:47 PM
  • woo


    I will read your comment carefully with time.

    The code I figured out are quite fast.

    My item are about millions.
    -it is a translation sentences data file (Korean <-->English)

    Thanks again.

    Friday, September 7, 2018 2:55 PM
  • ah..

    duplicated items are just 10 or 20.

    quite small amount.


    Friday, September 7, 2018 2:56 PM
  • I am C# boy not SQLite.

    My main concern is C# for the past, current and later.

    Please give me a room, do not try to expel me into SQLite place.
    I prefer to use LINQ though.. this case is (I think) some special.


    Friday, September 7, 2018 3:02 PM