locked
t-sql 2012 message counts should be combined RRS feed

  • Question

  • In a t-sql 2012 database, I am suppose to count the number of times each 'canned' message is used for
     Elementary students in the last school year and the current school  year.

    The following is the sql that I currently am using:
    Declare @Delimiter char(2) = '. ';

     ;With GetTheComments(Comments) As
     (SELECT GS.Comments
      FROM dbo.Enrol Enrol
      JOIN dbo.Student Student
          ON Student.StudentID = Enrol.StudentID
      JOIN dbo.GS GS
          ON GS.StudentID = Student.Studentid
           AND (GS.Comments IS NOT NULL)
        AND (GS.Comments <> '')
      where Enrol.grade in ('KG','01','02','03','04','05','06')
      and Enrol.endYear between 2016 and 2017)

      ),
      Pieces (Comments, start, stop) AS (
    SELECT Comments, CAST(1 AS bigint) As start, CAST(CHARINDEX(@Delimiter, Comments) AS bigint) As stop
    From GetTheComments
    UNION ALL
    SELECT Comments, CAST(stop + Len(@Delimiter) As bigint) As start, CAST(CHARINDEX(@Delimiter, Comments, stop + Len(@Delimiter)) AS bigint) As stop
    FROM Pieces
    WHERE stop > 0),

    EachComment(Comments) As
    (SELECT LTrim(SUBSTRING(Comments, start, CASE WHEN stop > 0 THEN stop-start ELSE LEN(Comments) END)) AS Comments
    FROM Pieces)

    Select Comments, Count(*) As Counts
    From EachComment
    where Comments <> ''
    Group By Comments
    Order By Counts Desc, Comments Asc;
    An example of multiple messages in the one GS.Comments field would look like the following:
    This student is good at math. This student turns in their assignments on time. This student seems to enjoy school.

     An example of when one messages is in  the one GS.Comments field would look like the following:

    This student seems to enjoy school

    The problem is the GS.Comments field is defined as varchar(1200). There can be one message in the field and/or there can be lots of messages in this field. Each message ends with a period and there is a space between each message that have
    more than 1 message in the varchar(1200) field. When there is only 1 message, there is no period afer the message.

    The problem is there can be results that look like the following

    message                                                                count
    This student seems to enjoy school.                           150
    This student seems to enjoy school                              25
    .
    Basically there is a count for a message with a period and there is a couunt for when there is not a period for the same message.

    Thus can you show me what I can do with the sql listed above so the messages are treated as being the same. Basically I want 1 message with 1 count for what I listed above.

    Thursday, May 25, 2017 10:02 PM

Answers

  • If the data is that well-behaved, it should work with changing this line:

    SELECT GS.Comments

    to read:

    SELECT CASE WHEN right(rtrim(GS.Comments), 1) = '.'             THEN GS.Comments
                ELSE GS.Comments + '. '
           END  AS Comments

    Thursday, May 25, 2017 10:16 PM
  • Apparently you have access to the data, since you are running queries over it. And you have even posted examples from it.

    Anyway, we only need to see a few examples that illustrate the problem. So you need to work with extract that. And, who knows, maybe that process will reveal that some assumptions you have made about the data are not correct.

    As I said, parsing free-text data like this is precarious, and I cannot type queries over data I don't see and I only have a verbal description of. My experience tells me that there are often more variation than one like to think. So be able to help you, I need the examples as INSERT statement. Adn you need to verify that the sample you post exhibits the problem you are seeing.

    Wednesday, May 31, 2017 7:28 AM

All replies

  • If the data is that well-behaved, it should work with changing this line:

    SELECT GS.Comments

    to read:

    SELECT CASE WHEN right(rtrim(GS.Comments), 1) = '.'             THEN GS.Comments
                ELSE GS.Comments + '. '
           END  AS Comments

    Thursday, May 25, 2017 10:16 PM
  • I tried what you suggested with modifying my original sql with the following:

    Select CASE WHEN right(rtrim(Comments), 1) = '.'
                THEN Comments
             ELSE Comments + '.'
           END  AS Comments
    , Count(*) As Counts
    From EachComment
    where Comments <> ''
    Group By Comments
    --Order By Counts Desc, Comments Asc;
    Order BY Comments Asc;

    All I ended up with getting an extra space added to the expression I was trying to compare with. This caused me to get the same results. Maybe I could remove the period from all the expressions. That would probably work.

    Thus would you show me how to accomplish what I am trying to do and/or maybe suggest another alternative.

    Friday, May 26, 2017 9:03 PM
  • It's difficult to give advice without actually seeing the data. What I tried last night should work - if it is as simple as you presented it. Alas, free-text columns rarely are that well-behaved, but there tends to be a lot of variation.

    Can you post CREATE TABLE statements and INSERT statements with some sample text and your query? I guess that for this purpose you can rip out the Students and Enrol tables, so we only need to work with the Comments table. Make sure that the sample data produces the problem you are seeing.

    Friday, May 26, 2017 9:59 PM
  • Just replace undesirable characters in your group by at the top

    Like so:

    Select replace(replace(comment, '.', ''), ',', '')

    From sometable

    this will strip away the discrepencies in your comments and give you the correct result

    Pesonally though it may be better to say:

    With mydata as(

    Select Comment, ROW_NUMBER()OVER(Partition by Comment, order by Comment) as RN

    WHERE comment like '%Enjoyed School%')

    Select max(rn), Comment

    group by comment

    Order by comment


    Monday, May 29, 2017 3:24 AM
  • In response to your question for more information, here it is:

    1. The table definition is the following:

    CREATE TABLE [dbo].[GS](
     [scoreID] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL, 
     [personID] [int] NOT NULL,
     [date] [smalldatetime] NULL,
     [comments] [varchar](1200) NULL
     )
     
     CONSTRAINT [PK_GS] PRIMARY KEY NONCLUSTERED
    (
     [scoreID] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 97) ON [PRIMARY],
     CONSTRAINT [UQ_GS_externalLMSSourcedId] UNIQUE NONCLUSTERED
     )
     [externalLMSSourcedId] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]

    2 Here is the example of where the problem exists:
    Comments                                                       counts

     Being a good listener is very important in learning.           758
     Being a good listener is very important in learning              61 
     Remember, good listening makes learning fun                      856 
     Remember, good listening makes learning fun.                      67

    As you can see, the messages are same except for the existence of the period in the statements.

    3. Here is sql to get results from only getting from GS table:

    Select comments, count(*) as counts
    from  GS
    group by comments
    order by comments

    My original sql was on the original post.

    Can you tell me what you suggest that I try next?



    Tuesday, May 30, 2017 1:02 AM
  • I can not use ' '%Enjoyed School%', since there are lots of comments where I need count the number of times the values is used.
    Tuesday, May 30, 2017 1:05 AM
  • Show us your query after you added what Erland suggested as that meets what you described as the issue
    Tuesday, May 30, 2017 2:48 AM
  • To be able to help you, we also need INSERT statements with sample data that demonstrates the problem. Sorry, I am not going to make guesses about the data.

    Tuesday, May 30, 2017 7:05 AM
  • I do not have same insert data. The is a third party user interface is where the data is entered. The best I can do is to show you some rows from the table which I did. is there anything else I can show you?
    Tuesday, May 30, 2017 11:44 PM
  • Apparently you have access to the data, since you are running queries over it. And you have even posted examples from it.

    Anyway, we only need to see a few examples that illustrate the problem. So you need to work with extract that. And, who knows, maybe that process will reveal that some assumptions you have made about the data are not correct.

    As I said, parsing free-text data like this is precarious, and I cannot type queries over data I don't see and I only have a verbal description of. My experience tells me that there are often more variation than one like to think. So be able to help you, I need the examples as INSERT statement. Adn you need to verify that the sample you post exhibits the problem you are seeing.

    Wednesday, May 31, 2017 7:28 AM