locked
Varchar(max)'ed to death RRS feed

  • Question

  • I had a developer approach me today about a possible table design.  He wants to allow a person to enter free-formed text (ie varchar(max)) into 14+ columns and allow for full-text indexing on all of the columns.

    I'm trying to come up with a possible solution for this.  I was curious if anyone had some experience with this type of design and if so, how they handled it.  I was thinking about separating the table into its own filegroup and isolating the full-text index as well.

    We should only expect about 1000 records at the start, but since this developer is in love with LINQ to SQL, I can only imagine what type of SQL queries he's going to try and generate.

    Any ideas?
    Thursday, February 25, 2010 9:21 PM

Answers

  • Uck. One of the major points I try to make to people is that anytime you provide a freeform location for data to be entered, you are asking for trouble. The biggest problem with it is that searching and managing is going to always be a problem.  Sure you have only 1000 rows, and even charindex searching will be fast, but it will never be pleasant. Full-text searching is useful as a tool if you must use it, but I would much prefer a solution more like what Brian suggested because doing real searches becomes far more possible. 

    The server will let you do this, and you are right with the ou can

    I also would shy away from XML unless you really need it to be dynamic...Still too difficult.

    For your solution, if this is a read only database, I would probably suggest having the columns you suggest, but also a set of columns/tables to break out the data into relational tables. Why have data if you can't easily use SQL naturally to search it? Realistically it might be a good deal more work, but the ability to use regular b-tree indexes and search with algorithms that can pinpoint data in well formed and checked columns is extremely useful too.

    So if you have the pieces and parts to query (his range of dates for an education, class taught etc) would be really useful, since all dates would be formatted,  you could pinpoint times between other times without having to be spelled out in the text), plus the text to full-text on when looking for something you haven't predefined would give you a lot of power and understanding of your data.
    Louis

    Wednesday, March 3, 2010 2:48 AM
  • Interestingly enough, the CAT just blogged about a related topic.
    Wednesday, March 3, 2010 1:40 PM
    Answerer
  • Brian,

    Interesting conclusion from the link you provided: "Promotion to relational column is the primary option you should consider when working with XML data"


    It reminds me of a saying: "You can run but not hide " from XML.  SSRS report definitions are in XML, SSIS packages are in XML, configuration files in XML and the list just continues and grows. 



    Kalman Toth, SQL Server & BI Training, SSAS OLAP, SSIS, SSRS; http://www.SQLUSA.com
    Wednesday, March 3, 2010 10:19 PM
  • The question is, what is this for?

    If, for example, this is for student essays, and there are 14+, the idea makes sense. If, however, this is for notes clients leave over time, this is a Bad Idea.

    Once that is clear, efficiencies can be addressed. But first, what is this for? And why 14+ COLUMNs?
    Thursday, February 25, 2010 9:45 PM
    Answerer

All replies

  • The question is, what is this for?

    If, for example, this is for student essays, and there are 14+, the idea makes sense. If, however, this is for notes clients leave over time, this is a Bad Idea.

    Once that is clear, efficiencies can be addressed. But first, what is this for? And why 14+ COLUMNs?
    Thursday, February 25, 2010 9:45 PM
    Answerer
  • It's for professor "resumes".  The developer meet with a group of professors and one had a 20+ page resume.  The ~14 columns is just the tentative list of columns he wants, such as Education, Awards, Service Activites, etc.  The problem is that he wants to allow them to use a WYSIWYG editor for each column and that will create hidden HTML that needs to be stored as well.

    I've been reading more into special storage in the SQL 08 Internals book (great book by the way) and think a combination of row-overflow and lob data, it could work.  He's e-mailed me back and narrowed some of the columns down to a defined varchar length, but we'd still have major row-overflow.  Here's the e-mail text for the columns and length:

    Brief Biography - 2500
    Education - 1500
    Employment History - 2500
    Courses Taught - 5000
    Leadership Positions - 5000
    Research - 2500
    Grants Awarded - 4000
    Publications - max
    Presentations - max
    Awards and Recognitions - max
    Professional Activities - 5000
    Service Activities - 5000
    Personal Information - max
    Skills or Expertise - 2500
    Professional Development or Certifications - 2500
    Friday, February 26, 2010 1:37 AM
  • I see no problem with the proposal.

    I can anticipate the LINQ to SQL may be disappointing for searching free-form text with FULL-TEXT indexing.
    "You cannot do a kindness too soon because you never know how soon it will be too late." -Ralph Waldo Emerson
    Friday, February 26, 2010 8:29 AM
    Answerer
  • Allowing for free text and then searching seems silly. If each item is a list, use another TABLE.

    For example:

    Resume
    ---------
    Id
    Professor
    Recieved

    Resume_Education
    ----------------------
    Resume (FK Resume)
    Institution
    Started
    Ended

    Resume_Awards
    -------------------
    Resume (FK Resume)
    Name
    Received

    Resume_Service_Activities
    -------------------------------
    Resume (FK Resume)
    Name
    Started
    Ended

    etc..

    Thar way, the items are easily searchable, and each item is kept separately.

    Whether the education institutions, awards, or service activities have their own lookups would be based on the control required.
    Friday, February 26, 2010 2:14 PM
    Answerer
  • Consider using XML column to store the resumes.

    Take a look at HumanResources.JobCandidate table in AdventureWorks2008.

    Sample fragment:

    <ns:Name>
        <ns:Name.Prefix></ns:Name.Prefix>
        <ns:Name.First>Shai</ns:Name.First>
        <ns:Name.Middle></ns:Name.Middle>
        <ns:Name.Last>Bassli</ns:Name.Last>
        <ns:Name.Suffix></ns:Name.Suffix>
      </ns:Name>
      <ns:Skills>
    I am an experienced and versatile machinist who can operate a range of machinery personally as well as supervise the work of other machinists. I specialize in diagnostics and precision inspection, have expertise in reading blueprints, and am able to call on strong interpersonal and communication skills to guide the work of other production machinists whose work I am called upon to inspect. 
    My degree in mechanical engineering affords me a better theoretical understanding and mathematical background than many other candidates in the machinist trade.
        </ns:Skills>
      <ns:Employment>
        <ns:Emp.StartDate>2000-06-01Z</ns:Emp.StartDate>

    Kalman Toth, SQL Server & BI Training, SSAS OLAP, SSIS, SSRS; http://www.SQLUSA.com
    Saturday, February 27, 2010 12:40 AM
  • Not a bad idea, Kalman. XML would work, and might have other benefits.

    I'd still, myself, probably separate them. From the explanation they seem like separate items, hence distinct attributes for the resume object. New attribute, new COLUMN. I almost see the XML as a pseudo-TABLE. That is, a compromise, "sure you can keep it in the same field, just make it it's own TABLE anyway."

    Hmm.. i think the difference would be what we want. Do we want a resume that is easily searchable, or do we want the search items from the submitted resumes (but throw the resume away).
    Monday, March 1, 2010 1:48 PM
    Answerer
  • Uck. One of the major points I try to make to people is that anytime you provide a freeform location for data to be entered, you are asking for trouble. The biggest problem with it is that searching and managing is going to always be a problem.  Sure you have only 1000 rows, and even charindex searching will be fast, but it will never be pleasant. Full-text searching is useful as a tool if you must use it, but I would much prefer a solution more like what Brian suggested because doing real searches becomes far more possible. 

    The server will let you do this, and you are right with the ou can

    I also would shy away from XML unless you really need it to be dynamic...Still too difficult.

    For your solution, if this is a read only database, I would probably suggest having the columns you suggest, but also a set of columns/tables to break out the data into relational tables. Why have data if you can't easily use SQL naturally to search it? Realistically it might be a good deal more work, but the ability to use regular b-tree indexes and search with algorithms that can pinpoint data in well formed and checked columns is extremely useful too.

    So if you have the pieces and parts to query (his range of dates for an education, class taught etc) would be really useful, since all dates would be formatted,  you could pinpoint times between other times without having to be spelled out in the text), plus the text to full-text on when looking for something you haven't predefined would give you a lot of power and understanding of your data.
    Louis

    Wednesday, March 3, 2010 2:48 AM
  • Interestingly enough, the CAT just blogged about a related topic.
    Wednesday, March 3, 2010 1:40 PM
    Answerer
  • Brian,

    Interesting conclusion from the link you provided: "Promotion to relational column is the primary option you should consider when working with XML data"


    It reminds me of a saying: "You can run but not hide " from XML.  SSRS report definitions are in XML, SSIS packages are in XML, configuration files in XML and the list just continues and grows. 



    Kalman Toth, SQL Server & BI Training, SSAS OLAP, SSIS, SSRS; http://www.SQLUSA.com
    Wednesday, March 3, 2010 10:19 PM
  • The way i see it, XML is a good way to transfer data, not a good way to store it.
    Thursday, March 4, 2010 4:51 AM
    Answerer
  • Agreed. And it is an excellent format for holding structures like report definitions, config files, etc. I just wouldn't want to have to do considerable searching across it...
    Louis

    Thursday, March 4, 2010 4:57 AM