How to validate .csv files in an incoming request to a web application

  • Question

  • Hi, 

    I have an MVC application where we validate all incoming (uploaded) files by checking their hex signature. For example,

    a .pdf file contains the hex signature [ 25 50 44 46 2d ], and if the incoming file contains the same signature, we allow it to be saved into our system. But .csv files have no hex signature, so if anyone saves an .exe as a .csv file and uploads it, I will not be able to validate it. In this scenario, how can I validate the CSV file?

    Thursday, August 22, 2019 6:31 AM

All replies

  • There is no "signature" in .csv files, so you will need to examine the content and apply some heuristics to determine if it has the structure that you are expecting.

    First, take the array of bytes that you receive and convert it into a string by means of GetString (in System.Text.Encoding). Then examine the content of the string. For example, you can use a Regex to verify that it is made up of blocks of ASCII characters separated by commas and newlines (if that is what you expect in your CSVs). Or you can split the string into lines and loop through them, verifying that each line has a series of fields separated by commas. How far you go depends on how much you know about the expected content of your CSVs and how thorough you want your validation to be.
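
    As a rough illustration of that decode-then-inspect approach, here is a minimal sketch in Python rather than .NET. The function name looks_like_csv, the UTF-8 assumption, and the fixed expected_columns parameter are all hypothetical choices made for the example; adapt them to what your CSVs actually contain.

```python
import csv
import io


def looks_like_csv(data: bytes, expected_columns: int) -> bool:
    """Heuristic only: decodable text, no control bytes, consistent column count."""
    try:
        # Assumption: uploads are UTF-8 text. Binary files usually fail here.
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False

    # Reject control characters other than tab, carriage return and line feed.
    if any(ord(ch) < 32 and ch not in "\t\r\n" for ch in text):
        return False

    try:
        # Parse with the csv module so quoted fields containing commas are handled.
        rows = [row for row in csv.reader(io.StringIO(text, newline="")) if row]
    except csv.Error:
        return False

    if not rows:
        return False

    # Every non-empty line must have the number of fields we expect.
    return all(len(row) == expected_columns for row in rows)
```

    For example, looks_like_csv(uploaded_bytes, 2) accepts a two-column file and rejects both a binary payload and a file whose rows have differing field counts. It says nothing about whether the values themselves are sensible.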

    Thursday, August 22, 2019 8:13 AM
    Moderator
  • In general, it is a difficult problem to establish that a given file's contents actually match its extension.  It may be more productive for you just to establish a set of "banned" signatures instead.  EXE and ELF are easy to check.
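
    A deny-list check of this kind might look like the following Python sketch. The two signatures shown are the well-known leading bytes of Windows executables ("MZ", 4D 5A) and ELF binaries (7F 45 4C 46); the names BANNED_SIGNATURES and banned_format are hypothetical, and the list is deliberately not exhaustive.

```python
# Leading "magic" bytes of formats that should never be accepted as CSV.
# Illustrative list only; extend it with whatever formats you want to block.
BANNED_SIGNATURES = {
    "Windows EXE/DLL (MZ)": b"MZ",   # 4D 5A
    "ELF binary": b"\x7fELF",        # 7F 45 4C 46
}


def banned_format(data: bytes):
    """Return the name of the banned format the data starts with, or None."""
    for name, magic in BANNED_SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None
```

    One caveat of this sketch: a genuine CSV whose first field happens to begin with the letters "MZ" would also be rejected, so it may be worth combining it with a content check.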

    Tim Roberts | Driver MVP Emeritus | Providenza & Boekelheide, Inc.

    Thursday, August 22, 2019 6:49 PM
  • Depending on how many long text fields you expect and how long a typical one is, I would look at the ratio of commas (0x2C) to file size to determine whether it's a valid CSV file.

    For example, for bank autopay records, a file in which commas make up more than 5% of the bytes would usually be good enough.
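
    A minimal Python sketch of that ratio check, assuming the 5% figure is read as "commas divided by total bytes"; the function names and the default threshold are illustrative, not a recommendation for every data set.

```python
def comma_ratio(data: bytes) -> float:
    """Fraction of the file's bytes that are commas (0x2C)."""
    if not data:
        return 0.0
    return data.count(b",") / len(data)


def passes_comma_check(data: bytes, threshold: float = 0.05) -> bool:
    """True when commas make up more than `threshold` of the file."""
    return comma_ratio(data) > threshold
```

    For the 20-byte input b"1,Ann,100\n2,Bob,200\n" the ratio is 4/20, so the check passes, while a typical executable contains very few 0x2C bytes relative to its size.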

    That said, anyone who knows you implemented this check can fool it easily by appending commas to the end of the file (they don't affect execution of an unsigned EXE file), so checking for unwanted signatures is better.

    Alternatively, you can mandate that the CSV file contain a header row and use the whole header row as the signature for your application. This has the added benefit that if your schema changes and fields are added or removed, the header check will catch it and tell the user.
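
    The header-as-signature idea could be sketched as follows in Python. The column names in EXPECTED_HEADER are a made-up schema for illustration, and check_header is a hypothetical helper; the UTF-8 (optionally with BOM) encoding is also an assumption.

```python
import csv
import io

# Hypothetical schema; replace with the header row your application mandates.
EXPECTED_HEADER = ["account", "amount", "currency"]


def check_header(data: bytes, expected=EXPECTED_HEADER):
    """Return (ok, message) after comparing the first row with the expected header."""
    try:
        # "utf-8-sig" tolerates a leading byte order mark, which some tools add.
        text = data.decode("utf-8-sig")
    except UnicodeDecodeError:
        return False, "File is not valid UTF-8 text."

    try:
        header = next(csv.reader(io.StringIO(text, newline="")), None)
    except csv.Error:
        return False, "First line could not be parsed as CSV."

    if header is None:
        return False, "File is empty."

    header = [name.strip() for name in header]
    if header != list(expected):
        missing = [n for n in expected if n not in header]
        extra = [n for n in header if n not in expected]
        return False, (
            f"Header mismatch: expected {list(expected)}, got {header} "
            f"(missing {missing}, unexpected {extra})"
        )
    return True, "Header matches."
```

    The returned message can be shown to the user, which is where the schema-change benefit comes in: a file exported against an older layout is rejected with a list of the missing or unexpected columns.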


    Friday, August 23, 2019 1:33 AM
    Answerer