Detect file type?

    Question

    How would I detect the filetype of a file?

     

    Thanks,

    paoloTheCool

    Wednesday, September 26, 2007 7:19 PM

Answers

  • For files that contain a header (e.g. .bmp, .wav), you can open the file and examine the header to determine the type.  For files that don't have a header (e.g. .txt), you will have to come up with some other way to do it.

     

    Usually the file's extension identifies the file type.  If you don't trust that, then you will have to examine the file.

     

    Chris

    Wednesday, September 26, 2007 9:21 PM
  • Let's not even think about what happens if you're trying to examine an encrypted file (without having it decrypted by the file system for you).

    The only two ways are to go by header (or footer in some cases) or by file extension. I could go create my own file type with a specific format, no header, and call it an .sff. How would you expect to know what this is? It could be something like a XAML file, where analysis of it (ignoring any headers) would tell you it might be a text file... or an XML file... or a XAML file...

    I think you should stick to the file extension, then check file types that carry tag information for the presence of the proper tagging. If the file extension fails, you can check for tags. However, if the tag checks fail, treat it as unknown (my only slight clarification of Chris' explanation).

    What is your goal in checking for file type?
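
    A minimal sketch of that order of checks (extension first, then tags, otherwise unknown). The class name FileTypeGuess and the small signature table are made up for illustration; a real table would need many more entries, and some signatures (RIFF, for instance) are shared by several formats.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileTypeGuess
{
    // Hypothetical table for illustration: extension -> leading bytes of the file.
    private static readonly Dictionary<string, byte[]> Signatures =
        new Dictionary<string, byte[]>(StringComparer.OrdinalIgnoreCase)
        {
            { ".bmp", new byte[] { 0x42, 0x4D } },             // "BM"
            { ".png", new byte[] { 0x89, 0x50, 0x4E, 0x47 } }, // "\x89PNG"
            { ".gif", new byte[] { 0x47, 0x49, 0x46, 0x38 } }, // "GIF8"
            { ".wav", new byte[] { 0x52, 0x49, 0x46, 0x46 } }, // "RIFF" (also used by .avi)
        };

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    // fileName must not be null; header holds the first bytes of the file.
    // Returns the matching extension from the table, or "unknown".
    public static string Guess(string fileName, byte[] header)
    {
        string ext = Path.GetExtension(fileName);

        // 1. Accept the extension when the header agrees with it.
        byte[] expected;
        if (Signatures.TryGetValue(ext, out expected) && StartsWith(header, expected))
            return ext.ToLowerInvariant();

        // 2. Extension missing, not in the table, or contradicted: try every known tag.
        foreach (KeyValuePair<string, byte[]> entry in Signatures)
        {
            if (StartsWith(header, entry.Value))
                return entry.Key;
        }

        // 3. No tag matched: report unknown rather than guessing.
        return "unknown";
    }
}
```

    For example, a PNG that was renamed to picture.bmp comes back as ".png", and a plain text file comes back as "unknown".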
    Wednesday, September 26, 2007 10:25 PM

All replies

  • If you mean finding out the extension, like .dll, .jpg, .mp3, etc., then just use:
    string ext = System.IO.Path.GetExtension(path);
    or
    System.IO.FileInfo f = new System.IO.FileInfo("C:\\blah.txt");
    return f.Extension;

    Wednesday, September 26, 2007 7:49 PM
  • Getting the extension and finding out the file type are completely different things.  I could save a file with a ".jpeg" extension when it is really a ".ico" file.

     

    Thanks,

    paoloTheCool

    Wednesday, September 26, 2007 8:48 PM
  • There isn't a way to access the "file type" by those standards.  .exe = App file. .jpg, .bmp, .png = Image file. .mp3, .mp4, .ogg = audio file.  The only way to find out the file type is by the extension, I'm pretty sure about that.
    Wednesday, September 26, 2007 9:19 PM
  • Ok...so I guess there's no easy method for accurately detecting the file type.  I'll just use extensions.

     

    Thanks,

    paoloTheCool

     

    Thursday, September 27, 2007 4:48 PM
  • Kloot, try understanding the question before you answer it.

    You aren't getting this one, so just leave it be.

    If you don't understand the question, or think you should because you need to show how "smart" you are, DON'T.


    "Better to remain silent and be thought a fool than to open one's mouth and remove all doubt."  -- Walt Whitman (I believe)
    Wednesday, February 06, 2008 5:54 AM
  • You created an account just to insult someone?

    Kloot was mistaken, but his mistake is an extremely common one among inexperienced computer users.

     

    It's common enough that even some operating systems and major applications created by multibillion dollar companies make it.

     

    Yes, I'm talking about Microsoft Windows here. Under NT4 and Windows 95, Internet Explorer would use the extension, and the extension only, to determine which external application to use to show files it couldn't render on its own (and it may well have used extensions only for internal display as well).

    This despite HTTP actually sending the file type in the response headers (Content-Type), so it DID have the type available (if the server correctly sent it, of course).

     

    So before you make a fool of yourself again by telling someone else how foolish he is, do some research (or better yet, forget about it).
    Wednesday, February 06, 2008 8:20 PM
  • Paolo, there may be an easy way. But it all depends on what you're trying to detect.

    If it's a specific filetype (like JPEG images), all you need to do is figure out the constant signature of the filetype and check for that in files.

    While not 100% certain (some types may share a signature, for example) it's pretty reliable.

     

    Of course if you are planning to detect all kinds of files, and don't know in advance which you will get, that's indeed not something that's going to work.

     

    But in that case extensions also won't work. And they're unreliable anyway, as it's quite possible to use different extensions than the standard ones.

    And of course multiple file formats may well share the same extension.

    For example the .doc and .dat extensions are quite common, and used for many different actual formats.
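
    A minimal sketch of such a signature check for JPEG, assuming the usual leading bytes FF D8 FF. The class and method names are made up for illustration, and a match only means the file starts like a JPEG, not that the rest of it is valid.

```csharp
using System;
using System.IO;

public static class JpegSignature
{
    // JPEG data starts with the SOI marker FF D8, followed by FF of the next marker.
    private static readonly byte[] Magic = { 0xFF, 0xD8, 0xFF };

    // True when the buffer begins with the JPEG signature.
    public static bool HasJpegSignature(byte[] header)
    {
        if (header == null || header.Length < Magic.Length)
            return false;
        for (int i = 0; i < Magic.Length; i++)
        {
            if (header[i] != Magic[i])
                return false;
        }
        return true;
    }

    // Reads only the first bytes of the file; the extension is ignored entirely.
    public static bool IsJpegFile(string path)
    {
        byte[] header = new byte[Magic.Length];
        int total = 0;
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            // Read may return fewer bytes than requested, so loop until full or EOF.
            int read;
            while (total < header.Length &&
                   (read = fs.Read(header, total, header.Length - total)) > 0)
            {
                total += read;
            }
        }
        return total == header.Length && HasJpegSignature(header);
    }
}
```

    Calling JpegSignature.IsJpegFile on an .exe that was renamed to .jpg returns false, since an executable starts with "MZ" rather than FF D8 FF.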

    Wednesday, February 06, 2008 8:26 PM
  • dang...this is an old thread..

     

    Quoting Rob V.
    Kloot, try understanding the question before you answer it.

    You aren't getting this one, so just leave it be.

    If you don't understand the question, or think you should because you need to show how "smart" you are, DON'T.


    "Better to remain silent and be thought a fool than to open one's mouth and remove all doubt."  -- Walt Whitman (I believe)

     

     


    Seriously, why did you create an account just to insult someone...just for the record, you didn't give any insight into the topic so Cameron's reply is more valuable than yours...

     

    paoloTheCool

    Wednesday, February 06, 2008 9:11 PM
  • Look up "Winista"; it uses binary analysis to determine a file's MIME type...

    Say someone renames an exe with a jpg extension... you can still determine the "real" file format.  It doesn't detect SWFs or FLVs, but it handles pretty much every other well-known format, and with a hex editor you can add more file types for it to detect.

    HTH
    Wednesday, May 28, 2008 11:38 PM
  • Where Winista fails to detect the real file format, I've fallen back on the URLMon method:


        using System;
        using System.IO;
        using System.Runtime.InteropServices;

        public class urlmonMimeDetect
        {
            // pBC and ppwzMimeOut are pointers, so they are declared as IntPtr
            // (UInt32 would truncate them in a 64-bit process). The string
            // parameters are LPCWSTR, so they are marshalled as LPWStr.
            [DllImport("urlmon.dll", CharSet = CharSet.Unicode)]
            private static extern int FindMimeFromData(
                IntPtr pBC,
                [MarshalAs(UnmanagedType.LPWStr)] string pwzUrl,
                [MarshalAs(UnmanagedType.LPArray)] byte[] pBuffer,
                uint cbSize,
                [MarshalAs(UnmanagedType.LPWStr)] string pwzMimeProposed,
                uint dwMimeFlags,
                out IntPtr ppwzMimeOut,
                uint dwReserved
            );

            public string GetMimeFromFile(string filename)
            {
                if (!File.Exists(filename))
                    throw new FileNotFoundException(filename + " not found");

                // Only the first 256 bytes are handed to URLMon for sniffing.
                byte[] buffer = new byte[256];
                int bytesRead;
                using (FileStream fs = new FileStream(filename, FileMode.Open, FileAccess.Read))
                {
                    bytesRead = fs.Read(buffer, 0, buffer.Length);
                }
                try
                {
                    IntPtr mimeTypePtr;
                    // Pass the number of bytes actually read, not the buffer size.
                    int hr = FindMimeFromData(IntPtr.Zero, null, buffer, (uint)bytesRead, null, 0, out mimeTypePtr, 0);
                    if (hr != 0 || mimeTypePtr == IntPtr.Zero)
                        return "unknown/unknown";
                    string mime = Marshal.PtrToStringUni(mimeTypePtr);
                    Marshal.FreeCoTaskMem(mimeTypePtr);
                    return mime;
                }
                catch (Exception)
                {
                    return "unknown/unknown";
                }
            }

        }

     

    From inside the Winista method, I fall back on the URLMon here:

     

            public MimeType GetMimeTypeFromFile(string filePath)
            {
                // Read the whole file and convert it to the sbyte[] Winista expects
                byte[] data = File.ReadAllBytes(filePath);
                sbyte[] fileData = Winista.Mime.SupportUtil.ToSByteArray(data);

                MimeType oMimeType = GetMimeType(fileData);
                if (oMimeType != null) return oMimeType;

                //We haven't found the file using Magic (eg a text/plain file)
                //so instead use URLMon to try and get the file's format
                Winista.MimeDetect.URLMONMimeDetect.urlmonMimeDetect urlmonMimeDetect = new Winista.MimeDetect.URLMONMimeDetect.urlmonMimeDetect();
                string urlmonMimeType = urlmonMimeDetect.GetMimeFromFile(filePath);
                if (!string.IsNullOrEmpty(urlmonMimeType))
                {
                    foreach (MimeType mimeType in types)
                    {
                        if (mimeType.Name == urlmonMimeType)
                        {
                            return mimeType;
                        }
                    }
                }

                //Still nothing: oMimeType is null at this point
                return oMimeType;
            }

    Jeremy - MCP | MCAD.Net | MCSD.Net
    Monday, April 19, 2010 6:24 AM