none
Convert XLSX file to XLS file quickly RRS feed

  • Question

  • Hello,

    I have several XLSX files and I would like to convert each of its files to XLS format in a very fast time, that is to say the conversion of an XLSX file to XLS should take some 100 ms.
    To do this, I had thought of using Interop, but it is very slow, at least 8 seconds to convert an XLSX file to XLS.
    Here is my code with Interop:

    string fileNameXlsx = @"C:\file1.xlsx";
    
    string fileNameXls = @ "C:\file1.xls";
    
    _Application excelTemplate = new Excel.Application ();
    excelTemplate.Visible = false; // do not open the file
    
    excelTemplate.DisplayAlerts = false;
    WbTemplate Workbook;wbTemplate = excelTemplate.Workbooks.Open(fileNameXlsx);
    
    wbTemplate.SaveAs(fileNameXls, FileFormat: XlFileFormat.xlExcel8);
    wbTemplate.Close(true);
    excelTemplate.Quit();
     
    With Interop, it is very slow to convert XLSX to XLS.

    Do you know of a library in C# that allows you to convert an XLSX file to XLS very quickly (ideally 300 ms/file) ?
    If yes, could you give me an example of C # code, please?

    Thanks.
    • Edited by speed780 Monday, June 29, 2020 4:19 PM
    Monday, June 29, 2020 3:04 PM

All replies

  • The only code/library I've seen that can perform this fast is GemBox SpreadSheet which has a Load and Save method e.g. use Load to open a .xlsx then Save off the object for the Load method to save as .xls. Downside is the price tag.

    Please remember to mark the replies as answers if they help and unmarked them if they provide no help, this will help others who are looking for solutions to the same or similar problem. Contact via my Twitter (Karen Payne) or Facebook (Karen Payne) via my MSDN profile but will not answer coding question on either.

    NuGet BaseConnectionLibrary for database connections.

    StackOverFlow
    profile for Karen Payne on Stack Exchange

    Monday, June 29, 2020 3:32 PM
    Moderator
  • Thank you for your reply. Do you know free version with no limitation licence, please?

    Thanks.

    Monday, June 29, 2020 3:44 PM
  • While 8 seconds seems really slow (perhaps it is just the COM startup time) why do you need 100ms? Are you trying to do something as part of a task that needs to run fast instead of making this an async call? Or is it a numbers things where you need to process 100s of files and you cannot do them in parallel for some reason?

    One thing I do notice about your code is that you're calling save twice. You're also closing the app each time so this is equivalent to opening Excel, saving the file and then closing it. If you wanted to process files fast you'd open Excel once and then open/save repeatedly.

    Of course the speed of opening and saving is completely dependent upon the size of the Excel file. There are other factors as well including if your AV is scanning the folder you are saving to, whether you have addins that slow things down, whether you're using Office 365 which may need to sync credentials, etc.


    Michael Taylor http://www.michaeltaylorp3.net

    Monday, June 29, 2020 4:13 PM
    Moderator
  • I have 8000 XLSX files and each size is 2MB.

    How do you open once Excel if you have 100 XLSX files?

    Thanks.

    Monday, June 29, 2020 4:25 PM
  • var app = new Excel.Application () {
       Visible = false, // do not open the file
       DisplayAlerts = false
    };
    
    //Not Tested!!!
    foreach (var file in files)
    {
       var template = app.Workbooks.Open(file);
       template.SaveAs(Path.ChangeExtension(file, ".xls"), FileFormat: XlFileFormat.xlExcel8);
    
       //Already saved, no reason to save again
       template.Close(false);
    };
    
    app.Quit();
    `files` is the list of filenames to save a copy of.

    Michael Taylor http://www.michaeltaylorp3.net

    Monday, June 29, 2020 4:40 PM
    Moderator
  • Thank you for your reply. It takes 6 seconds/file with your code. I have at least 8000 XLSX, so 48000 seconds. It is too slow. I'm wondering how to reduce this time?

    Thanks.


    • Edited by speed780 Monday, June 29, 2020 4:50 PM
    Monday, June 29, 2020 4:48 PM
  • Thank you for your reply. Do you know free version with no limitation licence, please?

    Thanks.

    No I don't, over the years I figured out the cost of such a library was well worth the cost, first used Apose Cells then moved to GemBox. Both get the job done no matter the task and done quickly.

    EDIT: I always present no cost solutions when available and know they work.


    Please remember to mark the replies as answers if they help and unmarked them if they provide no help, this will help others who are looking for solutions to the same or similar problem. Contact via my Twitter (Karen Payne) or Facebook (Karen Payne) via my MSDN profile but will not answer coding question on either.

    NuGet BaseConnectionLibrary for database connections.

    StackOverFlow
    profile for Karen Payne on Stack Exchange


    Monday, June 29, 2020 4:50 PM
    Moderator
  • " It takes 6 seconds/file with your code"

    How many files did you send it? If you sent it one then it won't take 6 seconds. If you sent it 10 or 100 and they are averaging 6 seconds then perhaps.

    If you do it sequentially then it is going to take a while but you can parallelize this into batches and speed it up dramatically. You're ultimately limited by your machine and Excel but even having 2 threads of work that divides the work into (theoretically) half. 4 threads would be quarters. This is how we work around large batches of work, not necessarily "speeding up" the code. 6 seconds still seems really slow though.

    I ran the above code against 5 2.5MB files and it took less than 6 seconds for all of them. Taking the perf hit to log the time after each call revealed that each file took ~1 second which is realistic for reading 2MB files and generating 8MB files in return. I used to work in soft realtime and 100ms isn't realistic for file operations of this size except in ideal circumstances, or it wasn't then.

    Of course going from 2.5 MB XSLX to XLS yielded 8MB files. Doing some math here - 10 MB * 8K files = 84 GB so it'll take a while unless you have an SSD. 


    Michael Taylor http://www.michaeltaylorp3.net

    Monday, June 29, 2020 5:13 PM
    Moderator
  • It's strange, I tested your code with one XLSX file and it took 6 seconds, but if I test with 100 XLSX files, it takes 28 seconds. 

    I have the impression that the more I put XLSX files to convert in your code, the faster it goes, how to explain that?

    By the way, why when the conversion from XLSX file to XLS file increases the size of XLS file so much?

    How do you parrallelize your previous code to gain time process about conversion from XLSX file to XLS file, please?

    Thanks.

    Monday, June 29, 2020 9:53 PM
  • Startup costs. Using interop requires that the COM server be spun up. Once it is running, reusing it (aka not quiting) allows it to run faster overall (ignoring the COM unload policies). Hence why I said that profiling a single file isn't sufficient. 

    "By the way, why when the conversion from XLSX file to XLS file increases the size of XLS file so much?"

    I cannot answer that. The XLSX format is XML so it would normally be bigger but it is possible the XLS binary format stores more data because it is using fixed-length structures whereas XML does not. You'd have to ask MS that question.

    "How do you parrallelize your previous code to gain time process about conversion from XLSX file to XLS file, please?"

    I probably wouldn't. Let's do the math. 100 files every 30 seconds means you'd do 8000 files in about 2400 seconds. That is 40 minutes. It'll take you longer to get the parallel code working most likely.

    But if you really want an example (not tested) then you're going to have to take the hit of creating the Excel object in each thread. Excel is COM based and IIRC it requires an STA which serializes all calls to a single thread. Therefore the thread must be created as an STA which you have to do manually. Once the STA is created you would run the code I posted earlier. However instead of passing all the files you want to process you'd have to manually split them up first into batches. The more files you run in a batch the longer it takes to run but the less overhead creating the COM object costs. In a perfect threading world where you had X processors running at once you could run X threads at once. However you most likely have at most 8 processors so 8 threads are maximal. Therefore I would recommend you break your work into batches of say 500 (which would take about 1.5 minutes each). Then enumerate through until you run out of files. Each batch would need to spawn a new thread as an STA. This gives you about 16 threads that are running in "parallel". Each thread would use the earlier posted code to run against the files it is given. Unfortunately the more threads you run the more instances of Excel in memory which means more memory and CPU will be needed.


    Michael Taylor http://www.michaeltaylorp3.net

    Monday, June 29, 2020 10:10 PM
    Moderator
  • Well, XLSX is "zipped XML files", but XLS is not a compressed format. (Consider when Excel is first released and everyone was still working on low speed CPU and floppy, using encryption that adds workload beyond opening the file will cause it very difficult to load the file by parts. It's design is to make it feasible to open the file by streams and substreams so applications and open exactly what it needs to access instead of need to transverse the whole file) So XLS files are naturally bigger than corresponding XLSX file.
    Tuesday, June 30, 2020 1:38 AM
    Answerer
  • Hi speed780,

    You can also use FreeSpire.XLS to do this, it is the free version of Spire.XLS..

    When I test it, it will be slightly faster than Microsoft.Office.Interop.Excel, and if the excel file is plain text without too much format, the xls file generated in this way will be smaller.

                string[] fileInfos = Directory.GetFiles(@"D:\test\excel");
                Workbook book = new Workbook();
                for (int i = 0; i < fileInfos.Length; i++)
                {
                    book.LoadFromFile(fileInfos[i]);
                    book.SaveToFile(@"D:\test\excel\test2\" + i + ".xls", ExcelVersion.Version97to2003);
                }

    Best Regards,

    Timon


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Tuesday, June 30, 2020 2:15 AM
  • Thanks, but there is a watermark with Spire.XLS. It is not free to have the full features of Spire.XLS.

    Do you know another solution?

    Thanks.

    Tuesday, June 30, 2020 10:20 PM
  • Hi speed780,

    I tried searching for other feasible solutions but no results.

    Microsoft.Office.Interop.Excel may be the best choice now.

    Best Regards,

    Timon


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    Thursday, July 2, 2020 1:46 AM