How to split my PDF file in a specific way

  • Question

  • My question is how to split a PDF file in a specific way. I am using the iTextSharp library. I have a folder where many PDF files will exist. I need to iterate over the PDF file collection in a loop, read each file one by one, and split it in a specific way.

    1) Suppose I have a PDF file called abc.pdf that has 10 pages, and the user passes 3 to the split routine. Then I need to split the PDF file into 3 smaller files: abc-1.pdf will contain 3 pages, abc-2.pdf will contain 3 pages, and abc-3.pdf will contain 4 pages.

    So I need to extract 3 pages from abc.pdf and create a new PDF file called abc-1.pdf. Then I extract 3 more pages from abc.pdf and create another PDF file called abc-2.pdf. Finally I extract 4 pages instead of 3 and create another PDF file called abc-3.pdf.

    What should happen if the user passes 4 to the split routine? In that case I need to split the PDF file so that abc-1.pdf contains 4 pages and abc-2.pdf contains 6 pages.
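A minimal sketch of the output naming convention described above (abc.pdf becomes abc-1.pdf, abc-2.pdf, ...), using Path.GetFileNameWithoutExtension and Path.Combine instead of slicing the name by hand and joining folders with "\\". OutputNames and PartPath are hypothetical names, not part of the original code.

```csharp
using System;
using System.IO;

static class OutputNames
{
    // Hypothetical helper: path of the n-th piece of a source PDF,
    // e.g. ("abc.pdf", folder, 2) -> folder/abc-2.pdf.
    public static string PartPath(string sourcePdf, string outputFolder, int partNumber)
    {
        // "abc.pdf" -> "abc"
        string baseName = Path.GetFileNameWithoutExtension(sourcePdf);
        // Path.Combine inserts the platform's directory separator.
        return Path.Combine(outputFolder, baseName + "-" + partNumber + ".pdf");
    }
}
```

Path.Combine uses the separator of the platform the code runs on, so the same call works for the Windows paths used in this thread.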

    My routine below works, but if the user passes 3 it creates 4 PDF files instead of 3, and if the user passes 4 it creates 3 PDF files instead of 2.

    I know a few modifications are required in the split routine, but it is not clear to me what to modify, or where, to achieve my goal. Please help and show me what to change in the split routine to get the desired output.

    Looking for guidance.
    private int Split(string pdffilelocation, int userinput, int PageCountBeforeSplit)
    {
        int totalpage = PageCountBeforeSplit;
        int pagesize = userinput;
    
        int newpagecount = totalpage % pagesize != 0
            ? totalpage / pagesize + 1
            : totalpage / pagesize;
    
        string pdfFilePath = pdffilelocation;
        string outputPath = TargetPdfFileLocation;
        int interval = userinput; // newpagecount;
        int pageNameSuffix = 0;
    
        FileInfo file = new FileInfo(pdffilelocation);
        string pdfFileName = file.Name.Substring(0, file.Name.LastIndexOf(".")) + "-";
    
        // Dispose the reader when done so the source file is not left open.
        using (PdfReader reader = new PdfReader(pdffilelocation))
        {
            for (int pageNumber = 1; pageNumber <= reader.NumberOfPages; pageNumber += interval)
            //for (int pageNumber = 1; pageNumber <= newpagecount; pageNumber += interval)
            {
                pageNameSuffix++;
                string newPdfFileName = string.Format(pdfFileName + "{0}", pageNameSuffix);
                SplitAndSaveInterval(pdfFilePath, outputPath, pageNumber, interval, newPdfFileName);
            }
        }
    
        return pageNameSuffix;
    }
    
    private void SplitAndSaveInterval(string pdfFilePath, string outputPath, int startPage, int interval, string pdfFileName)
    {
        using (PdfReader reader = new PdfReader(pdfFilePath))
        {
            Document document = new Document();
            PdfCopy copy = new PdfCopy(document, new FileStream(outputPath + "\\" + pdfFileName + ".pdf", FileMode.Create));
            document.Open();
    
            // Copy up to `interval` pages starting at startPage, stopping at the end of the document.
            for (int pagenumber = startPage; pagenumber < (startPage + interval); pagenumber++)
            {
                if (reader.NumberOfPages >= pagenumber)
                {
                    copy.AddPage(copy.GetImportedPage(reader, pagenumber));
                }
                else
                {
                    break;
                }
            }
    
            document.Close();
        }
    }
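The file count can be checked without iTextSharp. OriginalFileCount below replays the loop in Split: it starts a new file every interval pages, so 10 pages with interval 3 start files at pages 1, 4, 7 and 10 (4 files), and interval 4 starts files at 1, 5 and 9 (3 files). Ranges is a hypothetical helper, a sketch assuming the rule described in the question: every file gets pagesPerFile pages, the last file also takes the leftover pages, and there is always at least one file. Each (Start, Count) pair could be passed as startPage and interval to SplitAndSaveInterval. It uses value tuples, so it assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;

static class SplitPlan
{
    // Replays the loop in the question's Split method: one output file per iteration.
    public static int OriginalFileCount(int totalPages, int interval)
    {
        int files = 0;
        for (int pageNumber = 1; pageNumber <= totalPages; pageNumber += interval)
        {
            files++;
        }
        return files;
    }

    // Hypothetical helper: (Start, Count) pairs where each file gets pagesPerFile pages
    // and the last file also takes the leftover pages.
    public static List<(int Start, int Count)> Ranges(int totalPages, int pagesPerFile)
    {
        if (totalPages < 1) throw new ArgumentOutOfRangeException(nameof(totalPages));
        if (pagesPerFile < 1) throw new ArgumentOutOfRangeException(nameof(pagesPerFile));

        // Integer division gives the number of files; keep at least one.
        int fileCount = Math.Max(1, totalPages / pagesPerFile);
        var ranges = new List<(int Start, int Count)>();
        int start = 1;
        for (int i = 0; i < fileCount; i++)
        {
            // The last file runs to the end of the document.
            int count = i == fileCount - 1 ? totalPages - start + 1 : pagesPerFile;
            ranges.Add((start, count));
            start += count;
        }
        return ranges;
    }
}
```

For example, Ranges(10, 3) yields (1, 3), (4, 3), (7, 4) and Ranges(10, 4) yields (1, 4), (5, 6), which matches the abc-1/abc-2/abc-3 examples in the question.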



    • Edited by Sudip_inn Wednesday, August 8, 2018 7:25 PM
    Wednesday, August 8, 2018 7:25 PM

Answers

  • Hi Sudip_inn,

    Thank you for posting here.

    For your question, please try the code below. It splits the PDF file using the interval parameter. If the file has 10 pages and the interval is 3, it is split into 3 files with 3, 3 and 4 pages. If the interval is 4, it is split into 2 files with 4 and 6 pages. If the interval is 6, it is split into 2 files with 6 and 4 pages, and so on.

    using iTextSharp.text;
    using iTextSharp.text.pdf;
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    
    namespace ConsoleApp
    {
        /// <summary>
        /// Splits "Split PDF file.pdf" into smaller PDF files.
        /// </summary>
        class Split_PDF_File
        {
            static void Main(string[] args)
            {
                int interval = 2;
                Split(interval);
            }
    
            public static void Split(int interval)
            {
                string pdfFilePath = @"Split PDF file.pdf";
                string outputPath = @"C:\Users\wendyz.WICRESOFT.000\Desktop\PDF";
    
                int pageNameSuffix = 0;
    
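                // Initialize a new PdfReader instance to read the page count of the source PDF file.
                // The using block disposes the reader so the file is not left open.
                int totalPages;
                using (PdfReader reader = new PdfReader(pdfFilePath))
                {
                    totalPages = reader.NumberOfPages;
                }
    
                if (interval < 1)
                {
                    throw new ArgumentOutOfRangeException(nameof(interval));
                }
    
                FileInfo file = new FileInfo(pdfFilePath);
                string pdfFileName = file.Name.Substring(0, file.Name.LastIndexOf(".")) + "-";
    
                // One entry per output file: the number of pages that file receives.
                List<int> list = new List<int>();
                int value1 = totalPages / interval;   // number of full chunks
                int value2 = totalPages % interval;   // leftover pages
    
                for (int i = 0; i < value1; i++)
                {
                    list.Add(interval);
                }
    
                if (list.Count == 0)
                {
                    // The interval is larger than the document: keep all pages in one file.
                    list.Add(totalPages);
                }
                else if (list.Count == 1)
                {
                    // One full chunk: the leftover pages go to a second file (10 pages, interval 6 -> 6,4).
                    // No second file when nothing is left over, because iTextSharp will not save a document with no pages.
                    if (value2 > 0)
                    {
                        list.Add(value2);
                    }
                }
                else
                {
                    // Several full chunks: the leftover pages go to the last file (10 pages, interval 3 -> 3,3,4).
                    list[list.Count - 1] += value2;
                }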
    
    
                int startPage = 1;
                foreach (var page in list)
                {
                    pageNameSuffix++;
                    string newPdfFileName = string.Format(pdfFileName + "{0}", pageNameSuffix);
                    SplitAndSaveInterval(pdfFilePath, outputPath, startPage, page, newPdfFileName);
                    startPage = startPage + page;
                }
    
                //for (int pageNumber = 1; pageNumber <= reader.NumberOfPages; pageNumber += interval)
                //{
                //    pageNameSuffix++;
                //    string newPdfFileName = string.Format(pdfFileName + "{0}", pageNameSuffix);
                //    SplitAndSaveInterval(pdfFilePath, outputPath, pageNumber, interval, newPdfFileName);
                //}
    
            }
            public static void SplitAndSaveInterval(string pdfFilePath, string outputPath, int startPage, int page, string pdfFileName)
            {
                using (PdfReader reader = new PdfReader(pdfFilePath))
                {
                    Document document = new Document();
                    PdfCopy copy = new PdfCopy(document, new FileStream(outputPath + "\\" + pdfFileName + ".pdf", FileMode.Create));
                    document.Open();
    
                    for (int pagenumber = startPage; pagenumber < (startPage + page); pagenumber++)
                    {
                        if (reader.NumberOfPages >= pagenumber)
                        {
                            copy.AddPage(copy.GetImportedPage(reader, pagenumber));
                        }
                        else
                        {
                            break;
                        }
                    }
                    document.Close();
                }
            }
        }
    }
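The list-building step of this answer can be checked on its own, without iTextSharp or a real PDF. A minimal sketch of the same rule as a pure function (IntervalSplit and PageCounts are hypothetical names), with two assumptions added: an interval larger than the document yields a single file, and an interval equal to the page count does not yield an empty second file, which iTextSharp typically rejects with a "document has no pages" error.

```csharp
using System;
using System.Collections.Generic;

static class IntervalSplit
{
    // Page counts per output file, following the answer's rules.
    public static List<int> PageCounts(int totalPages, int interval)
    {
        if (interval < 1) throw new ArgumentOutOfRangeException(nameof(interval));

        var list = new List<int>();
        int fullChunks = totalPages / interval; // value1 in the answer
        int leftover = totalPages % interval;   // value2 in the answer

        for (int i = 0; i < fullChunks; i++)
        {
            list.Add(interval);
        }

        if (list.Count == 0)
        {
            // Interval larger than the document: one file with everything.
            list.Add(totalPages);
        }
        else if (list.Count == 1)
        {
            // One full chunk: leftover pages become a second file (10 pages, interval 6 -> 6,4).
            if (leftover > 0) list.Add(leftover);
        }
        else
        {
            // Several full chunks: leftover pages are appended to the last file (10 pages, interval 3 -> 3,3,4).
            list[list.Count - 1] += leftover;
        }
        return list;
    }
}
```

For a 10-page file this gives 3,3,4 for interval 3, 4,6 for interval 4, 5,5 for interval 5 and 6,4 for interval 6.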
    

    If you have something else, please feel free to contact us.

    Best Regards,

    Wendy


    MSDN Community Support
    Please remember to click "Mark as Answer" the responses that resolved your issue, and to click "Unmark as Answer" if not. This can be beneficial to other community members reading this thread. If you have any compliments or complaints to MSDN Support, feel free to contact MSDNFSF@microsoft.com.

    • Proposed as answer by Stanly Fan Friday, August 10, 2018 6:02 AM
    • Marked as answer by Sudip_inn Sunday, April 7, 2019 12:30 PM
    Thursday, August 9, 2018 9:33 AM
    Moderator

All replies

  • I have done the job this way.

            private void button1_Click(object sender, EventArgs e)
            {
                // Verbatim string: in "c:\abc.pdf" the \a would be read as an escape character.
                Split(@"c:\abc.pdf", 5, 36);
            }
    
            private void Split(string pdffilelocation, int userinput, int PageCountBeforeSplit)
            {
                int newsplit = 0;
                int totalpage = PageCountBeforeSplit;
                int pagesize = userinput;
                int mod = totalpage % pagesize;   // leftover pages, added to the last file
    
                int interval = userinput;
    
                // <= in both comparisons; with < the last page is skipped when interval is 1.
                for (int pageNumber = 1; pageNumber <= totalpage; pageNumber += interval)
                {
                    if (pageNumber + interval + mod <= totalpage)
                    {
                        SplitAndSaveInterval(pageNumber, interval);
                    }
                    else
                    {
                        // Last file: the regular chunk plus the leftover pages.
                        SplitAndSaveInterval(pageNumber, interval + mod);
                        pageNumber = totalpage;   // ends the loop after this iteration
                    }
                    newsplit++;
                }
    
                MessageBox.Show("New split " + newsplit.ToString());
            }
    
            private void SplitAndSaveInterval(int startPage, int interval)
            {
                // Test stub: only reports the page range that would be written.
                MessageBox.Show("Start Page " + startPage + " End Page " + (startPage + interval - 1));
            }
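The loop above can be replayed without MessageBox or iTextSharp to see which (start page, page count) pairs it produces. A minimal sketch; ModSplit and Plan are hypothetical names. It uses <= in both comparisons: with < in both comparisons and interval 1, a 10-page file ends after the call for page 9, so page 10 is never written. It uses value tuples, so it assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;

static class ModSplit
{
    // Records each (startPage, pageCount) call the split loop would make.
    public static List<(int Start, int Count)> Plan(int totalPages, int interval)
    {
        if (interval < 1) throw new ArgumentOutOfRangeException(nameof(interval));

        int mod = totalPages % interval; // leftover pages
        var calls = new List<(int Start, int Count)>();
        for (int pageNumber = 1; pageNumber <= totalPages; pageNumber += interval)
        {
            if (pageNumber + interval + mod <= totalPages)
            {
                calls.Add((pageNumber, interval));
            }
            else
            {
                // Last file: the regular chunk plus the leftover pages.
                calls.Add((pageNumber, interval + mod));
                break;
            }
        }
        return calls;
    }
}
```

Plan(36, 5), the call made from button1_Click, yields six 5-page files followed by (31, 6). If the interval exceeds the page count, the single call asks for more pages than exist, which SplitAndSaveInterval tolerates because it stops at the last page.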

    Wednesday, August 8, 2018 8:27 PM
  • Working code

    Calling
    ============
            private void btnRead_Click(object sender, EventArgs e)
            {
                PDFPageCounter.Utilities.PdfOperations _pdf = new PDFPageCounter.Utilities.PdfOperations();
                _pdf.SourcePdfFileLocation = Environment.CurrentDirectory + @"\..\..\PDFDump";
                _pdf.TargetPdfFileLocation = Environment.CurrentDirectory + @"\..\..\PDFMoved";
                DataTable dt = _pdf.Process(5);
            }
    
    
    Utility
    ===============
    
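        using System;
        using System.Collections.Generic;
        using System.Data;
        using System.IO;
        using iTextSharp.text;
        using iTextSharp.text.pdf;
    
        public class PdfOperations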
        {
            public string SourcePdfFileLocation { get; set; }
            public string TargetPdfFileLocation { get; set; }
            private int totalPages { get; set; }
    
    
            public DataTable Process(int userinput)
            {
                int currentpagecount = 0;
                List<PdfData> pdfdata = null;
    
                DataTable dt = new DataTable();
                dt.Columns.Add("PdfGuidName", typeof(string));
                dt.Columns.Add("PdfLocation", typeof(string));
                dt.Columns.Add("PageCountBeforeSplit", typeof(int));
                dt.Columns.Add("PageCountAfterSplit", typeof(int));
    
    
                if (userinput > 0)
                {
                    string[] pdffiles = Directory.GetFiles(SourcePdfFileLocation, "*.pdf", SearchOption.AllDirectories);
    
                    if (pdffiles.Length > 0)
                    {
                        pdfdata = new List<PdfData>();
    
                        foreach (string pdffile in pdffiles)
                        {
                            //currentpagecount = GetPdfPageCount(pdffile);
                            totalPages = 0;
                            pdfdata.Add(new PdfData()
                            {
                                PdfGuidName = Path.GetFileName(pdffile),
                                PdfLocation = pdffile,
                                PageCountAfterSplit = Split(pdffile, userinput),
                                //PageCountBeforeSplit = currentpagecount,
                                PageCountBeforeSplit = totalPages,
                            });
                        }
                    }
    
                    if (pdfdata != null && pdfdata.Count > 0)
                    {
                        foreach (PdfData _data in pdfdata)
                        {
                            DataRow dr = dt.NewRow();
    
                            dr[0] = _data.PdfGuidName;
                            dr[1] = _data.PdfLocation;
                            dr[2] = _data.PageCountBeforeSplit;
                            dr[3] = _data.PageCountAfterSplit;
    
                            dt.Rows.Add(dr);
    
                            if(File.Exists(_data.PdfLocation))
                            {
                                File.Delete(_data.PdfLocation);
                            }
                        }
                    }
                }
    
                return dt;
            }
    
            private int Split(string pdffilelocation, int userinput)
            {
                int pageNameSuffix = 0;
                string pdfFilePath = pdffilelocation;
                string outputPath = TargetPdfFileLocation;
                int interval = userinput;
    
                // Dispose the reader; otherwise the source file may stay locked
                // and File.Delete in Process can fail.
                using (PdfReader reader = new PdfReader(pdffilelocation))
                {
                    totalPages = reader.NumberOfPages;
                }
    
                int totalpage = totalPages;
                int mod = totalpage % interval;   // leftover pages, added to the last file
    
                FileInfo file = new FileInfo(pdffilelocation);
                string pdfFileName = file.Name.Substring(0, file.Name.LastIndexOf(".")) + "-";
    
                // <= in both comparisons; with < the last page is skipped when interval is 1.
                for (int pageNumber = 1; pageNumber <= totalpage; pageNumber += interval)
                {
                    pageNameSuffix++;
                    string newPdfFileName = string.Format(pdfFileName + "{0}", pageNameSuffix);
    
                    if (pageNumber + interval + mod <= totalpage)
                    {
                        SplitAndSaveInterval(pdfFilePath, outputPath, pageNumber, interval, newPdfFileName);
                    }
                    else
                    {
                        // Last file: the regular chunk plus the leftover pages.
                        SplitAndSaveInterval(pdfFilePath, outputPath, pageNumber, (interval + mod), newPdfFileName);
                        break;
                    }
                }
    
                return pageNameSuffix;
            }
    
            private void SplitAndSaveInterval(string pdfFilePath, string outputPath, int startPage, int interval, string pdfFileName)
            {
                using (PdfReader reader = new PdfReader(pdfFilePath))
                {
                    Document document = new Document();
                    PdfCopy copy = new PdfCopy(document, new FileStream(outputPath + "\\" + pdfFileName + ".pdf", FileMode.Create));
                    document.Open();
    
                    for (int pagenumber = startPage; pagenumber < (startPage + interval); pagenumber++)
                    {
                        if (reader.NumberOfPages >= pagenumber)
                        {
                            copy.AddPage(copy.GetImportedPage(reader, pagenumber));
                        }
                        else
                        {
                            break;
                        }
    
                    }
    
                    document.Close();
                }
            }
    
            //private int GetPdfPageCount(string pdflocation)
            //{
            //    int numberOfPages = 0;
    
            //    try
            //    {
            //        PdfReader pdfReader = new PdfReader(pdflocation);
            //        numberOfPages = pdfReader.NumberOfPages;
            //    }
            //    catch (Exception ex)
            //    {
            //        numberOfPages = 0;
            //    }
            //    return numberOfPages;
            //}
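        }
    
        // PdfData is not shown in the post; this shape is inferred from how Process uses it.
        public class PdfData
        {
            public string PdfGuidName { get; set; }
            public string PdfLocation { get; set; }
            public int PageCountBeforeSplit { get; set; }
            public int PageCountAfterSplit { get; set; }
        }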
    

    Thursday, August 9, 2018 6:59 AM
  • You said: "You could split PDF file with parameter interval. If I have 10 pages, the interval is 3. The PDF file will be split into 3 files with 3,3,4 pages. If the interval is 4, the PDF file will be split into 2 files with 4,6 pages. If the interval is 5, the PDF file will be split into 2 files with 6,4 pages and so on."

    The logic I am looking for is:

    1) If the total page count is 10 and the user splits into 2 files, then 2 files will be generated, each with 5 pages.

    2) If the total page count is 10 and the user splits into 3 files, then 3 files will be generated: the first 2 files will have 3 pages and the last file will have 4 pages.

    3) If the total page count is 10 and the user splits into 4 files, then 4 files will be generated: the first 3 files will have 3 pages and the last file will have 1 page.

    4) If the total page count is 10 and the user splits into 5 files, then 5 files will be generated, each with 2 pages.

    5) If the total page count is 10 and the user splits into 6 files, then the first 5 files will have 1 page and the last file will have 5 pages.

    6) If the total page count is 10 and the user splits into 7 files, then the first 6 files will have 1 page and the last file will have 4 pages.

    7) If the total page count is 10 and the user splits into 8 files, then the first 7 files will have 1 page and the last file will have 3 pages.

    8) If the total page count is 10 and the user splits into 9 files, then the first 8 files will have 1 page and the last file will have 2 pages.

    9) If the total page count is 10 and the user splits into 10 files, then 10 files will be generated, each with 1 page.

    Again, for a file with 9 pages:

    1) If the total page count is 9 and the user splits into 2 files, then the first file will have 4 pages and the last file will have 5 pages.

    2) If the total page count is 9 and the user splits into 3 files, then each of the 3 files will have 3 pages.

    3) If the total page count is 9 and the user splits into 4 files, then the first 3 files will have 2 pages and the last file will have 3 pages.

    4) If the total page count is 9 and the user splits into 5 files, then the first 4 files will have 2 pages and the last file will have 2 pages.

    5) If the total page count is 9 and the user splits into 6 files, then the first 5 files will have 1 page and the last file will have 4 pages.

    6) If the total page count is 9 and the user splits into 7 files, then the first 6 files will have 1 page and the last file will have 3 pages.

    7) If the total page count is 9 and the user splits into 8 files, then the first 7 files will have 1 page and the last file will have 2 pages.

    8) If the total page count is 9 and the user splits into 9 files, then each file will have 1 page.

    Hopefully it is clear what I am looking for. Please test your code and you will see that it does not return the data the way I need.

    So my request, sir: if possible, please modify your routine so that I get my desired output. Thanks.
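The examples in this reply can be compared against two candidate rules. A minimal sketch (FileCountRules, FloorRule and CeilingRule are hypothetical names): both give every file except the last the same size and let the last file take what is left; they differ only in whether that size is rounded down or up. Rounding down reproduces every example except item 3 for 10 pages (3,3,3,1) and item 4 for 9 pages, whose five 2-page files add up to 10 rather than 9. Rounding up reproduces 3,3,3,1, but for 9 pages and 4 files it leaves the last file empty.

```csharp
using System;
using System.Collections.Generic;

static class FileCountRules
{
    // Every file but the last gets floor(totalPages / files) pages; the last takes the rest.
    public static List<int> FloorRule(int totalPages, int files)
    {
        if (files < 1) throw new ArgumentOutOfRangeException(nameof(files));
        return Build(totalPages, files, totalPages / files);
    }

    // Every file but the last gets ceil(totalPages / files) pages; the last takes the rest.
    public static List<int> CeilingRule(int totalPages, int files)
    {
        if (files < 1) throw new ArgumentOutOfRangeException(nameof(files));
        return Build(totalPages, files, (totalPages + files - 1) / files);
    }

    private static List<int> Build(int totalPages, int files, int basePages)
    {
        var sizes = new List<int>();
        for (int i = 0; i < files - 1; i++)
        {
            sizes.Add(basePages);
        }
        // Whatever is left over goes to the last file (can be 0 or negative for CeilingRule).
        sizes.Add(totalPages - basePages * (files - 1));
        return sizes;
    }
}
```

With FloorRule, 10 pages in 4 files becomes 2,2,2,4, so one rule has to be chosen before the split routine can be written. Neither rule handles more files than pages, since some files would receive no pages.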


    Friday, August 10, 2018 7:21 PM
  • Hi Sudip_inn,

    I need to confirm a question with you first.

    >>4) if total page is 9 and user split into 5 files then first 4 files will have 2 pages and last page will have 2 pages.

    According to your feedback, it should be 1,1,1,1,5 pages. Right?

    If so, and you want to split a PDF file in the format below, please try the code below.

    pages  files  split
     9      2      4,5
     9      3      3,3,3
     9      4      2,2,2,3
     9      5      1,1,1,1,5
     9      6      1,1,1,1,1,4
     9      7      1,1,1,1,1,1,3
     9      8      1,1,1,1,1,1,1,2
     9      9      1,1,1,1,1,1,1,1,1

    Here is the code.

            static void Main(string[] args)
            {
                int fileCount = 2;
                Split(fileCount);
            }
    
            public static void Split(int fileCount)
            {
                string pdfFilePath = @"Split PDF file2.pdf";
                string outputPath = @"C:\Users\wendyz.WICRESOFT.000\Desktop\PDF";
    
                int pageNameSuffix = 0;
    
                // Initialize a new PdfReader instance to read the page count of the source PDF file.
                int totalPages;
                using (PdfReader reader = new PdfReader(pdfFilePath))
                {
                    totalPages = reader.NumberOfPages;
                }
    
                // Every output file needs at least one page.
                if (fileCount < 1 || fileCount > totalPages)
                {
                    throw new ArgumentOutOfRangeException(nameof(fileCount));
                }
    
                FileInfo file = new FileInfo(pdfFilePath);
                string pdfFileName = file.Name.Substring(0, file.Name.LastIndexOf(".")) + "-";
    
                // The first (fileCount - 1) files get totalPages / fileCount pages each;
                // the last file also takes the remainder (e.g. 9 pages, 5 files -> 1,1,1,1,5).
                // With fileCount == 1 this yields a single file containing every page.
                List<int> list = new List<int>();
                int value1 = totalPages / fileCount;
                int value2 = totalPages % fileCount;
    
                for (int i = 0; i < fileCount - 1; i++)
                {
                    list.Add(value1);
                }
                list.Add(value1 + value2);
    
                int startPage = 1;
                foreach (var page in list)
                {
                    pageNameSuffix++;
                    string newPdfFileName = string.Format(pdfFileName + "{0}", pageNameSuffix);
                    SplitAndSaveInterval(pdfFilePath, outputPath, startPage, page, newPdfFileName);
                    startPage = startPage + page;
                }
            }
    
            public static void SplitAndSaveInterval(string pdfFilePath, string outputPath, int startPage, int page, string pdfFileName)
            {
                using (PdfReader reader = new PdfReader(pdfFilePath))
                {
                    Document document = new Document();
                    PdfCopy copy = new PdfCopy(document, new FileStream(outputPath + "\\" + pdfFileName + ".pdf", FileMode.Create));
                    document.Open();
    
                    for (int pagenumber = startPage; pagenumber < (startPage + page); pagenumber++)
                    {
                        if (reader.NumberOfPages >= pagenumber)
                        {
                            copy.AddPage(copy.GetImportedPage(reader, pagenumber));
                        }
                        else
                        {
                            break;
                        }
                    }
                    document.Close();
                }
            }

    Best Regards,

    Wendy


    MSDN Community Support



    Wednesday, August 15, 2018 3:04 AM
    Moderator
  • Sir, it was a nice answer. Thanks.
    Saturday, September 12, 2020 12:21 PM