Reading all text content (including notes) from a pptx file with Open XML SDK v2.5

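  • Question

  • I'm new to Open XML SDK v2.5. I have downloaded and installed the SDK, and I need to read all of the text content in a pptx file. The SDK is so large that I haven't found a starting point, so I'm asking for help here. A sample project or the core code would be ideal. Thanks.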



    IT从业者

    December 29, 2020, 8:40

All replies

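  • Hi,

    Based on my testing, you can try the code below to get all of the text content from a pptx file with Open XML.

    First, install the NuGet package DocumentFormat.OpenXml.

    Next, add the required using directives.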

    using System;
    using System.Collections.Generic;
    using DocumentFormat.OpenXml.Presentation;
    using A = DocumentFormat.OpenXml.Drawing;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml;
    using System.Text;
    using System.Linq;

    Finally, you can refer to the sample code below:

     static void Main(string[] args)
            {
                // Verbatim string: a single backslash is enough here.
                string file = @"E:\test.pptx";
                int numberOfSlides = CountSlides(file);
                Console.WriteLine("Number of slides = {0}", numberOfSlides);
                string slideText;
                for (int i = 0; i < numberOfSlides; i++)
                {
                    GetSlideIdAndText(out slideText, file, i);
                    Console.WriteLine("Slide #{0} contains: {1}", i + 1, slideText);
                }
                Console.ReadKey();
            }
    
            public static int CountSlides(string presentationFile)
            {
                // Open the presentation as read-only.
                using (PresentationDocument presentationDocument = PresentationDocument.Open(presentationFile, false))
                {
                    // Pass the presentation to the next CountSlides method
                    // and return the slide count.
                    return CountSlides(presentationDocument);
                }
            }
    
            // Count the slides in the presentation.
            public static int CountSlides(PresentationDocument presentationDocument)
            {
                // Check for a null document object.
                if (presentationDocument == null)
                {
                    throw new ArgumentNullException("presentationDocument");
                }
    
                int slidesCount = 0;
    
                // Get the presentation part of document.
                PresentationPart presentationPart = presentationDocument.PresentationPart;
                // Get the slide count from the SlideParts.
                if (presentationPart != null)
                {
                    slidesCount = presentationPart.SlideParts.Count();
                }
                // Return the slide count to the previous method.
                return slidesCount;
            }
    
            public static void GetSlideIdAndText(out string sldText, string docName, int index)
            {
                using (PresentationDocument ppt = PresentationDocument.Open(docName, false))
                {
                    // Get the relationship ID of the slide at the given zero-based index.
                    PresentationPart part = ppt.PresentationPart;
                    OpenXmlElementList slideIds = part.Presentation.SlideIdList.ChildElements;
    
                    string relId = (slideIds[index] as SlideId).RelationshipId;
    
                    // Get the slide part from the relationship ID.
                    SlidePart slide = (SlidePart)part.GetPartById(relId);
    
                    // Build a StringBuilder object.
                    StringBuilder paragraphText = new StringBuilder();
    
                    // Get the inner text of the slide:
                    IEnumerable<A.Text> texts = slide.Slide.Descendants<A.Text>();
                    foreach (A.Text text in texts)
                    {
                        paragraphText.Append(text.Text);
                    }
                    sldText = paragraphText.ToString();
                }
            }
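
    The sample above reopens the file once per slide and appends every text run without a separator. A possible variant, offered as an untested sketch rather than a replacement, opens the file once, walks SlideIdList in presentation order, and puts each paragraph on its own line. The names SlideTextReader and GetAllSlideText are hypothetical and not part of the SDK.

```csharp
using System.Collections.Generic;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Presentation;
using A = DocumentFormat.OpenXml.Drawing;

public static class SlideTextReader   // hypothetical class name
{
    // Hypothetical helper: returns one string per slide, in presentation order.
    public static List<string> GetAllSlideText(string docName)
    {
        var slideTexts = new List<string>();

        // Open the presentation once, read-only.
        using (PresentationDocument ppt = PresentationDocument.Open(docName, false))
        {
            PresentationPart part = ppt.PresentationPart;
            if (part?.Presentation?.SlideIdList == null)
            {
                return slideTexts;   // no slides to read
            }

            // SlideIdList reflects the order in which slides are shown.
            foreach (SlideId slideId in part.Presentation.SlideIdList.Elements<SlideId>())
            {
                SlidePart slide = (SlidePart)part.GetPartById(slideId.RelationshipId);
                var sb = new StringBuilder();

                // Join the runs of each paragraph, then end the paragraph with a line break.
                foreach (A.Paragraph paragraph in slide.Slide.Descendants<A.Paragraph>())
                {
                    foreach (A.Text run in paragraph.Descendants<A.Text>())
                    {
                        sb.Append(run.Text);
                    }
                    sb.AppendLine();
                }

                slideTexts.Add(sb.ToString());
            }
        }

        return slideTexts;
    }
}
```

    Calling SlideTextReader.GetAllSlideText(file) from Main and looping over the returned list would take the place of the CountSlides and GetSlideIdAndText pair.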

    The end result should be console output listing the text found on each slide.
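
    The sample covers slide bodies only, while the question also asks for the notes. One way to reach them, as a minimal sketch I have not run against your file, is through SlidePart.NotesSlidePart, which is null when a slide has no notes page. NotesReader and GetNotesText are hypothetical names used for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Presentation;
using A = DocumentFormat.OpenXml.Drawing;

public static class NotesReader   // hypothetical class name
{
    // Hypothetical helper: returns the notes text of each slide, in presentation order.
    // A slide without a notes page yields an empty string.
    public static List<string> GetNotesText(string docName)
    {
        var notesTexts = new List<string>();

        using (PresentationDocument ppt = PresentationDocument.Open(docName, false))
        {
            PresentationPart part = ppt.PresentationPart;
            if (part?.Presentation?.SlideIdList == null)
            {
                return notesTexts;   // no slides to read
            }

            foreach (SlideId slideId in part.Presentation.SlideIdList.Elements<SlideId>())
            {
                SlidePart slide = (SlidePart)part.GetPartById(slideId.RelationshipId);
                NotesSlidePart notes = slide.NotesSlidePart;

                if (notes == null)
                {
                    notesTexts.Add(string.Empty);
                    continue;
                }

                // One line per paragraph, each built from its text runs.
                IEnumerable<string> lines = notes.NotesSlide
                    .Descendants<A.Paragraph>()
                    .Select(p => string.Concat(p.Descendants<A.Text>().Select(t => t.Text)));

                notesTexts.Add(string.Join(Environment.NewLine, lines));
            }
        }

        return notesTexts;
    }
}
```

    A notes page may also carry a slide-number placeholder, so the returned text can include that number alongside the notes themselves.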

    Best Regards,

    Jack



    December 30, 2020, 2:18
  • Thanks, I'll test it.

    IT从业者

    January 5, 2021, 8:36