How to read text that is present in text box of MS word document?

Question
Hi, I have a Word document that I want to convert to a text (.txt) file programmatically, using C#. I can already read paragraphs and tables from the document and convert them to text. The document also contains some text boxes, and I want to read their text and write it to the text file. My problem is that I do not know which collection those text boxes are stored in. For example, all tables are stored in the Tables collection and all paragraphs in the Paragraphs collection. Can anyone please tell me how to read from these text boxes? Please let me know if you need any additional information.
I am using MS Office 2003.
Many Thanks,
Thanks and Regards, shekhar kotekar
Thursday, February 11, 2010 10:36 AM
All replies
Hi Shekhar,
I think you should use the Shapes collection:
    using Word = Microsoft.Office.Interop.Word;
    //[...]
    // Attach to a Word instance that is already running.
    Word.Application wordApp = (Word.Application)System.Runtime.InteropServices.Marshal.GetActiveObject("Word.Application");
    // Shapes is 1-based, and get_Item takes the index as a ref object.
    object firstShape = 1;
    string textFrameText = wordApp.ActiveDocument.Shapes.get_Item(ref firstShape).TextFrame.TextRange.Text;
Marcel
- Marked as answer by Bessie Zhao Tuesday, February 16, 2010 1:50 AM
Saturday, February 13, 2010 7:48 PM
Shekhar,
Marcel is right, the Shapes collection is what you want.
If you are enumerating the Shapes collection you might want to do something like this:
    foreach (Microsoft.Office.Interop.Word.Shape shape in this.Application.ActiveDocument.Shapes)
    {
        if (shape.Type == Microsoft.Office.Core.MsoShapeType.msoTextBox)
        {
            // do something with shape.TextFrame.TextRange.Text
        }
    }
Cheers,
Aaron
- Marked as answer by Bessie Zhao Tuesday, February 16, 2010 1:50 AM
Sunday, February 14, 2010 7:57 PM
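For the original goal of writing the text box contents to a .txt file, the enumeration above could be wrapped in a small console program. This is only a sketch under stated assumptions: the class name `TextBoxDump` and the two command-line arguments are made up for illustration, and the named-argument calls assume C# 4 or later with a reference to the Word interop assembly (older compilers need every optional parameter passed as `ref missing`).

```csharp
using System;
using System.IO;
using Word = Microsoft.Office.Interop.Word;
using Office = Microsoft.Office.Core;

// Hypothetical console program: TextBoxDump <input.doc> <output.txt>
static class TextBoxDump
{
    static void Main(string[] args)
    {
        // Word resolves relative paths against its own working folder,
        // so convert the input path to a full path first.
        string docPath = Path.GetFullPath(args[0]);
        string txtPath = args[1];

        var app = new Word.Application();
        Word.Document doc = null;
        try
        {
            doc = app.Documents.Open(FileName: docPath, ReadOnly: true);
            using (var writer = new StreamWriter(txtPath))
            {
                foreach (Word.Shape shape in doc.Shapes)
                {
                    // Only text boxes are handled; other shape types are skipped.
                    if (shape.Type == Office.MsoShapeType.msoTextBox)
                    {
                        // Drop the trailing paragraph mark that Word appends.
                        writer.WriteLine(shape.TextFrame.TextRange.Text.TrimEnd('\r'));
                    }
                }
            }
        }
        finally
        {
            // Close without saving and shut Word down even if an error occurred.
            if (doc != null) doc.Close(SaveChanges: false);
            app.Quit();
        }
    }
}
```

Note that this writes the text boxes in Shapes collection order, which says nothing about where each box sits relative to the surrounding paragraphs.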
@Marcel Roma, @Aaron Cathcart,
Thanks for your help, and I apologize for the late reply.
I know about the Shapes collection and I am able to process it, but my problem is that the Shapes collection is separate from the Paragraphs collection. So if a text box sits between two paragraphs, I cannot determine where the text box is located in the actual doc file.
For example, suppose we have text like this:
"This is 1st long paragraph of                  <-- paragraph #1
2-3 lines."
["and this is text box and text within it"]    <-- textbox
"This is 2nd paragraph and text within it."    <-- paragraph #2
If we have a doc file like the one above, then I want to convert it into a text file like this:
<para>"This is 1st long paragraph of
2-3 lines."</para>
<textbox>"and this is text box and text within it"</textbox>
<para>"This is 2nd paragraph and text within it."</para>
Currently, with my code, I get output like this in the text file:
<para>"This is 1st long paragraph of
2-3 lines."</para>
<para>"This is 2nd paragraph and text within it."</para>
This is because I take the Paragraphs collection from the doc file and process it one item at a time. Is there any way to process each element of the document in order? Something like:
    foreach element in document
        if element type is paragraph then
            processAsParagraph()
        else if element type is textbox then
            processAsTextBox()
Thanks and Regards, shekhar kotekar
Monday, February 22, 2010 7:10 AM
Also,
Each paragraph and each shape has a data member called 'ID', but when I inspect the ID of a paragraph I get null. So we cannot use ID to keep track of the elements of the doc file either.
Thanks and Regards, shekhar kotekar
Monday, February 22, 2010 7:52 AM
Hi Shekhar,
You could use the Sentences collection to get access to the document text in the way you need it:
    foreach (Word.Range sentence in doc.Sentences)
    {
        if (sentence.ShapeRange.Count > 0)
        {
            foreach (Word.Shape shape in sentence.ShapeRange)
                if (shape.Type == Microsoft.Office.Core.MsoShapeType.msoTextBox)
                    Console.WriteLine(shape.TextFrame.TextRange.Text);
        }
        else
            Console.WriteLine(sentence.Text);
    }
Hope this helps
Marcel
- Marked as answer by Chandrashekhar Kotekar Monday, February 22, 2010 1:18 PM
Monday, February 22, 2010 11:09 AM
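A different way to get the `<para>` / `<textbox>` ordering asked for earlier in the thread is to merge paragraphs and text boxes by character position, using each shape's `Anchor` range. This is a rough sketch, not the method from the answer above: the class and method names (`DocOrderExport`, `TaggedBlocks`) are invented for illustration, and it assumes that the anchor position is a good enough stand-in for where the text box appears. The anchor is the paragraph a floating shape is attached to, which may differ from where the box is drawn on the page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Word = Microsoft.Office.Interop.Word;
using Office = Microsoft.Office.Core;

// Hypothetical helper class, not part of the Word object model.
static class DocOrderExport
{
    // Returns <para> and <textbox> blocks sorted by position in the document.
    public static List<string> TaggedBlocks(Word.Document doc)
    {
        // Each entry: start position, tie-break rank, tagged text.
        var items = new List<Tuple<int, int, string>>();

        foreach (Word.Paragraph para in doc.Paragraphs)
        {
            // Strip the paragraph mark (and the cell mark inside tables).
            string text = para.Range.Text.TrimEnd('\r', '\a');
            if (text.Length > 0)
                items.Add(Tuple.Create(para.Range.Start, 1, "<para>" + text + "</para>"));
        }

        foreach (Word.Shape shape in doc.Shapes)
        {
            if (shape.Type != Office.MsoShapeType.msoTextBox)
                continue;
            string text = shape.TextFrame.TextRange.Text.TrimEnd('\r');
            // Rank 0: an assumption that a box anchored at the start of a
            // paragraph should be emitted before that paragraph.
            items.Add(Tuple.Create(shape.Anchor.Start, 0, "<textbox>" + text + "</textbox>"));
        }

        return items.OrderBy(i => i.Item1)
                    .ThenBy(i => i.Item2)
                    .Select(i => i.Item3)
                    .ToList();
    }
}
```

Each returned string can then be written to the output file in order. If the anchors in a particular document do not match the visual layout, the Sentences-based loop above may give a better result.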
Thank you very much, Marcel. Now I am able to convert the doc file to a text file correctly.
Can I ask for some more help?
If there is bulleted text, how do I get the bullet information?
For example, if I have a line like the one below:
- I am working on doc to text converter project.
I need to convert the above line to:
<BULLET> I am working on doc to text converter project. </BULLET>
Thanks and Regards, shekhar kotekar
- Marked as answer by Chandrashekhar Kotekar Monday, February 22, 2010 3:08 PM
- Unmarked as answer by Chandrashekhar Kotekar Monday, February 22, 2010 3:10 PM
Monday, February 22, 2010 1:16 PM
Try this:
    // ListType is exposed through the range's ListFormat object.
    if (sentence.ListFormat.ListType == Word.WdListType.wdListBullet) {...}
Marcel
- Marked as answer by Chandrashekhar Kotekar Monday, February 22, 2010 3:09 PM
Monday, February 22, 2010 1:47 PM
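To show how the bullet check might fit into the converter, here is a minimal sketch that walks paragraphs instead of sentences and wraps bulleted ones in `<BULLET>` tags. The class and method names (`BulletExport`, `WriteParagraphs`) are hypothetical, and the `<para>` tag for ordinary paragraphs is borrowed from the output format described earlier in the thread.

```csharp
using System;
using System.IO;
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper class, not part of the Word object model.
static class BulletExport
{
    // Writes one tagged line per non-empty paragraph of the document.
    public static void WriteParagraphs(Word.Document doc, TextWriter writer)
    {
        foreach (Word.Paragraph para in doc.Paragraphs)
        {
            // Strip the paragraph mark (and the cell mark inside tables).
            string text = para.Range.Text.TrimEnd('\r', '\a');
            if (text.Length == 0)
                continue;

            // List information lives on the range's ListFormat object.
            bool isBullet =
                para.Range.ListFormat.ListType == Word.WdListType.wdListBullet;

            if (isBullet)
                writer.WriteLine("<BULLET> " + text + " </BULLET>");
            else
                writer.WriteLine("<para>" + text + "</para>");
        }
    }
}
```

Calling `BulletExport.WriteParagraphs(doc, Console.Out)` prints the tagged lines to the console; passing a `StreamWriter` sends them to a file. Only automatic bullets report `wdListBullet`; a line that starts with a typed "-" character is an ordinary paragraph to Word, and numbered lists report other `WdListType` values.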
Thanks a million !!!!!!
Thanks and Regards, shekhar kotekar
Monday, February 22, 2010 3:10 PM