
What would be the best approach to count the number of words in a Microsoft Word doc


Question

What would be the best approach to counting the words in a Microsoft Word document by formatting attributes such as font size or font color? For example: return the number of words in red font. I can write the code myself; I just need a push in the right direction. I spent my entire day with Python's ElementTree package and realized it wasn't the best approach.

My first preference is a Python package, so I'm looking forward to suggestions. win32com, ElementTree, and lxml came to mind, but I'm not sure yet.

Non-Python approaches are also welcome.

Explanation / Answer

For a non-Python approach, you could consider a few possibilities that use MS Word as a starting point.

You can use Word's Find dialog: open the advanced options, specify the set of formatting characteristics you want to match, and use Find All. The downside is that you need to know in advance which characteristics you are looking for.
You could export your document from Word to a less complex format such as HTML, then parse the resulting HTML file with Python tools such as Beautiful Soup or with regular expressions. This would probably be the quickest option to implement, but it does rely on having Word available.
You could use a recent version of pandoc to do the same conversion to HTML and then parse the result with Beautiful Soup.
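
As a rough sketch of the HTML route above: the snippet below assumes the document has already been exported to HTML and that font color shows up as an inline style="color:..." attribute on the surrounding elements. Exports that put colors in CSS classes or <font color=...> tags would need extra handling. It uses the standard library's html.parser in place of Beautiful Soup so that it runs without extra installs. ColorWordCounter and count_words_by_color are illustrative names, not part of any library.

```python
import re
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "wbr"}
# Text inside these elements is not document content.
SKIP_TAGS = {"head", "script", "style"}
# Matches the "color" property but not "background-color".
COLOR_RE = re.compile(r"(?:^|;)\s*color\s*:\s*([^;]+)", re.IGNORECASE)


class ColorWordCounter(HTMLParser):
    """Count whitespace-separated words by the inline color in effect."""

    def __init__(self):
        super().__init__()
        self.stack = []   # (tag, effective color) for each open element
        self.counts = {}  # color string (or None) -> number of words

    def current_color(self):
        return self.stack[-1][1] if self.stack else None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        match = COLOR_RE.search(style)
        # Elements without their own color inherit the enclosing one.
        color = match.group(1).strip().lower() if match else self.current_color()
        self.stack.append((tag, color))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if any(tag in SKIP_TAGS for tag, _ in self.stack):
            return
        n = len(data.split())
        if n:
            color = self.current_color()
            self.counts[color] = self.counts.get(color, 0) + n


def count_words_by_color(html):
    """Return a dict mapping each color string (None = unspecified) to a word count."""
    parser = ColorWordCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


if __name__ == "__main__":
    sample = "<p>Plain text <span style='color:red'>two words</span> here</p>"
    print(count_words_by_color(sample))  # {None: 3, 'red': 2}
```

Colors are compared as literal strings, so "red" and "#ff0000" are counted separately; check which spelling your export actually produces before relying on one key. A word split across two differently formatted runs is counted once per run.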
