What would be the best approach to count the number of words in a Microsoft Word doc
Question
What would be the best approach to count the number of words in a Microsoft Word document by attributes such as font size or font color? E.g., return the number of words in red font. I can write the code myself, but I need a push in the right direction. I spent my entire day with Python's ElementTree package and realized it wasn't the best approach.
My first preference is a Python package, so I'm looking forward to suggestions. I had win32com, ElementTree, and lxml in mind, but I'm not sure yet.
Non-Python approaches are also welcome.
Explanation / Answer
For a non-Python approach, you could consider a couple of possibilities that use MS Word as a starting point.
You can use Word's Find dialog: open the advanced options, search for a given set of formatting characteristics, and use Find All. The downside is that you need to know in advance which characteristics you are looking for.
You could export your document from Word to a less complex format such as HTML and then parse the resulting HTML file with Python tools such as Beautiful Soup or with regular expressions. This would probably be the quickest option to implement, but it relies on having Word available.
Alternatively, you can use a recent version of pandoc to do the conversion above and then use Beautiful Soup to parse the result. Check that the formatting attributes you need survive the conversion.
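As a rough illustration of the HTML route, here is a minimal sketch that counts words by font color. It swaps in Python's standard-library html.parser for Beautiful Soup so that it runs without extra installs; the same idea carries over to Beautiful Soup. It assumes the exported HTML marks color with an inline style such as style='color:red' and that tags are properly closed, which may not match what your version of Word produces. The names ColorWordCounter and count_words_in_color are made up for this example.

```python
import re
from html.parser import HTMLParser


class ColorWordCounter(HTMLParser):
    """Count words inside elements whose inline style sets a given color.

    Assumption: color appears as an inline CSS declaration (style='color:...')
    and every non-void tag is closed, so a simple stack can track nesting.
    """

    VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}

    def __init__(self, color):
        super().__init__()
        self.color = color.lower()
        self.stack = []  # effective text color for each open element
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        # Match "color:" but not "background-color:".
        match = re.search(r"(?<![-\w])color\s*:\s*([^;]+)", style, re.I)
        inherited = self.stack[-1] if self.stack else None
        self.stack.append(match.group(1).strip().lower() if match else inherited)

    def handle_endtag(self, tag):
        if tag not in self.VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Count whitespace-separated words when the current color matches.
        if self.stack and self.stack[-1] == self.color:
            self.count += len(data.split())


def count_words_in_color(html, color):
    parser = ColorWordCounter(color)
    parser.feed(html)
    parser.close()
    return parser.count


sample = (
    "<p>Plain text <span style='color:red'>two words</span> and "
    "<span style='background-color:red'>not red</span> "
    "<span style='color:red'>one <b>more here</b></span></p>"
)
print(count_words_in_color(sample, "red"))
```

The color comparison is a literal string match, so if the export writes #FF0000 instead of red you would pass that value. A word whose letters are split across two adjacent spans would be counted twice, so treat the result as an approximation until you have checked it against your actual export.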