Processing Word Documents - Python Text Processing Tutorial

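To read a Word document, you can use the docx module in Python, which is provided by the python-docx package. First install it as shown below, then write a program that uses the module's functions to read the whole file paragraph by paragraph.

Use the following command to install the docx module into your environment. Note that the package name on PyPI is python-docx, even though the module is imported as docx.

 pip install python-docx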

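In the example below, the contents of a Word document are read by appending the text of each paragraph to a list and then printing all of the paragraph text joined together.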

import docx

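def readtxt(filename):
    # Open the .docx file and collect the text of every paragraph
    doc = docx.Document(filename)
    fullText = []
    for para in doc.paragraphs:
        fullText.append(para.text)
    # Join the paragraphs into a single newline-separated string
    return '\n'.join(fullText)

# Raw string so the backslash in the path is not treated as an escape
print(readtxt(r'path\zaixianspoint.docx'))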

Running the above program produces the following output:

zaixian Point originated from the idea that there exists a class of readers who respond
better to online content and prefer to learn new skills at their own pace from the comforts
of their drawing rooms.

The journey commenced with a single tutorial on HTML in 2006 and elated by the response it generated,
we worked our way to adding fresh tutorials to our repository which now proudly flaunts
a wealth of tutorials and allied articles on topics ranging from programming languages
to web designing to academics and much more.
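Since doc.paragraphs also contains empty paragraphs (such as the blank line in the output above), it can be handy to filter them out. The following is a minimal sketch, not part of the original tutorial: the helper names non_empty_paragraphs and read_non_empty are hypothetical, and the code relies only on each paragraph exposing a .text attribute.

```python
def non_empty_paragraphs(doc):
    """Return the text of every paragraph in `doc` that is not blank.

    `doc` is assumed to expose a `.paragraphs` sequence whose items have
    a `.text` attribute, as a python-docx Document does.
    """
    return [para.text for para in doc.paragraphs if para.text.strip()]


def read_non_empty(filename):
    """Hypothetical convenience wrapper: open a .docx file and filter it."""
    # Imported here so non_empty_paragraphs() can be used without python-docx.
    import docx
    return non_empty_paragraphs(docx.Document(filename))
```

Calling read_non_empty(r'path\zaixianspoint.docx') would then return a list containing only the paragraphs that have visible text.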

Reading Individual Paragraphs

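A specific paragraph can be read from a Word document through the paragraphs attribute, which is a list indexed from 0. The example below reads only the second paragraph of text in the document. In this file it sits at index 2; judging by the blank line in the output above, index 1 is most likely an empty paragraph.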

import docx

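# Raw string so the backslash in the path is not treated as an escape
doc = docx.Document(r'path\zaixianspoint.docx')
# Number of paragraphs in the document (blank ones included)
print(len(doc.paragraphs))

# Index 2 is the third entry in doc.paragraphs
print(doc.paragraphs[2].text)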

Running the above program produces the following output:

The journey commenced with a single tutorial on HTML in 2006 and elated by the response
it generated, we worked our way to adding fresh tutorials to our repository
which now proudly flaunts a wealth of tutorials and allied articles on topics
ranging from programming languages to web designing to academics and much more.
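Indexing doc.paragraphs directly counts blank paragraphs as well and raises IndexError when the index is out of range. As a hedged alternative, the sketch below looks up the n-th paragraph that contains text, counting from 1. The function name nth_text_paragraph is hypothetical and not part of python-docx; it assumes only that each paragraph has a .text attribute.

```python
def nth_text_paragraph(doc, n):
    """Return the n-th non-blank paragraph text (1-based), or None if absent.

    `doc` is assumed to expose `.paragraphs`, as a python-docx Document does.
    """
    texts = [para.text for para in doc.paragraphs if para.text.strip()]
    if 1 <= n <= len(texts):
        return texts[n - 1]
    # Out of range: signal absence instead of raising IndexError
    return None
```

With a document opened via docx.Document, nth_text_paragraph(doc, 2) would return the second paragraph that has visible text, regardless of how many empty paragraphs precede it.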
