openxmllib: a Python library for handling OpenXML documents

Posted by fmms 14 years ago | 39K views | Python, Python development

openxmllib is a Python library for processing OpenXML documents. It requires lxml.

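The Office Open XML formats use the Open Packaging Conventions, as does the XML Paper Specification (XPS). The two formats nevertheless differ in many important ways. XPS is a paginated, fixed-layout document format introduced with the Microsoft Windows Vista operating system, whereas Office Open XML is a fully editable file format for Office Word 2007, Office Excel 2007, and Office PowerPoint 2007. Although the two have much in common in their use of XML and ZIP compression, they differ considerably in file format design and intended purpose.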
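
Because an Office Open XML file is a ZIP archive of XML parts, the standard library alone is enough to peek inside one. The sketch below is not part of openxmllib. It builds a tiny in-memory stand-in for a package and reads the Dublin Core properties from ``docProps/core.xml``, the part where Office packages normally keep them. The helper names ``build_sample_package`` and ``read_core_properties`` are hypothetical, and the stand-in is far from a valid document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the core properties part of an OPC package
DC_NS = "http://purl.org/dc/elements/1.1/"
CP_NS = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"

# Illustrative content only; a real core.xml carries more elements
CORE_XML = (
    '<cp:coreProperties xmlns:cp="%s" xmlns:dc="%s">'
    "<dc:title>blah</dc:title><dc:creator>John Doe</dc:creator>"
    "</cp:coreProperties>" % (CP_NS, DC_NS)
)


def build_sample_package():
    """Return the bytes of a minimal ZIP that mimics an OPC package layout."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # Placeholder part; a real package lists content types here
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("docProps/core.xml", CORE_XML)
    return buf.getvalue()


def read_core_properties(package_bytes):
    """Map Dublin Core element names to their text values."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    props = {}
    for child in root:
        # ElementTree spells qualified tags as "{namespace}localname"
        if child.tag.startswith("{%s}" % DC_NS):
            props[child.tag.split("}", 1)[1]] = child.text
    return props


package = build_sample_package()
print(read_core_properties(package))
```

To try this on a real file, pass the result of ``open('office.docx', 'rb').read()`` instead of the sample bytes.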

These examples say it all::

  >>> import openxmllib
  >>> doc = openxmllib.openXmlDocument(path='office.docx')
  >>> # Raises a ValueError on unsupported office files.
  >>> doc.mimeType
  'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
  >>> doc.coreProperties # Keys may depend on application
  {'title': u'blah...', u'creator': u'John Doe', ...}
  >>> doc.extendedProperties # Keys may depend on application
  {'Words': u'312', 'Application': u'Your favorite word processor', ...}
  >>> doc.customProperties # May return an empty mapping
  {'My property': u'My value', ...}
  >>> doc.allProperties # Merges core+extended+custom properties (see above)
  {...}
  >>> doc.indexableText(include_properties=False)
  u'all the words of that document body'
  >>> doc.indexableText(include_properties=True)
  u'all the words of that document body and all properties values'
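
The text returned by ``indexableText`` is meant to be fed to a search index. As a rough illustration of that step, the sketch below counts lowercase terms in the indexable text. It does not use openxmllib itself: ``StubDocument`` is a hypothetical stand-in that only imitates the ``indexableText(include_properties=...)`` call shown above, and ``term_counts`` is likewise an invented helper name.

```python
import re
from collections import Counter


class StubDocument:
    """Hypothetical stand-in that imitates the indexableText() call."""

    def __init__(self, body, properties):
        self._body = body
        self._properties = properties

    def indexableText(self, include_properties=True):
        text = self._body
        if include_properties:
            # Append the property values after the body text
            text += " " + " ".join(self._properties.values())
        return text


def term_counts(doc, include_properties=True):
    """Count lowercase word occurrences in the document's indexable text."""
    text = doc.indexableText(include_properties=include_properties)
    return Counter(re.findall(r"\w+", text.lower()))


doc = StubDocument("all the words of that document body", {"creator": "John Doe"})
print(term_counts(doc, include_properties=False).most_common(3))
print(sorted(term_counts(doc, include_properties=True)))
```

Any object with a compatible ``indexableText`` method could be passed to ``term_counts`` in place of the stub.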

Standard ``mimetypes`` package extensions ::

  >>> import mimetypes
  >>> mimetypes.guess_type('somedoc.docx')
  ('application/vnd.openxmlformats-officedocument.wordprocessingml.document', None)
  >>> mimetypes.guess_type('somecalc.xlsx')
  ('application/vnd.openxmlformats-officedocument.spreadsheetml.sheet', None)
  >>> mimetypes.guess_type('someslides.pptx')
  ('application/vnd.openxmlformats-officedocument.presentationml.presentation', None)
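
How openxmllib extends ``mimetypes`` is not shown here. One way to get the same lookups with the standard library alone is ``mimetypes.add_type``. The sketch below is an assumption about the mechanism, not the library's actual code, and the name ``OPENXML_TYPES`` is hypothetical.

```python
import mimetypes

# Extension to MIME type mapping, using the values shown in the session above
OPENXML_TYPES = {
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}

# Register each extension; re-registering a known extension simply overwrites it
for ext, mime in OPENXML_TYPES.items():
    mimetypes.add_type(mime, ext)

for name in ("somedoc.docx", "somecalc.xlsx", "someslides.pptx"):
    print(name, mimetypes.guess_type(name)[0])
```

Whether a given Python installation or system MIME database already knows these extensions varies, so registering them explicitly keeps the result predictable.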

Document factory signatures::

  >>> # We have the path for the office file
  >>> doc = openxmllib.openXmlDocument(path='office.docx')
  >>> # We have a file object for the office file
  >>> fh = open('office.docx', 'rb')
  >>> doc = openxmllib.openXmlDocument(file_=fh)
  >>> # We have the URL for the office file
  >>> doc = openxmllib.openXmlDocument(url='http://domain.tld/office.docx')
  >>> # We have the raw data of the office file
  >>> import mimetypes
  >>> # guess_type returns a (type, encoding) tuple; keep only the type
  >>> docx_mimetype = mimetypes.guess_type('office.docx')[0]
  >>> body = open('office.docx', 'rb').read()
  >>> doc = openxmllib.openXmlDocument(data=body, mime_type=docx_mimetype)

Note that if you're not running a Python application, you may get the indexable
text from a document with the `openxmlinfo.py` console utility.

Project homepage: http://www.baiduhome.net/lib/view/home/1326939723124
