Converting Word to Markdown: word2markdown

Posted by jopen 10 years ago | 72K reads | word2markdown

This tool converts Word documents to Markdown, including images and math. It runs as 9 consecutive steps:

  1. Exporting to HTML using Microsoft Word 2011. We automated this on OS X using Automator. Solutions for other platforms are welcome!
  2. Extracting the image types that we want to use. The original quality is kept, unless the image is a proprietary .emz file. This step also fixes some math.
  3. Converting HTML to XML using tagsoup.
  4. Converting OMML (the proprietary Word math format) into MathML equations, using Microsoft's own conversion XSLT and a custom version of this XSLT. Uses Saxon 8.
  5. Some intermediate fixes for whitespace and math.
  6. Conversion back into HTML using Tidy. Also strips a lot of stuff.
  7. More intermediate fixes to deal with shortcomings of Tidy and Pandoc.
  8. Conversion into Markdown using Pandoc.
  9. Lots of cleanup and final fixes to the Markdown.
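
The steps above form a linear pipeline: each step consumes the previous step's output. The following is a minimal sketch of that design in Python, not the project's actual code. The names run_pipeline, collapse_whitespace, pandoc_command and html_to_markdown are hypothetical, and collapse_whitespace is only a stand-in for the kind of "intermediate fixes" the list mentions. The Pandoc invocation for step 8 uses Pandoc's standard -f, -t and -o options.

```python
import re
import subprocess
from typing import Callable, Iterable, List

# A step takes the document text and returns the transformed text.
Step = Callable[[str], str]


def run_pipeline(text: str, steps: Iterable[Step]) -> str:
    """Feed the output of each step into the next one, in order."""
    for step in steps:
        text = step(text)
    return text


def collapse_whitespace(html: str) -> str:
    # Hypothetical stand-in for an "intermediate fix": turn non-breaking
    # spaces into plain spaces, then squeeze runs of spaces and tabs.
    html = html.replace("\u00a0", " ")
    return re.sub(r"[ \t]+", " ", html)


def pandoc_command(html_path: str, md_path: str) -> List[str]:
    # Step 8: -f is the input format, -t the output format, -o the output file.
    return ["pandoc", "-f", "html", "-t", "markdown", html_path, "-o", md_path]


def html_to_markdown(html_path: str, md_path: str) -> None:
    # Needs pandoc on PATH; raises CalledProcessError if pandoc exits non-zero.
    subprocess.run(pandoc_command(html_path, md_path), check=True)
```

Keeping each step a plain function from text to text makes it easy to test a single fix in isolation, or to reorder steps, without running Word, Tidy or Pandoc.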

Requirements

  • Mac OS X
  • Microsoft Office 2011
  • Pandoc
  • HTML Tidy
  • Run npm install in this directory
  • Open Microsoft Office and choose File -> Save As Webpage -> Compatibility -> Encoding -> UTF-8. Save, exit, and you're good to go.
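
Before running the conversion, it can help to confirm that the command-line tools are installed. The sketch below is an assumption-laden example rather than part of the project: missing_tools and REQUIRED are hypothetical names, the executable names (pandoc, tidy, npm) are the usual ones for the tools listed above, and java is included only because tagsoup and Saxon are Java programs.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


# Assumed executable names; adjust them if your installation differs.
REQUIRED = ["pandoc", "tidy", "npm", "java"]

if __name__ == "__main__":
    missing = missing_tools(REQUIRED)
    print("missing:", ", ".join(missing) if missing else "none")
```

The which parameter exists only so the lookup can be replaced in tests. This check does not cover Microsoft Office, which has to be verified by hand.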

Project homepage: http://www.baiduhome.net/lib/view/home/1415695057461
