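Common ways to convert DOCX to PDF in Java
DOCX2PDF
Converting DOCX documents to PDF is a common requirement in projects. The mainstream approaches fall into two broad categories. The first drives an Office application to perform the conversion, for example Microsoft Office, WPS, or LibreOffice. The second reads the Office document through an API available in the programming language (for example Apache POI) and then builds the PDF with a dedicated PDF generation library such as iText. In general, an Office application preserves styling well, but it is comparatively slow. Microsoft Office involves licensing and cannot be used casually (the author's company was caught out on this). WPS is widely used at present, but it truncates hyperlinks: any hyperlink longer than 256 characters is cut off. LibreOffice is comparatively loose about style and layout. Reading and generating through the POI API performs better and suits cases where formatting requirements are not strict. There are also packaged online or command-line tools, such as docx2pdf and OfficeToPDF.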
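Because the 256-character hyperlink limit can decide whether WPS is a usable backend, it may be worth checking a document's links before converting. The following is a minimal sketch, assuming the hyperlink targets have already been extracted into a list of strings (the extraction itself, for example with Apache POI, is not shown). `HyperlinkLengthCheck` and `WPS_LINK_LIMIT` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class HyperlinkLengthCheck {
    // Limit taken from the text above: WPS truncates hyperlinks longer than 256 characters.
    static final int WPS_LINK_LIMIT = 256;

    /** Returns the links that exceed the limit and would therefore be truncated. */
    static List<String> findTooLong(List<String> links) {
        List<String> tooLong = new ArrayList<>();
        for (String link : links) {
            if (link.length() > WPS_LINK_LIMIT) {
                tooLong.add(link);
            }
        }
        return tooLong;
    }

    public static void main(String[] args) {
        String shortLink = "https://example.com/a";
        // Build an artificially long link: a fixed prefix plus 300 'x' characters.
        String longLink = "https://example.com/?q=" + new String(new char[300]).replace('\0', 'x');
        List<String> risky = findTooLong(Arrays.asList(shortLink, longLink));
        System.out.println("Links at risk of truncation: " + risky.size());
    }
}
```

If the returned list is not empty, one of the other conversion routes described below is probably a safer choice for that document.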
Microsoft Office
The core code for this part is shown below; for the complete code, see here:
import com.jacob.activeX.ActiveXComponent;
import com.jacob.com.Dispatch;
import com.jacob.com.Variant;

public class OfficeConverter {

    private ActiveXComponent oleComponent = null;
    private Dispatch activeDoc = null;
    private final static String APP_ID = "Word.Application";

    // Constants that map onto Word's WdSaveOptions enumeration and that
    // may be passed to the closeDoc(int) method
    public static final int DO_NOT_SAVE_CHANGES = 0;
    public static final int PROMPT_TO_SAVE_CHANGES = -2;
    public static final int SAVE_CHANGES = -1;

    // These constant values determine whether or not the application
    // instance will be displayed on the user's screen.
    public static final boolean VISIBLE = true;
    public static final boolean HIDDEN = false;

    /**
     * Create a new instance of the OfficeConverter class using the following
     * parameters.
     *
     * @param visibility A primitive boolean whose value will determine whether
     *                   or not the Word application will be visible to the user.
     *                   Pass true to display Word, false otherwise.
     */
    public OfficeConverter(boolean visibility) {
        this.oleComponent = new ActiveXComponent(OfficeConverter.APP_ID);
        this.oleComponent.setProperty("Visible", new Variant(visibility));
    }

    /**
     * Open an existing Word document.
     *
     * @param docName An instance of the String class that encapsulates the
     *                path to and name of a valid Word file. Note that there are
     *                a few limitations applying to the format of this String; it
     *                must specify the absolute path to the file and it must not
     *                use the single forward slash to specify the path separator.
     */
    public void openDoc(String docName) {
        Dispatch disp = null;
        Variant var = null;
        // First get a Dispatch object referencing the Documents collection - for
        // collections, think of ArrayLists of objects.
        var = Dispatch.get(this.oleComponent, "Documents");
        disp = var.getDispatch();
        // Now call the Open method on the Documents collection Dispatch object
        // to both open the file and add it to the collection. It would be possible
        // to open a series of files and access each from the Documents collection
        // but for this example, it is simpler to store a reference to the
        // active document in a private instance variable.
        var = Dispatch.call(disp, "Open", docName);
        this.activeDoc = var.getDispatch();
    }

    /**
     * There is more than one way to convert the document into PDF format, you
     * can either explicitly use a FileConverter object or call the
     * ExportAsFixedFormat method on the active document. This method opts for
     * the latter and calls the ExportAsFixedFormat method passing the name
     * of the file along with the integer value of 17. This value maps onto one
     * of Word's constants called wdExportFormatPDF and causes the application
     * to convert the file into PDF format. If you wanted to do so, for testing
     * purposes, you could add another value to the args array, a Boolean value
     * of true. This would open the newly converted document automatically.
     *
     * @param filename The path of the PDF file to write.
     */
    public void publishAsPDF(String filename) {
        // The code to export as a PDF is 17 (wdExportFormatPDF)
        // Object[] args = new Object[] {filename, 17, true};
        Object[] args = new Object[] {filename, 17};
        Dispatch.call(this.activeDoc, "ExportAsFixedFormat", args);
    }

    /**
     * Called to close the active document. Note that this method simply
     * calls the overloaded closeDoc(int) method passing the value 0 which
     * instructs Word to close the document and discard any changes that may
     * have been made since the document was opened or edited.
     */
    public void closeDoc() {
        this.closeDoc(OfficeConverter.DO_NOT_SAVE_CHANGES);
    }

    /**
     * Called to close the active document. It is possible with this overloaded
     * version of the closeDoc() method to specify what should happen if the user
     * has made changes to the document that have not been saved. There are three
     * possible values defined by the following manifest constants:
     * DO_NOT_SAVE_CHANGES - Close the document and discard any changes
     * the user may have made.
     * PROMPT_TO_SAVE_CHANGES - Display a prompt to the user asking them
     * how to proceed.
     * SAVE_CHANGES - Save the changes the user has made to the document.
     *
     * @param saveOption A primitive integer whose value indicates how the close
     *                   operation should proceed if the user has made changes to
     *                   the active document. Note that no checks are made on the
     *                   value passed to this argument.
     */
    public void closeDoc(int saveOption) {
        Object[] args = {saveOption};
        Dispatch.call(this.activeDoc, "Close", args);
    }

    /**
     * Called once processing has completed in order to close down the instance
     * of Word.
     */
    public void quit() {
        Dispatch.call(this.oleComponent, "Quit");
    }
}
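The Javadoc for openDoc notes that Word expects an absolute path and does not accept forward slashes as separators. A small helper can enforce this before the path is handed to the COM call. This is only a sketch: `WordPaths` and `toWordPath` are hypothetical names, and the check assumes Windows-style paths (a drive letter or a UNC share).

```java
final class WordPaths {

    /**
     * Converts forward slashes to backslashes and rejects anything that is
     * not an absolute Windows path (drive letter or UNC share).
     */
    static String toWordPath(String path) {
        String normalized = path.replace('/', '\\');
        // Matches a drive letter, a colon and a backslash, e.g. C:\docs\a.docx
        boolean driveAbsolute = normalized.matches("^[A-Za-z]:\\\\.*");
        // UNC shares start with two backslashes, e.g. \\server\share\a.docx
        boolean unc = normalized.startsWith("\\\\");
        if (!driveAbsolute && !unc) {
            throw new IllegalArgumentException("Not an absolute Windows path: " + path);
        }
        return normalized;
    }

    public static void main(String[] args) {
        System.out.println(toWordPath("C:/docs/report.docx"));
    }
}
```

The result of toWordPath can then be passed to openDoc and publishAsPDF, so a relative or badly formed path fails early with a readable message instead of an opaque COM error.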
WPS
The core code for this part is shown below; for the complete code, see here:
@Override
public boolean convert(String word, String pdf) {
    File pdfFile = new File(pdf);
    File wordFile = new File(word);
    boolean convertSuccessfully = false;

    ActiveXComponent wps = null;
    ActiveXComponent doc = null;

    try {
        wps = new ActiveXComponent("KWPS.Application");

        // The same steps through the lower-level Dispatch API:
        // Dispatch docs = wps.getProperty("Documents").toDispatch();
        // Dispatch d = Dispatch.call(docs, "Open", wordFile.getAbsolutePath(), false, true).toDispatch();
        // Dispatch.call(d, "SaveAs", pdfFile.getAbsolutePath(), 17);
        // Dispatch.call(d, "Close", false);

        doc = wps.invokeGetComponent("Documents")
                .invokeGetComponent("Open", new Variant(wordFile.getAbsolutePath()));

        try {
            // Write to the requested output path; 17 is the PDF save format code
            doc.invoke("SaveAs",
                    new Variant(pdfFile.getAbsolutePath()),
                    new Variant(17));
            convertSuccessfully = true;
        } catch (Exception e) {
            logger.warning("Failed to generate PDF");
            e.printStackTrace();
        }

        // Additionally save a copy to a fixed path (kept from the original
        // sample; adjust or remove as needed)
        File saveAsFile = new File("C:\\Users\\lotuc\\Documents\\saveasfile.doc");
        try {
            doc.invoke("SaveAs", saveAsFile.getAbsolutePath());
            logger.info("Saved a copy as " + saveAsFile.getAbsolutePath());
        } catch (Exception e) {
            logger.info("Failed to save a copy as " + saveAsFile.getAbsolutePath());
            e.printStackTrace();
        }
    } finally {
        if (doc == null) {
            logger.info("Failed to open file " + wordFile.getAbsolutePath());
        } else {
            try {
                logger.info("Releasing file " + wordFile.getAbsolutePath());
                doc.invoke("Close");
                doc.safeRelease();
            } catch (Exception e1) {
                logger.info("Failed to release file " + wordFile.getAbsolutePath());
            }
        }

        if (wps == null) {
            logger.info("Failed to load the WPS component");
        } else {
            try {
                logger.info("Releasing the WPS component");
                wps.invoke("Quit");
                wps.safeRelease();
            } catch (Exception e1) {
                logger.info("Failed to release the WPS component");
            }
        }
    }

    return convertSuccessfully;
}
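The convert method reports success whenever SaveAs returns without throwing, which does not guarantee that a usable file was written. One inexpensive extra check is to confirm that the output exists and starts with the `%PDF-` signature that every PDF file begins with. The sketch below uses only the JDK; `PdfSanityCheck` and `looksLikePdf` are hypothetical names, and the check says nothing about whether the layout is correct.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class PdfSanityCheck {
    // Every PDF file starts with these five bytes.
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the file exists and begins with the PDF signature. */
    static boolean looksLikePdf(Path file) {
        if (!Files.isRegularFile(file)) {
            return false;
        }
        byte[] head = new byte[MAGIC.length];
        try (InputStream in = Files.newInputStream(file)) {
            int read = 0;
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    return false; // file is shorter than the signature
                }
                read += n;
            }
        } catch (IOException e) {
            return false; // an unreadable file is treated as a failed conversion
        }
        return Arrays.equals(head, MAGIC);
    }

    public static void main(String[] args) throws IOException {
        Path sample = Files.createTempFile("sample", ".pdf");
        Files.write(sample, "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println(looksLikePdf(sample));
        Files.delete(sample);
    }
}
```

A caller could combine the two signals, for example by returning `convertSuccessfully && PdfSanityCheck.looksLikePdf(pdfFile.toPath())`.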
LibreOffice
LibreOffice ships with a command-line tool that can do the conversion. Once LibreOffice is installed, run:
/usr/local/bin/soffice --convert-to pdf:writer_pdf_Export /Users/lotuc/Downloads/test.doc
If a LibreOffice instance is already open, pass the -env option to give the conversion its own working directory:
/usr/local/bin/soffice "-env:UserInstallation=file:///tmp/LibreOffice_Conversion_abc" --convert-to pdf:writer_pdf_Export /Users/lotuc/Downloads/test.doc
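From Java, the same command line can be launched with ProcessBuilder. The sketch below assembles the arguments shown above and adds two further soffice options that do not appear in the original commands, `--headless` and `--outdir`, so that no window is opened and the output directory is explicit. `SofficeCommand`, `build` and `run` are hypothetical names, and the paths in `main` are just the ones used in this article.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class SofficeCommand {

    /**
     * Builds the argument list for a conversion to PDF.
     * profileUrl may be null; when set it becomes the -env:UserInstallation value.
     */
    static List<String> build(String soffice, String input, String outDir, String profileUrl) {
        List<String> cmd = new ArrayList<>();
        cmd.add(soffice);
        if (profileUrl != null) {
            cmd.add("-env:UserInstallation=" + profileUrl);
        }
        cmd.add("--headless");
        cmd.add("--convert-to");
        cmd.add("pdf:writer_pdf_Export");
        cmd.add("--outdir");
        cmd.add(outDir);
        cmd.add(input);
        return cmd;
    }

    /** Runs the command and returns true only if it exits with status 0 in time. */
    static boolean run(List<String> cmd, long timeoutSeconds)
            throws IOException, InterruptedException {
        // inheritIO avoids blocking on unread output from the child process
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            return false;
        }
        return process.exitValue() == 0;
    }

    public static void main(String[] args) {
        List<String> cmd = build(
                "/usr/local/bin/soffice",
                "/Users/lotuc/Downloads/test.doc",
                "/Users/lotuc/Downloads",
                "file:///tmp/LibreOffice_Conversion_abc");
        // Only prints the command; call run(cmd, 60) where LibreOffice is installed.
        System.out.println(String.join(" ", cmd));
    }
}
```

Passing the arguments as a list rather than one string avoids shell quoting problems with paths that contain spaces.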
To drive LibreOffice from Java, first install LibreOffice and then add the JAR files it depends on to the classpath:
Install LibreOffice
Create a Java project in your favorite editor and add these to your class path:
[LibreOffice Dir]/URE/java/juh.jar
[LibreOffice Dir]/URE/java/jurt.jar
[LibreOffice Dir]/URE/java/ridl.jar
[LibreOffice Dir]/program/classes/unoil.jar
Next, start a LibreOffice process:
import java.util.Date;
import java.io.File;
import com.sun.star.beans.PropertyValue;
import com.sun.star.comp.helper.Bootstrap;
import com.sun.star.frame.XComponentLoader;
import com.sun.star.frame.XDesktop;
import com.sun.star.frame.XStorable;
import com.sun.star.lang.XComponent;
import com.sun.star.lang.XMultiComponentFactory;
import com.sun.star.text.XTextDocument;
import com.sun.star.uno.UnoRuntime;
import com.sun.star.uno.XComponentContext;
import com.sun.star.util.XReplaceDescriptor;
import com.sun.star.util.XReplaceable;

public class MailMergeExample {
    public static void main(String[] args) throws Exception {
        // Initialise
        XComponentContext xContext = Bootstrap.bootstrap();
        XMultiComponentFactory xMCF = xContext.getServiceManager();
        Object oDesktop = xMCF.createInstanceWithContext(
                "com.sun.star.frame.Desktop", xContext);
        XDesktop xDesktop = (XDesktop) UnoRuntime.queryInterface(
                XDesktop.class, oDesktop);
Next, load the target Doc document:
        // Load the Document
        String workingDir = "C:/projects/";
        String myTemplate = "letterTemplate.doc";
        if (!new File(workingDir + myTemplate).canRead()) {
            throw new RuntimeException("Cannot load template:" + new File(workingDir + myTemplate));
        }
        XComponentLoader xCompLoader = (XComponentLoader) UnoRuntime
                .queryInterface(com.sun.star.frame.XComponentLoader.class, xDesktop);
        String sUrl = "file:///" + workingDir + myTemplate;
        // Load the document without showing a window
        PropertyValue[] propertyValues = new PropertyValue[1];
        propertyValues[0] = new PropertyValue();
        propertyValues[0].Name = "Hidden";
        propertyValues[0].Value = Boolean.TRUE;
        XComponent xComp = xCompLoader.loadComponentFromURL(
                sUrl, "_blank", 0, propertyValues);
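The loading code builds its URL by prepending "file:///" to the path. That works for the simple path used here, but it does not escape spaces or other special characters and it assumes the path is already absolute. As a hedged alternative, the JDK can produce the URL itself; `FileUrls` and `toFileUrl` below are hypothetical names, not part of the UNO API.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class FileUrls {

    /** Resolves the path against the current directory and returns a file:/// URL. */
    static String toFileUrl(String path) {
        Path absolute = Paths.get(path).toAbsolutePath().normalize();
        // Path.toUri() percent-encodes characters such as spaces
        return absolute.toUri().toString();
    }

    public static void main(String[] args) {
        System.out.println(toFileUrl("letter template.doc"));
    }
}
```

The returned string can be used wherever the example passes a "file:///" URL, both when loading the document and when storing the PDF.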
The content can then be replaced as follows:
        // Search and replace
        XTextDocument xTextDocument = (XTextDocument) UnoRuntime
                .queryInterface(XTextDocument.class, xComp);
        XReplaceable xReplaceable = (XReplaceable) UnoRuntime
                .queryInterface(XReplaceable.class, xTextDocument);
        XReplaceDescriptor xReplaceDescr = (XReplaceDescriptor) xReplaceable
                .createReplaceDescriptor();

        // mail merge the date
        xReplaceDescr.setSearchString("<date>");
        xReplaceDescr.setReplaceString(new Date().toString());
        xReplaceable.replaceAll(xReplaceDescr);

        // mail merge the addressee
        xReplaceDescr.setSearchString("<addressee>");
        xReplaceDescr.setReplaceString("Best Friend");
        xReplaceable.replaceAll(xReplaceDescr);

        // mail merge the signatory
        xReplaceDescr.setSearchString("<signatory>");
        xReplaceDescr.setReplaceString("Your New Boss");
        xReplaceable.replaceAll(xReplaceDescr);
Finally, the document can be written out as a PDF:
        // save as a PDF
        XStorable xStorable = (XStorable) UnoRuntime
                .queryInterface(XStorable.class, xComp);
        propertyValues = new PropertyValue[2];
        propertyValues[0] = new PropertyValue();
        propertyValues[0].Name = "Overwrite";
        propertyValues[0].Value = Boolean.TRUE;
        propertyValues[1] = new PropertyValue();
        propertyValues[1].Name = "FilterName";
        propertyValues[1].Value = "writer_pdf_Export";

        // Store the result next to the template under a fixed output name
        String myResult = workingDir + "letterOutput.pdf";
        xStorable.storeToURL("file:///" + myResult, propertyValues);

        System.out.println("Saved " + myResult);
    }
}
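The example writes to a fixed name, letterOutput.pdf. When converting arbitrary files it is usually more convenient to derive the output name from the input name. The following is a minimal sketch of that idea; `OutputNames` and `toPdfName` are hypothetical names introduced here for illustration.

```java
final class OutputNames {

    /** Replaces the file extension (if any) with ".pdf", leaving directories untouched. */
    static String toPdfName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        int sep = Math.max(fileName.lastIndexOf('/'), fileName.lastIndexOf('\\'));
        // Only treat the dot as an extension separator when it comes after the
        // last path separator and is not the first character of the file name.
        boolean hasExtension = dot > sep + 1;
        String base = hasExtension ? fileName.substring(0, dot) : fileName;
        return base + ".pdf";
    }

    public static void main(String[] args) {
        System.out.println(toPdfName("letterTemplate.doc"));
        System.out.println(toPdfName("C:/projects/letter.v2.docx"));
    }
}
```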

xdocreport
The core code for this part is shown below; for the complete code, see here:
/**
 * Generates a PDF file from the input document using Apache POI.
 *
 * @param inputFile the input file stream
 * @param outFile   the output file
 */
@SneakyThrows
public static void convertWithPOI(InputStream inputFile, File outFile) {
    // Create the document object from the input stream
    XWPFDocument document = new XWPFDocument(inputFile);

    // Create the PDF options
    PdfOptions pdfOptions = PdfOptions.create(); // .fontEncoding("windows-1250")

    // Create the directory for the output file
    outFile.getParentFile().mkdirs();

    // Run the PDF conversion, closing the output stream when done
    try (OutputStream out = new FileOutputStream(outFile)) {
        PdfConverter.getInstance().convert(document, out, pdfOptions);
    }
}

/**
 * Fills the render parameters into the DOCX template and then generates a PDF.
 *
 * @param inputFile    the template file stream
 * @param outFile      the output file
 * @param renderParams the values to fill into the template
 */
@SneakyThrows
public static void convertFromTemplateWithFreemarker(InputStream inputFile, File outFile, Map<String, Object> renderParams) {
    // Create the report instance
    IXDocReport report = XDocReportRegistry.getRegistry().loadReport(
            inputFile, TemplateEngineKind.Freemarker);

    // Create the context
    IContext context = report.createContext();

    // Fill in the render parameters
    renderParams.forEach(context::put);

    // Create the directory for the output file
    outFile.getParentFile().mkdirs();

    // Create the conversion options
    Options options = Options.getTo(ConverterTypeTo.PDF).via(
            ConverterTypeVia.XWPF);

    // Run the conversion, closing the output stream when done
    try (OutputStream out = new FileOutputStream(outFile)) {
        report.convert(context, options, out);
    }
}
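The calls to outFile.getParentFile().mkdirs() assume that the output path has a parent component. For a bare file name such as out.pdf, getParentFile() returns null and the call fails with a NullPointerException. A small helper can cover that case; this is only a sketch, and `OutputDirs` and `ensureParentDir` are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

final class OutputDirs {

    /** Creates the parent directory of outFile if it does not exist yet. */
    static void ensureParentDir(File outFile) {
        // Resolving to an absolute file gives bare names like "out.pdf" a parent
        File parent = outFile.getAbsoluteFile().getParentFile();
        // mkdirs() returns false if the directory already exists, so re-check
        if (parent != null && !parent.mkdirs() && !parent.isDirectory()) {
            throw new UncheckedIOException(
                    new IOException("Cannot create directory: " + parent));
        }
    }

    public static void main(String[] args) {
        File out = new File(System.getProperty("java.io.tmpdir"), "docx2pdf-demo/out.pdf");
        ensureParentDir(out);
        System.out.println(out.getParentFile().isDirectory());
    }
}
```

Calling `OutputDirs.ensureParentDir(outFile)` in place of the direct mkdirs() call also surfaces a failure to create the directory instead of silently ignoring the return value.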
Source: https://segmentfault.com/a/1190000006789644