PDFBox 1.8.8 Released: An Open-Source Java PDF Library

Posted by jopen 11 years ago | 13K views | PDFBox

PDFBox is an open-source Java PDF library that gives you access to the contents and metadata of PDF files.
PDFBox: www.pdfbox.org

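It provides the following features:

Text extraction, including Unicode characters.

Straightforward integration with text search engines such as Jakarta Lucene.

Encryption and decryption of PDF documents.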

Importing and exporting form data in FDF and XFDF formats.

Appending content to an existing PDF document.

Splitting one PDF document into several documents.

Overlaying one PDF document on another.

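import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileWriter;

import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

public class PdfParser {

    /**
     * Extracts the text of a PDF file, writes it to a text file,
     * and prints it to standard output.
     */
    public static void main(String[] args) throws Exception {
        FileInputStream fis = new FileInputStream("F:\\task\\lerman-atem2001.pdf");
        BufferedWriter writer = new BufferedWriter(new FileWriter("F:\\task\\pdf_change.txt"));
        PDDocument document = null;
        try {
            // Parse the PDF stream into an in-memory document.
            PDFParser p = new PDFParser(fis);
            p.parse();
            document = p.getPDDocument();

            // Extract all text from the document.
            PDFTextStripper ts = new PDFTextStripper();
            String s = ts.getText(document);
            writer.write(s);
            System.out.println(s);
        } finally {
            // Release the document and both streams even if parsing fails.
            if (document != null) {
                document.close();
            }
            writer.close();
            fis.close();
        }
    }
}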
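
The sample above writes the extracted text with FileWriter, which encodes using the JVM's default charset. Because PDFBox can extract Unicode characters, some of them may be lost on systems where that default is not UTF-8. Below is a minimal sketch of writing with an explicit UTF-8 encoding instead. It assumes Java 7 or later; the class name Utf8TextWriter, the output file name, and the sample string (a stand-in for the result of PDFTextStripper.getText) are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox.
final class Utf8TextWriter {

    // Writes text to the given path as UTF-8, independent of the default charset.
    static void writeUtf8(String path, String text) throws IOException {
        try (Writer writer = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(path), StandardCharsets.UTF_8))) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the string returned by PDFTextStripper.getText(...).
        String extracted = "PDFBox \u6587\u672c";
        writeUtf8("pdf_change.txt", extracted);
    }
}
```

In the original sample, this would replace the FileWriter-based writer; the extraction code itself stays unchanged.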

Apache PDFBox 1.8.8 has been released. It is an incremental bug-fix release that resolves a large number of issues, including:

Bug

[PDFBOX-649] - loading an fdf containing a file attachment throws IOException
[PDFBOX-1036] - FDFExport/Import gives strange results
[PDFBOX-1060] - convertToImage includes "ghost" annotation outlines
[PDFBOX-1087] - FDF parsing is unreliable when xref are missing
[PDFBOX-1273] - java.io.IOException: Error: Unknown annotation type null
[PDFBOX-1512] - TextPositionComparator is not compatible with Java 7
[PDFBOX-1574] - ImportFDF fails to do anything
[PDFBOX-1595] - PDFMerger failed with the following exception: java.lang.NullPointerException
[PDFBOX-1918] - PDF with incorrect startxref
[PDFBOX-2001] - Digital Signature information (parser bug?)
[PDFBOX-2015] - Hybrid reference pdf still contain XRefStm info in the trailer dictionary after PDDocument#save
[PDFBOX-2173] - Nullpointer when validating empty file
[PDFBOX-2296] - Wrong stream length
[PDFBOX-2306] - Error reading stream, expected='endstream' actual='endobj'
[PDFBOX-2320] - IOException: Could not read embedded TTF for font TimesNewRoman
[PDFBOX-2332] - Error reading stream, expected='endstream' actual='endstream8' at offset 1993
[PDFBOX-2342] - WriteDecodedDoc cant decrypt pdf form correctly
[PDFBOX-2351] - /XRefStm content missing in saved file
[PDFBOX-2356] - Error Validating PDF Archive Document with half hour timezone
[PDFBOX-2371] - Overlay page off by one when using -useAllPages
[PDFBOX-2376] - Small regression in text extraction with PDFBox 1.8.7 vs. 1.8.6
[PDFBOX-2377] - Apparent regression in character mapping in a few files from govdocs1
[PDFBOX-2385] - inline image with EI at the end incorrectly parsed
[PDFBOX-2395] - Signing PDF document changes documentID
[PDFBOX-2401] - Image has wrong colors after Merge
[PDFBOX-2402] - NonSequentialPDFParser cannot recover from spurious closing brackets
[PDFBOX-2406] - fix typo "AlpaConstant"
[PDFBOX-2411] - Pushback buffer is full on seamingly small PDF
[PDFBOX-2412] - Loading XFDF document fails with ClassCastException
[PDFBOX-2413] - Loaded FDF document returns null fields
[PDFBOX-2419] - XFDF export is not XML compliant
[PDFBOX-2424] - ClassCastException in getMetaData if no real meta data
[PDFBOX-2434] - ClassCastException in readVersionInTrailer
[PDFBOX-2435] - ConvertToImage Appears To Invert Colors
[PDFBOX-2441] - Improve XRef self healing mechanism when more than one xref table
[PDFBOX-2443] - About to return NULL from unhandled branch when constructing a PDJpeg
[PDFBOX-2449] - Character missing in text extraction
[PDFBOX-2455] - NonSequentialParser does not tolerate missing %%EOF markers
[PDFBOX-2458] - Signing doesn't work anymore using BC 1.51 instead of 1.50
[PDFBOX-2465] - NPE in PdfaExtensionHelper.populateSchemaMapping
[PDFBOX-2469] - javax.crypto.BadPaddingException in PDFBox 1.8.8-SNAPSHOT
[PDFBOX-2470] - Exception in PDDocument.addSignature(PDSignature sigObject, SignatureInterface signatureInterface, SignatureOptions options))
[PDFBOX-2471] - AES encryption failing to write Acroform field names and values
[PDFBOX-2477] - NPE in DomXmpParser.createProperty
[PDFBOX-2478] - NPE in XObjImageValidator.checkColorSpaceAndImageMask
[PDFBOX-2481] - Adding large TYPE_BYTE_BINARY image to pdf document generates distorted result
[PDFBOX-2483] - StackOverflowError in preflight
[PDFBOX-2484] - Cannot decrypt AES256 encrypted files with nonSeq parser
[PDFBOX-2488] - NPE in FontValidator.isSubSet in preflight
[PDFBOX-2490] - Return value of COSDocument#isEncrypted is unclear
[PDFBOX-2491] - NPE in PDFAIdentificationValidation.checkConformanceLevel()
[PDFBOX-2492] - Java 8u25 IllegalBlockSizeException decrypting pdf
[PDFBOX-2497] - GRAVE: FlateFilter: stop reading corrupt stream due to a DataFormatException
[PDFBOX-2498] - ArrayIndexOutOfBoundsException in PreflightParser.lastIndexOf
[PDFBOX-2500] - ClassCastException in StreamValidationProcess.checkFilters
[PDFBOX-2502] - false negative? 1.4.6 : Trailer Syntax error, ID is different in the first and the last trailer
[PDFBOX-2503] - false negative? 1: 7.2 : Error on MetaData, Producer present in the document catalog dictionary doesn't match with XMP information
[PDFBOX-2504] - ClassCastException in preflight: PDAnnotationWidget cannot be cast to PDField
[PDFBOX-2512] - OutOfMemory while signing large documents
[PDFBOX-2517] - Better error message on pdfA identification
[PDFBOX-2520] - Don't decrypt already decrypted pdfs
[PDFBOX-2521] - Don't throw IOException if stream length is missing in lenient mode
[PDFBOX-2522] - javax.crypto.IllegalBlockSizeException in ExtractText
[PDFBOX-2523] - IOException: Error: Expected a long type at offset 1218571, instead got 'xref'
[PDFBOX-2528] - IOException: Object must be defined and must not be compressed object: 0:0
[PDFBOX-2533] - Poor rendering with non-sequential parser
[PDFBOX-2541] - ClassCastException in BaseParser.parseCOSDictionaryValue

Improvement

[PDFBOX-543] - Document the dependencies of PDFBox
[PDFBOX-1224] - Angle units are not consistent
[PDFBOX-1648] - FontBox can't load CMaps with no spaces between tokens
[PDFBOX-1738] - PDF with parsing IOException
[PDFBOX-1798] - Performance problem with PDDocument.saveIncremental (when signing document)
[PDFBOX-1833] - BaseParser tidy up
[PDFBOX-2197] - Add sample how to import a page as PDFormXObject
[PDFBOX-2250] - Improve XRef self healing mechanism
[PDFBOX-2394] - Add example code to extract embedded files in annotations
[PDFBOX-2414] - Allow non-sequential parser for PDFMerger in app
[PDFBOX-2456] - create TestSymmetricKeyEncryption.java
[PDFBOX-2468] - Switch FDFDocument.load from PDFParser to NonSequentialParser
[PDFBOX-2475] - Fix Checkstyle errors in the 1.8 branch
[PDFBOX-2480] - Add information about Snapshots to download section
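
Several of the fixes above (for example PDFBOX-1918, PDFBOX-2441 and PDFBOX-2455) concern files whose startxref entry or %%EOF marker is wrong or missing. As background, a PDF file normally ends with the keyword startxref, the byte offset of the cross-reference section, and the %%EOF marker. The following is a simplified sketch of how a reader might locate that offset in a file already loaded into memory. It is not PDFBox's implementation, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: a simplified tail scan, not how PDFBox parses files.
final class StartXrefLocator {

    private static final String KEYWORD = "startxref";

    // Returns the offset written after the last "startxref" keyword,
    // or -1 if the keyword or a following number cannot be found.
    static long findStartXref(byte[] pdf) {
        // ISO-8859-1 maps each byte to one char, so char indices equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int keywordPos = text.lastIndexOf(KEYWORD);
        if (keywordPos < 0) {
            return -1;
        }
        int i = keywordPos + KEYWORD.length();
        // Skip the whitespace between the keyword and the offset.
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        int start = i;
        while (i < text.length() && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
            i++;
        }
        if (start == i) {
            return -1; // keyword present but no digits follow
        }
        try {
            return Long.parseLong(text.substring(start, i));
        } catch (NumberFormatException e) {
            return -1; // digit run too long to be a valid offset
        }
    }

    // Reports whether the data contains an %%EOF marker anywhere.
    static boolean hasEofMarker(byte[] pdf) {
        return new String(pdf, StandardCharsets.ISO_8859_1).lastIndexOf("%%EOF") >= 0;
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.4\ntrailer\n<<>>\nstartxref\n9\n%%EOF\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("startxref offset: " + findStartXref(sample));
        System.out.println("has %%EOF: " + hasEofMarker(sample));
    }
}
```

A tolerant parser would treat a return value of -1, or an offset that does not point at a cross-reference section, as a signal to fall back to some recovery strategy rather than fail outright.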
