Read and extract text and other content from PDFs in C# (port of PDFBox)
This is a release with various bug-fixes and quality of life improvements but no new major features. It adds many of the supporting classes necessary for PDF rendering.
IColor can now be of type PatternColor. This implementation will throw an error when calling ToRGBValues(), so you might have to check for IColor.ColorSpace != ColorSpace.Pattern before calling this function.
The Details suffix was removed from ColorSpaceDetails property names.
AlternateColorSpaceDetails renamed to AlternateColorSpace
BaseColorSpaceDetails renamed to BaseColorSpace
IColor implementations
Color spaces and colors now use double instead of decimal
IColorSpaceContext moved from IOperationContext to CurrentGraphicsState
The ColorSpace property was removed from IPdfImage. Use ColorSpaceDetails.Type to get the enum value.
IColorSpaceContext's CurrentStrokingColorSpace and CurrentNonStrokingColorSpace are now of type ColorSpaceDetails (no longer the ColorSpace enum). Use CurrentStrokingColorSpace.Type or CurrentNonStrokingColorSpace.Type to get the enum value.
In DefaultWordExtractor, a logic bug in the existing implementation was fixed, so the output of the default page.GetWords() may change in this version.

Note that this version removes support for .NET 4.5. Consumers should upgrade to .NET 4.5.1 or 4.5.2.
TextRenderingMode, StrokeColor and FillColor
PageSize enum for landscape orientation documents
CreationDate and ModifiedDate are now available in DocumentInformationBuilder
PdfAction is exposed by the Annotation class. An InReplyTo property was also added.
GetFields extension method for the AcroForm type
PdfDocumentBuilder
PdfDocumentBuilder with one or more existing documents
Rijndael and RijndaelManaged replaced with Aes, since they were marked as obsolete

Changes since 0.1.6:
page.SetRotation for PdfPageBuilder
SkipMissingFonts added to parsing options to ignore content where the font is missing or corrupt. This can result in content being missed during extraction, but it enables partial extraction of the retrievable content on a page for corrupted files.
PdfPageBuilder thanks to @Jonowa
GrahamScan thanks to @BobLd
Debugger.Break removed from the encryption handler

Mainly bug fixes. There are some compatibility changes in the document layout analysis API. See the migration guide: https://github.com/UglyToad/PdfPig/wiki/Migration-to-0.1.6
Changes since v0.1.4: https://github.com/UglyToad/PdfPig/compare/v0.1.4...v0.1.5
Some more bug-fixes:
NullToken presence when creating documents.
IPdfImage.
DefaultWordExtractor was changed to try to detect the word gap size based on the preceding text instead of a global gap threshold.
Note that the changes to DefaultWordExtractor may alter the output of calls to Page.GetWords() in this version.
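The DefaultWordExtractor change above replaces one page-wide gap threshold with a threshold derived from the preceding text. The following is a minimal, self-contained C# sketch of that idea, not PdfPig's actual implementation. It assumes a single line of glyphs already sorted left to right, and the glyph tuples, the SplitWords helper and the gapFactor value of 0.3 are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input: one line of glyphs sorted left to right, each with its
// character, left edge (X) and advance width in the same user-space units.
var line = new List<(char Value, double X, double Width)>
{
    // "Big" in a large font (each glyph 10 units wide).
    ('B', 0, 10), ('i', 10, 10), ('g', 20, 10),
    // "small" in a small font (each glyph 3 units wide), 4 units after "Big".
    ('s', 34, 3), ('m', 37, 3), ('a', 40, 3), ('l', 43, 3), ('l', 46, 3),
    // "text" in the same small font, only 1.5 units after "small".
    ('t', 50.5, 3), ('e', 53.5, 3), ('x', 56.5, 3), ('t', 59.5, 3),
};

foreach (var word in SplitWords(line, 0.3))
{
    Console.WriteLine(word); // Big, small, text (one per line)
}

// Starts a new word on whitespace, or when the horizontal gap to the previous
// glyph exceeds gapFactor times the mean glyph width of the word built so far.
// The threshold therefore follows the size of the preceding text instead of
// being one fixed distance for the whole page.
static List<string> SplitWords(
    IReadOnlyList<(char Value, double X, double Width)> glyphs, double gapFactor)
{
    var words = new List<string>();
    var current = new List<(char Value, double X, double Width)>();

    foreach (var glyph in glyphs)
    {
        if (char.IsWhiteSpace(glyph.Value))
        {
            Flush();
            continue;
        }

        if (current.Count > 0)
        {
            var previous = current[current.Count - 1];
            double gap = glyph.X - (previous.X + previous.Width);
            double threshold = gapFactor * current.Average(g => g.Width);
            if (gap > threshold)
            {
                Flush();
            }
        }

        current.Add(glyph);
    }

    Flush();
    return words;

    // Emits the glyphs collected so far as one word.
    void Flush()
    {
        if (current.Count == 0) return;
        words.Add(new string(current.Select(g => g.Value).ToArray()));
        current.Clear();
    }
}
```

Here the 1.5-unit gap after "small" exceeds 0.3 times the 3-unit glyph width, so "text" becomes its own word. A single fixed threshold of 3 units, tuned to the large glyphs, would have merged them into "smalltext".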
First alpha version of 0.1.5
page.GetOptionalContents(): partial optional content retrieval support.
IPdfImages.

Breaking changes:
PdfDocumentBuilder now implements IDisposable. This disposes the underlying stream by default, but that is normally a MemoryStream, so there are no serious consequences if it is left undisposed.
PdfPageBuilder had the AdvancedEditing property removed. The API is now available through the ContentStream methods and properties (this was from #250).
PdfDocumentBuilder. The DrawRectangle method now takes an optional boolean parameter, fill.
Arial MT naming.
endobj tokens.
Differences arrays for encodings.
PointSize for letters accounting for rotation and other transformations

First alpha version of 0.1.3
Some new features, performance tweaks and improved Document Layout Analysis tools:
PdfDocumentBuilder: use PdfDocumentBuilder.ArchiveStandard to select a PDF/A compliance level.
PdfPaths, now PdfSubpath. Use ParsingOptions.ClipPaths to enable clipping.
PdfMerger.Merge to generate merged PDFs.
IPdfImage now supports TryGetBytes() instead of Bytes. TryGetBytes returns false for the JPXDecode and DCTDecode image filters, for which RawBytes already represent a valid JPEG (JPEG 2000 for JPXDecode) image.
Letter.TextDirection is now TextOrientation, with various fixes to the calculations of orientation and bounding box for Words.
DlaOptions parameter to specify behaviour.
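Because TryGetBytes returns false for DCTDecode and JPXDecode images, their RawBytes can be written out as an image file directly. The sketch below is a hypothetical helper and not part of PdfPig. It picks a file extension by checking well-known signatures: the JPEG SOI marker FF D8, the 12-byte JPEG 2000 (JP2) signature box, and the bare JPEG 2000 codestream markers FF 4F FF 51. Anything else falls through to ".bin".

```csharp
using System;
using System.Collections.Generic;

// Example: the first bytes of a baseline JPEG (SOI marker, then an APP0 marker).
Console.WriteLine(GuessImageExtension(new byte[] { 0xFF, 0xD8, 0xFF, 0xE0 })); // .jpg

// Hypothetical helper (not a PdfPig API): guesses a file extension from the
// leading bytes of an image's raw stream data, e.g. its RawBytes.
static string GuessImageExtension(IReadOnlyList<byte> raw)
{
    // JPEG (DCTDecode): SOI marker FF D8, immediately followed by another marker.
    if (StartsWith(raw, 0xFF, 0xD8, 0xFF)) return ".jpg";

    // JPEG 2000 file in the JP2 format (JPXDecode): the 12-byte signature box.
    if (StartsWith(raw, 0x00, 0x00, 0x00, 0x0C, 0x6A, 0x50, 0x20, 0x20, 0x0D, 0x0A, 0x87, 0x0A)) return ".jp2";

    // Bare JPEG 2000 codestream: SOC marker FF 4F followed by SIZ marker FF 51.
    if (StartsWith(raw, 0xFF, 0x4F, 0xFF, 0x51)) return ".j2k";

    // Unknown signature: keep the data but do not claim a format.
    return ".bin";
}

// True when data begins with every byte of prefix.
static bool StartsWith(IReadOnlyList<byte> data, params byte[] prefix)
{
    if (data.Count < prefix.Length) return false;
    for (int i = 0; i < prefix.Length; i++)
    {
        if (data[i] != prefix[i]) return false;
    }
    return true;
}
```

Checking the signature rather than trusting the filter name alone guards against mislabelled streams. Writing truncated or unknown data with a neutral extension avoids producing files that image viewers would reject.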