borb is a library for reading, creating and manipulating PDF files in Python.
This release is a beauty pageant release:
- The contents of the `borb` package have been sorted (in their respective part: `CONSTRUCTOR`, `PRIVATE`, `PUBLIC`)
- `mypy` warnings have been taken care of

Although the majority of the work has been done, this will always be an ongoing task. As new development adds code, I may need this kind of release from time to time to keep the quality of the code up.

This release includes the following minor fixes:
- `DisconnectedShape` (method names related to scaling were not analogous to `ConnectedShape`)

This release is a bugfix release.
- `SimpleFindReplace` which enables you to find and replace text in a PDF
- `tests` directory in this project

`toolkit`

Most of the classes in the following table implement `EventListener` and are part of the package `toolkit`.
They have a class method (strictly speaking an instance method: you can call it once you instantiate the class and add the instance as an `EventListener` to a PDF).
They also have a static method that you can call. The class method and static method typically return the same type of result.
The static method has the advantage that it allows you to work with a `Document`, whereas the class method only works with a PDF that is being loaded.
Or, to put it simply, the static method can be used at any point in the life-cycle of a `Document`, whereas the class method can only be used when reading an existing PDF.
This table gives you an overview of the available classes in `toolkit` and their methods:

class | class method | static method | status |
---|---|---|---|
ColorExtraction | get_color | get_color_from_pdf | :heavy_check_mark: |
FontExtraction | get_fonts | | |
FontExtraction | get_font_names | | |
HTMLToPDF | | convert_html_to_layout_element | :heavy_check_mark: |
HTMLToPDF | | convert_html_to_pdf | :heavy_check_mark: |
ImageExtraction | get_images | get_images_from_pdf | :heavy_check_mark: |
MarkdownToPDF | | convert_markdown_to_layout_element | :heavy_check_mark: |
MarkdownToPDF | | convert_markdown_to_pdf | :heavy_check_mark: |
PDFToJPG | convert_to_jpg | convert_pdf_to_jpg | :heavy_check_mark: |
PDFToMP3 | convert_to_mp3 | convert_pdf_to_mp3 | :heavy_check_mark: |
PDFToSVG | convert_to_svg | convert_pdf_to_svg | :heavy_check_mark: |
RegularExpressionTextExtraction | get_matches | get_matches_for_pdf | :heavy_check_mark: |
SimpleLineOfTextExtraction | get_lines_of_text | get_lines_of_text_from_pdf | :heavy_check_mark: |
SimpleNonLigatureTextExtraction | get_text | get_text_from_pdf | :heavy_check_mark: |
SimpleParagraphExtraction | get_paragraphs | get_paragraphs_from_pdf | :heavy_check_mark: |
SimpleTextExtraction | get_text | get_text_from_pdf | :heavy_check_mark: |
TableDetectionByLines | get_tables | | |
TableDetectionByLines | get_table_bounding_boxes | | |
TextRankKeywordExtraction | get_keywords | get_keywords_from_pdf | :heavy_check_mark: |
TFIDFKeywordExtraction | get_keywords | get_keywords_from_pdf | :heavy_check_mark: |

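
The difference between the two kinds of method can be pictured with a small, self-contained sketch. It does not use `borb` itself: `EventListener`, `TextCollector`, `load` and `replay` below are hypothetical stand-ins written for illustration, and a "document" is simply a list of pages holding text chunks. The listener method only has data once it has been attached during loading, while the static method can be handed a document at any later point.

```python
from typing import Dict, List

# Hypothetical stand-in for a parsed document: a list of pages, each a list of text chunks.
Document = List[List[str]]


class EventListener:
    """Hypothetical listener interface: receives text events while pages are processed."""

    def on_text(self, page_nr: int, text: str) -> None:
        raise NotImplementedError


class TextCollector(EventListener):
    """Hypothetical extraction class following the pattern in the table above."""

    def __init__(self) -> None:
        self._chunks_per_page: Dict[int, List[str]] = {}

    def on_text(self, page_nr: int, text: str) -> None:
        # Accumulate every chunk under the page it was seen on.
        self._chunks_per_page.setdefault(page_nr, []).append(text)

    def get_text(self) -> Dict[int, str]:
        # "Class method" in the table's wording: needs an instance that saw the events.
        return {nr: " ".join(chunks) for nr, chunks in self._chunks_per_page.items()}

    @staticmethod
    def get_text_from_pdf(document: Document) -> Dict[int, str]:
        # Static method: builds a fresh listener and feeds it an existing document.
        listener = TextCollector()
        replay(document, listener)
        return listener.get_text()


def replay(document: Document, listener: EventListener) -> None:
    """Send one text event per chunk to the listener, page by page."""
    for page_nr, page in enumerate(document):
        for chunk in page:
            listener.on_text(page_nr, chunk)


def load(raw_pages: List[List[str]], listeners: List[EventListener]) -> Document:
    """Hypothetical loader: builds the document and notifies the listeners while doing so."""
    document = [list(page) for page in raw_pages]
    for listener in listeners:
        replay(document, listener)
    return document
```

In this sketch, a `TextCollector` passed to `load` can answer `get_text()` afterwards, but it knows nothing about pages appended to the document later. `TextCollector.get_text_from_pdf(document)` works on whatever the document contains at the time of the call.
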
- `FormField` elements behave more like `LayoutElement` now
- `SmartArt`
- `Version` class to have 1 point of reference for getting version/author/producer information

Recently, `borb` has gone into the early stages of finding a reseller.
This is a very exciting step that I am sure will bring positive things for all of us, both `borb` and its users.
Understandably, the marketing/sales team would like some data to figure out what our target audience is, where to invest effort and resources, and more.
So I have added `UsageStatistics` to `borb`. This class gathers the following data:
- [...] `borb` to ensure consistency between calls)
- [...] `borb` is running)
- [...] `borb` that is running)

These statistics are periodically sent to our server(s). I have done my best to ensure this does not hinder the performance of `borb` in any way.
I urge you to look at the source code of the `License` package to reassure yourself that we are gathering only the bare minimum of data.
Nevertheless, I fully understand that you may prefer not to send this information.
You can turn it off by calling `UsageStatistics.disable()`.

This release is a maintenance/feature release.
- `borb.toolkit`
- `deepcopy` on `borb` objects
- `deepcopy`
- `SmartArt` to allow document designers to easily create beautiful charts and illustrations
- `SmartArt`
- `SmartArt` to come in future releases

This release is a minor bugfix release:
- Following the large refactor of `LayoutElement`, some minor classes still needed to be updated to work with the new framework. Most notable among these is `HTMLToPDF`.
- `GradientColoredDisjointShape` has become `GradientColoredDisconnectedShape` to follow suit with the rename of `DisjointShape` to `DisconnectedShape`.
- `InlineFlow` and `BlockFlow` have been moved to `page_layout`. Easy imports have been provided for them.
- More convenient imports have been made possible for `FormField` elements.
- The documentation of `borb` (to be found in the examples repository) has been given a major check. There is also a script that will automatically attempt to run each example code snippet. This should make it easier to detect when a new release breaks something in the examples repository.

This release is a feature release:
- `HTMLToPDF` has been updated to ensure even more HTML syntax is supported
- `HTMLToPDF` allows you to specify a `typing.List[Font]` of fallback fonts
- `MarkdownToPDF` now uses `HTMLToPDF` (making it easier for me to maintain the code)
- `BlockFlow` and `InlineFlow` elements
- `SingleColumnLayoutWithOverflow` to enable certain `LayoutElement` implementations to be split across multiple `Page` objects. Currently only splitting of `Table` is supported.

This release is a feature release:
- `LayoutElement` framework has had a major upgrade
  - `LayoutElement` now offers a `get_layout_box` method which tells you how much space a `LayoutElement` takes up
  - `LayoutElement` now offers a `paint` method which renders the `LayoutElement` on a `Page`
  - `LayoutElement` only adds its own content to the `Page` (previously it would change the order of page content to ensure backgrounds get drawn first)
- `HexColor("00ff00")`, which allows me to add the date of the test in the output, and still compare only the relevant pixels
- `MarkdownToPDF` has been refactored to convert Markdown to HTML
  - `HTMLToPDF` still needs work to produce PDF documents
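
The split between `get_layout_box` (measure) and `paint` (render) mentioned for `LayoutElement` above can be illustrated with a minimal sketch. This is not `borb` code: `Page`, `Box` and `stack_vertically` are hypothetical classes and helpers, and the method signatures are assumptions chosen for the example rather than the library's actual ones. The point is only that measuring never draws anything, and that painting appends the element's own content without reordering what is already on the page.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Page:
    """Hypothetical page: an ordered list of drawing instructions."""
    content: List[str] = field(default_factory=list)


@dataclass
class Box:
    """Hypothetical layout element with a preferred width and height."""
    name: str
    width: int
    height: int

    def get_layout_box(self, available: Tuple[int, int]) -> Tuple[int, int]:
        # Measure only: report the space this element takes up, clipped to
        # the available (width, height). Nothing is drawn here.
        return (min(self.width, available[0]), min(self.height, available[1]))

    def paint(self, page: Page, x: int, y: int) -> None:
        # Render: append this element's own instruction to the page,
        # leaving the order of existing content untouched.
        page.content.append(f"{self.name}@({x},{y})")


def stack_vertically(page: Page, elements: List[Box], page_size: Tuple[int, int]) -> None:
    """Place elements top to bottom, measuring each one before painting it."""
    x, y = 0, page_size[1]  # start at the top edge; y decreases going down
    for element in elements:
        _, used_height = element.get_layout_box((page_size[0], y))
        y -= used_height
        element.paint(page, x, y)
```

A layout such as the hypothetical `stack_vertically` can decide where each element goes purely from the measured sizes, which is also what makes it possible to detect in advance that an element will not fit in the remaining space.
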