PyPDF2 Versions

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

4.2.0

3 weeks ago

What's new

New Features (ENH)

  • Allow multiple charsets for NameObject.read_from_stream (#2585) by @pubpub-zz
  • Add support for /Kids in page labels (#2562) by @stefan6419846
  • Allow to update fields on many pages (#2571) by @pubpub-zz
  • Tolerate PDF with invalid xref pointed objects (#2335) by @pubpub-zz
  • Add Enforce from PDF2.0 in viewer_preferences (#2511) by @pubpub-zz
  • Add += and -= operators to ArrayObject (#2510) by @pubpub-zz

Bug Fixes (BUG)

  • Fix merge_page sometimes generating unknown operator 'QQ' (#2588) by @rfotino
  • Fix fields update where annotations are kids of field (#2570) by @pubpub-zz
  • Process CMYK images without a filter correctly (#2557) by @pubpub-zz
  • Extract text in layout mode without finding resources (#2555) by @pubpub-zz
  • Prevent recursive loop in some PDF files (#2505) by @pubpub-zz

Robustness (ROB)

  • Tolerate "truncated" xref (#2580) by @pubpub-zz
  • Replace error by warning for EOD in RunLengthDecode/ASCIIHexDecode (#2334) by @pubpub-zz
  • Rebuild xref table if one entry is invalid (#2528) by @pubpub-zz
  • Robustify stream extraction (#2526) by @pubpub-zz

Documentation (DOC)

  • Update release process for latest changes (#2564) by @stefan6419846
  • Encryption/decryption: Clone document instead of copying all pages (#2546) by @redfast00
  • Minor improvements (#2542) by @j-t-1
  • Update annotation list (#2534) by @j-t-1
  • Update references and formatting (#2529) by @j-t-1
  • Correct threads reference, plus minor changes (#2521) by @j-t-1
  • Minor readability increases (#2515) by @j-t-1
  • Simplify PaperSize examples (#2504) by @j-t-1
  • Minor improvements (#2501) by @j-t-1

Developer Experience (DEV)

  • Remove unused dependencies (#2572) by @stefan6419846
  • Remove page labels PR link from message (#2561) by @stefan6419846
  • Fix changelog generator regarding whitespace and handling of "Other" group (#2492) by @stefan6419846
  • Add REL to known PR prefixes (#2554) by @stefan6419846
  • Release using the REL commit instead of git tag (#2500) by @MartinThoma
  • Unify code between PdfReader and PdfWriter (#2497) by @pubpub-zz
  • Bump softprops/action-gh-release from 1 to 2 (#2514) by @dependabot[bot]

Maintenance (MAINT)

  • Ressources → Resources (and internal name childs) (#2550) by @pubpub-zz
  • Fix typos found by codespell (#2549) by @stefan6419846
  • Update Read the Docs configuration (#2538) by @j-t-1
  • Add root_object, _info and _ID to PdfReader (#2495) by @pubpub-zz

Testing (TST)

  • Allow loading truncated images if required (#2586) by @stefan6419846
  • Fix download issues from #2562 (#2578) by @pubpub-zz
  • Improve test_get_contents_from_nullobject to show real use-case (#2524) by @stefan6419846
  • Add missing test annotations (#2507) by @stefan6419846

Full Changelog

4.1.0

2 months ago

What's new

Generating name objects (NameObject) without a leading slash is considered deprecated now. Previously, just a plain warning would be logged, leading to possibly invalid PDF files. According to our deprecation policy, this will log a DeprecationWarning for now.
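Because a DeprecationWarning is easy to miss, a project may want to escalate it to an error while testing. The sketch below uses only the standard library `warnings` module. `make_name` is a hypothetical stand-in that mimics the behaviour described above; it is not pypdf's `NameObject`, and the exact point at which pypdf emits its warning is not shown here.

```python
import warnings


def make_name(value: str) -> str:
    """Hypothetical stand-in for creating a PDF name (NOT pypdf's code).

    Emits a DeprecationWarning when the leading slash is missing.
    """
    if not value.startswith("/"):
        warnings.warn(
            f"Name should start with '/': {value!r}",
            DeprecationWarning,
            stacklevel=2,
        )
    return value


def is_deprecated_usage(value: str) -> bool:
    """Return True if building the name triggers a DeprecationWarning."""
    with warnings.catch_warnings():
        # Inside this block, DeprecationWarning is raised as an exception.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            make_name(value)
        except DeprecationWarning:
            return True
    return False
```

The same effect can usually be had for a whole test run by starting Python with `-W error::DeprecationWarning`.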

New Features (ENH)

  • Add get_pages_from_field (#2494) by @pubpub-zz
  • Add reattach_fields function (#2480) by @pubpub-zz
  • Automatic access to pointed object for IndirectObject (#2464) by @pubpub-zz

Bug Fixes (BUG)

  • missing error on name without leading / (#2387) by @Rak424
  • encode_pdfdocencoding() always returns bytes (#2440) by @sbourlon
  • BI in text content identified as image tag (#2459) by @pubpub-zz

Robustness (ROB)

  • Missing basefont entry in type 3 font (#2469) by @pubpub-zz

Documentation (DOC)

  • Amend robustness documentation (#2479) by @j-t-1

Developer Experience (DEV)

  • Fix changelog for UTF-8 characters (#2462) by @stefan6419846

Maintenance (MAINT)

  • Add _get_page_number_from_indirect in writer (#2493) by @pubpub-zz
  • Remove user assignment for feature requests (#2483) by @stefan6419846
  • Remove reference to old 2.0.0 branch (#2482) by @stefan6419846

Testing (TST)

  • Fix benchmark failures (#2481) by @stefan6419846
  • Resolve file naming conflict in test_iss1767 (#2445) by @sbourlon

Full Changelog

4.0.2

2 months ago

What's new

Bug Fixes (BUG)

  • Use NumberObject for /Border elements of annotations (#2451) by @rsinger417

Documentation (DOC)

  • Document easier way to update metadata (#2454) by @stefan6419846
  • Typo Polyline → PolyLine in adding-pdf-annotations.md (#2426) by @CWKSC

Developer Experience (DEV)

  • Bump codecov/codecov-action from 3 to 4 (#2430) by @dependabot[bot]

Testing (TST)

  • Avoid catching not emitted warnings (#2429) by @stefan6419846

Full Changelog

4.0.1

3 months ago

What's new

Bug Fixes (BUG)

  • layout mode text extraction ZeroDivisionError (#2417) by @shartzog

Testing (TST)

  • Skip tests using fpdf2 if it's not installed (#2419) by @MartinThoma

Full Changelog

4.0.0

3 months ago

What's new

pypdf==4.0.0 is a big milestone forward:

  • We finally have a layout-mode text extraction. This enables users who want to detect / extract tables with heuristics to give it a try.
  • We deprecated a lot of the old PyPDF2 API that was either not following PEP8 naming styles or was not using a property. Users coming from PyPDF2 might want to switch to pypdf<4.0.0 first to get helpful error messages that show the new API in their specific cases.

A big 'Thank you!' to the whole pypdf community for your work. Thanks to you, pypdf is better than ever.

Kudos to @shartzog who added the layout-mode with his first contribution!
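As a rough illustration of the table heuristics that layout-mode extraction makes possible, the sketch below splits layout-preserved text into cells wherever two or more spaces occur. The sample text is made up and only stands in for what a call such as `page.extract_text(extraction_mode="layout")` might return; `split_columns` is a hypothetical helper, not part of pypdf.

```python
import re

# Made-up sample standing in for layout-mode output. With pypdf >= 4.0.0 the
# text would come from page.extract_text(extraction_mode="layout").
SAMPLE_LAYOUT_TEXT = """\
Item          Qty     Price
Widget          2      3.50
Large gadget   10     12.00
"""


def split_columns(text: str) -> list[list[str]]:
    """Split each non-empty line into cells at runs of two or more spaces."""
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            # Single spaces stay inside a cell; wider gaps separate cells.
            rows.append(re.split(r"\s{2,}", stripped))
    return rows


rows = split_columns(SAMPLE_LAYOUT_TEXT)
```

This heuristic is deliberately naive: it assumes columns are separated by at least two spaces and that no cell is empty, which real documents will not always satisfy.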

Deprecations (DEP)

  • Drop Python 3.6 support (#2369) by @MartinThoma
  • Remove deprecated code (#2367) by @MartinThoma
  • Remove deprecated XMP properties (#2386) by @stefan6419846

New Features (ENH)

  • Add "layout" mode for text extraction (#2388) by @shartzog
  • Add Jupyter Notebook integration for PdfReader (#2375) by @MartinThoma
  • Improve/rewrite PDF permission retrieval (#2400) by @stefan6419846

Bug Fixes (BUG)

  • PdfWriter.add_uri was setting the wrong type (#2406) by @pmiller66
  • Add support for GBK2K cmaps (#2385) by @stefan6419846

Documentation (DOC)

  • Add pmiller66 for #2406 as a contributor by @MartinThoma
  • Add missing expand parameter (#2393) by @Atomnp
  • Resolve build warnings (#2380) by @stefan6419846
  • Fix testing prerequisites (#2381) by @stefan6419846
  • Improve formatting of contributors page (#2383) by @stefan6419846
  • Add Tobeabellwether as a contributor for #2341 by @MartinThoma

Developer Experience (DEV)

  • Make dependabot aware of our PR prefixes (#2415) by @stefan6419846
  • Fail on Sphinx issues (#2405) by @stefan6419846
  • Move title check to own workflow (#2384) by @MasterOdin
  • Write to temporary files instead of the working directory (#2379) by @stefan6419846
  • Ensure that the PR titles have the correct format (#2378) by @stefan6419846

Maintenance (MAINT)

  • Return None instead of -1 when page is not attached (#2376) by @MartinThoma
  • Complete FileSpecificationDictionaryEntries constants (#2416) by @MartinThoma
  • Replace warning with logging.error (#2377) by @MartinThoma

Testing (TST)

  • Add missing pytest.mark.samples annotations (#2412) by @kitterma
  • Correctly close temporary files (#2396) by @stefan6419846
  • Fix side effect #2379 (#2395) by @pubpub-zz
  • Add test for layout extraction mode (#2390) by @MartinThoma

Code Style (STY)

  • Use the UserAccessPermissions enum (#2398) by @MartinThoma
  • Run black (#2370) by @MartinThoma

Full Changelog

3.17.4

4 months ago

What's new

Bug Fixes (BUG)

  • Handle IndirectObject as image filter (#2355) by @stefan6419846

Documentation (DOC)

  • Quote specs in generate_file_identifiers (#2363) by @exiledkingcc
  • Notes about form fields and annotations (#1945) by @dmjohnsson23
  • Notes about update_page_form_field_values(auto_regenerate) (#2359) by @dmjohnsson23
  • Fix stamping example (#2358) by @dmjohnsson23
  • Stamp images directly on a PDF (#2357) by @dmjohnsson23
  • Correct the example of adding highlight annotation (#2341) by @Tobeabellwether

Maintenance (MAINT)

  • Update upload-artifact and download-artifact actions from v3 to v4 (#2352) by @stefan6419846

Testing (TST)

  • Add xfail test for #2336 (#2365) by @MartinThoma
  • Increase test coverage for flate handling of image mode 1 (#2339) by @stefan6419846

Code Style (STY)

  • File identifier generation restructuring (#2362) by @exiledkingcc
  • Add PdfWriter._ID attribute (#2361) by @exiledkingcc
  • Variable naming convention (#2360) by @MartinThoma

Full Changelog

3.17.3

4 months ago

What's new

Robustness (ROB)

  • Out-of-bounds issue in handle_tj (text extraction) (#2342) by @rgwood-rely

Developer Experience (DEV)

  • Make make_release.py easier to configure (#2348) by @MartinThoma

Maintenance (MAINT)

  • Bump actions/download-artifact from 3 to 4 (#2344) by @dependabot[bot]

Full Changelog

3.17.2

4 months ago

What's new

Bug Fixes (BUG)

  • Cope with deflated images with CMYK Black Only (#2322) by @pubpub-zz
  • Handle indirect objects as parameters for CCITTFaxDecode (#2307) by @stefan6419846
  • check words length in _cmap type1_alternative function (#2310) by @Takher

Robustness (ROB)

  • Relax flate decoding for too many lookup values (#2331) by @stefan6419846
  • Let _build_destination skip in case of missing /D key (#2018) by @nickryand

Documentation (DOC)

  • Note in reading form data (#2338) by @MartinThoma
  • Pull Request prefixes and size by @MartinThoma
  • Add https://github.com/zuypt for #2325 as a contributor by @MartinThoma
  • Fix docstring for RunLengthDecode.decode (#2302) by @stefan6419846

Maintenance (MAINT)

  • Enable disallow_any_generics and add missing generics (#2278) by @nilehmann

Testing (TST)

  • Centralize file downloads (#2324) by @MartinThoma

Code Style (STY)

  • Fix typo "steam" → "stream" (#2327) by @stefan6419846
  • Run black by @MartinThoma
  • Make Traceback in bug report template uppercase (#2304) by @stefan6419846

Full Changelog

3.17.1

5 months ago

What's new

Bug Fixes (BUG)

  • Mediabox expansion size when applying non-right angle rotation (#2282) by @MrinalJain17

Robustness (ROB)

  • MissingWidth is IndirectObject (#2288) by @MartinThoma
  • Initialize states array with an empty value (#2280) by @alexey-v-paramonov

Documentation (DOC)

  • Typo in example in extract-attachments.md (#2285) by @ageitgey
  • Add Alexey Paramonov as a contributor for #2280 by @MartinThoma

Maintenance (MAINT)

  • Update sample-files by @MartinThoma

Full Changelog

3.17.0

6 months ago

What's new

Security (SEC)

  • Infinite recursion when using PdfWriter(clone_from=reader) (#2264) by @Alexhuszagh

New Features (ENH)

  • Add parameter to select images to be removed (#2214) by @pubpub-zz

Bug Fixes (BUG)

  • Correctly handle image mode 1 with FlateDecode (#2249) by @stefan6419846
  • Error when filling a value with parentheses #2268 (#2269) by @KanorUbu
  • Handle empty root outline (#2239) by @pubpub-zz

Documentation (DOC)

  • Improve merging docs (#2247) by @stefan6419846

Developer Experience (DEV)

  • Test Python 3.7 with cryptography provider as well (#2276) by @stefan6419846
  • Run CI with windows-latest (#2258) by @MartinThoma
  • Use pytest-xdist (#2254) by @MartinThoma
  • Attribute correct authors in the release notes (#2246) by @stefan6419846

Maintenance (MAINT)

  • Apply pre-commit hooks (#2277) by @MartinThoma
  • Update requirements + mypy fixes (#2275) by @MartinThoma
  • Explicitly provide Any for IO generic argument (#2272) by @nilehmann

Testing (TST)

  • Fix test_image_without_pillow in windows environment (#2257) by @pubpub-zz

Code Style (STY)

  • Remove unused import by @MartinThoma

Full Changelog