Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add encryption support and access privileges #2696

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

klimeryk
Copy link

@klimeryk klimeryk commented Mar 29, 2024

Hi, @diegomura! Love your library - I've been using it extensively at https://github.com/klimeryk/recalendar.js/ (https://recalendar.me/). I stumbled upon #672 and figured it was a great chance to contribute back :)

I've tried my best to bring over the changes from the upstream pdfkit version. Encryption was originally added in foliojs/pdfkit#820 there. It was mostly a straightforward update... except changes in reference.js. Looks like that class has been fundamentally changed - upstream it uses internally Buffer, while in react-pdf, the class extends Writeable instead.
I've tried first keeping the changes to reference.js minimal (see 4376bcb). Most of the changes were easy to port - basically copy and paste.

The only part I did not know was the changes in finalize in reference.js here. As a result, this was a partial success - non-encrypted PDFs look as before (good), you can generate a PDF that requires a password, the password is properly validated... but then the opened PDF is missing content. See this file: recalendar-password.pdf (password: secret)

I've then tried just switching to upstream's reference.js, especially seeing your intention of doing that from #2613. See 5d9a8e9. That looks more promising - the decrypted PDF has the text and almost looks perfect... except, for some reason, the font seems to be missing, so it looks off. See this file: recalendar-password2.pdf (password: secret)
And this applies to both encrypted and non-encrypted generated PDFs. So I'm guessing there's some modification/customization, maybe in some other file that I missed that was done in this version of pdfkit that is not compatible with the upstream version of reference.js? Unfortunately, I could not find it. I've tried also going back in history of reference.js, but unfortunately these changes seems to be from before it was moved to monorepo and I cannot find the previous repository 😞
I figured I'd create this PR for you to have a look, maybe something will spark a quick solution or hint from you. If not, I can continue digging 🙇

Everything should be working smoothly now, see #2924 (comment) 🎉

Testing

I'm guessing you have your preferred ways of testing different use cases :) Enabling password protection simply boils down to passing userPassword when creating a new instance of PDFDocument like so:

const ctx = new PDFDocument( {
    // ...
    userPassword: 'secret',
} );

To be on the safe side, I've also tried enabling all optional permissions, but that did not make a difference:

    permissions: {                                                                                                                                        
        printing: 'highResolution',
        modifying: true,
        copying: true,
        annotating: true,
        fillingForms: true,
        contentAccessibility: true,
        documentAssembly: true,
    },

Additionally, I've ported some unit tests from pdfkit, see 798995c

Copy link

changeset-bot bot commented Mar 29, 2024

🦋 Changeset detected

Latest commit: 798995c

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 7 packages
Name Type
@react-pdf/pdfkit Minor
@react-pdf/layout Patch
@react-pdf/renderer Patch
@react-pdf/svgkit Patch
@react-pdf/examples Patch
@react-pdf/e2e-node-cjs Patch
@react-pdf/e2e-node-esm Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@klimeryk klimeryk force-pushed the add/password-security branch from 494459f to 5d9a8e9 Compare March 29, 2024 22:53
Copy link
Contributor

@wojtekmaj wojtekmaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woah, nice one! Looks good to me.

I wonder if we could have some tests for that. As long as #2633 repairs them 🥲

Also, changeset is missing. Can you please run yarn changeset as explained in CONTRIBUTORS.md?

packages/pdfkit/src/reference.js Show resolved Hide resolved

return Buffer.from(byteArray);
};
import CryptoJS from 'crypto-js';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how this change affects bundle size and what was the original reason for selective import done like that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I tried digging in revision history, but it was introduced with the selective import in #1894. Upstream never used the selective import and imported the whole library from the beginning: foliojs/pdfkit#820. I'm guessing it's because the full/upstream security.js file uses extensively this library - CryptoJS.MD5, CryptoJS.RC4, CryptoJS.lib.WordArray, CryptoJS.SHA256, CryptoJS.AES. Whereas the simplified version (not supporting password protection) indeed just used CryptoJS.MD5, so the selective import might have made sense for that use case.

For completeness sake (so that the implications of this PR on the bundle size are known):

Current master

646,480 pdfkit.browser.cjs
644,160 pdfkit.browser.js
326,577 pdfkit.browser.min.cjs
325,782 pdfkit.browser.min.js
407,839 pdfkit.cjs
405,488 pdfkit.js
232,858 pdfkit.min.cjs
231,996 pdfkit.min.js

This branch/PR

683,846 pdfkit.browser.cjs
680,853 pdfkit.browser.js
343,476 pdfkit.browser.min.cjs
342,353 pdfkit.browser.min.js
445,176 pdfkit.cjs
442,171 pdfkit.js
249,962 pdfkit.min.cjs
248,604 pdfkit.min.js

So, 9% increase for pdfkit.js and 7.2% for pdfkit.min.js.

@klimeryk klimeryk force-pushed the add/password-security branch from 5d9a8e9 to cb2d168 Compare June 14, 2024 11:57
@klimeryk
Copy link
Author

klimeryk commented Jun 14, 2024

Thanks for looking at the PR, @wojtekmaj! Sorry for the late follow-up - private life got in the way, but now I'm back and determined to see this PR through. I've addressed your comments and generated the changeset 🙇

To be extra clear - I don't think this PR is ready to merge as-is. At the very least there's an issue with the fonts. There's links to generated PDF files in the description and my attempt at best describing the possible issue, but it might be best shown on screenshots:

Production file, correct fonts

Screenshot 2024-06-14 at 12 10 04

PR file, missing fonts

Screenshot 2024-06-14 at 12 10 12

And there are indeed warnings from the Firefox internal PDF previewer:

Warning: getFontFileType: Unable to detect correct font file Type/Subtype.
Warning: FormatError: Required "loca" table is not found

My guess is that these attachments/resources are not correctly encoded or decoded with the passwords/security logic introduced in this PR. I've double-checked the differences between the pdfkit package in this repo and the upstream one and nothing stood out to me that could cause this. So I was hoping I'd get some pointers from @diegomura or anyone else familiar with PDF internals 🙏 Though, if not possible (I understand that we're all busy one way or another), I'll try my best to dig into - it's an interesting problem and I love understanding how things work. Just I might not be as effective without expert help 🙇

@klimeryk klimeryk force-pushed the add/password-security branch from 99c8119 to 16f1475 Compare November 3, 2024 23:35
From https://github.com/foliojs/pdfkit/tree/3a6977e813a886744a383efedd124ae661bcc96c/tests/unit

Basically, just need small adjustments:
 - Adding `import { beforeEach, describe, expect, test } from 'vitest';`
 - Changing the relative imports from `import PDFMetadata from
   '../../lib/metadata';` to `import PDFMetadata from
   '../../src/metadata';`, since the source lives in `src` in this
   version, while the _output_ lives in `lib`.
 - `import matcher from './toContainChunk'; expect.extend(matcher);` if
   the file needs the `toContainChunk` matcher. This should be in a
   global vitest setup file, but that requires even more boilerplate.
@klimeryk
Copy link
Author

klimeryk commented Nov 4, 2024

PR updated - it should be good for a final/proper review. The font issues have been solved thanks to @florianbepunkt, see #2924 (comment). I've also ported some unit tests from pdfkit, see 798995c. Only ported the ones that were passing and trivial to move over. More work needs to be done there, since this version of pdfkit is a subset of upstream. But it will be helpful in the efforts in #2613

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants