Paperless
Paperless-NGX document management.
Migrated to ansible collection
Tip
- The UID/GID should be set to a user/group that has access to your media. All media clients should run under the same user to work correctly.
Suggested Archival Use
Suggested use (based on archivist recommendations):
- Document Types refer to the broad type of document in question. Is it a letter? Receipt? Bill? Every instance will be different, but this should be your broadest field. You just want to more or less get it in the ballpark. For example, my Receipts doctype holds receipts that I scan in, but it will also hold confirmations from my creditors that I paid a bill, or an email from Cash App that I sold Bitcoin.
- Correspondent refers to the person/organization you are communicating with in the document. A bill from your credit card would have Capital One as correspondent, for example, while a copy of your W-2 might go under IRS. Again, you can be broad here, as trying to narrow it down is going to drive you crazy.
- Tags are used to answer the below basic concepts:
- Who is it referring to? In my case, I have tags for myself, my wife, the kids, and the dogs. They are all the same color to easily denote that. Note that this is NOT the same as correspondent.
- What is it referring to? Is it related to your car loan? Is it related to your home's maintenance? Mark these tags in a different color to easily notice them.
- When is the information in this document relevant? Was it a bill from 2 years ago? Does it relate to your taxes for 2022? Personally, I make tags for the year it was received, as it makes it easier to sort. You can further break this down by month if needed.
- I also make tags for special categories that I need to track. For example, I have a tag for any documents that we'll need for our taxes in the coming year, and another for critical documents (birth certificates, etc.). This helps to further break it down.
Using Management Utilities
Log in and switch to the paperless user to run management utilities.
su - -s /bin/bash paperless
. /opt/paperless/src/{VERSION}/.venv/bin/activate
cd /opt/paperless/src/{VERSION}
python3 manage.py document_renamer
Reduce PDF size
This reduces PDF size if needed. Set {SETTING} to one of the following presets, depending on the target resolution:
- /screen: 72 dpi
- /ebook: 150 dpi
- /prepress: 300 dpi
- /printer: 300 dpi
- /default: no change
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS={SETTING} -dNOPAUSE \
  -dQUIET -dBATCH -sOutputFile={OUTPUT}.pdf {INPUT}.pdf
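The command above can be wrapped so that a mistyped preset is caught before Ghostscript runs. This is only a sketch: the function names valid_pdf_setting and shrink_pdf are hypothetical and not part of Paperless or Ghostscript, and it assumes gs is on the PATH.

```shell
# Hypothetical helpers (not part of Paperless or Ghostscript).

# Succeeds only for the -dPDFSETTINGS presets listed above.
valid_pdf_setting() {
    case "$1" in
        /screen|/ebook|/prepress|/printer|/default) return 0 ;;
        *) return 1 ;;
    esac
}

# usage: shrink_pdf SETTING INPUT.pdf OUTPUT.pdf
shrink_pdf() {
    setting=$1 input=$2 output=$3
    if ! valid_pdf_setting "$setting"; then
        echo "unknown PDFSETTINGS preset: $setting" >&2
        return 2
    fi
    # Refuse to overwrite the file that is being read.
    if [ "$input" = "$output" ]; then
        echo "input and output must be different files" >&2
        return 2
    fi
    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 "-dPDFSETTINGS=$setting" \
        -dNOPAUSE -dQUIET -dBATCH "-sOutputFile=$output" "$input"
}
```

For example, shrink_pdf /ebook scan.pdf scan-small.pdf would write a 150 dpi copy and leave scan.pdf untouched.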
img2pdf
Included to provide increased functionality for import scripts. It allows lossless conversion of images to PDFs so they can be imported into Paperless. The following example strips alpha channel data from a PNG and converts it to a lossless PDF for import.
convert test.png -background white -alpha remove -alpha off out.png
img2pdf out.png -o import.pdf
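For import scripts that handle more than one image, the same two steps can be looped over a directory. The sketch below assumes ImageMagick's convert and img2pdf are installed. The function name pngs_to_pdfs and the .flat.png temporary suffix are hypothetical choices for illustration.

```shell
# Hypothetical helper: convert every PNG in a directory to a PDF next to it.
# usage: pngs_to_pdfs DIRECTORY
pngs_to_pdfs() {
    dir=$1
    failed=0
    for png in "$dir"/*.png; do
        [ -e "$png" ] || continue      # the glob matched nothing
        flat="${png%.png}.flat.png"    # temporary copy without alpha channel
        if convert "$png" -background white -alpha remove -alpha off "$flat" &&
            img2pdf "$flat" -o "${png%.png}.pdf"; then
            echo "converted: $png"
        else
            echo "failed: $png" >&2
            failed=1
        fi
        rm -f "$flat"
    done
    # Non-zero when at least one image could not be converted.
    return "$failed"
}
```

Each resulting PDF can then be moved into the consumption directory.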
Merge PDFs
Now directly supported in Paperless.
Actions ➔ Merge
Select documents in merge order.
gs -dNOPAUSE -sDEVICE=pdfwrite -dBATCH -sOutputFile={OUTPUT}.pdf {INPUT}1.pdf \
  {INPUT}2.pdf
Split PDFs
Now directly supported in Paperless.
Actions ➔ Split
Select document first.
For documents that failed consumption, manually split before re-adding to the consumption directory.
# Show document information, including the page count.
gs -dBATCH -dPDFINFO {INPUT}.pdf
# Extract one page range. Repeat for each chunk of the PDF document.
gs -dNOPAUSE -sDEVICE=pdfwrite -dBATCH -sOutputFile={OUTPUT}.pdf -dFirstPage=1 \
  -dLastPage=3 {INPUT}.pdf
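Repeating the extraction by hand gets tedious for long documents, so the page ranges can be generated instead. The following is a rough sketch, not a tested part of this role: page_ranges and split_pdf are hypothetical names, the -partN.pdf output naming is an assumption, and the total page count still has to be read from the info output above.

```shell
# Hypothetical helper: print one "first last" page range per line.
# usage: page_ranges TOTAL_PAGES PAGES_PER_CHUNK
page_ranges() {
    total=$1 size=$2
    if [ "$size" -lt 1 ]; then
        echo "chunk size must be at least 1" >&2
        return 2
    fi
    first=1
    while [ "$first" -le "$total" ]; do
        last=$((first + size - 1))
        if [ "$last" -gt "$total" ]; then
            last=$total                # final chunk may be shorter
        fi
        echo "$first $last"
        first=$((last + 1))
    done
}

# Hypothetical helper: write INPUT-part1.pdf, INPUT-part2.pdf, ...
# usage: split_pdf INPUT.pdf TOTAL_PAGES PAGES_PER_CHUNK
split_pdf() {
    input=$1
    n=1
    page_ranges "$2" "$3" | while read -r first last; do
        # stdin is redirected so gs cannot consume the remaining ranges
        gs -dNOPAUSE -sDEVICE=pdfwrite -dBATCH \
            "-sOutputFile=${input%.pdf}-part$n.pdf" \
            "-dFirstPage=$first" "-dLastPage=$last" "$input" < /dev/null
        n=$((n + 1))
    done
}
```

For a 7-page file, split_pdf failed.pdf 7 3 would produce three files covering pages 1-3, 4-6, and 7.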
Reverse Proxy
Reverse proxy configuration has changed drastically, resulting in multiple reported issues (674, 817, 712). Be sure to set the following configuration variables:
PAPERLESS_USE_X_FORWARD_HOST=true
PAPERLESS_USE_X_FORWARD_PORT=true
PAPERLESS_URL=https://paperless.example.com
PAPERLESS_CSRF_TRUSTED_ORIGINS=https://paperless.example.com
PAPERLESS_ALLOWED_HOSTS=paperless.example.com
PAPERLESS_CORS_ALLOWED_HOSTS=https://paperless.example.com
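Since a single missing variable is enough to break logins behind a proxy, a quick check of the configuration file can save some debugging. This is a minimal sketch: the function name missing_proxy_settings is hypothetical, and it assumes the settings live in a plain KEY=value file whose path you pass in (the actual location depends on your install).

```shell
# Hypothetical helper: list reverse-proxy variables that are absent or empty.
# usage: missing_proxy_settings /path/to/config-file
missing_proxy_settings() {
    conf=$1
    missing=0
    for key in PAPERLESS_USE_X_FORWARD_HOST PAPERLESS_USE_X_FORWARD_PORT \
               PAPERLESS_URL PAPERLESS_CSRF_TRUSTED_ORIGINS \
               PAPERLESS_ALLOWED_HOSTS PAPERLESS_CORS_ALLOWED_HOSTS; do
        # Require "KEY=" at line start followed by at least one character.
        if ! grep -q "^${key}=." "$conf"; then
            echo "missing: $key"
            missing=1
        fi
    done
    # Exit status 0 means every variable was found.
    return "$missing"
}
```

The function prints nothing and returns 0 when all six variables are set. Commented-out lines do not count as set.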
Receiving a 403 after logging in explicitly
Forbidden (403) CSRF verification failed. Request aborted.
See the Proxy Rule Changes and be sure to add a Referrer-Policy header to allow requests through:
add_header Referrer-Policy 'strict-origin-when-cross-origin';
Restart both NGINX and Paperless and try again.