scan paper documents 📄 from a scanner 🖨️ as PDFs to Google Drive for full-text search
scan2drive is a Go program (with a web interface) for scanning, converting and uploading physical documents to Google Drive. The author runs scan2drive as a gokrazy appliance on a Raspberry Pi 4.
During the conversion step, scan2drive skips empty pages and converts the rest from multi-megabyte JPEGs into a kilobyte-sized PDF. This allows you to use Google Drive’s OCR-based full text search.
Both the originals and the converted PDF are uploaded to Google Drive, so that you can enjoy full text search but still have the full-quality originals just in case.
In comparison to the native Google Drive connectivity which some document scanner vendors provide, scan2drive has these main advantages:
Currently, there are a number of open issues and not all functionality might work well. Use at your own risk!
The project vision is described above. Notably, scan2drive is already feature complete: we do not want to add any more features to it.
scan2drive was published in the hope that it could be useful to others, but the main author has no time to create an active community around it or accept contributions in a timely manner. All support, development and bug fixes are strictly best effort.
The scans directory (-scans_dir flag) contains the following files:

- <sub>/ is the per-user directory under which scans are placed.
  - 2016-05-09-21:05:02+0200/ is a directory for an individual scan.
    - page*.jpg are the raw pages obtained by calling scanimage.
    - scan.pdf is the converted PDF.
    - thumb.png is the first page of the converted PDF, for display in the UI.
    - COMPLETE.* are empty files recording which individual processing steps are done.

Any file in the scans directory can be deleted at will, with the caveat that deleting scans before the COMPLETE.uploadoriginals file is present will result in that scan being irrevocably lost.
The state directory (-state_dir flag) contains the following files:

- cookies.key is a secret key with which cookies are encrypted.
- sessions/ contains session contents.
- users/ is a directory containing per-user data.
  - users/<sub>/ is a directory for an individual user.
    - drive_folder.json contains information about the selected destination Google Drive folder. If this file is deleted, the user will need to re-select the destination folder, and scans cannot be uploaded until a new destination folder has been selected.
    - token.json contains the offline OAuth token for accessing Google Drive on behalf of the user. If this file is deleted, the user will need to log in again. If this file is leaked, the user should revoke the token.
First, follow the gokrazy quickstart instructions.
Then, add the github.com/stapelberg/scan2drive/cmd/scan2drive package to your gokrazy instance:
gok -i scanner add github.com/stapelberg/scan2drive/cmd/scan2drive
Deploy your gokrazy instance to your Raspberry Pi and connect a supported scanner.
You should be able to access the gokrazy web interface at the URL which the gok tool printed. To access the scan2drive web interface, switch to port 7120.
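Since only the port changes, the scan2drive URL can be derived from the one gok printed. A minimal Go sketch, assuming the printed URL may carry gokrazy credentials (user:password@) that are not needed for scan2drive; `scan2driveURL` is a hypothetical helper, and the host name `scanner` matches the `gok -i scanner` instance used above.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// scan2driveURL keeps the scheme and host of the gokrazy URL, switches the
// port to 7120 and drops any credentials (an assumption of this sketch).
func scan2driveURL(gokrazyURL string) (string, error) {
	u, err := url.Parse(gokrazyURL)
	if err != nil {
		return "", err
	}
	u.User = nil
	// JoinHostPort re-adds brackets for IPv6 literals.
	u.Host = net.JoinHostPort(u.Hostname(), "7120")
	u.Path = "/"
	return u.String(), nil
}

func main() {
	s, err := scan2driveURL("http://gokrazy:secret@scanner/")
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // http://scanner:7120/
}
```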
To set up Google OAuth, you’ll need to access scan2drive via a domain name with a valid TLS certificate. scan2drive has built-in support for obtaining free certificates from Let’s Encrypt, but you do need to make your scan2drive installation reachable over the internet for this to work:
libjpeg-turbo is a JPEG image codec that uses SIMD instructions (Arm Neon in case of the Raspberry Pi) to accelerate baseline JPEG compression.
scan2drive can optionally make use of libjpeg-turbo (via the turbojpeg build tag), but doesn’t include it by default because of the cumbersome setup.
Using libjpeg-turbo on gokrazy requires a few extra setup steps. Because gokrazy does not include a C runtime environment (neither libc nor a dynamic linker), we need to link scan2drive statically.
Install the gcc cross compiler, for example on Debian:
apt install crossbuild-essential-arm64
Enable cgo for your gokrazy instance. This means setting the following environment variables when calling gok (for example in your “gokline”, see gokrazy → Automation):
export CC=aarch64-linux-gnu-gcc
export CGO_ENABLED=1
Enable static linking and the turbojpeg build tag for scan2drive in your instance config (use gok edit):
{
  "Hostname": "scanner",
  "Packages": [
    "github.com/gokrazy/fbstatus",
    "github.com/gokrazy/hello",
    "github.com/gokrazy/serial-busybox",
    "github.com/gokrazy/breakglass",
    "github.com/stapelberg/scan2drive/cmd/scan2drive"
  ],
  "PackageConfig": {
    "github.com/stapelberg/scan2drive/cmd/scan2drive": {
      "GoBuildFlags": [
        "-ldflags=-linkmode external -extldflags -static"
      ],
      "GoBuildTags": [
        "turbojpeg"
      ]
    }
  }
}
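Before deploying, the relevant parts of the config can be sanity-checked. This Go sketch decodes only the fields shown above (the real gokrazy config format has more) and reports missing pieces. `checkTurbojpeg` is a hypothetical helper; it needs Go 1.21+ for the `slices` package, and it compares the linker flags as an exact string, so an equivalent but differently spelled flag would be reported as missing.

```go
package main

import (
	"encoding/json"
	"fmt"
	"slices"
)

// instanceConfig models only the subset of the gokrazy instance config
// used above; unknown fields are ignored by encoding/json.
type instanceConfig struct {
	Hostname      string
	Packages      []string
	PackageConfig map[string]struct {
		GoBuildFlags []string
		GoBuildTags  []string
	}
}

const scan2drivePkg = "github.com/stapelberg/scan2drive/cmd/scan2drive"

// checkTurbojpeg lists problems that would prevent a statically linked,
// turbojpeg-enabled scan2drive build.
func checkTurbojpeg(raw []byte) ([]string, error) {
	var cfg instanceConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, err
	}
	var problems []string
	if !slices.Contains(cfg.Packages, scan2drivePkg) {
		problems = append(problems, "scan2drive missing from Packages")
	}
	// A missing map entry yields the zero value, i.e. empty slices.
	pc := cfg.PackageConfig[scan2drivePkg]
	if !slices.Contains(pc.GoBuildTags, "turbojpeg") {
		problems = append(problems, "turbojpeg build tag not set")
	}
	if !slices.Contains(pc.GoBuildFlags, "-ldflags=-linkmode external -extldflags -static") {
		problems = append(problems, "static linking flags not set")
	}
	return problems, nil
}

func main() {
	raw := []byte(`{
  "Hostname": "scanner",
  "Packages": ["github.com/stapelberg/scan2drive/cmd/scan2drive"],
  "PackageConfig": {
    "github.com/stapelberg/scan2drive/cmd/scan2drive": {
      "GoBuildFlags": ["-ldflags=-linkmode external -extldflags -static"],
      "GoBuildTags": ["turbojpeg"]
    }
  }
}`)
	problems, err := checkTurbojpeg(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(problems) // []
}
```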