Deduplicating archiver with encryption and paranoid-level tests. Swiss army knife for the serious backup and disaster recovery manager. Ransomware neutralizer. Win/Linux/Unix
fwrite() on 58.5 broke some commands
Works on multipart / indexed multipart archives
Not yet 100% tested
The -verify switch will run a test against the filesystem [good for debugging]
Encrypted-indexed
zpaqfranz a z:\test_??? c:\zpaqfranz\*.exe -index z:\indez.zpaq -key pippo -fasttxt
zpaqfranz a z:\test_??? c:\zpaqfranz\*.cpp -index z:\indez.zpaq -key pippo -fasttxt -verify
zpaqfranz a z:\test_??? c:\zpaqfranz\*.txt -index z:\indez.zpaq -key pippo -fasttxt -verify
zpaqfranz versum z:\test*.zpaq -fasttxt
Backup
zpaqfranz backup z:\baz c:\zpaqfranz\*.cpp -fasttxt
zpaqfranz backup z:\baz c:\zpaqfranz\*.exe -fasttxt
zpaqfranz backup z:\baz c:\zpaqfranz\*.txt -fasttxt
zpaqfranz versum z:\baz*.zpaq -fasttxt
With a couple more releases I should be ready to start the actual implementation of zpaqfranz-over-TCP. Basically only the index will be stored locally, not the data, which will be sent to a zpaqfranz server in the cloud. Really complicated, with all the special cases provided by zpaq, but I am starting to see light at the end of the tunnel
The system will be 100% ransomware-insensitive [of course only if the server itself is not compromised!], allowing recovery (at least in intention) in any situation, even the most catastrophic
Basically I am operating bottom-up plus divide-et-impera. Work in progress...
Example in this thread: during refactoring, sorting ended up using only the first 10 chars instead of 40. It doesn't actually invalidate anything, but it is still unpleasant
Every promise is a debt: zpaqfranz a z:\58_5 -key pippo => if a ./58_5 file or folder exists, it is automagically added to the archive
zpaqfranz can now automagically calculate the CRC-32 of the archive (without, of course, re-reading it from the filesystem), writing it to an archivename_crc32.txt file
C:\zpaqfranz>zpaqfranz a z:\1.zpaq *.cpp -fasttxt
zpaqfranz v58.5o-JIT-GUI-L,HW SHA1/2,SFX64 v55.1,(2023-07-12)
franz:-fasttxt -hw
Creating z:/1.zpaq at offset 0 + 0
Add 2023-07-12 14:00:13 27 89.286.021 ( 85.15 MB) 32T (0 dirs)
27 +added, 0 -removed.
0 + (89.286.021 -> 16.670.812 -> 2.069.455) = 2.069.455 @ 57.38 MB/s
62655: CRC-32 EXPECTED E948770C
62682: Updating fasttxt z:/1_crc32.txt :OK
1.500 seconds (000:00:01) (all OK)
You get something like this
C:\zpaqfranz>type z:\1_crc32.txt
$zpaqfranz fasttxt|1|2023-07-12 14:00:14|z:/1.zpaq
E948770C 8293084830611972 0 [2.069.455] (0)
In this example the first field (E948770C) is the (expected) CRC-32 of the archive. The second, 8293084830611972, is the "quick" hash; the third (0) is, in this case, the initial CRC-32; then come the file sizes. The "quick hash" is the heuristic hash introduced a few releases earlier
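For scripting around the fasttxt file, the two-line layout can be read back like this. A minimal Python sketch: the field interpretation follows the description above and is an assumption on my part, not an official format specification (note the dot thousands separators in the size).

```python
def parse_fasttxt(text):
    """Parse the hypothetical two-line *_crc32.txt layout shown above."""
    header, data = text.strip().splitlines()
    _, version, timestamp, archive = header.split("|")
    fields = data.split()
    return {
        "archive": archive,
        "timestamp": timestamp,
        "crc32": fields[0],          # expected CRC-32 of the archive
        "quick": fields[1],          # heuristic "quick" hash
        "initial_crc32": fields[2],  # assumed: CRC-32 before this update
        # size in brackets, dots as thousands separators
        "size": int(fields[3].strip("[]").replace(".", "")),
    }

sample = """$zpaqfranz fasttxt|1|2023-07-12 14:00:14|z:/1.zpaq
E948770C 8293084830611972 0 [2.069.455] (0)"""
info = parse_fasttxt(sample)
```

Having the expected CRC-32 and size as plain text is what makes the later remote comparison a one-liner.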
Using the versum command, with -fasttxt, it is possible to check very quickly
C:\zpaqfranz>zpaqfranz versum z:\1.zpaq -fasttxt
zpaqfranz v58.5o-JIT-GUI-L,HW SHA1/2,SFX64 v55.1,(2023-07-12)
franz:versum | - command
franz:-fasttxt -hw
66764: Test CRC-32 of .zpaq against _crc32.txt
87163: Bytes to be checked 2.069.455 (1.97 MB) in files 1
66323: OK CRC-32: z:/1.zpaq
====================================================================
66356: TOTAL 1
66357: OK 1
66358: WARN 0
66359: ERROR 0
0.016 seconds (00:00:00) (all OK)
with -quick in (almost) no time
C:\zpaqfranz>zpaqfranz versum z:\1.zpaq -fasttxt -quick
zpaqfranz v58.5o-JIT-GUI-L,HW SHA1/2,SFX64 v55.1,(2023-07-12)
franz:versum | - command
franz:-quick -fasttxt -hw
66764: Test QUICK of .zpaq against _crc32.txt
87163: Bytes to be checked 2.069.455 (1.97 MB) in files 1
66323: OK QUICK: z:/1.zpaq
====================================================================
66356: TOTAL 1
66357: OK 1
66358: WARN 0
66359: ERROR 0
0.031 seconds (00:00:00) (all OK)
You can run it even with *.zpaq (on Linux "*.zpaq")
zpaqfranz versum *.zpaq -fasttxt
Why this "thing", so similar to -checktxt?
Because the CRC-32 calculation is performed during the write phase to disk, it has minimal impact in terms of time and CPU, and it is ONLY performed on the added part
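The reason this is cheap is that CRC-32 can be carried as a running value: after an append, only the new bytes need to be hashed, never the whole archive. A minimal Python sketch of the principle (not zpaqfranz's actual code):

```python
import zlib

# CRC-32 as a running value: pass the previous CRC as the starting value,
# feed only the newly appended bytes, and the result equals a full re-hash.
def crc32_update(running, new_bytes):
    return zlib.crc32(new_bytes, running) & 0xFFFFFFFF

v1 = b"first version of the archive"
delta = b" + bytes appended by the second run"

crc_v1 = crc32_update(0, v1)          # cost: size of the first version
crc_v2 = crc32_update(crc_v1, delta)  # cost: ONLY the appended part

# Identical to re-hashing the whole (grown) archive from scratch:
assert crc_v2 == zlib.crc32(v1 + delta) & 0xFFFFFFFF
```

This is why the per-update cost depends on the size of the added data, not on the (possibly huge) total archive size.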
Let's take a concrete example, otherwise it is difficult to understand the incredible usefulness (in certain scenarios, of course)
Suppose you make a backup with a certain tool (e.g. 7z, rar, tar) of a certain folder. Suppose the archive is 500GB in size and resides (as normal) on a slow device, e.g. a NAS with magnetic disks, used by many others
Suppose you want to transfer it to another device (as normal), e.g. with rsync.
This will require reading all 500GB (locally, maybe painfully slow), calculating the relevant checksums (for rsync they are basically md5, high CPU usage), remotely sending all 500GB (=saturating all bandwidth), remotely calculating 500GB (=high I/O and CPU) of md5 hashes, and comparing them.
Now you are paranoid: your archive is full of precious data, therefore you launch a local CRC-32 (for the .7z, rar, tar...) AND a remote CRC-32, just to be sure
So far, so good: zpaqfranz pays the same "cost" (for the FIRST run)
On the 2nd run, with tar, 7z, rar etc., you will be in the exact same situation. Suppose the new archive is 501GB (2GB changed in the source folder): creating (aka writing) a 501GB giant file, reading everything back, calculating md5, calculating (remotely by rsync) 500GB of hashes and so on... hours locally, hours remotely, a LOT of local I/O, a LOT of CPU
With zpaqfranz 58.4 and checktxt...
With zpaqfranz 58.5 and fasttxt...
In a future release step 5) will become "CRC-32 of 1GB"
Therefore, here is a little (!) Windows batch file
Suppose you want to back up "something" (some Windows data) to a remote server (a Linux box), using a local encryption password
Since you are lazy, you want not only the local copy to be verified, but also the remote one (its CRC-32 compared with the local one), and you want a different e-mail depending on the verification result (error or not), BUT you DO NOT WANT TO SEND THE PASSWORD TO THE REMOTE SERVER
Since you use an FTTH connection you really want to send the minimum amount of changed data, and you do NOT want to run rsync on huge files (hundreds of GB), which can take hours
We have key-based authentication (for ssh, then rsync-over-ssh)
First step: make the archive, in this example into k:\franco\test\zpaqfranz_pippo.zpaq, of the two folders c:\zpaqfranz and c:\stor, with password (key) pippo, support for paths longer than 255 chars (-longpath), CRC-32 for the later cloud test (-fasttxt), no ETA (this is a batch file after all, who cares: -noeta), and a BIG confirmation (-big), easier to spot in e-mails
@echo off
date /t >c:\stor\result.txt
time /t >>c:\stor\result.txt
c:\stor\bin\zpaqfranz a k:\franco\test\zpaqfranz_pippo.zpaq c:\zpaqfranz c:\stor -longpath -key pippo -fasttxt -noeta -big >>c:\stor\result.txt
Now we want to list all the versions, just to make sure the update is done (few things are worse than a backup update that does not update anything)
c:\stor\bin\zpaqfranz i k:\franco\test\zpaqfranz_pippo.zpaq -key pippo -noeta >>c:\stor\result.txt
Now we want to (locally) test the archive. Please note: locally. The password "pippo" is NOT sent over internet
c:\stor\bin\zpaqfranz t k:\franco\test\zpaqfranz_pippo.zpaq -key pippo -noeta -big >>c:\stor\result.txt
OK, we make the same thing, for a second archive file (just an example) k:\franco\test\nz_pippo.zpaq
c:\stor\bin\zpaqfranz a k:\franco\test\nz_pippo.zpaq c:\nz -longpath -key pippo -fasttxt -big >>c:\stor\result.txt
c:\stor\bin\zpaqfranz i k:\franco\test\nz_pippo.zpaq -key pippo -noeta >>c:\stor\result.txt
c:\stor\bin\zpaqfranz t k:\franco\test\nz_pippo.zpaq -key pippo -noeta -big >>c:\stor\result.txt
Now we upload everything with --append. Only the data changed since the last run will be sent over rsync (over ssh) to the remote Linux box. This usually takes minutes
c:\stor\bin\rsync -e "c:\stor\bin\ssh.exe -p 22 -i c:\stor\bin\thekey" -I -r --append --partial --progress --chmod=a=rwx,Da+x /k/franco/test/ [email protected]:/home/theuser/copie/test/ >>c:\stor\result.txt
Now we enforce the upload of the *.txt files (forcing a "refresh" of the *_crc32.txt) with --checksum
c:\stor\bin\rsync -e "c:\stor\bin\ssh.exe -p 22 -i c:\stor\bin\thekey" -I -r --include="*.txt" --exclude="*" --checksum --chmod=a=rwx,Da+x /k/franco/test/ [email protected]:/home/theuser/copie/test/ >>c:\stor\result.txt
Now we get the size of the /home/theuser folder, and the free space, with the s command. BEWARE: you may need something like /usr/local/bin/zpaqfranz, it depends on the PATH
c:\stor\bin\ssh -p22 -i c:\stor\bin\thekey [email protected] zpaqfranz s /home/theuser >>c:\stor\result.txt
Run some other remote commands, for example ls everything (zpool status, df -h, whatever, just an example)
echo --------- >>c:\stor\result.txt
c:\stor\bin\ssh -p22 -i c:\stor\bin\thekey [email protected] ls -l '/home/theuser/copie/test/*' >>c:\stor\result.txt
And now remotely test (by CRC-32) the uploaded *.zpaq against the _crc32.txt, with NO PASSWORD sent
c:\stor\bin\ssh -p22 -i c:\stor\bin\thekey [email protected] zpaqfranz versum '/home/theuser/copie/test/*.zpaq' -fasttxt -noeta -big >>c:\stor\result.txt
Now we'll do a very dirty trick: counting the OKs in the output log with grep. In this example there should be 5. Beware: you need the very latest zpaqfranz here (58.5m+). We make two of them: one for the body, one for the attachment of the email
echo ==================================== >>c:\stor\result.txt
echo ============ COUNT OK =========== >>c:\stor\result.txt
echo ==================================== >>c:\stor\result.txt
echo 5 >c:\stor\countok.txt
echo 5 >c:\stor\countbody.txt
c:\stor\bin\egrep "# # ###!" c:\stor\result.txt -c >>c:\stor\countok.txt
c:\stor\bin\egrep "# # ###!" c:\stor\result.txt -c >>c:\stor\countbody.txt
c:\stor\bin\zpaqfranz last2 c:\stor\countok.txt -big >>c:\stor\result.txt
Pack the report with 7z (reports can become very BIG in case of errors)
date /t >>c:\stor\result.txt
time /t >>c:\stor\result.txt
del c:\stor\report.7z
c:\stor\bin\7z a c:\stor\report.7z c:\stor\result.txt
Now make another result file (for the email body)
echo ==================================== >c:\stor\body.txt
echo ========== COUNT OK BODY =========== >>c:\stor\body.txt
echo ==================================== >>c:\stor\body.txt
c:\stor\bin\zpaqfranz last2 c:\stor\countbody.txt -big >>c:\stor\body.txt
Finally send two different e-mails (usually, in case of error, you will also change the -to to your primary email)
if not errorlevel 1 goto va
if errorlevel 1 goto nonva
:nonva
c:\stor\bin\mailsend -t [email protected] -cc [email protected] -f [email protected] -starttls -port 587 -auth -smtp smtp.mymail.com -sub "***ERROR *** Backup (theuser)" -user [email protected] -pass mygoodpassword -mime-type "application/x-7z-compressed" -enc-type "base64" -aname "report.7z" -attach "c:\stor\report.7z" -mime-type "text/plain" -disposition "inline" -attach "c:\stor\body.txt"
goto fine
:va
c:\stor\bin\mailsend -t [email protected] -cc [email protected] -f [email protected] -starttls -port 587 -auth -smtp smtp.mymail.com -sub "Backup (theuser)" -user [email protected] -pass mygoodpassword -mime-type "application/x-7z-compressed" -enc-type "base64" -aname "report.7z" -attach "c:\stor\report.7z" -mime-type "text/plain" -disposition "inline" -attach "c:\stor\body.txt"
:fine
On *nix it is not possible to do a synchronous t (test) over ssh; it depends on how the shell is created (it is long to explain, I would say that is enough for now). On Windows, however, you can
You can get a compliance check of a local and a remote file, through CRC-32, by "paying" only the cost of the CRC-32 calculation on the remote computer. The remote CRC-32 calculation can also be done, for example, via a crontab for multiple archives by using wildcards ("*.zpaq")
By using the -quick switch you can make heuristic checks (i.e., on the start, middle, and end of files), so you can be fairly safe against rsync --append mismatches, in a few milliseconds (if you don't want the entire MD5 or CRC-32 of the remote file re-calculated; backup files can be hundreds of gigabytes in size)
If you are paranoid instead, you can use -checktxt, which implies (default) the use of MD5, or (optional) XXH3.
This, however, can get "expensive" for very large backups
In the future, of course, this will become zpaqfranz-over-TCP
Convert multiple .zpaq chunks into one backup, or convert a single .zpaq to the new backup format
Convert an archive to backup format: consolidatebackup z:\foo.zpaq -to k:\newbackup -key pippo
Compare the MD5s against the .zpaq(s), taken from:
zpaqfranz a prova.zpaq c:\dropbox -checksum
zpaqfranz versum "*.zpaq" -checksum
Cross-check of rsync-transferred archives
H:\backup\abc\abc>zpaqfranz versum *.zpaq -checktxt
zpaqfranz v58.4s-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2023-06-23)
franz:versum | - command
franz:-checktxt -hw
66265: Test MD5 hashes of .zpaq against _md5.txt
66136: Searching for jolly archive(s) in <<*.zpaq>> for extension <<zpaq>>
66288: Bytes to be checked 72.114.708.571 (67.16 GB) in files 4
66323: OK: nas_email.zpaq
66323: OK: nas_gestione.zpaq
66323: OK: nas_nextcloud.zpaq
66323: OK: nextvm.zpaq
===========================================
66356: Total couples 4
66357: OK 4
66358: WARN 0
66359: ERROR 0
320.969 seconds (000:05:20) (all OK)
Tries to prevent corruption of backups launched, for example, from a crontab
Use xxh3 instead of md5 in backups. md5 is good, but xxh3 is faster
Various
When adding data, new info is shown
zpaqfranz a z:\pizza c:\zpaqfranz
zpaqfranz v58.4s-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2023-06-23)
franz:-hw
Creating z:/pizza.zpaq at offset 0 + 0
Add 2023-06-23 17:51:31 3.192 1.603.682.848 ( 1.49 GB) 32T (234 dirs)
Long filenames (>255) 1 *** WARNING *** (-fix255)
55.40% 00:00:03 ( 847.31 MB)->( 84.06 MB)=>( 151.73 MB) 211.83 MB/sec
Look carefully: the 1.49GB will be stored (linear projection) in 151.73MB
The handling of wildcards differs between Windows and *nix: in the latter the expansion is done, almost always, at the shell level. Now there are specific functions that, even on *nix, enumerate files matching e.g. *.zpaq. This is used, clearly, for commands such as multiple tests
zpaqfranz t "./*.zpaq"
BEWARE OF DOUBLE QUOTES!
Catching Control-C is not so easy or painless
The mighty -longpath switch, on Windows, is for... paths. Therefore it should not be used with... files, or wildcards. I realized I hadn't spelled out explicitly that by "path" I meant... a "path". Now this should be OK...
C:\Users\utente>zpaqfranz a z:\ok.zpaq * -longpath
C:\Users\utente>zpaqfranz a z:\ok.zpaq *.* -longpath
C:\Users\utente>zpaqfranz a z:\ok.zpaq c: -longpath
C:\Users\utente>zpaqfranz a z:\ok.zpaq c:* -longpath
C:\Users\utente>zpaqfranz a z:\ok.zpaq c:*.* -longpath
C:\Users\utente>zpaqfranz a z:\ok.zpaq c:\users\utente -longpath
C:\Users\utente>zpaqfranz a z:\ok.zpaq c:\users\utente\ -longpath
C:\Users\utente>zpaqfranz a z:\ok.zpaq c:\users\utente\* -longpath
C:\Users\utente>zpaqfranz a z:\ok.zpaq c:\users\utente\*.* -longpath
With explicit fail otherwise
C:\Users\utente>zpaqfranz a z:\ok.zpaq *.txt -longpath
zpaqfranz v58.3c-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2023-05-08)
franz:-longpath
38992: INFO: getting Windows' long filenames
59854: -longpath does not work with *, select A PATH!
0.015 seconds (00:00:00) (with warnings)
This version includes a "smarter" (so to speak, of course) parser for searching the path of zfs-stored virtual machines (in files, NOT on zfs block devices) on proxmox. Far from perfection, in fact; just an improvement. As someone might guess I'm increasing my cloud server fleet :) Next week I should get a rather big one (~16TB) for further development, stay tuned if you are a "proxxymoxxy"
zpaqfranz sum j:\ -quick -summary -ssd
This is a "fake" hash, or better, a similarity estimator. For files smaller than 64KB it computes a full xxhash64; for larger ones it takes the xxhash64 of 16KB (head), 16KB (middle) and 16KB (tail).
The use, as can be understood (!), is twofold
1) Rapid estimation of file-level duplication across very large amounts of data. Using "exact" systems, i.e. calculating the hashes of each individual file to search for duplicates, is (still) very slow and expensive. "Quick" hashing, of course, does not guarantee against collisions at all (this happens even for small amounts of data). The effect is to depend more on the number of files than on their size, running @ 50GB/s or even more, much more. Sometimes you want to quickly "understand" if a new file server can benefit from de-duplication
2) Fast checks for backups
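The sampling scheme described above (full hash under 64KB, otherwise head/middle/tail) can be sketched roughly like this. A hedged Python sketch, not zpaqfranz's code: hashlib.sha256 stands in for xxhash64 (xxhash is not in the Python standard library), the sampling logic is the point.

```python
import hashlib
import os

CHUNK = 16 * 1024   # 16KB samples
SMALL = 64 * 1024   # files up to this size are hashed in full

def quick_hash(path):
    """Similarity estimator: hash size + head/middle/tail samples only."""
    size = os.path.getsize(path)
    h = hashlib.sha256(str(size).encode())  # mix in the size as well
    with open(path, "rb") as f:
        if size <= SMALL:
            h.update(f.read())              # small file: full content
        else:
            # three 16KB probes: start, middle, end of the file
            for offset in (0, size // 2 - CHUNK // 2, size - CHUNK):
                f.seek(offset)
                h.update(f.read(CHUNK))
    return h.hexdigest()
```

Because only ~48KB are read per large file, the cost depends essentially on the number of files, not on their size, which is exactly what makes point 1) feasible on huge datasets.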
As everyone knows (or maybe not) my very first contribution to zpaq was a rudimentary implementation of multipart archives, later merged by Mahoney (with his usual high skill).
Unfortunately, however, zpaq is more an archiver rather than a backup system: there are no realistic ways to check the integrity of multipart archives.
There are critical cases where you want to do cloud backups on systems that do NOT allow the --append of rsync (OK, rclone and robocopy, I'm talking about you)
Similarly, computing hashes on inexpensive VPS cloud systems, usually with very slow disks, is difficult already for sizes around ~50GB
This new release creates a text-based index file that keeps the list of multiparts, their size, their MD5 and their quick hash
zpaqfranz backup z:\prova.zpaq *.cpp
will automagically create
When you use "?" inside the filename, you get a multipart archive
zpaq a z:\pippo_???????.zpaq *.txt
Every new version, in zpaq, is just appended to the archive, but in this case the file is "split" into "pieces". This is almost perfect for rclone / rsync (without --append) / robocopy, whatever, to send the minimum amount of data.
So far, so good.
zpaq does not handle this very well
Therefore...
zpaqfranz testbackup z:\prova.zpaq
This command does a lot of different things, depending on the optional switches
The answer is: how to quickly test remote "cloud" backups. Usually you will
The last point is the key: getting a smaller file (the last multipart) makes everything much faster. You can md5sum the "remote" file and compare it against the stored MD5, that's it. Currently (before 58.2) you need to full-hash the entire archive (which can become quite big). Not a big deal for a full-scale Debian or FreeBSD server.
I hope this is clear (?); I'll post a full real-world wiki example here. A few examples, better than a thousand words
zpaqfranz testbackup z:\prova
Uses the "quick hash" to check that all the pieces have the exact size and "seem" to be filled with the right data. Almost instantaneous
zpaqfranz testbackup z:\prova -verify
Checks all pieces with MD5. If everything is OK you can be almost sure. In this case the files are expected in the same position as at creation time
zpaqfranz testbackup z:\prova -paranoid
Compares the binary index against the zpaq parts. If the data match perfectly you can be confident. For encrypted volumes the password is needed via -key
zpaqfranz testbackup z:\prova -verify -ssd -to z:\restored
Tests MD5 (in multithread mode), searching for the .zpaqs inside z:\restored
zpaqfranz testbackup z:\prova -range 10: -verify
Checks from chunk 10 to the last (range examples: -range 3:5, -range :3, -range 10:)
This will return the last part name, usually for scripting
zpaqfranz last z:\prova_????????
Compares the last 2 rows of a text file, assuming hash names. As you can guess, this facilitates, in scripted backup processing, the comparison of remote hashes with local ones. Refer to the example wiki; I will put up some working scripts.
zpaqfranz last2 c:\stor\confronto.txt -big
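The idea can be reproduced in a few lines. A hypothetical sketch of what "compare the last 2 rows" means here (the real last2 command's exact rules may differ): given a text file where each step appends one hash line, check whether the hash fields of the last two non-empty lines match.

```python
def last_two_match(path):
    """Compare the first (hash) field of the last two non-empty lines."""
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]
    if len(lines) < 2:
        return False  # not enough lines to compare
    return lines[-1].split()[0] == lines[-2].split()[0]
```

In a scripted backup this is how "local hash" vs "remote hash" collapses to a single yes/no, which then drives the choice of e-mail to send.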
To get md5sum-like output, you can use a barrage of switches
zpaqfranz sum *.txt -md5 -pakka -noeta -stdout -nosort
Do not forget -ssd for non-spinning drives
Of course NOT
Personally, I don't like splitting backups into many different parts at all; the risk of one being lost, garbled or corrupted is high
However, in certain cases there is no alternative. I will not mention one of the most well-known Internet service providers, to avoid publicity (after all... they do not pay me :)
I use both of them. I am thinking of an evolution of multipart with error correction (not just detection, correction), but the priority level is modest
Until now zpaqfranz has used XXH3 for this kind of detection (-checktxt). But, sometimes, you must choose the fastest among the "usual" hashes (spoiler: some cheap cloud vendors)
Up to version 57 hardware acceleration was only available for the Windows version (zpaqfranzhw.exe). From version 58 (obviously still to be tested) it can also be activated on other systems (newer Linux/BSD-based AMD/Intel), via the compilation switch -DHWSHA2
zpaqfranz (should) then autodetect the availability of those CPU extensions; nothing is needed from the user
It is possible to enforce it with the -hw switch
To see more "things" use b -debug
If you compile with -DHWSHA2 you will get something like this
zpaqfranz v58.1e-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2023-03-21)
In this example this is an INTEL (JIT) executable, with a (kind of) GUI (on Windows), with HW BLAKE3 acceleration, SHA1/2 HW acceleration, and the Win SFX64-bit module (build 55.1)
So far, so good
Then run
zpaqfranz b -debug
If you are lucky you will get something like
(...)
zpaqfranz v58.1e-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2023-03-21)
FULL exename <<C:/zpaqfranz/release/58_1/zpaqfranz.exe>>
42993: The chosen algo 3 SHA-1
1838: new ecx 2130194955
1843: new ebx 563910569
SSSE3 :OK
SSE41 :OK
SHA :OK
DETECTED SHA1/2 HW INSTRUCTIONS
(...)
zpaqfranz will "automagically" run HW acceleration, because your CPU has the SSSE3, SSE4.1 and SHA extensions. Of course if you get a "NO"... bye bye
These CPUs should be the AMD Zen family (Ryzen, Threadripper, etc.), Intel mobile 10th gen+, Intel desktop 11th gen+
BTW the old zpaqfranzhw.exe (Win64) shows
zpaqfranz v58.1e-JIT-GUI-L,HW BLAKE3,SHA1,SFX64 v55.1,(2023-03-21)
Beware: this is SHA1 acceleration, NOT SHA1/2. Therefore you will need to enter the -hw switch manually (to enable it)
And don't forget the GitHub star and SourceForge review! (I am becoming like a youtuber who invites people to subscribe to channels LOL)
Rationalisation of help
zpaqfranz
zpaqfranz h
zpaqfranz h h
zpaqfranz h full
In commands t and x (test, extract)
zpaqfranz t *.zpaq ...
zpaqfranz x pippo*.zpaq...
The new gui command opens a (rudimentary) ncurses-based GUI for listing, sorting, selecting and extracting files
Yes, I know, the vim-style syntax is not exactly user friendly, there will be future improvements
Under Windows, compiling with the -DGUI switch, you can do something like
zpaqfranz gui 1.zpaq
The vim-like commands: f, F, / => find substring. Cursor arrows up-down, left-right => page up, page down, line up, line down
In this example we want to extract all the .cpp files as .bak from the 1.zpaq archive. This is something you typically cannot do with other archivers such as tar, 7z, rar etc.
First the f key (find), entering .cpp. Then s (search) every .cpp substring. Then r (replace) with .bak. Then t (to) for the z:\example folder. Finally x to run the extraction
In the medium term, in addition to bug fixes, box filters etc., there will be a PAKKA-style sorted list, or time machine style, with versions of individual files
Julius Erving and Larry Bird Go One on One
A file-level deduplication function: identify files inside folders that have been "manually" duplicated, e.g. by copy-paste
I did not find portable and especially fast programs: they often use very... stupid approaches (NxM comparisons), with quite high slowdowns. By using the -ssd switch it is possible to activate multithreading, which allows, in the real world, performance above 1GB/s
To make things clear: the files in "-deleteinto" will be (possibly) deleted. Dry run (no -kill), =hash, =filename, multithread
zpaqfranz 1on1 c:\dropbox -deleteinto z:\pippero2 -ssd
Real run (because of -kill), including 0-byte files
zpaqfranz 1on1 c:\dropbox -deleteinto z:\pippero2 -zero -kill
Real run, with XXH3, with everything (even files with .zfs). This will delete files with a DIFFERENT name BUT the same content
zpaqfranz 1on1 c:\dropbox -deleteinto z:\pippero2 -xxh3 -kill -forcezfs
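The general shape of a 1on1-style pass can be sketched like this. This is not zpaqfranz's actual algorithm, just a hedged Python illustration of the idea: files in the "delete into" folder whose content matches a source file are reported, regardless of name, and grouping by size first avoids the naive NxM comparison (only files whose size also occurs in the source are ever hashed).

```python
import hashlib
import os

def file_digest(path):
    """Full content hash (sha256 here; the real tool offers e.g. XXH3)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.digest()

def find_duplicates(source, delete_into):
    """Return files under delete_into whose content matches a source file."""
    by_size = {}
    for root, _, names in os.walk(source):
        for n in names:
            p = os.path.join(root, n)
            by_size.setdefault(os.path.getsize(p), []).append(p)
    doomed = []
    for root, _, names in os.walk(delete_into):
        for n in names:
            p = os.path.join(root, n)
            # only hash when a source file of the same size exists
            candidates = by_size.get(os.path.getsize(p), [])
            if any(file_digest(p) == file_digest(c) for c in candidates):
                doomed.append(p)   # same content, possibly different name
    return doomed
```

A dry run would just print the `doomed` list; only a -kill-style flag would actually delete.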
Now supports almost every zpaqfranz switch, getting the timestamp from the snapshot itself, not from the snapshot name
Suppose you have something like this
tank/pippo@franco00000001
tank/pippo@franco00000002
tank/pippo@franco00000003
(...)
tank/pippo@franco00001025
You want to purge those snapshots, but retain the data, getting everything inside consolidated.zpaq
zpaqfranz zfsadd /tmp/consolidated.zpaq "tank/pippo" "franco" -force
You can also take only a folder; read the help!
Then you can purge with
zpaqfranz zfspurge "tank/pippo" "franco" -script launchme.sh
This method is certainly slow, because it requires an exorbitant amount of processing. However, the result is a single archive that keeps the data in a highly compressed format, which can eventually be extracted at the level of a single version/snapshot
In short, long-term archiving for anti-ransomware policy
This VERY long-term archiving of zfs snapshots has now been tested with 1000+ snapshots on 300GB+ datasets; it should be fine
Example: "unpack" all zfs snapshots (made by zpaqfranz zfsbackup command) from ordinato.zpaq into the new dataset rpool/restored
zpaqfranz zfsreceive /tmp/ordinato.zpaq rpool/restored -script myscript.sh
Then run the myscript.sh