Zpaqfranz Versions

Deduplicating archiver with encryption and paranoid-level tests. Swiss army knife for the serious backup and disaster recovery manager. Ransomware neutralizer. Win/Linux/Unix

58.8

9 months ago

58.7

10 months ago

58.6

10 months ago

Bug fixing

fwrite() on 58.5 broke some commands

More development on -fasttxt (aka automagically computing the CRC-32 of the archive)

Works on multipart / indexed multipart archives
Not yet 100% tested. The -verify switch will run a test against the filesystem [good for debugging]

Encrypted-indexed

zpaqfranz a z:\test_??? c:\zpaqfranz\*.exe -index z:\indez.zpaq -key pippo -fasttxt
zpaqfranz a z:\test_??? c:\zpaqfranz\*.cpp -index z:\indez.zpaq -key pippo -fasttxt -verify
zpaqfranz a z:\test_??? c:\zpaqfranz\*.txt -index z:\indez.zpaq -key pippo -fasttxt -verify
zpaqfranz versum z:\test*.zpaq -fasttxt

Backup

zpaqfranz backup z:\baz c:\zpaqfranz\*.cpp -fasttxt
zpaqfranz backup z:\baz c:\zpaqfranz\*.exe -fasttxt
zpaqfranz backup z:\baz c:\zpaqfranz\*.txt -fasttxt
zpaqfranz versum z:\baz*.zpaq -fasttxt

With a couple more releases I should be ready to start the actual implementation of zpaqfranz-over-TCP. Basically only the index will be stored locally, not the data, which will be sent to a zpaqfranz-server in the cloud. Really complicated, with all the special cases provided by zpaq, but I am starting to see light at the end of the tunnel

The system will be 100% ransomware-insensitive [of course, if the server is not compromised!], allowing recovery (at least in intention) in any situation, even the most catastrophic

Basically I am operating a bottom-up plus divide-et-impera approach. Work in progress...

Download zpaqfranz

58.5

10 months ago

Fixed a small but nasty bug in t for big files

Example in this thread. Introduced during refactoring: sorting on the first 10 chars instead of 40. It doesn't actually invalidate anything, but it is still unpleasant

Automagically add files

Every promise is a debt: zpaqfranz a z:\58_5 -key pippo => if a ./58_5 file|folder exists, it is automagically added to the archive

New fasttxt switch / format

zpaqfranz can now automagically calculate the CRC-32 of the archive (without, of course, re-reading it from the filesystem), writing it down in the archivename_crc32.txt file

C:\zpaqfranz>zpaqfranz a z:\1.zpaq *.cpp -fasttxt
zpaqfranz v58.5o-JIT-GUI-L,HW SHA1/2,SFX64 v55.1,(2023-07-12)
franz:-fasttxt -hw
Creating z:/1.zpaq at offset 0 + 0
Add 2023-07-12 14:00:13        27         89.286.021 (  85.15 MB) 32T (0 dirs)
27 +added, 0 -removed.

0 + (89.286.021 -> 16.670.812 -> 2.069.455) = 2.069.455 @ 57.38 MB/s
62655: CRC-32 EXPECTED E948770C
62682: Updating fasttxt z:/1_crc32.txt :OK

1.500 seconds (000:00:01) (all OK)

Getting something like this

C:\zpaqfranz>type z:\1_crc32.txt
$zpaqfranz fasttxt|1|2023-07-12 14:00:14|z:/1.zpaq
E948770C 8293084830611972 0 [2.069.455] (0)

In this example the first field (E948770C) is the (expected) CRC-32 of the archive. The second, 8293084830611972, is the computed "quick" hash; the third (0) is, in this case, the initial CRC-32; then the file sizes. The "quick hash" is the heuristic hash introduced a few releases earlier
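For scripting, this two-line format is easy to consume. Here is a minimal sketch (in C++, zpaqfranz's own language) that pulls the expected CRC-32 and the quick hash out of a _crc32.txt file; the field layout is as described above, everything else is hypothetical, not part of zpaqfranz:

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main(int argc, char** argv) {
    std::ifstream in(argc > 1 ? argv[1] : "1_crc32.txt");
    std::string header, data;
    // line 1: "$zpaqfranz fasttxt|1|date|path", line 2: the hashes
    if (!std::getline(in, header) || !std::getline(in, data)) return 1;
    std::istringstream fields(data);
    std::string crc32hex, quickhash;   // e.g. E948770C and 8293084830611972
    fields >> crc32hex >> quickhash;
    std::cout << "expected CRC-32: " << crc32hex
              << ", quick hash: " << quickhash << "\n";
}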

Using the versum command, with -fasttxt, it is possible to check very quickly

C:\zpaqfranz>zpaqfranz versum z:\1.zpaq -fasttxt
zpaqfranz v58.5o-JIT-GUI-L,HW SHA1/2,SFX64 v55.1,(2023-07-12)
franz:versum                                    | - command
franz:-fasttxt -hw
66764: Test CRC-32 of .zpaq against _crc32.txt
87163: Bytes to be checked 2.069.455 (1.97 MB) in files 1

66323: OK CRC-32: z:/1.zpaq
====================================================================
66356: TOTAL          1
66357: OK             1
66358: WARN           0
66359: ERROR          0

0.016 seconds (00:00:00) (all OK)

with -quick in (almost) no time

C:\zpaqfranz>zpaqfranz versum z:\1.zpaq -fasttxt -quick
zpaqfranz v58.5o-JIT-GUI-L,HW SHA1/2,SFX64 v55.1,(2023-07-12)
franz:versum                                    | - command
franz:-quick -fasttxt -hw
66764: Test QUICK of .zpaq against _crc32.txt
87163: Bytes to be checked 2.069.455 (1.97 MB) in files 1

66323: OK QUICK: z:/1.zpaq
====================================================================
66356: TOTAL          1
66357: OK             1
66358: WARN           0
66359: ERROR          0

0.031 seconds (00:00:00) (all OK)

You can even run it on *.zpaq (on Linux quoted, "*.zpaq")

zpaqfranz versum *.zpaq -fasttxt

Why this "thing", most like -checktxt ?

Because the CRC-32 calculation is performed during the write-to-disk phase, it has minimal impact in terms of time and CPU, and is ONLY performed on the added part
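The property that makes this possible is that CRC-32 can be updated incrementally. A sketch of the idea using zlib's crc32()/crc32_combine() (zpaqfranz's internal implementation may differ): the CRC of the whole archive is obtained from the stored CRC of the old part plus the CRC of just the appended bytes:

#include <zlib.h>
#include <cstdio>

int main() {
    const unsigned char oldPart[] = "archive content already on disk";
    const unsigned char newPart[] = "bytes appended by this update";
    // CRC of the archive before the update (normally already known)
    uLong oldCrc = crc32(0L, oldPart, sizeof(oldPart) - 1);
    // only the ADDED part is hashed during the write phase
    uLong addCrc = crc32(0L, newPart, sizeof(newPart) - 1);
    // combine: CRC of the whole file, without re-reading the old data
    uLong total = crc32_combine(oldCrc, addCrc, sizeof(newPart) - 1);
    // same result as hashing everything in one pass
    uLong check = crc32(crc32(0L, oldPart, sizeof(oldPart) - 1),
                        newPart, sizeof(newPart) - 1);
    printf("combined %08lX, full pass %08lX\n", total, check);
}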

Let's take a concrete example, otherwise it is difficult to understand the incredible usefulness (in certain scenarios, of course)

Suppose you make a backup with a certain tool (e.g. 7z, rar, tar) of a certain folder. Suppose the archive is 500GB in size and resides (as normal) on a slow device, e.g. a NAS with magnetic disks, used by many others

Suppose you want to transfer it to another device (as normal), e.g. with rsync.
This will require reading all 500GB (locally, maybe painfully slowly), calculating the relevant checksums (for rsync basically MD5, high CPU usage), remotely sending all 500GB (= saturating all bandwidth), remotely calculating MD5 hashes of 500GB (= high I/O and CPU), and comparing them.

Now you are paranoid: your archive is full of precious data, therefore you run a local CRC-32 (of the .7z, rar, tar...) AND a remote CRC-32, just to be sure

So far, so good: zpaqfranz pays the same "cost" (for the FIRST run)

Backups, however, are typically repeated, say daily (or even more often), typically at night

On the 2nd run, with tar, 7z, rar etc., you will be in the exact same situation. Suppose the new archive is 501GB (2GB changed in the source folder): creating (aka writing) a giant 501GB file, reading everything back, calculating MD5, calculating (remotely, by rsync) 500GB and and and... hours locally, hours remotely, a LOT of local I/O, a LOT of CPU

  1. local: Read 2GB
  2. local: Write 1GB
  3. local: MD5 of 501GB
  4. local: Send ~1GB
  5. remote: MD5 of 500GB
  6. remote: Write of 1GB

With zpaqfranz 58.4 and checktxt...

  1. local: Read 2GB
  2. local: Write 1GB
  3. local: MD5 of 501GB
  4. local: Send 1GB
  5. remote: Write of 1GB
  6. remote: MD5 of 501GB

With zpaqfranz 58.5 and fasttxt...

  1. local: Read 2GB
  2. local: Write 1GB
  3. local: Send 1GB
  4. remote: Write of 1GB
  5. remote: CRC-32 of 501GB

In a future release, step 5) will become "CRC-32 of 1GB"

Real-world Windows example

Therefore, here is a little (!) Windows batch file

Suppose you want to back up "something" (some Windows data) to a remote server (a Linux box), using a local encryption password

Since you are lazy, you want not only the local copy to be verified, but also the remote one (its CRC-32 compared with the local one), and you want a different e-mail depending on the verification result (error or not), BUT you DO NOT WANT TO SEND THE PASSWORD TO THE REMOTE SERVER

Since you use an FTTH connection you really want to send only the minimum amount of changed data, and you do NOT want to run rsync on huge files (hundreds of GB), which can take hours

We have key-based authentication (for ssh, then rsync-over-ssh)

First step: make the archive, in this example into k:\franco\test\zpaqfranz_pippo.zpaq, of the two folders c:\zpaqfranz and c:\stor, with password (key) pippo, support for paths longer than 255 chars (-longpath), using CRC-32 for the later cloud test (-fasttxt), no ETA (this is a batch file after all, who cares: -noeta), and a BIG confirmation (-big), easier to spot in e-mails

@echo off

date /t  >c:\stor\result.txt
time /t >>c:\stor\result.txt

c:\stor\bin\zpaqfranz a k:\franco\test\zpaqfranz_pippo.zpaq c:\zpaqfranz c:\stor -longpath -key pippo -fasttxt -noeta -big >>c:\stor\result.txt

Now we want to list all the versions, just to make sure the update was done (few things are worse than a backup update that does not update anything)

c:\stor\bin\zpaqfranz i k:\franco\test\zpaqfranz_pippo.zpaq -key pippo -noeta                               >>c:\stor\result.txt

Now we want to (locally) test the archive. Please note: locally. The password "pippo" is NOT sent over the Internet

c:\stor\bin\zpaqfranz t k:\franco\test\zpaqfranz_pippo.zpaq -key pippo -noeta -big                          >>c:\stor\result.txt

OK, we do the same thing for a second archive file (just an example), k:\franco\test\nz_pippo.zpaq

c:\stor\bin\zpaqfranz a k:\franco\test\nz_pippo.zpaq c:\nz -longpath -key pippo  -fasttxt -big >>c:\stor\result.txt
c:\stor\bin\zpaqfranz i k:\franco\test\nz_pippo.zpaq -key pippo -noeta                         >>c:\stor\result.txt
c:\stor\bin\zpaqfranz t k:\franco\test\nz_pippo.zpaq -key pippo -noeta -big                    >>c:\stor\result.txt

Now we upload everything with --append. Only the data changed since the last run will be sent via rsync (over ssh) to the remote Linux box. This will usually take minutes

c:\stor\bin\rsync -e "c:\stor\bin\ssh.exe -p 22 -i c:\stor\bin\thekey"  -I -r --append --partial --progress              --chmod=a=rwx,Da+x /k/franco/test/ [email protected]:/home/theuser/copie/test/ >>c:\stor\result.txt

Now we enforce the upload of the *.txt files (forcing a "refresh" of the *_crc32.txt) with --checksum

c:\stor\bin\rsync -e "c:\stor\bin\ssh.exe -p 22 -i c:\stor\bin\thekey"  -I -r --include="*.txt" --exclude="*" --checksum --chmod=a=rwx,Da+x /k/franco/test/ [email protected]:/home/theuser/copie/test/ >>c:\stor\result.txt

Now we get the size of the /home/theuser folder, and the free space, with the s command. BEWARE: you may need something like /usr/local/bin/zpaqfranz, it depends on PATH

c:\stor\bin\ssh -p22 -i c:\stor\bin\thekey [email protected] zpaqfranz s /home/theuser          >>c:\stor\result.txt

Run some other remote commands, for example ls everything (zpool status, df -h, whatever, just an example)

echo --------- >>c:\stor\result.txt
c:\stor\bin\ssh -p22 -i c:\stor\bin\thekey [email protected] ls -l '/home/theuser/copie/test/*' >>c:\stor\result.txt

And now remotely test (by CRC-32) the uploaded *.zpaq against the _crc32.txt, NO PASSWORD sent

c:\stor\bin\ssh -p22 -i c:\stor\bin\thekey [email protected] zpaqfranz versum '/home/theuser/copie/test/*.zpaq' -fasttxt -noeta -big >>c:\stor\result.txt

Now we'll do a very dirty trick: counting the OKs in the output log with grep. In this example it should be 5. Beware: you need the very latest zpaqfranz here (58.5m+). We make two of them, one for the body, one for the attachment of the e-mail

echo ==================================== >>c:\stor\result.txt
echo ============ COUNT OK    =========== >>c:\stor\result.txt
echo ==================================== >>c:\stor\result.txt
echo 5 >c:\stor\countok.txt
echo 5 >c:\stor\countbody.txt
c:\stor\bin\egrep "#     # ###!" c:\stor\result.txt -c >>c:\stor\countok.txt
c:\stor\bin\egrep "#     # ###!" c:\stor\result.txt -c >>c:\stor\countbody.txt
c:\stor\bin\zpaqfranz last2 c:\stor\countok.txt -big >>c:\stor\result.txt

Pack the report with 7z (reports can become very BIG in case of errors)

date /t >>c:\stor\result.txt
time /t >>c:\stor\result.txt

del c:\stor\report.7z
c:\stor\bin\7z a c:\stor\report.7z c:\stor\result.txt

Now make another result file (for the e-mail body)

echo ==================================== >c:\stor\body.txt
echo ========== COUNT OK BODY =========== >>c:\stor\body.txt
echo ==================================== >>c:\stor\body.txt
c:\stor\bin\zpaqfranz last2 c:\stor\countbody.txt -big >>c:\stor\body.txt

Finally send two different e-mails (usually, in case of error, you will also change the recipient (-t) to your primary e-mail address)

if not errorlevel 1 goto va
if errorlevel 1 goto nonva

:nonva
c:\stor\bin\mailsend -t [email protected] -cc [email protected] -f [email protected] -starttls -port 587 -auth -smtp smtp.mymail.com -sub "***ERROR *** Backup (theuser)" -user [email protected] -pass mygoodpassword -mime-type "application/x-7z-compressed" -enc-type "base64" -aname "report.7z" -attach "c:\stor\report.7z" -mime-type "text/plain" -disposition "inline"    -attach "c:\stor\body.txt"
goto fine

:va
c:\stor\bin\mailsend -t [email protected] -cc [email protected] -f [email protected] -starttls -port 587 -auth -smtp smtp.mymail.com -sub "Backup (theuser)" -user [email protected] -pass mygoodpassword -mime-type "application/x-7z-compressed" -enc-type "base64" -aname "report.7z" -attach "c:\stor\report.7z" -mime-type "text/plain" -disposition "inline"    -attach "c:\stor\body.txt"
:fine

On *nix it is not possible to do a synchronous t (test) over ssh; it depends on how the shell is created (it would take long to explain, I would say that is enough for now). On Windows, however, you can

Short version (!)

You can get a compliance check of a local and a remote file, through CRC-32, by "paying" only the cost of CRC-32 calculation on the remote computer. The remote CRC-32 calculation can also be done, for example, via a crontab for multiple archives by using wildcards ("*.zpaq")
By using the -quick switch you can make heuristic checks (i.e., on the start, middle, and end of files), so you can be fairly sure against rsync's --append mismatches, in a few milliseconds (if you don't want the entire MD5 or CRC-32 of the remote file re-calculated; backup files can be hundreds of gigabytes in size). If you are paranoid instead, you can use -checktxt, which implies (by default) the use of MD5, or (optionally) XXH3. This, however, can get "expensive" for very large backups

In the future, of course, this will become zpaqfranz-over-TCP

Download zpaqfranz

58.4

11 months ago

New command consolidatebackup

Convert multiple .zpaq chunks into one backup, or convert a single .zpaq to the new backup format

Convert an archive to a backup: consolidatebackup z:\foo.zpaq -to k:\newbackup -key pippo

New switch -checktxt for command versum

Compare the MD5s with the .zpaq(s), taken from

  • filename.md5
  • filename_md5.txt
zpaqfranz a prova.zpaq c:\dropbox -checktxt
zpaqfranz versum "*.zpaq" -checktxt

Cross-check of rsync-transferred archives

H:\backup\abc\abc>zpaqfranz versum *.zpaq -checktxt
zpaqfranz v58.4s-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2023-06-23)
franz:versum                                    | - command
franz:-checktxt -hw
66265: Test MD5 hashes of .zpaq against _md5.txt
66136: Searching for jolly archive(s) in <<*.zpaq>> for extension <<zpaq>>
66288: Bytes to be checked 72.114.708.571 (67.16 GB) in files 4

66323: OK: nas_email.zpaq

66323: OK: nas_gestione.zpaq

66323: OK: nas_nextcloud.zpaq

66323: OK: nextvm.zpaq
===========================================
66356: Total couples         4
66357: OK                    4
66358: WARN                  0
66359: ERROR                 0

320.969 seconds (000:05:20) (all OK)

No more "access denied" errors for System Volume Information (on Windows)

Do not allow multiple instances of a running backup

Tries to prevent corruption of backups launched, for example, from a crontab

New switch -backupxxh3

Use xxh3 instead of md5 in backups. md5 is good, but xxh3 is faster

Bug fix

Various

Different update (projected size)

When adding data, new info:

zpaqfranz a z:\pizza c:\zpaqfranz
zpaqfranz v58.4s-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2023-06-23)
franz:-hw
Creating z:/pizza.zpaq at offset 0 + 0
Add 2023-06-23 17:51:31     3.192      1.603.682.848 (   1.49 GB) 32T (234 dirs)
Long filenames (>255)         1 *** WARNING *** (-fix255)
        55.40% 00:00:03  ( 847.31 MB)->(  84.06 MB)=>( 151.73 MB)  211.83 MB/sec

Look carefully: the 1.49GB will be stored (by linear projection) in 151.73MB

Support for wildcards (ex. *.zpaq) on *nix

The handling of wildcards differs between Windows and *nix: basically, in the second case the expansion is almost always done at the shell level. Now there are specific functions that, even on *nix, enumerate files of the type *.zpaq. This is used, clearly, for commands such as multiple tests

zpaqfranz t "./*.zpaq"

BEWARE OF DOUBLE QUOTES!
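For the curious, the *nix enumeration boils down to letting the program expand the pattern itself when the shell has not already done so (hence the double quotes, which deliver the pattern unexpanded). A sketch with POSIX glob(), not zpaqfranz's actual enumeration code:

#include <glob.h>
#include <cstdio>

int main() {
    glob_t g;
    // the quoted pattern reaches the program unexpanded, so expand it here
    if (glob("./*.zpaq", 0, nullptr, &g) == 0) {
        for (size_t i = 0; i < g.gl_pathc; i++)
            printf("%s\n", g.gl_pathv[i]);
        globfree(&g);
    }
}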

Download zpaqfranz

58.3

1 year ago

Some bug fixing

Catching Control-C is not so easy or painless

Some kludges in -longpath

The mighty -longpath switch, on Windows, is for... paths. Therefore it should not be used with... files, or wildcards. I realized I didn't spell it out explicitly that by "path" I meant... a "path". Now this should be OK...

C:\Users\utente>zpaqfranz a z:\ok.zpaq * -longpath
C:\Users\utente>zpaqfranz a z:\ok.zpaq *.* -longpath
C:\Users\utente>zpaqfranz a z:\ok.zpaq c: -longpath
C:\Users\utente>zpaqfranz a z:\ok.zpaq c:* -longpath
C:\Users\utente>zpaqfranz a z:\ok.zpaq c:*.* -longpath
C:\Users\utente>zpaqfranz a z:\ok.zpaq c:\users\utente -longpath
C:\Users\utente>zpaqfranz a z:\ok.zpaq c:\users\utente\ -longpath
C:\Users\utente>zpaqfranz a z:\ok.zpaq c:\users\utente\* -longpath
C:\Users\utente>zpaqfranz a z:\ok.zpaq c:\users\utente\*.* -longpath

With explicit fail otherwise

C:\Users\utente>zpaqfranz a z:\ok.zpaq *.txt -longpath
zpaqfranz v58.3c-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2023-05-08)
franz:-longpath
38992: INFO: getting Windows' long filenames
59854: -longpath does not work with *, select A PATH!

0.015 seconds (00:00:00) (with warnings)

Work-in-progress: "smarter" zfsproxbackup

This version includes a "smarter" (so to speak, of course) parser for searching the paths of virtual machines stored on zfs (in files, NOT on zfs block devices) on Proxmox. Far from perfection, in fact: just an improvement. As someone might guess, I'm increasing my cloud server fleet :) Next week I should get a rather big one (~16TB) for further development, stay tuned if you are a "proxxymoxxy"

Download zpaqfranz

58.2

1 year ago

zpaqfranz now...

  • "hopefully" intercept control-c to delete empty 0 bytes long chunks
  • "hopefully" automagically delete 0 bytes long chunks before run
  • "hopefully" intercept control-c to delete 0 bytes long archives
  • get a better scanning... update (every 1 sec)

New hasher QUICK (just a fake hash!)

zpaqfranz sum j:\ -quick -summary -ssd

This is a "fake" hash, better a similarity estimator. For (smaller than 64KB) file get a full xxhash64, for larger one takes xxhash64 of 16KB (head), 16KB (middle), 16KB (tail).

The use, as can be understood (!), is twofold

1) Rapid estimation of file-level duplication over very large amounts of data. Using "exact" systems, i.e. calculating the hash of each individual file to search for duplicates, is (still) very slow and expensive. "Quick" hashing, of course, does not guarantee against "wrong collisions" at all (this can happen even for small amounts of data). The effect is to depend more on the number of files than on their size, running @ 50GB/s or even more, much more. Sometimes you want to quickly "understand" if a new file server can benefit from de-duplication

2) fast check for backups
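Here is a sketch of the QUICK scheme as described above (full XXH64 below 64KB, otherwise XXH64 over head + middle + tail), using the xxHash streaming API; the exact offsets zpaqfranz uses internally are an assumption:

#include <xxhash.h>
#include <cstdint>
#include <cstdio>
#include <vector>

// "quick hash": at most ~48KB read per file, whatever its size
uint64_t quickhash(FILE* f, long size) {
    const long SMALL = 64 * 1024, CHUNK = 16 * 1024;
    std::vector<unsigned char> buf(SMALL);
    if (size < SMALL) {                       // small file: hash it all
        size_t n = fread(buf.data(), 1, size, f);
        return XXH64(buf.data(), n, 0);
    }
    XXH64_state_t* st = XXH64_createState();  // big file: head+middle+tail
    XXH64_reset(st, 0);
    const long offsets[3] = {0, size / 2 - CHUNK / 2, size - CHUNK};
    for (long off : offsets) {
        fseek(f, off, SEEK_SET);
        size_t n = fread(buf.data(), 1, CHUNK, f);
        XXH64_update(st, buf.data(), n);
    }
    uint64_t h = XXH64_digest(st);
    XXH64_freeState(st);
    return h;
}

int main(int argc, char** argv) {
    if (argc < 2) return 1;
    FILE* f = fopen(argv[1], "rb");
    if (!f) return 1;
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    rewind(f);
    printf("%016llX\n", (unsigned long long)quickhash(f, size));
    fclose(f);
}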

New command backup

As everyone knows (or maybe not), my very first contribution to zpaq was a rudimentary implementation of multipart archives, later merged by Mahoney (with his usual high skill).

Unfortunately, however, zpaq is more an archiver rather than a backup system: there are no realistic ways to check the integrity of multipart archives.

There are critical cases where you want to do cloud backups on systems that do NOT allow rsync's --append (OK, rclone and robocopy, I'm talking about you)

Similarly, computing hashes on inexpensive cloud VPS systems, usually with very slow disks, is difficult already at sizes around ~50GB

This new release creates a text-based index file that keeps the list of multiparts, their size, their MD5 and their quick hash

Multipart backup with zpaqfranz

zpaqfranz backup z:\prova.zpaq *.cpp

Will automagically create

  • a multipart archive starting from prova_00000001.zpaq, prova_00000002.zpaq, prova_00000003.zpaq...
  • a textfile index prova_00000000_backup.txt
  • a binary index prova_00000000_backup.index

Why? A big explanation ("spiegone") is coming...

When you use "?" inside filename, you will get a multipart archive

zpaq a z:\pippo_???????.zpaq *.txt

Every new version, in zpaq, is just appended to the archive, but in this case the file is "split" into "pieces". This is almost perfect for rclone / rsync (without --append) / robocopy, whatever, to send the minimum amount of data; see the sketch below for how the "?" pattern maps to part names.
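Each run of question marks is replaced by the version number, zero-padded to the run's length. A sketch (a hypothetical helper, not zpaq's actual code):

#include <cstdio>
#include <string>

std::string partname(std::string pattern, int version) {
    size_t first = pattern.find('?');                // start of the "?" run
    size_t count = pattern.find_last_of('?') - first + 1;
    char num[32];
    snprintf(num, sizeof(num), "%0*d", (int)count, version);
    return pattern.replace(first, count, num);
}

int main() {
    for (int v = 1; v <= 3; v++)   // pippo_0000001.zpaq, pippo_0000002.zpaq...
        printf("%s\n", partname("pippo_???????.zpaq", v).c_str());
}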

So far, so good.

BUT

zpaq does not handle very well

  1. zero length: if you press Control-C during compression, a 0-byte long pippo_00000XX.zpaq is (or can be) created
  2. "holes" (a missing "piece": pippo001, pippo002, pippo007, pippo008...)
  3. mixing different backups. You can replace one piece of a zpaq multipart archive with another, and zpaq will joyfully accept it, without noticing the error (!). Since each session is "self-sufficient" zpaq not only does not warn the user, but in the case of encryption (i.e., with -key) nasty things happen.
  4. it cannot really (quickly) check the archive for missing parts: if a "piece" is lost, it is possible that everything (from that version to the last) is lost too. Even more: if you hold data from third-party clients, testing an encrypted archive requires the password, which you simply don't have. And 99.9 percent of backups are encrypted, even the ones on LAN-connected NASes.
  5. speed. If you have a backup split into 10.000 "pieces" (chunks), with zpaq you really cannot say whether everything is OK unless you run a (lengthy) full-scale test; this can take hours (ex. virtual machine disks)

Therefore...

New command testbackup

zpaqfranz testbackup z:\prova.zpaq

This command does a lot of different things, depending on the optional switches

  • -verify enforce a full MD5 check
  • -ssd for multithreaded run (on solid state)
  • -verbose show infos
  • -range from:to to check only "some" pieces
  • -to where-the-zpaq-pieces-are
  • -paranoid

WHAT?

The answer: how to quickly test remote "cloud" backups. Usually you will

  • zpaqfranz to a local drive
  • robocopy / rsync / zpaqfranz r to a "remote" location
  • run a remote script (to check locally, i.e. locally on the cloud server) || download the remote file locally, then check back

The last point is the key: getting a smaller file (the last multipart) makes everything much faster. You can md5sum the "remote" file, comparing against the stored MD5, and that's it. Currently (before 58.2) you need a full hash of the entire archive (which can become quite big). Not a big deal for a full-scale Debian or FreeBSD server.

I hope this is clear (?); I'll post a full real-world wiki example here. A few examples, better than a thousand words

zpaqfranz testbackup z:\prova

Use the "quick hash" to check if all the pieces are the exact size, and "seems" to be filled with the right data. Almost instantaneous

zpaqfranz testbackup z:\prova -verify

Check all pieces with MD5. Now, if everything is OK, you are almost sure. In this case the files are expected in the same position as at creation

zpaqfranz testbackup z:\prova -paranoid

Compare the binary index vs the zpaq parts. If the data matches perfectly you can be confident. For encrypted volumes the password is needed via -key

zpaqfranz testbackup z:\prova -verify -ssd -to z:\restored

Test MD5 (in multithread mode), searching the .zpaqs inside z:\restored

zpaqfranz testbackup z:\prova -range 10: -verify

Check from chunk 10 until the last (range examples: -range 3:5, -range :3, -range 10:)

New command last

This will return the last part name, usually for scripting

zpaqfranz last z:\prova_????????

New command last2

Compare the last 2 rows of a text file, assuming rows that start with a hash. As you can guess, it facilitates, in scripted backup processing, the comparison of remote hashes with local ones. Refer to the example wiki; I will put up some working scripts.

zpaqfranz last2 c:\stor\confronto.txt -big
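A minimal sketch of what such a comparison amounts to (hypothetical code, under the "hash first on each row" assumption above, not the actual implementation): read the last two non-empty rows and compare their first token:

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main(int argc, char** argv) {
    std::ifstream in(argc > 1 ? argv[1] : "confronto.txt");
    std::string line, prev, last;
    while (std::getline(in, line))          // keep the last two non-empty rows
        if (!line.empty()) { prev = last; last = line; }
    std::string h1, h2;
    std::istringstream(prev) >> h1;         // first token = the hash
    std::istringstream(last) >> h2;
    std::cout << (h1 == h2 && !h1.empty() ? "OK: hashes match"
                                          : "ERROR: mismatch") << "\n";
    return h1 == h2 ? 0 : 1;                // errorlevel for batch scripting
}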

New sum switches

To get md5sum-like output, you can use a barrage of switches

zpaqfranz sum *.txt -md5 -pakka -noeta -stdout -nosort

Do not forget -ssd for non spinning drives

FAQ

Is this a panacea?

Of course NOT
Personally, I don't like splitting backups into many different parts at all: the risk of one being lost, garbled or corrupted is high
However, in certain cases, there is no alternative. I will not mention one of the most well-known Internet service providers, to avoid publicity (after all... they do not pay me :)

Better the a command or the backup command?

I use both of them. I am thinking about an evolution of multipart with error correction (not just detection: correction), but the priority level is modest

Why the ancient MD5?

Until now zpaqfranz used XXH3 for this kind of detection (-checktxt). But, sometimes, you must choose the fastest among the "usual" ones (spoiler: some cheap cloud vendors)

Download zpaqfranz

58.1

1 year ago

This is a brand new branch, full of bugs, ehm "features" :)

HW accelerated SHA1/SHA2

Up to version 57 the hardware acceleration was only available in the Windows version (zpaqfranzhw.exe). From version 58 (obviously still to be tested) it can also be activated on other systems (newer Linux/BSD-based AMD/Intel), via the compilation switch -DHWSHA2

zpaqfranz should then autodetect the availability of those CPU extensions; nothing is needed from the user. It is possible to enforce it with -hw
To see more "things" use b -debug

TRANSLATION

If you compile with -DHWSHA2 you will get something like that

zpaqfranz v58.1e-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2023-03-21)

In this example this is an INTEL (JIT) executable, with a (kind of) GUI (on Windows), HW BLAKE3 acceleration, SHA1/2 HW acceleration, and the Win64 SFX module (build 55.1)

So far, so good

Then run

zpaqfranz b -debug

If you are lucky you will get something like

(...)
zpaqfranz v58.1e-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2023-03-21)
FULL exename <<C:/zpaqfranz/release/58_1/zpaqfranz.exe>>
42993: The chosen algo 3 SHA-1
1838: new ecx 2130194955
1843: new ebx 563910569
SSSE3 :OK
SSE41 :OK
SHA   :OK
DETECTED SHA1/2 HW INSTRUCTIONS
(...)

zpaqfranz will "automagically" runs HW acceleration, because your CPU does have SSSE3, SSE4.1, and SHA extension Of course if you get a "NO"... bye bye

These CPUs should be the AMD Zen family (Ryzen, Threadripper, etc.), Intel mobile 10th+ generation, Intel desktop 11th+ generation
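On x86 the detection above boils down to reading CPUID feature bits: SSSE3 and SSE4.1 live in leaf 1 (ECX bits 9 and 19), the SHA extensions in leaf 7, subleaf 0 (EBX bit 29). A sketch using GCC/clang's cpuid.h, not zpaqfranz's actual code:

#include <cpuid.h>
#include <cstdio>

int main() {
    unsigned eax, ebx, ecx, edx;
    __get_cpuid(1, &eax, &ebx, &ecx, &edx);        // leaf 1: SSSE3, SSE4.1
    bool ssse3 = ecx & (1u << 9);
    bool sse41 = ecx & (1u << 19);
    __get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx); // leaf 7.0: SHA
    bool sha = ebx & (1u << 29);
    printf("SSSE3 :%s\n", ssse3 ? "OK" : "NO");
    printf("SSE41 :%s\n", sse41 ? "OK" : "NO");
    printf("SHA   :%s\n", sha   ? "OK" : "NO");
    if (ssse3 && sse41 && sha)
        printf("DETECTED SHA1/2 HW INSTRUCTIONS\n");
}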

BTW the old zpaqfranzhw.exe (Win64) is

zpaqfranz v58.1e-JIT-GUI-L,HW BLAKE3,SHA1,SFX64 v55.1,(2023-03-21)

Beware: this is SHA1 acceleration, NOT SHA1/2. Therefore you will need to enter the -hw switch manually (to enable it)

RECAP

  • With -DHWSHA2 enabled, zpaqfranz will detect and use the HW acceleration, if it thinks your CPU supports it
  • If, for some reason, you want to force its use, even on CPUs that do not officially have these extensions, use the switch -hw; usually you will get a segmentation fault or something like that (depending on the operating system), not my fault
  • If you want to know if zpaqfranz "thinks" that your CPU is enabled, use zpaqfranz b -debug and look at the output
  • Will you get a huge improvement in compression times? No, not really. You will see the biggest difference if you use SHA-256 hashing functions, which benefit greatly from the acceleration. SHA1 much less (the software version is already very fast)
  • Is -DHWSHA2 faster than -DHWSHA1? In fact, no. SHA1 is "just a tiny bit" faster. Why? Too long to explain.
  • Why does my relatively modern Intel CPU not seem to support it? Who knows; the short version: not my fault. Even relatively recent CPUs have not been equipped with these extensions by Intel
  • Does it work on SPARC-ARM-PowerPC-whatever-strange-thing? Of course NOT
  • Is it production-safe? Of course NOT. As with any very first release, some nasty things can happen

Luke, remember. The more feedback, the more bug-fixing. Luke, report bugs, use the Force...

And don't forget the GitHub star and the SourceForge review! (I am becoming like a youtuber who invites people to subscribe to channels LOL)

Other news

Some refactoring, to become more "Mac-friendly" (here the risk of introducing bugs is considerable; sorry, I will correct them as I go along)

Using MD5 instead of XXH3 in checktxt (supporting Hetzner Storage Boxes; there is still work to be done)

Some "GUI" improvement (In perspective, I am preparing the possibility of selecting some files to extract, but it still needs development)

No more embedded dd (smaller source size)

Download zpaqfranz

57.5

1 year ago

Changed help

Rationalisation of help

zpaqfranz
zpaqfranz h
zpaqfranz h h
zpaqfranz h full

Multioperation (with wildcards)

In commands t and x (test, extract)

zpaqfranz t *.zpaq ...
zpaqfranz x pippo*.zpaq...

Initial (kind of) text based GUI (Windows)

The new gui command opens a (rudimentary) ncurses-based GUI for listing, sorting, selecting and extracting files
Yes, I know, the vim-style syntax is not exactly user-friendly; there will be future improvements

Under Windows, compiling with the -DGUI switch, you can do something like

zpaqfranz gui 1.zpaq

The vim-like commands are:

  • f F / => find substring
  • cursor arrows (up, down, left, right) => page up, page down, line up, line down, move line
  • + - : => goto line
  • m M => set minsize Maxsize
  • d D => set datefrom Dateto
  • q Q ESC => exit
  • F1 sort name, F2 sort size, F3 sort date, F4 sort ext, F5 sort hash
  • F6 show size, F7 show date, F8 show hash, F9 show stdout
  • t => change -to
  • s => searchfrom
  • r => replace to
  • x => extract visible rows

In this example we want to extract all the .cpp files as .bak from the 1.zpaq archive. This is something you typically cannot do with other archivers such as tar, 7z, rar etc.

With a "sort of" WYSIWYG 'composer'

First the f key (find), entering .cpp. Then s (search) for every .cpp substring. Then r (replace) with .bak. Then t (to) for the z:\example folder. Finally x to run the extraction

Example

In the medium term, in addition to bug fixes, box filters etc., there will be a PAKKA-style sorted list, or time-machine style, with versions of individual files

Download zpaqfranz

57.4

1 year ago

New command: 1on1

Deduplicate a folder against another one, by filename and checksum, or by checksum only

Julius Erving and Larry Bird Go One on One

A file-level deduplication function, to identify files inside folders that have been 'manually' duplicated, e.g. by copy-paste

I did not find portable and especially fast programs: they often use very... stupid approaches (NxM comparisons), with quite heavy slowdowns. By using the -ssd switch it is possible to activate multithreading, which allows, in the real world, GB/s-class performance (see the sketch below)
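The non-stupid approach is, roughly: bucket files by size first (different size = cannot be duplicates), then hash only the buckets holding more than one entry, so the cost is O(N) hashes instead of NxM comparisons. A generic single-threaded sketch (std::hash over the content stands in for a real file hash such as XXH3; this is not the actual 1on1 code):

#include <cstdint>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <unordered_map>
#include <vector>

namespace fs = std::filesystem;

int main(int argc, char** argv) {
    // 1) bucket by size: files of different size cannot be duplicates
    std::unordered_map<std::uintmax_t, std::vector<fs::path>> bySize;
    for (auto& e : fs::recursive_directory_iterator(argc > 1 ? argv[1] : "."))
        if (e.is_regular_file())
            bySize[e.file_size()].push_back(e.path());
    // 2) hash only files that share a size
    std::unordered_map<std::size_t, std::vector<fs::path>> byHash;
    for (auto& [size, paths] : bySize) {
        if (paths.size() < 2) continue;
        for (auto& p : paths) {
            std::ifstream in(p, std::ios::binary);
            std::string data((std::istreambuf_iterator<char>(in)), {});
            byHash[std::hash<std::string>{}(data)].push_back(p);
        }
    }
    for (auto& [h, paths] : byHash)
        if (paths.size() > 1) {
            std::cout << "possible duplicates:\n";
            for (auto& p : paths) std::cout << "  " << p << "\n";
        }
}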

To make things clear: the files inside "-deleteinto" will be (potentially) deleted. Dry run (no -kill), =hash, =filename, multithread

zpaqfranz 1on1 c:\dropbox -deleteinto z:\pippero2 -ssd

Real run (because of -kill), 0-byte files too

zpaqfranz 1on1 c:\dropbox -deleteinto z:\pippero2 -zero -kill

Real run, with XXH3, with everything (even files with .zfs). This will delete files with a DIFFERENT name BUT the same content

zpaqfranz 1on1 c:\dropbox -deleteinto z:\pippero2 -xxh3 -kill -forcezfs

Updated zfs-something commands

zfsadd

Now supports almost every zpaqfranz switch, getting the timestamp from the snapshot, not from the snapshot name

Suppose you have something like this

tank/pippo@franco00000001
tank/pippo@franco00000002
tank/pippo@franco00000003
(...)
tank/pippo@franco00001025

You want to purge those snapshots, but retain the data, getting everything inside consolidated.zpaq

zpaqfranz zfsadd /tmp/consolidated.zpaq "tank/pippo" "franco" -force

You can take only a folder; read the help!

Then you can purge with

zpaqfranz zfspurge "tank/pippo" "franco" -script launchme.sh

This method is certainly slow, because it requires an exorbitant amount of processing. However, the result is a single archive that keeps the data in a highly compressed format, which can eventually be extracted at the level of a single version-snapshot

In short: long-term archiving for an anti-ransomware policy

Improved zfsreceive

This VERY long-term archiving of zfs snapshots is now tested with 1000+ snapshots on 300GB+ datasets; it should be fine

Example: "unpack" all zfs snapshots (made by zpaqfranz zfsbackup command) from ordinato.zpaq into the new dataset rpool/restored

zpaqfranz zfsreceive /tmp/ordinato.zpaq rpool/restored -script myscript.sh

Then run myscript.sh

Download zpaqfranz