Archive::Tyd 2.0

Posted by Noah Petherbridge on Tuesday, August 21 2012 @ 06:28:45 PM
A long time ago I wrote about my plans to rewrite my Archive::Tyd project. Well, I've finally gotten around to doing that recently, and I feel like talking a little about it here.

I decided to keep things simple and went with a plain-text, ASCII-based file format for the Tyd archives. Originally I was planning to make it a binary file format, but I didn't wanna have to deal with C-style data types, which would probably end up imposing limits on me. For example, an unsigned 32-bit number has a maximum value of about 4.29 billion, which is why the FAT32 file system limits file sizes to 4 GB. So, plain text keeps things much simpler.

First, I'll show you what an example Tyd2 archive looks like:

TYD2:SHA1:46698a6530d53ca7004719bcad5095efaa09420a

[header]
name=Untitled Archive
packager=Archive::Tyd/2.00

[file:/file1.txt]
asize=68
atime=1345597264
checksum=97503ea37402b56b429c5210e9cfcd843c38b486
chmod=33204
ctime=1345597264
index=0
mtime=1345597264
size=51

[file:/file3.txt]
asize=32
atime=1345597264
checksum=37c9daa7930605795dbc00753e7d93c0da50b9e5
chmod=33204
ctime=1345597264
index=1
mtime=1345597264
size=24

[content]
VGhpcyBpcyBmaWxlIG51bWJlciBvbmUuCgpJdCBpcyBhIHZlcnkgc21hbGwgZmlsZS4K
VGhpcwppcwp0aGUKdGhpcmQKZmlsZS4K

The archive resembles an INI file in some ways. The very first line begins with "TYD2" as a sort of magic number for the file format, followed by the checksum algorithm that's used throughout the archive (SHA1 in this case), and then the SHA1 checksum of the archive itself (that is, the checksum of everything in the file after the first line). This way the archive can easily self-validate.
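
As a rough illustration, the self-validation step could be sketched in Python like this. It is not Archive::Tyd's actual code (the module is Perl); the function names `verify_tyd2` and `seal` are made up for the example, and exactly which bytes go into the checksum (here: everything after the first newline) is an assumption.

```python
import hashlib

def verify_tyd2(text: str) -> bool:
    """Compare the checksum on the first line with a hash of the rest."""
    first_line, _, rest = text.partition("\n")
    magic, algorithm, expected = first_line.strip().split(":")
    if magic != "TYD2":
        raise ValueError("not a Tyd2 archive")
    # Assumption: the algorithm name maps onto a hashlib name ("SHA1" -> "sha1").
    digest = hashlib.new(algorithm.lower(), rest.encode("ascii")).hexdigest()
    return digest == expected.lower()

def seal(body: str, algorithm: str = "SHA1") -> str:
    """Hypothetical helper: prepend a matching TYD2 line to an archive body."""
    digest = hashlib.new(algorithm.lower(), body.encode("ascii")).hexdigest()
    return f"TYD2:{algorithm}:{digest}\n{body}"

archive = seal("\n[header]\nname=Untitled Archive\n")
print(verify_tyd2(archive))        # intact archive
print(verify_tyd2(archive + "x"))  # tampered archive
```

Because the checksum sits on the first line and covers everything after it, any change to the headers, file table, or content makes the comparison fail.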

Then we have the [header] section, with archive headers. Tyd 2.0 supports pluggable "file mangling algorithms" which will let you compress or encrypt the file contents in any way that you want. If an algorithm is being used, one of the headers will be "algorithm" and will name the algorithm being used; for example, "algorithm=CipherSaber". In this example, no algorithm is used.

Then there are the [file:*] sections, one for each file in the archive. Each section holds a handful of attributes for its file, like its creation and modification times, chmod permissions, file size and "archive size" (the size of the Base64-encoded data in the archive itself), and most importantly, an index number. The index is how Archive::Tyd is able to "pluck" the file's data out of the [content] section.

In the [content] section lies the Base64-encoded data that belongs to each file mentioned in the file table, one file per line. So, for the file whose index is "0", the very first line after the [content] header belongs to that file. The file with index "1" has the second line, and so on.
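
To make the index lookup concrete, here is a minimal Python sketch of reading the file table and plucking one file out of [content]. It is only an illustration of the format, not the module's Perl implementation: `parse_tyd2` and `read_file` are hypothetical names, the sample is the archive from above with most attributes trimmed, and the sketch assumes [content] is the last section and that no mangling algorithm is in use.

```python
import base64

# The example archive from above, trimmed to a few attributes.
# (The checksum on the first line is not checked in this sketch.)
SAMPLE = """TYD2:SHA1:46698a6530d53ca7004719bcad5095efaa09420a

[header]
name=Untitled Archive

[file:/file1.txt]
asize=68
index=0
size=51

[file:/file3.txt]
asize=32
index=1
size=24

[content]
VGhpcyBpcyBmaWxlIG51bWJlciBvbmUuCgpJdCBpcyBhIHZlcnkgc21hbGwgZmlsZS4K
VGhpcwppcwp0aGUKdGhpcmQKZmlsZS4K
"""

def parse_tyd2(text: str):
    """Return (sections, content_lines) parsed from a Tyd2 archive string."""
    sections = {}   # section name -> {key: value}
    content = []    # one Base64 line per file, in index order
    current = None
    for line in text.splitlines()[1:]:   # skip the TYD2:... line
        line = line.strip()
        if not line:
            continue
        if current == "content":
            content.append(line)
        elif line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            if current != "content":
                sections[current] = {}
        elif current is not None:
            key, _, value = line.partition("=")
            sections[current][key] = value
    return sections, content

def read_file(sections, content, path: str) -> bytes:
    """Look up the file's index, then decode that line of [content]."""
    meta = sections["file:" + path]
    return base64.b64decode(content[int(meta["index"])])

sections, content = parse_tyd2(SAMPLE)
print(read_file(sections, content, "/file3.txt").decode("ascii"))
```

In the sample, the decoded length of each file matches its size attribute, and the length of its Base64 line matches asize.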

When an algorithm is used to mangle the file data, the data gets mangled before being encoded in Base64. So, for example, if you use CipherSaber to password-encrypt your file data, the data is encrypted first and then encoded to Base64 (so you can't simply Base64-decode the data; you'd still have to decrypt the result with the CipherSaber password that was used).

You could just as well create a compression algorithm to use for this, for example something that uses Compress::Deflate7, but I'm not sure how useful compression will be considering the Tyd file format itself is kind of bloated (since it's ASCII based and not binary). But to each their own.
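
The ordering (mangle first, Base64 second, and the reverse when reading) can be sketched in Python as follows. This is an illustration only: `pack` and `unpack` are hypothetical names, and zlib from Python's standard library stands in for a mangling algorithm; it is neither CipherSaber nor Compress::Deflate7, and it says nothing about how Archive::Tyd's plugin interface actually looks.

```python
import base64
import zlib

def pack(data: bytes, mangle=None) -> str:
    """Mangle first (if an algorithm is given), then Base64-encode."""
    if mangle is not None:
        data = mangle(data)
    return base64.b64encode(data).decode("ascii")

def unpack(line: str, unmangle=None) -> bytes:
    """Reverse of pack: Base64-decode first, then undo the mangling."""
    data = base64.b64decode(line)
    if unmangle is not None:
        data = unmangle(data)
    return data

data = b"abc" * 100
plain_line = pack(data)                    # no algorithm: just Base64
mangled_line = pack(data, zlib.compress)   # zlib as a stand-in algorithm

# Base64-decoding alone does not give the original bytes back.
print(base64.b64decode(mangled_line) == data)
print(unpack(mangled_line, zlib.decompress) == data)
print(len(plain_line), len(mangled_line))
```

Base64 turns every 3 bytes into 4 characters, so an unmangled file always grows by about a third; whether compression wins that back depends entirely on how compressible the data is.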

Anyway, this project is still in development. I plan to at least work out a way to get RSA encryption and signing to work before I release this module to CPAN. One idea I have for RSA signatures would be:

There would be support for custom blocks in the file, so you could create a [signature] block to hold the RSA signature, and a method to get the entire file table as a string. So, you could create your archive, get the file table out of it, cryptographically sign the table using your RSA private key, and then include the resulting signature directly inside the Tyd archive itself.
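
A very rough Python sketch of that idea is below. Everything in it is hypothetical, since the feature doesn't exist yet: the helper names `file_table` and `add_signature`, the `value=` key, and the placement of the [signature] block before [content] are all assumptions. The signer is passed in as a function, and the usage example uses HMAC-SHA256 from the standard library purely as a stand-in; it is not an RSA signature.

```python
import base64
import hashlib
import hmac

def file_table(text: str) -> str:
    """Return all [file:*] sections as one string, in archive order."""
    out = []
    keep = False
    for line in text.splitlines():
        if line.startswith("["):
            keep = line.startswith("[file:")
        if keep and line.strip():
            out.append(line.strip())
    return "\n".join(out) + "\n"

def add_signature(text: str, sign) -> str:
    """Insert a [signature] block before [content]; sign maps bytes -> bytes."""
    raw = sign(file_table(text).encode("ascii"))
    sig = base64.b64encode(raw).decode("ascii")
    block = f"[signature]\nvalue={sig}\n\n"
    head, marker, tail = text.partition("[content]")
    return head + block + marker + tail

archive = (
    "[header]\nname=Untitled Archive\n\n"
    "[file:/file3.txt]\nindex=0\nsize=24\n\n"
    "[content]\nVGhpcwppcwp0aGUKdGhpcmQKZmlsZS4K\n"
)

# Stand-in signer (NOT RSA): a keyed HMAC over the file table.
signer = lambda table: hmac.new(b"secret", table, hashlib.sha256).digest()
signed = add_signature(archive, signer)
print(signed)
```

Adding the block changes the archive body, so the checksum on the TYD2 line would have to be recomputed afterwards. The file table itself is unaffected by the new block, so the signature stays valid.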

Also, I have yet to think up a way to support encrypting the file table itself, in cases where the file names and attributes are to be considered sensitive information as well.

Stay tuned.

You can check out the progress so far on GitHub: https://github.com/kirsle/Archive-Tyd.
