A long time ago I wrote about my plans to rewrite my Archive::Tyd project. Well, I've finally gotten around to doing that recently, and I feel like talking a little about it here.
I decided to keep things simple, and went with a plain text, ASCII based file format for the Tyd archives. Originally I was planning to make it a binary file format, but I didn't wanna have to deal with C-style data types (which would probably end up imposing limits on me; for example, an unsigned 32-bit number has a maximum value of about 4.29 billion, which caused problems with the FAT32 file system by limiting file sizes to just under 4 GB). So, plain text keeps things much simpler.
First, I'll show you what an example Tyd2 archive looks like:
    TYD2:SHA1:46698a6530d53ca7004719bcad5095efaa09420a
    [header]
    name=Untitled Archive
    packager=Archive::Tyd/2.00
    [file:/file1.txt]
    asize=68
    atime=1345597264
    checksum=97503ea37402b56b429c5210e9cfcd843c38b486
    chmod=33204
    ctime=1345597264
    index=0
    mtime=1345597264
    size=51
    [file:/file3.txt]
    asize=32
    atime=1345597264
    checksum=37c9daa7930605795dbc00753e7d93c0da50b9e5
    chmod=33204
    ctime=1345597264
    index=1
    mtime=1345597264
    size=24
    [content]
    VGhpcyBpcyBmaWxlIG51bWJlciBvbmUuCgpJdCBpcyBhIHZlcnkgc21hbGwgZmlsZS4K
    VGhpcwppcwp0aGUKdGhpcmQKZmlsZS4K
The archive resembles an INI file in some ways. The very first line begins with "TYD2" as a sort of magic number for the file format, followed by the checksum algorithm that's used throughout the entire archive (SHA1 in this case), and then the SHA1 checksum of the archive itself (that is, the checksum of the entire file after the first line). This way the archive can easily self-validate.
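The self-validation idea can be sketched in a few lines of Python. This is only an illustration, not Archive::Tyd's actual code: the function names are made up, and it assumes the checksum covers every byte after the first newline and is written as lowercase hex.

```python
import hashlib


def build_archive(body: bytes, algo: str = "SHA1") -> bytes:
    """Prefix an archive body with a TYD2 line carrying the body's checksum."""
    digest = hashlib.new(algo.lower(), body).hexdigest()
    return f"TYD2:{algo}:{digest}\n".encode("ascii") + body


def verify_archive(data: bytes) -> bool:
    """Return True if the first-line checksum matches the rest of the file."""
    first_line, _, body = data.partition(b"\n")
    parts = first_line.decode("ascii").strip().split(":")
    if len(parts) != 3 or parts[0] != "TYD2":
        return False  # wrong magic number or malformed first line
    _, algo, expected = parts
    try:
        actual = hashlib.new(algo.lower(), body).hexdigest()
    except ValueError:
        return False  # checksum algorithm not known to hashlib
    return actual == expected.lower()


body = b"[header]\nname=Untitled Archive\n"
archive = build_archive(body)
print(verify_archive(archive))         # intact archive
print(verify_archive(archive + b"x"))  # archive with a stray byte appended
```

Because the algorithm name travels in the first line, a reader does not need to know in advance which checksum was used.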
Then we have the [header] section, with archive headers. Tyd 2.0 supports pluggable "file mangling algorithms" which will let you compress or encrypt the file contents in any way that you want. If an algorithm is being used, one of the headers will be "algorithm" and will name the algorithm being used; for example, "algorithm=CipherSaber". In this example, no algorithm is used.
Then there are the [file:*] sections, one for each file in the archive. Each one contains a handful of attributes, like the file's creation and modification times, chmod permissions, file size and "archive size" (the size of the Base64-encoded data in the archive itself), and most importantly, an index number. This is how Archive::Tyd is able to "pluck" the file's data out of the [content] section.
In the [content] section lies the Base64-encoded data that belongs to each file mentioned in the file table, one file per line. So, for the file whose index is "0", the very first line after the [content] header belongs to that file. The file with index "1" has the second line, and so on.
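Here is a rough Python sketch of that index lookup, run against the example archive from above. The parser and the names parse_archive and read_file are hypothetical, and it assumes [content] is the last section in the file and that no mangling algorithm is in use.

```python
import base64

EXAMPLE = """\
TYD2:SHA1:46698a6530d53ca7004719bcad5095efaa09420a
[header]
name=Untitled Archive
packager=Archive::Tyd/2.00
[file:/file1.txt]
asize=68
index=0
size=51
[file:/file3.txt]
asize=32
index=1
size=24
[content]
VGhpcyBpcyBmaWxlIG51bWJlciBvbmUuCgpJdCBpcyBhIHZlcnkgc21hbGwgZmlsZS4K
VGhpcwppcwp0aGUKdGhpcmQKZmlsZS4K
"""  # trimmed copy of the example archive (time and checksum attributes left out)


def parse_archive(text):
    """Return (file_table, content_lines) for a Tyd2-style archive string."""
    files, content, section = {}, [], None
    for line in text.splitlines()[1:]:  # skip the TYD2 magic line
        if section == "content":
            if line:
                content.append(line)  # one Base64 line per file
        elif line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            if section.startswith("file:"):
                files[section[len("file:"):]] = {}
        elif "=" in line and section and section.startswith("file:"):
            key, _, value = line.partition("=")
            files[section[len("file:"):]][key] = value
    return files, content


def read_file(text, name):
    """Look up the file's index, then decode that line of [content]."""
    files, content = parse_archive(text)
    index = int(files[name]["index"])
    return base64.b64decode(content[index])


print(read_file(EXAMPLE, "/file3.txt").decode("ascii"))
```

The decoded length of each file matches its "size" attribute, and the length of its Base64 line matches "asize".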
When an algorithm is used to mangle the file data, the data gets mangled before being encoded in Base64. So, for example, if you use CipherSaber to password-encrypt your file data, the data is encrypted first and then encoded to Base64 (so you can't simply Base64-decode the data; you'd still have to decrypt the result with the CipherSaber password that was used).
You could just as well create a compression algorithm to use for this, for example something that uses Compress::Deflate7, but I'm not sure how useful compression will be considering the Tyd file format itself is kind of bloated (since it's ASCII based and not binary). But to each their own.
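To make the mangle-then-encode order concrete, here's a small Python sketch that uses zlib as a stand-in mangler. It is not how Archive::Tyd plugs algorithms in; pack and unpack are made-up names, and zlib only takes the place of whatever compression or encryption module you'd really use.

```python
import base64
import zlib


def pack(data: bytes, mangle=None) -> str:
    """Mangle first (if a mangler is given), then Base64-encode to one line."""
    if mangle is not None:
        data = mangle(data)
    return base64.b64encode(data).decode("ascii")


def unpack(line: str, unmangle=None) -> bytes:
    """Reverse the steps: Base64-decode first, then undo the mangling."""
    data = base64.b64decode(line)
    if unmangle is not None:
        data = unmangle(data)
    return data


original = b"It is a very small file.\n" * 40
plain = pack(original)                           # Base64 only
squeezed = pack(original, mangle=zlib.compress)  # compress, then Base64

# Original size, Base64-only size, compressed-then-Base64 size
print(len(original), len(plain), len(squeezed))
```

Base64 alone inflates data by a third, so compression only pays off when it saves more than that. On repetitive data like this it does; on already-compressed files it likely won't.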
Anyway, this project is still in development. I plan to at least work out a way to get RSA encryption and signing to work before I release this module to CPAN. One idea I have for RSA signatures would be:
There would be support for custom blocks in the file, so you could create a [signature] block to hold the RSA signature, and a method to get the entire file table as a string. So, you could create your archive, get the file table out of it, cryptographically sign the table using your RSA private key, and then include the resulting signature directly inside the Tyd archive itself.
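As a toy illustration of that signing flow, here is textbook RSA in plain Python. Everything in it is an assumption made for the example: the key is built from two small, well-known Mersenne primes and is NOT secure, there is no padding, and the names file_table, sign and verify are hypothetical. A real implementation would use a proper crypto library.

```python
import hashlib

# Toy key pair from two well-known Mersenne primes. Illustration only, NOT secure.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def file_table(archive_text: str) -> str:
    """Return everything from the first [file:...] line up to [content]."""
    lines = archive_text.splitlines()
    start = next(i for i, l in enumerate(lines) if l.startswith("[file:"))
    end = lines.index("[content]")
    return "\n".join(lines[start:end]) + "\n"


def digest_int(table: str) -> int:
    """SHA1 of the table as an integer, reduced to fit the toy modulus."""
    raw = hashlib.sha1(table.encode("ascii")).digest()
    return int.from_bytes(raw, "big") % N


def sign(table: str) -> str:
    """Sign with the private exponent; returns the signature as hex."""
    return format(pow(digest_int(table), D, N), "x")


def verify(table: str, sig_hex: str) -> bool:
    """Check a signature using only the public values N and E."""
    return pow(int(sig_hex, 16), E, N) == digest_int(table)


archive = (
    "TYD2:SHA1:00\n[header]\nname=Untitled Archive\n"
    "[file:/file1.txt]\nindex=0\nsize=51\n"
    "[content]\nAAAA\n"
)
table = file_table(archive)
signature = sign(table)
print("[signature]\nrsa=" + signature)  # hypothetical block layout
print(verify(table, signature))
```

Anyone holding only N and E can check the signature, but producing a valid one for a changed file table requires D.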
Also, I have yet to think up a way to support encrypting the file table itself, in cases where the file names and attributes are to be considered sensitive information as well.
You can check out the progress so far on GitHub: https://github.com/kirsle/Archive-Tyd.
A while back when I was a little more serious than usual about actually programming a game, I put some time into programming my own archiving format. I called it Tyd (sounds like "tied"). It was pretty simple: basically the contents of multiple files are all thrown together into a single file, and then the whole archive is encrypted using CipherSaber, a symmetric key cipher algorithm.
The idea was that there could be a common archive file format (Tyd) that could be used by multiple applications or games, and each application would have its own password for its Tyd archive, that only the application and the developer know. This would make it at least a little bit difficult for the app's end users to open up the archive and poke around at its contents. Compare this to Blizzard's MPQ archive format used by all their games, where users can easily open them up and get at their contents. With Tyd, they'd need to reverse engineer each application that uses a Tyd archive to open that app's archives.
You can see Archive::Tyd's CPAN page for more details.
This was limited though: since the entire file was encrypted together, the application would need to load the whole archive into memory to be able to use it. So, while it was fine for small archives containing small files, a larger archive would consume too much memory. So forget about storing a lot of MP3s and MPEGs in a Tyd archive unless you're operating a supercomputer.
Also, there was no way to verify that a password to an archive was entered correctly, short of trying to decode it and see if you only get gibberish out of it.
So I started piecing together ideas for a successor to Tyd, which will still be called Tyd (version 2.0). The basic requirements are:
To facilitate "streaming", when the archive is encrypted or compressed, each file is only affected one block at a time. By default the block size is 512 bytes, so when a file is added to an encrypted archive, 512 bytes are encrypted at a time and separately. When reading the file back from the archive, one block at a time is read, decrypted, and returned to the caller (the stored block will typically be larger than 512 bytes after encryption, and usually smaller after compression).
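A rough Python sketch of the block-at-a-time idea follows. The on-disk framing is an assumption made up for the example (a 4-byte big-endian length before each stored block), since stored blocks vary in size and the reader has to know where each one ends. zlib stands in for the real encryption or compression module, and the function names are hypothetical.

```python
import io
import struct
import zlib

BLOCK_SIZE = 512


def write_blocks(src, dst, mangle=zlib.compress, block_size=BLOCK_SIZE):
    """Mangle src one block at a time; prefix each stored block with its length."""
    while True:
        block = src.read(block_size)
        if not block:
            break
        stored = mangle(block)
        dst.write(struct.pack(">I", len(stored)))  # assumed framing, see above
        dst.write(stored)


def read_blocks(src, unmangle=zlib.decompress):
    """Yield the original blocks one by one without loading the whole file."""
    while True:
        prefix = src.read(4)
        if len(prefix) < 4:
            break
        (length,) = struct.unpack(">I", prefix)
        yield unmangle(src.read(length))


data = bytes(range(256)) * 5  # 1280 bytes: two full blocks plus a 256-byte tail
packed = io.BytesIO()
write_blocks(io.BytesIO(data), packed)
packed.seek(0)
for block in read_blocks(packed):
    print(len(block))
```

Only one block is ever held in memory at a time, which is the whole point: the caller can start using the first 512 bytes before the rest of the file has been touched.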
For the actual encryption and compression algorithms I'll be using existing CPAN modules to implement known algorithms.
The new Archive::Tyd algorithm is intended to have basically these features:
This way the application can be built to know the public key so it can read the archive, and any user who reverse engineers the application can only get the public key -- so they can get read-only access to the archive, but have a much harder time modifying it or changing its contents without the secret key. IIRC this would be similar to Blizzard's MPQ, in that the DLL that reads MPQ files for their games doesn't include the functions needed to write/modify MPQ files, giving the end users read-only access to the file's contents.
Anyway, no ETA yet, this is a big project. (Well, not really, the heavy lifting of encryption/compression is done by third party modules, all I need to do is program the wrapper code).