Tagged as: Archive::Tyd

Archive::Tyd 2.0
August 22, 2012 by Noah
A long time ago I wrote about my plans to rewrite my Archive::Tyd project. Well, I've finally gotten around to doing that recently, and I feel like talking a little about it here.

I decided to keep things simple, and went with a plain text, ASCII based file format for the Tyd archives. Originally I was planning to make it a binary file format, but I didn't wanna have to deal with C-style data types (which would probably end up imposing limits on me, for example a 32-bit number has a maximum value of about 4 billion, which caused problems with the FAT32 file system by limiting maximum file sizes to 4 GB). So, plain text keeps things much simpler.

First, I'll show you what an example Tyd2 archive looks like:


name=Untitled Archive



The archive resembles an INI file in some ways. The very first line begins with "TYD2" as a sort of magic number for the file format, and then the checksum algorithm that's being used throughout the entire archive (SHA1 in this case), and then the SHA1 checksum of the entire archive itself (this is, the checksum of the entire file after the first line). This way the archive can easily self-validate.

Then we have the [header] section, with archive headers. Tyd 2.0 supports pluggable "file mangling algorithms" which will let you compress or encrypt the file contents in any way that you want. If an algorithm is being used, one of the headers will be "algorithm" and will name the algorithm being used; for example, "algorithm=CipherSaber". In this example, no algorithm is used.

Then there are [file:*] sections. There's one for each file in the archive. Each file contains a handful of attributes, like their creation and modification times, chmod permissions, file size and "archive size" (the size of the Base64-encoded data in the archive itself), and most importantly, an index number. This is how Archive::Tyd is able to "pluck" the file's data out of the [content] section.

In the [content] section lies the Base64-encoded data that belongs to each file mentioned in the file table, with one on each line. So, for the file whose index is "0", the very first line after the [content] section belongs to that file. The file with index "1" has the second line, and so on.

When an algorithm is used to mangle the file data, the data gets mangled before being encoded in Base64. So, for example, if you use CipherSaber to password-encrypt your file data, their data is encrypted and then encoded to Base64 (so you can't simply Base64-decode the data; you'd still have to decrypt the result with the CipherSaber password that was used).

You could just as well create a compression algorithm to use for this, for example something that uses Compress::Deflate7, but I'm not sure how useful compression will be considering the Tyd file format itself is kind of bloated (since it's ASCII based and not binary). But to each their own.

Anyway, this project is still in development. I plan to at least work out a way to get RSA encryption and signing to work before I release this module to CPAN. One idea I have for RSA signatures would be:

There would be support for custom blocks in the file, so you could create a [signature] block to hold the RSA signature, and a method to get the entire file table as a string. So, you could create your archive, get the file table out of it, cryptographically sign the table using your RSA private key, and then include the resulting signature directly inside the Tyd archive itself.

Also, I have yet to think up a way to support encrypting the file table itself, in cases where the file names and attributes are to be considered sensitive information as well.

Stay tuned.

You can check out the progress so far on GitHub:

Tags: 0 comments | Permalink
July 21, 2009 by Noah
A while back when I was a little more serious than usual about actually programming a game, I put some time into programming my own archiving format. I called it Tyd (sounds like "tied"). It was pretty simple: basically the contents of multiple files are all thrown together into a single file, and then the whole archive is encrypted using CipherSaber, a symmetric key cipher algorithm.

The idea was that there could be a common archive file format (Tyd) that could be used by multiple applications or games, and each application would have its own password for its Tyd archive, that only the application and the developer knows. This would make it at least a little bit difficult for the app's end users to open up the archive and poke around at its contents. Compare this to Blizzard's MPQ archive format used by all their games, where users can easily open them up and get at their contents. With Tyd, they'd need to reverse engineer each application that uses a Tyd archive to open that app's archives.

You can see Archive::Tyd's CPAN page for more details.

This was limited though because, since the whole entire file was encrypted together, the application would need to load the whole archive into memory to be able to use it. So, while it was fine for small archives containing small files, a larger archive would consume too much memory. So forget about storing a lot of MP3s and MPEGs in a Tyd archive unless you're operating a supercomputer.

Also, there was no way to verify that a password to an archive was entered correctly, short of trying to decode it and see if you only get gibberish out of it.

So I started piecing together ideas for a successor to Tyd, which will still be called Tyd (version 2.0). The basic requirements are:

  • It has to support live reading (the calling app should be able to read a file directly out of the archive without extracting it to disk first). Archive::Zip doesn't handle this.
  • It has to have some protection of some sort, to ensure that the app that owns it has a much easier time reading it than the app's end users do.
  • It has to be able to handle very large archives. Archive::Zip, again, can't handle this.
So I typed up a spec sheet and began programming. The new Tyd archive will come with a complete set of headers (giving the archive a name and comment, and indicating whether the archive is encrypted, compressed, or raw, including some verification checks for decrypting, and a few other things like this). It will also contain a file table in the headers, so that a list of files can be pulled from the archive quickly, along with their attributes. And then the actual contents of the files follow all of that.

To facilitate "streaming", when the archive is encrypted or compressed, each file is only affected one block at a time. By default the block size is 512 bytes, so when a file is added to an encrypted archive, 512 bytes are encrypted at a time and separately. When reading the file back from the archive, one block at a time is read, decrypted, and returned to the caller (the block size after encryption is surely greater than 512 bytes; when compressed, less than 512 bytes).

For the actual encryption and compression algorithms I'll be using existing CPAN modules to implement known algorithms.

The new Archive::Tyd algorithm is intended to have basically these features:

  • Archive Headers:
    • A name and comment, author's name, generator's name
    • A flag to define whether the archive is compressed, encrypted, or neither (raw).
    • Block size for reading/writing the archive.
  • Encrypted Archives:
    • Supported Encryption Algorithms:
      • CipherSaber (symmetric key; used with old Archive::Tyd)
      • RSA encryption (public-key)
      • AES, DES, 3DES, RC2, RC4
      • Others?
    • If it uses a symmetric-key encryption such as CipherSaber, the MD5 or SHA1 sum of the correct password will be included to verify that the decryption will work. One catch -- the MD5 or SHA1 sum is, itself, encrypted using the same algorithm. So with CipherSaber, you give it the password, it attempts to decode the MD5/SHA1 sum with it, and then compares the sum with the original password you gave it. This way it can verify you have the right password while keeping the password relatively safe from cracking.
  • Compressed Archives:
    • Supported Compression Algorithms:
      • Deflate
      • LZF
      • GZip
With public-key encryption used on the archive, this would basically mean that only the application developer would be able to create or modify an archive that their application uses. Instead of a single password being used to both create and read archives (as with CipherSaber), a secret key is needed to create it, and a public key is used to decode it.

This way the application can be built to know the public key so it can read the archive, and any user who reverse engineers the application can only get the public key -- so they can get read-only access to the archive, but have a much harder time modifying it or changing its contents without the secret key. IIRC this would be similar to Blizzard's MPQ, in that the DLL that reads MPQ files for their games doesn't include the functions needed to write/modify MPQ files, giving the end users read-only access to the file's contents.

Anyway, no ETA yet, this is a big project. (Well, not really, the heavy lifting of encryption/compression is done by third party modules, all I need to do is program the wrapper code).

Tags: 0 comments | Permalink