Like most games, MechAssault stores the assets for each level in large resource archives. Generally speaking, these are catalog files that simply act as containers for many other files. MechAssault’s root directory holds plenty of large files with the .mgf extension, and their names match the game’s “levels”: multiplayer maps, campaign missions, and levels used for in-game cinematics. There are a few others too: text.mgf, movies.mgf, and common.mgf. The MGF files contain largely the same data, because most assets are reused throughout the game, though each level still has some unique assets.
In this blog post, I will discuss how I deconstructed the MGF file format and share everything I have learned so far. Please note that I don’t know everything about the format, so some information may be incomplete. This is an ongoing project, so I will inevitably learn more about the format over time.
Decompressing the files
Initially, most of the MGF files (excluding text.mgf and movies.mgf) are compressed, which makes it impossible to read or decipher anything in a hex editor. Compression encodes data more compactly (for example, by replacing repeated sequences with short references). That saves disk space, but it scrambles the very structure we’re interested in. Fortunately, these files are compressed with zlib, a widely used, free compression library. Because zlib is so common, tools to decompress it already exist, and that is where this journey starts.
To decompress the files, there’s a great command-line tool made by Luigi Auriemma called offzip. Luigi has many great tools for this type of reverse engineering project. If you’re interested in reverse engineering games yourself (especially older ones), you’ll probably find something useful on his website. The following command and arguments work best with this tool:
offzip.exe -a -1 filename.mgf
where -a extracts all the decompressed data, and -1 ensures only one file is output. The following batch script is a handy way of decompressing all the files at once:
for %%i in (*.mgf) do offzip.exe -a -1 %%i decompressed\%%i
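If you would rather script this step, Python’s standard zlib module can do something similar. The sketch below is only a rough approximation of offzip’s -a -1 mode and is not its actual implementation. It scans for byte pairs that form a valid zlib header, tries to inflate a stream from each one, and concatenates every stream that decompresses cleanly. The function name inflate_all_streams is hypothetical. The sketch also only recognises zlib headers that use the common 32 KB window (first byte 0x78).

```python
import zlib

def inflate_all_streams(data: bytes) -> bytes:
    """Scan for zlib streams and concatenate everything they decompress to.
    A rough approximation of offzip's scanning, not its actual implementation."""
    output = []
    pos = 0
    while pos < len(data) - 1:
        cmf, flg = data[pos], data[pos + 1]
        # 0x78 = deflate with a 32 KB window; a valid zlib header also
        # satisfies (CMF * 256 + FLG) % 31 == 0
        if cmf == 0x78 and (cmf * 256 + flg) % 31 == 0:
            inflater = zlib.decompressobj()
            try:
                chunk = inflater.decompress(data[pos:])
            except zlib.error:
                chunk = None  # false positive: not really a zlib stream
            if chunk is not None and inflater.eof:
                output.append(chunk)
                # Resume scanning immediately after the end of this stream
                pos = len(data) - len(inflater.unused_data)
                continue
        pos += 1
    return b"".join(output)
```

You could write the output of this function to a new file. It should resemble offzip’s single-file output, but it has not been checked to be byte-identical. offzip remains the tool used in this post.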
With the files decompressed, investigation can begin!
Investigating the MGF file
Breaking down the file structure is a long, ongoing process of inspecting the file in a hex editor, looking for patterns, testing values and, overall, a lot of trial and error. Without any documentation this can be very hard, like finding a needle in a haystack. Throughout this blog post, I will note some useful tips for anyone interested in getting started with projects like this one.
Overview of the MGF file structure
First and foremost, MGF files are binary files: all the data is stored in raw binary form, not human-readable text. If they were text files, this job would be an awful lot easier, but text is inefficient for computers. Binary data can be copied straight into memory and used as-is, whereas text has to be parsed first, which can take a lot more time. Some files stored within the archive are text files, but I will explore these in later posts.
Another point worth mentioning is that data is stored in little-endian byte order. MechAssault was an original Xbox game, so it ran on Xbox hardware with an Intel x86 CPU. The x86 architecture is little-endian, meaning the least significant byte of a multi-byte value comes first. As a result, the game can use these values without any byte swapping, and so can a tool running on an x86 PC.
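Python’s standard struct module shows what little-endian means in practice. In a struct format string, the < prefix selects little-endian byte order. The four bytes below are arbitrary example data. Read little-endian, they give 64. Read big-endian, the same bytes give a very different number.

```python
import struct

# Four bytes as they would appear in a hex editor: 40 00 00 00
raw = bytes([0x40, 0x00, 0x00, 0x00])

# "<I" = little-endian unsigned 32-bit integer (least significant byte first)
little = struct.unpack("<I", raw)[0]  # 64
# ">I" = big-endian, shown only for comparison
big = struct.unpack(">I", raw)[0]     # 1073741824
```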
MGF files are composed of 5 main chunks:
- Header – Stores offsets and lengths of the following sections
- File entry table – Stores information about each file in the archive
- Directory relationship table – Stores information about each folder in the archive, and which files belong in which folders
- Strings – Null-terminated strings for file and folder names
- File data – Stores all file data for every file in the archive
Header
Binary files often start with a small chunk known as a header (a chunk is just a block of data with a specific purpose). The header describes where data is located in the file and how that data is formatted. This usually takes the form of offsets (positions in the file) and sizes (lengths in bytes), which help the program reading the file find the data it needs. Because binary files are not human-readable, widely used formats are usually accompanied by documentation that explains how and where data is stored. Examples include the Microsoft WAVE sound file format and Paul Bourke’s data formats. MechAssault uses a proprietary game engine, so there is no public documentation on the MGF file format.
MGF files are no different – they start with a header that is always 64 bytes long. Below is a table describing the MGF file header:
| Offset | Length and data type | Description |
|---|---|---|
| 0 | 4 bytes, char[4] | MGF file signature – always reads |
| 4 | 1 byte, char | Version – always 02 for MechAssault 1 archives, always 04 for MechAssault 2 archives |
| 5 | 1 byte, char | Unknown – always 01 |
| 6 | 2 bytes, char[2] | Unknown – always reads “ZZ” |
| 8 | 4 bytes, int | Padding – always 0 |
| 12 | 4 bytes, unsigned int | Number of files in the archive |
| 16 | 4 bytes, unsigned int | Length of the file entry chunk in bytes |
| 20 | 4 bytes, unsigned int | Offset of the file entry chunk (always 64, because the file entry chunk starts immediately after the header) |
| 24 | 4 bytes, unsigned int | Number of directory entries (files and folders) in the archive |
| 28 | 4 bytes, unsigned int | Length of the directory relationship chunk in bytes |
| 32 | 4 bytes, unsigned int | Offset of the directory relationship chunk |
| 36 | 4 bytes, unsigned int | Length of the directory strings chunk in bytes |
| 40 | 4 bytes, unsigned int | Offset of the directory strings chunk |
| 44 | 20 bytes | Padding – all zeros |
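As a minimal sketch, the header table translates directly into a format string for Python’s struct module. MgfHeader, read_header and the field names are my own labels and do not come from the game. The signature is returned but not validated. The unknown fields are kept only so that nothing is discarded.

```python
import struct
from typing import NamedTuple

# The 64-byte header from the table above: little-endian, and no alignment
# padding between fields because of the "<" prefix. "20x" skips the padding.
HEADER_FORMAT = "<4sBB2s9I20x"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 64

class MgfHeader(NamedTuple):
    signature: bytes
    version: int            # observed: 2 = MechAssault, 4 = MechAssault 2
    unknown_5: int          # observed as 1
    unknown_6: bytes        # observed as b"ZZ"
    padding: int
    file_count: int
    file_entries_length: int
    file_entries_offset: int
    dir_count: int
    dir_table_length: int
    dir_table_offset: int
    strings_length: int
    strings_offset: int

def read_header(data: bytes) -> MgfHeader:
    """Unpack the first 64 bytes of a decompressed MGF file."""
    if len(data) < HEADER_SIZE:
        raise ValueError("file is too short to contain an MGF header")
    return MgfHeader(*struct.unpack_from(HEADER_FORMAT, data, 0))
```

The file_entries_offset field should always come out as 64, which makes it a handy sanity check that the right bytes are being parsed.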
File entry table
Immediately after the header comes a list of 32-byte structures, each describing one file in the archive. The number of entries is stored in the header at offset 12. The header also makes it easy to work out the size of each structure: divide the length of the file entry chunk (offset 16 in the header) by the number of files. The data itself also shows a noticeable pattern that repeats every 32 bytes. File entries are only 32 bytes long in MechAssault 2 archives. MechAssault 1 archives use 28-byte entries with the fields rearranged.
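This division is easy to automate. In the sketch below, both function names are hypothetical. The entry size is derived from the two header fields. A remainder is treated as an error, because it would suggest the header was misread. The 28- and 32-byte sizes come from the observations above, so guess_game is only a heuristic.

```python
def file_entry_size(file_entries_length: int, file_count: int) -> int:
    """Derive the size of one file entry from two header fields."""
    if file_count == 0:
        raise ValueError("archive has no file entries")
    size, remainder = divmod(file_entries_length, file_count)
    if remainder:
        raise ValueError("chunk length is not a multiple of the file count")
    return size

def guess_game(entry_size: int) -> str:
    # Observed sizes: 32 bytes in MechAssault 2 archives, 28 in MechAssault 1
    return {32: "MechAssault 2", 28: "MechAssault 1"}.get(entry_size, "unknown")
```

The version byte at header offset 4 (02 or 04) also distinguishes the two games. Comparing it with the answer from guess_game is therefore a cheap cross-check.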
Below is a table describing the file entry structure in a MechAssault 2 archive:
| Offset | Length and data type | Description |
|---|---|---|
| 0 | 4 bytes, unsigned int | File index – increments for each file entry, though there are gaps |
| 4 | 8 bytes, unsigned long long | 64-bit UUID |
| 12 | 4 bytes, unsigned int | File length |
| 16 | 4 bytes, unsigned int | File length (again) |
| 20 | 4 bytes, time_t | UNIX timestamp – last modified date |
| 24 | 4 bytes, unsigned int | Offset to the file’s data |
| 28 | 4 bytes | Unknown – probably padding, always 0x00F71200. Only in MA2 archives. |
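In code, the 32-byte MechAssault 2 entry unpacks with the struct format string <IQIIIII. FileEntry and read_file_entries are hypothetical names. The timestamp is assumed to be seconds since the UNIX epoch, interpreted as UTC. MechAssault 1’s 28-byte entries would need a different format string because their fields are arranged differently.

```python
import struct
from datetime import datetime, timezone
from typing import List, NamedTuple

# One 32-byte MechAssault 2 file entry (little-endian):
# index, 64-bit UUID, length, length (again), timestamp, data offset, unknown
ENTRY_FORMAT = "<IQIIIII"
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 32

class FileEntry(NamedTuple):
    index: int
    uuid: int
    length: int
    length_again: int
    timestamp: int
    data_offset: int
    unknown: int

    @property
    def modified(self) -> datetime:
        # ASSUMPTION: seconds since the UNIX epoch, interpreted as UTC
        return datetime.fromtimestamp(self.timestamp, tz=timezone.utc)

def read_file_entries(data: bytes, offset: int, count: int) -> List[FileEntry]:
    """Read `count` MechAssault 2 file entries starting at `offset`."""
    return [
        FileEntry(*struct.unpack_from(ENTRY_FORMAT, data, offset + i * ENTRY_SIZE))
        for i in range(count)
    ]
```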
The most difficult piece of data to identify was the UUID. These values always look random. The only way to confirm that they were UUIDs was to compare file entries across different MGF files, since many files are reused across archives. These 8 bytes turned out to be identical in entries describing the exact same file, which indicates that they are UUIDs.
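This cross-archive comparison can be automated too. The helper below (shared_uuids is a hypothetical name) takes (UUID, file length) pairs from two archives and returns the UUIDs that appear in both. Requiring the lengths to match is an extra consistency check. It is not a property the format is known to guarantee.

```python
from typing import Dict, Iterable, Set, Tuple

def shared_uuids(a: Iterable[Tuple[int, int]],
                 b: Iterable[Tuple[int, int]]) -> Set[int]:
    """Return UUIDs present in both archives whose file lengths also match.
    Each input is a sequence of (uuid, length) pairs from one archive."""
    lengths_a: Dict[int, int] = dict(a)
    return {uuid for uuid, length in b if lengths_a.get(uuid) == length}
```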
- When looking for offsets, first make sure the value is less than the total size of the file (seems obvious) and greater than the offset where the suspected value is stored (offsets rarely point backwards in the file). Then use your hex editor’s “go to” function to jump to that offset and inspect the data. Is there a sudden change in the pattern? If there is, you’ve probably found an offset.
- When looking for lengths stored near known offsets, add the suspected length to the offset and use “go to” again. Is there another change in the data’s pattern? You may have found the end of that block of data.
- 4-byte integers are very common for simple data fields in binary files, even when the values stored in them never need all 4 bytes. This is probably because 4-byte integers fit snugly into most CPU registers. If your hex editor shows evenly spaced values separated by runs of one, two or three zero bytes, you’re probably looking at a series of 4-byte integers.
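The first tip can be turned into a rough filter. The sketch below (candidate_offsets is a hypothetical name) walks a buffer in aligned 4-byte steps and reads each value as a little-endian unsigned integer. It keeps only the values that point forward and stay inside the data. It narrows the search, but every hit still needs checking by eye in a hex editor.

```python
import struct
from typing import List, Optional, Tuple

def candidate_offsets(data: bytes, start: int = 0,
                      end: Optional[int] = None) -> List[Tuple[int, int]]:
    """Return (position, value) pairs where an aligned little-endian uint32
    could plausibly be an offset: it points forward and stays inside the data."""
    stop = len(data) if end is None else min(end, len(data))
    hits = []
    # Step in 4-byte increments, since 4-byte integer fields are the common case
    for pos in range(start, stop - 3, 4):
        (value,) = struct.unpack_from("<I", data, pos)
        if pos < value < len(data):
            hits.append((pos, value))
    return hits
```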
Directory relationship table
After the file entries, the next chunk is a list of structures that describe the archive’s hierarchical directory structure. They define both folders and files, and include indices that point to each entry’s parent. They also contain offsets to strings in the following chunk, which hold the names. These structures are 24 bytes long and are made up of six 32-bit integers. I know, “directory relationship table” isn’t a great name, but I can’t think of anything better.
Below is a table describing an entry in the directory relationship table:
| Offset | Length and data type | Description |
|---|---|---|
| 0 | 4 bytes, int | Unknown |
| 4 | 4 bytes, int | Parent index |
| 8 | 4 bytes, int | Unknown |
| 12 | 4 bytes, int | Unknown |
| 16 | 4 bytes, int | Offset to file/folder name |
| 20 | 4 bytes, int | Unknown |
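A minimal parser for these entries follows. Most fields are still unknown, so this sketch names them after their offsets. unknown_0, unknown_8 and the other unknown_* names are placeholders. All six values are read as signed integers, so 0xFFFFFFFF comes out as -1.

```python
import struct
from typing import List, NamedTuple

# One 24-byte directory relationship entry: six little-endian signed 32-bit ints
DIR_FORMAT = "<6i"
DIR_SIZE = struct.calcsize(DIR_FORMAT)  # 24

class DirEntry(NamedTuple):
    unknown_0: int
    parent_index: int
    unknown_8: int
    unknown_12: int
    name_offset: int
    unknown_20: int

def read_dir_entries(data: bytes, offset: int, count: int) -> List[DirEntry]:
    """Read `count` directory relationship entries starting at `offset`."""
    return [
        DirEntry(*struct.unpack_from(DIR_FORMAT, data, offset + i * DIR_SIZE))
        for i in range(count)
    ]
```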
This section remains the most mysterious part of the MGF file so far, as I have yet to work out what several of the fields mean. Earlier I mentioned that there are gaps in the index field of the file entries. This table explains them: the missing indices belong to folders, and the file entry table does not store folder entries. Each entry here stores the index of its parent. I deduced that this field holds parent indices because it never refers to an index found in the list of file entries.
Although I do not understand some of the fields, most of them appear to be specific to either files or folders. For example, the field at offset 20 is always -1 for files.
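This deduction can be expressed as two small checks. The sketch assumes, without confirmation, that file and folder indices share one range running from 0 to the directory-entry count (header offset 24) minus one. Under that assumption, the folder indices are simply the numbers missing from the file entry table. Both function names are hypothetical.

```python
from typing import Iterable, Set

def folder_indices(file_indices: Iterable[int], dir_count: int) -> Set[int]:
    """Indices with no file entry, taken to be folders.
    ASSUMPTION: files and folders share one 0-based index range."""
    return set(range(dir_count)) - set(file_indices)

def parents_are_folders(parent_indices: Iterable[int],
                        file_indices: Iterable[int]) -> bool:
    """True if no parent index matches the index of a file entry."""
    files = set(file_indices)
    return all(parent not in files for parent in parent_indices)
```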
- 0xFFFFFFFF is -1 in signed two’s-complement form.
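In Python, the struct format character decides whether these bytes read as signed or unsigned. Lowercase i is a signed 32-bit integer and uppercase I is unsigned. The hand-written to_signed32 helper (a hypothetical name) performs the same two’s-complement conversion explicitly.

```python
import struct

raw = b"\xff\xff\xff\xff"
as_signed = struct.unpack("<i", raw)[0]    # -1
as_unsigned = struct.unpack("<I", raw)[0]  # 4294967295

def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as two's-complement signed."""
    # If the top bit is set, the value represents a negative number
    return value - (1 << 32) if value & 0x80000000 else value
```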
Strings
This section is very simple. It is just a large chunk of null-terminated strings, which the previous chunk refers to. The strings are the names of every file and folder in the archive. The first string is always “MGF ” and the second is always a backslash, which represents the root directory of every archive.
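Splitting this chunk is straightforward. The sketch below returns each string together with its offset from the start of the chunk, and name_at reads a single name at a given offset. Both function names are hypothetical. Two assumptions are worth flagging. First, the name offsets in the directory relationship table are assumed to be relative to the start of this chunk, though they could instead be absolute file offsets. Second, the names are assumed to be plain ASCII. The example bytes in the test are made up, apart from the “MGF ” and backslash entries described above.

```python
from typing import List, Tuple

def split_strings(chunk: bytes) -> List[Tuple[int, str]]:
    """Split a chunk of null-terminated strings into (offset, text) pairs,
    with offsets relative to the start of the chunk."""
    strings = []
    pos = 0
    while pos < len(chunk):
        end = chunk.find(b"\x00", pos)
        if end == -1:  # tolerate a missing final terminator
            end = len(chunk)
        # ASSUMPTION: names are plain ASCII
        strings.append((pos, chunk[pos:end].decode("ascii", errors="replace")))
        pos = end + 1
    return strings

def name_at(chunk: bytes, offset: int) -> str:
    """Read one null-terminated name starting at `offset`.
    ASSUMPTION: name offsets are relative to the start of the strings chunk."""
    end = chunk.find(b"\x00", offset)
    if end == -1:
        end = len(chunk)
    return chunk[offset:end].decode("ascii", errors="replace")
```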
File data
The final chunk of the MGF file contains the actual data for every file in the archive, which the entries in the file entry table point to. The archives contain many types of files, identifiable by the extensions in their file names. Plenty of them are plain text files, though there are still many binary files for assets such as textures, vertex buffers, level data, and so on.
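Extracting a file then comes down to slicing the decompressed archive using the offset and length from the file’s entry. The sketch assumes the data offset is measured from the start of the decompressed archive, and extract_file is a hypothetical name. If an entry points past the end of the data, the function raises an error rather than returning a truncated file.

```python
def extract_file(archive: bytes, data_offset: int, length: int) -> bytes:
    """Slice one file's bytes out of a decompressed MGF archive.
    ASSUMPTION: data_offset is measured from the start of the archive."""
    end = data_offset + length
    if data_offset < 0 or length < 0 or end > len(archive):
        raise ValueError("file entry points outside the archive")
    return archive[data_offset:end]
```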
With the MGF file format (mostly) deconstructed, it will now be much easier to inspect the individual files stored within them to better understand the game’s engine and how it uses these assets. In future blog posts, I will explore more of the files in depth and add the ability to preview the assets in MGF Explorer.