Deconstructing MechAssault

This blog accompanies one of my personal projects: MGF Explorer. I started this project back in 2018 but have only gotten round to writing a blog for it now (in 2020), so most of the early posts are based on development from over a year ago. Deconstructing MechAssault is a long-term, ongoing project that I will chip away at when I have free time. If I figure out something significant or just feel like talking about it, I’ll post something here!

Deconstructing MechAssault – The MGF file

Like most games, MechAssault uses large resource archives to store all of the assets used for a certain level in the game. Generally speaking, these are catalog files which simply act as containers for several other files. Exploring MechAssault’s root directory, there are plenty of large files with the .mgf extension and file names which match the names for each “level” in the game: multiplayer maps, campaign missions, and levels used for in-game cinematics. There are a few others too: text.mgf, movies.mgf, and common.mgf. These MGF files store mostly the same data between them, as most assets are re-used throughout all levels in the game, though there are still some unique assets per level.

In this blog post, I will discuss how I deconstructed the MGF file format and reveal all the information I have so far. Please note that I don’t know everything about the format, so some information may be incomplete. This is an ongoing project so I will inevitably learn more about the format over time.

Decompressing the files

Out of the box, most of the MGF files (excluding text.mgf and movies.mgf) are compressed, making it impossible to read or decipher anything using a hex editor. Compression works by re-encoding the data so that repeated patterns take up fewer bytes, which saves disk space but leaves the data we’re interested in unrecognisable. Fortunately, these files are compressed with zlib, a free and very widely used compression library. Because it’s so common, there are existing tools that can decompress it, which is where this journey starts.

To decompress the files, there’s a great command-line tool made by Luigi Auriemma called offzip. Luigi has a lot of great tools for these types of reverse engineering projects – if you’re interested in reverse engineering games yourself (especially older ones), you’ll probably find something of use on his website. The following command and arguments work best for this tool: offzip.exe -a -1 filename.mgf, where -a extracts all the decompressed data, and -1 ensures only one file is output. Using the following batch script is a handy way of decompressing all the files at once:

for %%i in (*.mgf) do offzip.exe -a -1 %%i decompressed\%%i
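For anyone who would rather not depend on an external tool, the same idea can be sketched in Python with the standard zlib module. This is only a rough illustration of scanning a file for zlib streams, not a description of how offzip works internally. The function name find_zlib_streams is made up for this example, it assumes the streams use the common 0x78 header byte, and the scan is deliberately naive, so it will be slow on large files.

```python
import zlib


def find_zlib_streams(data: bytes):
    """Yield (offset, decompressed_bytes) for each complete zlib stream in data."""
    pos = 0
    while pos < len(data) - 1:
        # A zlib stream starts with two bytes, CMF and FLG. CMF is 0x78 for
        # deflate with a 32 KiB window, and (CMF * 256 + FLG) is a multiple of 31.
        looks_like_header = (
            data[pos] == 0x78 and ((data[pos] << 8) | data[pos + 1]) % 31 == 0
        )
        if looks_like_header:
            decompressor = zlib.decompressobj()
            try:
                output = decompressor.decompress(data[pos:])
            except zlib.error:
                output = None  # false positive: not a valid stream
            if output is not None and decompressor.eof:
                yield pos, output
                # Continue scanning right after the end of this stream.
                pos = len(data) - len(decompressor.unused_data)
                continue
        pos += 1


if __name__ == "__main__":
    # Synthetic data standing in for a compressed file.
    sample = b"junk" + zlib.compress(b"hello" * 10) + b"end"
    for offset, payload in find_zlib_streams(sample):
        print(offset, len(payload))
```

In practice the decompressed payloads would be written out to a new file, much like the batch script above writes them into the decompressed folder.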

With the files decompressed, investigation can begin!

Investigating the MGF file

Breaking down the file structure is essentially a long, ongoing task of inspecting the file with a hex editor and looking for patterns, testing values, and overall a lot of trial and error. Without any documentation, this task can be very hard, comparable to finding a needle in a haystack. Throughout this blog post, I will note some useful tips for those who are interested in getting started with projects like this one.

Overview of the MGF file structure

First and foremost, MGF files are binary files, which means all the data is encoded in raw binary format, not human-friendly text. If they were text files, this job would be an awful lot easier, but text files are less efficient for computers to load – binary data can be copied straight into memory in the layout the program expects, whereas text needs to be parsed first, which can take a lot more time. Some files stored within the archive are text files, but I will explore these in later posts.

Another point worth mentioning is that data is stored in little-endian format. MechAssault was an original Xbox game, meaning it ran on Xbox hardware – an Intel x86 CPU. Intel x86 architecture reads data in little-endian format, which means no byte swapping is necessary. Read more about byte endian-ness here.
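To make the byte order concrete, here is a small Python illustration using the standard struct module. The four bytes are an example I picked (they happen to match the value 64 that appears in the header later on), not something lifted from a specific file.

```python
import struct

# Four bytes as they would appear in a hex editor: 40 00 00 00
raw = bytes.fromhex("40 00 00 00")

# "<I" reads a little-endian unsigned 32-bit integer,
# ">I" reads the same bytes as big-endian.
(little,) = struct.unpack("<I", raw)
(big,) = struct.unpack(">I", raw)

print(little)  # 64
print(big)     # 1073741824
```

Read as little-endian the value is a sensible small number. Read the wrong way round it becomes enormous, which is usually a good hint that the byte order guess is wrong.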

MGF files are composed of 5 main chunks:

  • Header – Stores offsets and lengths of the following sections
  • File entry table – Stores information about each file in the archive
  • Directory relationship table – Stores information about each folder in the archive, and which files belong in which folders
  • Strings – Null-terminated strings for file and folder names
  • File data – Stores all file data for every file in the archive

Figure: Diagram of an MGF file’s structure

Header

Binary files often start with a small chunk known as a header (a chunk is just a block of data for a specific purpose) which provides information about locations of data in the file and how said data may be formatted. This is usually in the form of offsets (pointers to locations in the file) and sizes (length in bytes), which help the program reading the file find the data it needs. Because binary files are not human-friendly, commonly used formats are usually accompanied by documentation that explains how and where data is stored, for example the Microsoft WAVE soundfile format or the formats described on Paul Bourke’s data formats pages. Because MechAssault uses a proprietary game engine, there is no public documentation on the MGF file format.

MGF files are no different – they start with a header that is always 64 bytes long. Below is a table describing the MGF file header:

| Offset | Length and data type | Description |
| --- | --- | --- |
| 0 | 4 bytes, char* | MGF file signature – always reads “mgf ” (including the space at the end). |
| 4 | 1 byte, char | Version – always 02 for MechAssault 1 archives, always 04 for MechAssault 2 archives. |
| 5 | 1 byte, char | Unknown – always 01 |
| 6 | 2 bytes, char* | Unknown – always reads “ZZ” |
| 8 | 4 bytes, int | Padding – always 0 |
| 12 | 4 bytes, unsigned int | Number of files in the archive |
| 16 | 4 bytes, unsigned int | Length of file entry chunk in bytes |
| 20 | 4 bytes, unsigned int | Offset of file entry chunk (always 64 because the file entry chunk starts immediately after the header) |
| 24 | 4 bytes, unsigned int | Number of directories (files and folders) in the archive |
| 28 | 4 bytes, unsigned int | Length of directory relationship chunk in bytes |
| 32 | 4 bytes, unsigned int | Offset of directory relationship chunk |
| 36 | 4 bytes, unsigned int | Length of directory strings chunk |
| 40 | 4 bytes, unsigned int | Offset of directory strings chunk |
| 44 | 20 bytes | Padding – all zeros |
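As a rough sketch, the header layout above maps quite directly onto a Python struct format string. The class and field names here (MgfHeader, unknown_5 and so on) are my own labels for this example, and treating the version as a raw byte value is an assumption.

```python
import struct
from typing import NamedTuple

# "<" = little-endian with no alignment padding.
# 4s signature, B version, B unknown, 2s "ZZ", I padding,
# 8 x I for the counts/lengths/offsets, 20x for the trailing padding.
HEADER = struct.Struct("<4sBB2sI8I20x")


class MgfHeader(NamedTuple):
    signature: bytes
    version: int
    unknown_5: int
    unknown_6: bytes
    padding: int
    file_count: int
    file_entries_length: int
    file_entries_offset: int
    directory_count: int
    directory_length: int
    directory_offset: int
    strings_length: int
    strings_offset: int


def parse_header(data: bytes) -> MgfHeader:
    """Unpack the 64-byte header found at the start of a decompressed MGF file."""
    if len(data) < HEADER.size:
        raise ValueError("file is too short to contain an MGF header")
    return MgfHeader(*HEADER.unpack_from(data, 0))


if __name__ == "__main__":
    # Synthetic header with made-up values, just to show the round trip.
    fake = HEADER.pack(b"mgf ", 4, 1, b"ZZ", 0, 10, 320, 64, 15, 360, 384, 200, 744)
    print(HEADER.size)
    print(parse_header(fake))
```

With a real archive, the bytes would come from something like open(path, "rb").read() on a decompressed file.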

File entries

Immediately following the header is a list of fixed-size structures which provide information about each file in the archive. The number of entries in the list is stored in the header at offset 12. The size of each structure can be calculated by dividing the length of the file entry chunk (offset 16 in the header) by the number of files; there is also a noticeable pattern in the data itself which repeats at that interval. In MechAssault 2 archives each entry is 32 bytes long. In MechAssault 1 archives each entry is 28 bytes long and the fields are rearranged.

Below is a table describing the file entry structure in a MechAssault 2 archive:

| Offset | Length and data type | Description |
| --- | --- | --- |
| 0 | 4 bytes, unsigned int | File index – increments for each file entry, though there are gaps |
| 4 | 8 bytes, unsigned long long | 64-bit UUID |
| 12 | 4 bytes, unsigned int | File length |
| 16 | 4 bytes, unsigned int | File length (again) |
| 20 | 4 bytes, time_t | UNIX timestamp – last modified date |
| 24 | 4 bytes, unsigned int | Offset to the file’s data |
| 28 | 4 bytes | Unknown – probably padding, always 0x00F71200. Only in MA2 archives. |
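The entry table can be read in much the same way as the header. The sketch below only covers the 32-byte MechAssault 2 layout from the table above. The names FileEntry, uid and length_again are labels invented for this example, and it derives the entry size by dividing the chunk length by the file count, as described earlier.

```python
import struct
from datetime import datetime, timezone
from typing import List, NamedTuple

# index (I), 64-bit id (Q), then five 32-bit fields: 4 + 8 + 20 = 32 bytes.
FILE_ENTRY = struct.Struct("<IQ5I")


class FileEntry(NamedTuple):
    index: int
    uid: int
    length: int
    length_again: int
    timestamp: int
    data_offset: int
    unknown_28: int

    @property
    def modified(self) -> datetime:
        # Interpret the UNIX timestamp as UTC.
        return datetime.fromtimestamp(self.timestamp, tz=timezone.utc)


def parse_file_entries(
    data: bytes, table_offset: int, table_length: int, file_count: int
) -> List[FileEntry]:
    """Read file_count entries starting at table_offset (MechAssault 2 layout only)."""
    if file_count == 0:
        return []
    entry_size = table_length // file_count
    if entry_size != FILE_ENTRY.size:
        raise ValueError(f"unexpected entry size {entry_size}, expected {FILE_ENTRY.size}")
    return [
        FileEntry(*FILE_ENTRY.unpack_from(data, table_offset + i * entry_size))
        for i in range(file_count)
    ]
```

The three arguments after data correspond to the header fields at offsets 20, 16 and 12 respectively. A 28-byte entry size would indicate a MechAssault 1 archive, which this sketch rejects rather than guessing at the rearranged fields.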

The most difficult piece of data to identify was the UUID. These values look completely random, and the only way to confirm what they were was to compare file entries across different MGF files, as many files are reused across different archives. The 8 bytes turned out to be identical for entries describing the exact same file, which is how I determined that they are unique identifiers.

Tips:

  • When looking for offsets, first of all make sure that the value is less than the total size of the file (seems obvious) and greater than the offset where the suspected value is stored (rarely are offsets going to point backwards in the file). Then, use your hex editor’s “go to” function to jump to the offset. Inspect the data – is there a sudden change in the pattern? If there is, you’ve probably got an offset.
  • When looking for lengths stored near known offsets, simply add the suspected length value to the offset and use “go to” again – another change in the data’s pattern? You may have found the end of that block of data.
  • 4-byte integers are very common for simple data fields in binary files, even if the values stored in them never use all 4 of those bytes. This is probably because 4-byte integers fit snugly into most modern CPU registers. If you’re finding chunks of data scattered evenly with groups of 1, 2 or 3 zero bytes in your hex editor, it’s probably a bunch of 4-byte integers.
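The first tip can be partly automated. The following is a small sketch of that check, with a made-up function name (candidate_offsets); it assumes the values are 4-byte little-endian integers aligned to 4 bytes, and it only narrows down where to look, since the results still need inspecting by hand.

```python
import struct
from typing import List, Optional, Tuple


def candidate_offsets(
    data: bytes, start: int = 0, end: Optional[int] = None
) -> List[Tuple[int, int]]:
    """Return (position, value) pairs where value could plausibly be an offset.

    A value qualifies if it is smaller than the file size and points
    forwards, past the position where the value itself is stored.
    """
    end = len(data) if end is None else end
    hits = []
    for pos in range(start, end - 3, 4):
        (value,) = struct.unpack_from("<I", data, pos)
        if pos < value < len(data):
            hits.append((pos, value))
    return hits


if __name__ == "__main__":
    # Synthetic 32-byte blob: only the value 12 stored at position 4 qualifies.
    blob = struct.pack("<4I", 0, 12, 500, 8) + bytes(16)
    print(candidate_offsets(blob))
```

Restricting start and end to a suspected header region keeps the list of candidates short.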

Directory relationship table

After the file entries, the next chunk of data is a list of structures that describe the hierarchical directory structure of the archive, including definitions of folders and files, as well as indexes which point to each directory’s parent. There are also offsets which point to strings stored in the following chunk, identifying the directory names. These structures are 24 bytes long, composed of six 32-bit integers. I know, “directory relationship table” isn’t a great name, but I can’t think of anything else.

Below is a table describing an entry in the directory relationship table:

| Offset | Length and data type | Description |
| --- | --- | --- |
| 0 | 4 bytes, int | Unknown |
| 4 | 4 bytes, int | Parent index |
| 8 | 4 bytes, int | Unknown |
| 12 | 4 bytes, int | Unknown |
| 16 | 4 bytes, int | Offset to file/folder name |
| 20 | 4 bytes, int | Unknown |

This section remains the most mysterious part of the MGF file so far, as I have yet to understand what some of the fields are. Earlier I mentioned there were gaps in the index fields of the file entries. It is here where these gaps are explained – the missing indexes belong to folders, and the file entry table does not store folder entries. Each entry here stores the index of its parent. I deduced that these were parent indices because the value never refers to an index that can be found in the list of file entries; a parent is always a folder.

Although I do not understand some of the fields, most of them take values that are specific to either files or folders. For example, the field at offset 20 is always -1 for files.
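Even with several fields unidentified, the entries can still be unpacked so that the known fields are easy to inspect. This is a minimal sketch: the names DirEntry and unknown_N are placeholders of mine, and looks_like_file is only a heuristic built on the observation about offset 20, since I cannot rule out folders that also store -1 there.

```python
import struct
from typing import List, NamedTuple

# Six signed 32-bit little-endian integers = 24 bytes per entry.
DIR_ENTRY = struct.Struct("<6i")


class DirEntry(NamedTuple):
    unknown_0: int
    parent_index: int
    unknown_8: int
    unknown_12: int
    name_offset: int
    unknown_20: int  # observed to be -1 for files


def parse_dir_entries(data: bytes, table_offset: int, count: int) -> List[DirEntry]:
    """Read count directory entries starting at table_offset."""
    return [
        DirEntry(*DIR_ENTRY.unpack_from(data, table_offset + i * DIR_ENTRY.size))
        for i in range(count)
    ]


def looks_like_file(entry: DirEntry) -> bool:
    # Heuristic only: files always have -1 at offset 20.
    return entry.unknown_20 == -1
```

The table_offset and count arguments would come from the header fields at offsets 32 and 24.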

Tips:

  • 0xFFFFFFFF is -1 when read as a signed 32-bit integer in two’s-complement form.
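If a hex editor or script only shows the unsigned value, the signed interpretation is easy to recover. A tiny helper, named to_signed32 just for this example:

```python
def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as signed two's complement."""
    # If the top bit is set, the value is negative: subtract 2**32.
    return value - 0x100000000 if value & 0x80000000 else value


print(to_signed32(0xFFFFFFFF))  # -1
print(to_signed32(0x7FFFFFFF))  # 2147483647
```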

Strings

This section is very simple – it is just a large chunk full of null-terminated strings, referred to by the previous chunk. Together, the strings in this section give the name of every file and folder in the archive. The first string is always “MGF ” and the second is always a backslash, which is the root directory of every archive.
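Reading one of these names is a matter of starting at the given offset and stopping at the next zero byte. The sketch below assumes the name offsets are relative to the start of the strings chunk and that the names are plain ASCII; both are assumptions on my part, and read_cstring is a name I chose for the example.

```python
def read_cstring(chunk: bytes, offset: int, encoding: str = "ascii") -> str:
    """Return the null-terminated string that starts at offset within chunk."""
    end = chunk.find(b"\x00", offset)
    if end == -1:
        # No terminator found: read to the end of the chunk.
        end = len(chunk)
    return chunk[offset:end].decode(encoding, errors="replace")


if __name__ == "__main__":
    # Synthetic strings chunk: "MGF ", the root backslash, then a made-up name.
    strings = b"MGF \x00\\\x00textures\x00"
    print(repr(read_cstring(strings, 0)))
    print(repr(read_cstring(strings, 5)))
    print(repr(read_cstring(strings, 7)))
```

If the offsets turn out to be absolute file positions instead, passing the whole file as chunk works just as well.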

File data

The final chunk of the MGF file format contains all of the actual file data for every file in the archive, pointed to by the file entries in the file entry table chunk. There are many types of files contained in the archives, identifiable with the extensions that can be found in the relevant file names. Plenty of them are plain text files, though there are still many binary files for assets such as textures, vertex buffers, level data, and so on.
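Pulling a single file out of the archive then comes down to slicing the right range of bytes. A minimal sketch, assuming the offset stored in a file entry is an absolute position in the decompressed MGF file (extract_file is a hypothetical helper name):

```python
def extract_file(archive: bytes, data_offset: int, length: int) -> bytes:
    """Return the bytes of one stored file, given its offset and length."""
    if data_offset < 0 or length < 0 or data_offset + length > len(archive):
        raise ValueError("file entry points outside the archive")
    return archive[data_offset:data_offset + length]


if __name__ == "__main__":
    # Synthetic archive: 64 bytes of header space followed by two tiny "files".
    archive = bytes(64) + b"first file" + b"second"
    print(extract_file(archive, 64, 10))
    print(extract_file(archive, 74, 6))
```

The bounds check is there because a wrong guess about a field tends to show up as an offset or length that runs past the end of the file.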

Conclusion

With the MGF file format (mostly) deconstructed, it will now be much easier to inspect the individual files stored within them to better understand the game’s engine and how it uses these assets. In future blog posts, I will explore more of the files in depth and add the ability to preview the assets in MGF Explorer.