Gang Garrison 2 Rust Rewrite

A map from Gang Garrison 2 with a wireframe overlay on all the surfaces.
Created:
  • gg2
  • gang garrison 2
  • blog
  • devlog
  • rust lang
  • rust language
  • rust
  • programming

Gang Garrison 2 (GG2) is a demake of the game Team Fortress 2 (TF2). It was originally developed as a entry for 2008 TIGSource "Bootleg Demake" jam but continued development through open-source.

I consider myself a die-hard TF2 player. With a grand total of 72 minutes in the game, I know a thing or two. In addition, I've never played GG2 properly. So you might ask why would I take notice of this game? Well just like everything else I've done in my life "I was bored."

One of my friends Alec is a much more real die-hard TF2/GG2 fan and I've heard my fair share about both games from him. Over a year ago, he started development on what would become his Python rewrite of the server. He did this largely because the game lacked a dedicated server. Unlike an integrated server that runs inside of a client, a dedicated server is an independent program that could be ran for instance in a terminal.

Some important context on GG2 is that it was built on GameMaker 8. This is a massively outdated game engine released in late 2009. GameMaker isn't regarded as the most portable game engine ever. It mostly is only good for Windows development.1 The single biggest reason why game can't be ported to platforms such as Linux is that it's stuck on this old game engine. This isn't just a portability concern either. GameMaker 8 isn't capable of running without graphics (i.e a dedicated server) and has performance issues with large amounts of players.

Which Game Engine Is Right

Your first thought might be Unity, Unreal, or even Godot, but each of these have incompatibilities. Unity—and Unreal frankly—suffers from the age ol' problem of capitalism. I frankly don't trust Unity in the long term to make healthy decisions for developers and gamers. So Godot wins by default, right? Well, no. I do love Godot, but it's meant for making Godot games and not GameMaker games. GG2 and GameMaker have a very specific way of doing networking, physics, and much more. It would be hard for any of these three game engines to act like a completely different one. We will want a much lower level game engine.

If the main big game engines don't work then why not make it from scratch. Since I fairly comfortable with Rust, I already knew that I would probably be using it. We could create a Rust game with Vulkan and have full control over the entire stack. Trouble is: this would be a huge under taking; there's a reason game engines exist. A balance between complete control and development time was apparent.

Being in the Rust ecosystem has made me aware—and even try out—Bevy the game engine. It's a very rustic approach to a game engine and features an entity component system (ECS). The important aspect of Bevy is that it offers a high level of modularity compared to most other game engines. Bevy at its heart is just an ECS, but offers many plugins that can be added to bring it up to a full blown game engine. With me having prior experience with it and it offering the right level of control, I felt that this more than warranted its use.

The Networking Stack

GG2's client is nothing without being able to talk to a server. We should start with the networking just to make sure any of this is fusible. Firstly though, who does the client and server even communicate and what do they even say? I read though both Alec's server and the official code base to answer this very question. Alec himself also helped me along in our frequent chats.

The underlying way the client and server talk is through a TCP WebSocket. To put it simply: it's a two way pipe for raw bytes to be sent back and forth. TCP isn't favored by games because of its slow speeds, but for what the game does it should be fine.2 Anyways, like many of the less favorable decisions I've been forced to make, GG2 only supports this method.

The server and client send packets of data that we'll call messages. A message is simply an 8-bit number defining the message kind and then a payload of dynamic length. We'll cover some examples later.

Bevy offers a high amount of modularity compared to other game engines. At its core, Bevy is really just an ECS. This means that textures, scheduling, physics, and more are all added with plugins instead of being included with the core of the engine. So I'll need to either find a WebSocket plugin for Bevy or make my own. Using an already made crate would be preferable.

Spicy Networking for Bevy looks like a perfect fit at first glance with it using TCP and meant for Bevy. After looking at it for a little longer, issues arise. Firstly, it hasn't been updated in 3 years. Secondly, it provides its own solution for determining message kinds. It's messages contain the length of the entire packet, a string of the message kind, and a payload. Even with the crate utilizing a different message structure, it's still a good starting place.

Spicy Networking makes a good blueprint for how to do networking in bevy. So we can start by reimplementing parts of it. First we'll need the network client; not to be confused the game client, it's where all the packets of data will be sent and received. The reason it has to be separate from the game client is since it's asynchronous and is always running in the background. This requires a runtime which Bevy doesn't provide. So whenever the client wants to connect to a sever, it whirs up the network client. This spawns a Tokio runtime which splits into two worker threads. One packages messages from Bevy to be sent off to the server and the other receives messages and passes them off to Bevy.

After frankensteining the code together to work both with a modern version of Bevy and with GG2's message format, we can start implementing some messages and send them off to the server. The best place to start is the player joining handshake since it's the first thing the server and client do.

Player Joining

The goal is to establish a network connection with a server and have a player join as a spectator.

A infographic showing the relation of each server and client message. It shows a client hello message is sent to the server. The server responds with it's own Hello. Then the client responds with a Reserve Slot. The server replies back with the same message. The client then sends a Player Join. The server now reponds with serveral messages: Change Map, Join Update, Player Change Class, Player Change Team, Full Update, Message String.

The handshake looks something like the above. Once connected to the socket, the client sends HELLO. The server responds with some basic information. The message kind get inserted/collected before all packets are serialized/deserialized. So only the payload needs to be implemented for each message. Right now we only need to worry about serializing client packets and deserializing server packets since this is just a client. There is the possibility of implementing the other half of the networking stack, but this is out of the projects scope for now.

Client -> Server:

struct ClientHello {
    pub protocol: Uuid,
}

impl GGMessage for ClientHello {
    const KIND: PacketKind = PacketKind::Hello;

    // 0x00, [128-bit protocol UUID]
    fn serialize(self, buffer: &mut Vec<u8>) -> Result<()> {
        let protocol_bytes = self.protocol.into_bytes();
        buffer.extend(protocol_bytes.iter());
        Ok(())
    }
}

Server -> Client:

struct ServerHello {
    pub server_name: String,
    pub map_name: String,
    pub map_md5: Option<u128>,
    pub plugins: Vec<()>,
}

impl GGMessage for ServerHello {
    const KIND: PacketKind = PacketKind::Hello;

    // 0x00,
    // [8-bit string length], [server name],
    // [8-bit string length], [map MD5 string],
    // [8-bit number of plugins],
    // [16-bit string length], [plugins string]
    fn deserialize<I: Iterator<Item = u8>>(payload: &mut I) -> Result<Self> {
        let server_name = payload.read_utf8_short_string()?;
        let map_name = payload.read_utf8_short_string()?;

        let map_md5 = payload.read_md5()?;

        let _plugins_amounts = payload.next().ok_or(Error::UnexpectedEOF)?;
        let _plugins_raw = payload.read_utf8_long_string()?;

        Ok(Self {
            server_name,
            map_name,
            map_md5,
            plugins: Vec::new(), // Plugins not implemented yet.
        })
    }
}

Once we get the server information from the HELLO, we send a RESERVE_SLOT. This just contains the player name. The server responds with an empty message of the same type. This tells the client that it's time to send an empty PLAYER_JOIN message.

At this point the player is connected to the server, but we don't know what's going on. The server sends back many packets from here on out. If we are just spectating, then there's no need to send anything else. The server will send back messages such as CHANGE_MAP, JOIN_UPDATE, PLAYER_CHANGE_CLASS, PLAYER_CHANGE_TEAM, FULL_UPDATE, and lastly MESSAGE STRING. These all tell the client the full current state of the server. After this, the server sends infrequent messages to keep the client state in sync with the server's. We will look at these later.

Map Loading

We're given a lot of information about what is going on with the server, but it's all useless without actually showing. This is why I decided to start focusing on map loading.

Maps in GG2 are images with extra metadata describing wall collisions and more. It's all stored within a PNG making maps very portable.3 Loading an image is very simple in Bevy, but the metadata no so much.

Bevy has assets, such as images and models, which can be loaded with the asset server. You can create custom assets and loaders to suit your needs. I opted to go this direction since it will plays nicely with the Bevy ecosystem. Essentially, Bevy gives a async buffer of the raw asset and it's your job to deserialize it.

A PNG is made up of a bunch of chunks that contain different information such as image data, timestamps, and more. The one we care about is zTXt. 4 It's described as storing compressed text. It has a byte the defines which compression method is used. The only one provided by the PNG spec so far is a zlib deflate datastream.

This is were I would say I found a crate that did all of this for me. Which yes there is one, but I noticed this only after implementing my own PNG parser. There is a chance that my is faster given that it's hyper-focused on only one thing, but that has yet to be tested. Let's just say it's 500% better in some metric so I don;t have to ever touch that thing again.

The first thing we need to do is confirm the file is even a PNG at all. This can be done by checking the file's signature.

b"\x89PNG\x0d\x0a\x1a\x0a"

These bytes at the beginning of the file signify that it's a PNG. Once we confirm these are present, we move on to the chunks.

A chunk is made up of a 32-bit length, 32-bit chunk type, chunk data, and a 32-bit cyclic redundancy check (CRC). The length is of the chunk data; the chunk type is a 4 character ASCII string; the chunk data holds… well… the data of the chunk; the CRC is a for error-detection and we'll get into that later.

Since we only care about the zTXt chunk, we can skip over any chunk with that type. This saves having to read the entire chunk which can be big. Once we stumble upon the chunk, we yank out the chunk data and parse it properly.

The data found in a zTXt chunk is a keyword (null-terminated string of text), a byte determining the compression method, and lastly the compressed text. GG2 uses a hardcoded keyword which is Gang Garrison 2 Level Data. So we check for the keyword and then deal with the text.

The compressed text is deflated and stored in a zlib datastream. This would be very simple if it wasn't for me not reading the full paragraph about the zlib part. I just saw the deflate part and started writing away. After literal hours of trouble shooting, I switch to a different function provided by the same crate I was already using and it worked magically. Please read documentation fully if you don't want to end up like me.

After decompressing the text with the right algorithm, it's time to validate the CRC. This is a check to confirm none of that data has been altered. While we probably could get away without checking this, the crate I'm already using makes it very easy to calculate the value. Once validated, it's time to move onto understanding the text we just decompressed.

Map Entity Data

Let's print out the map data we got from the PNG and see what it looks like.

A large block of map data printed out in a terminal.

The data contains multiple lines of text. Starting with {ENTITIES}, then with a JSON-like format, another tag {END ENTITIES}, {WALKMASK}, 2 numbers, kilobytes of "random" text, and lastly a {END WALKMASK}. It's fairly safe to assume that the entities will always come before the walk mask since that's how GG2 does it, but we can do better. Unlike possible python-based servers, we can deal with entities and walk mask in which ever order the map provides them.

First let's tackle the entity data. This is a big list entities which describe points-of-interest around the map such as control points, spawn rooms, and more. This JSON-like format is actually called Gang Garrison Object Notation (GGON). As the name implies, it's custom built and there isn't already a crate to parse it. This is the part where we do some trickery.

We are currently dealing with data that is formatted something like:

[{type:number,number:1.23},{type:string,text:hi there}]

Months back, Alec and I were searching for a library/format to parse GGON with only minimal changes to the data. He ended up finding Hjson which will serve us nicely. Here is the same data in Hjson.

[
    {
        type: number
        number: 1.23
    }
    {
        type: string
        text: hi there
    }
]

The important differences to note are the new lines after every field and no commas. This can be fixed by replacing all , with a new line (\n) and all } with \n}.

[{type:number
number:1.23
}
{type:string,
text:hi there
}]

While this is different from the first example, the deserializer doesn't complain and loads the data just fine. So it's safe to continue.

The crate I picked is serde-based. This means that it gets all the advanced derive macros that serde offers. If we wanted to create type-safe Rust type capable of (de)serializing the example above, it would look something like:

// `PartialEq` and `Eq` are just for the `assert_eq`
#[derive(Deserialize, Serialize, PartialEq, Eq)]
#[serde(tag = "type", rename_all = "snake_case")]
enum MapEntity {
    Number {
        number: f32,
    },
    String {
        text: String,
    },
}

fn main() {
    let ggon = "[{type:number,number:1.23},{type:string,text:hi there}]";

    // Convert the GGON to Hjson
    let hjson = ggon
        .replace(',', "\n")
        .replace('}', "\n}");

    let expected = vec![
        MapEntity::Number { number: 1.23 },
        MapEntity::String { text: "hi there".to_string() },
    ];

    let parsed = serde_hjson::from_str::<Vec<MapEntity>>(&hjson).unwrap();

    // These should be the exact same
    assert_eq!(expected, parsed);
}

What I like about serde is how data driven it is. Instead of writing the logic to read and write the data, you just describe the data and the rest is done through macros. Now all we need to do is describe every possible entity. Luckily, I found every single one inside GG2's GitHub repo.

An entity is stored as a type, position, and then some additional option fields. We can use composition to mix and match these optional fields. For instance everything has an x and y position, but only some store the scale. We can combine these to make a transform.

#[derive(Debug, Deserialize)]
struct Position {
    pub x: u32,
    pub y: u32,
}

#[derive(Debug, Deserialize)]
struct Scale {
    #[serde(rename = "xscale")]
    pub x_scale: f32,
    #[serde(rename = "yscale")]
    pub y_scale: f32,
}

#[derive(Debug, Deserialize)]
pub struct Transform {
    #[serde(flatten)]
    pub position: Position,
    #[serde(flatten)]
    pub scale: Scale,
}

#[derive(Debug, Deserialize)]
#[serde(tag = "type")]
struct MapEntity {
    #[serde(rename = "spawnroom")]
    SpawnRoom(Transform),
    #[serde(rename = "redspawn")]
    RedSpawn(Position),
    /* ... */
};

You may notice that some of the fields such as position have a flatten decorator above them. This is tells serde to copy all the fields from inside the struct into the parent. If you didn't add the decorator to the position and scale, it would look something like:

{
    type: spawnroom
    position: {
        x: 0
        y: 0
    }
    scale: {
        x_scale: 0.0
        y_scale: 0.0
    }
}

Compared to if you flatten them:

{
    type: spawnroom
    x: 0
    y: 0
    x_scale: 0.0
    y_scale: 0.0
}

We are now able to deserialize the entirety of the entities. But that's only a fraction of the map data. The walk mask is still left.

Map Walk Mask

Both the server and client need to know the bounds of the level. The server only sends over the position of the players every 7th frame. This means that for the rest of the time the client needs to apply physics to all players to approximate their position. This requires knowing where all the floors and walls are on the map.

In the official game, it looks up from a big table of boolean values that state whether each pixel is collidable. This takes up somewhere in the range of tens of kilobytes and requires multiple lookups to determine if the player is colliding at all. The problem arises from how this is a non-standard way to implement collisions, meaning we can't use a standard physics engine, right?

A large block of map data printed out in a terminal.

Before I get ahead of myself, we first need to actually parse the walk mask from the map data. Similar to the entities, we need to read the data in between the {WALKMASK} and {END WALKMASK}. The first two numbers are the height and width and then the last very long line of data is the actual walk mask.

The GG2 developers had an interesting problem to solve: how do you store a large amount of binary data as text? For simplicity sake, I would argue for base64. Each byte would store 8 tiles (a bit per tile). This is similar to their solution, but they went with base64 shifted so the space character is zero (aka. air). Every character represents 6-bits of the final data; there is no padding add to the end too. This isn't too troublesome to read through and convert into a list of boolean values.

Physics engines work with shape colliders—such as cuboids and triangles—that are at specific points in the world.

A Gang Garrison 2 map with each tile of wall covered with a triangulated quad.

We can take the tiles and treat each one as a quad. In turn, these each can be made into two triangles. The physics engine understands triangles. Now there are some huge performance issues with this approach. Enabling the debug render brings the game to it's knees as it tries to deal with 87,420 quads or 174,840 triangles! A better solution for generating the quads is needed.

What if we instead converted the tiles into bigger sections using a greedy mesher. This would combine many tiles into one quad.

A group of tiles representing walls and air. The wall tiles are each separated by a border.A group of tiles representing walls and air. The wall tiles are now joined together in as few number of rectangles.

The goal is to create an algorithm that turns the first image into the second image. This will give us a much smaller number of colliders to work with. My first approach was to generate all the quads like the first try, but slowly combine them. I kind of followed a guide, but frankly I didn't understand it very well and opted for a rewrite. Anyways here's what that first attempt spat out:

A Gang Garrison 2 map with each tile of wall covered with triangulated quads that are now combined horizontally.

This get's us down to 1060 quads. While this is a huge improvement, it's clear that it's missing many areas that can be combined. For instance in the top left, the quad could consume multiple rows down.

Since I didn't really understand that devlog I was following, I instead opted for my own solution. We don't care about speed; lowering the quad count was all that mattered. A simple but accurate mesher is the goal. This would start with a big list of boolean values and create quads from them. With each quad it generates, it sets the tile to false. This means it can only see tiles that haven't already been added to our mesh. It works from top left to bottom right until every tile is consumed.

A Gang Garrison 2 map with each tile of the wall covered with triangulated quads that now cover much larger areas.

Now we are down to 701 quads. It's no longer missing obvious optimizations, but it's not perfect. You can see though how some quads are broken up by others. This is most prominent with the hills. The left and right sides of the hills are broken up by the middle quad. To fix this we can only expand a quad down if there isn't any tiles occupying the space on the sides of the row below it.

A Gang Garrison 2 map with each tile of the wall covered with triangulated quads that now use the least number of quads to fill the area.

With the last optimization, we get down to 562 quads; a whopping 156x improvement since the first iteration.

Putting It All Together

We have the positions of the players, map data, and a physics engine. So we can now start trying to bring them together. Currently the physics engine isn't in parity with the server so for now the player won't have physics.

The Bevy client is on the left. It's spectating Alec's Python server. The official client is on the right.

Footnotes

  1. MacOS is also apparently supported but GG2 doesn't currently support it.
  2. UDP is much faster and would be better suited for video games. It can result in more complicated networking though.
  3. Some websites strip this data when you upload an image, so it's not perfect.
  4. https://www.w3.org/TR/png-3/#11zTXt