Scripting a Spelunky 2 Speedrun | Part 2: I'm (Not) a Robot

The first step in any tool-assisted speedrun is simulating inputs. But how do you even do that outside of an emulator?

To Library or Not to Library?

Before doing anything else I had to do some research: how was I going to simulate keyboard inputs and send them to the OS? I wanted to use Go for this, since I like the language, and it's also fairly easy to integrate with low-level APIs (if necessary). Initially I looked at various libraries to handle this for me, but most of them were either cross-platform (which I'm normally a big fan of, but since I only needed Windows support it mostly just complicated things), or they couldn't do quite what I wanted.

Enter Win32

After a bunch of research and testing code with different libraries, I decided to have a look at how they work under the hood. That is where I came across a Win32 API: SendInput.

Docs for the Win32 SendInput function. Should be simple enough, right?

At first I wanted to stay away from using the Win32 API directly, since I expected it to be fairly difficult and error-prone. But I figured the least I could do was write some code to try it out. I had never integrated with DLLs using Go, and the Windows docs only talk about C++, so figuring that out would be the first step. Luckily I came across this Medium post which does exactly that! So with the help of that I tried to figure it out step by step. Let's walk through the process.

The first step is to import the DLL, and get the SendInput procedure from it:

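import (
	"log"     // used by the snippets further down
	"syscall" // loads the DLL and resolves the procedure
	"unsafe"  // used by the snippets further down
)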
var (
	user32        = syscall.NewLazyDLL("user32.dll")
	sendInputProc = user32.NewProc("SendInput")
)
Loading the user32 DLL and the SendInput procedure.

So far, so good. Now we have a procedure, sendInputProc, which we should be able to call. But with what? Looking back at the docs, the function has 3 parameters: cInputs, pInputs, and cbSize. These are (respectively) the number of structures in the pInputs array, a pointer to an array of input structures, and the size in bytes of a single input structure. The docs also note that if this size is incorrect, the function fails. Promising stuff.

Sending Keyboard Inputs

The input structures required for the SendInput function.

Looking at the input structures, there doesn't seem to be anything too weird. Other than some C++ syntax and types I was unfamiliar with, the fields seemed simple enough. Luckily the documentation explains the structs fairly well. First we have an INPUT struct which contains an input type (mouse, keyboard, or hardware), and the actual input data which corresponds to the type. Let's first focus on keyboard input, since that's the most important. A KEYBDINPUT struct contains a couple of fields to specify a (virtual) key, and whether you're pressing or releasing the key.

Recreating the structures in Go gave me this:

type input struct {
	inputType uint32
	ki        keyboardInput
}

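type keyboardInput struct {
	wVk         uint16  // WORD: virtual key code
	wScan       uint16  // WORD: hardware scan code
	dwFlags     uint32  // DWORD: 0 for a plain key press
	time        uint32  // DWORD: timestamp, 0 lets the system fill it in
	dwExtraInfo uintptr // ULONG_PTR: pointer-sized, so 8 bytes on 64-bit Windows
}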
The keyboard input structs recreated.

Seems good (spoiler: it's not). Now let's call the SendInput procedure, log the results, and see if it actually works. Making a small function to press the enter key (virtual key code 0x0D) resulted in this:

func sendInput() {
    var i input
    i.inputType = 1 // Keyboard
    i.ki.wVk = 0x0D // Enter

    ret, _, err := sendInputProc.Call(
        1,
        uintptr(unsafe.Pointer(&i)),
        unsafe.Sizeof(i),
    )
    log.Printf("ret: %v error: %v", ret, err)
}
Calling the SendInput procedure.

It should return 1 (the number of inputs it processed), and of course no error. Aaand... ret: 0 error: The parameter is incorrect. Now that's a generic error message if I've ever seen one. It doesn't tell you which parameter is incorrect, or why. Luckily after a quick DuckDuckGo search I found a GitHub issue with some example code. Their code was very close to mine, with one small difference:

type input struct {
    inputType uint32
    ki        keyboardInput
    padding   uint64
}
A correct input struct for keyboard input, with 8 bytes of padding.

They added a padding field to the input struct! But why? Apparently SendInput expects the input struct to be exactly 40 bytes (on 64-bit Windows), and in my code it was only 32 bytes, so we need an extra 8 bytes (i.e. one uint64). This has to do with the way the C++ input structure works. Its data part is a union of three structs (MOUSEINPUT, KEYBDINPUT, and HARDWAREINPUT), and a union is always as large as its biggest member. That biggest member is MOUSEINPUT at 32 bytes, and the 4-byte type field plus 4 bytes of alignment padding in front of it brings the total to 40. The SendInput docs don't spell this number out anywhere, and I must've read through them a dozen times, so the example code definitely saved me!

Let's try calling SendInput again, this time with the correct struct. Now we get: ret: 1 error: Access is denied. Success! It processed the input (hence the 1), and a newline appeared in my editor. We still get an error though, but it turns out that's because calling a DLL procedure from Go always returns a non-nil error, built from whatever the last Windows error code happens to be. So we just need to check if the return value is 1, and if so we can ignore the error.

Sending Mouse Inputs

At this point I was really excited to have this working, and I figured it would also be cool to simulate mouse input. I could already see this evolving to do all sorts of automation tasks, although those pesky "I'm Not a Robot" checks might get in the way of that. Anyway, back to the docs:

The mouse input structure.

Again, this seemed fairly straightforward, and I created the following struct:

type mouseInput struct {
    dx          int64
    dy          int64
    mouseData   uint32
    dwFlags     uint32
    time        uint32
    dwExtraInfo uintptr
}
The mouse input struct recreated.

People who actually know C or C++ may have already spotted the mistake here, but I sure didn't, so let's continue first. At the time of writing, the Go language doesn't have a concept of unions or generics, so I also had to create a wrapping input struct specifically for mouse inputs:

type mInput struct {
    inputType uint32
    mi        mouseInput
}
The wrapping mouse input struct.

That should be all I need, so let's try sending some mouse input, for example to move the mouse horizontally and vertically:

func sendMouseInput() {
    var i mInput
    i.inputType = 0 // Mouse
    i.mi.dx = 200 // 200 pixels right
    i.mi.dy = 200 // 200 pixels down
    i.mi.dwFlags = 0x0001 // Move

    ret, _, err := sendInputProc.Call(
        1,
        uintptr(unsafe.Pointer(&i)),
        unsafe.Sizeof(i),
    )
    log.Printf("ret: %v error: %v", ret, err)
}
Calling the SendInput procedure with mouse input.

ret: 0 error: The parameter is incorrect. Uh oh, here we go again. This one took me a bit longer to figure out, but I'll spare you the details. Remember the Medium post I mentioned earlier? It also contains a really nice overview of which types in Go correspond to types in C:

type (
    BOOL          uint32
    BOOLEAN       byte
    BYTE          byte
    DWORD         uint32
    DWORD64       uint64
    HANDLE        uintptr
    HLOCAL        uintptr
    LARGE_INTEGER int64
    LONG          int32
    LPVOID        uintptr
    SIZE_T        uintptr
    UINT          uint32
    ULONG_PTR     uintptr
    ULONGLONG     uint64
    WORD          uint16
)
Windows API types with their corresponding Go types.

This is where I learned that a LONG in the Windows API is not the same as a long in Java or C# (i.e. a 64-bit integer). A Windows LONG is always 32 bits, even in a 64-bit program! My reaction was something along the lines of:

Apparently LONG is not long.

Anyway, I updated the struct, so that dx and dy are now an int32 instead of int64:

type mouseInput struct {
    dx          int32
    dy          int32
    mouseData   uint32
    dwFlags     uint32
    time        uint32
    dwExtraInfo uintptr
}
The correct mouse input struct.

And sure enough ret: 1 error: The operation completed successfully. Interesting error message, but it worked! My mouse moved 200 pixels to the right and down. For a moment I felt as though I had just discovered fire.

Satisfied with the results, I spent some time cleaning up the code, and created functions that allowed me to easily send input to the OS. If you want to have a look at (or use) the resulting code, you can check it out on GitHub.

Virtual Keys and Scan Codes

Now that the SendInput calls are working nicely, let's take a small step back and look at what we're actually trying to do: sending keystrokes to Spelunky 2. So far I had been testing keyboard input by sending virtual key inputs to my editor, but I didn't try it in game. I quickly wrote some code that waited a couple of seconds and pressed 0x43 (the "c" key, this throws a bomb in Spelunky). The initial wait time allowed me to run the code and then quickly switch my active window to the game. I ran it and... nothing happened. Maybe I need to press and release the key? Still nothing. Maybe I need to hold it for a bit to allow the game to detect it? I added a second wait between the press and release call, but to no avail. Running the code while my editor was active did put down a "c", so it was definitely doing something. Frustrated, I looked at all the options I had in the keyboard input struct, and then I saw it: wScan.

Testing the keyboard input was a source of many compile errors.

So far I had been sending virtual key codes using SendInput, but it also allows you to send scan codes. Virtual keys and scan codes both describe a key, but they do so in different ways, and programs can use one or the other. Games and game engines will generally use scan codes rather than virtual key codes to detect user input. Unfortunately for me, scan codes don't seem to have a default mapping from a character to a code in the same way that virtual key codes do. This meant I had to find out which scan codes Spelunky uses for which actions. This was easy enough to solve: I set up some code to loop over the scan codes 0 to 255 (I assumed the keys I needed would be somewhere in that range) and send them to the game one by one. To make sure they would be registered, I sent a press, waited for 100 milliseconds, and then sent a release. Fairly quickly I started to see some action! First a rope, then a jump, a whip and a bomb. By running a couple of different tests with slower timings I was quickly able to narrow down the scan codes used for all the actions:

enter      = 28
up         = 200
left       = 203
right      = 205
down       = 208
a (use)    = 30
d (rope)   = 32
z (jump)   = 44
x (attack) = 45
c (bomb)   = 46
The scan codes used by Spelunky 2.
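For SendInput to look at wScan instead of wVk, the KEYEVENTF_SCANCODE flag (0x0008) has to be set in dwFlags, and a release is marked by adding KEYEVENTF_KEYUP (0x0002). The sketch below shows how such events could be built. The flag values come from the Win32 docs, but the constant names, scanKey and sweep are hypothetical helpers of mine, and the code only constructs the structs without sending anything, so it runs on any OS.

```go
package main

import "fmt"

const (
	inputKeyboard     = 1      // INPUT_KEYBOARD
	keyeventfKeyUp    = 0x0002 // KEYEVENTF_KEYUP: the key is being released
	keyeventfScanCode = 0x0008 // KEYEVENTF_SCANCODE: use wScan, ignore wVk

	scanBomb = 46 // "c", taken from the table above
)

type keyboardInput struct {
	wVk         uint16
	wScan       uint16
	dwFlags     uint32
	time        uint32
	dwExtraInfo uintptr
}

type input struct {
	inputType uint32
	ki        keyboardInput
	padding   uint64
}

// scanKey builds a press or release event for a scan code.
func scanKey(code uint16, release bool) input {
	in := input{inputType: inputKeyboard}
	in.ki.wScan = code
	in.ki.dwFlags = keyeventfScanCode
	if release {
		in.ki.dwFlags |= keyeventfKeyUp
	}
	return in
}

// sweep returns a press followed by a release for every scan code in
// [first, last]. A real sweep would send them one at a time and wait
// between the press and the release so the game can register the key.
func sweep(first, last int) []input {
	var events []input
	for code := first; code <= last; code++ {
		events = append(events, scanKey(uint16(code), false), scanKey(uint16(code), true))
	}
	return events
}

func main() {
	press := scanKey(scanBomb, false)
	release := scanKey(scanBomb, true)
	fmt.Printf("press:   wScan=%d dwFlags=%#04x\n", press.ki.wScan, press.ki.dwFlags)
	fmt.Printf("release: wScan=%d dwFlags=%#04x\n", release.ki.wScan, release.ki.dwFlags)
	fmt.Println("events in a 0-255 sweep:", len(sweep(0, 255)))
}
```

Leaving wVk at zero is deliberate here: with the scan code flag set, the virtual key field is not used.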

Going back to my original test code, I modified it to press down, press c, wait a bit, and release both keys. Much to my satisfaction, this made my character duck and put down a bomb. I'd never been happier to see the game over screen!

The first successful test.

Next Up...

With the input simulation all working, it's time to look at how we're going to use this to actually script a full Spelunky 2 run. With the current setup it's still quite a bit of code to send a bunch of inputs, and ideally I'd like anyone who's interested in scripting runs to be able to do so (without programming knowledge). With that in mind, we're going to go through the process of designing and creating a custom scripting language for simulating inputs in the next post. See you there!