February 13, 2017

Windows Won't Let My Program Crash

It's been known for a while that Windows has a bad habit of eating your exceptions if you're inside a WinProc callback function. This behavior can cause all sorts of mayhem, like your program vanishing into thin air without any error message, because a stack overflow terminated the program without actually throwing an exception. What I didn't realize is that it also eats assert(), which makes debugging hell: the assertion would fire, the entire user callback would immediately terminate without any stack unwinding, and then Windows would just... keep going, even though the program is now in a laughably corrupt state, because only half the function executed.

While trying to find a way to fix this, I discovered that there are no fewer than 4 different ways Windows can choose to eat exceptions from your program. I had already told the kernel to stop eating my exceptions using the following code:
// Declarations for the undocumented exception policy API, which isn't in the SDK headers.
// EXCEPTION_SWALLOWING is the PROCESS_CALLBACK_FILTER_ENABLED bit (0x1).
const DWORD EXCEPTION_SWALLOWING = 0x1;
typedef BOOL (WINAPI *tGetPolicy)(LPDWORD lpFlags);
typedef BOOL (WINAPI *tSetPolicy)(DWORD dwFlags);
DWORD dwFlags;
HMODULE kernel32 = LoadLibraryA("kernel32.dll");
assert(kernel32 != 0);
tGetPolicy pGetPolicy = (tGetPolicy)GetProcAddress(kernel32, "GetProcessUserModeExceptionPolicy");
tSetPolicy pSetPolicy = (tSetPolicy)GetProcAddress(kernel32, "SetProcessUserModeExceptionPolicy");
if(pGetPolicy && pSetPolicy && pGetPolicy(&dwFlags))
  pSetPolicy(dwFlags & ~EXCEPTION_SWALLOWING); // Turn off the exception-swallowing filter
However, despite this, COM itself was wrapping an entire try {} catch {} statement around my program, so I had to figure out how to turn that off, too. Apparently some genius at Microsoft decided that swallowing exceptions should be the default behavior back when they were designing COM, and now they can't change that default because it would break all the applications that have come to depend on COM eating their exceptions in order to run properly! So, I turned that off with this code:
CoInitialize(NULL); // do this first
if(SUCCEEDED(CoInitializeSecurity(NULL, -1, NULL, NULL, RPC_C_AUTHN_LEVEL_PKT_PRIVACY,
  RPC_C_IMP_LEVEL_IMPERSONATE, NULL, EOAC_DYNAMIC_CLOAKING, NULL)))
{
  IGlobalOptions *pGlobalOptions;
  HRESULT hr = CoCreateInstance(CLSID_GlobalOptions, NULL, CLSCTX_INPROC_SERVER, IID_PPV_ARGS(&pGlobalOptions));
  if(SUCCEEDED(hr))
  {
    hr = pGlobalOptions->Set(COMGLB_EXCEPTION_HANDLING, COMGLB_EXCEPTION_DONOT_HANDLE);
    pGlobalOptions->Release();
  }
}
There are two additional functions that could be swallowing exceptions in your program: _CrtSetReportHook2 and SetUnhandledExceptionFilter, but both of these deal with SEH or C++ exceptions, and I was triggering an assertion, not throwing an exception. I was actually able to verify, by replacing the assertion #define with my own version, that throwing an actual C++ exception did crash the program... but an assertion didn't. Specifically, an assertion calls abort(), which raises SIGABRT, which crashes any normal program. However, it turns out that Windows was eating the abort signal, along with every other signal I attempted to raise, which is a problem, because half the library is written in C, and C obviously can't raise C++ exceptions. The assertion failure even showed up in the output... but didn't crash the program!
Assertion failed!

Program: ...udio 2015\Projects\feathergui\bin\fgDirect2D_d.dll
File: fgEffectBase.cpp
Line: 20

Expression: sizeof(_constants) == sizeof(float)*(4*4 + 2)
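For reference, the replacement assert I used to verify this was something along these lines (a hypothetical reconstruction; the exact macro isn't shown here):
// Replace assert() with a version that throws a real C++ exception instead of
// calling abort(), to test whether exceptions actually escape the callback.
#include <stdexcept>
#undef assert
#define assert(x) if(!(x)) throw std::runtime_error("Assertion failed: " #x);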
No matter what I do, Windows refuses to let the assertion failure crash the program, or even trigger a breakpoint in the debugger. In fact, calling the __debugbreak() intrinsic, which outputs an int 3 CPU instruction, was completely ignored, as if it simply didn't exist. The only reliable way to actually crash the program without using C++ exceptions was to do something like divide by 0, or attempt to write to a null pointer, which triggers a segfault.
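In the end, the only way I could force a crash on demand was to trigger a hardware fault myself, along these lines (illustrative only):
// Write through a null pointer to force an access violation, since this was
// the only kind of failure Windows would actually let terminate the program.
*(volatile int*)0 = 0;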

Any good developer should be using assertions to verify their assumptions, so having assertions silently fail and then corrupt the program is even worse than not having them at all! Now you could have an assertion in your code that's firing, terminating that callback, leaving your program in a broken state, and then the next message that gets processed blows up for strange and bizarre reasons that make no sense, because they should be impossible.

I have a hard enough time getting my programs to work; I didn't think it would be this hard to make them crash.

February 12, 2017

Owlboy And The Tragedy of Human Nature

Owlboy is a game developed by D-Pad Studio over a protracted 9-year development cycle. Every aspect of the game is a work of art, meticulously pieced together with delicate care. While it has mostly gotten well-deserved positive reviews from critics, some people have voiced disappointment with the story arc and how it is eventually resolved. Note: I'm about to talk about how the game ends, so consider this an obligatory warning that there are massive spoilers ahead. If you haven't already played it, go buy it, right now.



On the pirate mothership, it is revealed that the titular "owlboy" does not refer to Otus, but to the mysterious cloaked figure, who turns out to be Solus. Otus merely served as a distraction so that Solus could steal the relics from the pirates. Once the heroes follow Solus up to Mesos and beat him into submission, we learn that he masterminded the events of the entire game: promising the pirates power in exchange for retrieving the relics, then using Otus as a distraction so he could steal those relics and use them to power the Anti-Hex and save the world.

Unfortunately, as the heroes immediately point out, this resulted in the destruction of Advent and the deaths of countless innocent people. Had Solus just asked for help, all of this could have been avoided, and the world could have been saved without incident. Solus admits that he felt he had no choice, as he simply had nobody he could trust. Our heroes offer to help finish the ritual, at which point Molstrom shows up and ruins everyone's day. Solus' methods have finally backfired on him, and he is now too injured to complete the ritual—but Otus isn't. In an act of desperation, he imbues Otus with the power of the artifacts, and Otus is able to complete the anti-hex in his stead, obliterating Molstrom in the process and saving the world.

A lot of people take issue with this ending for two reasons: One, it means almost everything you fought for in the game technically meant nothing, because you are actually working against Solus the entire time. Two, the entire thing could have been avoided if Solus had just trusted someone, anyone, instead of engineering a ridiculously convoluted plot to get what he needed through deceit and betrayal. Otus and Solus here represent a hero and an anti-hero. They are both fundamentally good people, flawed in opposite ways: Solus does the right thing for the wrong reasons, whereas Otus does the wrong thing for the right reasons. Solus achieves the right thing, saving the world, but only by sacrificing thousands of innocent lives and betraying everyone. Otus is unknowingly dooming the world to destruction, but only because he and everyone else are operating on faulty information. He always chooses to do the right thing, and to trust others.

This is important, because the way the game ends is crucial to the narrative of the story and the underlying moral. Solus may have nearly saved the world, despite his questionable methods, but he would have failed at the end. Molstrom would have found him and easily defeated him, taking all the relics and then destroying whatever was left of the world as it rose into space. Only after Solus explained everything to Otus was Otus able to finally do the right thing for the right reasons, thanks to the friends he made on his journey holding Molstrom back. By doing this, Solus allows Otus to atone for failing to secure any of the relics, and Otus absolves Solus of the evil he had committed in the name of saving the world by annihilating Molstrom with the anti-hex.

The whole point of this story is that it doesn't matter how many times you fail, so long as you eventually succeed. Otus may have failed to save the world over and over and over, but at the very end, as the world is coming apart at the seams, he is finally able to succeed, and that's what matters. This moral hit me particularly hard because I instantly recognized what it represented: failing to ship a game over and over and over. Owlboy's story is an allegory for its own development, and on a broader scale, for any large creative project that has missed deadlines and is falling behind. It doesn't matter how many times you've failed, because as long as you keep trying, eventually you'll succeed, and that's what really matters. The game acknowledges that humans are deeply flawed creatures, but if we work together, we can counteract each other's flaws and build something greater than ourselves.

This is why I find it depressing that so many people object to Solus' behavior as unrealistic. Surely, no one would actually do that? However, at the beginning of the game, Solus' defining character moment is being bullied and abused by the other owls. His response to this abuse is to become withdrawn, trusting no one, determined to do whatever is necessary without relying on anyone else. This is exactly what happens to people who have been abused or betrayed and develop trust issues. They will refuse help when it is offered and push other people away, often forcing an intervention to break through the emotional wall they have built to keep themselves from being hurt again. That's why you have to fight Solus first: he's spent his whole life building a mental block to keep other people out, and Otus has to break through it to force him to finally accept help.

Far from being unbelievable, Owlboy's plot is entirely too real, and like any good work of art, it forces us to confront truths that make us uncomfortable. Owlboy forces us to confront the tragedy of human nature, and deal with the consequences. At the same time, it shows us that, if we persevere, we can overcome our flaws, and build a better world, together.

February 1, 2017

DirectX Is Terrifying

About two months ago, I got a new laptop and proceeded to load all my projects onto it. Despite everything compiling fine, my DirectX graphics engine mysteriously crashed when it ran. I immediately suspected either a configuration issue or a driver issue, but this seemed weird because my laptop had a newer graphics card than my desktop. Why was it crashing on newer hardware? Things got even more bizarre once I narrowed down the issue: it was in my shader assignment code, which hadn't been touched in almost 2 years. While I initially suspected a shader compilation issue, there was no such error in the logs. All the shaders compiled fine, and then... didn't work.

Now, if this error had also been happening on my desktop, I would have immediately started digging through my constant assignments, followed by the vertex buffers assigned to the shader, but again, all of this ran perfectly fine on my desktop. I was completely baffled as to why things weren't working properly. I had eliminated all possible errors I could think of that would have resulted from moving the project from my desktop to my laptop: none of the media files were missing, all the shaders compiled, all the relative paths were correct, I was using the exact same compiler as before with all the appropriate updates. I even updated drivers on both computers, but it stubbornly refused to work on the laptop while running fine on the desktop.

Then I found something that nearly made me shit my pants.
if(profile <= VERTEX_SHADER_5_0 && _lastVS != shader) {
  //...
} else if(profile <= PIXEL_SHADER_5_0 && _lastPS != shader) { 
  //...
} else if(profile <= GEOMETRY_SHADER_5_0 && _lastGS != shader) {
  //...
}
Like any sane graphics engine, I do some very simple caching by keeping track of the last shader I assigned and only setting the shader if it has actually changed. These if statements, however, have a very stupid but subtle bug that took me quite a while to catch. They're a standard range-exclusion chain that figures out what type of shader a given shader version is. If it's less than, say, 5, it's a vertex shader. Otherwise, if it's less than 10, it must be in the range 5-10 and is a pixel shader. Otherwise, if it's less than 15, it must be in the range 10-15, ad infinitum. The idea is that you don't need to check whether the value is greater than 5, because the failure of the previous statement already implies that. However, adding that cache check on the end breaks all of this, because now you could be in the range 0-5, but the cache check could fail, throwing you down to the next statement checking whether you're below 10. Because you're in the range 0-5, you're of course below 10, and the cache check will ALWAYS succeed, because no vertex shader would ever be in the pixel shader cache! All my vertex shaders were being sent to DirectX as pixel shaders after their initial assignment!
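The fix is trivial once you see it: determine the shader type purely from the profile range, and do the cache check inside each branch. A sketch using the same constants and cache variables as the snippet above (the actual set calls are elided):
// Decide the shader stage from the profile range alone, then check the cache
// inside each branch so a cache miss can never fall through to the wrong stage.
if(profile <= VERTEX_SHADER_5_0) {
  if(_lastVS != shader) { /* set vertex shader */ _lastVS = shader; }
} else if(profile <= PIXEL_SHADER_5_0) {
  if(_lastPS != shader) { /* set pixel shader */ _lastPS = shader; }
} else if(profile <= GEOMETRY_SHADER_5_0) {
  if(_lastGS != shader) { /* set geometry shader */ _lastGS = shader; }
}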

For almost 2 years, I had been feeding DirectX total bullshit, and had even tested it on multiple other computers, and it had never given me a single warning, error, crash, or any indication whatsoever that my code was completely fucking broken, in either debug mode or release mode. Somehow, deep in the depths of nVidia's insane DirectX driver, it had managed to detect that I had just tried to assign a vertex shader to a pixel shader, and either ignored it completely, or silently fixed my catastrophic fuckup. However, my laptop had the mobile drivers, which for some reason did not include this failsafe, and so it actually crashed like it was supposed to.

While this was an incredibly stupid bug that I must have written while sleep deprived or drunk (which is impressive, because I don't actually drink), it was simply impossible for me to catch because it produced zero errors or warnings. As a result, this bug has the honor of being both the dumbest and the longest-living bug of mine, ever. I've checked every location I know of for any indication that anything was wrong, including hooking into the debug log of DirectX and dumping all of its messages. Nothing. Nada. Zilch. Zero.

I've heard stories about the insane bullshit nVidia's drivers do, but this is fucking terrifying.

Alas, there is more. I had been experimenting with Direct2D as an alternative because, well, it's a 2D engine, right? After getting text rendering working, a change in string caching suddenly broke the entire program. It broke in a particularly bizarre way, because it seemed to just stop rendering halfway through the scene. It took almost an hour of debugging for me to finally confirm that the moment I was rendering a particular text string, the Direct2D driver just stopped. No errors were thrown. No warnings could be found. Direct2D's failure state was apparently to simply make every single function call silently fail with no indication that it was failing in the first place. It didn't even tell me that the device was missing or that I needed to recreate it. The text render call was made and then every single subsequent call was ignored and the backbuffer was forever frozen to that half-drawn frame.
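For anyone hitting something similar: one place Direct2D is supposed to surface accumulated failures is the HRESULT returned by EndDraw (or Flush), since the individual drawing calls return void by design. A minimal sketch of checking for device loss, assuming an ID2D1RenderTarget named target:
target->BeginDraw();
// ... drawing calls go here; they return void, so errors accumulate silently ...
HRESULT hr = target->EndDraw();
if(hr == D2DERR_RECREATE_TARGET)
{
  // The render target was lost; release and recreate all device-dependent resources.
}
else if(FAILED(hr))
{
  // Some other accumulated drawing failure.
}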

The error itself didn't seem to make any more sense, either. I was passing a perfectly valid string to Direct2D, but because that string originated in a different DLL, it apparently made Direct2D completely shit itself. Copying the string onto the stack, however, worked (which itself could only work if the original string was valid).

The cherry on top of all this is when I discovered that Direct2D's matrix rotation constructor takes degrees rather than radians, which is what every single other mathematical function in the standard library uses. Even DirectX takes radians!
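For illustration (a small sketch, not code from my engine, assuming the D2D1 and DirectXMath helpers), here is a quarter-turn rotation in each API:
// Direct2D: the angle is specified in degrees.
D2D1::Matrix3x2F rot2d = D2D1::Matrix3x2F::Rotation(90.0f);

// DirectXMath (the math library used with Direct3D): the angle is in radians.
DirectX::XMMATRIX rot3d = DirectX::XMMatrixRotationZ(DirectX::XM_PIDIV2);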

WHO WRITES THESE APIs?!