

The problem is that two numbers, pitch and yaw, do not provide enough degrees of freedom to represent consistent free rotation in space when there is no “horizon” to refer to. Two numbers can represent a look-direction vector, but they cannot represent the third component of camera orientation, called roll (rotation about the “depth” axis of the screen). As a consequence, no matter how you implement the controls, you will find that in some orientations the camera rolls strangely, because doing the math with only this information means the roll is picked (reconstructed) from the pitch and yaw every frame. The minimal solution is to add a roll component to your camera state.
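To make the missing degree of freedom concrete, here is a rough sketch of a camera basis built from three numbers instead of two. The function name `camera_basis`, the axis convention (Y up, looking along -Z at zero yaw and pitch, angles in radians), and the sign of the roll are all assumptions made for illustration, not something taken from your code.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def camera_basis(yaw, pitch, roll):
    """Return (right, up, forward) unit vectors for the camera.

    Assumed convention: world Y is up, and yaw = pitch = 0 looks along -Z.
    """
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)

    # The look direction depends only on yaw and pitch.
    forward = (-sy * cp, sp, -cy * cp)

    # Roll-free frame: this is the "up" that gets reconstructed
    # every frame when only pitch and yaw are stored.
    right0 = (cy, 0.0, -sy)
    up0 = cross(right0, forward)

    # The stored roll spins right/up about the forward axis.
    cr, sr = math.cos(roll), math.sin(roll)
    right = tuple(cr * r + sr * u for r, u in zip(right0, up0))
    up = tuple(-sr * r + cr * u for r, u in zip(right0, up0))
    return right, up, forward
```

Calling `camera_basis(0.3, 0.2, 0.0)` and `camera_basis(0.3, 0.2, math.pi / 2)` gives the same `forward` but different `up` vectors. That difference is exactly the information a pitch/yaw pair cannot hold.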
