The tests should validate the public surface area. Occasionally something can only be validated through internals, but needing many such tests indicates that the wrong surface area is exposed. In most cases there should be a way to exercise the functionality through the public API, even if that requires mocking or a similar technique.
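
As a rough illustration of the kind of test I have in mind, here is a minimal sketch in Python. Everything in it is hypothetical and not taken from this change: `RetryingFetcher`, its private `_should_retry` helper, and `run_public_api_check` are made-up names. The point is that the retry behaviour gets verified by calling the public `fetch` method with a mocked transport, rather than by calling the private helper directly.

```python
from unittest.mock import Mock


class RetryingFetcher:
    """Hypothetical class: retries a transport call up to max_attempts times."""

    def __init__(self, transport, max_attempts=3):
        self._transport = transport
        self._max_attempts = max_attempts

    def _should_retry(self, attempt):
        # Internal detail; the check below never calls this directly.
        return attempt < self._max_attempts

    def fetch(self, url):
        # Public entry point: the only method the check exercises.
        attempt = 1
        while True:
            try:
                return self._transport(url)
            except ConnectionError:
                if not self._should_retry(attempt):
                    raise
                attempt += 1


def run_public_api_check():
    # The mocked transport fails twice, then succeeds on the third call.
    transport = Mock(
        side_effect=[ConnectionError(), ConnectionError(), "payload"]
    )
    fetcher = RetryingFetcher(transport, max_attempts=3)
    result = fetcher.fetch("http://example.invalid/resource")
    # Observable behaviour: the returned value and how often the
    # injected dependency was called.
    return result, transport.call_count
```

Because the check only depends on observable behaviour, the retry logic can be restructured or `_should_retry` removed entirely without the test needing to change.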