I haven't found a better way. My solution was to:
1. Remove any AK listener components from all cameras in the editor.
2. Programmatically create a new game object with a new listener and have it follow the active camera. I set this object to be "do not destroy" so it survives scene changes.
3. Write code that runs whenever a scene change happens, iterates through all listeners in the scene, and stops every one of them from being a default listener except the one created manually. I.e. call SetDefaultListener(false) on each of the others.
4. In the Wwise Unity global settings, turn off the "automatically attach listeners to cameras" setting.
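Steps 2 and 3 could look roughly like the sketch below. Treat it as an illustration under assumptions, not as the exact code I used: the class name `PersistentWwiseListener` and its `Create()` method are made up for this example, `Camera.main` stands in for "the active camera", and the call is spelled `SetIsDefaultListener(bool)` because that is the name on `AkAudioListener` in the integration versions I'm aware of. Check the method name in your own version.

```csharp
using UnityEngine;
using UnityEngine.SceneManagement;

// Hypothetical helper; the class name and structure are illustrative only.
public class PersistentWwiseListener : MonoBehaviour
{
    private static PersistentWwiseListener instance;
    private AkAudioListener listener;

    // Step 2: call once at startup, after Wwise has been initialized.
    public static void Create()
    {
        if (instance != null) return;
        var go = new GameObject("PersistentWwiseListener");
        DontDestroyOnLoad(go); // keep the object alive across scene loads
        instance = go.AddComponent<PersistentWwiseListener>();
    }

    private void Awake()
    {
        // Assumption: AkAudioListener pulls in the AkGameObj it requires.
        listener = gameObject.AddComponent<AkAudioListener>();
        SceneManager.sceneLoaded += OnSceneLoaded;
    }

    private void OnDestroy()
    {
        SceneManager.sceneLoaded -= OnSceneLoaded;
    }

    // Step 3: after each scene load, demote every listener except ours.
    private void OnSceneLoaded(Scene scene, LoadSceneMode mode)
    {
        foreach (var other in FindObjectsOfType<AkAudioListener>())
        {
            if (other != listener)
                other.SetIsDefaultListener(false); // assumed method name
        }
    }

    // Follow whichever camera is currently tagged MainCamera (assumption).
    private void LateUpdate()
    {
        var cam = Camera.main;
        if (cam != null)
            transform.SetPositionAndRotation(cam.transform.position, cam.transform.rotation);
    }
}
```

Calling `PersistentWwiseListener.Create()` once from a bootstrap script should be enough. Newer Unity versions flag `FindObjectsOfType` as deprecated in favor of `FindObjectsByType`, so swap that call if you get a warning.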
Sorry, I wish there was a better way. As far as I can tell, the "pop" happens when, for a split second during a scene switch, two listeners are active (one for the old camera about to be deleted, one for the new camera being loaded). The two listeners seem to cause the audio to be rendered twice, resulting in a momentary volume doubling effect. That's what the pop is. The only way I found to prevent it is to never let more than one default listener be active at the same time.
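To put a rough number on that, assuming the two listeners really do render the same signal in phase into the same output (that part is my guess, not something I measured), the amplitude A briefly becomes 2A, which is a jump of about 6 dB:

```latex
\[
\Delta L = 20 \log_{10}\!\left(\frac{2A}{A}\right) = 20 \log_{10} 2 \approx 6.02\ \text{dB}
\]
```

A step of that size for a frame or two would be heard as a click or pop rather than as a change in loudness.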
It would be nice if Wwise had a better way to deal with this, because it's really tough to diagnose why it's happening. You'd expect something as common as scene changes to be handled more gracefully out of the box.