People talk a lot about up‑front vocals as if they’re the be-all and end-all of pop production, but in practice the lead vocal’s position in a mix’s depth perspective often changes through a song’s timeline. This production provides a particularly good illustration of this point if you observe the depth relationship between the lead vocal and the programmed clap/snare backbeat. To begin with, the vocal’s high-frequency enhancement, stereo widening and minimal reverb bring it substantially forward of the narrower and more reverberant clap. But halfway through the verse at 0:28 they move onto a more equal footing, as the vocal gains more reverb and the backbeat gains a brighter and more aggressive extra layer. The prechorus (0:42) continues the progression, with a thicker double-tracking effect appearing on the vocal and a harder snare layer joining the backbeat at 0:50, so that by the time the chorus hits at 0:56 the snare is well out in front of the vocal.
But why do all this? As I see it, the receding vocal supports an illusion that the backing track is increasing in power, relatively speaking, a fiction that’s reinforced by the increasing reverberance of the track as a whole. Then, when verse two suddenly reverts the perspective to bring the vocal to the front once more, that helps refresh the listener’s focus on the singer as she delivers a new set of lyrics. Or, to put it another way, films would be a lot less interesting if they only ever used one camera position and zoom setting, so why should music mixes be any different?