There are actually seven separate voice lines for each player line: one for each custom voice. Since each variant is (mostly) written differently, each one needs its own data. Every variant has a full set of subtitles, lipsync, cooldowns, etc. attached to the wav file containing the actual voice line. This gives us the flexibility to build unique conversations based on the player's voice selection, like the walk-n-talk while disguised as Cyrus in SR3. We don't take advantage of that system as much as we could due to memory limitations: seven player voices eat up a huge chunk of disc space, and each unique conversation requires a new variation of every line.
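To picture that layout, here's a rough Python sketch of one player line holding seven per-voice variants, each bundling its own wav, subtitle, lipsync and cooldown data. This is not the actual Saints Row data format; every class, field and file name below (`VoiceVariant`, `PlayerLine`, `build_line`, the `.lip` extension, the 30-second cooldown) is hypothetical and only illustrates the idea. The last helper shows why unique conversations get expensive: every player line multiplies by seven.

```python
from dataclasses import dataclass, field

NUM_PLAYER_VOICES = 7  # seven selectable player voices, as described above


@dataclass(frozen=True)
class VoiceVariant:
    # Hypothetical fields: each recorded take carries its own metadata.
    wav_path: str
    subtitle_key: str
    lipsync_path: str
    cooldown_s: float


@dataclass
class PlayerLine:
    line_id: str
    variants: dict[int, VoiceVariant] = field(default_factory=dict)

    def add_variant(self, voice: int, variant: VoiceVariant) -> None:
        if not 0 <= voice < NUM_PLAYER_VOICES:
            raise ValueError(f"voice must be in [0, {NUM_PLAYER_VOICES})")
        self.variants[voice] = variant

    def is_complete(self) -> bool:
        # A line is only usable once every voice has its own take.
        return len(self.variants) == NUM_PLAYER_VOICES

    def resolve(self, voice: int) -> VoiceVariant:
        # Pick the variant matching the player's voice selection.
        return self.variants[voice]


def build_line(line_id: str) -> PlayerLine:
    # Hypothetical naming scheme: one asset set per voice.
    line = PlayerLine(line_id)
    for v in range(NUM_PLAYER_VOICES):
        line.add_variant(v, VoiceVariant(
            wav_path=f"{line_id}_voice{v}.wav",
            subtitle_key=f"{line_id}_voice{v}",
            lipsync_path=f"{line_id}_voice{v}.lip",
            cooldown_s=30.0,
        ))
    return line


def assets_for_conversation(num_player_lines: int) -> int:
    # Every unique player line needs a variation for all seven voices.
    return num_player_lines * NUM_PLAYER_VOICES
```

For example, `build_line("conv_a_01").resolve(3).wav_path` gives `"conv_a_01_voice3.wav"`, and a conversation with 10 player lines needs 70 recorded variants.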
As far as cutscenes/cinematics go, the subtitles are hard-baked into the localized strings and called from the cutscene table file. All voice lines in all cutscenes have the same content regardless of voice; the wav that plays is chosen by a switch in Wwise that is set from the player's voice selection. It's a bit convoluted, but it works.
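A toy Python model of the cutscene path, assuming a single shared subtitle string per line and a switch whose state picks the wav. Everything here is a hypothetical stand-in: `SwitchContainer` only mimics the behaviour of a Wwise switch container, and the state names, string keys and file names are made up. In a real Wwise integration the game would set the switch through the SDK (e.g. `AK::SoundEngine::SetSwitch`) before posting the event, rather than through anything like this class.

```python
from dataclasses import dataclass

# Hypothetical switch states, one per selectable player voice.
PLAYER_VOICE_STATES = tuple(f"Voice_{i}" for i in range(1, 8))


@dataclass(frozen=True)
class CutsceneLine:
    string_key: str              # key into localized strings, same for every voice
    wav_by_state: dict[str, str]  # switch state -> wav file


class SwitchContainer:
    """Toy stand-in for a Wwise switch keyed on the player's voice."""

    def __init__(self, states: tuple[str, ...]) -> None:
        self.states = states
        self.current = states[0]

    def set_switch(self, state: str) -> None:
        if state not in self.states:
            raise ValueError(f"unknown switch state: {state}")
        self.current = state


def play_cutscene_line(line: CutsceneLine, switch: SwitchContainer,
                       strings: dict[str, dict[str, str]],
                       lang: str = "en") -> tuple[str, str]:
    # The subtitle comes from the localized string table and never varies by voice.
    subtitle = strings[lang][line.string_key]
    # Only the audio depends on the current switch state.
    wav = line.wav_by_state[switch.current]
    return subtitle, wav


# Hypothetical data for one cutscene line.
STRINGS = {"en": {"cs01_line03": "Boss line 3 subtitle"}}
LINE = CutsceneLine(
    string_key="cs01_line03",
    wav_by_state={s: f"cs01_line03_{s}.wav" for s in PLAYER_VOICE_STATES},
)
```

Setting the switch to `"Voice_4"` and calling `play_cutscene_line(LINE, switch, STRINGS)` returns the same subtitle text as any other voice, paired with `"cs01_line03_Voice_4.wav"`.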
As far as he/she/it... yeah, the writers just try to avoid pronouns when referring to the player. That's why lots of lines are written as "the Boss did something" instead of simply "he did something." Even with the ability to have unique conversations, this helps avoid confusion when the player picks a female voice with a male body or what-have-you. Occasionally a player voice line will refer to the player character by the voice's gender, which is unfortunate but a bit hard to avoid.