Machine unlearning removes designated concepts or memorized data from
pre-trained models. Recent work has made strong progress on speaker
identity unlearning in zero-shot text-to-speech (ZS-TTS), but implicitly
assumes that all unlearning requests arrive at once. This is unrealistic,
since privacy-motivated removal requests arrive sequentially over time.
We show this assumption breaks state-of-the-art methods: unlearning each
new speaker fully revives previously unlearned speakers, reintroducing
the very privacy risk unlearning was meant to eliminate. We present
CORTIS (Cumulative ORThogonal Identity Suppression), the first
framework for continual speaker
identity unlearning in ZS-TTS that requires no access to previously
unlearned speaker data. CORTIS combines Fisher-information-based
parameter masking with orthogonal projection against subspaces spanned
by prior unlearning updates. On VoiceBox, CORTIS unlearns each requested
speaker while keeping previously unlearned speakers forgotten across
long request sequences, substantially outperforming sequential
application of prior methods.
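
To make the two ingredients concrete, the following is a minimal toy sketch
of Fisher-based masking followed by orthogonal projection against earlier
updates. It is an illustration under stated assumptions, not the CORTIS
implementation: the function names, the flat-list parameter representation,
the top-k masking rule, and the keep_fraction parameter are all hypothetical,
and the abstract does not say whether high- or low-Fisher parameters are
selected.

```python
# Toy sketch only. All names and the top-k masking rule are assumptions
# made for illustration; they are not taken from the paper.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fisher_mask(fisher, keep_fraction):
    """0/1 mask keeping the keep_fraction of parameters with the highest
    Fisher score (assumed selection rule)."""
    k = max(1, int(round(keep_fraction * len(fisher))))
    top = set(sorted(range(len(fisher)), key=lambda i: fisher[i], reverse=True)[:k])
    return [1.0 if i in top else 0.0 for i in range(len(fisher))]

def project_out(vec, basis):
    """Remove from vec its components along an orthonormal basis."""
    out = list(vec)
    for b in basis:
        c = dot(out, b)
        out = [o - c * x for o, x in zip(out, b)]
    return out

def extend_basis(basis, update, tol=1e-10):
    """Gram-Schmidt step: store the normalized part of update that is
    not already spanned by basis."""
    residual = project_out(update, basis)
    norm = dot(residual, residual) ** 0.5
    if norm > tol:
        basis.append([x / norm for x in residual])
    return basis

def continual_update(grad, fisher, basis, keep_fraction=0.5):
    """Mask the raw update by Fisher score, then project it onto the
    orthogonal complement of all previously stored updates."""
    mask = fisher_mask(fisher, keep_fraction)
    masked = [g * m for g, m in zip(grad, mask)]
    return project_out(masked, basis)

# Two sequential unlearning requests on a 4-parameter toy model.
basis = []
u1 = continual_update([1.0, 2.0, 0.0, 0.0], [0.9, 0.8, 0.1, 0.1], basis)
extend_basis(basis, u1)
u2 = continual_update([2.0, 1.0, 3.0, 0.0], [0.5, 0.1, 0.9, 0.2], basis)
extend_basis(basis, u2)
```

In this sketch only the orthonormal basis of past updates is stored, so no
data from earlier speakers is needed at later steps. Projecting after masking
guarantees that u2 is orthogonal to u1, at the cost that the projected update
may no longer be confined to the masked parameters; a real system would have
to decide how to trade these two constraints off.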