Abstract

Machine unlearning removes designated concepts or memorized data from pre-trained models. Recent work has made strong progress on speaker identity unlearning in zero-shot text-to-speech (ZS-TTS), but quietly assumes all unlearning requests arrive at once — an unrealistic assumption, since privacy-motivated removals arrive sequentially over time. We show this assumption breaks state-of-the-art methods: unlearning each new speaker fully revives previously unlearned speakers, reintroducing the very privacy risk unlearning was meant to eliminate. We present CORTIS (Cumulative ORThogonal Identity Suppression), the first framework for continual speaker identity unlearning in ZS-TTS that requires no access to previously unlearned speaker data. CORTIS combines Fisher-information-based parameter masking with orthogonal projection against subspaces spanned by prior unlearning updates. On VoiceBox, CORTIS unlearns each requested speaker while keeping previously unlearned speakers forgotten across long request sequences, substantially outperforming sequential application of prior methods.
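The two ingredients named above, Fisher-information-based masking and orthogonal projection against prior unlearning updates, can be illustrated with a small NumPy sketch. This is not the CORTIS implementation: the function names (`fisher_mask`, `project_out`, `extend_basis`), the `keep_fraction` parameter, the diagonal Fisher estimate from squared per-sample gradients, and the choice to mask before projecting are all assumptions made for illustration on a flattened parameter vector.

```python
import numpy as np


def fisher_mask(per_sample_grads, keep_fraction=0.1):
    """Binary mask selecting the parameters with the largest diagonal Fisher estimate.

    Assumption: Fisher information is approximated by the mean of squared
    per-sample gradients, shape (num_samples, num_params).
    """
    fisher = np.mean(np.square(per_sample_grads), axis=0)
    k = max(1, int(round(keep_fraction * fisher.size)))
    threshold = np.partition(fisher, -k)[-k]  # k-th largest Fisher value
    return (fisher >= threshold).astype(per_sample_grads.dtype)


def project_out(update, basis):
    """Remove the component of `update` lying in the span of `basis`.

    `basis` has orthonormal columns, shape (num_params, num_prior_updates).
    """
    if basis.shape[1] == 0:
        return update
    return update - basis @ (basis.T @ update)


def extend_basis(basis, update, tol=1e-8):
    """Append the normalized residual of `update` to the stored basis (Gram-Schmidt step)."""
    residual = project_out(update, basis)
    norm = np.linalg.norm(residual)
    if norm < tol:
        return basis  # update adds no new direction
    return np.hstack([basis, (residual / norm)[:, None]])


# Toy sequence of three unlearning requests on an 8-parameter "model".
rng = np.random.default_rng(0)
num_params = 8
basis = np.zeros((num_params, 0))
for step in range(3):
    grads = rng.normal(size=(16, num_params))  # stand-in per-sample gradients
    mask = fisher_mask(grads, keep_fraction=0.5)
    raw_update = -grads.mean(axis=0)
    # Mask first, then project, so the applied update stays orthogonal
    # to every previously applied update.
    update = project_out(mask * raw_update, basis)
    basis = extend_basis(basis, update)
```

Only the stored basis of past update directions is carried between requests in this sketch, which mirrors the stated requirement that no previously unlearned speaker data be retained.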

Audio Demonstrations

Audio samples comparing CORTIS against baselines on forget and remain speakers across sequential unlearning requests.

Continual Unlearning: Forget Speakers

Each step below shows the model state after that unlearning request. Every speaker unlearned up to that point is listed, so you can verify that CORTIS maintains suppression of earlier speakers as new requests arrive. The row highlighted in green is the speaker being unlearned at that step; rows above it are prior forget speakers.

After Step 1 — Speaker 1 unlearned

Speaker	Target Text	*Audio Prompt	*Sample to Forget	CORTIS (Ours)
Spk 1 (current)	The window was open as she looked out.

*Audio prompt and ground-truth sample are held-out utterances not seen during training or unlearning.

After Step 2 — Speaker 2 unlearned

Speaker	Target Text	*Audio Prompt	*Sample to Forget	CORTIS (Ours)
Spk 1 (prior)	The window was open as she looked out.
Spk 2 (current)	"And then, besides" she frowned and dropped her voice till it was only just audible. "this horrid man hadn't made our Julie so-so conspicuous, and Lady Henry hadn't turned out such a toad-and, altogether, Jacob, I'm dreadfully worried."

*Audio prompt and ground-truth sample are held-out utterances not seen during training or unlearning.

After Step 3 — Speaker 3 unlearned

Speaker	Target Text	*Audio Prompt	*Sample to Forget	CORTIS (Ours)
Spk 1 (prior)	The window was open as she looked out.
Spk 2 (prior)	"And then, besides" she frowned and dropped her voice till it was only just audible. "this horrid man hadn't made our Julie so-so conspicuous, and Lady Henry hadn't turned out such a toad-and, altogether, Jacob, I'm dreadfully worried."
Spk 3 (current)	These round knobs were not ornamental but symbolic. They were expressive and puzzling, striking and disturbing—food for thought and also for vultures if there had been any looking down from the sky; but at all events for such ants as were industrious enough to ascend the pole. They would have been even more impressive, those heads on the stakes, if their faces had not been turned to the house.

*Audio prompt and ground-truth sample are held-out utterances not seen during training or unlearning.

After Step 4 — Speaker 4 unlearned

Speaker	Target Text	*Audio Prompt	*Sample to Forget	CORTIS (Ours)
Spk 1 (prior)	The window was open as she looked out.
Spk 2 (prior)	"And then, besides" she frowned and dropped her voice till it was only just audible. "this horrid man hadn't made our Julie so-so conspicuous, and Lady Henry hadn't turned out such a toad-and, altogether, Jacob, I'm dreadfully worried."
Spk 3 (prior)	These round knobs were not ornamental but symbolic. They were expressive and puzzling, striking and disturbing—food for thought and also for vultures if there had been any looking down from the sky; but at all events for such ants as were industrious enough to ascend the pole. They would have been even more impressive, those heads on the stakes, if their faces had not been turned to the house.
Spk 4 (current)	But curiosity was too strong for them. Each wanted to see where his particular bomb hit, and how much Earth it would tear up. The bombs made only small scars in the Earth, but they sent fragments of steel casing flying all directions and several men were cut about the face by splinters.

*Audio prompt and ground-truth sample are held-out utterances not seen during training or unlearning.

After Step 5 — Speaker 5 unlearned

Speaker	Target Text	*Audio Prompt	*Sample to Forget	CORTIS (Ours)
Spk 1 (prior)	The window was open as she looked out.
Spk 2 (prior)	"And then, besides" she frowned and dropped her voice till it was only just audible. "this horrid man hadn't made our Julie so-so conspicuous, and Lady Henry hadn't turned out such a toad-and, altogether, Jacob, I'm dreadfully worried."
Spk 3 (prior)	These round knobs were not ornamental but symbolic. They were expressive and puzzling, striking and disturbing—food for thought and also for vultures if there had been any looking down from the sky; but at all events for such ants as were industrious enough to ascend the pole. They would have been even more impressive, those heads on the stakes, if their faces had not been turned to the house.
Spk 4 (prior)	But curiosity was too strong for them. Each wanted to see where his particular bomb hit, and how much Earth it would tear up. The bombs made only small scars in the Earth, but they sent fragments of steel casing flying all directions and several men were cut about the face by splinters.
Spk 5 (current)	Not feeling quite sure however of the fidelity of the nurse's memory, Edwin then went to the station and made inquiries there. But an application to the Lost Luggage Office, no such parcel had been deposited there. The reader may perhaps be surprised at this. As it is well known that every train is searched by the porters on its arrival at terminus and all forgotten articles are conveyed once to the lost luggage office.

*Audio prompt and ground-truth sample are held-out utterances not seen during training or unlearning.

Our Method on Remain Speakers

While our method effectively prevents synthesis of the Forget Speakers' voices, it retains zero-shot performance for all other Remain Speakers. Here, the Remain Speakers are unseen voices from LibriSpeech, tested in a zero-shot setting.

After Step 1 — Speaker 1 unlearned

Remain Speaker	Target Text	*Audio Prompt	*Sample to Remain	CORTIS (Ours)
Remain Spk 1	We will go out together to the bower. There is a way down to the court from my window.
Remain Spk 2	However that was over now. The tree gone, the story at an end.
Remain Spk 3	Nonsense, of course I can't really sing. Except the way my mother and grandmother did before me.

*A real recording of the Remain Speaker, not seen during training or unlearning.

After Step 2 — Speaker 2 unlearned

Remain Speaker	Target Text	*Audio Prompt	*Sample to Remain	CORTIS (Ours)
Remain Spk 1	We will go out together to the bower. There is a way down to the court from my window.
Remain Spk 2	However that was over now. The tree gone, the story at an end.
Remain Spk 3	Nonsense, of course I can't really sing. Except the way my mother and grandmother did before me.

*A real recording of the Remain Speaker, not seen during training or unlearning.

After Step 3 — Speaker 3 unlearned

Remain Speaker	Target Text	*Audio Prompt	*Sample to Remain	CORTIS (Ours)
Remain Spk 1	We will go out together to the bower. There is a way down to the court from my window.
Remain Spk 2	However that was over now. The tree gone, the story at an end.
Remain Spk 3	Nonsense, of course I can't really sing. Except the way my mother and grandmother did before me.

*A real recording of the Remain Speaker, not seen during training or unlearning.

After Step 4 — Speaker 4 unlearned

Remain Speaker	Target Text	*Audio Prompt	*Sample to Remain	CORTIS (Ours)
Remain Spk 1	We will go out together to the bower. There is a way down to the court from my window.
Remain Spk 2	However that was over now. The tree gone, the story at an end.
Remain Spk 3	Nonsense, of course I can't really sing. Except the way my mother and grandmother did before me.

*A real recording of the Remain Speaker, not seen during training or unlearning.

After Step 5 — Speaker 5 unlearned

Remain Speaker	Target Text	*Audio Prompt	*Sample to Remain	CORTIS (Ours)
Remain Spk 1	We will go out together to the bower. There is a way down to the court from my window.
Remain Spk 2	However that was over now. The tree gone, the story at an end.
Remain Spk 3	Nonsense, of course I can't really sing. Except the way my mother and grandmother did before me.

*A real recording of the Remain Speaker, not seen during training or unlearning.

BibTeX

@inproceedings{cortis2026,
  title={Continual Speaker Identity Unlearning with Minimal Interference},
  author={Anonymous Authors},
  booktitle={Under Submission},
  year={2026}
}