ICTY: Fabricating evidence with voice cloning

Possible abuse of voice conversion technology in the context of ICTY proceedings is an issue that has so far not been sufficiently addressed.

A few days ago, podcaster Candace Owens made an allegation that must have sounded startling to many viewers unfamiliar with technological advancements in the field of voice reproduction. The subject of her exposé was a statement attributed to the assassinated American political activist Charlie Kirk. A few days before his death, Kirk supposedly stated in Colorado Springs, at a meeting of his organisation, Turning Point USA, that should anything happen to him he would appoint his wife Erika as his successor as TP USA chief executive officer. Many of those who attended the meeting claimed not to recall Charlie Kirk saying the words subsequently attributed to him. Yet an audio recording was soon produced in which a voice, apparently Charlie Kirk’s, is heard saying exactly that. In her comment, Owens points to ties between the new management of TP USA and a Hollywood company specialising in the production of deepfakes, including voice cloning. She has suggested that the disputed, realistic-sounding statement by Charlie Kirk may have been a concoction of that company’s audio engineers.

We take no position in this controversy, having commented on Kirk’s tragic assassination several months ago, shortly after it occurred. But it should be stated that just by raising the issue Candace Owens has performed a public service. Our focus is on the remarkable capabilities of voice cloning technology in another context.

Those capabilities and the time frame when they became operational are pertinent to the operation of the International Criminal Tribunal for the Former Yugoslavia [ICTY] and may affect many of the verdicts it has handed down, in particular in relation to Srebrenica. Many of those verdicts were based on forensically unauthenticated but highly incriminating voice recordings.

In light of Owens’ allegations and the technical information she shared, it may safely be concluded that authentic-sounding voice fabrication is entirely possible. Critical questions must therefore be raised about the nature and reliability of audio evidence admitted by the Hague Tribunal because such evidence has demonstrably influenced many of its judgements. Was such technology available roughly a quarter of a century ago, at the time that ICTY trials were taking place? If so, was evidence that could have been tainted by the use of such technology properly scrutinised for authenticity before being admitted by the court?

The answer to the first question is definitively affirmative. The answer to the second question is unambiguously negative. For inexplicable reasons, defence lawyers at ICTY never demanded forensic authentication of the numerous audio recordings submitted by the Prosecution, although those recordings clearly ran counter to the interests of their clients. This issue was dealt with extensively in our volume “The Hague Tribunal, Srebrenica, and the Miscarriage of Justice” [2019], pp. 96–116.

Defence attorneys at The Hague were perhaps inadequately informed of the state of scientific progress in this area at the time, but technological advancements have had a striking impact on the integrity of audio evidence. Just as it is now possible to create authentic-looking but completely false DNA readings, it is also possible to generate an authentic-sounding voice that does not belong to the purported speaker. The technology is known as “voice conversion” or “voice morphing.” It is defined as “modifying the speech signal of one speaker (the source speaker) so that it sounds as if it had been spoken by a different speaker (the target speaker).” This is how a group of researchers in this field describes it:

“Voice conversion (VC) is an area of speech processing that deals with the conversion of the perceived speaker identity. In other words, the speech signal uttered by a first speaker, the source speaker, is modified to sound as if it was spoken by a second speaker, referred to as the target speaker.”

The scientists also indicate some of the applications of voice conversion technology:

“The term voice conversion refers to the modification of speaker identity by modifying the speech signal uttered by a source speaker to sound as if it was spoken by a target speaker. In general, a voice conversion system is first trained using speech data from both the source and the target speakers, and then the trained models can be used for performing the actual conversion. Potential applications for voice conversion include security related usage (hiding the identity of the speaker), entertainment applications, and text-to-speech (TTS) synthesis in which voice conversion techniques can be used for creating new and personalized voices in a cost-efficient way.”

Like many similar technologies, this one is also broadly dual-use. It clearly has benign applications (as in the dubbing of foreign films whilst preserving the original voice texture of the actors), but there is also a potential for abuse.

For a telling illustration of voice conversion technology’s downside, we are indebted to the BBC, which committed an editorial slip-up whilst overzealously attempting to deceive the public. The BBC thus inadvertently furnished irrefutable proof that technology for the production of authentic-sounding but misleading audio effects is not science fiction.

In his The Truth Seeker broadcast, Russian RT channel host Daniel Bushell demonstrated how in 2013 these technological breakthroughs were actually misused for political purposes in the context of the Syrian crisis. On that particular occasion the fabrication was launched to buttress false accusations against the Syrian government that it was “killing its own people” and to discredit President Assad.

In August of 2013 the BBC broadcast a clip in which an alleged doctor claimed that Syrian forces had committed a “napalm” attack, killing a considerable number of innocent civilians. That dramatic statement, however, did not prove effective in accomplishing the desired goal, which was to mobilise Western public opinion behind a military intervention in Syria. In September of 2013, exactly the same video was rebroadcast, with the same actors and identical mise-en-scène, but with a key difference. The doctor’s statement was digitally altered so that she was now heard saying that the attack was committed not with “napalm,” as before, but with “chemical weapons.” In both video clips, the speaker’s voice sounded exactly the same.

Without a professional analysis of the audio record, an ordinary layman would never manage to perceive the hoax, nor would he suspect that the words being heard are in fact a digitally altered fake which no natural person had ever uttered.

The impression that this was an example of some new, cutting-edge technology for digitally morphing the human voice, which came into existence sometime around 2013, would be completely false. As far back as February 1999, the Washington Post disclosed the existence of the same type of technology that was later used in 2013. Even back then, which is precisely when the Hague Tribunal Prosecution was preparing its evidence for Srebrenica and other trials, that technology had reached an enviably high level of development. So much so, we learn from the Washington Post, that scientists at the Los Alamos National Laboratory in New Mexico, to whom credit for this invention is due, tried to impress their military and political superiors by demonstrating their discovery’s potential applications. They digitally altered the voice of a high-ranking U.S. general in such an impressive fashion that his authentic-sounding voice could be heard seditiously agitating for a coup d’état.

That is positive proof that digital voice alteration, capable of generating the illusion that someone said something that he never did or would not have said, is a practical possibility. The sole way to verify authenticity and remove doubts is to perform a professional forensic examination of the audio record. At the ICTY, that was never done, and certainly not in any Srebrenica trial.

Falsification of evidence by recreating the defendant’s voice and making him say, to his detriment, things he may never have uttered is a possible and very alarming application of this new digital technology. Possible abuse of voice conversion technology in the context of ICTY proceedings is an issue that has so far not been sufficiently addressed. All audio intercept recordings that were put in evidence and influenced factual findings made by the various ICTY Srebrenica chambers must be thoroughly examined by competent and independent forensic specialists. Until that is done the integrity of that audio evidence will remain under a cloud of doubt.

Absent verification, there are sufficient grounds for suspecting inherent untrustworthiness of audio recordings used at ICTY (and, by extension, its Sarajevo clone, the State War Crimes Court of Bosnia and Herzegovina, which follows identical procedures). It is therefore warranted, even at this late stage, to sound the alarm. The trial records of those courts, particularly with respect to Srebrenica, where the greatest concentration of purported intercept and audio evidence abuse may have occurred, should be carefully combed. The authenticity of all audio evidence and intercepts relied on by the courts should be subjected to thorough and independent forensic scrutiny. Evidence of this nature that was admitted by ICTY or the Sarajevo court but would fail to meet fundamental criminal case admission standards in national jurisdictions should be excluded from consideration. The verdicts of both courts should be modified accordingly to reflect the exclusion of any such flawed evidence.