Putting the simultaneous translation tool "transcribby AI" from Deutsche Telekom MMS to the test: at the last Agritechnica (12 to 18 November 2023), the world's leading trade fair for agricultural technology organised by the Deutsche Landwirtschafts-Gesellschaft (DLG), the application was used for the first time, running in parallel on five stages. Photo: DLG e.V.

Part 2: Simultaneous translations
AI in event management
Communication across borders: The event industry brings people from all over the world together to build bridges between cultures, promote dialogue and create spaces in which visions become tangible. AI-supported translation technologies open up new possibilities and promise a dialogue that can be conducted independently of the global diversity of languages.
EuroTier 2024 will run until 15 November at the exhibition centre in Hanover. The world's leading trade fair for professional animal husbandry and livestock management regularly attracts over 100,000 visitors from all over the world every two years, alternating with Agritechnica. DLG Service GmbH, the specialist organisation of the German Agricultural Society (DLG) and organiser of the two leading trade fairs in Hanover, is concerned with more than just providing the 2,200 registered exhibitors from 52 countries with a marketplace for their innovations, products and services. With an accompanying specialist programme, the trade fair will be a platform for several leading events in the international livestock sector.
The organiser's approach is to use the concurrent conferences, congresses and events to bring together livestock farmers, agribusiness and experts for networking and professional exchange. A mix of international keynotes, roundtables, award ceremonies and subsequent get-togethers is meant to create a relaxed atmosphere for the exchange of ideas. For example, one day before the official opening of this year's EuroTier, the poultry industry took centre stage at the "International Poultry Day" in the Convention Center of the exhibition grounds. Three speakers - from Italy, the USA and Germany - kicked off the "International Poultry Conference" with keynote speeches on sustainability in the poultry industry.
In an increasingly globalised world, internationality is becoming more and more important, and not just in the congress and trade fair landscape. Language barriers and different cultural backgrounds are an obstacle to the organisers' ambitious goal of facilitating seamless communication and thus promoting knowledge transfer and networking. In order to reach as large an audience as possible, the DLG relied on simultaneous translation into German and English for both its poultry conference and the subsequent networking format "International Poultry Event" - without, however, resorting to the traditional solution of hiring human simultaneous interpreters.

Photo: VOK DAMS
"The growing demand for inclusivity in the B2B sector calls for solutions that appeal equally to all participants. AI-driven audio-to-audio translation offers an effective way of making international events accessible."
Jan Filipzik, Senior Manager Marketing & AI at VOK DAMS Events and Live Marketing.
Marcus Vagt, Project Manager EnergyDecentral and Head of Trade Fairs & Events at DLG Service GmbH, explains: "In view of the constantly improving quality of AI-based translation tools, we looked around for a simple and practical solution." The non-profit organisation wanted a tool that would make stage presentations and panels easier to understand for the audience and simultaneously convert them into written form (speech-to-text). At the same time, the text output was to be optimised for a stage monitor and translated live into other languages. There was also the requirement to make the transcribed text and its translation accessible via a QR code on the audience's mobile devices, so that they could also select the desired language and call up an audio output (text-to-speech).

Deutsche Telekom MMS has developed a suitable application for this based on AI language models from Azure AI Services, which enable speech recognition, speech synthesis and language translation: "transcribbyAI" is the name of the transcription and translation tool, for which organisers only need a client such as a laptop, via which the audio stream is received and the required web app is opened for data processing. According to Telekom's digital service provider, integrated, automatically scaling cloud functions and a message broker ensure quick and uncomplicated use at conferences and trade fairs. The solution recognises the source language from the audio data and can translate into over 100 languages individually for each user.
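The fan-out described above - one recognised audio stream, individually translated into each user's chosen language via a message broker - can be sketched roughly as follows. This is an illustrative Python sketch, not transcribbyAI's actual implementation; the TranslationBroker class, the PHRASEBOOK lookup and the translate() stub are hypothetical stand-ins for the cloud services involved.

```python
from collections import defaultdict

# Toy phrasebook instead of a real NMT service (e.g. a cloud translation API).
PHRASEBOOK = {
    ("de", "en"): {"Willkommen zur EuroTier": "Welcome to EuroTier"},
}

def translate(text: str, source: str, target: str) -> str:
    """Stub translator: same-language text passes through unchanged."""
    if source == target:
        return text
    return PHRASEBOOK.get((source, target), {}).get(text, f"[{target}] {text}")

class TranslationBroker:
    """Fans each recognised segment out to every subscribed language feed."""
    def __init__(self):
        self.feeds = defaultdict(list)   # language -> list of translated lines
        self.subscribers = set()         # languages users have selected

    def subscribe(self, language: str):
        self.subscribers.add(language)

    def publish(self, segment: str, source_lang: str):
        # One incoming segment, one translated copy per subscribed language.
        for lang in self.subscribers:
            self.feeds[lang].append(translate(segment, source_lang, lang))

broker = TranslationBroker()
broker.subscribe("en")
broker.subscribe("de")
broker.publish("Willkommen zur EuroTier", source_lang="de")
print(broker.feeds["en"][0])   # Welcome to EuroTier
print(broker.feeds["de"][0])   # Willkommen zur EuroTier
```

In a real deployment the publish step would sit behind the message broker mentioned in the text, and each user's device would consume only its own language feed.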
AI Lexicon
Neural Machine Translation (NMT): NMT systems are based on artificial neural networks whose main components are an encoder that processes the input text, a decoder that generates the translation, and an attention mechanism that focuses on relevant parts of the input. The network is trained on large amounts of parallel text in different languages and learns to recognise patterns and relationships between words and phrases across those languages. Training takes place without explicit linguistic rules, through statistical analysis of the training data.
Automatic Speech Recognition (ASR): ASR is the first step towards full speech understanding and converts spoken language into text. Modern ASR systems such as Whisper from OpenAI or wav2vec 2.0 from Facebook often use end-to-end deep learning approaches such as transformer-based architectures.
Natural Language Understanding (NLU): NLU processes the text generated by the ASR to understand the meaning and intent of the user. NLU systems typically extract intents and entities from the user text and are often based on large language models (LLMs).
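The attention mechanism named in the NMT entry can be illustrated with a minimal sketch: the decoder's current state (the query) is scored against each encoder state (the keys), and the softmax of those scores weights the encoder outputs (the values) into a context vector. Plain Python lists stand in for the tensors a real system would use.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Return a context vector: values weighted by query-key similarity."""
    scale = math.sqrt(len(query))                       # scaled dot-product
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# The query aligns with the second key, so the context vector leans
# strongly toward the second value.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
context = attention([0.0, 5.0], keys, values)
print(context)
```

This is one attention step; a full NMT decoder repeats it for every generated token, each time re-weighting the encoder states.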
Transcription takes place in parallel to the spoken word, with the tool making adjustments such as capitalisation and punctuation during pauses in speech; any inappropriate expressions are automatically replaced by symbols in the text. It differentiates between speakers based on vocal timbre. Because it was also specially designed for public institutions and is therefore operated on German servers with certified IT security in compliance with the GDPR, transcribbyAI proved to be more than just a cost-effective alternative for DLG after its trial run at the last Agritechnica: "Automatic speech recognition is exactly what we need and makes simultaneous translation at our trade fairs so much easier," says Vagt.
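A post-processing pass of the kind described - finalising capitalisation and punctuation during pauses and replacing flagged expressions with symbols - might look roughly like this. The word list, regular expression and rules are illustrative only, not the tool's actual logic.

```python
import re

# Illustrative blocklist; a real system would use a maintained lexicon.
BLOCKLIST = {"dammit"}

def mask_profanity(text: str) -> str:
    """Replace any blocklisted word with a run of '#' symbols."""
    def repl(match):
        word = match.group(0)
        return "#" * len(word) if word.lower() in BLOCKLIST else word
    return re.sub(r"[A-Za-z']+", repl, text)

def finalise_segment(raw: str) -> str:
    """Clean up a raw ASR segment once a pause marks it as complete."""
    text = mask_profanity(raw.strip())
    text = text[0].upper() + text[1:]     # capitalise the first letter
    if text[-1] not in ".!?":
        text += "."                       # close the sentence
    return text

print(finalise_segment("dammit the feed mixer is offline"))
# ###### the feed mixer is offline.
```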
Digital inclusion
He continues: "With the written and translated content, we offer the audience, experts and partners an additional service and increase the added value of our events." The tool also offers options for export and further processing, integration into external systems and customisation to the respective brand design. What's more, thanks to digital inclusion, more people - such as those with visual and hearing impairments - also benefit from improved access to content. "In addition to the aspect of accessibility, the big advantage for us as organisers is that we can choose from a much larger pool of speakers," says Dr Andreas Närmann, one of the DSAG working group spokespersons for HR. Närmann co-initiated the first AI-supported presentation at this year's Personnel Days organised by the German-speaking SAP User Group (DSAG) in the Osnabrückhalle: with the help of OpenAI Whisper and the cloud-based Google Translate, the venue team presented its own AI-based real-time translation technology to around 900 attendees. Whisper, an automatic speech recognition (ASR) system, is a powerful model that uses a transformer-based encoder-decoder architecture and was trained on a dataset of 680,000 hours of multilingual and multitask supervised data. According to its developer OpenAI, it handles accents, background noise and (technical) terms well.

Speech to text: At the beginning of June 2024, the Osnabrückhalle used artificial intelligence for the first time for real-time translations during a presentation at this year's Personnel Days organised by the German-speaking SAP User Group (DSAG). An English-language presentation was provided live with German subtitles. Photo: Osnabrückhalle
It should be able to transcribe around 60 different languages into text and translate them into English - but not entirely without technological hurdles: "Many parameters have to be set precisely - for example, when a sentence ends and when a part is translated or corrected. Different language speeds of the speakers require individual adjustments that cannot be implemented dynamically, which can occasionally lead to mistranslations," says Shawn Hellmann, AI & Event Technology employee at Osnabrückhalle. The German captions are ultimately created using Google's free translator. This also reveals the limitations that organisers should be aware of when using AI-supported simultaneous translation tools.
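The segmentation problem Hellmann describes - deciding when a partial hypothesis is final enough to translate - can be sketched as a simple rule: flush a segment on end punctuation, or when the pause before the next word exceeds a fixed threshold. The threshold value and the (token, pause) input format are illustrative assumptions; the fixed threshold is exactly the parameter that, as noted, cannot suit every speaking speed.

```python
# Illustrative pause threshold in seconds - not a real product setting.
PAUSE_THRESHOLD = 0.8

def segment_stream(words, pause_threshold=PAUSE_THRESHOLD):
    """words: list of (token, pause_before_in_seconds).
    Returns the list of finalised caption segments."""
    segments, current = [], []
    for token, pause in words:
        if current and pause > pause_threshold:
            segments.append(" ".join(current))   # long pause: flush segment
            current = []
        current.append(token)
        if token[-1] in ".!?":
            segments.append(" ".join(current))   # end punctuation: flush
            current = []
    if current:
        segments.append(" ".join(current))       # flush whatever remains
    return segments

stream = [("Welcome", 0.0), ("everyone.", 0.2), ("Today", 1.5), ("we", 0.1),
          ("talk", 0.1), ("about", 0.1), ("poultry", 0.1)]
print(segment_stream(stream))
# ['Welcome everyone.', 'Today we talk about poultry']
```

A fast speaker rarely pauses 0.8 seconds, so their segments grow too long; a slow speaker trips the threshold mid-sentence - which is why, as Hellmann notes, static parameters occasionally produce mistranslations.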
Word-perfect, but without cultural understanding
In order to analyse and translate voice data, many of these systems have to use cloud services, which can raise data protection concerns. The processing and possibly storage of conversations and speeches - especially for sensitive or confidential topics - is an important point that event organisers should include in their planning and communication to ensure the trust of participants. In addition to technical limitations and disruptions, one of their biggest problems is the (still) lacking ability to comprehensively analyse context: human interpreters often take into account the previous discussion or the entire context of a presentation in order to find the right tone and choice of words. AI-supported tools, on the other hand, usually only process a few sentences at a time and therefore have difficulty recognising further contextual connections. This can lead to misunderstandings, especially in longer speeches or presentations. Simultaneous machine translation also (still) finds it difficult to recognise and translate cultural nuances and idiomatic expressions. It is still easier for human interpreters, who bring not only language skills but also cultural understanding to the table, to translate linguistic images or region-specific expressions correctly and to capture the intention of a statement.
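The limited context window described above can be made concrete with a small sketch: the translator only "sees" a rolling window of recent sentences, so anything said earlier has already fallen out of view. The WindowedTranslator class and its window size are hypothetical; the translate method simply returns the prompt it would send, to show how little history actually reaches the model.

```python
from collections import deque

class WindowedTranslator:
    """Keeps only the last `window_size` sentences as context."""
    def __init__(self, window_size=3):
        self.context = deque(maxlen=window_size)

    def translate(self, sentence: str) -> str:
        # A real NMT call would receive `prompt`; returning it here
        # makes the visible history explicit.
        prompt = list(self.context) + [sentence]
        self.context.append(sentence)
        return " | ".join(prompt)

t = WindowedTranslator(window_size=2)
for s in ["A", "B", "C", "D"]:
    out = t.translate(s)
print(out)   # 'B | C | D' - sentence "A" has already fallen out of view
```

A reference back to something said in sentence "A" is therefore invisible by the time "D" is translated, which is precisely where human interpreters, who retain the whole discussion, still have the edge.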

Many AI translation tools can often be easily integrated into existing event management platforms or mobile event apps or can be used browser-based via a mobile phone. Photo: Kudo AI
"In short, human interpreting remains the highest quality form of linguistic accessibility. You also get the emotion and nuance of a real human voice and the expertise of someone trained in the terminology and context of specific industries or topics," says the web conferencing platform Kudo, which offers both a network of over 12,000 professional interpreters and, since January 2023, AI-based language translation (Kudo AI) for simpler use cases. And yet "the quality of AI language translation today is high - higher than most people expect", the American company with a branch in Geneva is certain. The Kudo AI language translator now handles over 45 languages and can be embedded into event apps such as Eventmobi or webcasting platforms such as GlobalMeet or Microsoft Teams rather than running stand-alone, but can also be used at live events. Like transcribbyAI, it translates from speech to speech, allowing speakers to be heard in their preferred language without having to follow the subtitles. The tool's special feature, however, is the conversation mode available since this summer: users of the platform now have the additional option of activating subtitles at increased speed and thus holding a conversation in real time. When translating back and forth, the subtitles appear on screen in the desired language with a maximum delay of one to two seconds, according to Kudo. The German company Silutions GmbH has been working in a similar way with the tool Interprefy AI for some time now. The headphones from Silutions not only make it possible to block out the ambient noise of a congress or trade fair hall and thus improve the flow of information, but also to translate the event via up to ten different channels. "Compared to traditional interpreters, [audio-to-audio translation] is more efficient and allows speakers to speak in their native language.
This keeps the messages real and clear," summarises Jan Filipzik, Senior Manager Marketing & AI at VOK DAMS Events and Live Marketing. "Every guest," says Filipzik, "is reached in their own language, regardless of whether it is spoken by many or just one participant. Language barriers are eliminated and the content takes centre stage. This increases engagement and ensures greater satisfaction." So while modern AI-supported simultaneous translation tools already offer many advantages, the technologies are only at the beginning of their development - and simply overcoming language barriers does not automatically lead to mutual understanding. Continuous learning and fine-tuning of the models on domain-specific data are therefore important for optimising performance, increasing accuracy and sharpening context analysis. Every deployment and every round of targeted training helps here - as the DLG is currently demonstrating at all Expert Stages of EuroTier, EnergyDecentral and the Inhouse Farming - Feed & Food Show.