Multilingual speech recognition and speaker diarization: systems and applications
Speaker diarization, the task of determining who spoke when, is applied to the analysis of a wide range of audio data, including media broadcasts, conference conversations, online content on social networks, and court hearings. It also improves the accuracy of speech recognition in multi-speaker environments.
Speech recognition and diarization can be performed in (a) offline or (b) streaming (real-time) mode. In offline mode, the input is typically a single complete speech recording: the system can use the entire content, make multiple processing passes, and is not bound by a strict limit on computation time. Streaming mode is more complex, because results must be produced while the audio is still arriving. It has become crucial as data exploration fields increasingly need to analyze varied data streams rapidly, ideally in real time.
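The difference between the two modes can be illustrated with a toy example. The sketch below is not a recognition or diarization system. It applies simple mean normalization to a list of samples to show the constraint each mode imposes. The function names `offline_normalize` and `streaming_normalize` and the sample values are hypothetical, chosen only for illustration.

```python
from typing import Iterable, Iterator, List


def offline_normalize(samples: List[float]) -> List[float]:
    """Offline mode (illustrative): the whole recording is available,
    so one pass computes a global mean and a second pass applies it."""
    if not samples:
        return []
    mean = sum(samples) / len(samples)      # pass 1: global statistic
    return [x - mean for x in samples]      # pass 2: use it everywhere


def streaming_normalize(chunks: Iterable[List[float]]) -> Iterator[List[float]]:
    """Streaming mode (illustrative): each chunk is emitted as soon as it
    arrives, using only samples seen so far (single pass, no look-ahead)."""
    total, count = 0.0, 0
    for chunk in chunks:
        if not chunk:
            continue                        # nothing to emit for an empty chunk
        total += sum(chunk)
        count += len(chunk)
        running_mean = total / count        # statistic over the past only
        yield [x - running_mean for x in chunk]


# Hypothetical toy "recording", split into two chunks for the streaming case.
recording = [1.0, 3.0, 5.0, 7.0]
offline_out = offline_normalize(recording)
streaming_out = list(streaming_normalize([recording[:2], recording[2:]]))

print(offline_out)    # [-3.0, -1.0, 1.0, 3.0]
print(streaming_out)  # [[-1.0, 1.0], [1.0, 3.0]]
```

The first chunk is normalized differently in the two modes because the streaming version cannot see the samples that arrive later. Under the same assumption, a streaming recognizer or diarizer has to commit to early decisions with partial information, which is one reason the streaming case is harder.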
[…]