Voice-based digital assistants are on the rise. These systems process a stream of audio data and extract information from it. Such an audio stream often contains multiple voices. For example, think of a telephone conference held in a meeting room where several people speak into a single microphone.
While software that translates speech into text is widely available today, many applications benefit from another piece of information: an answer to the question of who spoke when. Xelera Technologies provides an AI module for speech-processing systems which distinguishes voices and splits a multi-voice audio stream into separate streams according to the different speakers within a conversation.
The Speaker Diarization module distinguishes the voices of unknown speakers; no pre-training on known speakers is required. It can also perform speaker identification thanks to its ability to remember identified voice profiles. Downstream, the Speaker Recognition module (also referred to as the Speaker Diarization module) can be combined with speech-to-text and natural language processing frameworks in order to assign a speaker label to the recognized text, as sketched below.
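To illustrate this combination, the following sketch shows how speaker-labeled time segments from a diarization step could be merged with a time-stamped speech-to-text transcript. The data shapes and the merging helper are illustrative assumptions, not the module's actual output format or API.

```python
# Illustrative only: assumed data shapes, not the module's actual output format.
# Diarization result: list of (start_sec, end_sec, speaker_label) segments.
diarization = [
    (0.0, 4.2, "speaker_1"),
    (4.2, 9.8, "speaker_2"),
    (9.8, 15.0, "speaker_1"),
]

# Speech-to-text result: list of (start_sec, end_sec, text) phrases.
transcript = [
    (0.5, 3.9, "Good morning everyone."),
    (4.4, 9.1, "Thanks, let's start with the quarterly numbers."),
    (10.1, 14.6, "The figures are on the next slide."),
]

def label_transcript(diarization, transcript):
    """Assign each transcribed phrase to the speaker whose diarization
    segment overlaps it the most (hypothetical merging strategy)."""
    labeled = []
    for t_start, t_end, text in transcript:
        best_speaker, best_overlap = None, 0.0
        for d_start, d_end, speaker in diarization:
            overlap = min(t_end, d_end) - max(t_start, d_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append((best_speaker, text))
    return labeled

for speaker, text in label_transcript(diarization, transcript):
    print(f"{speaker}: {text}")
```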
Application developers can connect to the module via a Python API, a REST API, or a C++ API. The module is available for on-premises deployment as well as a cloud service. If you are interested, request a live demo by sending an email to sales@xelera.io.
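As a rough sketch of what a REST API call might look like, the snippet below submits an audio file to a diarization endpoint using Python's requests library. The endpoint URL, field names, and response layout are assumptions for illustration only; the actual interface is documented with the module.

```python
# Hypothetical example: the endpoint URL, request fields, and response layout
# below are assumptions, not the documented Xelera API.
import requests

DIARIZATION_URL = "http://localhost:8080/diarize"  # assumed on-premises endpoint

with open("meeting.wav", "rb") as audio_file:
    response = requests.post(
        DIARIZATION_URL,
        files={"audio": audio_file},   # assumed multipart field name
        data={"min_speakers": 2},      # assumed optional parameter
        timeout=60,
    )
response.raise_for_status()

# Assumed response shape: a list of speaker-labeled time segments.
for segment in response.json().get("segments", []):
    print(segment.get("speaker"), segment.get("start"), segment.get("end"))
```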