Domain: Speech & Audio AI
Maps audio signals to linguistic units such as phonemes, characters, or subword tokens.
Aligns transcripts with audio timestamps.
Generates audio waveforms from spectrograms.
Temporal and pitch characteristics of speech, such as rhythm, stress, and intonation.
Identifying speakers in audio.
Converting audio speech into text, often using encoder-decoder or transducer architectures.
Generating human-like speech from text.
Generating speech audio from text, with control over prosody, speaker identity, and style.
Changing speaker characteristics while preserving content.
Detects trigger phrases in audio streams.
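
As an illustration of the entry on aligning transcripts with audio timestamps, here is a minimal sketch of forced alignment by dynamic programming. It assumes that some acoustic model has already produced per-frame log-probabilities over linguistic units, and that every unit in the transcript covers at least one frame. The function name force_align, the toy probabilities, and the omission of blank or silence symbols are assumptions made for this example, not a description of any particular toolkit.

```python
import math


def force_align(log_probs, tokens):
    """Monotonic alignment of a known transcript to frames (hypothetical helper).

    log_probs: list of T frames, each a list of log-probabilities per unit.
    tokens:    list of N unit indices, in transcript order.
    Returns a list of (unit, first_frame, last_frame), with last_frame inclusive.
    """
    T, N = len(log_probs), len(tokens)
    if N == 0 or T < N:
        raise ValueError("need at least one frame per transcript unit")

    NEG = -math.inf
    # score[t][n]: best log-score of frames 0..t with frame t assigned to tokens[n]
    score = [[NEG] * N for _ in range(T)]
    # stay[t][n]: True if frame t-1 was assigned to the same transcript position
    stay = [[True] * N for _ in range(T)]
    score[0][0] = log_probs[0][tokens[0]]

    for t in range(1, T):
        for n in range(N):
            keep = score[t - 1][n]
            advance = score[t - 1][n - 1] if n > 0 else NEG
            best = max(keep, advance)
            if best == NEG:
                continue  # this cell cannot be reached by a monotonic path
            stay[t][n] = keep >= advance
            score[t][n] = best + log_probs[t][tokens[n]]

    # Backtrace from the last frame, which must be assigned to the last unit.
    path = [0] * T
    n = N - 1
    for t in range(T - 1, -1, -1):
        path[t] = n
        if t > 0 and not stay[t][n]:
            n -= 1

    # Merge consecutive frames that share a transcript position into segments.
    segments = []
    start = 0
    for t in range(1, T + 1):
        if t == T or path[t] != path[start]:
            segments.append((tokens[path[start]], start, t - 1))
            start = t
    return segments


# Toy input: 5 frames, 3 possible units, transcript is units 0, 1, 2.
demo_probs = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.10, 0.80, 0.10],
    [0.10, 0.10, 0.80],
    [0.05, 0.05, 0.90],
]
demo_log_probs = [[math.log(p) for p in frame] for frame in demo_probs]
demo_segments = force_align(demo_log_probs, [0, 1, 2])
```

Frame indices convert to timestamps by multiplying by the frame hop. With a hop of 10 ms (an assumed value), a segment spanning frames 3 to 4 would run from 0.03 s to 0.05 s. Systems built on CTC-style models typically also allow a blank symbol between units, which this sketch leaves out for brevity.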