Abstract: Given a piece of text, a video clip, and reference audio, the movie dubbing (also known as Visual Voice Cloning, V2C) task aims to generate speech that clones the reference voice and aligns well ...
Abstract: Neural network models have achieved state-of-the-art performance on grapheme-to-phoneme (G2P) conversion. However, their performance relies on large-scale pronunciation dictionaries, which ...
Sony Entertainment Group has outlined plans to expand the use of artificial intelligence in anime and video games as part of its 2025 corporate report. The company confirmed that machine-learning ...
This is the official repository of our paper g2pW: A Conditional Weighted Softmax BERT for Polyphone Disambiguation in Mandarin (INTERSPEECH 2022). (This work was tested with PyTorch 1.7.0, CUDA 10.1, ...