Python Mel Spectrogram

WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling

WavTTS is an end-to-end zero-shot TTS framework that generates speech directly in the raw waveform space, without relying on intermediate acoustic representations such as mel-spectrograms, VAE latents ...
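For readers who arrived via the query above, here is a minimal NumPy sketch of how a power mel spectrogram is usually computed: frame the signal, window it, take an FFT, and map the linear frequency bins onto mel bands with triangular filters. It is not WavTTS code, since WavTTS avoids mel-spectrograms. The HTK-style mel formula, `n_fft=1024`, `hop=256`, `n_mels=80`, and the function names are illustrative assumptions. The filters are also not area-normalized, whereas librosa's default (Slaney) filters are.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (assumption; librosa defaults to the Slaney variant)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping n_fft//2 + 1 linear bins to n_mels mel bands."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    # n_mels + 2 equally spaced mel points give each filter's left/center/right edges
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mel_spectrogram(y, sr, n_fft=1024, hop=256, n_mels=80):
    """Power mel spectrogram with shape (n_mels, n_frames)."""
    y = np.asarray(y, dtype=float)
    window = np.hanning(n_fft)  # Hann window applied to every frame
    # Frame the signal without padding; assumes len(y) >= n_fft
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2  # (n_frames, n_bins)
    return mel_filterbank(sr, n_fft, n_mels) @ power.T

# Usage: one second of a 440 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t)
S = mel_spectrogram(y, sr)
log_S = 10.0 * np.log10(S + 1e-10)  # dB scale, as typically fed to models
print(S.shape)  # → (80, 59)
```

In practice, `librosa.feature.melspectrogram` or `torchaudio.transforms.MelSpectrogram` would normally be used instead. The sketch only makes the steps explicit.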

note

The Approach of Thunder

When investigating lightning, focusing not only on the light (the flash) but also on the sound (the thunder) is an effective approach. Thunder is a pressure wave in the atmosphere and carries ...

Scientific Research Publishing

UNESCO (2021) Towards Sustainable Preservation and Accessibility of Documentary Heritage.

ABSTRACT: The aim of this research is to develop a speech synthesis model tailored towards Nigerian languages by leveraging natural language processing tools such as FastSpeech 2 and meta-tts for ...

Scientific Research Publishing

Tan, Y. and Jehom, W.J. (2024) Preservation, Digital Technology & Culture, 53, 165-177.

IEEE

Noise Pollution Classification Using Deep Learning

Abstract: With the rapid growth of industrialization and urbanization, noise pollution has become a significant yet often overlooked threat to our environment. Transportation, human ...

Frontiers

A deep learning-based data augmentation method for marine mammal call signals

In marine ecology research, it is crucial to accurately identify the marine mammal species active in the target area during the current season, which helps researchers understand the behavioral ...

Nature

VocalMind: A Stereotactic EEG Dataset for Vocalized, Mimed, and Imagined Speech in Tonal Language

Speech BCIs based on implanted electrodes hold significant promise for enhancing spoken communication through high temporal resolution and invasive neural sensing. Despite the potential, acquiring ...

eLife

Unsupervised discovery of family specific vocal usage in the Mongolian gerbil

This valuable study provides an experimental paradigm and state-of-the-art analysis method for studying the existence of call types and transition differences among Mongolian gerbil families in a ...

Frontiers

SR-TTS: a rhyme-based end-to-end speech synthesis system

Deep learning has significantly advanced text-to-speech (TTS) systems. These neural network-based systems have enhanced speech synthesis quality and are increasingly vital in applications like ...

GitHub

Masked Spectrogram Modeling using Masked Autoencoders (MSM-MAE)

🎉 The successor to this repository, Masked Modeling Duo (M2D), is now available. If you are starting a new project, please use M2D instead of this repository. The table below compares EVAR benchmark ...
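The MSM-MAE title names the technique: hide most patches of a spectrogram and train a model to reconstruct them. Below is a minimal NumPy sketch of only the masking step. It assumes MAE-style random masking of non-overlapping 16x16 patches at a 0.75 mask ratio and an 80 x 608 input. These sizes and the helper names `patchify` and `random_mask` are illustrative assumptions, not taken from the repository. The encoder and decoder are omitted.

```python
import numpy as np

def patchify(spec, patch=16):
    """Split a (freq, time) array into non-overlapping patch x patch tiles.
    Trailing rows/columns that do not fill a whole patch are dropped."""
    F, T = spec.shape
    f, t = F // patch, T // patch
    x = spec[: f * patch, : t * patch]
    # (f, patch, t, patch) -> (f, t, patch, patch): group pixels by tile
    x = x.reshape(f, patch, t, patch).transpose(0, 2, 1, 3)
    return x.reshape(f * t, patch * patch)  # (num_patches, patch * patch)

def random_mask(patches, mask_ratio=0.75, rng=None):
    """MAE-style random masking: keep a random subset of patches visible."""
    rng = np.random.default_rng() if rng is None else rng
    n = patches.shape[0]
    n_keep = int(n * (1.0 - mask_ratio))  # floor of the visible count
    keep_idx = np.sort(rng.permutation(n)[:n_keep])
    mask = np.ones(n, dtype=bool)  # True = masked (target for reconstruction)
    mask[keep_idx] = False
    return patches[keep_idx], keep_idx, mask

# Usage with random data standing in for a log-mel spectrogram
rng = np.random.default_rng(0)
spec = rng.standard_normal((80, 608))
patches = patchify(spec)  # 5 x 38 = 190 patches of 256 values each
visible, keep_idx, mask = random_mask(patches, 0.75, rng)
print(patches.shape, visible.shape, int(mask.sum()))
```

Only the visible patches would go to the encoder. The reconstruction loss would then be computed on the patches where `mask` is `True`.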

Nature

Environmental sound classification using temporal-frequency attention based convolutional neural network

Environmental sound classification is one of the important issues in the audio recognition field. Compared with structured sounds such as speech and music, the time–frequency structure of ...