text-to-audio-latent-diffusion
Updated Aug 25, 2023 - Python
Soundstorm is a cutting-edge AI-powered audio manipulation application designed to provide a rich yet simplified experience for sound designers, algorithmic composers, and experimental audio enthusiasts. From sample pack creation and algorithmic composition to AI text-to-audio and onscreen ChatGPT, Soundstorm is a sonic powerhouse.
An audiobook sound effect generator that transforms SRT files into immersive audio experiences. It parses SRT files, uses ChatGPT to create sound effect prompts, generates sounds via the ElevenLabs API, and syncs the audio on an MP3 timeline.
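The first step of that pipeline, reading cue timings out of an SRT file, can be sketched with the standard library alone. This is a minimal illustration and not the project's actual code: the names `Cue`, `parse_srt`, and `SAMPLE` are hypothetical, and the ChatGPT and ElevenLabs steps are left out because their request details are not given here. The millisecond offsets it produces are what a later step would use to place each generated sound on the timeline.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:01,500 --> 00:00:04,000".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    """One subtitle cue (hypothetical helper type for this sketch)."""
    start_ms: int
    end_ms: int
    text: str


def _to_ms(hours, minutes, seconds, millis):
    """Convert timestamp fields to an offset in milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_srt(content):
    """Parse SRT text into a list of Cue objects, skipping malformed blocks."""
    cues = []
    normalized = content.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                fields = match.groups()
                # Everything after the timing line is the cue text.
                text = " ".join(part.strip() for part in lines[i + 1:] if part.strip())
                cues.append(Cue(_to_ms(*fields[:4]), _to_ms(*fields[4:]), text))
                break
    return cues


# Illustrative input, invented for this example.
SAMPLE = """1
00:00:01,500 --> 00:00:04,000
Rain hammered the tin roof.

2
00:01:02,250 --> 00:01:05,000
A door creaked open
somewhere below.
"""

cues = parse_srt(SAMPLE)
for cue in cues:
    print(cue.start_ms, cue.end_ms, cue.text)
```

Each cue's `text` would then serve as the input for prompt generation, and `start_ms` as the position of the resulting clip in the output audio.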
AI Audio Framework 🎵
Production-ready voice agents and speech pipelines: STT → LLM/Agent → TTS, voice receptionists, telephony, call recording, and tool/function calling. Built with Twilio, OpenAI Whisper, ElevenLabs, Vapi/Retell, FastAPI, WebSockets, and ffmpeg; designed for deployment, monitoring, and real-world reliability.
VoxForge Pro is a premium, offline audiobook generator powered by Kokoro-82M & Chatterbox TTS. Transform PDFs and text into professional audio using 47 lifelike voices across 6 languages. Features include voice cloning, smart OCR for scanned documents, and multi-speaker narration support.
SoundScroll is an AI audiobook generator.
This project demonstrates real-time audio processing using Python. It captures audio from a microphone, converts the speech to text, and then synthesizes the text back to speech using a different voice. This can be useful for applications such as voice changers, real-time translation, and more.
AI-based music mood generation and remix system using MusicGen
Streamlining Text-to-Speech Tasks Using Google Colab