Meta has released a new open-source AI tool called AudioCraft. According to the company, the tool is designed to let professional musicians and everyday users alike create audio and music from simple text prompts.
AudioCraft is made up of three models: MusicGen, AudioGen, and EnCodec. MusicGen is trained on music that Meta owns or has specifically licensed, and it generates music from text prompts. AudioGen, by contrast, is trained on public sound effects and generates audio from text prompts. Meta has also improved the EnCodec decoder, which allows higher-quality music generation with fewer unwanted artifacts.
Uses of the New AudioCraft Tool
Meta is making its pre-trained AudioGen models available, letting users generate environmental sounds and sound effects such as a dog barking, cars honking, or footsteps on a wooden floor. Meta is also sharing all of the model weights and code for AudioCraft. The tool has applications in music composition, sound-effect generation, audio compression, and general audio generation.
By open-sourcing these models, Meta aims to let researchers and practitioners train their own models on their own datasets.
Meta says that while generative AI has made significant strides in images, video, and text, audio has lagged behind. AudioCraft is intended to close this gap by offering a more accessible, user-friendly platform for generating high-quality audio.
In its official blog, Meta explains that generating realistic, high-fidelity audio is particularly challenging because it requires modeling complex signals and patterns at different scales. Music is especially difficult because it combines local and long-range patterns, from a short sequence of notes to the overall structure of a piece played by multiple instruments.
According to Meta, AudioCraft can produce high-quality audio that stays consistent over long durations. The company adds that it simplifies the overall design of generative audio models, making it easier for users to build on and experiment with the existing models.