Introduction
At Synergy DSP, we’ve always pushed the boundaries of digital signal processing. Our journey has taken us from pioneering high-fidelity audio plugins to designing cutting-edge machine learning (ML) models for audio. Now, we’re excited to announce the next step in our evolution: entering the AudioML space with unsupervised learning, using our Fusion DSP and Fusion ML libraries to power a new generation of audio tools—specifically VST plugins that take advantage of these advanced ML techniques.
In this blog post, we’ll explore the motivation behind this leap, provide a glimpse into our research efforts, and share a roadmap for how unsupervised learning will shape the future of our VST offerings.
The Convergence of DSP & ML
Why AudioML?
As audio content creation becomes more sophisticated, we see greater demand for tools that can address the subtleties of sound design—tools that move beyond fixed algorithms or meticulously hand-tuned filters. Traditional audio plugins often rely on expert knowledge embedded into well-known DSP structures. While these techniques are powerful, they can be limited when confronted with nuanced tasks or large volumes of novel audio content.
Machine learning presents a potent alternative. Rather than relying solely on pre-defined logic, ML-powered systems learn directly from data. This adaptability opens up possibilities for automation, creativity, and efficiency that are out of reach for traditional DSP alone.
The Rise of Unsupervised Learning
Within the realm of ML, unsupervised learning is rapidly gaining traction. Supervised learning requires labeled examples (e.g., a dataset of instruments labeled as “guitar,” “violin,” “piano”), whereas unsupervised learning thrives on unlabeled datasets—finding patterns, structures, or clusters in the data without needing explicit labels.
For audio, the sheer amount of unlabeled data available is staggering. From raw samples recorded in nature to vast libraries of music, the opportunities to unearth latent patterns are enormous. Unsupervised learning can reveal hidden structures, blend sound characteristics, and open doors to entirely new approaches in audio processing and synthesis.
Synergy DSP’s Journey
Fusion DSP & Fusion ML
At Synergy DSP, we’ve been working on two foundational libraries:
Fusion DSP: Our flagship DSP library, built with high-performance algorithms and meticulously engineered for low latency and high-quality audio processing. Fusion DSP powers our existing VSTs, both commercial and custom solutions.
Fusion ML: Our internal machine learning engine, optimized for audio tasks. Fusion ML features cutting-edge architectures, specialized input layers for raw audio, and advanced transformations to handle the complexities of time-frequency analysis.
These two libraries have grown side by side, and now we’re at the point where integrating them can unlock new frontiers in real-time audio processing.
Unsupervised Learning Meets Audio
What Does Unsupervised Audio Processing Look Like?
Typical audio ML solutions, like speech recognition or instrument classification, depend on labeled datasets. Our approach focuses instead on tasks that need no labels at all:
Automatic Feature Extraction: Rather than handcrafting features like spectral centroids or MFCCs, unsupervised models learn to detect and extract meaningful audio features on their own, whether it's a subtle timbral quality or a recurring rhythmic pattern (a minimal sketch follows this list).
Audio Texture Synthesis: By learning the "signature" of a sound from unlabeled examples, unsupervised models can generate novel textures that faithfully capture the essence of the original. This is particularly exciting for sound designers and composers who need new sonic palettes.
Generative Sound Design: Think of an unsupervised model that's trained on thousands of ambient recordings. It could then generate entirely new ambiences or transitions that capture and blend the stylistic nuances of all those sources, without relying on labeled data.
Adaptive Noise Reduction: Models that self-adjust to new noise profiles by clustering different segments of audio and automatically learning how to reduce or remove unwanted artifacts, all without needing explicitly labeled noise examples (see the second sketch after this list).
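To make the feature-extraction idea concrete, here is a minimal sketch of the kind of model involved. It is illustrative PyTorch/torchaudio code rather than Fusion ML, and every name in it (the input file, the class, the layer sizes) is hypothetical. The point is simply that an autoencoder trained only to reconstruct mel-spectrogram frames is forced to summarize each frame in a compact latent vector, with no labels anywhere in the loop:

```python
# A minimal unsupervised feature extractor: an autoencoder on mel spectrograms.
# Sketch only, using PyTorch/torchaudio (not Fusion ML); all names illustrative.
import torch
import torch.nn as nn
import torchaudio

N_MELS = 64          # mel bands per spectrogram frame
LATENT_DIM = 16      # size of the learned feature vector

class SpectrogramAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: compress one mel frame into a small latent feature vector.
        self.encoder = nn.Sequential(
            nn.Linear(N_MELS, 128), nn.ReLU(),
            nn.Linear(128, LATENT_DIM),
        )
        # Decoder: reconstruct the frame from the latent vector.
        self.decoder = nn.Sequential(
            nn.Linear(LATENT_DIM, 128), nn.ReLU(),
            nn.Linear(128, N_MELS),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

# Turn raw audio into mel-spectrogram frames (shape: [frames, N_MELS]).
waveform, sr = torchaudio.load("some_audio.wav")  # hypothetical input file
mel = torchaudio.transforms.MelSpectrogram(sample_rate=sr, n_mels=N_MELS)(waveform)
frames = mel[0].T.log1p()  # log-compress; each row is one time frame

model = SpectrogramAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Train purely on reconstruction error: no labels anywhere.
for epoch in range(10):
    recon, _ = model(frames)
    loss = nn.functional.mse_loss(recon, frames)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# After training, the encoder output is a learned per-frame feature vector.
_, features = model(frames)
```

Those per-frame vectors can then feed clustering, similarity search, or the latent-space morphing experiments described below.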
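The adaptive noise-reduction idea can be sketched the same way with off-the-shelf tools: cluster short-time spectra, treat the lowest-energy cluster as the noise profile, and subtract it. A production plugin would be far more careful (real-time constraints, smoothing, artifact control), and the filename and parameters below are purely illustrative:

```python
# Sketch: cluster STFT frames, take the lowest-energy cluster as the noise
# profile, and apply simple spectral subtraction. Illustrative only.
import numpy as np
import librosa
import soundfile as sf
from sklearn.cluster import KMeans

audio, sr = librosa.load("noisy_take.wav", sr=None)  # hypothetical input
stft = librosa.stft(audio, n_fft=2048, hop_length=512)
mag, phase = np.abs(stft), np.angle(stft)

# Cluster frames by their log-magnitude spectra (one row per frame).
log_frames = np.log1p(mag.T)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(log_frames)

# Heuristic: the cluster with the lowest mean energy is "noise-like".
energies = [log_frames[labels == k].mean() for k in range(4)]
noise_profile = mag.T[labels == np.argmin(energies)].mean(axis=0)

# Spectral subtraction against the learned profile, clamped at zero.
clean_mag = np.maximum(mag - noise_profile[:, None], 0.0)
clean = librosa.istft(clean_mag * np.exp(1j * phase), hop_length=512)
sf.write("denoised_take.wav", clean, sr)
```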
Our Path Forward
1. Research
Latent Space Exploration
We will test generative models that create a "latent space" representation of audio, with the goal of morphing between diverse sounds (e.g., from a bird call to an eerie synth texture) in an intuitive way.
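In practice, morphing in a latent space usually comes down to interpolating between the encoded vectors of two sounds and decoding each intermediate point. The sketch below shows just the interpolation math; the placeholder vectors z_a and z_b stand in for encoder outputs from a trained model such as the autoencoder sketched earlier:

```python
# Latent-space morphing: interpolate between two latent vectors.
# z_a and z_b stand in for encoder outputs of two different sounds.
import numpy as np

def lerp(z_a, z_b, t):
    """Straight-line interpolation; simple, but can drift off the data manifold."""
    return (1.0 - t) * z_a + t * z_b

def slerp(z_a, z_b, t):
    """Spherical interpolation; often better behaved for generative models."""
    omega = np.arccos(np.clip(
        np.dot(z_a / np.linalg.norm(z_a), z_b / np.linalg.norm(z_b)), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return lerp(z_a, z_b, t)
    return (np.sin((1.0 - t) * omega) * z_a + np.sin(t * omega) * z_b) / np.sin(omega)

rng = np.random.default_rng(0)
z_a, z_b = rng.normal(size=16), rng.normal(size=16)  # placeholder latents

# Decoding each step with the model's decoder yields the audible morph.
path = [slerp(z_a, z_b, t) for t in np.linspace(0.0, 1.0, 9)]
```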
Style Transfer for Audio
Building on the idea of style transfer in images, we will explore how to transfer the “style” of one sound to another—completely transforming the timbre, texture, or ambiance in real time.
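To give a feel for what "style" can mean here, one approach carried over from image style transfer matches Gram matrices of convolutional features computed on spectrograms: the Gram matrix captures which feature channels co-occur, which tracks texture and timbre rather than the exact notes. The toy loss below follows that recipe; it is not our shipping algorithm, and all shapes and weights are illustrative:

```python
# Toy audio style loss in the spirit of image style transfer: match Gram
# matrices of 1-D conv features over two spectrograms. Sketch only.
import torch
import torch.nn as nn

def gram(features):
    # features: [channels, time] -> [channels, channels] correlation matrix
    t = features.shape[1]
    return features @ features.T / t

# Even random (untrained) conv features can capture useful texture statistics.
torch.manual_seed(0)
feature_net = nn.Conv1d(in_channels=64, out_channels=128, kernel_size=11)

content_spec = torch.randn(1, 64, 400)  # placeholder log-mel spectrograms
style_spec = torch.randn(1, 64, 400)

with torch.no_grad():
    style_target = gram(feature_net(style_spec)[0])

# Optimize a spectrogram to keep its content but adopt the style's Gram stats.
result = content_spec.clone().requires_grad_(True)
optimizer = torch.optim.Adam([result], lr=0.05)
for step in range(200):
    feats = feature_net(result)[0]
    style_loss = nn.functional.mse_loss(gram(feats), style_target)
    content_loss = nn.functional.mse_loss(result, content_spec)
    loss = style_loss + 0.1 * content_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
# `result` would then be inverted back to audio (e.g., via Griffin-Lim).
```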
2. Next-Gen VST Plugins
We believe that VST plugins will be the perfect medium for delivering these new capabilities to audio professionals. Imagine a "Synergy Unsupervised Reverb" plugin that automatically learns the reverberant profile of your project and adapts in real time to new recordings or changes in the mix.
3. User-Centric Design & Workflow
Despite all this advanced technology, we’re committed to keeping the user experience simple and intuitive. While we focus on robust unsupervised models under the hood, the interface will empower audio engineers, producers, and composers to dial in the perfect sound quickly. Our design philosophy emphasizes minimal friction, letting you focus on creativity rather than technical minutiae.
Challenges & Opportunities
While the potential is exciting, we’re aware of the challenges:
Computational Complexity: Real-time ML-based audio processing can be CPU-intensive. We're investing in GPU offloading, efficient quantization, and specialized kernels to keep latency low (see the quantization sketch after this list).
Data Privacy: We're exploring on-device training and inference where possible, ensuring that sensitive audio data remains private.
User Education: Unsupervised ML is a new frontier for many audio professionals. We'll provide comprehensive documentation, tutorials, and community outreach to help users get the most out of these new tools.
Consistency & Stability: Machine learning models can sometimes yield unpredictable results. Our engineering team is dedicated to extensive testing and refinement to ensure consistent, stable output for every plugin release.
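On the computational-complexity point, one concrete lever is post-training quantization. The sketch below uses stock PyTorch (not Fusion DSP or Fusion ML) to dynamically quantize a placeholder model's linear layers to int8, trading a little numeric precision for lower CPU cost and a smaller memory footprint:

```python
# Illustration of dynamic quantization with stock PyTorch: linear layers are
# converted to int8 kernels, shrinking the model and speeding up CPU inference.
import torch
import torch.nn as nn

model = nn.Sequential(  # placeholder stand-in for an audio model
    nn.Linear(64, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 64),
)

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 64)
print(model(x).shape, quantized(x).shape)  # same interface, cheaper inference
```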
Looking Ahead
Synergy DSP’s entry into the AudioML space with unsupervised learning is a long-term commitment. We see this not as a one-off experiment, but as a game-changing evolution of how audio processing tools will be developed and used in the future.
From generative sound design to adaptive processing, the possibilities are limitless. We believe these innovations will empower audio professionals to explore uncharted territory, open up creative horizons, and break free from the constraints of traditional plugin technology.
Closing Thoughts
The intersection of Fusion DSP and Fusion ML marks an exciting new chapter for Synergy DSP. By embracing unsupervised learning, we aim to provide cutting-edge VST plugins that combine top-tier DSP performance with unprecedented ML-driven insights.
Stay tuned for upcoming announcements, beta testing opportunities, and collaborative projects. In the meantime, thank you for joining us on this journey. We can’t wait to hear what you’ll create with our next generation of AudioML tools!