TRAMBA: A novel hybrid Transformer and Mamba-based architecture

Source: @Columbia @ Northwestern U #ai

Speech super-resolution and enhancement for mobile and wearable platforms

Researchers from Northwestern University and Columbia University have introduced TRAMBA, a hybrid Transformer and Mamba architecture for acoustic and bone-conduction speech enhancement on mobile and wearable platforms. Previously, adoption of bone-conduction speech enhancement on such platforms was hindered by labor-intensive data collection and performance gaps between models. TRAMBA addresses this by pre-training on widely available audio-speech datasets and fine-tuning on a small amount of bone-conduction data. Using a single wearable accelerometer, it reconstructs intelligible speech and demonstrates versatility across multiple acoustic modalities. TRAMBA has been integrated into wearable and mobile platforms to achieve real-time speech super-resolution while significantly reducing power consumption. This is also the first study to sense intelligible speech using only a single head-mounted accelerometer.

At the macro level, the TRAMBA architecture combines an improved U-Net structure with self-attention in the down-sampling and up-sampling layers, and integrates Mamba in the narrow bottleneck layer. TRAMBA operates on single-channel audio windows of 512 ms and preprocesses acceleration data from the accelerometer. Each down-sampling block consists of a 1D convolutional layer with LeakyReLU activation, followed by a modulation layer called Self-Attention Feature-wise Linear Modulation (SAFiLM). SAFiLM uses a multi-head attention mechanism to learn scaling factors that enhance the feature representations. The bottleneck layer uses Mamba, which is known for its efficient memory usage and Transformer-like modeling capability. However, due to the vanishing-gradient problem, self-attention is retained only in the down-sampling and up-sampling blocks. Residual connections are used to promote gradient flow and ease the optimization of deeper networks, improving training efficiency.
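The pipeline of one down-sampling block described above (strided 1D convolution, LeakyReLU, then attention-learned scaling of the features) can be sketched in NumPy. This is a minimal, hypothetical illustration, not the paper's implementation: `safilm_scale` is an assumed single-head simplification of SAFiLM, the channel counts and kernel size are made up, and the weights are random.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    # LeakyReLU activation used after each convolution
    return np.where(x > 0, x, slope * x)

def conv1d(x, w, stride=2):
    # Strided 1D convolution (valid correlation): x is (C_in, T), w is (C_out, C_in, K)
    C_out, C_in, K = w.shape
    T_out = (x.shape[1] - K) // stride + 1
    out = np.zeros((C_out, T_out))
    for t in range(T_out):
        seg = x[:, t * stride:t * stride + K]                  # (C_in, K)
        out[:, t] = np.tensordot(w, seg, axes=([1, 2], [0, 1]))
    return out

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def safilm_scale(h, wq, wk, wv):
    # Hypothetical single-head SAFiLM-style step: self-attention over
    # channels yields per-channel scale factors that modulate the features.
    q, k, v = wq @ h, wk @ h, wv @ h                           # (C, T) each
    attn = softmax(q @ k.T / np.sqrt(h.shape[1]))              # (C, C) channel attention
    scale = np.tanh(attn @ v).mean(axis=1, keepdims=True)      # (C, 1) scaling factors
    return h * (1.0 + scale)                                   # feature-wise modulation

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 512))               # one mono window (e.g. 512 samples)
w = rng.standard_normal((8, 1, 15)) * 0.1       # 8 filters, kernel 15
h = leaky_relu(conv1d(x, w, stride=2))          # down-sampling: halves time resolution
h = safilm_scale(h, *(rng.standard_normal((8, 8)) * 0.1 for _ in range(3)))
print(h.shape)                                  # (8, 249)
```

In the full U-Net, several such blocks would be stacked, the bottleneck would run a Mamba state-space layer instead of attention, and residual skip connections would link each down-sampling block to its mirrored up-sampling block.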

If you want to learn more, you can click the link below the video.
Thank you for watching. If you enjoyed this video, please like and subscribe. Thanks!

Quick read: https://marktechpost.com/2024/05/08/tramba-a-novel-hybrid-transformer-and-mamba-based-architecture-for-speech-super-resolution-and-enhancement-for-mobile-and-wearable-platforms/

Paper: https://arxiv.org/abs/2405.01242

YouTube:
