This thesis addresses the compression and decompression of audio data, specifically the implementation and optimization of the Adaptive Differential Pulse Code Modulation (ADPCM) codec. Audio codecs play an important role in audio systems because memory management and hardware cost are key design constraints. The field has progressed significantly in recent decades, especially with the digitization of audio, driven by the need to transmit or store data at increasingly high bit rates and compression ratios relative to the PCM signal. PCM contains a great deal of information that is not needed during playback because it is inaudible to the human ear. Starting from the state of the art in PCM data compression/decompression, and after a careful evaluation of the literature, it was possible to select and optimize a compression/decompression algorithm by creating a spectral model that exploits the human perceptual system. The thesis therefore focuses on the implementation, optimization and application of audio compression algorithms, with the aim of obtaining a good trade-off between compression ratio, human perception, audio signal fidelity and ease of implementation in embedded systems. The topic is highly current and the subject of ongoing research, which this work seeks to investigate further. The thesis is the result of an internship at KORG Italy S.p.A, a company known and appreciated worldwide in the music field for the design, development and production of highly innovative digital pianos and home keyboards.
In more detail, the study concerned the improvement of KORG's proprietary audio compression/decompression algorithm, developed natively in the Visual Studio environment and then tested with the NU-Tech software. This implementation allowed the results to be simulated in real time, with the aim of achieving a higher compression ratio and better audio fidelity while keeping the computational load low enough for real-time decompression of PCM data. Real-time decompression is necessary in the music field to decompress the compressed sound samples stored in flash memory and used for wavetable synthesis. A detailed analysis of the KORG algorithm was carried out, and the ADPCM algorithm was then identified as the most suitable choice for the objectives of this thesis. The ADPCM codec was implemented in Visual Studio and analyzed starting from the IMA ADPCM standard. The core of the work was the optimization of the ITU G.726 adaptive predictor. Tests, implemented in MATLAB, were carried out at the acoustic, spectral and numerical-qualitative level by computing indices such as the Root-Mean-Square Error (RMSE), the Signal-to-Noise Ratio (SNR) and the Itakura-Saito distance, while also taking into account compression/decompression times. The strength of the ADPCM codec comes from integrating the decoder directly into the encoder, which prevents quantization errors from accumulating and causing the signal to drift. Optimizing the predictor improved signal fidelity over the standard version through the use of a second-order predictor.
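The closed-loop idea described above (the encoder runs the decoder internally and predicts from reconstructed samples) can be illustrated with a minimal sketch. This is not KORG's algorithm, IMA ADPCM, or the G.726 predictor: the fixed second-order coefficients (1.8, -0.81), the 4-bit code range, the Jayant-style step adaptation factors, and all function names are assumptions chosen only for illustration. The sketch also computes RMSE and SNR, two of the indices named in the abstract.

```python
import math


def _adapt(step, code):
    # Assumed Jayant-style rule: grow the step on large codes, shrink on small ones.
    factor = 1.6 if abs(code) >= 4 else 0.9
    return min(max(step * factor, 1.0), 4096.0)


class AdpcmState:
    """State shared by encoder and decoder (hypothetical layout)."""

    def __init__(self, a1=1.8, a2=-0.81):
        self.a1, self.a2 = a1, a2  # assumed fixed second-order coefficients
        self.y1 = 0.0              # last reconstructed sample
        self.y2 = 0.0              # reconstructed sample before that
        self.step = 16.0           # current quantizer step size

    def predict(self):
        # Second-order prediction from RECONSTRUCTED samples only.
        return self.a1 * self.y1 + self.a2 * self.y2

    def decode_step(self, code):
        # Rebuild one sample from its code, then update history and step.
        y = self.predict() + code * self.step
        y = max(-32768.0, min(32767.0, y))  # keep within 16-bit range
        self.y2, self.y1 = self.y1, y
        self.step = _adapt(self.step, code)
        return y


def encode(samples):
    """Return (codes, local reconstruction) for a list of PCM samples."""
    st = AdpcmState()
    codes, recon = [], []
    for x in samples:
        err = x - st.predict()
        code = max(-8, min(7, round(err / st.step)))  # 4-bit signed code
        codes.append(code)
        # Embedded decoder: the encoder tracks exactly what the decoder will
        # produce, so quantization error does not accumulate into drift.
        recon.append(st.decode_step(code))
    return codes, recon


def decode(codes):
    st = AdpcmState()
    return [st.decode_step(c) for c in codes]


def rmse(ref, test):
    return math.sqrt(sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref))


def snr_db(ref, test):
    noise = sum((r - t) ** 2 for r, t in zip(ref, test))
    if noise == 0.0:
        return math.inf
    return 10.0 * math.log10(sum(r * r for r in ref) / noise)


if __name__ == "__main__":
    # Test tone: 440 Hz sine at 44.1 kHz (arbitrary choice for the demo).
    tone = [10000.0 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(2000)]
    codes, recon = encode(tone)
    print("decoder matches encoder state:", decode(codes) == recon)
    print("RMSE:", rmse(tone, recon), "SNR (dB):", snr_db(tone, recon))
```

Because the decoder repeats the same arithmetic on the same codes, its output is identical to the encoder's local reconstruction. An open-loop encoder, which predicts from the original samples instead, would not have this property.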
Advanced digital signal processing techniques for real-time audio compression and decompression
FIORENTINO, IVANA MICHELA
2021/2022
File | Size | Format | |
---|---|---|---|
TesidefinitivaA.pdf (open access). Description: Audio data compression/decompression | 6.5 MB | Adobe PDF | View/Open |
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.12075/9279