Abstract

Transcribing music is a time-consuming and laborious process. To aid human experts in the field of polyphonic music transcription, many automated approaches have been proposed. This thesis presents a user-friendly, open-source Python prototype that accurately transcribes and generates polyphonic piano music. To achieve this, several models from the Magenta environment were selected and combined into one coherent application. The models are based on recurrent neural networks with long short-term memory (LSTM) units and are used to transcribe raw piano audio and to produce new musical sequences. For visualization, the prototype integrates MuseScore 3 to create sheet music. A graphical user interface was built to increase user-friendliness. Test results show that the prototype detects notes accurately within seconds and generates sequences that sound unique and harmonic.

Research

State of the Art

  1. AnthemScore: Software for Automatic Music Transcription (https://www.lunaverus.com/)
  2. Melody Scanner: Software for Automatic Music Transcription (https://melodyscanner.com/)
  3. ScoreCloud: Software for Automatic Music Transcription (https://scorecloud.com/)

Python Audio Library List:

  1. https://github.com/ciconia/awesome-music
  2. https://realpython.com/playing-and-recording-sound-python/#comparison-of-audio-libraries
  3. https://wiki.python.org/moin/PythonInMusic
  4. https://wiki.python.org/moin/Audio
  5. https://github.com/ad-si/awesome-sheet-music
  6. https://musicinformationretrieval.com/index.html

Python Libraries:

  1. Madmom: Audio Signal Processing Library (https://github.com/CPJKU/madmom)

  2. Librosa: Python Package for Music and Audio Analysis  (https://github.com/librosa/librosa)

  3. Mir_eval: Python Library that evaluates music information retrieval systems  (https://github.com/craffel/mir_eval)

  4. pyAudioAnalysis: Library for audio feature extraction, classification, segmentation and applications (https://github.com/tyiannak/pyAudioAnalysis)

  5. Aubio: Library to label music and sounds (https://github.com/aubio/aubio)

  6. Mingus: Package for Python used by programmers, musicians, composers and researchers to make and analyse music (https://github.com/bspaans/python-mingus)

  7. PyAudio: Python bindings for PortAudio audio input and output (http://people.csail.mit.edu/hubert/pyaudio/)

Open-source code:

  1. wav2mid: Polyphonic Piano Music Transcription with Deep Neural Networks (https://github.com/jsleep/wav2mid)
  2. Rtmonoaudio2midi: Real-Time Note Recognition in Monophonic Audio Stream (https://github.com/aniawsz/rtmonoaudio2midi)
  3. CMU 15-112 Project: Musical Transcription and Pitch Detection in Python (https://github.com/nhfruchter/music-transcription)
  4. An AI Approach to Automatic Natural Music Transcription (https://github.com/mbereket/music-transcription)
  5. A small framework for conducting Melody Extraction experiments (https://github.com/kukas/music-transcription)
  6. Onsets and Frames Transcription (https://github.com/tensorflow/magenta/tree/master/magenta/models/onsets_frames_transcription)
  7. audio_to_midi_melodia (https://github.com/justinsalamon/audio_to_midi_melodia)
  8. Applying a second voice onto existing MIDI data (https://www.twilio.com/blog/generate-music-python-neural-networks-magenta-tensorflow)
  9. Audio-Note-detection (https://github.com/DivyanshMalhotra/Audio-Note-detection)
  10. Musical-Note-detection (https://github.com/AyushKaul/Musical-Note-detection)
  11. python-note-detection (https://github.com/laveshjain11/python-note-detection)
  12. pitch-detection-librosa-python (https://github.com/miromasat/pitch-detection-librosa-python)
  13. audio-to-midi (https://github.com/NFJones/audio-to-midi)


After thoroughly testing and evaluating the existing open-source projects, I came to the conclusion that only the Magenta repository provides beneficial scripts and models for accurate music processing. Furthermore, its creators are still actively researching and developing the project, so it has the most potential. This thesis therefore operates within the powerful Magenta environment to create an audio transcription and generation program.

Implementation

Program IO 


The prototype of this thesis can be separated into three different fields: automatic music transcription, automatic music generation, and MIDI-to-sheet-music conversion. The first two fields utilize different models and datasets provided by Magenta, while external software handles the MIDI-to-sheet-music conversion. This graphic displays the overall idea of the prototype; the three components are marked in blue.
In order to achieve user-friendliness, building a graphical user interface proved to be mandatory, as it connects all the fields into one coherent program.




Program Structure




This graphic displays the exact structure of the program. The Python scripts are marked in yellow. The Signals and Slots system provides a seamless transition between the scripts and external applications; this way, input parameters are automatically passed on where needed.
At the start, the user can select a WAV audio or MIDI file from their local file explorer. Optionally, they can record a WAV file with their microphone via recording.py. To use the transcription feature transcription.py, a WAV file must be selected. The script then transcribes the WAV audio into a MIDI file and saves it in the same directory as the input file.
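
The recording step itself is not listed here; the following is a minimal sketch of how recording.py could capture audio with PyAudio (sample rate, duration and filename are assumptions):

```python
# Hypothetical outline of recording.py: capture mono audio and save it as WAV.
import wave
import pyaudio

RATE = 16000      # sample rate in Hz (assumed; must match what the model expects)
CHUNK = 1024      # frames read per buffer
SECONDS = 5       # recording length (assumed)

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)
frames = [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * SECONDS))]
stream.stop_stream()
stream.close()
pa.terminate()

with wave.open("recording.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(pyaudio.get_sample_size(pyaudio.paInt16))
    wf.setframerate(RATE)
    wf.writeframes(b"".join(frames))
```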

The program automatically passes the MIDI file on to the other features, each of which generates new sequences. In Python, these scripts are referred to as melody.py, chords.py, improv.py and bach.py.
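
The GUI toolkit is not named here, but "Signals and Slots" points to Qt. A minimal PyQt5 sketch of how this hand-over between scripts could be wired (class and signal names are hypothetical):

```python
from PyQt5.QtCore import QObject, pyqtSignal

class Transcriber(QObject):
    # Emitted with the path of the freshly written MIDI file.
    transcribed = pyqtSignal(str)

    def run(self, wav_path):
        midi_path = wav_path + ".midi"   # transcription.py saves next to the input
        # ... run the transcription model here ...
        self.transcribed.emit(midi_path)

class MelodyGenerator(QObject):
    def generate(self, midi_path):
        # Slot: receives the primer MIDI path automatically via the connection.
        print("Generating continuation for", midi_path)

transcriber = Transcriber()
generator = MelodyGenerator()
# This single connection is what makes the parameter passing automatic:
transcriber.transcribed.connect(generator.generate)
transcriber.run("piano.wav")
```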

At the end, MuseScore 3 converts the produced MIDI to sheet music and saves it as a PDF file.
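
MuseScore 3 offers a command-line converter for this step; a sketch of how the program might invoke it (the binary name varies by platform, e.g. mscore or MuseScore3.exe):

```python
import subprocess

def midi_to_pdf(midi_path, pdf_path):
    # MuseScore picks the output format from the extension given to -o.
    subprocess.run(["musescore3", midi_path, "-o", pdf_path], check=True)

midi_to_pdf("piano.wav.midi", "piano.pdf")
```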



Music Transcription

The main libraries are Magenta and TensorFlow. This script contains machine learning algorithms that detect onsets and frames in raw piano audio. First, the function create_example() reads in the specified WAV file and converts it into the required format. The library librosa, which features powerful music processing functions, handles this procedure. This step is necessary because the input audio needs a certain representation before it can be fed into the algorithm. After preprocessing the audio input, the main function transcribe(argv, config_map, data_fn) detects and transcribes musical notes from the WAV file. The input parameters are preset, as the algorithm uses a pre-trained checkpoint to predict onsets and frames. The function creates a new MIDI file, which is saved in the same folder as the WAV file. The filename consists of the original WAV filename with ".midi" appended.
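
As a rough sketch of the preprocessing step, assuming the model expects 16 kHz mono audio (the exact sample rate used by create_example() is an assumption):

```python
import librosa

def load_wav(path, target_sr=16000):
    # librosa resamples to target_sr and downmixes to mono on load.
    samples, sr = librosa.load(path, sr=target_sr, mono=True)
    return samples, sr

samples, sr = load_wav("piano.wav")
print(f"{len(samples)} samples at {sr} Hz -> {len(samples) / sr:.1f} s of audio")
```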

Music Generation

  1. melody.py
    This feature creates a new melody and adds it on top of the Primer MIDI, creating a continuation. It operates on a fully monophonic basis: if necessary, the algorithm transforms the Primer MIDI into a single stream of note events. For this, it keeps the highest note in each harmony and discards the rest in order to preserve the existing melody (a minimal sketch of this reduction follows this list). Then, a new monophonic sequence is produced and attached to the existing piece. The resulting sheet music consists of a single voice.
  2. chords.py
    This script finds suitable chords for a Primer MIDI, which is first transformed into a monophonic melody. Hence, it automatically fits a harmony onto an existing musical piece. The chords consist of one base tone and the corresponding triad, played as semibreves. If the Primer MIDI is short, the algorithm generates additional notes to reach the appropriate length; it therefore also continues the melody. The program renders the backing chords as notes in the output MIDI file. On sheet music, three voices are displayed: the first voice depicts the Primer MIDI, the second voice shows the triad and the third voice contains only the base tone.
  3. improv.py
    Another application that continues an existing piece of music. Unlike melody.py, it runs on a polyphonic basis. Nonetheless, the algorithm analyses the Primer MIDI and simplifies larger harmonies while maintaining the existing melody. The generated piano sequence is polyphonic and has a distinct sound: it is fast-paced, as it mostly contains 16th notes, and has large interval jumps, which creates a very vivid, rushed listening experience.
  4. bach.py
    This script generates a new polyphonic piano sequence in the style of Bach. The algorithm was trained on the Bach Chorales dataset. The Primer MIDI is again simplified but remains polyphonic. As the name implies, the result musically imitates Bach's work, which has a very distinct sound.
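
As announced under melody.py, here is a minimal sketch of the "keep the highest note" reduction, written with pretty_midi for illustration (Magenta ships its own melody-extraction utilities, so this shows the technique, not the actual code):

```python
import pretty_midi

def to_monophonic(midi_path, out_path):
    # Keep only the highest sounding note at each onset ("skyline" reduction).
    pm = pretty_midi.PrettyMIDI(midi_path)
    inst = pm.instruments[0]
    highest = {}
    for note in inst.notes:
        onset = round(note.start, 3)          # group notes struck together
        if onset not in highest or note.pitch > highest[onset].pitch:
            highest[onset] = note
    inst.notes = sorted(highest.values(), key=lambda n: n.start)
    pm.write(out_path)

to_monophonic("primer.mid", "primer_mono.mid")
```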

Test Results and Evaluation

Test Results

Sample      Duration   Transcription Time in s   Note Accuracy in %   Velocity Accuracy in %   Overall Rating
Piano1      0:09       4.6                       100                  100                      100
Piano3      0:13       4.1                       85                   100                      92
Für Elise   2:39       15.7                      96                   75                       86


Evaluation

  • The algorithms succeed on synthetic, clean piano audio
  • Note accuracy drops sharply as soon as the recording contains noise
  • Velocity accuracy drops with changes in dynamics

Survey 

  1. On a scale from 1 to 10, with 10 being the best, how good does this piece sound?

  2. Do you think an AI created this piece?

  3. Would you consider this creative?

Feature     Question 1   Question 2   Question 3
melody.py   8            No (75%)     Yes
bach.py     9            No (90%)     Yes
improv.py   5            Yes (75%)    Yes
chords.py   6            Yes (70%)    Yes

(20 participants were surveyed)

Evaluation

  • Bach sounds familiar; does familiarity alone make a piece sound good?

  • Good-sounding pieces were attributed to human creation, bad-sounding pieces to AI. Does perceived quality drive the human/AI judgement?

  • As long as something is being created, participants consider it creative
