Welcome! This project attempts to design a proof-of-concept model that converts bass guitar audio into Western notation. The task is analogous to automatic speech recognition (ASR), but with music audio and sheet notation (in LilyPond) in place of speech and natural language.
Very little research has been conducted on notation-level transcription, and in empirical tests most existing models do not perform well on bass guitar. The primary goal is to build such a model, with the broader vision of it serving as a useful tool for bassists and other musicians. After all, the world needs more bass players!
For this project, no suitable existing dataset was available. The task therefore includes generating bass guitar audio with corresponding labels in LilyPond format (inspired by this paper) to train the model. Data scarcity is a major problem in automatic music transcription: annotating music scores is incredibly time-consuming and difficult, even for trained musicians.
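As a rough illustration of how such synthetic pairs could be produced, one can sample a random note sequence, write it out as a LilyPond label, and render audio from the MIDI that LilyPond emits. The sketch below makes several assumptions that are not necessarily the project's actual pipeline: the note-sampling scheme, the file names, the `bass.sf2` SoundFont path, and the use of the `lilypond` and `fluidsynth` command-line tools are all illustrative.

```python
import random
import subprocess

# Written pitches of the open strings of a 4-string bass (which sounds an octave lower),
# in LilyPond absolute-octave notation.
PITCHES = ["e,", "a,", "d", "g"]
DURATIONS = ["4", "8", "2"]  # quarter, eighth, half notes

def random_bass_line(n_notes: int = 8) -> str:
    """Sample a naive random note sequence as a LilyPond music expression."""
    return " ".join(random.choice(PITCHES) + random.choice(DURATIONS) for _ in range(n_notes))

def write_lilypond_file(notes: str, path: str = "sample.ly") -> None:
    """Wrap the notes in a minimal score that emits both engraved output and MIDI."""
    source = (
        "\\version \"2.24.0\"\n"
        "\\score {\n"
        f"  \\new Staff {{ \\clef bass {notes} }}\n"
        "  \\layout { }\n"
        "  \\midi { }\n"
        "}\n"
    )
    with open(path, "w") as f:
        f.write(source)

if __name__ == "__main__":
    notes = random_bass_line()
    write_lilypond_file(notes)                              # the LilyPond source doubles as the label
    subprocess.run(["lilypond", "sample.ly"], check=True)   # produces sample.midi (and a PDF)
    # Render the MIDI to audio with a sampled bass SoundFont (hypothetical path):
    subprocess.run(["fluidsynth", "-ni", "bass.sf2", "sample.midi",
                    "-F", "sample.wav", "-r", "44100"], check=True)
```

In practice the audio side would also need variation in timbre, tempo, and playing articulation to be useful as training data; the snippet only shows the label/audio pairing idea.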
Part of the GNU project, LilyPond is a powerful music engraving toolset. It provides a concise, readable syntax for writing music notation and is widely adopted.
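For context, a label for a short bass line might look like the following LilyPond source (the notes themselves are made up for illustration):

```lilypond
\version "2.24.0"
\new Staff {
  \clef bass
  \time 4/4
  e,4 g,8 a,8 b,4 d8 e8 |  % bar 1: written pitch, sounding an octave lower
  a,2 r4 a,4               % bar 2: half note, rest, quarter note
}
```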
The model consists of a convolutional neural network (CNN) audio encoder and a Transformer, implemented in PyTorch and trained from scratch. It draws on this paper and OpenAI Whisper for reference and inspiration.
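As a hedged sketch of what such an encoder–decoder could look like in PyTorch (the layer sizes, the log-mel spectrogram input, the token vocabulary size, and the omission of positional encodings are assumptions, not the project's exact configuration):

```python
import torch
import torch.nn as nn

class BassTranscriber(nn.Module):
    """Sketch: CNN encoder over a log-mel spectrogram, Transformer decoder over LilyPond tokens."""

    def __init__(self, n_mels: int = 80, d_model: int = 256, vocab_size: int = 1000):
        super().__init__()
        # CNN front end: downsample the spectrogram in time and project to d_model channels.
        self.encoder = nn.Sequential(
            nn.Conv1d(n_mels, d_model, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
            nn.Conv1d(d_model, d_model, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
        )
        self.token_emb = nn.Embedding(vocab_size, d_model)
        decoder_layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers=4)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, mel: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # mel: (batch, n_mels, time); tokens: (batch, seq) of LilyPond token ids.
        memory = self.encoder(mel).transpose(1, 2)          # (batch, time', d_model)
        tgt = self.token_emb(tokens)                        # (batch, seq, d_model)
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.lm_head(out)                            # (batch, seq, vocab_size) logits

if __name__ == "__main__":
    model = BassTranscriber()
    logits = model(torch.randn(2, 80, 400), torch.randint(0, 1000, (2, 16)))
    print(logits.shape)  # torch.Size([2, 16, 1000])
```

Training then follows the usual sequence-to-sequence recipe: teacher forcing with cross-entropy over the next LilyPond token, and autoregressive decoding at inference time.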
The model achieved an average word error rate (WER) and character error rate (CER) below 10%.
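For reference, both metrics are edit-distance ratios between the predicted and reference LilyPond strings: WER over whitespace-separated tokens, CER over characters. A minimal self-contained computation might look like this (the example strings are made up, not actual model output):

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (lists of tokens or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution
        prev = curr
    return prev[-1]

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

def cer(reference: str, hypothesis: str) -> float:
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

# Hypothetical LilyPond-token strings, for illustration only:
ref = "e,4 g,8 a,8 b,4 d8 e8"
hyp = "e,4 g,8 a,8 b,4 d4 e8"
print(f"WER = {wer(ref, hyp):.2%}, CER = {cer(ref, hyp):.2%}")
```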