individual stems #2
Replies: 2 comments 1 reply
-
@zfarrell13 You are welcome :) Yes, focusing the model's attention on a specific instrument by training on MIDIs that all contain that instrument does indeed help! Also, since my implementations use token shifting for each instrument, you can usually expect the model to output each instrument in its expected range, although the model can sometimes still go out of range.
RE: MIDI length: the length of a MIDI does not really matter because it is usually much longer (in tokens) than the seq_len of the model. I did some checking, and on average each MIDI is about 16k tokens long, especially multi-instrumental pieces. So if you want to use small excerpts from MIDIs for your project, you can cut each MIDI into chunks of your model's seq_len. For example, the Allegro Music Transformer seq_len is 2048 tokens, which is equivalent to ~682 notes. As long as your chunks are <= 2048 tokens / 682 notes, you should be good to go.
On my SoundCloud, most of my compositions are supervised continuations, meaning that they were made by putting together smaller output chunks of music. Compound compositions, basically.
Hope this makes some sense, but feel free to ask questions, as I know it takes a little while to get a grasp on how it all works :)
Alex
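As a rough illustration of the chunking idea described above (not the repo's actual preprocessing code), a tokenized MIDI can be split into model-sized windows. The 2048 figure is the Allegro Music Transformer seq_len mentioned in the reply; `encode_midi` is a hypothetical placeholder for whatever tokenizer you use:

```python
# Minimal sketch: split a flat list of integer tokens into chunks that fit the model context.
SEQ_LEN = 2048  # model context length in tokens (~682 notes at roughly 3 tokens per note)

def chunk_tokens(tokens, seq_len=SEQ_LEN):
    """Yield consecutive, non-overlapping chunks of at most seq_len tokens."""
    for start in range(0, len(tokens), seq_len):
        yield tokens[start:start + seq_len]

# Example usage with a hypothetical ~16k-token MIDI:
# tokens = encode_midi("song.mid")      # placeholder for your tokenizer
# chunks = list(chunk_tokens(tokens))   # ~8 chunks for a 16k-token piece
```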
-
@zfarrell13 I have realized something here, and I apologize for any confusion... I think for your project it is better to use my Lars Ulrich Transformer implementation/code. It has a more suitable encoding for stem generation, and it uses the same x_transformer module, which is pretty fast. If you do not mind, let's take our discussion there.
Again, I am sorry for any confusion :)
Alex
-
Hey mate, thanks for all of your quick responses recently.
I am interested in training tailored inference models based on different stems/instruments in different styles. For the sake of an example, let's take the "bass" instrument in the style of genre X.
Common sense tells me that I would need to build a dataset of bass MIDI in genre X and train with that instead of the dataset you have provided access to.
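A hedged sketch of what assembling such a dataset could look like, using the third-party pretty_midi library rather than anything from this repo (the repo has its own MIDI processing code). General MIDI programs 32-39 (0-indexed) cover the bass family; any genre filtering would depend on how your files are labeled and is left out here:

```python
# Sketch: collect bass-only tracks from a folder of MIDI files.
import os
import pretty_midi

BASS_PROGRAMS = set(range(32, 40))  # GM programs 33-40 (0-indexed 32-39) are basses

def extract_bass_tracks(midi_dir):
    """Return (filename, instrument) pairs for every non-drum bass part found."""
    bass_parts = []
    for name in os.listdir(midi_dir):
        if not name.lower().endswith(('.mid', '.midi')):
            continue
        try:
            pm = pretty_midi.PrettyMIDI(os.path.join(midi_dir, name))
        except Exception:
            continue  # skip unreadable/corrupt files
        for inst in pm.instruments:
            if not inst.is_drum and inst.program in BASS_PROGRAMS:
                bass_parts.append((name, inst))
    return bass_parts
```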
I have noticed a bit of code in the generate portion of the Allegro_Music_Transformer_Composer.ipynb file that is meant to handle a variety of stems.
Obviously, if it's trained on all bass stems, it will most likely only produce MIDI notes within the range defined for that instrument. Do you think that diverting the model's attention away from other pitch ranges and focusing on a smaller one could enhance the quality of the output at all?
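One way to make the "stay in range" behavior explicit at generation time, as a hedged illustration only (this is not the repo's sampling code, and the token band below is a hypothetical placeholder for wherever the bass tokens land after shifting), is to mask logits outside the instrument's token band before sampling:

```python
# Sketch: restrict next-token sampling to a single instrument's token band.
import torch

INSTRUMENT_TOKEN_RANGE = (256, 384)  # hypothetical band for the shifted bass tokens

def mask_to_instrument(logits, token_range=INSTRUMENT_TOKEN_RANGE):
    """Set all logits outside the instrument's token band to -inf."""
    lo, hi = token_range
    masked = torch.full_like(logits, float('-inf'))
    masked[..., lo:hi] = logits[..., lo:hi]
    return masked

# During generation, apply this before softmax/sampling, e.g.:
# probs = torch.softmax(mask_to_instrument(next_token_logits), dim=-1)
```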
A few other questions: