Run db:migrate in pre- and post-install/upgrade (#18, #26) #37

Open
wants to merge 3 commits into
base: main

Conversation

angdraug

No description provided.

angdraug and others added 3 commits January 15, 2023 15:30
As recommended in Mastodon release notes, run db:migrate with
SKIP_POST_DEPLOYMENT_MIGRATIONS=true in pre-install and pre-upgrade
hooks, and again without the flag in post-install and post-upgrade.

Co-authored-by: Sheogorath <[email protected]>
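
The two-phase flow described in the commit message above could look roughly like the following pair of Helm hook Jobs. This is a minimal sketch, not the PR's actual templates: the Job names, hook weights, delete policy, the `.Values.image.*` keys, and the `bundle exec rake db:migrate` invocation are assumptions made for illustration. Only the `-env` ConfigMap name and the `SKIP_POST_DEPLOYMENT_MIGRATIONS` flag come from the discussion.

```yaml
# Sketch only: names, weights, and values keys below are assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "mastodon.fullname" . }}-db-pre-migrate
  annotations:
    # Run before the new release's regular resources are installed or upgraded.
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "-1"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: db-migrate
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          command: ["bundle", "exec", "rake", "db:migrate"]
          envFrom:
            - configMapRef:
                name: {{ include "mastodon.fullname" . }}-env
          env:
            # First pass: skip the post-deployment migrations.
            - name: SKIP_POST_DEPLOYMENT_MIGRATIONS
              value: "true"
---
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "mastodon.fullname" . }}-db-post-migrate
  annotations:
    # Run again once the new release's resources are in place.
    "helm.sh/hook": post-install,post-upgrade
    "helm.sh/hook-weight": "-1"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: db-migrate
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          # Second pass: same command, without the skip flag.
          command: ["bundle", "exec", "rake", "db:migrate"]
          envFrom:
            - configMapRef:
                name: {{ include "mastodon.fullname" . }}-env
```
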
pre-install and pre-upgrade hooks run before the persistent ConfigMap
resources are installed. As suggested in helm/helm#8694, create a hook
with lower hook-weight and resource-policy=keep to make the same
ConfigMap available in pre- hooks.
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "mastodon.fullname" . }}-env
Contributor

why do we need a second configmap with the same name and data?

Author

Because Helm doesn't create the other configmap until after the pre-install hooks, and deletes this one after running pre-install hooks. It's a catch-22: for the mastodon-env configmap to be available when db-migrate job runs, it has to be created as a hook with a lower weight, but since it's created as a hook it gets cleaned up after db-migrate is done, so we also have to keep the non-hook version of the same configmap. See also commit message in a749654.
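
For readers unfamiliar with the pattern, the hook copy of the ConfigMap described above might be sketched like this. It is an illustration under assumptions, not the template from this PR: the hook weight value and the single `data` entry are placeholders, and the exact set of hook events is a guess based on the commit message.

```yaml
# Sketch only: the weight and the data entry are placeholders for illustration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "mastodon.fullname" . }}-env
  annotations:
    # Created as a hook so it exists before the chart's regular resources.
    "helm.sh/hook": pre-install,pre-upgrade
    # Lower weight than the db:migrate hook Job, so Helm creates it first.
    "helm.sh/hook-weight": "-3"
    # Mentioned in the commit message: ask Helm to leave the resource in place.
    "helm.sh/resource-policy": keep
data:
  DB_HOST: "postgres.example.internal"  # placeholder value
```

The non-hook ConfigMap with the same name and data stays in the chart alongside this one, for the reason given in the comment above.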

Contributor

oh i didn't realize that! thanks for the clarification.

Contributor

For some additional clarification: this only matters the first time the chart gets installed; after that, the ConfigMap will already be there.

@angdraug
Author

Not sure if it shows up for you, but GitHub tells me:

First-time contributors need a maintainer to approve running workflows. Learn more.

I don't think this PR is going to land until someone does the needful.

@paolomainardi
Contributor

paolomainardi commented Jan 27, 2023

Could you please merge this PR? It looks fine to me. Without it, this chart cannot be used in a highly automated environment with Terraform + Helm: the Helm installation never finishes and Helm never triggers the migration job, which makes a fresh install impossible.

cc @dunn

@dunn
Contributor

dunn commented Jan 27, 2023

I'm not actually a maintainer of this repo, so I can't merge.

@paolomainardi
Contributor

> I'm not actually a maintainer of this repo, so I can't merge.

Oops, so sorry. I saw your comments and just assumed you were a maintainer too.

@paolomainardi
Contributor

paolomainardi commented Jan 29, 2023

I just tried this PR and it doesn't work: the migrate job requires the PVC to already exist, otherwise the job cannot run.

The question is: does the migration job actually require the PVC?

@paolomainardi
Contributor

I tried again using a bucket instead of a PVC, and now the problem is the required Redis instance, which must be up and running for the job to finish.

@paolomainardi
Contributor

I tried running the migration job along with the other deployments, and it worked fine; it is the same approach the GitLab chart is taking. The concept is to let the scheduler restart the services until the migration job finishes to initialize the services; once finished, the pods start to come up and work fine.

GitLab migration job used as a reference: https://docs.gitlab.com/charts/charts/gitlab/migrations/

@renchap
Member

renchap commented Feb 17, 2023

Thanks for your work on this @paolomainardi!

> I tried running the migration job along with the other deployments, and it worked fine; it is the same approach the GitLab chart is taking. The concept is to let the scheduler restart the services until the migration job finishes to initialize the services; once finished, the pods start to come up and work fine.

How would this work for version upgrades? The pre-upgrade migrations need to run before any of the new version's application pods start; otherwise those pods can generate errors on some requests (when trying to access a table that has not been migrated yet) while their /health endpoint still returns OK (it does not check the schema version).

I worry this will cause user-facing errors during the migration, or even make the server unavailable if the migration does not happen and all pods are upgraded to the new version.

@paolomainardi
Contributor

@renchap yes, you're right; the issue with this approach is that users can face problems while migrations are running.

This issue cannot be overcome by using Helm alone; the best choice is to avoid running migrations the way this chart does and to move most of the complexity to the application side.

Looking again at how GitLab does it: they open-sourced the database migration types they support: https://docs.gitlab.com/ee/development/migration_style_guide.html

The relevant case for Mastodon is "Regular migrations", which according to their document must always take under 3 minutes; anything longer must be moved to post-deployment or background migrations.

It is not very clear what happens during that 3-minute window; maybe the migrations are always written so that the previous and next releases remain compatible.

This is the documentation for the migrations Helm chart: https://docs.gitlab.com/charts/charts/gitlab/migrations. From my direct experience, the migrations run alongside the other deployments.

@jessebot
Contributor

jessebot commented Jul 4, 2023

is there any chance this could be moved forward?


6 participants