Support disallowing inconsistent metadata in cli-migrations images (close #10599) #10602
base: master
@@ -77,7 +87,7 @@ if [ -d "$HASURA_GRAPHQL_METADATA_DIR" ]; then
   echo "version: 3" > config.yaml
   echo "endpoint: http://localhost:$HASURA_GRAPHQL_MIGRATIONS_SERVER_PORT" >> config.yaml
   echo "metadata_directory: metadata" >> config.yaml
-  hasura-cli metadata apply
+  hasura-cli metadata apply $HASURA_GRAPHQL_DISALLOW_INCONSISTENT_METADATA
Question for the codeowners: is there a good reason that the v3 image applies metadata updates before applying DB migrations, while v2 does them the other way around?
I suppose either order might result in temporary metadata inconsistencies if DB updates and metadata updates are bundled into the same release. If you're dropping a column or table, you probably want metadata applied first; if you're adding a column or table, you probably want migrations applied first. So we probably need to accept some unavoidable inconsistencies (ICs) either way. Just checking to see if there's a good reason that we're picking one side for v2 and the other for v3.
Update on this: I went ahead and modified this script so that it has the same ordering (and thus the same behavior) as v2. The need for this became clearer after my updates to the test scripts, which revealed that the v2 test could pass while the v3 test would fail under the same circumstances, because, to remain strictly consistent, the v3 setup depends on DB migrations being run before metadata is applied.
…ow use of the --disallow-inconsistent-metadata flag when applying metadata (fixes hasura#10599)
…en new flag is set
Branch updated from faa2c1f to 034fd38.
Hi @chardo, thanks for the PR! Sorry that we couldn't take a look at it earlier.
It looks good overall 👍 The one thing I am a bit doubtful about is the change to the execution order of the metadata and migrations commands in the v3 cli migrations image.
I feel like there was a reason to preserve the execution order when we originally introduced it, but I am unable to find it. So I am going to ask around internally (cc: @scriptonist) to see if I can find the original reason. Please give me a little more time to arrive at a conclusion on this matter.
The reason behind running the metadata command before the migrations command in the v3 cli migrations image seems to be:
- The CLI uses the run_sql API to apply migrations to a database.
- The information about "connected" databases is in the Hasura metadata.
- So, if we are trying to apply migrations to a database A, it should already be connected to Hasura. Otherwise, Hasura will throw an error saying it does not know what database A is.
- For this reason, we apply metadata first,
- i.e. let Hasura know about A first, even though there might be some metadata inconsistencies if we are tracking tables from A, etc.
- Now, migrations can be applied without any errors.
So let us avoid changing the order of the commands in the v3 cli migrations image.
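The v3 ordering explained above can be sketched roughly as follows. This is a simplified illustration, not the actual `docker-entrypoint.sh`: the `HASURA_CLI` override and the `apply_v3_project` function are hypothetical, added only so the sequence can be dry-run with `HASURA_CLI=echo`. The final `metadata reload` step is included on the assumption that metadata referencing freshly migrated tables needs to be re-evaluated once the migrations have run.

```shell
#!/usr/bin/env bash
# Simplified sketch of the v3 ordering discussed above; not the real
# docker-entrypoint.sh. HASURA_CLI is a hypothetical override so the
# sequence can be dry-run with HASURA_CLI=echo.
HASURA_CLI="${HASURA_CLI:-hasura-cli}"

apply_v3_project() {
  # Optional extra flag for "metadata apply", e.g.
  # --disallow-inconsistent-metadata. Deliberately unquoted below so an
  # empty value adds no argument.
  local extra_flags="${1:-}"

  # 1. Metadata first: it registers the databases (sources), so the
  #    run_sql-based migrations in step 2 can target them by name.
  "$HASURA_CLI" metadata apply $extra_flags || return

  # 2. Migrations for every database known to the metadata.
  "$HASURA_CLI" migrate apply --all-databases || return

  # 3. Reload so the metadata is re-checked against the migrated schema.
  "$HASURA_CLI" metadata reload
}
```

With the real CLI, calling `apply_v3_project --disallow-inconsistent-metadata` on a fresh database illustrates the trade-off in this thread: step 1 runs before step 2 has created any tables, so metadata that tracks a brand-new table is rejected and the sequence stops there.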
@scriptnull Great, thanks so much for the review and the context! I can change the v3 docker entrypoint script on this branch so that it preserves the existing order, and then I'll re-run the updated test suites to make sure they still pass. I just want to point out, though, that this will probably result in different behavior between v2 and v3 when disallowing inconsistent metadata. In v2, running migrations first means you can create a new table and define metadata for that table in the same change (since the table will exist by the time the … I'm totally fine with this (it preserves existing behavior, and the flag is opt-in rather than on by default), but I just wanted to mention it because it might be confusing for others. Do you think it's worth clarifying this difference in the docs in some way?

Just to follow up on my previous comment: I just ran the test script on the v3 image with inconsistent metadata disallowed, and confirmed that the graphql-engine container can't start up from scratch because of the ordering of metadata application and migrations (the engine can't apply metadata that references a table which hasn't been created yet). So as is, the baseline v3 tests fail with the step ordering reverted to its original state. I think I can get the tests working properly by starting up the test gql-engine container without disallowing ICs, applying metadata and migrations, then restarting it with the … Again, happy to do all this, but I wanted to follow up with a confirmation that I'm seeing the predicted consequences of the current ordering, and to double-check whether you have any concerns with my planned approach. I'll hold off on getting too deep into fiddling with tests for now, in case you have any other ideas.
Description
The Hasura CLI's `hasura metadata apply` command supports a `--disallow-inconsistent-metadata` flag, which helps prevent breaking metadata changes from being applied in the first place, rather than discovering them afterwards through `hasura metadata ic list` or, worse, via runtime application errors. However, in production environments, it's common to deploy graphql-engine using the `cli-migrations` Docker image and avoid exposing the metadata API entirely. This means that CI/CD workflows have no guaranteed way of preventing inconsistent metadata from landing in production, which is made even riskier by the fact that metadata changes will be automatically picked up by any already-running instances, even if that metadata is inconsistent.

This change attempts to address that by exposing an optional `HASURA_GRAPHQL_DISALLOW_INCONSISTENT_METADATA` env variable that can be provided to the `cli-migrations` Docker images in order to activate the corresponding `--disallow-inconsistent-metadata` flag on the `hasura metadata apply` step. If this is set and the metadata is inconsistent, metadata application will fail, the `docker-entrypoint.sh` script will exit early, and the container will fail to start up.

Also, just to say: I know that I opened this PR before I got any traction on the associated issue. If there's a good reason for not implementing this change, I will understand and won't mind throwing this work away.
Changelog
Component: build
Type: feature
Product: community-edition
Short Changelog
Add support for disallowing inconsistent metadata in `cli-migrations` image

Long Changelog
Related Issues
#10599
#8095
Solution and Design
This follows the existing pattern for configuration env vars in the `docker-entrypoint.sh` script(s), though it is strict in requiring the value of the new variable to be "true" (case-insensitive) rather than just being set to any value.

In this first draft, I've made no effort to fail gracefully in the event of inconsistent metadata. Ideally, I think we'd capture the exit code, shut down the temporary graphql-engine server, and then exit with the original code. That said, this implementation is consistent with how the script already handles possible non-zero exit codes from the `hasura-cli` commands (e.g. if the server is unreachable, or the metadata contains invalid YAML).

Zooming out a little, it's also worth mentioning that I've deliberately chosen to isolate this feature to the `cli-migrations` image, rather than making it a server-level config variable that would change graphql-engine's default behavior when receiving new metadata updates. The latter seems a bit far-reaching, and I'd rather leverage an existing API than broaden or complicate its scope in a significant way.

Steps to test and verify
I updated the existing test scripts so that, after confirming the "good" behavior works as intended, they also attempt to apply some inconsistent metadata and then confirm that the docker image is unable to start up.
I couldn't find any tools/docs for running tests locally, but I was able to get both test scripts running and passing locally by:
hasura-cli
cli-migrations
images manually

I set the new disallow-inconsistent-metadata flag to "true" in both of the test `docker-compose.yaml` files so that I could simply augment the existing test files. If you'd prefer to have this test run in a separate, isolated file with a different env configuration, I'm willing to do that too! I just thought this was a simpler first revision.
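The Solution and Design notes above describe a more graceful failure mode that this first draft does not implement: capture the exit code, shut down the temporary graphql-engine server, then exit with the original code. A minimal sketch under stated assumptions: `run_or_cleanup` and `stop_temp_server` are hypothetical names, and it assumes the entrypoint recorded the temporary server's PID in a variable (called `SERVER_PID` here) when starting it in the background.

```shell
#!/usr/bin/env bash
# Hypothetical: assumes the temporary graphql-engine was started with
# something like `graphql-engine serve & SERVER_PID=$!`.
stop_temp_server() {
  if [ -n "${SERVER_PID:-}" ]; then
    kill "$SERVER_PID" 2>/dev/null || true
    # wait only succeeds if this shell is the server's parent.
    wait "$SERVER_PID" 2>/dev/null || true
  fi
}

# Run a command; on failure, stop the temporary server and exit with the
# command's original exit code.
run_or_cleanup() {
  local status=0
  "$@" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "'$*' failed with exit code $status; stopping temporary graphql-engine" >&2
    stop_temp_server
    exit "$status"
  fi
}

# Usage (illustrative):
# run_or_cleanup hasura-cli metadata apply --disallow-inconsistent-metadata
```

Because the original exit code is preserved, the container still fails to start when metadata is inconsistent; the only difference from the current behavior is that the temporary server is shut down explicitly first.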
Limitations, known bugs & workarounds
Server checklist
Catalog upgrade
Does this PR change Hasura Catalog version?
Metadata
n/a
Does this PR add a new Metadata feature?
`run_sql` auto manages the new metadata through schema diffing?
`run_sql` auto manages the definitions of metadata on renaming?
`export_metadata`/`replace_metadata` supports the new metadata added?
GraphQL
Breaking changes
No Breaking changes
There are breaking changes:
Metadata API
Existing `query` types:
`args` payload which is not backward compatible
`JSON` schema
GraphQL API
Schema Generation:
`NamedType`
Schema Resolve:-
`null` value for any input fields
Logging
`JSON` schema has changed
`type` names have changed