This repository holds the Terraform code, Terragrunt configuration, and Helm chart configuration for DataHub as used at DFDS.
If starting from scratch, edit `remote_state.config.bucket` in `terraform/terragrunt/dev/terragrunt.hcl` to a unique value (S3 bucket names must be globally unique). Otherwise leave it as-is.
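For reference, the block in question looks roughly like this (the bucket name and region below are placeholders; check `terraform/terragrunt/dev/terragrunt.hcl` for the actual structure):

```hcl
# Sketch of the remote_state block in terragrunt.hcl.
# "my-unique-datahub-tfstate" is a placeholder; pick a globally unique name.
remote_state {
  backend = "s3"
  config = {
    bucket = "my-unique-datahub-tfstate"
    key    = "${path_relative_to_include()}/terraform.tfstate"
    region = "eu-west-1"
  }
}
```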
```shell
cd terraform/terragrunt/dev
terragrunt init
terragrunt apply
```
You can retrieve the hostnames and passwords by running:

```shell
terragrunt output -json
```
This is also how the CI/CD pipeline passes the values on to the Helm chart.
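As an illustration of that hand-off, a single value can be pulled out of the JSON like this (the output name `mysql_host` is hypothetical; the real output names are defined in the Terraform code, and in CI the file would come from `terragrunt output -json`):

```shell
# Simulate `terragrunt output -json > outputs.json`; in CI the real
# command is run instead. The output name "mysql_host" is hypothetical.
cat > outputs.json <<'EOF'
{"mysql_host": {"value": "datahub-dev.cluster-abc.eu-west-1.rds.amazonaws.com"}}
EOF

# Terraform JSON output nests each value under a "value" key.
MYSQL_HOST=$(python3 -c "import json; print(json.load(open('outputs.json'))['mysql_host']['value'])")
echo "MYSQL_HOST=$MYSQL_HOST"
```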
We mostly use managed prerequisites, which include:
- An EKS cluster (org-wide)
- A Kafka cluster (org-wide)
- AWS Elasticsearch
- AWS RDS (managed MySQL)
The only self-managed service is Confluent Schema Registry, which runs in EKS.
See the Terraform code for more details. We don't use a dedicated graph database such as Neo4j.
Our configuration deviates from the chart defaults as follows:
- OIDC for authentication, synced with our organization's LDAP directory
- Elasticsearch instead of Neo4j for the graph search functionality
- Custom Kafka topic names and consumer group IDs, to comply with Kafka ACL authorization
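An illustrative values fragment for the Kafka alteration might look like this (all key and topic names below are hypothetical placeholders; the actual keys depend on the chart version, and the real values are substituted by the pipeline):

```yaml
# Hypothetical values fragment: custom Kafka settings for ACL compliance.
# Key names are illustrative only; consult the chart's values reference.
global:
  kafka:
    bootstrap:
      server: "our-org-kafka:9092"   # placeholder broker address
  datahub:
    # Placeholder topic names that satisfy our Kafka ACL naming scheme:
    metadata_change_event_name: "ourteam.datahub.metadata-change-event"
    metadata_audit_event_name: "ourteam.datahub.metadata-audit-event"
```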
CI/CD is set up with Azure Pipelines; see the pipeline definition for details. Some configuration values, such as the k8s service connection and the Kafka settings, must be configured manually in Azure Pipelines.
The flow is roughly:

1. Upgrade infrastructure with Terragrunt for the dev environment.
2. If successful, get the Terraform output and substitute the values into the secrets and values files.
3. Run `helm upgrade` against the k8s cluster.
4. Repeat 1-3 for the prod environment.
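The flow can be sketched as a dry run (commands are echoed rather than executed, since terragrunt and helm are only available in the CI environment; the release and chart names are placeholders):

```shell
# Dry-run sketch of the pipeline flow; `run` echoes instead of executing.
run() { echo "+ $*"; }

run terragrunt apply -auto-approve      # 1. upgrade dev infrastructure
run terragrunt output -json             # 2. collect hostnames and passwords
run helm upgrade datahub datahub/datahub -f values.yaml   # 3. deploy the chart
# 4. repeat the same steps against the prod environment
```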
- Read the release notes of all versions between the current and the desired version, and check for breaking changes that must be taken into account. The DataHub Helm chart release notes can be found here and the DataHub release notes can be found here.
- Update `dataHubHelmChartVersion` to the desired version.
- Update the DataHub Helm chart values YAML file with the corresponding versions of the individual components.
- Deploy.
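A version bump might look like the following dry run (the target version `0.2.100`, the file name, and the release name are all placeholders; `run` echoes instead of executing since the tools live in CI):

```shell
# Dry-run sketch of a chart version bump; `run` echoes instead of executing.
run() { echo "+ $*"; }

# Update the pipeline variable to the desired chart version (placeholder):
run sed -i 's/^dataHubHelmChartVersion:.*/dataHubHelmChartVersion: 0.2.100/' pipeline.yml
# Deploy with the matching chart version and the updated values file:
run helm upgrade datahub datahub/datahub --version 0.2.100 -f values.yaml
```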
This should be revisited when UI-based ingestion is implemented.
Starting from v0.8.26, UI-based ingestion is possible in DataHub. However, the feature is still quite young and the documentation is scarce.
We have found that the currently accepted workaround for making the datahub-actions pod work is to:

- Manually specify the Kafka configuration under `extraEnvs` for the container, without a `SPRING_` prefix (the same configuration as GMS is needed)
- Find some way to change the Kafka topic names to our custom ones for this container too
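A sketch of the first workaround item (the variable names and values below are illustrative; the exact Kafka properties mirror whatever GMS is given, minus the `SPRING_` prefix, and depend on the chart and actions version):

```yaml
# Hypothetical extraEnvs for the datahub-actions container.
# Names mirror the GMS Kafka settings without the SPRING_ prefix;
# actual variable names depend on the chart/actions version.
datahub-actions:
  extraEnvs:
    - name: KAFKA_BOOTSTRAP_SERVER
      value: "our-org-kafka:9092"          # placeholder broker address
    - name: KAFKA_PROPERTIES_SECURITY_PROTOCOL
      value: "SASL_SSL"                    # placeholder security setting
```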
These things are difficult right now because the code has not been open-sourced yet, so we have decided to hold off on implementing this until it is more straightforward.