Move longhaul to the same Azure subscription we use for E2E tests #167

Closed
artursouza opened this issue Mar 13, 2023 · 2 comments
Labels: documentation, enhancement

Comments

@artursouza
Member

Longhaul tests are still running in a subscription that is accessible only to Microsoft employees. Both the release and nightly environments should be moved to the same Azure subscription we use for E2E tests.

Child of: #156

@tmacam
Contributor

tmacam commented Oct 31, 2023

Listing steps performed as a log of what was done and to serve as reference/documentation in case this needs to be redone.

Intro

  • Applications will live in the aks-longhaul-release and aks-longhaul-weekly resource groups, each holding a cluster of the same name.
  • We are not using the Azure-hosted grafana or prometheus yet - that's planned for the future.

Setup environment and credentials

# From https://github.com/dapr/test-infra/pull/203 and https://github.com/dapr/test-infra/blob/master/README.md

export SUBSCRIPTION_TO_BE_USED=INSERT_SUBSCRIPTION_UUID_HERE
export release_or_weekly='release' # use 'weekly' for weekly
export resourceGroup="aks-longhaul-${release_or_weekly}"
export DAPR_VERSION_TO_INSTALL='1.12.0'
export location=eastus
export clusterName=$resourceGroup
export MONITORING_NS=dapr-monitoring
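
Every later command derives its target from release_or_weekly, so a typo there would point the whole procedure at a resource group that does not exist. A minimal sketch of a guard follows; the function name longhaul_resource_group is hypothetical and not part of the repository.

```shell
# Hypothetical helper: map an environment name to its resource group,
# rejecting anything other than the two environments used in this issue.
longhaul_resource_group() {
    case "$1" in
        release|weekly)
            printf 'aks-longhaul-%s\n' "$1"
            ;;
        *)
            echo "unknown environment: '$1' (expected 'release' or 'weekly')" >&2
            return 1
            ;;
    esac
}

# Usage: assign first so a failure is not masked by 'export'.
#   resourceGroup="$(longhaul_resource_group "${release_or_weekly}")" && export resourceGroup
```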

Log in to the OSS subscription

# First, log in to the Dapr OSS subscription in your default browser

# Then, log in with the az CLI
az account clear && az login --output=none && az account set --subscription ${SUBSCRIPTION_TO_BE_USED}
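
Since the following steps create and delete resources, it may be worth confirming that the az CLI really ended up on the intended subscription. The sketch below assumes a hypothetical helper named assert_subscription; the only az call involved is az account show, which appears in the usage comment.

```shell
# Hypothetical guard: compare the subscription the az CLI is pointed at
# with the one we intended to use.
# $1 = expected subscription UUID, $2 = actual subscription UUID
assert_subscription() {
    if [ -z "$1" ] || [ "$1" != "$2" ]; then
        echo "az CLI is on subscription '$2', expected '$1'" >&2
        return 1
    fi
}

# Usage (requires a logged-in az CLI):
#   assert_subscription "${SUBSCRIPTION_TO_BE_USED}" \
#       "$(az account show --query id --output tsv)"
```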

Create new resource groups

az group create --name ${resourceGroup} --location ${location}

Deploy clusters

az deployment group create \
    --resource-group ${resourceGroup} \
    --template-file ./deploy/aks/main.bicep \
    --parameters deploy/aks/parameters-longhaul-${release_or_weekly}.json

Remove Dapr AKS extension

# We want to manually control Dapr setup, so let's remove the Azure-controlled Dapr ext.
az k8s-extension delete --yes \
    --resource-group ${resourceGroup} \
    --cluster-name ${clusterName} \
    --cluster-type managedClusters \
    --name ${clusterName}-dapr-ext

Get cluster credentials

az aks get-credentials --admin --name ${clusterName} --resource-group ${resourceGroup}

Install latest stable Dapr on both clusters through helm

# Just for good measure...
dapr uninstall -k

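# Now to the helm chart upgrade
# (assumption: the dapr chart repo may not be registered yet; adding it again is harmless)
helm repo add dapr https://dapr.github.io/helm-charts/ && \
helm repo update && \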
helm upgrade --install dapr dapr/dapr \
    --version=${DAPR_VERSION_TO_INSTALL} \
    --namespace dapr-system \
    --create-namespace \
    --wait

Bounce the apps (we just re-installed Dapr)

for app in "feed-generator-app" "hashtag-actor-app" "hashtag-counter-app" "message-analyzer-app" "pubsub-workflow-app" "snapshot-app" "validation-worker-app" "workflow-gen-app"; do
    kubectl rollout restart deploy/${app} -n longhaul-test || break
done 
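
kubectl rollout restart returns as soon as the restart is requested, not when the new pods are ready. A possible follow-up, sketched under the assumption that a five-minute timeout is acceptable, is to wait on each deployment; wait_for_rollouts and the KUBECTL override are hypothetical names introduced here.

```shell
# Hypothetical helper: block until every named deployment in the
# longhaul-test namespace has finished rolling out.
# Set KUBECTL=echo to preview the commands without touching a cluster.
wait_for_rollouts() {
    for app in "$@"; do
        ${KUBECTL:-kubectl} rollout status "deploy/${app}" \
            --namespace longhaul-test --timeout=5m || return 1
    done
}

# Usage, with the same app list as the restart loop above:
#   wait_for_rollouts feed-generator-app hashtag-actor-app hashtag-counter-app \
#       message-analyzer-app pubsub-workflow-app snapshot-app \
#       validation-worker-app workflow-gen-app
```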

Setup monitoring namespace (next steps require this)

# From https://github.com/dapr/test-infra/blob/master/.github/workflows/dapr-longhaul-weekly.yml
kubectl get namespace | grep ${MONITORING_NS} || kubectl create namespace ${MONITORING_NS}
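
The grep check above matches substrings, so a namespace such as dapr-monitoring-old would also satisfy it. A slightly stricter sketch asks the API server for the exact name instead; ensure_namespace is a hypothetical helper name.

```shell
# Hypothetical helper: create the namespace only if that exact name is absent.
# $1 = namespace name
ensure_namespace() {
    kubectl get namespace "$1" >/dev/null 2>&1 || kubectl create namespace "$1"
}

# Usage:
#   ensure_namespace "${MONITORING_NS}"
```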

Install Prometheus through helm chart

# Following https://docs.dapr.io/operations/observability/metrics/prometheus/#setup-prometheus-on-kubernetes

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts && \
helm repo update && \
helm install dapr-prom prometheus-community/prometheus \
    --namespace dapr-monitoring \
    --create-namespace \
    --wait

Install Prometheus custom settings

This step is skipped: the dashboard code was fixed in dapr/dapr#7121, so there is no need
to install custom Prometheus settings. Rejoice.

Install Grafana through helm chart

    # https://docs.dapr.io/operations/observability/metrics/grafana/#setup-on-kubernetes
    helm repo add grafana https://grafana.github.io/helm-charts && \
    helm repo update && \
    helm upgrade --install grafana grafana/grafana \
        --values ./grafana-config/values.yaml \
        --namespace ${MONITORING_NS} \
        --create-namespace \
        --wait && \
    kubectl get pods -n ${MONITORING_NS}

Configure grafana

These steps follow https://docs.dapr.io/operations/observability/metrics/grafana/#configure-prometheus-as-data-source

Log in to grafana

# Copy the Grafana admin password to the clipboard (clip.exe is the Windows/WSL clipboard tool)
kubectl get secret --namespace dapr-monitoring grafana -o jsonpath="{.data.admin-password}" | base64 --decode | clip.exe
kubectl port-forward svc/grafana 8080:80 --namespace dapr-monitoring
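
clip.exe only exists on Windows/WSL, and macOS uses pbcopy instead. One way to keep the password command portable is to pick whichever tool is installed; the helper name first_available below is hypothetical.

```shell
# Hypothetical helper: print the first command from the argument list
# that exists on this machine, or fail if none of them does.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage: choose a clipboard tool, then pipe the decoded password into it.
#   clip="$(first_available clip.exe pbcopy)" &&
#   kubectl get secret --namespace dapr-monitoring grafana \
#       -o jsonpath="{.data.admin-password}" | base64 --decode | "$clip"
```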

Register prometheus datasource

Just follow https://docs.dapr.io/operations/observability/metrics/grafana/#configure-prometheus-as-data-source

Import dashboards (from dapr/dapr#7121)

Use the code from dapr/dapr#7121 or, if it is merged, from https://github.com/dapr/dapr/blob/master/grafana/

Tip: cat ... | clip.exe (Windows/WSL) or cat ... | pbcopy (macOS) puts a dashboard's JSON on the clipboard, ready to paste into Grafana.

Create credentials for both clusters

Initial checks:

  • Are they created by default by the bicep template? - NO

  • Check current permissions. - They are too wide.

  • Check workflows for how they log in currently -- we want to avoid changing them much

    az login --service-principal -u ${{ secrets.AZURE_LOGIN_USER }} -p ${{ secrets.AZURE_LOGIN_PASS }} --tenant ${{ secrets.AZURE_TENANT }} --output none
  • In summary, we want a subscription-wide service principal that is granted the correct role on those two clusters only.

    • Ideally we would prefer distinct principals/credentials for each cluster, created automatically by the Bicep templates, but to speed things up and stay close to the current setup, this is how we will proceed.

Create service principal

Role: Azure Kubernetes Service Cluster User Role

  • https://learn.microsoft.com/en-us/azure/aks/control-kubeconfig-access#available-cluster-roles-permissions
    • Notice this role is enough because we use az aks get-credentials without the --admin flag. If we used that flag in our commands, we would need the Azure Kubernetes Service Cluster Admin Role, which grants access to the Microsoft.ContainerService/managedClusters/listClusterAdminCredential/action API.
  • Grant this role to the created service principal in both clusters
  • Access Control > Add Role Assignment
  • aks-longhaul-weekly
    • Role: Azure Kubernetes Service Cluster User Role
    • Scope: /subscriptions/<DAPR_OSS_SUBSCRIPTION>/resourceGroups/aks-longhaul-weekly/providers/Microsoft.ContainerService/managedClusters/aks-longhaul-weekly
    • Members
      • Name: test-infra-github-actions-longhaul-cluster-admin
      • Object ID: SUPPRESSED
      • Type: App
    • Description: Grant GitHub Actions on test-infra permission to az aks get-credentials to this k8s cluster
  • aks-longhaul-release
    • Role: Azure Kubernetes Service Cluster User Role
    • Scope: /subscriptions/<DAPR_OSS_SUBSCRIPTION>/resourceGroups/aks-longhaul-release/providers/Microsoft.ContainerService/managedClusters/aks-longhaul-release
    • Members
      • Name: test-infra-github-actions-longhaul-cluster-admin
      • Object ID: SUPPRESSED
      • Type: App
    • Description: Grant GitHub Actions on test-infra permission to az aks get-credentials to this k8s cluster
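
The role assignments above were made through the portal. If this needs to be redone from the command line, the same grant could be sketched with az role assignment create. The two function names are hypothetical, and the service principal ID and subscription ID must be supplied by the caller, since both are suppressed in this log.

```shell
# Build the ARM resource ID of an AKS cluster, matching the Scope values above.
# $1 = subscription ID, $2 = resource group, $3 = cluster name
aks_cluster_scope() {
    printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.ContainerService/managedClusters/%s\n' \
        "$1" "$2" "$3"
}

# Hypothetical wrapper: grant the cluster-user role on a single cluster.
# $1 = service principal app ID or object ID, $2 = scope from aks_cluster_scope
grant_cluster_user_role() {
    az role assignment create \
        --assignee "$1" \
        --role "Azure Kubernetes Service Cluster User Role" \
        --scope "$2"
}

# Usage (PRINCIPAL_ID is a placeholder for the suppressed object ID):
#   for env in weekly release; do
#       grant_cluster_user_role "${PRINCIPAL_ID}" \
#           "$(aks_cluster_scope "${SUBSCRIPTION_TO_BE_USED}" "aks-longhaul-${env}" "aks-longhaul-${env}")"
#   done
```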

Update release and weekly workflow to work on new cluster

Test by creating credentials in personal fork (tmacam/dapr-test-infra)

Update secrets on GH with new credentials

Updated secrets AZURE_TENANT, AZURE_LOGIN_USER, AZURE_LOGIN_PASS with values that point to service principal credential created above on 2023-11-06 17:10 PST.

Verify clusters and workflows are working as expected

Remove clusters in internal subscription

This is tracked separately, in issue #210.

tmacam added a commit to tmacam/dapr-test-infra that referenced this issue Nov 1, 2023
This commit updates the workflows to use the clusters configured
in the OSS subscription. Those clusters were created as documented
in dapr#167.

Those clusters live in a distinct subscription and have different names
from the current ones -- hence the need to update the workflows.

Additionally, because those clusters use Bicep-configured Azure-hosted
state store, binding and pubsub components, there is no need to
configure Redis, Kafka or set up individual components.

Signed-off-by: Tiago Alves Macambira <[email protected]>
tmacam added a commit to tmacam/dapr-test-infra that referenced this issue Nov 2, 2023
berndverst pushed a commit that referenced this issue Nov 6, 2023
@tmacam
Contributor

tmacam commented Nov 7, 2023

Added screenshot of the current status of the dashboard to #210.

I am closing this issue as the transition was done: the clusters are running in the OSS subscription and GitHub workflows trigger actions on these clusters. The removal of the old longhaul clusters is tracked separately on #210.

@tmacam tmacam closed this as completed Nov 7, 2023
@tmacam tmacam added documentation Improvements or additions to documentation enhancement New feature or request labels Nov 7, 2023