Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

automated tests for module index CSV #1305

Open
wants to merge 24 commits into
base: main
Choose a base branch
from

Conversation

matebarabas
Copy link
Contributor

@matebarabas matebarabas commented Aug 16, 2024

Overview/Summary

This PR includes automated tests for module index CSVs.

This PR fixes/adds/changes/removes

  1. Adds ~50 Pester tests for each scenario of Bicep and TF module index CSVs for res, ptn and utl modules
  2. Adds a new GH workflow that is triggered when any of the module index CSV files change. See here.
  3. Changes the configuration of super-linter for PowerShell, so that it only enforces those rules that have the severity of "Error" and "Warning".
  4. Fixes broken links in documentation.

Closes #382

image

Breaking Changes

n/a

As part of this Pull Request I have

  • Read the Contribution Guide and ensured this PR is compliant with the guide
  • Checked for duplicate Pull Requests
  • Associated it with relevant GitHub Issues or ADO Work Items (Internal Only)
  • Ensured my code/branch is up-to-date with the latest changes in the main branch
  • Ensured PR tests are passing
  • Updated relevant and associated documentation (e.g. Contribution Guide, Docs etc.)

@microsoft-github-policy-service microsoft-github-policy-service bot added the Needs: Triage 🔍 Maintainers need to triage still label Aug 16, 2024
@matebarabas matebarabas added Type: Hygiene 🧹 things related to testing, issue triage etc. and removed Needs: Triage 🔍 Maintainers need to triage still labels Aug 16, 2024
@matebarabas matebarabas self-assigned this Aug 16, 2024
@matebarabas matebarabas added the Needs: Core Team 🧞 This item needs the AVM Core Team to review it label Aug 17, 2024
@matebarabas matebarabas marked this pull request as ready for review August 17, 2024 02:51
@matebarabas matebarabas requested a review from a team as a code owner August 17, 2024 02:51
@matebarabas matebarabas changed the title Feat/matebarabas/csv-checks automated tests for module index CSV Aug 20, 2024
$rawFile = Get-Content -Path $CsvFilePath
$csvHeaders = $csvContent[0].PSObject.Properties.Name

$issues = & gh issue list --limit 99999 --state all --json title,body | ConvertFrom-Json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be a good call to switch this implmentation from gh issue list to gh api as it allows you to add the --paginate parameter this will enable us to go beyond 1000 results.

You can see an example of that here (just ignore the PAT if).

For small use cases the classic GH API commands lik gh issue list seem good enough. But for everything where we might end up with a large number of results, the gh api seems to be the better alternative.

The $queryUrl in your case should be /issues.

Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to gh api as suggested. Please take a look and resolve when ready. Thanks!

Copy link
Contributor

@AlexanderSehr AlexanderSehr Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe 😄

You switched it, though I'm not sure what to think about the {owner} & {repo} strings in this url "repos/{owner}/{repo}/issues?state=all&per_page=100" - seem like placeholders. I take it they're automatically replaced with the correct value, somehow? 😄
I'm more suprised about missing loop. I just ran the cmdlet and it returns 1.300 results. So far so good, yet I don't really understand why as we're not actively appending the $page to the url to loop through them. Seems like an implicit pagingation? If you have any details on that I'd be very curious.

$csvHeaders | Should -Be $requiredColumns
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Because of the headers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically this could be under the same condition under ptn modules, but kept it separate not only for readability, but for future proofing it (there's a good chance that this will fork and will have different columns.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may made a mistake and commented one line to high. I'm targeting line 67:

        It 'Should have at least 1 record' {
            $csvContent.Length | Should -BeGreaterOrEqual 1
        }

Comment on lines +110 to +114
$lineNumber = 2
foreach ($item in $csvContent) {
$item.ModuleName | Should -Not -BeNullOrEmpty -Because "ModuleName is a required field. In line #$lineNumber, this is empty. The following values have been provided: ""$($item)"""
$lineNumber++
}
Copy link
Contributor

@AlexanderSehr AlexanderSehr Aug 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please note, if you write a test like this, it will fail on the first instance the Should returns $false. That means, if you have more than one such error, you will have to fix one after the other and re-run the test to learn about them all.

What would be better is to move the should out of the loop, collect all incorrect instances in the loop and then change the Should to something that says 'xzy list of incorrect lines should be empty'.

You can find an example here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same is essentially true for every other test written this way.
It doesn't make your test wrong, so feel free to resolve. It just goes a long way to provide more meaningful results 😉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very good suggestion; I'll look into this!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment from every It block that loops through the CSV.

An alternative would've been to built an array of test cases outside the It test cases (say one for each csv line), and use the -TestCases parameter instead. This could've looked something like

$testCases = [System.Collections.ArrayList]@()
for ($index = 1; $index -lt $csvContent.Count; $index++) {
    $testCases += @{
        lineNumber = $index
        line       = $csvContent[$index]
        moduleName = $csvContent[$index].ModuleName
    }
}

It "Line [<lineNumber>] should not have any missing values in the 'ModuleName' column" -TestCases $testCases {
    param(
        $lineNumber,
        $line,
        $moduleName
    )  

    $moduleName | Should -Not -BeNullOrEmpty -Because "ModuleName is a required field. In line #$lineNumber, this is empty. The following values have been provided: ""$($line)"""
    
}

It 'Line [<lineNumber>] module [<moduleName>] Should only contain allowed characters' -TestCases $testCases {
    param(
        $moduleName,
        $lineNumber
    )  
    $moduleName | Should -Match '^[a-z0-9]+(?:[\/-][a-z0-9]+)*$' -Because "each segment of ModuleName should only contain lowercase letters, numbers and hyphens, in line #$lineNumber"
}

which would provide you with an output like

[+] Line [1] should not have any missing values in the 'ModuleName' column
[+] Line [2] should not have any missing values in the 'ModuleName' column
[+] Line [3] should not have any missing values in the 'ModuleName' column
[+] Line [4] should not have any missing values in the 'ModuleName' column
...
[+] Line [1] module [avm/res/aad/domain-service] Should only contain allowed characters
[+] Line [2] module [avm/res/alerts-management/action-rule] Should only contain allowed characters
[+] Line [3] module [avm/res/analysis-services/server] Should only contain allowed characters
[+] Line [3] module [avm/res/api-management/service] Should only contain allowed characters

The approach you've taken is absolutely fine, but moving the lines to the TestCases would simplify your tests (as you don't need the loops in them), it would provide a more detailed test trace and would ensure that the tests definitely run for all lines, as opposed to just stopping on the first line that doesn't meet a criteria (unless other measures are taken like desribed here).

Feel free to resolve if you deem it not worth it, but I at least wanted to call it out as it may save us time troubleshooting in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the improvement suggestion! I'll check the idea and will decide based on effort needed.

Comment on lines 223 to 237
# Set-ItResult -Skipped

# $moduleNames = $csvContent.ModuleName
# $sortedModuleNames = $moduleNames | Sort-Object -Stable -Culture "en-US"
# for ($i = 0; $i -lt $moduleNames.Length; $i++) {
# $moduleNames[$i] | Should -Be $sortedModuleNames[$i] -Because "All items in the CSV should be alphabetically ordered, based on ModuleName"
# }

# $moduleNames = $csvContent.ModuleName
# $normalizedModuleNames = $moduleNames -replace '[/\-]', ''
# $sortedModuleNames = $normalizedModuleNames | Sort-Object -Stable -Culture "en-US"

# for ($i = 0; $i -lt $moduleNames.Length; $i++) {
# $normalizedModuleNames[$i] | Should -Be $sortedModuleNames[$i] -Because "All items in the CSV should be alphabetically ordered, based on normalized ModuleName"
# }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it. :) I wanted to remove this entire test case for now, since as it turns out, Excel orders strings separated with dashes, slashes, etc. in a very weird way. I couldn't figure out an easy way how PowerShell could apply the exact same algorithm for sorting, and hence, there was always a few differences. As the solution used to render the content on the AVM website is also ordering the item alphabetically, this is just cosmetics, and hence not worth the effort.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or at least in case the forward slashes of Terraform, this was definitely the case. I may keep the Bicep solution for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs: Core Team 🧞 This item needs the AVM Core Team to review it Type: Hygiene 🧹 things related to testing, issue triage etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feedback]: Implement automated checks for the module index CSVs
3 participants