Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change KindIdentifier into a struct to handle unknown symbol kinds #7

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

QuietMisdreavus
Copy link
Contributor

@QuietMisdreavus QuietMisdreavus commented Dec 7, 2021

Resolves rdar://84276085

This PR changes the type definition of KindIdentifier from an enum into a struct with static fields for common values. This has two effects:

  1. Symbol kinds that don't match the currently-known set can still be properly handled and parsed, by saving the identifier string from the symbol graph.
  2. Adding new symbol kinds to the set of known kinds is no longer a breaking change for downstream users; matching KindIdentifier now requires a case default branch to handle unknown values, instead of simply matching on the .unknown enum case.

Package.swift Outdated Show resolved Hide resolved
Copy link
Contributor

@franklinsch franklinsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@QuietMisdreavus I think it would be useful to document why we're keeping track of "unknown" kinds, e.g., why this information would be useful for clients.

Also, since the unknown kinds are tracked in a static context, that state would be shared across different parsing instances of symbol graphs. Do you see potential issues with that approach?

Sources/SymbolKit/SymbolGraph/Symbol/Symbol.swift Outdated Show resolved Hide resolved
Sources/SymbolKit/SymbolGraph/Symbol/Symbol.swift Outdated Show resolved Hide resolved
Sources/SymbolKit/Utility/Synchronization.swift Outdated Show resolved Hide resolved
@QuietMisdreavus
Copy link
Contributor Author

Also, since the unknown kinds are tracked in a static context, that state would be shared across different parsing instances of symbol graphs. Do you see potential issues with that approach?

This could cause issues if multiple unrelated symbol graphs are handled in the same process, since there's no way to wipe out this cache without exiting the process entirely. An alternative would be to have this in a globally-accessible static cache, that each symbol graph would then fill and wipe out as needed, or that could allow users to manually clear it as desired. It would make logical sense to store this at the symbol graph level, but the isolation of the Decodable protocol conformance makes it difficult to relate a symbol kind to the symbol or graph it's appearing on (see SR-15570 which i filed yesterday).

That said, i don't see an issue with this is Swift-DocC's access patterns persist, since it's likely that the set of symbol graphs being handled at the same time are related by language or API. (Any other consumer may run into the aforementioned issues, especially if they choose to adopt a "persistent process" model.)

@QuietMisdreavus
Copy link
Contributor Author

@swift-ci Please test

@franklinsch
Copy link
Contributor

I see the value in keeping track of the unknown kinds' string name in KindIdentifier, but I'm not sure I understand the value of keeping track of them in a static store. In what cases would a client need that? If clients need that, we can potentially provide this information as a by-product of symbol graph decoding, or via an API that walks through symbol graph values and collects the kinds, instead of writing in a global static context. I think writing to a static context puts us in a bit of a design corner because for example, it could limit clients to perform concurrent decodings of unrelated symbol graphs in the same process space.

@QuietMisdreavus
Copy link
Contributor Author

The static store isn't intended for client use; it's mainly to cut down on string allocations in situations where a large symbol graph has many of the same symbol kind that SymbolKit doesn't already parse. It's a performance optimization. So, in effect the problem i mentioned earlier isn't that bad, since it would cause that static cache to hang around longer than necessary and create some memory bloat of its own. I could wire this into swift-docc and run some memory comparisons with and without the cache; that way we can see how much of an "issue" it's actually resolving.

@franklinsch
Copy link
Contributor

Ah I see—in that case I'd be curious about the memory investigation as well, yes. I'd also be interested in seeing if there's a time performance hit, since accesses to the static cache are synchronized using a lock, which might be noticeable for clients that decode symbol graph files concurrently.

@QuietMisdreavus
Copy link
Contributor Author

QuietMisdreavus commented Dec 10, 2021

I ran a few benchmarks on my machine using swift-docc's benchmark.swift and benchmark-diff.swift scripts. I tried it with the current main branch of SymbolKit, then ran it with this branch as-is and then edited to take out the synchronized collection. The comparisons look like this:

Between main and this branch:

+-----------------------------------------------------------------------------------------------------+
| Metric                                   | Change     | Before               | After                |
+-----------------------------------------------------------------------------------------------------+
| Compiled output size (MB)                | no change  | 288.22               | 288.22               |
| Duration for 'bundle-registration' (sec) | +1.04%     | 11.48                | 11.60                |
| Duration for 'convert-action' (sec)      | +1.14%     | 16.08                | 16.26                |
| Peak memory footprint (MB)               | +0.96%     | 999.10               | 1008.66              |
| Topic Anchor Checksum                    | no change  | 9a675b9ad6d69f8b7f0c | 9a675b9ad6d69f8b7f0c |
| Topic Graph Checksum                     | no change  | 665713509084b5131a37 | 665713509084b5131a37 |
+-----------------------------------------------------------------------------------------------------+

Between the main branch and a simplified version without the static cache:

+-----------------------------------------------------------------------------------------------------+
| Metric                                   | Change     | Before               | After                |
+-----------------------------------------------------------------------------------------------------+
| Compiled output size (MB)                | no change  | 288.22               | 288.22               |
| Duration for 'bundle-registration' (sec) | +0.79%     | 11.48                | 11.57                |
| Duration for 'convert-action' (sec)      | +1.06%     | 16.08                | 16.25                |
| Peak memory footprint (MB)               | +0.54%     | 999.10               | 1004.54              |
| Topic Anchor Checksum                    | no change  | 9a675b9ad6d69f8b7f0c | 9a675b9ad6d69f8b7f0c |
| Topic Graph Checksum                     | no change  | 665713509084b5131a37 | 665713509084b5131a37 |
+-----------------------------------------------------------------------------------------------------+

Between this branch and the edited version without the static cache:

+-----------------------------------------------------------------------------------------------------+
| Metric                                   | Change     | Before               | After                |
+-----------------------------------------------------------------------------------------------------+
| Compiled output size (MB)                | no change  | 288.22               | 288.22               |
| Duration for 'bundle-registration' (sec) | -0.24%     | 11.60                | 11.57                |
| Duration for 'convert-action' (sec)      | -0.07%     | 16.26                | 16.25                |
| Peak memory footprint (MB)               | -0.41%     | 1008.66              | 1004.54              |
| Topic Anchor Checksum                    | no change  | 9a675b9ad6d69f8b7f0c | 9a675b9ad6d69f8b7f0c |
| Topic Graph Checksum                     | no change  | 665713509084b5131a37 | 665713509084b5131a37 |
+-----------------------------------------------------------------------------------------------------+

It looks like having the associated data in there is still adding a bit of a performance hit. The test bundle i used was generated by DocC's make-test-bundle utility, so it's not properly using the static cache (since it uses generated Swift symbol graphs, and all of the symbol kinds it generates are already parsed by SymbolKit).

@franklinsch
Copy link
Contributor

Could you please clarify the difference between "a simplified version without the static cache" and "the edited version without the static cache"? I'm not sure I understand the distinction. It's possible that adding an associated value to the enum is indeed incurring a performance penalty.

By the way, I think another way to model unknown kinds would be to actually make KindIdentifier a struct with static properties instead:

public struct KindIdentifier: Equatable, Hashable, Codable {
    private var identifier: String

    public init(identifier: String) {  }

    public static let `associatedtype`: Self = .init(identifier: "associatedtype")
    
}

And we'd have a static property with all the cases that SymbolKit provides, if we need that. This is similar to how SwiftDocC.SourceLanguage and others are modeled. I doubt this would resolve the performance hit, though, but it could provider better API ergonomics to clients.

@QuietMisdreavus
Copy link
Contributor Author

Could you please clarify the difference between "a simplified version without the static cache" and "the edited version without the static cache"? I'm not sure I understand the distinction.

They're the same; apologies for the confusion.

I'll update the branch to use the model that you proposed, and run some more performance tests with it.

@QuietMisdreavus
Copy link
Contributor Author

It looks like storing kind identifiers as a struct instead of an enum gives a slightly better result, though still a bit of a slowdown from main:

+-----------------------------------------------------------------------------------------------------+
| Metric                                   | Change     | Before               | After                |
+-----------------------------------------------------------------------------------------------------+
| Compiled output size (MB)                | no change  | 288.22               | 288.22               |
| Duration for 'bundle-registration' (sec) | +2.08%     | 11.17                | 11.40                |
| Duration for 'convert-action' (sec)      | +2.65%     | 15.68                | 16.09                |
| Peak memory footprint (MB)               | +0.73%     | 990.09               | 997.30               |
| Topic Anchor Checksum                    | no change  | 9a675b9ad6d69f8b7f0c | 9a675b9ad6d69f8b7f0c |
| Topic Graph Checksum                     | no change  | 665713509084b5131a37 | 665713509084b5131a37 |
+-----------------------------------------------------------------------------------------------------+

@Kyle-Ye
Copy link
Contributor

Kyle-Ye commented Dec 15, 2021

However, introducing this class required updating the swift-tools version of this project from 5.2 to 5.5 and placing minimum-version requirements on macOS and iOS.

Can you make it clear why we have to update the swift-tools version into 5.5 and placing minimum-version requirements on macOS and iOS if we introduce the class?

And it seems if we update the Package.swift version to 5.5, someone who build SymbolKit using a lower version will be broken.

Can we just use the [email protected] file which is already merged by #6

@QuietMisdreavus
Copy link
Contributor Author

If we're taking out the static cache entirely (which it seems like we might), i could remove that requirement. I'd put it in as a copy/paste from Swift-DocC, since the APIs it uses on macOS and iOS needed a certain OS version (and i assume the way it was specified, in the Package.swift, had a swift-tools requirement - i haven't verified that assumption, though). Since Franklin's on vacation for now, i'll update this PR with my most recent branch (refactoring KindIdentifier into a struct with static properties) so you can take a look.

@QuietMisdreavus
Copy link
Contributor Author

Note that the changes in the most recent commit will break downstream code (now that there is no .unknown member), so we'll have to synchronize a change in Swift-DocC if this is the version we want to land.

@Kyle-Ye
Copy link
Contributor

Kyle-Ye commented Dec 15, 2021

Second suggestion:
I think we can use let kind: KindIdentifier? here

public static func isKnownIdentifier(_ identifier: String) -> Bool {
       var kind: KindIdentifier? = nil     

@Kyle-Ye
Copy link
Contributor

Kyle-Ye commented Dec 15, 2021

Also I suggest this change to be rebased onto #8, so that we can track the change more easily rather on a giant Symbol.swift file

Copy link
Contributor

@Kyle-Ye Kyle-Ye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Kyle-Ye
Copy link
Contributor

Kyle-Ye commented Mar 2, 2022

Hello @QuietMisdreavus, is there any progress on this PR?
Can we update it and ask for franklinsch's approval so that this can be merged?

@QuietMisdreavus QuietMisdreavus changed the title store a set of unknown symbol kinds to prevent loss of data change KindIdentifier into a struct to handle unknown symbol kinds Apr 7, 2022
case .module: return "module"
case .unknown: return "unknown"
}
private init(rawIdentifier: String) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any value in making this public so clients can create new symbol kinds without updating SymbolKit itself?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it looks like we have a public initializer below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The public initializer first checks against the known symbol kinds, then returns the original kind identifier string unchanged if it doesn't match any of the known ones. This at least can keep down on string allocations that are stored in the final symbol graph, if it uses known symbol kinds. In the end, it does defer to this rawIdentifier initializer, either via a static property or by storing the unknown kind string.

@QuietMisdreavus
Copy link
Contributor Author

I've rebased the PR to resolve the merge conflict (it was the addition of the snippet and snippetGroup kinds).

@swift-ci Please test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants