Add support for Imaging Data Commons #1450

Merged
merged 6 commits into from
Feb 15, 2024

psavery
Collaborator

@psavery psavery commented Jan 31, 2024

The NCI's Imaging Data Commons is a big repository (>38k studies) for cancer research. For the DICOMweb server, it uses Google's Cloud Healthcare API behind a proxy. This DICOMweb server sometimes behaves differently than the dcm4chee server we have been testing with.

This PR fixes a couple of issues we encountered. One is that we cannot use SOPClassUID as a search filter, even though the DICOMweb standard indicates that it should be supported when searching for instances. We can filter the results manually instead.
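
As an illustration of the manual filtering mentioned above, here is a minimal sketch that drops non-WSI results after an instance search. It assumes the results are DICOM JSON dictionaries keyed by tag, as a QIDO-RS search returns them. The function name `filter_wsi_instances` is hypothetical and not taken from this PR. The UID is the standard VL Whole Slide Microscopy Image Storage SOP class.

```python
# VL Whole Slide Microscopy Image Storage SOP Class UID.
WSI_SOP_CLASS_UID = '1.2.840.10008.5.1.4.1.1.77.1.6'
# DICOM tag (0008,0016) SOPClassUID, as keyed in the DICOM JSON model.
SOP_CLASS_UID_TAG = '00080016'


def filter_wsi_instances(instances):
    """Keep only instances whose SOPClassUID marks them as WSI.

    Hypothetical helper: `instances` is assumed to be a list of DICOM JSON
    dicts. Entries without a SOPClassUID are dropped.
    """
    kept = []
    for instance in instances:
        values = instance.get(SOP_CLASS_UID_TAG, {}).get('Value', [])
        if WSI_SOP_CLASS_UID in values:
            kept.append(instance)
    return kept
```

Filtering on the client side costs an extra pass over the results, but it does not depend on the server supporting SOPClassUID as a query parameter.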

This PR also adds a default limit to the import page (which is required for importing from IDC) and a default search filter that specifies the SM (Slide Microscopy) modality. Without the search filter, we end up importing a lot of non-WSI datasets. With the SM search filter, most of the imported datasets look correct, and all are viewable.
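
To make the defaults concrete, here is a rough sketch of how a limit and a `Modality` filter map onto a QIDO-RS query string. The function name `build_qido_query`, the constants, and the `example.org` URL are hypothetical. The choice of `series` as the default resource is also an assumption, and this is not the code used by the import page.

```python
from urllib.parse import urlencode

# Illustrative defaults mirroring the ones described above (assumed names).
DEFAULT_LIMIT = 10
DEFAULT_FILTERS = {'Modality': 'SM'}


def build_qido_query(base_url, resource='series', limit=DEFAULT_LIMIT,
                     filters=None):
    """Build a QIDO-RS search URL with the search filters and a limit."""
    # Copy so the module-level defaults are never mutated.
    params = dict(DEFAULT_FILTERS if filters is None else filters)
    if limit is not None:
        params['limit'] = limit
    return f"{base_url.rstrip('/')}/{resource}?{urlencode(params)}"


print(build_qido_query('https://example.org/dicomWeb'))
```

With the assumed defaults, the printed URL requests at most 10 series whose modality is SM.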

Also needed to support the IDC: imi-bigpicture/wsidicom#149

@psavery psavery force-pushed the dicomweb-idc-support branch 4 times, most recently from 683443b to 218eed5 Compare February 1, 2024 19:28
@psavery psavery marked this pull request as ready for review February 6, 2024 17:57
@psavery psavery force-pushed the dicomweb-idc-support branch from 218eed5 to f0e3617 Compare February 6, 2024 18:02
@psavery
Collaborator Author

psavery commented Feb 6, 2024

Even though we still need imi-bigpicture/wsidicom#149 to support IDC, this is ready for review anyway, because it doesn't make any breaking changes.

@psavery
Collaborator Author

psavery commented Feb 12, 2024

imi-bigpicture/wsidicom#149 was merged, so this is definitely ready!

@psavery psavery force-pushed the dicomweb-idc-support branch 2 times, most recently from 8f650e5 to 44b4db6 Compare February 13, 2024 16:08

Google Healthcare API, used by Imaging Data Commons, does not allow
filtering by SOPClassUID. So we cannot use that in the search filter.
We should think of alternatives so that we can include only WSI results.

These changes were needed alongside [this wsidicom PR](imi-bigpicture/wsidicom#149)
in order to view an example dataset.

The following were used for testing:

url = 'https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb'
study_uid = '2.25.25644321580420796312527343668921514374'
series_uid = '1.3.6.1.4.1.5962.99.1.3205815762.381594633.1639588388306.2.0'

Signed-off-by: Patrick Avery <[email protected]>

The default limit and search filter work well for importing examples from
the Imaging Data Commons.

It takes a while to import, so a limit of 10 series seems reasonable
for now.

Also, it's good to have a default filter of "Modality": "SM", because
otherwise, we would mostly receive non-WSI series.

Signed-off-by: Patrick Avery <[email protected]>

We were previously just performing a `search_for_series()` and applying
the limit and filters to that search.

However, it is probably more intuitive for users to be searching for
studies, rather than series. So we are now performing a `search_for_studies()`
first, applying the limit and filters to this, and then locating all
series within those studies, and proceeding from there.
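
The study-first flow described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR. `find_series_in_studies` is a hypothetical name, and `client` is assumed to expose `search_for_studies()` and `search_for_series()` with keyword arguments like those of a dicomweb-client `DICOMwebClient`. Results are assumed to be DICOM JSON dicts.

```python
# DICOM tag (0020,000D) StudyInstanceUID, as keyed in the DICOM JSON model.
STUDY_UID_TAG = '0020000D'


def find_series_in_studies(client, limit=None, search_filters=None):
    """Search for studies first, then collect every series in each study.

    The limit and filters apply to the study search only (an assumption
    based on the description above); the series search is unfiltered.
    """
    studies = client.search_for_studies(
        limit=limit, search_filters=search_filters)
    series = []
    for study in studies:
        study_uid = study[STUDY_UID_TAG]['Value'][0]
        series.extend(
            client.search_for_series(study_instance_uid=study_uid))
    return series
```

One consequence of this design is that a limit of 10 now bounds the number of studies, so the number of series imported can be larger than the limit.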

Signed-off-by: Patrick Avery <[email protected]>

We are now searching by studies, not series. The tests need to be
fixed to take this into account.

Signed-off-by: Patrick Avery <[email protected]>

Importing the first 10 studies on IDC has been taking around 20 minutes.
About 65% of that time has been spent on inferring file sizes.

Even though we don't stream the file data, the request from which we get
the content length must be taking some time on the server. Skip doing
this for now. We can add it back in if we figure out a way to make it much
faster.

Signed-off-by: Patrick Avery <[email protected]>

@psavery psavery force-pushed the dicomweb-idc-support branch from 44b4db6 to e471abe Compare February 13, 2024 17:48
@psavery psavery merged commit 5c95411 into girder:master Feb 15, 2024
14 checks passed
@psavery psavery deleted the dicomweb-idc-support branch February 15, 2024 17:57