Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PFlotran Example #72

Open
9 tasks
johnkit opened this issue Mar 25, 2024 · 1 comment
Open
9 tasks

PFlotran Example #72

johnkit opened this issue Mar 25, 2024 · 1 comment

Comments

@johnkit
Copy link
Collaborator

johnkit commented Mar 25, 2024

PFLOTRAN is a hydrology code used by scientists at DOE Labs for watershed analysis and related research. It has a custom results/output format that is not directly compatible with Xarary. Using a "reference file system", we can access PFlotran data. I am adding an example here as a starting point for some additional features we should consider developing.

To reproduce the screenshot.

1. Create a python venv and install these packages

  • pyvista-xarray
  • zarr
  • fsspec
  • requests
  • aiohttp

2. Run the python script pasted below and pass in the attached json file, e.g.

> python test_small.py  pfsmall-gcp.json

pfsmall-gcp.json

import argparse
import sys

import pyvista as pv
import pvxarray
import xarray as xr


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('ref_filesystem', help='path (or url) of reference file system (.json)')
    args = parser.parse_args()

    ds = xr.open_dataset('reference://', engine='zarr', backend_kwargs={
        'consolidated': False,
        'storage_options': {'fo': args.ref_filesystem}
    })
    print(ds)

    var_list = list(ds.keys())
    if not var_list:
        print('No data arrays')
        sys.exit()
    print(var_list)

    var = 'Total_Fe+++ [M]'
    if var not in var_list:
        print(f'Cannot display {var}')
        sys.exit()

    tcoord = ds.coords.get('time')
    last_time_index = len(tcoord) - 1

    da = ds[var]
    dat = da[dict(time=last_time_index)]
    print(dat)

    # All important transpose (which copies data - sigh)
    dat_T = dat.transpose()

    print()
    mesh = dat_T.pyvista.mesh(x='x_cell [m]', y='y_cell [m]', z='z_cell [m]')
    print(mesh)
    p = pv.Plotter()
    p.add_mesh_clip_plane(mesh, show_edges=False)
    p.add_axes()
    p.show()

FYI the PFlotran data file is an hdf5 file stored in Google Cloud bucket. It is a small (67MB) subset of a much larger dataset that Dipankar at Berkely Lab provided to us last year.

Candidate features/updates

From my notes Friday 22-Mar-2024 meeting with Dipankar and Chuyang

  • I presume the same code will "just work" with pyvista[trame] and Jupyter notebooks, but I haven't verified.
  • The dataset has an integer "Material_ID" field. Cells set to zero should be invisible.
  • Before addressing the Material ID issue, we might first want to address the fact that the variable data are cell based, so that the "original" grid dimensions are 1 greater than the data dimensions. There is a current limitation -- I think in pyvista-xarray -- that mandates the coordinates have the same dimensions as the data. Maybe we need an option to support cell scalars and point scalars?
  • Would like to see a time selector (the dataset has 6 time steps), which might be trivial w/pyvista-trame.
  • Would like to capture 2D plots via interaction. Possible examples: (i) select one voxel and display its value vs time. (ii) select a group of voxels and display some aggregate value vs time, (iii) draw a line and plot values vs 1-3 axes.
  • "Overlapped data". Needs more definition, but could be displaying multiple variables each displaced by some offset (mostly z).
  • Record animations.
  • I would also like to restrict our datasets to kitware access only (this current one is publicly available to anyone with the url in the json file). That might be trivial once 2i2c sets up a cloud system for us.
  • And finally, once we have some confidence with this data set, I would also like to set up the full scale one (50 GB)
@johnkit
Copy link
Collaborator Author

johnkit commented Mar 27, 2024

I made a new script that more closely matches the intended visualization. It has 2 main differences from the original script (above)

  • It assigns the FE+++ input array to vtk CELL data instead of vtk point data. We might want to look into updating the pan3d viewer to handle cell data like this.
  • It uses the Material_ID input array to create a vtkGhostData array that blanks cells with id 0. We need to figure out an easy way for end users to reproduce this behavior.

Also note there is a new json file to pass in to the script: gcp.json

import argparse
import sys

import pyvista as pv
import pvxarray
import xarray as xr

DEFAULT_VARIABLE='Total_Fe+++ [M]'
ADD_BLANKING_ARRAY = True


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('ref_filesystem', help='path (or url) of reference file system (.json)')
    parser.add_argument('-v', '--variable', default=DEFAULT_VARIABLE, \
        help=f'variable to display [{DEFAULT_VARIABLE}]')

    parser.add_argument('-o', '--output_vtr_filename', help='write grid to .vtr file')
    parser.add_argument('-c', '--clip_plane', action='store_true', help='add clip plane to plot')
    args = parser.parse_args()

    # Must load with reference file system
    # And MUST disable mask_and_scale in order to input Material_ID (int32) correctly
    ds = xr.open_dataset('reference://', engine='zarr', mask_and_scale=False, backend_kwargs={
        'consolidated': False,
        'storage_options': {'fo': args.ref_filesystem}
    })
    print(ds)

    var_list = list(ds.keys())
    if not var_list:
        print('No data arrays')
        sys.exit()
    print(var_list)

    var_name = args.variable
    if var_name not in var_list:
        print(f'Cannot display {var_name}')
        sys.exit()

    print()
    # Create grid from point coordinates
    xg = ds['X [m]'].values
    yg = ds['Y [m]'].values
    zg = ds['Z [m]'].values
    grid = pv.RectilinearGrid(xg, yg, zg)
    # print(dir(grid))

    def update_grid(time_index):
        """Set grid's cell data to specified time"""
        t = time_index if isinstance(time_index, int) else int(time_index + 0.5)
        da = ds[var_name]
        var_data = da[t].values.transpose().flatten()  # MUST TRANSPOSE
        grid.cell_data.set_array(var_data, var_name)
    update_grid(0)

    if ADD_BLANKING_ARRAY:
        import numpy as np
        def blank_cell(mat_id):
            return 32 if mat_id < 1 else 0
        vblank_cell = np.vectorize(blank_cell, otypes=[np.uint8])

        # We are presuming blanking doesn't change w/time
        da_mat = ds['Material_ID'][0]
        np_blank = vblank_cell(da_mat).transpose().flatten()  # MUST TRANSPOSE
        grid.cell_data.set_array(np_blank, 'vtkGhostType')

    if args.output_vtr_filename is not None:
        from vtkmodules.vtkIOXML import vtkXMLRectilinearGridWriter
        writer = vtkXMLRectilinearGridWriter()
        writer.SetInputData(grid)
        writer.SetFileName(args.output_vtr_filename)
        writer.Write()
        print(f'Wrote {args.output_vtr_filename}')

    p = pv.Plotter()
    if args.clip_plane:
        p.add_mesh_clip_plane(grid, scalars=var_name, show_edges=False)
    else:
        p.add_mesh(grid, scalars=var_name, show_edges=False)
    p.add_axes()
    p.add_slider_widget(update_grid, [0.1, 5.1], fmt='%0.0f', value=0, title='Time Step')
    p.show()
```

@johnkit johnkit added this to the Future Sprint topics milestone Sep 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant