Consider replacing "map-key mapping" with trie structure #47

crisptrutski · 2024-05-30T16:29:48Z

Currently Macaw makes use of various maps whose keys are qualified entities that are themselves represented by maps.

For example, the parameters passed to replace-names:

{:columns {{:table "orders" :column "total"} "subtotal"}
 :tables  {{:table "orders"}                 "purchases"}}

Ignoring what we want the library API to look like, there are some practical downsides to using this representation internally:

The relaxed look-up (find-relaxed) method requires a linear scan over they keys of a persistent map - slow!
Since the ordering of the keys is non-deterministic and we return the first match we find, this may also be non-deterministic!
We may want to do something different when there are multiple matches - e.g. return all of them, throw an error, etc.
We could still support this with an exhaustive scan that accumulates all the results, but that's even more expensive.

An idea for something better would be to build a trie-like structure which us traversed from the "outside in", i.e. right-to-left as identifiers are written. For example:

{"total" {"orders"   {"schema_1" "primary_orders_total"
                      "schema_2" "secondary_order_total"}
          "payments" {"schema_1" "total_payments"}}}

In the simple case of a fully-qualified identifier, we'd now need to do 3 look-ups instead of a single one, so there's no free lunch. On the other hand, we do save hashing the whole map - and clojure does not cache map hashes.

The advantage for partially qualified identifiers is that we can replace a linear seek which needs to check equality with modified keys with a sequence of look-ups. When we reach the end of our crumbs we can then easily enumerate all the children nodes. In the case where there is no match, we can also fail on a look-up step instead of doing a full seek.

They're also not great for pretty printing, given their density. That said, a trie will probably be even worse - so we'd probably want a utility method for printing them nicely when debugging too.

The text was updated successfully, but these errors were encountered:

crisptrutski · 2024-05-30T16:30:29Z

@tsmacdonald do we want a different label for refactors and optimizations? "enhancement" sounds very feature-y

crisptrutski · 2024-05-30T16:33:58Z

Given the cost of building this trie in the first place, for smaller examples the "efficiency gains" could be negative? That's assuming we want to keep taking the original structure in the API and need to convert it. This representation certainly isn't what we'd want to write by hand for tests though.

As far as the fully-versus-partially-qualified look-up conundrum, we could conceivably have both representation behind deftype which does the most efficient look-up, but that sounds like it could easily be jumping the complexity shark.

crisptrutski · 2024-06-20T09:58:50Z

For extra performance we could encapsulate this Trie in a non-persistent immutable ADT, and possibly even flatten it into a nice jump table to avoid pointer chasing.

crisptrutski added the enhancement New feature or request label May 30, 2024

crisptrutski added optimization Make it go brrr, without changing behavior and removed enhancement New feature or request labels Jun 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider replacing "map-key mapping" with trie structure #47

Consider replacing "map-key mapping" with trie structure #47

crisptrutski commented May 30, 2024 •

edited

Loading

crisptrutski commented May 30, 2024 •

edited

Loading

crisptrutski commented May 30, 2024 •

edited

Loading

crisptrutski commented Jun 20, 2024

Consider replacing "map-key mapping" with trie structure #47

Consider replacing "map-key mapping" with trie structure #47

Comments

crisptrutski commented May 30, 2024 • edited Loading

crisptrutski commented May 30, 2024 • edited Loading

crisptrutski commented May 30, 2024 • edited Loading

crisptrutski commented Jun 20, 2024

crisptrutski commented May 30, 2024 •

edited

Loading

crisptrutski commented May 30, 2024 •

edited

Loading

crisptrutski commented May 30, 2024 •

edited

Loading