Merging Two Sorted Sequences, Attempt 3 #236

Open: wants to merge 34 commits into base: main

Commits
07af721
Add functions to merge sorted sequences
CTMacUser Jul 11, 2024
154dc24
Correct function naming in documentation
CTMacUser Jul 11, 2024
335dc2a
Add functions to merge sorted partitions
CTMacUser Jul 11, 2024
7b0bc32
Remove the specialized set-operation initializers
CTMacUser Jul 13, 2024
1e9a611
Anonymize an unused variable
CTMacUser Jul 13, 2024
db85a4f
Make some opaque types explicit
CTMacUser Jul 13, 2024
34dc401
Clean up code for eager mergers
CTMacUser Jul 14, 2024
b627b6b
Add tests for merging
CTMacUser Jul 14, 2024
482a03f
Clean up some code; add sanity check
CTMacUser Jul 14, 2024
baaf9a4
Clarify the meaning of a function parameter
CTMacUser Jul 14, 2024
62055ea
Touch up tenses and permissions
CTMacUser Jul 15, 2024
eb31473
Add functions to merge sorted partitions in-place
CTMacUser Jul 19, 2024
7885728
Update documentation file
CTMacUser Jul 19, 2024
c482717
Add guide documentation
CTMacUser Jul 20, 2024
7276f96
Refine text for the Guide
CTMacUser Jul 20, 2024
c74b0d4
Add documentation partition markers
CTMacUser Aug 2, 2024
d1ef470
Separate tests between regular and subset mergers
CTMacUser Aug 2, 2024
f40015d
Separate the regular and set-operation merger functions
CTMacUser Aug 2, 2024
a9dc93b
Move the set-operation merger functions to a new file
CTMacUser Aug 2, 2024
33f17fa
Rename some set-operation merger functions
CTMacUser Aug 2, 2024
9a0702d
Change the conformance for set-operation merges
CTMacUser Aug 2, 2024
b11af98
Update documentation for the merger free-functions
CTMacUser Aug 3, 2024
ea1a22a
Update the general documentation
CTMacUser Aug 3, 2024
b3a5031
Update on a conformance made conditional
CTMacUser Aug 3, 2024
d3fc6b8
Add function/type summaries
CTMacUser Aug 3, 2024
02838f0
Update documentation over the full/subset split
CTMacUser Aug 3, 2024
59b2d4c
Correct (lack of) copy & paste error
CTMacUser Aug 4, 2024
d539b1f
Update the merge summary on the full/subset split
CTMacUser Aug 4, 2024
b3eeabb
Add preliminary filter-less merger sequence
CTMacUser Sep 23, 2024
1ee97e0
Move partition-based merging to separate files
CTMacUser Sep 25, 2024
6caf340
Redo merging two sequences
CTMacUser Sep 25, 2024
188649a
Regroup free functions over eager vs lazy
CTMacUser Sep 25, 2024
bd7411d
Lock eager mergers to always return arrays
CTMacUser Sep 25, 2024
aa2a5d0
Remove the error type in a public declaration
CTMacUser Sep 25, 2024
193 changes: 193 additions & 0 deletions Guides/Merge.md
@@ -0,0 +1,193 @@
# Merge

- Between Partitions:
[[Source](https://github.com/apple/swift-algorithms/blob/main/Sources/Algorithms/MergePartitions.swift) |
[Tests](https://github.com/apple/swift-algorithms/blob/main/Tests/SwiftAlgorithmsTests/MergePartitionsTests.swift)]
- Between Arbitrary Sequences:
[[Source](https://github.com/apple/swift-algorithms/blob/main/Sources/Algorithms/Merge.swift) |
[Tests](https://github.com/apple/swift-algorithms/blob/main/Tests/SwiftAlgorithmsTests/MergeTests.swift)]

Splice two sequences that are sorted by the same criterion into a single
sequence that is also sorted by that criterion.

If the sequences are sorted with something besides the less-than operator (`<`),
then a predicate can be supplied:

```swift
let merged = lazilyMerge([10, 4, 0, 0, -3], [20, 6, 1, -1, -5], keeping: .sum, sortedBy: >)
print(Array(merged))
// [20, 10, 6, 4, 1, 0, 0, -1, -3, -5]
```

Sorted sequences can be treated as (multi-)sets.
Because both sequences are sorted,
elements that are shared between the sequences can be told apart from
elements that are exclusive to one sequence in a single linear pass.
Set operations are defined in terms of these categories of sharing,
so an operation can be applied in-line during merging:

```swift
let first = [0, 1, 1, 2, 5, 10], second = [-1, 0, 1, 2, 2, 7, 10, 20]
print(merge(first, second, keeping: .union))
print(merge(first, second, keeping: .intersection))
print(merge(first, second, keeping: .secondWithoutFirst))
print(merge(first, second, keeping: .sum)) // Standard merge!
/*
[-1, 0, 1, 1, 2, 2, 5, 7, 10, 20]
[0, 1, 2, 10]
[-1, 2, 7, 20]
[-1, 0, 0, 1, 1, 1, 2, 2, 2, 5, 7, 10, 10, 20]
*/
```
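
To illustrate why sorting makes these categories cheap to identify, here is a minimal, self-contained sketch of a single-pass merge over two ascending arrays of `Int`. The names `SketchSubset` and `sketchMerge` are hypothetical and are not part of this pull request's API; only four of the operations are modelled, and the actual implementation may differ.

```swift
/// Hypothetical stand-in for a few set operations; not the PR's `MergerSubset`.
enum SketchSubset {
    case union, intersection, secondWithoutFirst, sum
}

/// Merges two ascending arrays in one pass, keeping only the elements
/// selected by `subset`. Multiset semantics: duplicates are matched pairwise.
func sketchMerge(_ first: [Int], _ second: [Int], keeping subset: SketchSubset) -> [Int] {
    // Decide which of the three categories survive for the chosen operation.
    let keepFirstOnly = subset == .union || subset == .sum
    let keepSecondOnly = subset != .intersection
    let keepShared = subset != .secondWithoutFirst

    var result: [Int] = []
    var i = first.startIndex, j = second.startIndex
    while i < first.endIndex, j < second.endIndex {
        if first[i] < second[j] {
            // Exclusive to the first sequence.
            if keepFirstOnly { result.append(first[i]) }
            i += 1
        } else if second[j] < first[i] {
            // Exclusive to the second sequence.
            if keepSecondOnly { result.append(second[j]) }
            j += 1
        } else {
            // Shared: emit one copy, or both copies for a plain merge.
            if keepShared { result.append(first[i]) }
            if subset == .sum { result.append(second[j]) }
            i += 1
            j += 1
        }
    }
    // Whatever remains in either tail is exclusive to that sequence.
    if keepFirstOnly { result.append(contentsOf: first[i...]) }
    if keepSecondOnly { result.append(contentsOf: second[j...]) }
    return result
}
```

Each element of either input is visited once, so the filtering adds no extra pass on top of the merge itself.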

## Detailed Design

The merging algorithm can be applied in two domains:

- Free functions taking the source sequences.
- Functions over a `MutableCollection & BidirectionalCollection`,
where the two sources are adjacent partitions of the collection.

Besides the optional ordering predicate,
the partition-merging methods' other parameter is the index to the
first element of the second partition,
or `endIndex` if that partition is empty.
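
The pivot convention can be illustrated with a deliberately simple sketch. The method name `sketchMergePartitions(across:)` is hypothetical, and the insertion-style approach below is an assumption chosen for brevity, not the algorithm this pull request uses; it needs no extra storage but can take quadratic time in the worst case.

```swift
extension MutableCollection where Self: BidirectionalCollection, Element: Comparable {
    /// Hypothetical helper, not the PR's `mergePartitions(across:)`.
    /// Assumes `self[..<pivot]` and `self[pivot...]` are each sorted ascending.
    mutating func sketchMergePartitions(across pivot: Index) {
        var next = pivot
        // Passing `endIndex` as the pivot means the second partition is empty.
        while next != endIndex {
            // Sink the element at `next` backward until its predecessor
            // is no longer greater than it. Equal elements are not swapped,
            // so their relative order is preserved.
            var current = next
            while current != startIndex {
                let previous = index(before: current)
                guard self[current] < self[previous] else { break }
                swapAt(previous, current)
                current = previous
            }
            formIndex(after: &next)
        }
    }
}
```

For example, calling it on `[1, 4, 9, 2, 3, 10]` with a pivot of `3` treats `[1, 4, 9]` and `[2, 3, 10]` as the two partitions.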

Besides the optional ordering predicate,
the free functions take the two operand sequences and the desired set operation
(intersection, union, symmetric difference, *etc.*).
Use `.sum` for a conventional merge.
The `merge` functions eagerly compute the result of the merger and
return it as an array.
The `lazilyMerge` functions return a special sequence that lazily
generates the result of the merger.

```swift
// Merging two adjacent partitions.

extension MutableCollection where Self : BidirectionalCollection {
    /// Assuming that both this collection's slice before the given index and
    /// the slice at and past that index are both sorted according to
    /// the given predicate,
    /// rearrange the slices' elements until the collection as
    /// a whole is sorted according to the predicate.
    public mutating func mergePartitions<Fault>(
        across pivot: Index,
        sortedBy areInIncreasingOrder: (Element, Element) throws(Fault) -> Bool
    ) throws(Fault) where Fault : Error
}

extension MutableCollection where Self : BidirectionalCollection, Self.Element : Comparable {
    /// Assuming that both this collection's slice before the given index and
    /// the slice at and past that index are both sorted,
    /// rearrange the slices' elements until the collection as
    /// a whole is sorted.
    public mutating func mergePartitions(across pivot: Index)
}

// Merging two sequences with free functions, applying a set operation.
// Has lazy and eager variants.

/// Given two sequences treated as (multi)sets, both sorted according to
/// a given predicate,
/// return a sequence that lazily vends the also-sorted result of applying a
/// given set operation to the sequence operands.
public func lazilyMerge<First, Second>(
    _ first: First, _ second: Second, keeping filter: MergerSubset,
    sortedBy areInIncreasingOrder: @escaping (First.Element, Second.Element) -> Bool
) -> MergedSequence<First, Second>
where First : Sequence, Second : Sequence, First.Element == Second.Element

/// Given two sorted sequences treated as (multi)sets,
/// return a sequence that lazily vends the also-sorted result of applying a
/// given set operation to the sequence operands.
public func lazilyMerge<First, Second>(
    _ first: First, _ second: Second, keeping filter: MergerSubset
) -> MergedSequence<First, Second>
where First : Sequence, Second : Sequence, First.Element : Comparable,
      First.Element == Second.Element

/// Returns a sorted array containing the result of the given set operation
/// applied to the given sorted sequences,
/// where sorting is determined by the given predicate.
public func merge<First, Second, Fault>(
    _ first: First, _ second: Second, keeping filter: MergerSubset,
    sortedBy areInIncreasingOrder: (First.Element, Second.Element) throws(Fault) -> Bool
) throws(Fault) -> [Second.Element]
where First : Sequence, Second : Sequence,
      Fault : Error, First.Element == Second.Element

/// Returns a sorted array containing the result of the given set operation
/// applied to the given sorted sequences.
public func merge<First, Second>(
    _ first: First, _ second: Second, keeping filter: MergerSubset
) -> [Second.Element]
where First : Sequence, Second : Sequence,
      First.Element : Comparable, First.Element == Second.Element
```

Target subsets are described by a new type.

```swift
/// Description of which elements of a merger will be retained.
public enum MergerSubset : UInt, CaseIterable
{
    case none, firstWithoutSecond, secondWithoutFirst, symmetricDifference,
         intersection, first, second, union,
         sum

    //...
}
```

Every set-operation combination is provided, although some are degenerate.
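
One way to read "every combination" is as the eight possible subsets of three categories (exclusive to the first sequence, exclusive to the second, shared), plus `sum`. The sketch below models that reading. It is an inference from the case order and the `UInt` raw type, not something the pull request states, and the names `SketchCategories` and `SketchMergerSubset` are hypothetical.

```swift
/// Hypothetical model of the three membership categories; these names
/// are illustrative and do not appear in the pull request.
struct SketchCategories: OptionSet {
    let rawValue: UInt
    static let exclusiveToFirst = SketchCategories(rawValue: 1 << 0)
    static let exclusiveToSecond = SketchCategories(rawValue: 1 << 1)
    static let shared = SketchCategories(rawValue: 1 << 2)
}

/// Copy of the case order from the guide, with implicit raw values 0 through 8.
enum SketchMergerSubset: UInt, CaseIterable {
    case none, firstWithoutSecond, secondWithoutFirst, symmetricDifference,
         intersection, first, second, union,
         sum

    /// Assumed decoding: the low three bits of the raw value name the
    /// retained categories, and `sum` (8) is special-cased to keep all three.
    var retained: SketchCategories {
        if self == .sum {
            return [.exclusiveToFirst, .exclusiveToSecond, .shared]
        }
        return SketchCategories(rawValue: rawValue)
    }
}
```

Under this reading, `none` keeps nothing and `first` or `second` simply reproduce one operand, which is presumably what makes them degenerate.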

The merging free-functions use these support types:

```swift
/// A sequence that reads from two sequences treated as (multi)sets,
/// where both sequences' elements are sorted according to some predicate,
/// and emits a sorted merger,
/// excluding any elements barred by a set operation.
public struct MergedSequence<First, Second, Fault>
    : Sequence, LazySequenceProtocol
where First : Sequence, Second : Sequence, Fault : Error,
      First.Element == Second.Element
{
    //...
}

/// An iterator that reads from two virtual sequences treated as (multi)sets,
/// where both sequences' elements are sorted according to some predicate,
/// and emits a sorted merger,
/// excluding any elements barred by a set operation.
public struct MergingIterator<First, Second, Fault>
    : IteratorProtocol
where First : IteratorProtocol, Second : IteratorProtocol, Fault : Error,
      First.Element == Second.Element
{
    //...
}
```

The partition merger operates at **O(** 1 **)** in space;
for time it works at _???_ for random-access collections and
_???_ for bidirectional collections.

The eager merging free functions operate at **O(** _n_ `+` _m_ **)** in
space and time,
where *n* and *m* are the lengths of the source sequences.
The lazy merging free functions operate at **O(** 1 **)** in space and time,
since they only wrap the source sequences.
Actually generating the entire merged sequence will take
**O(** _n_ `+` _m_ **)** time in total, distributed over the iteration.
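
The split between constant-time construction and linear-time iteration can be seen in a minimal lazy merger. This is a sketch under stated assumptions: `SketchMergingIterator` is a hypothetical type, it performs only a plain merge of `Comparable` elements, and it is not the PR's `MergingIterator`.

```swift
/// Hypothetical lazy merger; not the PR's `MergingIterator`.
/// Both wrapped iterators are assumed to vend elements in ascending order.
struct SketchMergingIterator<First: IteratorProtocol, Second: IteratorProtocol>: IteratorProtocol
where First.Element == Second.Element, First.Element: Comparable {
    var first: First
    var second: Second
    // One-element look-ahead buffers; `nil` means empty or exhausted.
    var pendingFirst: First.Element?
    var pendingSecond: Second.Element?

    /// Construction only stores the iterators, so it is constant-time.
    init(_ first: First, _ second: Second) {
        self.first = first
        self.second = second
    }

    /// Each call does a constant amount of work: refill, compare, emit.
    mutating func next() -> First.Element? {
        pendingFirst = pendingFirst ?? first.next()
        pendingSecond = pendingSecond ?? second.next()
        switch (pendingFirst, pendingSecond) {
        case (let a?, let b?) where b < a:
            // The second source is strictly smaller, so it goes first.
            pendingSecond = nil
            return b
        case (let a?, _):
            // Ties and an exhausted second source both favor the first source.
            pendingFirst = nil
            return a
        case (nil, let b?):
            pendingSecond = nil
            return b
        case (nil, nil):
            return nil
        }
    }
}
```

Draining the iterator calls `next()` once per output element plus a final call returning `nil`, which is where the linear total cost comes from.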

### Naming

Many merging functions use the word "merge" in their name.

**[C++]:** Provides the `merge` and `inplace_merge` functions.
Set operations are provided by
the `set_union`, `set_intersection`, `set_difference`, and
`set_symmetric_difference` functions.
2 changes: 2 additions & 0 deletions Guides/README.md
@@ -12,6 +12,7 @@ These guides describe the design and intention behind the APIs included in the `

#### Mutating algorithms

- [`mergePartitions(across:)`, `mergePartitions(across:sortedBy:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Merge.md): In-place merger of sorted partitions.
- [`rotate(toStartAt:)`, `rotate(subrange:toStartAt:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Rotate.md): In-place rotation of elements.
- [`stablePartition(by:)`, `stablePartition(subrange:by:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Partition.md): A partition that preserves the relative order of the resulting prefix and suffix.

@@ -20,6 +21,7 @@ These guides describe the design and intention behind the APIs included in the `
- [`chain(_:_:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Chain.md): Concatenates two collections with the same element type.
- [`cycled()`, `cycled(times:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Cycle.md): Repeats the elements of a collection forever or a set number of times.
- [`joined(by:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Joined.md): Concatenate sequences of sequences, using an element or sequence as a separator, or using a closure to generate each separator.
- [`lazilyMerge(_:_:keeping:sortedBy:)`, `lazilyMerge(_:_:keeping:)`, `merge(_:_:keeping:sortedBy:)`, `merge(_:_:keeping:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Merge.md): Merge two sorted sequences together.
- [`product(_:_:)`](https://github.com/apple/swift-algorithms/blob/main/Guides/Product.md): Iterates over all the pairs of two collections; equivalent to nested `for`-`in` loops.

#### Subsetting operations
1 change: 1 addition & 0 deletions Sources/Algorithms/Documentation.docc/Algorithms.md
@@ -40,3 +40,4 @@ Explore more chunking methods and the remainder of the Algorithms package, group
- <doc:Filtering>
- <doc:Reductions>
- <doc:Partitioning>
- <doc:Merging>
22 changes: 22 additions & 0 deletions Sources/Algorithms/Documentation.docc/Merging.md
@@ -0,0 +1,22 @@
# Merging

Merge two sorted sequences as a new sorted sequence.
Take two sorted sequences to be treated as sets,
then generate the result of applying a set operation.

## Topics

### Merging Sorted Sequences

- ``lazilyMerge(_:_:keeping:sortedBy:)``
- ``lazilyMerge(_:_:keeping:)``
- ``merge(_:_:keeping:sortedBy:)``
- ``merge(_:_:keeping:)``
- ``Swift/MutableCollection/mergePartitions(across:sortedBy:)``
- ``Swift/MutableCollection/mergePartitions(across:)``

### Supporting Types

- ``MergerSubset``
- ``MergedSequence``
- ``MergingIterator``