3.2.0

@lhoestq released this 10 Dec 17:00
· 6 commits to main since this release
fba4758

Dataset Features

  • Faster parquet streaming + filters with predicate pushdown by @lhoestq in #7309
    • Up to 100% faster streaming (up to 2x throughput)
    • Fast filtering via predicate pushdown: files and row groups that cannot match the predicate are skipped instead of being downloaded in full, e.g.
      from datasets import load_dataset
      filters = [('date', '>=', '2023')]
      ds = load_dataset("HuggingFaceFW/fineweb-2", "fra_Latn", streaming=True, filters=filters)
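
To make the idea of predicate pushdown concrete, here is a minimal pure-Python sketch. It is not the `datasets` or Parquet implementation: `make_row_group`, `may_match`, and `scan` are hypothetical names, and the in-memory "row groups" stand in for Parquet row groups whose min/max statistics would normally be read from file metadata. The sketch assumes dates are ISO-formatted strings, so string comparison follows chronological order.

```python
import operator

# Comparison operators supported by this sketch.
OPS = {
    ">=": operator.ge, ">": operator.gt,
    "<=": operator.le, "<": operator.lt,
    "==": operator.eq,
}

def make_row_group(rows):
    """Hypothetical stand-in for a Parquet row group: rows plus per-column
    (min, max) statistics, computed once up front the way a writer would."""
    stats = {
        col: (min(r[col] for r in rows), max(r[col] for r in rows))
        for col in rows[0]
    }
    return {"rows": rows, "stats": stats}

def may_match(group, column, op, value):
    """Return False only when the statistics prove no row can match."""
    lo, hi = group["stats"][column]
    if op in (">=", ">"):
        return OPS[op](hi, value)   # even the largest value must pass
    if op in ("<=", "<"):
        return OPS[op](lo, value)   # even the smallest value must pass
    if op == "==":
        return lo <= value <= hi
    raise ValueError(f"unsupported operator: {op}")

def scan(groups, filters):
    """Skip row groups ruled out by statistics, then filter the rest row by row.
    Returns the matching rows and the number of row groups actually read."""
    matched, groups_read = [], 0
    for group in groups:
        if not all(may_match(group, c, o, v) for c, o, v in filters):
            continue  # pushdown: this group's rows are never touched
        groups_read += 1
        matched.extend(
            row for row in group["rows"]
            if all(OPS[o](row[c], v) for c, o, v in filters)
        )
    return matched, groups_read

groups = [
    make_row_group([{"date": "2021-05-01"}, {"date": "2022-11-30"}]),
    make_row_group([{"date": "2022-12-15"}, {"date": "2023-02-01"}]),
    make_row_group([{"date": "2023-06-10"}, {"date": "2024-01-05"}]),
]
matched, groups_read = scan(groups, [("date", ">=", "2023")])
print(groups_read, [row["date"] for row in matched])
```

The first group is skipped entirely because its maximum date is below the threshold; only the remaining two are read and filtered row by row. The saving in a real streaming setup comes from the same decision being made before the corresponding bytes are downloaded.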

Other improvements and bug fixes

New Contributors

Full Changelog: 3.1.0...3.2.0