perf: AVX implementation for 2D diamond SE #97

Merged: 2 commits, Jun 15, 2022


johnnychen94
Member

@johnnychen94 johnnychen94 commented Jun 15, 2022

Rewrite of #90 for the 2D diamond-shaped SE (structuring element).

Still needs a bit of tweaking.

Note that I didn't use the tweaked version from #90 (comment) because LoopVectorization appears to fail on it; see JuliaSIMD/LoopVectorization.jl#415.

cc: @ThomasRetornaz
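
For readers unfamiliar with the operation being benchmarked, here is a minimal sketch of a radius-1 diamond (4-neighbour) max filter written with plain `@simd` loops from Base. It is not the code from this PR: the function name `diamond_r1_max` is hypothetical, it only handles a plain `Matrix`, and it leaves out LoopVectorization entirely. The point is only that each pass walks a column contiguously, which is the access pattern a compiler can typically turn into AVX instructions.

```julia
# Hypothetical illustration, not the PR's implementation:
# max over each pixel and its 4 direct neighbours (radius-1 diamond SE).
function diamond_r1_max(A::Matrix{T}) where {T}
    out = copy(A)          # start from the centre pixel
    m, n = size(A)
    # Vertical neighbours: the inner loop runs down a column (contiguous in memory).
    for j in 1:n
        @inbounds @simd for i in 2:m
            out[i, j] = max(out[i, j], A[i-1, j])
        end
        @inbounds @simd for i in 1:m-1
            out[i, j] = max(out[i, j], A[i+1, j])
        end
    end
    # Left neighbour: compare column j with column j-1 of the input.
    for j in 2:n
        @inbounds @simd for i in 1:m
            out[i, j] = max(out[i, j], A[i, j-1])
        end
    end
    # Right neighbour: compare column j with column j+1 of the input.
    for j in 1:n-1
        @inbounds @simd for i in 1:m
            out[i, j] = max(out[i, j], A[i, j+1])
        end
    end
    return out
end

A = zeros(Int, 5, 5)
A[3, 3] = 1
B = diamond_r1_max(A)   # the single 1 spreads to its 4 direct neighbours
```

Every pass reads only from the input `A` and writes to `out`, so the loops carry no dependency between iterations. Pixels on the border simply skip the neighbours that fall outside the array.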

@github-actions
Contributor

Benchmark result

Judge result

Benchmark Report for /home/runner/work/ImageMorphology.jl/ImageMorphology.jl

Job Properties

  • Time of benchmarks:
    • Target: 15 Jun 2022 - 21:59
    • Baseline: 15 Jun 2022 - 22:02
  • Package commits:
    • Target: 6ebd1d
    • Baseline: e7c206
  • Julia commits:
    • Target: 742b9a
    • Baseline: 742b9a
  • Julia command flags:
    • Target: None
    • Baseline: None
  • Environment variables:
    • Target: None
    • Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).
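
As a worked example of how to read the ratios, the sketch below applies the rule described above to two rows, using the times listed in the per-job result tables of this report. The function `judge_ratio` is a hypothetical helper that assumes the ratio is target divided by baseline and that the 5% figure acts as a symmetric tolerance around 1.0; it is a simplified reading, not the BenchmarkTools source.

```julia
# Hypothetical helper: simplified version of the judge rule described above.
# Assumes ratio = target / baseline with a symmetric tolerance around 1.0.
function judge_ratio(target, baseline; tolerance=0.05)
    ratio = target / baseline
    verdict = if ratio > 1 + tolerance
        :regression
    elseif ratio < 1 - tolerance
        :improvement
    else
        :invariant
    end
    return round(ratio; digits=2), verdict
end

# Times in μs, taken from the target and baseline result tables of the CI run.
diamond = judge_ratio(4.629, 15.872)     # Bool, 64×64, r1_diamond_best
maxtree = judge_ratio(795.610, 745.010)  # Maxtree, area_opening, 64×64
```

With these inputs the first row comes out as an improvement with a ratio near 0.29 and the second as a regression with a ratio near 1.07, matching the judge table.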

| ID | time ratio | memory ratio |
|---|---|---|
| ["Maxtree", "area_opening", "64×64"] | 1.07 (5%) ❌ | 1.00 (1%) |
| ["extreme_filter", "Bool", "64×64", "r1_diamond_best"] | 0.29 (5%) ✅ | 0.98 (1%) ✅ |
| ["extreme_filter", "Bool", "64×64", "r1_diamond_random"] | 0.09 (5%) ✅ | 0.98 (1%) ✅ |
| ["extreme_filter", "Bool", "64×64", "r1_diamond_worst"] | 0.43 (5%) ✅ | 1.00 (1%) |
| ["extreme_filter", "Bool", "64×64", "r5_diamond_best"] | 0.68 (5%) ✅ | 1.07 (1%) ❌ |
| ["extreme_filter", "Bool", "64×64", "r5_diamond_random"] | 0.31 (5%) ✅ | 1.07 (1%) ❌ |
| ["extreme_filter", "Bool", "64×64", "r5_diamond_worst"] | 0.39 (5%) ✅ | 0.84 (1%) ✅ |
| ["extreme_filter", "Gray{Float32}", "64×64", "r1_diamond"] | 0.20 (5%) ✅ | 1.00 (1%) |
| ["extreme_filter", "Gray{Float32}", "64×64", "r5_diamond"] | 0.23 (5%) ✅ | 0.95 (1%) ✅ |
| ["extreme_filter", "Gray{N0f8}", "64×64", "r1_diamond"] | 0.80 (5%) ✅ | 1.00 (1%) |
| ["extreme_filter", "Gray{N0f8}", "64×64", "r1_generic"] | 0.91 (5%) ✅ | 1.00 (1%) |
| ["extreme_filter", "Gray{N0f8}", "64×64", "r5_diamond"] | 1.14 (5%) ❌ | 0.83 (1%) ✅ |
| ["extreme_filter", "Gray{N0f8}", "64×64", "r5_generic"] | 0.94 (5%) ✅ | 1.00 (1%) |
| ["extreme_filter", "Int64", "64×64", "r1_diamond"] | 0.68 (5%) ✅ | 1.00 (1%) |
| ["extreme_filter", "Int64", "64×64", "r1_generic"] | 1.14 (5%) ❌ | 1.00 (1%) |
| ["extreme_filter", "Int64", "64×64", "r5_diamond"] | 0.61 (5%) ✅ | 0.97 (1%) ✅ |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["Maxtree", "area_opening"]
  • ["connected"]
  • ["convexhull"]
  • ["dilatation_and_erosion", "erode", "Gray{Float32}"]
  • ["dilatation_and_erosion", "erode", "Gray{N0f8}"]
  • ["dilatation_and_erosion", "opening", "Gray{Float32}"]
  • ["dilatation_and_erosion", "opening", "Gray{N0f8}"]
  • ["extreme_filter", "Bool", "64×64"]
  • ["extreme_filter", "Gray{Float32}", "64×64"]
  • ["extreme_filter", "Gray{N0f8}", "64×64"]
  • ["extreme_filter", "Int64", "64×64"]
  • ["feature_transform"]

Julia versioninfo

Target

Julia Version 1.7.3
Commit 742b9abb4d (2022-05-06 12:58 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.4 LTS
  uname: Linux 5.13.0-1025-azure #29~20.04.1-Ubuntu SMP Thu May 19 14:50:45 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2593 MHz       1318 s          1 s        167 s       5238 s          0 s
       #2  2593 MHz       4609 s          1 s        211 s       1946 s          0 s
       
  Memory: 6.783603668212891 GB (2976.53515625 MB free)
  Uptime: 680.2 sec
  Load Avg:  1.1  1.07  0.64
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake-avx512)

Baseline

Julia Version 1.7.3
Commit 742b9abb4d (2022-05-06 12:58 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.4 LTS
  uname: Linux 5.13.0-1025-azure #29~20.04.1-Ubuntu SMP Thu May 19 14:50:45 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2593 MHz       1402 s          1 s        172 s       6463 s          0 s
       #2  2593 MHz       5837 s          1 s        221 s       2024 s          0 s
       
  Memory: 6.783603668212891 GB (3012.3359375 MB free)
  Uptime: 812.02 sec
  Load Avg:  1.06  1.06  0.71
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake-avx512)

Target result

Benchmark Report for /home/runner/work/ImageMorphology.jl/ImageMorphology.jl

Job Properties

  • Time of benchmark: 15 Jun 2022 - 21:59
  • Package commit: 6ebd1d
  • Julia commit: 742b9a
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.
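
To illustrate how an ID addresses one entry in a nested suite, the sketch below walks a nested `Dict` key by key. The `suite` here is a stand-in built for this example, holding a single time in μs copied from the table; a real benchmark suite stores benchmark objects rather than numbers, and `lookup` is a hypothetical helper, not part of any package.

```julia
# Stand-in for a nested benchmark suite (assumption: plain Dicts, one leaf value in μs).
suite = Dict(
    "extreme_filter" => Dict(
        "Bool" => Dict(
            "64×64" => Dict("r1_diamond_best" => 4.629),
        ),
    ),
)

# Hypothetical helper: descend one level per key in the ID.
lookup(tree, id) = foldl((node, key) -> node[key], id; init=tree)

id = ["extreme_filter", "Bool", "64×64", "r1_diamond_best"]
value = lookup(suite, id)
```

Each element of the ID selects one level of nesting, so the last element is the key of the benchmark itself.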

ID time GC time memory allocations
["Maxtree", "area_opening", "64×64"] 795.610 μs (5%) 133.38 KiB (1%) 21
["connected", "label_components"] 891.612 μs (5%) 521.95 KiB (1%) 44
["convexhull", "convexhull"] 412.605 μs (5%) 529.77 KiB (1%) 16528
["dilatation_and_erosion", "erode", "Gray{Float32}", "64×64"] 122.901 μs (5%) 17.05 KiB (1%) 23
["dilatation_and_erosion", "erode", "Gray{N0f8}", "64×64"] 44.900 μs (5%) 5.11 KiB (1%) 23
["dilatation_and_erosion", "opening", "Gray{Float32}", "64×64"] 245.403 μs (5%) 33.94 KiB (1%) 38
["dilatation_and_erosion", "opening", "Gray{N0f8}", "64×64"] 90.101 μs (5%) 10.06 KiB (1%) 38
["extreme_filter", "Bool", "64×64", "r1_bool_best"] 15.800 μs (5%) 5.22 KiB (1%) 17
["extreme_filter", "Bool", "64×64", "r1_bool_random"] 33.100 μs (5%) 5.19 KiB (1%) 16
["extreme_filter", "Bool", "64×64", "r1_bool_worst"] 33.200 μs (5%) 5.22 KiB (1%) 17
["extreme_filter", "Bool", "64×64", "r1_diamond_best"] 4.629 μs (5%) 4.80 KiB (1%) 13
["extreme_filter", "Bool", "64×64", "r1_diamond_random"] 4.700 μs (5%) 4.86 KiB (1%) 14
["extreme_filter", "Bool", "64×64", "r1_diamond_worst"] 4.657 μs (5%) 4.80 KiB (1%) 13
["extreme_filter", "Bool", "64×64", "r5_bool_best"] 30.400 μs (5%) 12.22 KiB (1%) 21
["extreme_filter", "Bool", "64×64", "r5_bool_random"] 63.501 μs (5%) 12.22 KiB (1%) 21
["extreme_filter", "Bool", "64×64", "r5_bool_worst"] 223.803 μs (5%) 12.22 KiB (1%) 21
["extreme_filter", "Bool", "64×64", "r5_diamond_best"] 19.100 μs (5%) 8.98 KiB (1%) 14
["extreme_filter", "Bool", "64×64", "r5_diamond_random"] 19.100 μs (5%) 8.98 KiB (1%) 14
["extreme_filter", "Bool", "64×64", "r5_diamond_worst"] 19.100 μs (5%) 8.98 KiB (1%) 14
["extreme_filter", "Gray{Float32}", "64×64", "r1_diamond"] 7.100 μs (5%) 16.66 KiB (1%) 11
["extreme_filter", "Gray{Float32}", "64×64", "r1_generic"] 60.801 μs (5%) 16.67 KiB (1%) 10
["extreme_filter", "Gray{Float32}", "64×64", "r5_diamond"] 38.800 μs (5%) 32.72 KiB (1%) 11
["extreme_filter", "Gray{Float32}", "64×64", "r5_generic"] 947.313 μs (5%) 20.14 KiB (1%) 11
["extreme_filter", "Gray{N0f8}", "64×64", "r1_diamond"] 5.940 μs (5%) 4.72 KiB (1%) 11
["extreme_filter", "Gray{N0f8}", "64×64", "r1_generic"] 25.400 μs (5%) 4.73 KiB (1%) 10
["extreme_filter", "Gray{N0f8}", "64×64", "r5_diamond"] 38.400 μs (5%) 8.84 KiB (1%) 11
["extreme_filter", "Gray{N0f8}", "64×64", "r5_generic"] 376.304 μs (5%) 8.20 KiB (1%) 11
["extreme_filter", "Int64", "64×64", "r1_diamond"] 7.433 μs (5%) 32.58 KiB (1%) 12
["extreme_filter", "Int64", "64×64", "r1_generic"] 32.000 μs (5%) 32.59 KiB (1%) 11
["extreme_filter", "Int64", "64×64", "r5_diamond"] 32.100 μs (5%) 64.56 KiB (1%) 13
["extreme_filter", "Int64", "64×64", "r5_generic"] 319.004 μs (5%) 36.06 KiB (1%) 12
["feature_transform", "feature_transform"] 2.408 ms (5%) 1.02 MiB (1%) 7

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["Maxtree", "area_opening"]
  • ["connected"]
  • ["convexhull"]
  • ["dilatation_and_erosion", "erode", "Gray{Float32}"]
  • ["dilatation_and_erosion", "erode", "Gray{N0f8}"]
  • ["dilatation_and_erosion", "opening", "Gray{Float32}"]
  • ["dilatation_and_erosion", "opening", "Gray{N0f8}"]
  • ["extreme_filter", "Bool", "64×64"]
  • ["extreme_filter", "Gray{Float32}", "64×64"]
  • ["extreme_filter", "Gray{N0f8}", "64×64"]
  • ["extreme_filter", "Int64", "64×64"]
  • ["feature_transform"]

Julia versioninfo

Julia Version 1.7.3
Commit 742b9abb4d (2022-05-06 12:58 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.4 LTS
  uname: Linux 5.13.0-1025-azure #29~20.04.1-Ubuntu SMP Thu May 19 14:50:45 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2593 MHz       1318 s          1 s        167 s       5238 s          0 s
       #2  2593 MHz       4609 s          1 s        211 s       1946 s          0 s
       
  Memory: 6.783603668212891 GB (2976.53515625 MB free)
  Uptime: 680.2 sec
  Load Avg:  1.1  1.07  0.64
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake-avx512)

Baseline result

Benchmark Report for /home/runner/work/ImageMorphology.jl/ImageMorphology.jl

Job Properties

  • Time of benchmark: 15 Jun 2022 - 22:02
  • Package commit: e7c206
  • Julia commit: 742b9a
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

ID time GC time memory allocations
["Maxtree", "area_opening", "64×64"] 745.010 μs (5%) 133.38 KiB (1%) 21
["connected", "label_components"] 876.912 μs (5%) 521.95 KiB (1%) 44
["convexhull", "convexhull"] 408.205 μs (5%) 529.77 KiB (1%) 16528
["dilatation_and_erosion", "erode", "Gray{Float32}", "64×64"] 122.901 μs (5%) 17.05 KiB (1%) 23
["dilatation_and_erosion", "erode", "Gray{N0f8}", "64×64"] 44.900 μs (5%) 5.11 KiB (1%) 23
["dilatation_and_erosion", "opening", "Gray{Float32}", "64×64"] 245.603 μs (5%) 33.94 KiB (1%) 38
["dilatation_and_erosion", "opening", "Gray{N0f8}", "64×64"] 90.101 μs (5%) 10.06 KiB (1%) 38
["extreme_filter", "Bool", "64×64", "r1_bool_best"] 16.200 μs (5%) 5.22 KiB (1%) 17
["extreme_filter", "Bool", "64×64", "r1_bool_random"] 33.200 μs (5%) 5.19 KiB (1%) 16
["extreme_filter", "Bool", "64×64", "r1_bool_worst"] 33.200 μs (5%) 5.22 KiB (1%) 17
["extreme_filter", "Bool", "64×64", "r1_diamond_best"] 15.872 μs (5%) 4.89 KiB (1%) 13
["extreme_filter", "Bool", "64×64", "r1_diamond_random"] 50.115 μs (5%) 4.95 KiB (1%) 14
["extreme_filter", "Bool", "64×64", "r1_diamond_worst"] 10.772 μs (5%) 4.78 KiB (1%) 21
["extreme_filter", "Bool", "64×64", "r5_bool_best"] 30.700 μs (5%) 12.22 KiB (1%) 21
["extreme_filter", "Bool", "64×64", "r5_bool_random"] 64.200 μs (5%) 12.22 KiB (1%) 21
["extreme_filter", "Bool", "64×64", "r5_bool_worst"] 233.803 μs (5%) 12.22 KiB (1%) 21
["extreme_filter", "Bool", "64×64", "r5_diamond_best"] 27.900 μs (5%) 8.39 KiB (1%) 15
["extreme_filter", "Bool", "64×64", "r5_diamond_random"] 61.801 μs (5%) 8.39 KiB (1%) 15
["extreme_filter", "Bool", "64×64", "r5_diamond_worst"] 49.600 μs (5%) 10.72 KiB (1%) 86
["extreme_filter", "Gray{Float32}", "64×64", "r1_diamond"] 34.850 μs (5%) 16.66 KiB (1%) 20
["extreme_filter", "Gray{Float32}", "64×64", "r1_generic"] 60.803 μs (5%) 16.67 KiB (1%) 10
["extreme_filter", "Gray{Float32}", "64×64", "r5_diamond"] 166.502 μs (5%) 34.47 KiB (1%) 84
["extreme_filter", "Gray{Float32}", "64×64", "r5_generic"] 947.612 μs (5%) 20.14 KiB (1%) 11
["extreme_filter", "Gray{N0f8}", "64×64", "r1_diamond"] 7.420 μs (5%) 4.72 KiB (1%) 20
["extreme_filter", "Gray{N0f8}", "64×64", "r1_generic"] 27.800 μs (5%) 4.73 KiB (1%) 10
["extreme_filter", "Gray{N0f8}", "64×64", "r5_diamond"] 33.801 μs (5%) 10.59 KiB (1%) 84
["extreme_filter", "Gray{N0f8}", "64×64", "r5_generic"] 399.006 μs (5%) 8.20 KiB (1%) 11
["extreme_filter", "Int64", "64×64", "r1_diamond"] 10.967 μs (5%) 32.58 KiB (1%) 21
["extreme_filter", "Int64", "64×64", "r1_generic"] 28.100 μs (5%) 32.59 KiB (1%) 11
["extreme_filter", "Int64", "64×64", "r5_diamond"] 52.601 μs (5%) 66.31 KiB (1%) 86
["extreme_filter", "Int64", "64×64", "r5_generic"] 316.704 μs (5%) 36.06 KiB (1%) 12
["feature_transform", "feature_transform"] 2.419 ms (5%) 1.02 MiB (1%) 7

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["Maxtree", "area_opening"]
  • ["connected"]
  • ["convexhull"]
  • ["dilatation_and_erosion", "erode", "Gray{Float32}"]
  • ["dilatation_and_erosion", "erode", "Gray{N0f8}"]
  • ["dilatation_and_erosion", "opening", "Gray{Float32}"]
  • ["dilatation_and_erosion", "opening", "Gray{N0f8}"]
  • ["extreme_filter", "Bool", "64×64"]
  • ["extreme_filter", "Gray{Float32}", "64×64"]
  • ["extreme_filter", "Gray{N0f8}", "64×64"]
  • ["extreme_filter", "Int64", "64×64"]
  • ["feature_transform"]

Julia versioninfo

Julia Version 1.7.3
Commit 742b9abb4d (2022-05-06 12:58 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.4 LTS
  uname: Linux 5.13.0-1025-azure #29~20.04.1-Ubuntu SMP Thu May 19 14:50:45 UTC 2022 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2593 MHz       1402 s          1 s        172 s       6463 s          0 s
       #2  2593 MHz       5837 s          1 s        221 s       2024 s          0 s
       
  Memory: 6.783603668212891 GB (3012.3359375 MB free)
  Uptime: 812.02 sec
  Load Avg:  1.06  1.06  0.71
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake-avx512)

Runtime information

Runtime Info
BLAS #threads 2
BLAS.vendor() openblas64
Sys.CPU_THREADS 2

lscpu output:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          2
On-line CPU(s) list:             0,1
Thread(s) per core:              1
Core(s) per socket:              2
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           85
Model name:                      Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Stepping:                        7
CPU MHz:                         2593.906
BogoMIPS:                        5187.81
Hypervisor vendor:               Microsoft
Virtualization type:             full
L1d cache:                       64 KiB
L1i cache:                       64 KiB
L2 cache:                        2 MiB
L3 cache:                        35.8 MiB
NUMA node0 CPU(s):               0,1
Vulnerability Itlb multihit:     KVM: Mitigation: VMX unsupported
Vulnerability L1tf:              Mitigation; PTE Inversion
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Retpolines, STIBP disabled, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Mitigation; Clear CPU buffers; SMT Host state unknown
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl xsaveopt xsavec xsaves md_clear
Cpu Property Value
Brand Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
Vendor :Intel
Architecture :Skylake
Model Family: 0x06, Model: 0x55, Stepping: 0x07, Type: 0x00
Cores 2 physical cores, 2 logical cores (on executing CPU)
No Hyperthreading hardware capability detected
Clock Frequencies Not supported by CPU
Data Cache Level 1:3 : (32, 1024, 36608) kbytes
64 byte cache line size
Address Size 48 bits virtual, 46 bits physical
SIMD 512 bit = 64 byte max. SIMD vector size
Time Stamp Counter TSC is accessible via rdtsc
TSC increased at every clock cycle (non-invariant TSC)
Perf. Monitoring Performance Monitoring Counters (PMC) are not supported
Hypervisor Yes, Microsoft

@johnnychen94
Member Author

johnnychen94 commented Jun 15, 2022

Here are the local benchmark results. Most of them look good, except for one odd Int64 case, ["extreme_filter", "Int64", "512×512", "r1_diamond"], which is 12.69x slower.
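
For a sense of scale, the baseline for that Int64 case can be estimated from the numbers in this report: the target result lists 14.343 ms and 7.76 MiB for it, and the judge table lists ratios of 12.69 (time) and 3.88 (memory). The sketch below assumes ratio = target / baseline; `implied_baseline` is a hypothetical helper used only for this arithmetic.

```julia
# Hypothetical helper: recover the baseline value from a target value and a judge ratio,
# assuming ratio = target / baseline.
implied_baseline(target, ratio) = target / ratio

baseline_time_ms = implied_baseline(14.343, 12.69)  # roughly 1.13 ms
baseline_mem_mib = implied_baseline(7.76, 3.88)     # roughly 2.0 MiB
```

The same target row also reports 119028 allocations, while the other diamond rows in that table report between 11 and 16, so the slowdown comes together with a large jump in allocations.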

Benchmark result

Judge result

Benchmark Report for /home/jc/Documents/Julia/ImageMorphology.jl

Job Properties

  • Time of benchmarks:
    • Target: 16 Jun 2022 - 06:05
    • Baseline: 16 Jun 2022 - 06:11
  • Package commits:
    • Target: 5323f4
    • Baseline: e7c206
  • Julia commits:
    • Target: 6368fd
    • Baseline: 6368fd
  • Julia command flags:
    • Target: None
    • Baseline: None
  • Environment variables:
    • Target: None
    • Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

| ID | time ratio | memory ratio |
|---|---|---|
| ["Maxtree", "area_opening", "256×256"] | 1.87 (5%) ❌ | 1.00 (1%) |
| ["Maxtree", "area_opening", "512×512"] | 1.39 (5%) ❌ | 1.00 (1%) |
| ["connected", "label_components"] | 0.92 (5%) ✅ | 1.00 (1%) |
| ["extreme_filter", "Bool", "256×256", "r1_diamond_best"] | 0.10 (5%) ✅ | 1.00 (1%) |
| ["extreme_filter", "Bool", "256×256", "r1_diamond_random"] | 0.02 (5%) ✅ | 1.00 (1%) |
| ["extreme_filter", "Bool", "256×256", "r1_diamond_worst"] | 0.09 (5%) ✅ | 1.00 (1%) |
| ["extreme_filter", "Bool", "256×256", "r5_diamond_best"] | 0.77 (5%) ✅ | 1.89 (1%) ❌ |
| ["extreme_filter", "Bool", "256×256", "r5_diamond_random"] | 0.17 (5%) ✅ | 1.89 (1%) ❌ |
| ["extreme_filter", "Bool", "256×256", "r5_diamond_worst"] | 0.13 (5%) ✅ | 0.99 (1%) ✅ |
| ["extreme_filter", "Bool", "512×512", "r1_diamond_best"] | 0.05 (5%) ✅ | 1.00 (1%) |
| ["extreme_filter", "Bool", "512×512", "r1_diamond_random"] | 0.01 (5%) ✅ | 1.00 (1%) |
| ["extreme_filter", "Bool", "512×512", "r1_diamond_worst"] | 0.04 (5%) ✅ | 1.00 (1%) |
| ["extreme_filter", "Bool", "512×512", "r5_diamond_best"] | 0.45 (5%) ✅ | 1.97 (1%) ❌ |
| ["extreme_filter", "Bool", "512×512", "r5_diamond_random"] | 0.09 (5%) ✅ | 1.97 (1%) ❌ |
| ["extreme_filter", "Bool", "512×512", "r5_diamond_worst"] | 0.07 (5%) ✅ | 1.00 (1%) |
| ["extreme_filter", "Gray{Float32}", "256×256", "r1_diamond"] | 0.09 (5%) ✅ | 1.00 (1%) |
| ["extreme_filter", "Gray{Float32}", "256×256", "r1_generic"] | 0.94 (5%) ✅ | 1.00 (1%) |
| ["extreme_filter", "Gray{Float32}", "256×256", "r5_diamond"] | 0.20 (5%) ✅ | 1.00 (1%) |
| ["extreme_filter", "Gray{Float32}", "512×512", "r1_diamond"] | 0.06 (5%) ✅ | 1.00 (1%) |
| ["extreme_filter", "Gray{Float32}", "512×512", "r5_diamond"] | 0.14 (5%) ✅ | 1.00 (1%) |
| ["extreme_filter", "Gray{N0f8}", "256×256", "r1_diamond"] | 0.09 (5%) ✅ | 1.00 (1%) |
| ["extreme_filter", "Gray{N0f8}", "256×256", "r5_diamond"] | 0.17 (5%) ✅ | 0.99 (1%) ✅ |
| ["extreme_filter", "Gray{N0f8}", "512×512", "r1_diamond"] | 0.05 (5%) ✅ | 1.00 (1%) |
| ["extreme_filter", "Gray{N0f8}", "512×512", "r5_diamond"] | 0.12 (5%) ✅ | 1.00 (1%) |
| ["extreme_filter", "Int64", "256×256", "r1_diamond"] | 0.11 (5%) ✅ | 1.00 (1%) |
| ["extreme_filter", "Int64", "256×256", "r5_diamond"] | 0.31 (5%) ✅ | 1.00 (1%) |
| ["extreme_filter", "Int64", "512×512", "r1_diamond"] | 12.69 (5%) ❌ | 3.88 (1%) ❌ |
| ["extreme_filter", "Int64", "512×512", "r5_diamond"] | 0.15 (5%) ✅ | 1.00 (1%) |
| ["feature_transform", "feature_transform"] | 0.66 (5%) ✅ | 1.00 (1%) |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["Maxtree", "area_opening"]
  • ["connected"]
  • ["convexhull"]
  • ["dilatation_and_erosion", "erode", "Gray{Float32}"]
  • ["dilatation_and_erosion", "erode", "Gray{N0f8}"]
  • ["dilatation_and_erosion", "opening", "Gray{Float32}"]
  • ["dilatation_and_erosion", "opening", "Gray{N0f8}"]
  • ["extreme_filter", "Bool", "256×256"]
  • ["extreme_filter", "Bool", "512×512"]
  • ["extreme_filter", "Gray{Float32}", "256×256"]
  • ["extreme_filter", "Gray{Float32}", "512×512"]
  • ["extreme_filter", "Gray{N0f8}", "256×256"]
  • ["extreme_filter", "Gray{N0f8}", "512×512"]
  • ["extreme_filter", "Int64", "256×256"]
  • ["extreme_filter", "Int64", "512×512"]
  • ["feature_transform"]

Julia versioninfo

Target

Julia Version 1.8.0-rc1
Commit 6368fdc656 (2022-05-27 18:33 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.3 LTS
  uname: Linux 5.10.102.1-microsoft-standard-WSL2 #1 SMP Wed Mar 2 00:30:59 UTC 2022 x86_64 x86_64
  CPU: 12th Gen Intel(R) Core(TM) i9-12900K: 
                 speed         user         nice          sys         idle          irq
       #1-24  3187 MHz     144321 s        786 s      37297 s  145789540 s          0 s
  Memory: 15.621490478515625 GB (3204.171875 MB free)
  Uptime: 608308.32 sec
  Load Avg:  1.73  1.6  1.31
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, goldmont)
  Threads: 16 on 24 virtual cores

Baseline

Julia Version 1.8.0-rc1
Commit 6368fdc656 (2022-05-27 18:33 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.3 LTS
  uname: Linux 5.10.102.1-microsoft-standard-WSL2 #1 SMP Wed Mar 2 00:30:59 UTC 2022 x86_64 x86_64
  CPU: 12th Gen Intel(R) Core(TM) i9-12900K: 
                 speed         user         nice          sys         idle          irq
       #1-24  3187 MHz     147638 s        786 s      37426 s  145863304 s          0 s
  Memory: 15.621490478515625 GB (3306.55078125 MB free)
  Uptime: 608630.11 sec
  Load Avg:  1.13  1.23  1.22
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, goldmont)
  Threads: 16 on 24 virtual cores

Target result

Benchmark Report for /home/jc/Documents/Julia/ImageMorphology.jl

Job Properties

  • Time of benchmark: 16 Jun 2022 - 06:05
  • Package commit: 5323f4
  • Julia commit: 6368fd
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

ID time GC time memory allocations
["Maxtree", "area_opening", "256×256"] 11.192 ms (5%) 2.07 MiB (1%) 24
["Maxtree", "area_opening", "512×512"] 40.440 ms (5%) 8.25 MiB (1%) 23
["connected", "label_components"] 321.791 μs (5%) 519.81 KiB (1%) 12
["convexhull", "convexhull"] 158.654 μs (5%) 529.77 KiB (1%) 16528
["dilatation_and_erosion", "erode", "Gray{Float32}", "256×256"] 749.990 μs (5%) 256.91 KiB (1%) 23
["dilatation_and_erosion", "erode", "Gray{Float32}", "512×512"] 3.033 ms (5%) 1.00 MiB (1%) 23
["dilatation_and_erosion", "erode", "Gray{N0f8}", "256×256"] 250.868 μs (5%) 64.97 KiB (1%) 23
["dilatation_and_erosion", "erode", "Gray{N0f8}", "512×512"] 996.575 μs (5%) 256.97 KiB (1%) 23
["dilatation_and_erosion", "opening", "Gray{Float32}", "256×256"] 1.532 ms (5%) 513.59 KiB (1%) 37
["dilatation_and_erosion", "opening", "Gray{Float32}", "512×512"] 6.152 ms (5%) 2.00 MiB (1%) 37
["dilatation_and_erosion", "opening", "Gray{N0f8}", "256×256"] 501.851 μs (5%) 129.72 KiB (1%) 37
["dilatation_and_erosion", "opening", "Gray{N0f8}", "512×512"] 1.995 ms (5%) 513.72 KiB (1%) 37
["extreme_filter", "Bool", "256×256", "r1_bool_best"] 104.690 μs (5%) 65.11 KiB (1%) 17
["extreme_filter", "Bool", "256×256", "r1_bool_random"] 187.493 μs (5%) 65.11 KiB (1%) 17
["extreme_filter", "Bool", "256×256", "r1_bool_worst"] 187.766 μs (5%) 65.11 KiB (1%) 17
["extreme_filter", "Bool", "256×256", "r1_diamond_best"] 10.964 μs (5%) 64.72 KiB (1%) 14
["extreme_filter", "Bool", "256×256", "r1_diamond_random"] 10.835 μs (5%) 64.72 KiB (1%) 14
["extreme_filter", "Bool", "256×256", "r1_diamond_worst"] 10.994 μs (5%) 64.72 KiB (1%) 14
["extreme_filter", "Bool", "256×256", "r5_bool_best"] 118.286 μs (5%) 72.11 KiB (1%) 21
["extreme_filter", "Bool", "256×256", "r5_bool_random"] 532.507 μs (5%) 72.11 KiB (1%) 21
["extreme_filter", "Bool", "256×256", "r5_bool_worst"] 1.085 ms (5%) 72.11 KiB (1%) 21
["extreme_filter", "Bool", "256×256", "r5_diamond_best"] 90.364 μs (5%) 128.83 KiB (1%) 16
["extreme_filter", "Bool", "256×256", "r5_diamond_random"] 91.806 μs (5%) 128.83 KiB (1%) 16
["extreme_filter", "Bool", "256×256", "r5_diamond_worst"] 87.575 μs (5%) 128.83 KiB (1%) 16
["extreme_filter", "Bool", "512×512", "r1_bool_best"] 422.036 μs (5%) 257.11 KiB (1%) 17
["extreme_filter", "Bool", "512×512", "r1_bool_random"] 726.953 μs (5%) 257.11 KiB (1%) 17
["extreme_filter", "Bool", "512×512", "r1_bool_worst"] 740.058 μs (5%) 257.11 KiB (1%) 17
["extreme_filter", "Bool", "512×512", "r1_diamond_best"] 21.525 μs (5%) 256.72 KiB (1%) 14
["extreme_filter", "Bool", "512×512", "r1_diamond_random"] 21.960 μs (5%) 256.72 KiB (1%) 14
["extreme_filter", "Bool", "512×512", "r1_diamond_worst"] 21.979 μs (5%) 256.72 KiB (1%) 14
["extreme_filter", "Bool", "512×512", "r5_bool_best"] 440.347 μs (5%) 264.11 KiB (1%) 21
["extreme_filter", "Bool", "512×512", "r5_bool_random"] 2.194 ms (5%) 264.11 KiB (1%) 21
["extreme_filter", "Bool", "512×512", "r5_bool_worst"] 4.382 ms (5%) 264.11 KiB (1%) 21
["extreme_filter", "Bool", "512×512", "r5_diamond_best"] 200.121 μs (5%) 512.83 KiB (1%) 16
["extreme_filter", "Bool", "512×512", "r5_diamond_random"] 200.764 μs (5%) 512.83 KiB (1%) 16
["extreme_filter", "Bool", "512×512", "r5_diamond_worst"] 200.948 μs (5%) 512.83 KiB (1%) 16
["extreme_filter", "Gray{Float32}", "256×256", "r1_diamond"] 14.887 μs (5%) 256.52 KiB (1%) 11
["extreme_filter", "Gray{Float32}", "256×256", "r1_generic"] 339.650 μs (5%) 256.56 KiB (1%) 10
["extreme_filter", "Gray{Float32}", "256×256", "r5_diamond"] 163.709 μs (5%) 512.56 KiB (1%) 13
["extreme_filter", "Gray{Float32}", "256×256", "r5_generic"] 7.653 ms (5%) 260.06 KiB (1%) 12
["extreme_filter", "Gray{Float32}", "512×512", "r1_diamond"] 43.887 μs (5%) 1.00 MiB (1%) 11
["extreme_filter", "Gray{Float32}", "512×512", "r1_generic"] 1.411 ms (5%) 1.00 MiB (1%) 10
["extreme_filter", "Gray{Float32}", "512×512", "r5_diamond"] 516.506 μs (5%) 2.00 MiB (1%) 13
["extreme_filter", "Gray{Float32}", "512×512", "r5_generic"] 30.672 ms (5%) 1.00 MiB (1%) 12
["extreme_filter", "Gray{N0f8}", "256×256", "r1_diamond"] 11.183 μs (5%) 64.58 KiB (1%) 11
["extreme_filter", "Gray{N0f8}", "256×256", "r1_generic"] 142.367 μs (5%) 64.62 KiB (1%) 10
["extreme_filter", "Gray{N0f8}", "256×256", "r5_diamond"] 143.392 μs (5%) 128.69 KiB (1%) 13
["extreme_filter", "Gray{N0f8}", "256×256", "r5_generic"] 1.757 ms (5%) 68.12 KiB (1%) 12
["extreme_filter", "Gray{N0f8}", "512×512", "r1_diamond"] 27.772 μs (5%) 256.58 KiB (1%) 11
["extreme_filter", "Gray{N0f8}", "512×512", "r1_generic"] 558.938 μs (5%) 256.62 KiB (1%) 10
["extreme_filter", "Gray{N0f8}", "512×512", "r5_diamond"] 427.346 μs (5%) 512.69 KiB (1%) 13
["extreme_filter", "Gray{N0f8}", "512×512", "r5_generic"] 6.998 ms (5%) 260.12 KiB (1%) 12
["extreme_filter", "Int64", "256×256", "r1_diamond"] 19.151 μs (5%) 512.52 KiB (1%) 11
["extreme_filter", "Int64", "256×256", "r1_generic"] 160.938 μs (5%) 512.56 KiB (1%) 10
["extreme_filter", "Int64", "256×256", "r5_diamond"] 249.505 μs (5%) 1.00 MiB (1%) 13
["extreme_filter", "Int64", "256×256", "r5_generic"] 1.794 ms (5%) 516.06 KiB (1%) 12
["extreme_filter", "Int64", "512×512", "r1_diamond"] 14.343 ms (5%) 7.76 MiB (1%) 119028
["extreme_filter", "Int64", "512×512", "r1_generic"] 633.024 μs (5%) 2.00 MiB (1%) 10
["extreme_filter", "Int64", "512×512", "r5_diamond"] 825.951 μs (5%) 4.00 MiB (1%) 13
["extreme_filter", "Int64", "512×512", "r5_generic"] 7.332 ms (5%) 2.00 MiB (1%) 12
["feature_transform", "feature_transform"] 2.016 ms (5%) 1.45 MiB (1%) 268

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["Maxtree", "area_opening"]
  • ["connected"]
  • ["convexhull"]
  • ["dilatation_and_erosion", "erode", "Gray{Float32}"]
  • ["dilatation_and_erosion", "erode", "Gray{N0f8}"]
  • ["dilatation_and_erosion", "opening", "Gray{Float32}"]
  • ["dilatation_and_erosion", "opening", "Gray{N0f8}"]
  • ["extreme_filter", "Bool", "256×256"]
  • ["extreme_filter", "Bool", "512×512"]
  • ["extreme_filter", "Gray{Float32}", "256×256"]
  • ["extreme_filter", "Gray{Float32}", "512×512"]
  • ["extreme_filter", "Gray{N0f8}", "256×256"]
  • ["extreme_filter", "Gray{N0f8}", "512×512"]
  • ["extreme_filter", "Int64", "256×256"]
  • ["extreme_filter", "Int64", "512×512"]
  • ["feature_transform"]

Julia versioninfo

Julia Version 1.8.0-rc1
Commit 6368fdc656 (2022-05-27 18:33 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.3 LTS
  uname: Linux 5.10.102.1-microsoft-standard-WSL2 #1 SMP Wed Mar 2 00:30:59 UTC 2022 x86_64 x86_64
  CPU: 12th Gen Intel(R) Core(TM) i9-12900K: 
                 speed         user         nice          sys         idle          irq
       #1-24  3187 MHz     144321 s        786 s      37297 s  145789540 s          0 s
  Memory: 15.621490478515625 GB (3204.171875 MB free)
  Uptime: 608308.32 sec
  Load Avg:  1.73  1.6  1.31
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, goldmont)
  Threads: 16 on 24 virtual cores

Baseline result

Benchmark Report for /home/jc/Documents/Julia/ImageMorphology.jl

Job Properties

  • Time of benchmark: 16 Jun 2022 - 06:11
  • Package commit: e7c206
  • Julia commit: 6368fd
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

| ID | time | GC time | memory | allocations |
|---|---|---|---|---|
| ["Maxtree", "area_opening", "256×256"] | 5.994 ms (5%) | | 2.07 MiB (1%) | 23 |
| ["Maxtree", "area_opening", "512×512"] | 29.193 ms (5%) | | 8.26 MiB (1%) | 24 |
| ["connected", "label_components"] | 350.442 μs (5%) | | 521.89 KiB (1%) | 43 |
| ["convexhull", "convexhull"] | 160.081 μs (5%) | | 529.77 KiB (1%) | 16528 |
| ["dilatation_and_erosion", "erode", "Gray{Float32}", "256×256"] | 761.636 μs (5%) | | 256.91 KiB (1%) | 23 |
| ["dilatation_and_erosion", "erode", "Gray{Float32}", "512×512"] | 3.066 ms (5%) | | 1.00 MiB (1%) | 23 |
| ["dilatation_and_erosion", "erode", "Gray{N0f8}", "256×256"] | 246.054 μs (5%) | | 64.97 KiB (1%) | 23 |
| ["dilatation_and_erosion", "erode", "Gray{N0f8}", "512×512"] | 980.504 μs (5%) | | 256.97 KiB (1%) | 23 |
| ["dilatation_and_erosion", "opening", "Gray{Float32}", "256×256"] | 1.528 ms (5%) | | 513.59 KiB (1%) | 37 |
| ["dilatation_and_erosion", "opening", "Gray{Float32}", "512×512"] | 6.129 ms (5%) | | 2.00 MiB (1%) | 37 |
| ["dilatation_and_erosion", "opening", "Gray{N0f8}", "256×256"] | 498.928 μs (5%) | | 129.72 KiB (1%) | 37 |
| ["dilatation_and_erosion", "opening", "Gray{N0f8}", "512×512"] | 1.986 ms (5%) | | 513.72 KiB (1%) | 37 |
| ["extreme_filter", "Bool", "256×256", "r1_bool_best"] | 105.939 μs (5%) | | 65.11 KiB (1%) | 17 |
| ["extreme_filter", "Bool", "256×256", "r1_bool_random"] | 181.575 μs (5%) | | 65.11 KiB (1%) | 17 |
| ["extreme_filter", "Bool", "256×256", "r1_bool_worst"] | 188.422 μs (5%) | | 65.11 KiB (1%) | 17 |
| ["extreme_filter", "Bool", "256×256", "r1_diamond_best"] | 106.184 μs (5%) | | 64.75 KiB (1%) | 13 |
| ["extreme_filter", "Bool", "256×256", "r1_diamond_random"] | 501.492 μs (5%) | | 64.75 KiB (1%) | 13 |
| ["extreme_filter", "Bool", "256×256", "r1_diamond_worst"] | 120.166 μs (5%) | | 64.70 KiB (1%) | 22 |
| ["extreme_filter", "Bool", "256×256", "r5_bool_best"] | 120.421 μs (5%) | | 72.11 KiB (1%) | 21 |
| ["extreme_filter", "Bool", "256×256", "r5_bool_random"] | 527.423 μs (5%) | | 72.11 KiB (1%) | 21 |
| ["extreme_filter", "Bool", "256×256", "r5_bool_worst"] | 1.061 ms (5%) | | 72.11 KiB (1%) | 21 |
| ["extreme_filter", "Bool", "256×256", "r5_diamond_best"] | 117.004 μs (5%) | | 68.25 KiB (1%) | 15 |
| ["extreme_filter", "Bool", "256×256", "r5_diamond_random"] | 531.057 μs (5%) | | 68.25 KiB (1%) | 15 |
| ["extreme_filter", "Bool", "256×256", "r5_diamond_worst"] | 663.545 μs (5%) | | 130.56 KiB (1%) | 88 |
| ["extreme_filter", "Bool", "512×512", "r1_bool_best"] | 431.636 μs (5%) | | 257.11 KiB (1%) | 17 |
| ["extreme_filter", "Bool", "512×512", "r1_bool_random"] | 756.972 μs (5%) | | 257.11 KiB (1%) | 17 |
| ["extreme_filter", "Bool", "512×512", "r1_bool_worst"] | 757.122 μs (5%) | | 257.11 KiB (1%) | 17 |
| ["extreme_filter", "Bool", "512×512", "r1_diamond_best"] | 429.808 μs (5%) | | 256.75 KiB (1%) | 13 |
| ["extreme_filter", "Bool", "512×512", "r1_diamond_random"] | 2.157 ms (5%) | | 256.75 KiB (1%) | 13 |
| ["extreme_filter", "Bool", "512×512", "r1_diamond_worst"] | 558.812 μs (5%) | | 256.70 KiB (1%) | 22 |
| ["extreme_filter", "Bool", "512×512", "r5_bool_best"] | 457.812 μs (5%) | | 264.11 KiB (1%) | 21 |
| ["extreme_filter", "Bool", "512×512", "r5_bool_random"] | 2.227 ms (5%) | | 264.11 KiB (1%) | 21 |
| ["extreme_filter", "Bool", "512×512", "r5_bool_worst"] | 4.195 ms (5%) | | 264.11 KiB (1%) | 21 |
| ["extreme_filter", "Bool", "512×512", "r5_diamond_best"] | 443.371 μs (5%) | | 260.25 KiB (1%) | 15 |
| ["extreme_filter", "Bool", "512×512", "r5_diamond_random"] | 2.240 ms (5%) | | 260.25 KiB (1%) | 15 |
| ["extreme_filter", "Bool", "512×512", "r5_diamond_worst"] | 2.791 ms (5%) | | 514.56 KiB (1%) | 88 |
| ["extreme_filter", "Gray{Float32}", "256×256", "r1_diamond"] | 168.405 μs (5%) | | 256.52 KiB (1%) | 20 |
| ["extreme_filter", "Gray{Float32}", "256×256", "r1_generic"] | 362.001 μs (5%) | | 256.56 KiB (1%) | 10 |
| ["extreme_filter", "Gray{Float32}", "256×256", "r5_diamond"] | 825.942 μs (5%) | | 514.31 KiB (1%) | 86 |
| ["extreme_filter", "Gray{Float32}", "256×256", "r5_generic"] | 7.994 ms (5%) | | 260.06 KiB (1%) | 12 |
| ["extreme_filter", "Gray{Float32}", "512×512", "r1_diamond"] | 723.147 μs (5%) | | 1.00 MiB (1%) | 20 |
| ["extreme_filter", "Gray{Float32}", "512×512", "r1_generic"] | 1.451 ms (5%) | | 1.00 MiB (1%) | 10 |
| ["extreme_filter", "Gray{Float32}", "512×512", "r5_diamond"] | 3.573 ms (5%) | | 2.00 MiB (1%) | 86 |
| ["extreme_filter", "Gray{Float32}", "512×512", "r5_generic"] | 32.281 ms (5%) | | 1.00 MiB (1%) | 12 |
| ["extreme_filter", "Gray{N0f8}", "256×256", "r1_diamond"] | 126.488 μs (5%) | | 64.58 KiB (1%) | 20 |
| ["extreme_filter", "Gray{N0f8}", "256×256", "r1_generic"] | 149.609 μs (5%) | | 64.62 KiB (1%) | 10 |
| ["extreme_filter", "Gray{N0f8}", "256×256", "r5_diamond"] | 864.293 μs (5%) | | 130.44 KiB (1%) | 86 |
| ["extreme_filter", "Gray{N0f8}", "256×256", "r5_generic"] | 1.756 ms (5%) | | 68.12 KiB (1%) | 12 |
| ["extreme_filter", "Gray{N0f8}", "512×512", "r1_diamond"] | 546.047 μs (5%) | | 256.58 KiB (1%) | 20 |
| ["extreme_filter", "Gray{N0f8}", "512×512", "r1_generic"] | 586.097 μs (5%) | | 256.62 KiB (1%) | 10 |
| ["extreme_filter", "Gray{N0f8}", "512×512", "r5_diamond"] | 3.544 ms (5%) | | 514.44 KiB (1%) | 86 |
| ["extreme_filter", "Gray{N0f8}", "512×512", "r5_generic"] | 7.024 ms (5%) | | 260.12 KiB (1%) | 12 |
| ["extreme_filter", "Int64", "256×256", "r1_diamond"] | 174.521 μs (5%) | | 512.52 KiB (1%) | 20 |
| ["extreme_filter", "Int64", "256×256", "r1_generic"] | 163.549 μs (5%) | | 512.56 KiB (1%) | 10 |
| ["extreme_filter", "Int64", "256×256", "r5_diamond"] | 803.115 μs (5%) | | 1.00 MiB (1%) | 86 |
| ["extreme_filter", "Int64", "256×256", "r5_generic"] | 1.793 ms (5%) | | 516.06 KiB (1%) | 12 |
| ["extreme_filter", "Int64", "512×512", "r1_diamond"] | 1.131 ms (5%) | | 2.00 MiB (1%) | 20 |
| ["extreme_filter", "Int64", "512×512", "r1_generic"] | 640.936 μs (5%) | | 2.00 MiB (1%) | 10 |
| ["extreme_filter", "Int64", "512×512", "r5_diamond"] | 5.667 ms (5%) | | 4.00 MiB (1%) | 86 |
| ["extreme_filter", "Int64", "512×512", "r5_generic"] | 7.065 ms (5%) | | 2.00 MiB (1%) | 12 |
| ["feature_transform", "feature_transform"] | 3.036 ms (5%) | | 1.45 MiB (1%) | 268 |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["Maxtree", "area_opening"]
  • ["connected"]
  • ["convexhull"]
  • ["dilatation_and_erosion", "erode", "Gray{Float32}"]
  • ["dilatation_and_erosion", "erode", "Gray{N0f8}"]
  • ["dilatation_and_erosion", "opening", "Gray{Float32}"]
  • ["dilatation_and_erosion", "opening", "Gray{N0f8}"]
  • ["extreme_filter", "Bool", "256×256"]
  • ["extreme_filter", "Bool", "512×512"]
  • ["extreme_filter", "Gray{Float32}", "256×256"]
  • ["extreme_filter", "Gray{Float32}", "512×512"]
  • ["extreme_filter", "Gray{N0f8}", "256×256"]
  • ["extreme_filter", "Gray{N0f8}", "512×512"]
  • ["extreme_filter", "Int64", "256×256"]
  • ["extreme_filter", "Int64", "512×512"]
  • ["feature_transform"]

Julia versioninfo

Julia Version 1.8.0-rc1
Commit 6368fdc656 (2022-05-27 18:33 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 20.04.3 LTS
  uname: Linux 5.10.102.1-microsoft-standard-WSL2 #1 SMP Wed Mar 2 00:30:59 UTC 2022 x86_64 x86_64
  CPU: 12th Gen Intel(R) Core(TM) i9-12900K: 
                 speed         user         nice          sys         idle          irq
       #1-24  3187 MHz     147638 s        786 s      37426 s  145863304 s          0 s
  Memory: 15.621490478515625 GB (3306.55078125 MB free)
  Uptime: 608630.11 sec
  Load Avg:  1.13  1.23  1.22
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, goldmont)
  Threads: 16 on 24 virtual cores

Runtime information

| Runtime Info | |
|---|---|
| BLAS #threads | 12 |
| BLAS.vendor() | openblas64 |
| Sys.CPU_THREADS | 24 |

lscpu output:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          24
On-line CPU(s) list:             0-23
Thread(s) per core:              2
Core(s) per socket:              12
Socket(s):                       1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           151
Model name:                      12th Gen Intel(R) Core(TM) i9-12900K
Stepping:                        2
CPU MHz:                         3187.201
BogoMIPS:                        6374.40
Virtualization:                  VT-x
Hypervisor vendor:               Microsoft
Virtualization type:             full
L1d cache:                       576 KiB
L1i cache:                       384 KiB
L2 cache:                        15 MiB
L3 cache:                        30 MiB
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves umip waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm serialize flush_l1d arch_capabilities
| Cpu Property | Value |
|---|---|
| Brand | 12th Gen Intel(R) Core(TM) i9-12900K |
| Vendor | :Intel |
| Architecture | :UnknownIntel |
| Model | Family: 0x06, Model: 0x97, Stepping: 0x02, Type: 0x00 |
| Cores | 16 physical cores, 32 logical cores (on executing CPU) |
| | Hyperthreading hardware capability detected |
| Clock Frequencies | Not supported by CPU |
| Data Cache | Level 1:3 : (48, 1280, 30720) kbytes |
| | 64 byte cache line size |
| Address Size | 48 bits virtual, 46 bits physical |
| SIMD | 256 bit = 32 byte max. SIMD vector size |
| Time Stamp Counter | TSC is accessible via rdtsc |
| | TSC runs at constant rate (invariant from clock frequency) |
| Perf. Monitoring | Performance Monitoring Counters (PMC) are not supported |
| Hypervisor | Yes, Microsoft |

@johnnychen94
Member Author

johnnychen94 commented Jun 15, 2022

The Int64 and Float64 benchmarks show a strange histogram; my guess is that this is a LoopVectorization issue. @chriselrod, have you seen such behavior before?

@johnnychen94
Member Author

Anyway, I'm going to merge this, since this implementation turns out to be simpler and more efficient than #90 and it passes the tests.

@johnnychen94 johnnychen94 merged commit d5075d8 into master Jun 15, 2022
@johnnychen94 johnnychen94 deleted the jc/lv_c42d branch June 15, 2022 23:20
johnnychen94 added a commit that referenced this pull request Jun 15, 2022
* perf: AVX implementation for 2D diamond SE

* add more comprehensive test for diamond shape optimization

Co-Authored-By: Retornaz Thomas <[email protected]>
@ThomasRetornaz
Collaborator

Thanks @johnnychen94
Do you plan to do the 2D box counterpart? (I could try if you want.)
I will discard PR #90.

@johnnychen94
Member Author

johnnychen94 commented Jun 16, 2022

I plan to tag v0.4.0 later today, then switch to ImageFiltering.jl for the next 1-2 weeks, and then to DitherPunk.jl, so it would be great if you could try it out.
In the meantime, I'll slowly pick up the pending PRs (60-62), starting with #60, I think.

This PR actually doesn't work on 32-bit machines, and I'm unsure why. I decided to disable @turbo on 32-bit machines in #100.

In contrast, the single-threaded version in PR #90 seems to just work on 32-bit machines.
We can revisit the code in #90 if needed, but that's low priority, to be honest.
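
For readers following along, here is a minimal sketch of what a word-size guard of this kind could look like. It is not the code from #100: `rowmax_plain!`, `rowmax_fast!` and `rowmax!` are hypothetical names, and `@inbounds @simd` is only a stand-in for LoopVectorization's `@turbo` so that the snippet runs without extra packages.

```julia
# Sketch only: hypothetical names, not the actual change made in #100.

# Plain fallback loop, intended for 32-bit machines.
function rowmax_plain!(out::AbstractVector, a::AbstractVector, b::AbstractVector)
    for i in eachindex(out, a, b)
        out[i] = max(a[i], b[i])
    end
    return out
end

# "Fast" path; `@inbounds @simd` stands in for `@turbo` here (assumption).
function rowmax_fast!(out::AbstractVector, a::AbstractVector, b::AbstractVector)
    @inbounds @simd for i in eachindex(out, a, b)
        out[i] = max(a[i], b[i])
    end
    return out
end

# Pick the implementation once, based on the machine word size.
@static if Sys.WORD_SIZE == 64
    const rowmax! = rowmax_fast!
else
    const rowmax! = rowmax_plain!
end

result = rowmax!(zeros(Int, 3), [1, 5, 2], [4, 0, 2])
```

Both paths compute the same elementwise maximum, so the guard only changes which loop is compiled, not the result.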

@chriselrod

If you have a few minimal examples, I can take a look.

@johnnychen94
Member Author

On the performance of dilate(img, strel_diamond(img)): there is still room for optimization, but the current result is already fairly satisfying. The OpenCV UInt8 result on 1024x1024 (0.072 ms vs. 0.103 ms for Julia) suggests that a more efficient algorithm exists.

| version | size | time (ms) |
|---|---|---|
| MATLAB | (512, 512) | 0.145 |
| MATLAB | (1024, 1024) | 0.435 |
| Julia (UInt8) | (512, 512) | 0.029 |
| Julia (UInt8) | (1024, 1024) | 0.103 |
| Julia (Float64) | (512, 512) | 0.068 |
| Julia (Float64) | (1024, 1024) | 0.297 |
| OpenCV (UInt8) | (512, 512) | 0.020 |
| OpenCV (UInt8) | (1024, 1024) | 0.072 |
| OpenCV (Float64) | (512, 512) | 0.205 |
| OpenCV (Float64) | (1024, 1024) | 0.924 |
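
As a reminder of what is being timed, the operation can be sketched as a plain reference loop. This is not the vectorized implementation from this PR: `dilate_diamond_r1` is a hypothetical name, the radius is fixed to 1, and out-of-bounds neighbours are simply skipped, which is an assumption about border handling.

```julia
# Reference (non-vectorized) dilation with a radius-1 diamond SE:
# each output pixel is the max of itself and its 4 direct neighbours.
# Hypothetical helper for illustration only.
function dilate_diamond_r1(img::AbstractMatrix)
    out = copy(img)
    rows, cols = axes(img)
    for j in cols, i in rows
        v = img[i, j]
        # Neighbours outside the image are skipped (assumed border handling).
        i > first(rows) && (v = max(v, img[i - 1, j]))
        i < last(rows)  && (v = max(v, img[i + 1, j]))
        j > first(cols) && (v = max(v, img[i, j - 1]))
        j < last(cols)  && (v = max(v, img[i, j + 1]))
        out[i, j] = v
    end
    return out
end

img = zeros(Int, 5, 5)
img[3, 3] = 1
out = dilate_diamond_r1(img)  # the single pixel grows into a 5-pixel diamond
```

A loop like this is useful mainly as a correctness baseline when comparing against an optimized kernel.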

@ThomasRetornaz
Collaborator

FYI, I spent some time in the OpenCV implementation.
Basically the OpenCV use
