Add imfilter benchmarks #158

Merged
merged 3 commits into from
Mar 4, 2020

Conversation

timholy
Member

@timholy timholy commented Mar 3, 2020

I was wondering whether GSoC applicants might want to do this, but there seems to be some interest in speeding this package up, so we need these now. This should be good enough to get us going, anyway. CC @stillyslalom, @chriselrod.

@github-actions
Contributor

github-actions bot commented Mar 3, 2020

Benchmark result

Judge result

Benchmark Report for /home/runner/work/ImageFiltering.jl/ImageFiltering.jl

Job Properties

  • Time of benchmarks:
    • Target: 3 Mar 2020 - 23:08
    • Baseline: 3 Mar 2020 - 23:09
  • Package commits:
    • Target: 18e1c5
    • Baseline: 75f89d
  • Julia commits:
    • Target: 2d5741
    • Baseline: 2d5741
  • Julia command flags:
    • Target: None
    • Baseline: None
  • Environment variables:
    • Target: None
    • Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

| ID | time ratio | memory ratio |
|---|---|---|
| ["mapwindow", "extrema"] | 1.06 (5%) ❌ | 1.00 (1%) |
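
The judging rule described above can be sketched in a few lines. This is only an illustration, assuming the ratio is the target time divided by the baseline time and that the tolerance is the 5% shown in the table; `classify` is a hypothetical helper, not a BenchmarkTools or PkgBenchmark function.

```julia
# Hypothetical helper: classify a target/baseline ratio against a relative
# noise tolerance (5% for time in the report above).
function classify(target, baseline; tolerance = 0.05)
    ratio = target / baseline
    if ratio > 1 + tolerance
        return ratio, :regression
    elseif ratio < 1 - tolerance
        return ratio, :improvement
    else
        return ratio, :invariant
    end
end

# ["mapwindow", "extrema"]: 48.600 μs on the target, 45.700 μs on the baseline
ratio, verdict = classify(48.600, 45.700)
println(round(ratio, digits = 2), " ", verdict)

# ["mapwindow", "expensive f"]: 1.235 ms vs. 1.221 ms stays within the tolerance
println(classify(1.235, 1.221)[2])
```

With the times from the target and baseline tables, only `extrema` crosses the 5% threshold, which is consistent with it being the single row in the judge table.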

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["mapwindow"]

Julia versioninfo

Target

Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.0.0-1032-azure #34-Ubuntu SMP Mon Feb 10 19:37:25 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2095 MHz      26679 s          0 s       1181 s       7184 s          0 s
       #2  2095 MHz       6240 s          0 s       1297 s      28140 s          0 s
       
  Memory: 6.782737731933594 GB (3529.234375 MB free)
  Uptime: 369.0 sec
  Load Avg:  1.0126953125  0.83544921875  0.41650390625
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

Baseline

Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.0.0-1032-azure #34-Ubuntu SMP Mon Feb 10 19:37:25 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2095 MHz      31008 s          0 s       1213 s       7222 s          0 s
       #2  2095 MHz       6320 s          0 s       1321 s      32431 s          0 s
       
  Memory: 6.782737731933594 GB (3601.05859375 MB free)
  Uptime: 414.0 sec
  Load Avg:  1.00439453125  0.8603515625  0.447265625
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

Target result

Benchmark Report for /home/runner/work/ImageFiltering.jl/ImageFiltering.jl

Job Properties

  • Time of benchmark: 3 Mar 2020 - 23:08
  • Package commit: 18e1c5
  • Julia commit: 2d5741
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

| ID | time | GC time | memory | allocations |
|---|---|---|---|---|
| ["2d", "IIRGaussian_F32"] | 149.800 μs (5%) | | 131.02 KiB (1%) | 128 |
| ["2d", "IIRGaussian_GrayF32"] | 148.801 μs (5%) | | 131.02 KiB (1%) | 128 |
| ["2d", "IIRGaussian_GrayN0f8"] | 151.300 μs (5%) | | 131.02 KiB (1%) | 128 |
| ["2d", "IIRGaussian_N0f8"] | 151.600 μs (5%) | | 131.02 KiB (1%) | 128 |
| ["2d", "IIRGaussian_RGBF32"] | 218.301 μs (5%) | | 365.39 KiB (1%) | 128 |
| ["2d", "IIRGaussian_RGBN0f8"] | 227.901 μs (5%) | | 365.39 KiB (1%) | 128 |
| ["2d", "dense_F32"] | 83.000 μs (5%) | | 162.17 KiB (1%) | 24 |
| ["2d", "dense_GrayF32"] | 83.800 μs (5%) | | 162.17 KiB (1%) | 24 |
| ["2d", "dense_GrayN0f8"] | 84.800 μs (5%) | | 162.17 KiB (1%) | 24 |
| ["2d", "dense_N0f8"] | 84.300 μs (5%) | | 162.17 KiB (1%) | 24 |
| ["2d", "dense_RGBF32"] | 126.001 μs (5%) | | 480.98 KiB (1%) | 24 |
| ["2d", "dense_RGBN0f8"] | 128.100 μs (5%) | | 480.98 KiB (1%) | 24 |
| ["2d", "factored_F32"] | 20.901 μs (5%) | | 243.59 KiB (1%) | 27 |
| ["2d", "factored_GrayF32"] | 32.000 μs (5%) | | 243.59 KiB (1%) | 27 |
| ["2d", "factored_GrayN0f8"] | 32.400 μs (5%) | | 243.59 KiB (1%) | 27 |
| ["2d", "factored_N0f8"] | 23.500 μs (5%) | | 243.59 KiB (1%) | 27 |
| ["2d", "factored_RGBF32"] | 64.600 μs (5%) | | 724.97 KiB (1%) | 27 |
| ["2d", "factored_RGBN0f8"] | 73.101 μs (5%) | | 724.97 KiB (1%) | 27 |
| ["mapwindow", "cheap f, tiny window"] | 9.000 μs (5%) | | 8.73 KiB (1%) | 20 |
| ["mapwindow", "expensive f"] | 1.235 ms (5%) | | 1.30 MiB (1%) | 18998 |
| ["mapwindow", "extrema"] | 48.600 μs (5%) | | 45.73 KiB (1%) | 60 |
| ["mapwindow", "maximum"] | 199.000 μs (5%) | | 233.22 KiB (1%) | 4059 |
| ["mapwindow", "mean, large window"] | 1.322 ms (5%) | | 1.39 MiB (1%) | 24634 |
| ["mapwindow", "mean, small window"] | 12.200 μs (5%) | | 9.84 KiB (1%) | 43 |
| ["mapwindow", "median!"] | 600.602 μs (5%) | | 233.38 KiB (1%) | 4063 |
| ["mapwindow", "minimum"] | 200.500 μs (5%) | | 233.25 KiB (1%) | 4060 |
Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["2d"]
  • ["mapwindow"]

Julia versioninfo

Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.0.0-1032-azure #34-Ubuntu SMP Mon Feb 10 19:37:25 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2095 MHz      26679 s          0 s       1181 s       7184 s          0 s
       #2  2095 MHz       6240 s          0 s       1297 s      28140 s          0 s
       
  Memory: 6.782737731933594 GB (3529.234375 MB free)
  Uptime: 369.0 sec
  Load Avg:  1.0126953125  0.83544921875  0.41650390625
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

Baseline result

Benchmark Report for /home/runner/work/ImageFiltering.jl/ImageFiltering.jl

Job Properties

  • Time of benchmark: 3 Mar 2020 - 23:09
  • Package commit: 75f89d
  • Julia commit: 2d5741
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

| ID | time | GC time | memory | allocations |
|---|---|---|---|---|
| ["mapwindow", "cheap f, tiny window"] | 9.000 μs (5%) | | 8.73 KiB (1%) | 20 |
| ["mapwindow", "expensive f"] | 1.221 ms (5%) | | 1.30 MiB (1%) | 18998 |
| ["mapwindow", "extrema"] | 45.700 μs (5%) | | 45.73 KiB (1%) | 60 |
| ["mapwindow", "mean, large window"] | 1.316 ms (5%) | | 1.39 MiB (1%) | 24634 |
| ["mapwindow", "mean, small window"] | 11.800 μs (5%) | | 9.84 KiB (1%) | 43 |
| ["mapwindow", "median!"] | 599.302 μs (5%) | | 233.41 KiB (1%) | 4064 |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["mapwindow"]

Julia versioninfo

Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.0.0-1032-azure #34-Ubuntu SMP Mon Feb 10 19:37:25 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz: 
              speed         user         nice          sys         idle          irq
       #1  2095 MHz      31008 s          0 s       1213 s       7222 s          0 s
       #2  2095 MHz       6320 s          0 s       1321 s      32431 s          0 s
       
  Memory: 6.782737731933594 GB (3601.05859375 MB free)
  Uptime: 414.0 sec
  Load Avg:  1.00439453125  0.8603515625  0.447265625
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

Runtime information

Runtime Info
BLAS #threads 2
BLAS.vendor() openblas64
Sys.CPU_THREADS 2

lscpu output:

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  1
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
Stepping:            4
CPU MHz:             2095.078
BogoMIPS:            4190.15
Hypervisor vendor:   Microsoft
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            36608K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl xsaveopt xsavec xsaves
| Cpu Property | Value |
|---|---|
| Brand | Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz |
| Vendor | :Intel |
| Architecture | :Skylake |
| Model | Family: 0x06, Model: 0x55, Stepping: 0x04, Type: 0x00 |
| Cores | 2 physical cores, 2 logical cores (on executing CPU) |
| | No Hyperthreading detected |
| Clock Frequencies | Not supported by CPU |
| Data Cache | Level 1:3 : (32, 1024, 36608) kbytes |
| | 64 byte cache line size |
| Address Size | 48 bits virtual, 44 bits physical |
| SIMD | 512 bit = 64 byte max. SIMD vector size |
| Time Stamp Counter | TSC is accessible via rdtsc |
| | TSC increased at every clock cycle (non-invariant TSC) |
| Perf. Monitoring | Performance Monitoring Counters (PMC) are not supported |
| Hypervisor | Yes, Microsoft |

@stillyslalom
Contributor

Thanks for getting started on this, Tim! I'm surprised that factored_GrayN0f8 takes about 40% longer than factored_N0f8 (32.4 μs vs. 23.5 μs). To focus on the algorithms, might it be better to benchmark preallocated imfilter! & mapwindow!? Some additional benchmarks that might be worthwhile:

  • Filtering of 1D and 3D arrays
  • Larger kernels (e.g. Kernel.gaussian(3) gives a 13 x 13 kernel)
  • Larger images (100 x 100 N0f8 can fit in L1 cache, 2048 x 2048 Float64 can't fit in cache at all)
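
The sizes mentioned in this list can be checked with a little arithmetic. The sketch below uses only Base Julia: `UInt8` stands in for the one-byte `N0f8`, the cache sizes are taken from the first `lscpu` output above, and the kernel-length rule `4*ceil(Int, σ) + 1` is an assumption about the default Gaussian kernel size that happens to reproduce the 13 x 13 figure. `footprint` is a hypothetical helper.

```julia
# Bytes needed to hold a dense array of element type T with the given dimensions.
footprint(::Type{T}, dims...) where {T} = sizeof(T) * prod(dims)

# UInt8 stands in for N0f8 (both are one byte) so no packages are needed.
small = footprint(UInt8, 100, 100)        # 100 x 100 one-byte pixels
large = footprint(Float64, 2048, 2048)    # 2048 x 2048 eight-byte pixels

l1d = 32 * 1024      # "L1d cache: 32K" in the lscpu output
l2  = 1024 * 1024    # "L2 cache: 1024K" in the lscpu output

println(small, " bytes, fits in L1d: ", small <= l1d)
println(large / 2^20, " MiB, fits in L2: ", large <= l2)

# Assumed default length for a σ = 3 Gaussian kernel.
klen = 4 * ceil(Int, 3) + 1
dense_ops = klen^2       # multiply-adds per output pixel for a dense klen x klen kernel
factored_ops = 2 * klen  # multiply-adds per output pixel for two 1-D passes
println(klen, " taps: ", dense_ops, " dense vs. ", factored_ops, " factored")
```

The small image is about 10 kB and sits comfortably in L1d, while the large one is 32 MiB before counting the output array, far beyond L1 and L2; whether it fits in L3 depends on the machine.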

@johnnychen94 johnnychen94 mentioned this pull request Mar 4, 2020
7 tasks
@timholy
Member Author

timholy commented Mar 4, 2020

> might it be better to benchmark preallocated imfilter! & mapwindow!

I don't think it will matter; the allocation of the output should be trivial. And the algorithms that use TiledIteration will have to allocate tile buffers anyway.

The rest seems sensible. The suite is much slower now, but you're right that it's all stuff we need to test. Once it finishes I'll commit the tuning file too, so that it doesn't take forever to run.

@codecov

codecov bot commented Mar 4, 2020

Codecov Report

Merging #158 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #158   +/-   ##
=======================================
  Coverage   91.65%   91.65%           
=======================================
  Files           9        9           
  Lines        1222     1222           
=======================================
  Hits         1120     1120           
  Misses        102      102

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 75f89d0...916a8eb.

@github-actions
Contributor

github-actions bot commented Mar 4, 2020

Benchmark result

Judge result

Benchmark Report for /home/runner/work/ImageFiltering.jl/ImageFiltering.jl

Job Properties

  • Time of benchmarks:
    • Target: 4 Mar 2020 - 01:46
    • Baseline: 4 Mar 2020 - 01:47
  • Package commits:
    • Target: d52116
    • Baseline: 75f89d
  • Julia commits:
    • Target: 2d5741
    • Baseline: 2d5741
  • Julia command flags:
    • Target: None
    • Baseline: None
  • Environment variables:
    • Target: None
    • Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

| ID | time ratio | memory ratio |
|---|---|---|

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["mapwindow"]

Julia versioninfo

Target

Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.0.0-1032-azure #34-Ubuntu SMP Mon Feb 10 19:37:25 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz: 
              speed         user         nice          sys         idle          irq
       #1  2294 MHz      57653 s          0 s       4005 s      17393 s          0 s
       #2  2294 MHz      16047 s          0 s       2308 s      60474 s          0 s
       
  Memory: 6.782737731933594 GB (2755.68359375 MB free)
  Uptime: 808.0 sec
  Load Avg:  1.0810546875  1.03369140625  0.71240234375
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, broadwell)

Baseline

Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.0.0-1032-azure #34-Ubuntu SMP Mon Feb 10 19:37:25 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz: 
              speed         user         nice          sys         idle          irq
       #1  2294 MHz      65415 s          0 s       4075 s      18872 s          0 s
       #2  2294 MHz      17618 s          0 s       2352 s      68175 s          0 s
       
  Memory: 6.782737731933594 GB (3566.00390625 MB free)
  Uptime: 902.0 sec
  Load Avg:  1.0322265625  1.0322265625  0.748046875
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, broadwell)

Target result

Benchmark Report for /home/runner/work/ImageFiltering.jl/ImageFiltering.jl

Job Properties

  • Time of benchmark: 4 Mar 2020 - 01:46
  • Package commit: d52116
  • Julia commit: 2d5741
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

| ID | time | GC time | memory | allocations |
|---|---|---|---|---|
| ["imfilter", "FFT_F32_100×100"] | 626.801 μs (5%) | | 1.35 MiB (1%) | 246 |
| ["imfilter", "FFT_F32_100×100×100"] | 108.458 ms (5%) | 2.445 ms | 169.15 MiB (1%) | 281 |
| ["imfilter", "FFT_F32_2048"] | 373.401 μs (5%) | | 183.14 KiB (1%) | 219 |
| ["imfilter", "FFT_F32_2048×2048"] | 509.327 ms (5%) | 8.043 ms | 397.06 MiB (1%) | 265 |
| ["imfilter", "FFT_GrayF32_100×100"] | 678.002 μs (5%) | | 1.35 MiB (1%) | 236 |
| ["imfilter", "FFT_GrayF32_100×100×100"] | 111.910 ms (5%) | 2.881 ms | 169.15 MiB (1%) | 268 |
| ["imfilter", "FFT_GrayF32_2048"] | 378.601 μs (5%) | | 182.83 KiB (1%) | 212 |
| ["imfilter", "FFT_GrayF32_2048×2048"] | 524.861 ms (5%) | 4.301 ms | 397.06 MiB (1%) | 255 |
| ["imfilter", "FFT_GrayN0f8_100×100"] | 675.101 μs (5%) | | 1.35 MiB (1%) | 236 |
| ["imfilter", "FFT_GrayN0f8_100×100×100"] | 112.803 ms (5%) | 2.709 ms | 169.15 MiB (1%) | 268 |
| ["imfilter", "FFT_GrayN0f8_2048"] | 384.501 μs (5%) | | 182.83 KiB (1%) | 212 |
| ["imfilter", "FFT_GrayN0f8_2048×2048"] | 506.524 ms (5%) | 8.403 ms | 397.06 MiB (1%) | 255 |
| ["imfilter", "FFT_N0f8_100×100"] | 629.801 μs (5%) | | 1.35 MiB (1%) | 246 |
| ["imfilter", "FFT_N0f8_100×100×100"] | 107.707 ms (5%) | 2.495 ms | 169.15 MiB (1%) | 281 |
| ["imfilter", "FFT_N0f8_2048"] | 369.902 μs (5%) | | 183.14 KiB (1%) | 219 |
| ["imfilter", "FFT_N0f8_2048×2048"] | 475.428 ms (5%) | 5.851 ms | 397.06 MiB (1%) | 265 |
| ["imfilter", "FFT_RGBF32_100×100"] | 1.986 ms (5%) | | 3.27 MiB (1%) | 261 |
| ["imfilter", "FFT_RGBF32_100×100×100"] | 448.336 ms (5%) | 4.221 ms | 410.41 MiB (1%) | 281 |
| ["imfilter", "FFT_RGBF32_2048"] | 611.101 μs (5%) | | 387.14 KiB (1%) | 231 |
| ["imfilter", "FFT_RGBF32_2048×2048"] | 1.631 s (5%) | 52.463 ms | 972.07 MiB (1%) | 281 |
| ["imfilter", "FFT_RGBN0f8_100×100"] | 2.064 ms (5%) | | 3.27 MiB (1%) | 261 |
| ["imfilter", "FFT_RGBN0f8_100×100×100"] | 456.518 ms (5%) | 6.211 ms | 410.41 MiB (1%) | 281 |
| ["imfilter", "FFT_RGBN0f8_2048"] | 613.303 μs (5%) | | 387.13 KiB (1%) | 231 |
| ["imfilter", "FFT_RGBN0f8_2048×2048"] | 1.571 s (5%) | 9.583 ms | 972.07 MiB (1%) | 281 |
| ["imfilter", "IIRGaussian_F32_100×100"] | 183.100 μs (5%) | | 131.02 KiB (1%) | 128 |
| ["imfilter", "IIRGaussian_F32_100×100×100"] | 34.079 ms (5%) | | 16.34 MiB (1%) | 10139 |
| ["imfilter", "IIRGaussian_F32_2048"] | 25.600 μs (5%) | | 32.91 KiB (1%) | 15 |
| ["imfilter", "IIRGaussian_F32_2048×2048"] | 77.286 ms (5%) | | 48.25 MiB (1%) | 2076 |
| ["imfilter", "IIRGaussian_GrayF32_100×100"] | 185.200 μs (5%) | | 131.02 KiB (1%) | 128 |
| ["imfilter", "IIRGaussian_GrayF32_100×100×100"] | 34.266 ms (5%) | | 16.34 MiB (1%) | 10139 |
| ["imfilter", "IIRGaussian_GrayF32_2048"] | 26.700 μs (5%) | | 32.91 KiB (1%) | 15 |
| ["imfilter", "IIRGaussian_GrayF32_2048×2048"] | 77.346 ms (5%) | | 48.25 MiB (1%) | 2076 |
| ["imfilter", "IIRGaussian_GrayN0f8_100×100"] | 192.801 μs (5%) | | 131.02 KiB (1%) | 128 |
| ["imfilter", "IIRGaussian_GrayN0f8_100×100×100"] | 34.072 ms (5%) | | 16.34 MiB (1%) | 10139 |
| ["imfilter", "IIRGaussian_GrayN0f8_2048"] | 26.600 μs (5%) | | 32.91 KiB (1%) | 15 |
| ["imfilter", "IIRGaussian_GrayN0f8_2048×2048"] | 78.941 ms (5%) | | 48.25 MiB (1%) | 2076 |
| ["imfilter", "IIRGaussian_N0f8_100×100"] | 188.301 μs (5%) | | 131.02 KiB (1%) | 128 |
| ["imfilter", "IIRGaussian_N0f8_100×100×100"] | 34.407 ms (5%) | | 16.34 MiB (1%) | 10139 |
| ["imfilter", "IIRGaussian_N0f8_2048"] | 27.400 μs (5%) | | 32.91 KiB (1%) | 15 |
| ["imfilter", "IIRGaussian_N0f8_2048×2048"] | 79.174 ms (5%) | | 48.25 MiB (1%) | 2076 |
| ["imfilter", "IIRGaussian_RGBF32_100×100"] | 288.400 μs (5%) | | 365.39 KiB (1%) | 128 |
| ["imfilter", "IIRGaussian_RGBF32_100×100×100"] | 57.597 ms (5%) | | 46.86 MiB (1%) | 10139 |
| ["imfilter", "IIRGaussian_RGBF32_2048"] | 30.200 μs (5%) | | 64.81 KiB (1%) | 17 |
| ["imfilter", "IIRGaussian_RGBF32_2048×2048"] | 161.669 ms (5%) | | 144.25 MiB (1%) | 2076 |
| ["imfilter", "IIRGaussian_RGBN0f8_100×100"] | 297.101 μs (5%) | | 365.39 KiB (1%) | 128 |
| ["imfilter", "IIRGaussian_RGBN0f8_100×100×100"] | 58.445 ms (5%) | | 46.86 MiB (1%) | 10139 |
| ["imfilter", "IIRGaussian_RGBN0f8_2048"] | 32.700 μs (5%) | | 64.80 KiB (1%) | 17 |
| ["imfilter", "IIRGaussian_RGBN0f8_2048×2048"] | 164.320 ms (5%) | | 144.25 MiB (1%) | 2076 |
| ["imfilter", "denselarge_F32_100×100"] | 1.659 ms (5%) | | 179.22 KiB (1%) | 25 |
| ["imfilter", "denselarge_F32_100×100×100"] | 2.838 s (5%) | | 18.35 MiB (1%) | 32 |
| ["imfilter", "denselarge_F32_2048"] | 21.000 μs (5%) | | 49.06 KiB (1%) | 17 |
| ["imfilter", "denselarge_F32_2048×2048"] | 857.746 ms (5%) | | 64.41 MiB (1%) | 27 |
| ["imfilter", "denselarge_GrayF32_100×100"] | 1.735 ms (5%) | | 179.22 KiB (1%) | 25 |
| ["imfilter", "denselarge_GrayF32_100×100×100"] | 2.813 s (5%) | | 18.35 MiB (1%) | 32 |
| ["imfilter", "denselarge_GrayF32_2048"] | 26.100 μs (5%) | | 49.06 KiB (1%) | 17 |
| ["imfilter", "denselarge_GrayF32_2048×2048"] | 871.035 ms (5%) | | 64.41 MiB (1%) | 27 |
| ["imfilter", "denselarge_GrayN0f8_100×100"] | 1.760 ms (5%) | | 179.22 KiB (1%) | 25 |
| ["imfilter", "denselarge_GrayN0f8_100×100×100"] | 2.826 s (5%) | | 18.35 MiB (1%) | 32 |
| ["imfilter", "denselarge_GrayN0f8_2048"] | 25.400 μs (5%) | | 49.06 KiB (1%) | 17 |
| ["imfilter", "denselarge_GrayN0f8_2048×2048"] | 848.923 ms (5%) | | 64.41 MiB (1%) | 27 |
| ["imfilter", "denselarge_N0f8_100×100"] | 1.750 ms (5%) | | 179.22 KiB (1%) | 25 |
| ["imfilter", "denselarge_N0f8_100×100×100"] | 2.807 s (5%) | | 18.35 MiB (1%) | 32 |
| ["imfilter", "denselarge_N0f8_2048"] | 20.600 μs (5%) | | 49.06 KiB (1%) | 17 |
| ["imfilter", "denselarge_N0f8_2048×2048"] | 846.793 ms (5%) | | 64.41 MiB (1%) | 27 |
| ["imfilter", "denselarge_RGBF32_100×100"] | 1.983 ms (5%) | | 531.47 KiB (1%) | 25 |
| ["imfilter", "denselarge_RGBF32_100×100×100"] | 4.289 s (5%) | 2.190 ms | 55.05 MiB (1%) | 32 |
| ["imfilter", "denselarge_RGBF32_2048"] | 42.000 μs (5%) | | 113.20 KiB (1%) | 18 |
| ["imfilter", "denselarge_RGBF32_2048×2048"] | 978.395 ms (5%) | 2.794 ms | 193.16 MiB (1%) | 27 |
| ["imfilter", "denselarge_RGBN0f8_100×100"] | 1.805 ms (5%) | | 531.47 KiB (1%) | 25 |
| ["imfilter", "denselarge_RGBN0f8_100×100×100"] | 4.228 s (5%) | | 55.05 MiB (1%) | 32 |
| ["imfilter", "denselarge_RGBN0f8_2048"] | 41.300 μs (5%) | | 113.19 KiB (1%) | 18 |
| ["imfilter", "denselarge_RGBN0f8_2048×2048"] | 991.667 ms (5%) | 2.938 ms | 193.16 MiB (1%) | 27 |
| ["imfilter", "densesmall_F32_100×100"] | 107.100 μs (5%) | | 162.19 KiB (1%) | 25 |
| ["imfilter", "densesmall_F32_100×100×100"] | 49.962 ms (5%) | | 15.73 MiB (1%) | 32 |
| ["imfilter", "densesmall_F32_2048"] | 11.525 μs (5%) | | 32.92 KiB (1%) | 16 |
| ["imfilter", "densesmall_F32_2048×2048"] | 58.299 ms (5%) | | 64.10 MiB (1%) | 27 |
| ["imfilter", "densesmall_GrayF32_100×100"] | 103.200 μs (5%) | | 162.19 KiB (1%) | 25 |
| ["imfilter", "densesmall_GrayF32_100×100×100"] | 49.115 ms (5%) | | 15.73 MiB (1%) | 32 |
| ["imfilter", "densesmall_GrayF32_2048"] | 12.800 μs (5%) | | 32.92 KiB (1%) | 16 |
| ["imfilter", "densesmall_GrayF32_2048×2048"] | 57.560 ms (5%) | | 64.10 MiB (1%) | 27 |
| ["imfilter", "densesmall_GrayN0f8_100×100"] | 112.001 μs (5%) | | 162.19 KiB (1%) | 25 |
| ["imfilter", "densesmall_GrayN0f8_100×100×100"] | 48.759 ms (5%) | | 15.73 MiB (1%) | 32 |
| ["imfilter", "densesmall_GrayN0f8_2048"] | 12.867 μs (5%) | | 32.92 KiB (1%) | 16 |
| ["imfilter", "densesmall_GrayN0f8_2048×2048"] | 63.679 ms (5%) | | 64.10 MiB (1%) | 27 |
| ["imfilter", "densesmall_N0f8_100×100"] | 107.401 μs (5%) | | 162.19 KiB (1%) | 25 |
| ["imfilter", "densesmall_N0f8_100×100×100"] | 49.793 ms (5%) | | 15.73 MiB (1%) | 32 |
| ["imfilter", "densesmall_N0f8_2048"] | 11.975 μs (5%) | | 32.92 KiB (1%) | 16 |
| ["imfilter", "densesmall_N0f8_2048×2048"] | 53.717 ms (5%) | | 64.10 MiB (1%) | 27 |
| ["imfilter", "densesmall_RGBF32_100×100"] | 172.201 μs (5%) | | 481.00 KiB (1%) | 25 |
| ["imfilter", "densesmall_RGBF32_100×100×100"] | 78.060 ms (5%) | | 47.18 MiB (1%) | 32 |
| ["imfilter", "densesmall_RGBF32_2048"] | 17.700 μs (5%) | | 64.83 KiB (1%) | 18 |
| ["imfilter", "densesmall_RGBF32_2048×2048"] | 134.021 ms (5%) | | 192.22 MiB (1%) | 27 |
| ["imfilter", "densesmall_RGBN0f8_100×100"] | 174.300 μs (5%) | | 481.00 KiB (1%) | 25 |
| ["imfilter", "densesmall_RGBN0f8_100×100×100"] | 78.040 ms (5%) | | 47.18 MiB (1%) | 32 |
| ["imfilter", "densesmall_RGBN0f8_2048"] | 20.000 μs (5%) | | 64.81 KiB (1%) | 18 |
| ["imfilter", "densesmall_RGBN0f8_2048×2048"] | 130.839 ms (5%) | | 192.22 MiB (1%) | 27 |
| ["imfilter", "factored_F32_100×100"] | 40.600 μs (5%) | | 160.53 KiB (1%) | 25 |
| ["imfilter", "factored_F32_100×100×100"] | 4.135 ms (5%) | | 15.42 MiB (1%) | 32 |
| ["imfilter", "factored_F32_2048"] | 9.520 μs (5%) | | 48.88 KiB (1%) | 17 |
| ["imfilter", "factored_F32_2048×2048"] | 27.470 ms (5%) | | 64.06 MiB (1%) | 26 |
| ["imfilter", "factored_GrayF32_100×100"] | 41.200 μs (5%) | | 160.53 KiB (1%) | 25 |
| ["imfilter", "factored_GrayF32_100×100×100"] | 4.283 ms (5%) | | 15.42 MiB (1%) | 32 |
| ["imfilter", "factored_GrayF32_2048"] | 11.850 μs (5%) | | 48.88 KiB (1%) | 17 |
| ["imfilter", "factored_GrayF32_2048×2048"] | 35.673 ms (5%) | | 64.06 MiB (1%) | 26 |
| ["imfilter", "factored_GrayN0f8_100×100"] | 42.500 μs (5%) | | 160.53 KiB (1%) | 25 |
| ["imfilter", "factored_GrayN0f8_100×100×100"] | 4.255 ms (5%) | | 15.42 MiB (1%) | 32 |
| ["imfilter", "factored_GrayN0f8_2048"] | 11.650 μs (5%) | | 48.88 KiB (1%) | 17 |
| ["imfilter", "factored_GrayN0f8_2048×2048"] | 22.381 ms (5%) | | 64.06 MiB (1%) | 26 |
| ["imfilter", "factored_N0f8_100×100"] | 40.101 μs (5%) | | 160.53 KiB (1%) | 25 |
| ["imfilter", "factored_N0f8_100×100×100"] | 4.078 ms (5%) | | 15.42 MiB (1%) | 32 |
| ["imfilter", "factored_N0f8_2048"] | 11.020 μs (5%) | | 48.88 KiB (1%) | 17 |
| ["imfilter", "factored_N0f8_2048×2048"] | 28.537 ms (5%) | | 64.06 MiB (1%) | 26 |
| ["imfilter", "factored_RGBF32_100×100"] | 77.200 μs (5%) | | 476.16 KiB (1%) | 25 |
| ["imfilter", "factored_RGBF32_100×100×100"] | 9.726 ms (5%) | | 46.24 MiB (1%) | 32 |
| ["imfilter", "factored_RGBF32_2048"] | 20.800 μs (5%) | | 112.83 KiB (1%) | 18 |
| ["imfilter", "factored_RGBF32_2048×2048"] | 85.658 ms (5%) | | 192.13 MiB (1%) | 26 |
| ["imfilter", "factored_RGBN0f8_100×100"] | 75.301 μs (5%) | | 476.16 KiB (1%) | 25 |
| ["imfilter", "factored_RGBN0f8_100×100×100"] | 8.358 ms (5%) | | 46.24 MiB (1%) | 32 |
| ["imfilter", "factored_RGBN0f8_2048"] | 20.200 μs (5%) | | 112.81 KiB (1%) | 18 |
| ["imfilter", "factored_RGBN0f8_2048×2048"] | 83.331 ms (5%) | | 192.13 MiB (1%) | 26 |
| ["mapwindow", "cheap f, tiny window"] | 11.825 μs (5%) | | 8.73 KiB (1%) | 20 |
| ["mapwindow", "expensive f"] | 1.588 ms (5%) | | 1.30 MiB (1%) | 18998 |
| ["mapwindow", "extrema"] | 70.900 μs (5%) | | 45.73 KiB (1%) | 60 |
| ["mapwindow", "maximum"] | 281.100 μs (5%) | | 233.22 KiB (1%) | 4059 |
| ["mapwindow", "mean, large window"] | 2.027 ms (5%) | | 1.39 MiB (1%) | 24634 |
| ["mapwindow", "mean, small window"] | 15.600 μs (5%) | | 9.84 KiB (1%) | 43 |
| ["mapwindow", "median!"] | 715.101 μs (5%) | | 233.38 KiB (1%) | 4063 |
| ["mapwindow", "minimum"] | 278.001 μs (5%) | | 233.25 KiB (1%) | 4060 |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["imfilter"]
  • ["mapwindow"]

Julia versioninfo

Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.0.0-1032-azure #34-Ubuntu SMP Mon Feb 10 19:37:25 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz: 
              speed         user         nice          sys         idle          irq
       #1  2294 MHz      57653 s          0 s       4005 s      17393 s          0 s
       #2  2294 MHz      16047 s          0 s       2308 s      60474 s          0 s
       
  Memory: 6.782737731933594 GB (2755.68359375 MB free)
  Uptime: 808.0 sec
  Load Avg:  1.0810546875  1.03369140625  0.71240234375
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, broadwell)

Baseline result

Benchmark Report for /home/runner/work/ImageFiltering.jl/ImageFiltering.jl

Job Properties

  • Time of benchmark: 4 Mar 2020 - 01:47
  • Package commit: 75f89d
  • Julia commit: 2d5741
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

| ID | time | GC time | memory | allocations |
|---|---|---|---|---|
| ["mapwindow", "cheap f, tiny window"] | 12.100 μs (5%) | | 8.73 KiB (1%) | 20 |
| ["mapwindow", "expensive f"] | 1.589 ms (5%) | | 1.30 MiB (1%) | 18998 |
| ["mapwindow", "extrema"] | 72.500 μs (5%) | | 45.73 KiB (1%) | 60 |
| ["mapwindow", "mean, large window"] | 2.023 ms (5%) | | 1.39 MiB (1%) | 24634 |
| ["mapwindow", "mean, small window"] | 15.500 μs (5%) | | 9.84 KiB (1%) | 43 |
| ["mapwindow", "median!"] | 717.601 μs (5%) | | 233.41 KiB (1%) | 4064 |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["mapwindow"]

Julia versioninfo

Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.0.0-1032-azure #34-Ubuntu SMP Mon Feb 10 19:37:25 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz: 
              speed         user         nice          sys         idle          irq
       #1  2294 MHz      65415 s          0 s       4075 s      18872 s          0 s
       #2  2294 MHz      17618 s          0 s       2352 s      68175 s          0 s
       
  Memory: 6.782737731933594 GB (3566.00390625 MB free)
  Uptime: 902.0 sec
  Load Avg:  1.0322265625  1.0322265625  0.748046875
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, broadwell)

Runtime information

Runtime Info
BLAS #threads 2
BLAS.vendor() openblas64
Sys.CPU_THREADS 2

lscpu output:

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  1
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               79
Model name:          Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Stepping:            1
CPU MHz:             2294.685
BogoMIPS:            4589.37
Hypervisor vendor:   Microsoft
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            51200K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt md_clear
| Cpu Property | Value |
|---|---|
| Brand | Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz |
| Vendor | :Intel |
| Architecture | :Broadwell |
| Model | Family: 0x06, Model: 0x4f, Stepping: 0x01, Type: 0x00 |
| Cores | 2 physical cores, 2 logical cores (on executing CPU) |
| | No Hyperthreading detected |
| Clock Frequencies | Not supported by CPU |
| Data Cache | Level 1:3 : (32, 256, 51200) kbytes |
| | 64 byte cache line size |
| Address Size | 48 bits virtual, 44 bits physical |
| SIMD | 256 bit = 32 byte max. SIMD vector size |
| Time Stamp Counter | TSC is accessible via rdtsc |
| | TSC increased at every clock cycle (non-invariant TSC) |
| Perf. Monitoring | Performance Monitoring Counters (PMC) are not supported |
| Hypervisor | Yes, Microsoft |

@stillyslalom
Contributor

I ran a mini-shootout against OpenCV as a basis for comparison. It looks like we're within striking distance for kernel operations, but mapwindow(median!, img, (3,3)) is much slower than OpenCV's specialized medianBlur.

using Conda, PyCall, BenchmarkTools, Images, Statistics

cv2 = pyimport_conda("cv2", "opencv")   # imports cv2, installing the opencv Conda package if needed
np = pyimport("numpy")
timeit = pyimport("timeit")

# Time a Python expression: calibrate with N_init runs, then average over
# roughly one second's worth of runs. Returns seconds per call.
function pybench(expr, setup, N_init=10)
    # max(1, ...) guards against N == 0 when a single run takes very long
    N = max(1, round(Int, N_init/timeit.timeit(expr, setup=setup, number=N_init)))
    t = timeit.timeit(expr, setup=setup, number=N)
    t/N
end

# Python-side setup shared by all three OpenCV timings.
# NB: np.uint8 truncates rand's [0, 1) floats, so this src is all zeros;
# scaling by 255 before the cast would give a non-constant test image.
pysetup = "import cv2; import numpy as np; src = np.uint8(np.random.rand(512,512))"

src = rand(Gray{N0f8}, 512, 512)
# median filter
cv2median = pybench("cv2.medianBlur(src, 3)", pysetup)
jlmedian = @belapsed mapwindow(median!, $src, (3,3))

# gaussian blur
cv2blur = pybench("cv2.GaussianBlur(src, (13,13), 3)", pysetup)
jlblur = @belapsed imfilter($src, KernelFactors.gaussian((3,3)))

# Sobel derivative
cv2deriv = pybench("cv2.spatialGradient(src)", pysetup)
jlderiv = @belapsed imgradients($src, KernelFactors.sobel)

# Julia time / OpenCV time
jlmedian/cv2median  # 112x slower
jlblur/cv2blur      # 6x slower
jlderiv/cv2deriv    # 14x slower

@timholy
Member Author

timholy commented Mar 4, 2020

median is expected, and would be even worse for larger filters. (I'm actually a bit surprised it's this bad for 3x3.) The lack of an efficient large-scale median filter is one of the most glaring missing elements in the ecosystem. It's fairly easy to write a fast median filter for large kernels if you have N0f8 data in any dimensionality. It gets harder for any other kind of data, because binning is no longer an obvious win. The strategy of maintaining a histogram of slices and only updating a single slice does repeated work proportional to (d-1)/d where d is dimensionality; repeating half your work in 2d seems OK but starts to get unsatisfying even in 3d. See #34 for an abandoned attempt. I think @tejus-gupta has some ideas in this domain.

The rest are also disappointing, but I'd already gotten a hint that this was an issue. Fortunately, I suspect there's quite a lot we can do to address all of these.
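
As a rough illustration of the histogram idea mentioned above for 8-bit data, here is a minimal sketch of a 2d sliding-histogram median filter. It is not code from ImageFiltering or from #34. The names `histmedian` and `histmedian_filter`, the half-width parameter `r`, the use of raw `UInt8` values in place of N0f8, and the clamped (replicate) border handling are all assumptions made for the example.

```julia
# Sliding-histogram median filter for 8-bit data (illustrative sketch only).
# `img` holds raw UInt8 values; `r` is the window half-width, so the window
# is (2r+1)×(2r+1). Borders are handled by clamping indices (an assumption).

# Return the median of the 256-bin histogram `h` holding `n` samples (n odd).
function histmedian(h::Vector{Int}, n::Int)
    target = (n + 1) ÷ 2              # rank of the median
    acc = 0
    for v in 0:255
        acc += h[v + 1]
        acc >= target && return UInt8(v)
    end
    error("histogram holds fewer than $n samples")
end

function histmedian_filter(img::AbstractMatrix{UInt8}, r::Int)
    nrows, ncols = size(img)
    out = similar(img)
    n = (2r + 1)^2
    h = zeros(Int, 256)
    for i in 1:nrows
        # Build the histogram for the first window of this row from scratch.
        fill!(h, 0)
        for dj in -r:r, di in -r:r
            v = img[clamp(i + di, 1, nrows), clamp(1 + dj, 1, ncols)]
            h[v + 1] += 1
        end
        out[i, 1] = histmedian(h, n)
        # Slide right: remove the slice leaving the window, add the one entering.
        for j in 2:ncols
            jold = clamp(j - r - 1, 1, ncols)
            jnew = clamp(j + r, 1, ncols)
            for di in -r:r
                ii = clamp(i + di, 1, nrows)
                h[img[ii, jold] + 1] -= 1
                h[img[ii, jnew] + 1] += 1
            end
            out[i, j] = histmedian(h, n)
        end
    end
    return out
end
```

A call such as `histmedian_filter(rand(UInt8, 512, 512), 1)` would correspond to a 3x3 window. Each step touches only 2(2r+1) pixels plus a scan of at most 256 bins, which is why this approach pays off mainly for large windows and does not carry over directly to data that cannot be binned cheaply.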

github-actions bot commented Mar 4, 2020

Benchmark result

Judge result

Benchmark Report for /home/runner/work/ImageFiltering.jl/ImageFiltering.jl

Job Properties

  • Time of benchmarks:
    • Target: 4 Mar 2020 - 08:23
    • Baseline: 4 Mar 2020 - 08:25
  • Package commits:
    • Target: 8057b5
    • Baseline: 75f89d
  • Julia commits:
    • Target: 2d5741
    • Baseline: 2d5741
  • Julia command flags:
    • Target: None
    • Baseline: None
  • Environment variables:
    • Target: None
    • Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with ❌), while a ratio less
than 1.0 denotes a possible improvement (marked with ✅). Only significant results - results
that indicate possible regressions or improvements - are shown below (thus, an empty table means that all
benchmark results remained invariant between builds).

ID | time ratio | memory ratio
["mapwindow", "expensive f"] | 1.07 (5%) ❌ | 1.00 (1%)
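
To make the ratio rule concrete, the following sketch recomputes the "expensive f" entry from the times listed in the target and baseline result tables of this report (1.480 ms and 1.385 ms). The function name `classify` and its return symbols are hypothetical and are not BenchmarkTools' API; the 5% tolerance is taken from the "(5%)" shown next to the time values.

```julia
# Classify a target/baseline time ratio against a noise tolerance.
# `classify` is an illustrative helper, not part of BenchmarkTools.
function classify(target, baseline; tolerance = 0.05)
    ratio = target / baseline
    if ratio > 1 + tolerance
        return ratio, :regression      # slower than the baseline beyond the noise band
    elseif ratio < 1 - tolerance
        return ratio, :improvement     # faster than the baseline beyond the noise band
    else
        return ratio, :invariant       # within tolerance, so not reported
    end
end

# "expensive f": 1.480 ms on the target commit vs 1.385 ms on the baseline.
ratio, verdict = classify(1.480, 1.385)
println(round(ratio, digits = 2), " ", verdict)   # prints "1.07 regression"
```

By the same rule, "extrema" (69.400 μs vs 69.300 μs) falls inside the tolerance band, which is consistent with it not appearing in the table above.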

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["mapwindow"]

Julia versioninfo

Target

Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.0.0-1032-azure #34-Ubuntu SMP Mon Feb 10 19:37:25 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz: 
              speed         user         nice          sys         idle          irq
       #1  2294 MHz      24163 s          0 s       2306 s      63622 s          0 s
       #2  2294 MHz      60010 s          0 s       3582 s      26095 s          0 s
       
  Memory: 6.782737731933594 GB (2248.88671875 MB free)
  Uptime: 916.0 sec
  Load Avg:  1.08203125  1.01611328125  0.7275390625
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, broadwell)

Baseline

Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
      Ubuntu 18.04.4 LTS
  uname: Linux 5.0.0-1032-azure #34-Ubuntu SMP Mon Feb 10 19:37:25 UTC 2020 x86_64 x86_64
  CPU: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz: 
              speed         user         nice          sys         idle          irq
       #1  2294 MHz      24332 s          0 s       2354 s      72433 s          0 s
       #2  2294 MHz      68900 s          0 s       3635 s      26194 s          0 s
       
  Memory: 6.782737731933594 GB (3424.765625 MB free)
  Uptime: 1007.0 sec
  Load Avg:  1.09814453125  1.0380859375  0.7646484375
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, broadwell)

Target result

Benchmark Report for /home/runner/work/ImageFiltering.jl/ImageFiltering.jl

Job Properties

  • Time of benchmark: 4 Mar 2020 - 8:23
  • Package commit: 8057b5
  • Julia commit: 2d5741
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

ID | time | GC time | memory | allocations
["imfilter", "FFT_F32_100×100"] | 633.897 μs (5%) | | 1.35 MiB (1%) | 246
["imfilter", "FFT_F32_100×100×100"] | 103.285 ms (5%) | 2.699 ms | 169.15 MiB (1%) | 281
["imfilter", "FFT_F32_2048"] | 357.099 μs (5%) | | 183.14 KiB (1%) | 219
["imfilter", "FFT_F32_2048×2048"] | 468.454 ms (5%) | 3.776 ms | 397.06 MiB (1%) | 265
["imfilter", "FFT_GrayF32_100×100"] | 686.398 μs (5%) | | 1.35 MiB (1%) | 236
["imfilter", "FFT_GrayF32_100×100×100"] | 101.596 ms (5%) | 3.148 ms | 169.15 MiB (1%) | 268
["imfilter", "FFT_GrayF32_2048"] | 373.798 μs (5%) | | 182.83 KiB (1%) | 212
["imfilter", "FFT_GrayF32_2048×2048"] | 487.770 ms (5%) | 7.734 ms | 397.06 MiB (1%) | 255
["imfilter", "FFT_GrayN0f8_100×100"] | 680.694 μs (5%) | | 1.35 MiB (1%) | 236
["imfilter", "FFT_GrayN0f8_100×100×100"] | 108.558 ms (5%) | 2.726 ms | 169.15 MiB (1%) | 268
["imfilter", "FFT_GrayN0f8_2048"] | 378.796 μs (5%) | | 182.83 KiB (1%) | 212
["imfilter", "FFT_GrayN0f8_2048×2048"] | 463.125 ms (5%) | 7.901 ms | 397.06 MiB (1%) | 255
["imfilter", "FFT_N0f8_100×100"] | 632.697 μs (5%) | | 1.35 MiB (1%) | 246
["imfilter", "FFT_N0f8_100×100×100"] | 101.663 ms (5%) | 2.813 ms | 169.15 MiB (1%) | 281
["imfilter", "FFT_N0f8_2048"] | 346.598 μs (5%) | | 183.14 KiB (1%) | 219
["imfilter", "FFT_N0f8_2048×2048"] | 448.918 ms (5%) | 7.796 ms | 397.06 MiB (1%) | 265
["imfilter", "FFT_RGBF32_100×100"] | 1.945 ms (5%) | | 3.27 MiB (1%) | 261
["imfilter", "FFT_RGBF32_100×100×100"] | 439.194 ms (5%) | 10.368 ms | 410.41 MiB (1%) | 281
["imfilter", "FFT_RGBF32_2048"] | 614.898 μs (5%) | | 387.14 KiB (1%) | 231
["imfilter", "FFT_RGBF32_2048×2048"] | 1.542 s (5%) | 43.718 ms | 972.07 MiB (1%) | 281
["imfilter", "FFT_RGBN0f8_100×100"] | 2.058 ms (5%) | | 3.27 MiB (1%) | 261
["imfilter", "FFT_RGBN0f8_100×100×100"] | 394.893 ms (5%) | 7.827 ms | 410.41 MiB (1%) | 281
["imfilter", "FFT_RGBN0f8_2048"] | 575.197 μs (5%) | | 387.13 KiB (1%) | 231
["imfilter", "FFT_RGBN0f8_2048×2048"] | 1.515 s (5%) | 10.878 ms | 972.07 MiB (1%) | 281
["imfilter", "IIRGaussian_F32_100×100"] | 181.198 μs (5%) | | 131.02 KiB (1%) | 128
["imfilter", "IIRGaussian_F32_100×100×100"] | 32.746 ms (5%) | | 16.34 MiB (1%) | 10139
["imfilter", "IIRGaussian_F32_2048"] | 24.600 μs (5%) | | 32.91 KiB (1%) | 15
["imfilter", "IIRGaussian_F32_2048×2048"] | 76.419 ms (5%) | | 48.25 MiB (1%) | 2076
["imfilter", "IIRGaussian_GrayF32_100×100"] | 180.600 μs (5%) | | 131.02 KiB (1%) | 128
["imfilter", "IIRGaussian_GrayF32_100×100×100"] | 32.564 ms (5%) | | 16.34 MiB (1%) | 10139
["imfilter", "IIRGaussian_GrayF32_2048"] | 25.800 μs (5%) | | 32.91 KiB (1%) | 15
["imfilter", "IIRGaussian_GrayF32_2048×2048"] | 75.996 ms (5%) | | 48.25 MiB (1%) | 2076
["imfilter", "IIRGaussian_GrayN0f8_100×100"] | 185.598 μs (5%) | | 131.02 KiB (1%) | 128
["imfilter", "IIRGaussian_GrayN0f8_100×100×100"] | 33.003 ms (5%) | | 16.34 MiB (1%) | 10139
["imfilter", "IIRGaussian_GrayN0f8_2048"] | 27.300 μs (5%) | | 32.91 KiB (1%) | 15
["imfilter", "IIRGaussian_GrayN0f8_2048×2048"] | 77.618 ms (5%) | | 48.25 MiB (1%) | 2076
["imfilter", "IIRGaussian_N0f8_100×100"] | 192.599 μs (5%) | | 131.02 KiB (1%) | 128
["imfilter", "IIRGaussian_N0f8_100×100×100"] | 33.009 ms (5%) | | 16.34 MiB (1%) | 10139
["imfilter", "IIRGaussian_N0f8_2048"] | 25.499 μs (5%) | | 32.91 KiB (1%) | 15
["imfilter", "IIRGaussian_N0f8_2048×2048"] | 77.458 ms (5%) | | 48.25 MiB (1%) | 2076
["imfilter", "IIRGaussian_RGBF32_100×100"] | 284.698 μs (5%) | | 365.39 KiB (1%) | 128
["imfilter", "IIRGaussian_RGBF32_100×100×100"] | 56.305 ms (5%) | | 46.86 MiB (1%) | 10139
["imfilter", "IIRGaussian_RGBF32_2048"] | 28.499 μs (5%) | | 64.81 KiB (1%) | 17
["imfilter", "IIRGaussian_RGBF32_2048×2048"] | 148.624 ms (5%) | 3.145 ms | 144.25 MiB (1%) | 2076
["imfilter", "IIRGaussian_RGBN0f8_100×100"] | 295.797 μs (5%) | | 365.39 KiB (1%) | 128
["imfilter", "IIRGaussian_RGBN0f8_100×100×100"] | 56.697 ms (5%) | | 46.86 MiB (1%) | 10139
["imfilter", "IIRGaussian_RGBN0f8_2048"] | 32.699 μs (5%) | | 64.80 KiB (1%) | 17
["imfilter", "IIRGaussian_RGBN0f8_2048×2048"] | 153.041 ms (5%) | 2.778 ms | 144.25 MiB (1%) | 2076
["imfilter", "denselarge_F32_100×100"] | 1.546 ms (5%) | | 179.20 KiB (1%) | 24
["imfilter", "denselarge_F32_100×100×100"] | 2.681 s (5%) | | 18.35 MiB (1%) | 31
["imfilter", "denselarge_F32_2048"] | 19.600 μs (5%) | | 49.05 KiB (1%) | 16
["imfilter", "denselarge_F32_2048×2048"] | 784.887 ms (5%) | | 64.41 MiB (1%) | 26
["imfilter", "denselarge_GrayF32_100×100"] | 1.566 ms (5%) | | 179.20 KiB (1%) | 24
["imfilter", "denselarge_GrayF32_100×100×100"] | 2.554 s (5%) | | 18.35 MiB (1%) | 31
["imfilter", "denselarge_GrayF32_2048"] | 23.100 μs (5%) | | 49.05 KiB (1%) | 16
["imfilter", "denselarge_GrayF32_2048×2048"] | 784.681 ms (5%) | | 64.41 MiB (1%) | 26
["imfilter", "denselarge_GrayN0f8_100×100"] | 1.684 ms (5%) | | 179.20 KiB (1%) | 24
["imfilter", "denselarge_GrayN0f8_100×100×100"] | 2.555 s (5%) | | 18.35 MiB (1%) | 31
["imfilter", "denselarge_GrayN0f8_2048"] | 24.800 μs (5%) | | 49.05 KiB (1%) | 16
["imfilter", "denselarge_GrayN0f8_2048×2048"] | 783.389 ms (5%) | | 64.41 MiB (1%) | 26
["imfilter", "denselarge_N0f8_100×100"] | 1.553 ms (5%) | | 179.20 KiB (1%) | 24
["imfilter", "denselarge_N0f8_100×100×100"] | 2.597 s (5%) | | 18.35 MiB (1%) | 31
["imfilter", "denselarge_N0f8_2048"] | 19.400 μs (5%) | | 49.05 KiB (1%) | 16
["imfilter", "denselarge_N0f8_2048×2048"] | 773.686 ms (5%) | 2.833 ms | 64.41 MiB (1%) | 26
["imfilter", "denselarge_RGBF32_100×100"] | 1.876 ms (5%) | | 531.45 KiB (1%) | 24
["imfilter", "denselarge_RGBF32_100×100×100"] | 3.885 s (5%) | 2.343 ms | 55.05 MiB (1%) | 31
["imfilter", "denselarge_RGBF32_2048"] | 38.599 μs (5%) | | 113.19 KiB (1%) | 17
["imfilter", "denselarge_RGBF32_2048×2048"] | 990.786 ms (5%) | 2.162 ms | 193.16 MiB (1%) | 26
["imfilter", "denselarge_RGBN0f8_100×100"] | 1.844 ms (5%) | | 531.45 KiB (1%) | 24
["imfilter", "denselarge_RGBN0f8_100×100×100"] | 3.999 s (5%) | | 55.05 MiB (1%) | 31
["imfilter", "denselarge_RGBN0f8_2048"] | 36.899 μs (5%) | | 113.17 KiB (1%) | 17
["imfilter", "denselarge_RGBN0f8_2048×2048"] | 988.625 ms (5%) | 2.179 ms | 193.16 MiB (1%) | 26
["imfilter", "densesmall_F32_100×100"] | 104.500 μs (5%) | | 162.17 KiB (1%) | 24
["imfilter", "densesmall_F32_100×100×100"] | 43.515 ms (5%) | | 15.73 MiB (1%) | 31
["imfilter", "densesmall_F32_2048"] | 11.400 μs (5%) | | 32.91 KiB (1%) | 15
["imfilter", "densesmall_F32_2048×2048"] | 49.254 ms (5%) | | 64.10 MiB (1%) | 26
["imfilter", "densesmall_GrayF32_100×100"] | 105.099 μs (5%) | | 162.17 KiB (1%) | 24
["imfilter", "densesmall_GrayF32_100×100×100"] | 46.661 ms (5%) | | 15.73 MiB (1%) | 31
["imfilter", "densesmall_GrayF32_2048"] | 12.033 μs (5%) | | 32.91 KiB (1%) | 15
["imfilter", "densesmall_GrayF32_2048×2048"] | 60.543 ms (5%) | | 64.10 MiB (1%) | 26
["imfilter", "densesmall_GrayN0f8_100×100"] | 104.699 μs (5%) | | 162.17 KiB (1%) | 24
["imfilter", "densesmall_GrayN0f8_100×100×100"] | 43.515 ms (5%) | | 15.73 MiB (1%) | 31
["imfilter", "densesmall_GrayN0f8_2048"] | 12.133 μs (5%) | | 32.91 KiB (1%) | 15
["imfilter", "densesmall_GrayN0f8_2048×2048"] | 48.478 ms (5%) | | 64.10 MiB (1%) | 26
["imfilter", "densesmall_N0f8_100×100"] | 103.599 μs (5%) | | 162.17 KiB (1%) | 24
["imfilter", "densesmall_N0f8_100×100×100"] | 43.662 ms (5%) | | 15.73 MiB (1%) | 31
["imfilter", "densesmall_N0f8_2048"] | 11.975 μs (5%) | | 32.91 KiB (1%) | 15
["imfilter", "densesmall_N0f8_2048×2048"] | 48.590 ms (5%) | | 64.10 MiB (1%) | 26
["imfilter", "densesmall_RGBF32_100×100"] | 164.499 μs (5%) | | 480.98 KiB (1%) | 24
["imfilter", "densesmall_RGBF32_100×100×100"] | 69.411 ms (5%) | | 47.18 MiB (1%) | 31
["imfilter", "densesmall_RGBF32_2048"] | 17.500 μs (5%) | | 64.81 KiB (1%) | 17
["imfilter", "densesmall_RGBF32_2048×2048"] | 129.309 ms (5%) | | 192.22 MiB (1%) | 26
["imfilter", "densesmall_RGBN0f8_100×100"] | 160.599 μs (5%) | | 480.98 KiB (1%) | 24
["imfilter", "densesmall_RGBN0f8_100×100×100"] | 69.449 ms (5%) | | 47.18 MiB (1%) | 31
["imfilter", "densesmall_RGBN0f8_2048"] | 19.500 μs (5%) | | 64.80 KiB (1%) | 17
["imfilter", "densesmall_RGBN0f8_2048×2048"] | 130.814 ms (5%) | 1.347 ms | 192.22 MiB (1%) | 26
["imfilter", "factoredlarge_F32_100×100"] | 703.697 μs (5%) | | 196.30 KiB (1%) | 28
["imfilter", "factoredlarge_F32_100×100×100"] | 172.491 ms (5%) | | 35.22 MiB (1%) | 41
["imfilter", "factoredlarge_F32_2048"] | 57.699 μs (5%) | | 33.56 KiB (1%) | 15
["imfilter", "factoredlarge_F32_2048×2048"] | 309.188 ms (5%) | | 49.30 MiB (1%) | 30
["imfilter", "factoredlarge_GrayF32_100×100"] | 722.697 μs (5%) | | 196.30 KiB (1%) | 28
["imfilter", "factoredlarge_GrayF32_100×100×100"] | 157.075 ms (5%) | | 35.22 MiB (1%) | 41
["imfilter", "factoredlarge_GrayF32_2048"] | 64.099 μs (5%) | | 33.56 KiB (1%) | 15
["imfilter", "factoredlarge_GrayF32_2048×2048"] | 333.221 ms (5%) | | 49.30 MiB (1%) | 30
["imfilter", "factoredlarge_GrayN0f8_100×100"] | 727.893 μs (5%) | | 196.30 KiB (1%) | 28
["imfilter", "factoredlarge_GrayN0f8_100×100×100"] | 168.857 ms (5%) | | 35.22 MiB (1%) | 41
["imfilter", "factoredlarge_GrayN0f8_2048"] | 59.700 μs (5%) | | 33.56 KiB (1%) | 15
["imfilter", "factoredlarge_GrayN0f8_2048×2048"] | 320.186 ms (5%) | | 49.30 MiB (1%) | 30
["imfilter", "factoredlarge_N0f8_100×100"] | 709.597 μs (5%) | | 196.30 KiB (1%) | 28
["imfilter", "factoredlarge_N0f8_100×100×100"] | 178.550 ms (5%) | | 35.22 MiB (1%) | 41
["imfilter", "factoredlarge_N0f8_2048"] | 60.199 μs (5%) | | 33.56 KiB (1%) | 15
["imfilter", "factoredlarge_N0f8_2048×2048"] | 321.148 ms (5%) | | 49.30 MiB (1%) | 30
["imfilter", "factoredlarge_RGBF32_100×100"] | 1.090 ms (5%) | | 580.67 KiB (1%) | 28
["imfilter", "factoredlarge_RGBF32_100×100×100"] | 267.907 ms (5%) | | 105.66 MiB (1%) | 41
["imfilter", "factoredlarge_RGBF32_2048"] | 91.300 μs (5%) | | 65.78 KiB (1%) | 17
["imfilter", "factoredlarge_RGBF32_2048×2048"] | 594.814 ms (5%) | 3.271 ms | 147.82 MiB (1%) | 30
["imfilter", "factoredlarge_RGBN0f8_100×100"] | 1.140 ms (5%) | | 580.67 KiB (1%) | 28
["imfilter", "factoredlarge_RGBN0f8_100×100×100"] | 270.826 ms (5%) | 2.725 ms | 105.66 MiB (1%) | 41
["imfilter", "factoredlarge_RGBN0f8_2048"] | 91.799 μs (5%) | | 65.77 KiB (1%) | 17
["imfilter", "factoredlarge_RGBN0f8_2048×2048"] | 600.145 ms (5%) | | 147.82 MiB (1%) | 30
["imfilter", "factoredsmall_F32_100×100"] | 69.700 μs (5%) | | 243.61 KiB (1%) | 28
["imfilter", "factoredsmall_F32_100×100×100"] | 11.340 ms (5%) | | 31.92 MiB (1%) | 41
["imfilter", "factoredsmall_F32_2048"] | 10.180 μs (5%) | | 48.86 KiB (1%) | 16
["imfilter", "factoredsmall_F32_2048×2048"] | 31.992 ms (5%) | | 96.16 MiB (1%) | 30
["imfilter", "factoredsmall_GrayF32_100×100"] | 73.899 μs (5%) | | 243.61 KiB (1%) | 28
["imfilter", "factoredsmall_GrayF32_100×100×100"] | 14.779 ms (5%) | | 31.92 MiB (1%) | 41
["imfilter", "factoredsmall_GrayF32_2048"] | 10.475 μs (5%) | | 48.86 KiB (1%) | 16
["imfilter", "factoredsmall_GrayF32_2048×2048"] | 57.743 ms (5%) | | 96.16 MiB (1%) | 30
["imfilter", "factoredsmall_GrayN0f8_100×100"] | 72.999 μs (5%) | | 243.61 KiB (1%) | 28
["imfilter", "factoredsmall_GrayN0f8_100×100×100"] | 14.872 ms (5%) | | 31.92 MiB (1%) | 41
["imfilter", "factoredsmall_GrayN0f8_2048"] | 11.100 μs (5%) | | 48.86 KiB (1%) | 16
["imfilter", "factoredsmall_GrayN0f8_2048×2048"] | 34.005 ms (5%) | | 96.16 MiB (1%) | 30
["imfilter", "factoredsmall_N0f8_100×100"] | 72.399 μs (5%) | | 243.61 KiB (1%) | 28
["imfilter", "factoredsmall_N0f8_100×100×100"] | 11.188 ms (5%) | | 31.92 MiB (1%) | 41
["imfilter", "factoredsmall_N0f8_2048"] | 10.050 μs (5%) | | 48.86 KiB (1%) | 16
["imfilter", "factoredsmall_N0f8_2048×2048"] | 45.995 ms (5%) | 2.653 ms | 96.16 MiB (1%) | 30
["imfilter", "factoredsmall_RGBF32_100×100"] | 125.299 μs (5%) | | 724.98 KiB (1%) | 28
["imfilter", "factoredsmall_RGBF32_100×100×100"] | 26.650 ms (5%) | | 95.76 MiB (1%) | 41
["imfilter", "factoredsmall_RGBF32_2048"] | 20.200 μs (5%) | | 112.81 KiB (1%) | 17
["imfilter", "factoredsmall_RGBF32_2048×2048"] | 136.153 ms (5%) | 4.106 ms | 288.41 MiB (1%) | 30
["imfilter", "factoredsmall_RGBN0f8_100×100"] | 124.499 μs (5%) | | 724.98 KiB (1%) | 28
["imfilter", "factoredsmall_RGBN0f8_100×100×100"] | 25.197 ms (5%) | | 95.76 MiB (1%) | 41
["imfilter", "factoredsmall_RGBN0f8_2048"] | 19.100 μs (5%) | | 112.80 KiB (1%) | 17
["imfilter", "factoredsmall_RGBN0f8_2048×2048"] | 137.500 ms (5%) | | 288.41 MiB (1%) | 30
["mapwindow", "cheap f, tiny window"] | 10.650 μs (5%) | | 8.73 KiB (1%) | 20
["mapwindow", "expensive f"] | 1.480 ms (5%) | | 1.30 MiB (1%) | 18998
["mapwindow", "extrema"] | 69.400 μs (5%) | | 45.73 KiB (1%) | 60
["mapwindow", "maximum"] | 263.099 μs (5%) | | 233.22 KiB (1%) | 4059
["mapwindow", "mean, large window"] | 2.036 ms (5%) | | 1.39 MiB (1%) | 24634
["mapwindow", "mean, small window"] | 13.700 μs (5%) | | 9.84 KiB (1%) | 43
["mapwindow", "median!"] | 649.697 μs (5%) | | 233.38 KiB (1%) | 4063
["mapwindow", "minimum"] | 274.699 μs (5%) | | 233.25 KiB (1%) | 4060

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["imfilter"]
  • ["mapwindow"]

Baseline result

Benchmark Report for /home/runner/work/ImageFiltering.jl/ImageFiltering.jl

Job Properties

  • Time of benchmark: 4 Mar 2020 - 8:25
  • Package commit: 75f89d
  • Julia commit: 2d5741
  • Julia command flags: None
  • Environment variables: None

Results

Below is a table of this job's results, obtained by running the benchmarks.
The values listed in the ID column have the structure [parent_group, child_group, ..., key], and can be used to
index into the BaseBenchmarks suite to retrieve the corresponding benchmarks.
The percentages accompanying time and memory values in the below table are noise tolerances. The "true"
time/memory value for a given benchmark is expected to fall within this percentage of the reported value.
An empty cell means that the value was zero.

ID | time | GC time | memory | allocations
["mapwindow", "cheap f, tiny window"] | 10.200 μs (5%) | | 8.73 KiB (1%) | 20
["mapwindow", "expensive f"] | 1.385 ms (5%) | | 1.30 MiB (1%) | 18998
["mapwindow", "extrema"] | 69.300 μs (5%) | | 45.73 KiB (1%) | 60
["mapwindow", "mean, large window"] | 2.047 ms (5%) | | 1.39 MiB (1%) | 24634
["mapwindow", "mean, small window"] | 13.300 μs (5%) | | 9.84 KiB (1%) | 43
["mapwindow", "median!"] | 648.394 μs (5%) | | 233.41 KiB (1%) | 4064

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["mapwindow"]

Runtime information

Runtime Info
BLAS #threads 2
BLAS.vendor() openblas64
Sys.CPU_THREADS 2

lscpu output:

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  1
Core(s) per socket:  2
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               79
Model name:          Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Stepping:            1
CPU MHz:             2294.689
BogoMIPS:            4589.37
Hypervisor vendor:   Microsoft
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            51200K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt md_clear

Cpu Property | Value
Brand | Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
Vendor | :Intel
Architecture | :Broadwell
Model | Family: 0x06, Model: 0x4f, Stepping: 0x01, Type: 0x00
Cores | 2 physical cores, 2 logical cores (on executing CPU)
 | No Hyperthreading detected
Clock Frequencies | Not supported by CPU
Data Cache | Level 1:3 : (32, 256, 51200) kbytes
 | 64 byte cache line size
Address Size | 48 bits virtual, 44 bits physical
SIMD | 256 bit = 32 byte max. SIMD vector size
Time Stamp Counter | TSC is accessible via rdtsc
 | TSC increased at every clock cycle (non-invariant TSC)
Perf. Monitoring | Performance Monitoring Counters (PMC) are not supported
Hypervisor | Yes, Microsoft

@timholy timholy merged commit ee6cf8e into master Mar 4, 2020
@timholy timholy deleted the teh/benchmarks branch March 4, 2020 08:37