CvtColor
Functions for converting image color.
| OS | Build Status |
|---|---|
| Ubuntu 20.04 | |
| macOS 11 | |
Installation
If available in Hex, the package can be installed
by adding cvt_color to your list of dependencies in mix.exs:
def deps do
  [
    {:cvt_color, "~> 0.1.3"}
  ]
end
Documentation can be generated with ExDoc and published on HexDocs. Once published, the docs can be found at https://hexdocs.pm/cvt_color.
Usage
bgr565_data = CvtColor.cvt(binary_data, :bgr888, :bgr565)
If OpenMP is enabled, you can also pass a chunk size as the fourth argument; see the Optional Config section below for details.
chunk_size = 65536
bgr565_data = CvtColor.cvt(binary_data, :bgr888, :bgr565, chunk_size)
# to let cvt_color balance the work across threads,
# set chunk_size to 0 or omit it
bgr565_data = CvtColor.cvt(binary_data, :bgr888, :bgr565, 0)
bgr565_data = CvtColor.cvt(binary_data, :bgr888, :bgr565)
Currently supported pairs:
| src color | dst color |
|---|---|
| :bgr888 | :bgr565 |
| :bgr888 | :rgb565 |
| :rgb888 | :bgr565 |
| :rgb888 | :rgb565 |
| :bgr888 | :bgr666 |
| :bgr888 | :rgb666 |
| :rgb888 | :bgr666 |
| :rgb888 | :rgb666 |
| :bgr888 | :bgr666_compact |
| :bgr888 | :rgb666_compact |
| :rgb888 | :bgr666_compact |
| :rgb888 | :rgb666_compact |
Each component in :bgr666 and :rgb666 occupies 8 bits, but only the six most significant bits (bits 7 to 2) are valid.
MSB 7                                         LSB
┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐
│  X  │  X  │  X  │  X  │  X  │  X  │  -  │  -  │
└─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘
X indicates a valid bit. - indicates an ignored bit.
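The bit layout above can be illustrated with a small pure-Elixir sketch. It does not call the library; the module and function names (`Rgb666Sketch`, `valid_bits/1`, `pack18/3`) are hypothetical and exist only for this example. The second function shows how three 6-bit values fit into 18 bits, which is the idea behind the `_compact` variants in the table above; placing the first component in the highest bits is an assumption, not something confirmed for the library's output.

```elixir
defmodule Rgb666Sketch do
  import Bitwise

  # Return the six valid bits (bits 7..2) of one 8-bit :rgb666 / :bgr666 component.
  # The two lowest bits are ignored, so they are shifted out.
  def valid_bits(component) when component in 0..255 do
    component >>> 2
  end

  # Pack three 6-bit values into a single 18-bit bitstring.
  # Assumption: the first value goes into the highest bits.
  def pack18(c0, c1, c2) when c0 in 0..63 and c1 in 0..63 and c2 in 0..63 do
    <<c0::6, c1::6, c2::6>>
  end
end

# 0xAB is 1010_1011 in binary; its six valid bits are 101010.
six_bits = Rgb666Sketch.valid_bits(0xAB)
packed = Rgb666Sketch.pack18(63, 0, six_bits)
IO.inspect({six_bits, bit_size(packed)})
```

Because 18 is not a multiple of 8, the result of `pack18/3` is a bitstring rather than a binary; how consecutive pixels are laid out in the library's output is not covered by this sketch.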
An example of :rgb666_compact is shown below. Each rectangle indicates 1 bit.
┌──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┐
│R5│R4│R3│R2│R1│R0│G5│G4│G3│G2│G1│G0│B5│B4│B3│B2│B1│B0│
└──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┘
Optional Config
OpenMP
This project can be configured to use OpenMP to accelerate conversion. By default, it will try to find and
use OpenMP. However, you can disable OpenMP completely by setting the environment variable CVT_COLOR_USE_OPENMP to OFF.
Chunk size (when OpenMP is enabled)
When assigning work to each thread, cvt_color tries to give every thread an equal number of consecutive pixels.
For example, say you have 4 (logical) cores and 4_000_000 pixels, and you did not set an explicit value for OMP_NUM_THREADS
(so OMP_NUM_THREADS equals the number of logical cores). Then core 0 processes pixels 0 to 999_999,
core 1 processes pixels 1_000_000 to 1_999_999, and so on. The chunk size is calculated as in the
following pseudocode:
if not specified chunk_size
  chunk_size = number_of_pixels / ${OMP_NUM_THREADS}
  if chunk_size is 0
    # the result of the integer division is 0
    # which means that number_of_pixels < ${OMP_NUM_THREADS}
    # then a single thread is good enough
    # unless ${OMP_NUM_THREADS} is really a large value
    # e.g., ${OMP_NUM_THREADS} > 1_000_000 or even larger
    chunk_size = number_of_pixels
  endif
endif
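The pseudocode above can be written as a small runnable Elixir sketch. This is an illustration only, not the library's native implementation; `ChunkSketch`, `auto_chunk_size/3`, and `chunk_ranges/2` are hypothetical names, and treating a chunk size of 0 as "not specified" follows the Usage section.

```elixir
defmodule ChunkSketch do
  # Mirror of the pseudocode: a chunk size of 0 (or omitted) means "not specified",
  # so the pixels are split evenly across the threads.
  def auto_chunk_size(number_of_pixels, num_threads, chunk_size \\ 0)

  def auto_chunk_size(number_of_pixels, num_threads, 0) when num_threads > 0 do
    case div(number_of_pixels, num_threads) do
      # Fewer pixels than threads: one chunk holding everything.
      0 -> number_of_pixels
      size -> size
    end
  end

  # An explicit positive chunk size is used as given.
  def auto_chunk_size(_number_of_pixels, _num_threads, chunk_size) when chunk_size > 0 do
    chunk_size
  end

  # First and last pixel index of every chunk, in order.
  def chunk_ranges(number_of_pixels, chunk_size) when chunk_size > 0 do
    0
    |> Stream.iterate(&(&1 + chunk_size))
    |> Enum.take_while(&(&1 < number_of_pixels))
    |> Enum.map(fn first -> {first, min(first + chunk_size, number_of_pixels) - 1} end)
  end
end

# The example from the text: 4 threads and 4_000_000 pixels.
size = ChunkSketch.auto_chunk_size(4_000_000, 4)
IO.inspect(ChunkSketch.chunk_ranges(4_000_000, size))
```

For the example in the text this yields four ranges of 1_000_000 pixels each, starting with `{0, 999_999}`.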
However, when you have relatively few pixels, say 255, the time spent starting ~${OMP_NUM_THREADS} new
threads and waiting for them to finish can take up a significant portion of the conversion. Therefore, we should set
a minimal chunk size, i.e., the fourth argument of CvtColor.cvt/4.
The appropriate minimal chunk size can vary a lot depending on the processor (cache size, speed, etc.) and the threading library (how long it takes to start a new thread).
You can test and find a good value on your device using the benchmark program. To compile the benchmark program, set
the environment variable CVT_COLOR_BUILD_BENCHMARK to ON.
Alternatively, you can pass different values to CvtColor.cvt/4 and pick the one that satisfies your requirements.
The default chunk size is 1048576 (pixels), based on some benchmarks (see below). Presumably, any power
of 2 in the range from 65536 to 1048576 should work well. If you are in doubt, you can always run the benchmark
on your device to find a good value.
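As a rough alternative to the native benchmark program, candidate chunk sizes can also be timed from Elixir. The sketch below is a generic timing helper built on Erlang's `:timer.tc/1`; `ChunkBench`, `average_ms/3`, and `compare/3` are hypothetical names and not part of cvt_color. The function under test is passed in, so the helper itself does not depend on the library.

```elixir
defmodule ChunkBench do
  # Average wall-clock time in milliseconds of calling fun.(chunk_size) `repeats` times.
  def average_ms(fun, chunk_size, repeats \\ 100) do
    {micros, _result} =
      :timer.tc(fn -> Enum.each(1..repeats, fn _ -> fun.(chunk_size) end) end)

    micros / repeats / 1000
  end

  # Time every candidate and return {chunk_size, average_ms} pairs, fastest first.
  def compare(fun, chunk_sizes, repeats \\ 100) do
    chunk_sizes
    |> Enum.map(fn size -> {size, average_ms(fun, size, repeats)} end)
    |> Enum.sort_by(fn {_size, ms} -> ms end)
  end
end

# Stand-in workload so the sketch runs without the library installed.
workload = fn chunk_size -> Enum.sum(1..chunk_size) end
IO.inspect(ChunkBench.compare(workload, [65_536, 262_144, 1_048_576], 5))
```

With the library compiled, the stand-in workload could be replaced by something like `fn chunk_size -> CvtColor.cvt(binary_data, :rgb888, :rgb565, chunk_size) end`, where `binary_data` is your own image data. Timings measured this way include the overhead of calling into native code, so they are not directly comparable with the numbers from the benchmark program.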
Benchmarks
Image size 7680x4320, pixel format RGB888. The time spent generating the source image is not counted, and the source image
is generated only once per benchmark (i.e., per entry in the table below). Each conversion is repeated
100 times, and the results are the average running time per conversion.
Note that RGB888 to RGB666 is basically a memcpy. RGB666C. in the table below means conversion to :rgb666_compact.
chunk_size is the number of pixels (not bytes) assigned to a thread.
The time unit is milliseconds.
| Platform | #threads | chunk_size | RGB666C. | RGB666 | RGB565 |
|---|---|---|---|---|---|
| GitHub CI Linux | 2 | 256 | 36.62 | 40.03 | 40.73 |
| GitHub CI Linux | 2 | 1 K | 35.79 | 40.46 | 39.46 |
| GitHub CI Linux | 2 | 16 K | 35.43 | 40.29 | 37.97 |
| GitHub CI Linux | 2 | 64 K | 35.61 | 40.82 | 38.25 |
| GitHub CI Linux | 2 | 256 K | 33.13 | 40.35 | 37.80 |
| GitHub CI Linux | 2 | 1 M | 32.55 | 40.23 | 35.20 |
| GitHub CI Linux | 2 | 2 M | 31.81 | 41.62 | 35.29 |
| GitHub CI Linux | 2 | 4 M | 31.15 | 40.38 | 35.37 |
| GitHub CI Linux | 2 | auto | 62.45 | 41.09 | 35.12 |
| GitHub CI Linux | 1 | NA | 61.52 | 39.87 | 67.66 |
| GitHub CI macOS | 3 | 256 | 36.58 | 21.32 | 27.68 |
| GitHub CI macOS | 3 | 1 K | 34.17 | 21.55 | 23.23 |
| GitHub CI macOS | 3 | 16 K | 33.63 | 21.58 | 22.11 |
| GitHub CI macOS | 3 | 64 K | 34.27 | 21.43 | 22.91 |
| GitHub CI macOS | 3 | 256 K | 35.41 | 21.73 | 21.46 |
| GitHub CI macOS | 3 | 1 M | 37.89 | 21.61 | 23.10 |
| GitHub CI macOS | 3 | 2 M | 47.56 | 21.38 | 23.03 |
| GitHub CI macOS | 3 | 4 M | 48.49 | 21.78 | 24.08 |
| GitHub CI macOS | 3 | auto | 90.44 | 21.56 | 29.51 |
| GitHub CI macOS | 1 | NA | 90.81 | 21.73 | 54.03 |
| Raspberry Pi 4 | 4 | 256 | 114.45 | 148.92 | 120.07 |
| Raspberry Pi 4 | 4 | 1 K | 94.61 | 149.58 | 98.46 |
| Raspberry Pi 4 | 4 | 16 K | 88.28 | 149.65 | 76.26 |
| Raspberry Pi 4 | 4 | 64 K | 87.04 | 148.34 | 74.32 |
| Raspberry Pi 4 | 4 | 256 K | 87.65 | 148.93 | 75.46 |
| Raspberry Pi 4 | 4 | 1 M | 87.11 | 148.80 | 74.65 |
| Raspberry Pi 4 | 4 | 2 M | 86.53 | 149.31 | 74.91 |
| Raspberry Pi 4 | 4 | 4 M | 120.25 | 149.34 | 74.13 |
| Raspberry Pi 4 | 4 | auto | 211.52 | 149.06 | 75.09 |
| Raspberry Pi 4 | 1 | NA | 210.87 | 148.98 | 140.95 |
| M1 Max | 10 | 256 | 11.49 | 7.81 | 12.44 |
| M1 Max | 10 | 1 K | 9.90 | 7.88 | 8.44 |
| M1 Max | 10 | 16 K | 9.84 | 7.79 | 8.07 |
| M1 Max | 10 | 64 K | 9.80 | 7.77 | 7.69 |
| M1 Max | 10 | 256 K | 9.32 | 7.84 | 7.62 |
| M1 Max | 10 | 1 M | 8.83 | 7.73 | 7.34 |
| M1 Max | 10 | 2 M | 14.10 | 7.75 | 7.51 |
| M1 Max | 10 | 4 M | 26.55 | 7.78 | 6.94 |
| M1 Max | 10 | auto | 26.38 | 7.84 | 6.99 |
| M1 Max | 1 | NA | 49.64 | 7.79 | 35.22 |
| AMD 3900X | 24 | 256 | 25.46 | 24.26 | 47.52 |
| AMD 3900X | 24 | 1 K | 10.03 | 23.52 | 12.13 |
| AMD 3900X | 24 | 16 K | 10.03 | 23.17 | 8.25 |
| AMD 3900X | 24 | 64 K | 9.00 | 23.17 | 8.25 |
| AMD 3900X | 24 | 256 K | 9.70 | 24.08 | 8.12 |
| AMD 3900X | 24 | 1 M | 11.08 | 23.34 | 8.88 |
| AMD 3900X | 24 | 2 M | 16.94 | 23.61 | 8.42 |
| AMD 3900X | 24 | 4 M | 28.25 | 23.60 | 10.90 |
| AMD 3900X | 24 | auto | 17.03 | 23.96 | 8.45 |
| AMD 3900X | 1 | NA | 44.03 | 23.15 | 48.57 |
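To put the chunk_size column in perspective, the following sketch works out how many chunks the 7680x4320 benchmark image is split into for a given chunk size. It is plain arithmetic on the numbers stated above; `BenchArithmetic` is a hypothetical module name, and reading "K" and "M" as 1024 and 1048576 is an assumption.

```elixir
defmodule BenchArithmetic do
  @width 7680
  @height 4320

  # Total number of pixels in the benchmark image.
  def pixels, do: @width * @height

  # Size of the RGB888 source image in bytes (3 bytes per pixel).
  def source_bytes, do: pixels() * 3

  # Number of chunks for a given chunk size, rounding up for a partial last chunk.
  def chunk_count(chunk_size) when chunk_size > 0 do
    div(pixels() + chunk_size - 1, chunk_size)
  end
end

for chunk_size <- [256, 1024, 1_048_576, 4 * 1_048_576] do
  IO.puts("#{chunk_size} pixels per chunk: #{BenchArithmetic.chunk_count(chunk_size)} chunks")
end
```

The image has 33_177_600 pixels, so a chunk size of 256 gives 129_600 chunks, while 4 M gives only 8 chunks, which is fewer than the 10 and 24 threads of the last two platforms in the table.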