GPU-based implementation of JPEG2K standard

JPEG2000 is an image compression standard developed by the Joint Photographic Experts Group in 2000. The standard aims to reduce or bypass the limitations of the original JPEG compression technique while providing high image quality together with a high compression ratio. JPEG2000 is based on the discrete wavelet transform (DWT), which takes the place of the discrete cosine transform (DCT) used in JPEG. JPEG2000 features can be employed in many domains, e.g. medical imaging, in which lossy compression is not acceptable, compression of huge volumes of hyperspectral images, and real-time compression of high-resolution (e.g. 4K, 4096x2160) movies. Although some hardware implementations offer real-time encoding, they are costly because specialized hardware is required. Software implementations running on current consumer-level architectures may provide a low-cost alternative to the hardware solutions. The algorithms for JPEG2000 compression and decompression have been implemented at PSNC from scratch. Our software takes advantage of modern NVIDIA GPUs and CUDA technology to deliver the highest performance in its class.

  1. Key features
  2. License
  3. Computational tests
  4. Download and installation
  5. Executing
  6. Configuration file
  7. Example configuration file

Key features

  • all modules (DWT, MQ Coder, Quantization, MCT) run on the GPU
  • real-time compression of hyperspectral images (NASA AVIRIS)
  • includes the KLT transform for 3D data
  • open source, released under the GNU Affero General Public License

License

Copyright 2009-2013 Poznan Supercomputing and Networking Center

GPU JPEG2K is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

GPU JPEG2K is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with GPU JPEG2K. If not, see <http://www.gnu.org/licenses/>.

If you would like to use our software for commercial purposes, please contact us:

Milosz Ciznicki miloszc[replace_with_at]man.poznan.pl

Computational tests

Extensive computational tests were performed to compare the solution with other state-of-the-art packages. The tests showed that our implementation (gpu_jpeg2k) is faster than OpenJPEG, Kakadu and cuj2k (note that cuj2k is also CUDA-GPU based).

http://fury.man.poznan.pl/~michal/jpeg2k/jpegPerformance.png

Some more detailed results are shown below. The two charts present the time needed by the different implementations to compress a picture, depending on its size. The same compression parameters were chosen for each implementation. Kakadu-multi refers to the multi-threaded implementation of the Kakadu software. Note that the time axis is logarithmic.

Tests were performed on the following hardware:

  • CPU: Core 2 Duo E8400, 3.00GHz
  • GPU: NVIDIA GeForce GTX 480, 1.5GB of RAM
  • RAM: 6GB

Our implementation (gpu_jpeg2k) is clearly the fastest: for lossless compression starting from images of 2 Mpix, and for lossy compression from 4 Mpix upwards. For a 39 Mpix image, our software achieves a speedup of 1.33 and 1.46 (lossless and lossy compression, respectively) compared to cuj2k, the only GPU-based reference implementation. Compared to Kakadu-multi, the fastest CPU-based software, the speedup in favour of gpu_jpeg2k is 2.85 for lossless and 3.0 for lossy compression.

Chart 1

Download and installation

Download the latest source code by issuing the following command:

svn co https://apps.man.poznan.pl/svn/jpeg2k/trunk

To compile the software, go to the main directory of the project and issue the following commands:

mkdir build

cd build

cmake ..

make

This will produce an executable file in the build directory.

Executing

To execute the encoder, provide an input file and an output file (and, optionally, a configuration file):

./encoder -i in.file -o out.j2k -c file.config

To execute the decoder, provide an input file and an output file:

./decoder -i in.j2k -o out.file

The currently supported format is *.j2k.
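
When the encoder and decoder are driven from a script, the two command lines above can be assembled programmatically. The following is a minimal Python sketch, not part of gpu_jpeg2k: the function names are hypothetical, and only the -i, -o and -c options shown above are used. It assumes the binaries sit in the current directory.

```python
import subprocess
from typing import List, Optional


def encoder_command(src: str, dst: str, config: Optional[str] = None,
                    binary: str = "./encoder") -> List[str]:
    """Build the encoder argument list: -i, -o and an optional -c."""
    cmd = [binary, "-i", src, "-o", dst]
    if config is not None:
        cmd += ["-c", config]
    return cmd


def decoder_command(src: str, dst: str,
                    binary: str = "./decoder") -> List[str]:
    """Build the decoder argument list: -i and -o."""
    return [binary, "-i", src, "-o", dst]


def run(cmd: List[str]) -> int:
    """Run a command and return its exit code (needs the built binaries)."""
    return subprocess.run(cmd, check=False).returncode
```

For example, run(encoder_command("in.file", "out.j2k", "file.config")) starts the same process as the encoder command line shown above.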

Configuration file

# "-1" means automatic value selection, where applicable.

Tiling:
tile_w - Tile width.
tile_h - Tile height.
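
To see how a tile size maps to a number of tiles, the sketch below computes the tile grid for a given image. It is an illustration in Python, not code from gpu_jpeg2k: the function name is hypothetical, and it assumes that -1 means a single tile spanning the whole axis, which is how the example configuration on this page describes "no tiling".

```python
def tile_grid(image_w, image_h, tile_w=-1, tile_h=-1):
    """Return (columns, rows) of the tile grid.

    Assumption: -1 means one tile covers the whole axis (no tiling).
    Tiles on the right and bottom edges may be smaller than the rest.
    """
    tw = image_w if tile_w == -1 else tile_w
    th = image_h if tile_h == -1 else tile_h
    if tw <= 0 or th <= 0:
        raise ValueError("tile dimensions must be positive or -1")
    # Integer ceiling division.
    cols = -(-image_w // tw)
    rows = -(-image_h // th)
    return cols, rows
```

For a 4096x2160 frame, tile_grid(4096, 2160, 1024, 1024) gives 4 columns and 3 rows, where the last row is only partly filled.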

Discrete Wavelet Transform:
tile_comp_dlvls - Number of the decomposition levels.
wavelet_type - Type of the wavelet transform. 0 - means reversible DWT 5/3; 1 - means irreversible DWT 9/7.
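
The reversible 5/3 transform selected by wavelet_type = 0 is lossless because it uses only integer lifting steps. The following Python sketch shows one decomposition level on a 1D integer signal, following the well-known lifting formulation with whole-sample symmetric extension at the borders. It is a CPU illustration with hypothetical function names, not the CUDA kernels of gpu_jpeg2k.

```python
def _mirror(i, n):
    """Whole-sample symmetric extension of index i into [0, n)."""
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i = abs(i) % period
    return period - i if i >= n else i


def dwt53_forward(x):
    """One level of the reversible 5/3 DWT; returns (low, high)."""
    n = len(x)
    if n < 2:
        return list(x), []
    y = list(x)
    # Predict step: odd samples become high-pass coefficients.
    for i in range(1, n, 2):
        y[i] = x[i] - ((x[_mirror(i - 1, n)] + x[_mirror(i + 1, n)]) >> 1)
    # Update step: even samples become low-pass coefficients.
    for i in range(0, n, 2):
        y[i] = x[i] + ((y[_mirror(i - 1, n)] + y[_mirror(i + 1, n)] + 2) >> 2)
    return y[0::2], y[1::2]


def dwt53_inverse(low, high):
    """Undo dwt53_forward exactly (integer arithmetic, no rounding loss)."""
    n = len(low) + len(high)
    if n < 2:
        return list(low)
    y = [0] * n
    y[0::2] = low
    y[1::2] = high
    x = list(y)
    # Undo the update step first, then the predict step.
    for i in range(0, n, 2):
        x[i] = y[i] - ((y[_mirror(i - 1, n)] + y[_mirror(i + 1, n)] + 2) >> 2)
    for i in range(1, n, 2):
        x[i] = y[i] + ((x[_mirror(i - 1, n)] + x[_mirror(i + 1, n)]) >> 1)
    return x
```

Applying dwt53_inverse to the output of dwt53_forward returns the original samples bit for bit. In a 2D image the same step is applied along rows and columns, and repeated on the low-pass band once per decomposition level.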

Codeblocks:
cblk_exp_w - Codeblock width exponent x; the codeblock is 2^x samples wide.
cblk_exp_h - Codeblock height exponent y; the codeblock is 2^y samples high.
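
The sketch below turns the two exponents into codeblock dimensions. The range checks follow the limits of JPEG2000 Part 1 (each exponent between 2 and 10, and their sum at most 12); whether gpu_jpeg2k enforces the same limits is an assumption, and the function name is hypothetical.

```python
def codeblock_size(cblk_exp_w, cblk_exp_h):
    """Return (width, height) in samples for the given exponents."""
    for name, exp in (("cblk_exp_w", cblk_exp_w), ("cblk_exp_h", cblk_exp_h)):
        if not 2 <= exp <= 10:
            raise ValueError(f"{name} must be in the range 2..10")
    if cblk_exp_w + cblk_exp_h > 12:
        raise ValueError("cblk_exp_w + cblk_exp_h must not exceed 12")
    return 1 << cblk_exp_w, 1 << cblk_exp_h
```

With cblk_exp_w = 6 and cblk_exp_h = 6 this yields 64x64 codeblocks.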

Set device:
device - Defines which GPU device to use.

Compression rate:
target_size - Compress image to target output size (in bytes).
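
Since target_size is given in bytes, it is often convenient to derive it from a desired compression ratio. The helper below is a hypothetical Python sketch; it assumes the uncompressed size is width x height x components x bytes per sample, which may differ from how a particular input file is actually stored.

```python
def target_size_for_ratio(width, height, components, bytes_per_sample, ratio):
    """Return a target_size value (in bytes) for the requested ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    raw_bytes = width * height * components * bytes_per_sample
    return int(raw_bytes / ratio)
```

For example, an 8-bit RGB frame of 4096x2160 pixels occupies 26542080 bytes uncompressed, so a 10:1 ratio corresponds to target_size = 2654208.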

Color transform:
use_mct - Use reversible/irreversible (depends on wavelet type) color transformation from RGB to YUV. Speeds up computations when used.
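
For the reversible case (wavelet_type = 0), the color transformation is the integer reversible color transform (RCT) of JPEG2000 Part 1. The Python sketch below shows the per-pixel arithmetic; the function names are hypothetical and the code only illustrates the formulas, not the GPU implementation.

```python
def rct_forward(r, g, b):
    """Reversible color transform: RGB to (Y, Cb, Cr), integers only."""
    y = (r + 2 * g + b) >> 2   # floor((R + 2G + B) / 4)
    cb = b - g
    cr = r - g
    return y, cb, cr


def rct_inverse(y, cb, cr):
    """Exact inverse of rct_forward."""
    g = y - ((cb + cr) >> 2)   # floor division restores G exactly
    r = cr + g
    b = cb + g
    return r, g, b
```

Because both directions use the same floor operation, rct_inverse(*rct_forward(r, g, b)) returns the original triple for any integers r, g and b.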

Multi component transform:
use_part2_mct - Use multi component transform for 3D data (depends on mct_compression_method). Currently applicable for hyperspectral data (*.bsq format).
mct_compression_method - Defines type of layer transform for 3D data. 0 - means KLT transform; 1 - means 1D DWT 9/7.

KLT transform:
mct_klt_iterations - Defines maximum number of iterations for Gram-Schmidt algorithm.
mct_klt_border_eigenvalue - Defines cut-off for redundant components.
mct_klt_err - Defines error sufficient for Gram-Schmidt algorithm to end iteration.
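
One plausible reading of the three KLT parameters is an iterative eigenvector computation on the band covariance matrix, with Gram-Schmidt orthonormalisation in every step. The Python sketch below implements that idea as plain orthogonal iteration. It is an assumption about how the parameters interact, not the algorithm used inside gpu_jpeg2k; all function names are hypothetical, and the arguments iterations, err and border_eigenvalue merely mirror mct_klt_iterations, mct_klt_err and mct_klt_border_eigenvalue.

```python
import math


def covariance(bands):
    """Band-to-band covariance; bands[k] is the flat sample list of band k."""
    n = len(bands[0])
    centered = []
    for band in bands:
        mean = sum(band) / n
        centered.append([v - mean for v in band])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in centered]
            for ci in centered]


def mat_vec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def gram_schmidt(vectors):
    """Orthonormalise vectors in order (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            dot = sum(a * b for a, b in zip(w, q))
            w = [a - dot * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        # A vanishing norm marks a null direction; keep it as a zero vector.
        basis.append([a / norm for a in w] if norm > 1e-12 else [0.0] * len(w))
    return basis


def klt(cov, iterations, err, border_eigenvalue):
    """Return (eigenvalues, kept), kept = [(eigenvalue, eigenvector), ...]."""
    k = len(cov)
    q = [[1.0 if r == c else 0.0 for r in range(k)] for c in range(k)]
    for _ in range(iterations):
        new_q = gram_schmidt([mat_vec(cov, col) for col in q])
        # Largest change of any vector entry, ignoring sign flips.
        change = max(abs(abs(a) - abs(b))
                     for nc, oc in zip(new_q, q) for a, b in zip(nc, oc))
        q = new_q
        if change < err:
            break
    eigenvalues = [sum(a * b for a, b in zip(col, mat_vec(cov, col)))
                   for col in q]
    # Assumed cut-off rule: drop components below the border eigenvalue.
    kept = [(ev, col) for ev, col in zip(eigenvalues, q)
            if ev >= border_eigenvalue]
    return eigenvalues, kept
```

In this sketch the eigenvectors with the largest eigenvalues carry most of the energy shared between bands, so components below the cut-off are treated as redundant and dropped.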

Example configuration file

Below is an example configuration file for an RGB image. It defines: no tiling, 4 decomposition levels, lossless compression with DWT 5/3, codeblock width and height of 64, device 0, no target size, reversible color transformation, and no multi component transform.

tile_w = -1
tile_h = -1
tile_comp_dlvls = 4
wavelet_type = 0
cblk_exp_w = 6
cblk_exp_h = 6
device = 0
target_size = 0
use_mct = 1
use_part2_mct = 0
mct_compression_method = 0
mct_klt_iterations = 10000
mct_klt_border_eigenvalue = 1.0
mct_klt_err = 1.0e-6
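
A configuration in this form can be checked before it is passed to the encoder. The following Python sketch parses "key = value" lines into a dictionary. It is a hypothetical helper, not the parser used by gpu_jpeg2k, and it assumes that "#" starts a comment and that every value is an integer or a floating-point number.

```python
EXAMPLE_CONFIG = """\
tile_w = -1
tile_h = -1
tile_comp_dlvls = 4
wavelet_type = 0
cblk_exp_w = 6
cblk_exp_h = 6
device = 0
target_size = 0
use_mct = 1
use_part2_mct = 0
mct_compression_method = 0
mct_klt_iterations = 10000
mct_klt_border_eigenvalue = 1.0
mct_klt_err = 1.0e-6
"""


def parse_config(text):
    """Parse 'key = value' lines; integers stay int, the rest become float."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumed comment syntax
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'key = value', got: {raw!r}")
        value = value.strip()
        try:
            settings[key.strip()] = int(value)
        except ValueError:
            settings[key.strip()] = float(value)
    return settings


settings = parse_config(EXAMPLE_CONFIG)
# Codeblock dimensions implied by the exponents.
codeblock = (1 << settings["cblk_exp_w"], 1 << settings["cblk_exp_h"])
```

Here codeblock evaluates to (64, 64), matching the description of the example above.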
