Enhanced Parallel bzip2 Compression with Lock-Free Queue

  • José Sánchez-Salazar Escuela de Informática, Universidad Nacional
  • Edward Aymerich-Sánchez Escuela de Ciencias de la Computación e Informática, Universidad de Costa Rica
Keywords: Computer programming, Data processing, Programming languages

Abstract

Since the general trend nowadays is to have more and more processors (cores) available in each computer, the scalability of the data structures used in parallel programs must be carefully considered in order to guarantee that they take full advantage of the available processors. Because of increased containment, lock-based data structures usually do not perform proportionally as the number of processors increases. The use of well-designed lock-free data structures, like first in-first out, (fifo) queues, can boost the performance of a parallel program when many processors are available. In this work, a parallel version of bzip2, a popular compression and decompression program, is designed and implemented by using lock-free queues instead of the lock-based ones, and applying a two-buffer-output strategy. The performance of lock-free implementation is measured against lock-based implementations. Compression time was measured with different number of processors and different block sizes. Consistent with our hypothesis, the results show that our parallel lock-free implementation outperforms the other implementations.

References

Boost C++ Library (n.d.). (2013). Retrieved from https://svn.boost.org/trac/boost/

Burrows, M., & Wheeler, D. J. (1994). A block-sorting lossless data compression algorithm. (SRC Research Report 124). California: Digital systems research center. http://www.hpl.hp.com/techreports.

Feldman, S. & Dechev, D. (2015). A wait-free multi-producer multi-consumer ring buffer. ACM SIGAPP Applied Computing Review, 15(3), 59-71. http://dx.doi.org/10.1145/2835260.2835264

Gilchrist, J. (2004). Parallel data compression with bzip2. Proceedings of the 16th IASTED International Conference on Parallel and Distributed Computing and Systems(PDCS). http://gilchrist.ca/jeff/comp5704/Final_Paper.pdf

Huffman, D.A. (1952). A method for the construction of minimum-redundancy codes. Proceedings of the I.R.E. http://dx.doi.org/10.1109/jrproc.1952.273898

Isakov, K. (2005). Bzip2smp. Retrieved from http://bzip2smp.sourceforge.net

Jannesari, A., Pankratius, V. & Tichy, W. (1990). Parallelizing bzip2-a case study in multicore software engineering. IEEE Software, 26(6). Recovered from https://www.semanticscholar.org

Ladan-mozes, E. and Shavit, N. (2004). An optimistic approach to lock-free fifo queues. Proceedings of the 18th International Symposium on Distributed Computing, Springer, Berlin, Germany. http://dx.doi.org/10.1007/978-3-540-30186-8_9

Marcais, G. & Kingsford, C. (2011). A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics, 27, 764–770. http://dx.doi.org/10.1093/bioinformatics/btr011

Michael, M., & Scott, M. (1996). Simple, fast, and practical non-blocking and blocking concurrent queue algorithms. Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing. http://dx.doi.org/10.1145/248052.248106

Neal, R., Witten, I. & Cleary, J. (1987). Arithmetic coding for data compression. Communications of the ACM, 30(6), 520-540. http://dx.doi.org/10.1145/214762.214771

Seward, J. (2010). bzip2. Retrieved from http://www.bzip.org/

Valois, D. (1994). Implementing lock-free queues. Proceeding of the Seventh International Conference On Parallel and Distributed Computing Systems, 1994.

Welch, T. (1984). A technique for high-performance data compression. IEEE Computer, 17(6). http://dx.doi.org/10.1109/MC.1984.1659158

Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin. 1 (6). http://dx.doi.org/10.2307/3001968

Zhang, D. & Dechev, D. (2016). A Lock-Free Priority Queue Design Based on Multi-Dimensional Linked Lists. IEEE Trans. Parallel Distrib. Syst. 27(3) 613-626. http://dx.doi.org/10.1109/tpds.2015.2419651

Ziv, J., & Lempel, A. (1978). Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory, 24(5). 530-536. http://dx.doi.org/10.1109/tit.1978.1055934

Published
2017-07-29
How to Cite
Sánchez-Salazar, J., & Aymerich-Sánchez, E. (2017). Enhanced Parallel bzip2 Compression with Lock-Free Queue. Uniciencia, 31(2), 37-49. https://doi.org/10.15359/ru.31-2.3
Section
Original scientific papers (evaluated by academic peers)