Class ParallelDeflateOutputStream

Namespace
Ionic.Zlib
Assembly
SunamoDotNetZip.dll

A class for compressing streams using the Deflate algorithm with multiple threads.

public class ParallelDeflateOutputStream : Stream, IAsyncDisposable, IDisposable
Inheritance
ParallelDeflateOutputStream

Remarks

This class performs DEFLATE compression through writing. For more information on the Deflate algorithm, see IETF RFC 1951, "DEFLATE Compressed Data Format Specification version 1.3."

This class is similar to DeflateStream, except that it supports compression only and uses multiple worker threads to perform the DEFLATE. On a multi-CPU or multi-core computer, this class can perform significantly better than the single-threaded DeflateStream, particularly for larger streams. How large? Anything over 10 MB is a good candidate for parallel compression.

The tradeoff is that this class uses more memory and more CPU than the vanilla DeflateStream, and is also a less efficient compressor. For large files, the compressed data stream can be less than 1% larger than the one produced by the vanilla DeflateStream. For smaller files the difference can be larger. The difference will also be larger if you set BufferSize lower than the default value. Your mileage may vary. Finally, for small files, the ParallelDeflateOutputStream can be much slower than the vanilla DeflateStream, because of the overhead associated with using the thread pool.

Constructors

ParallelDeflateOutputStream(Stream)

Create a ParallelDeflateOutputStream.

public ParallelDeflateOutputStream(Stream stream)

Parameters

stream Stream

The stream to which compressed data will be written.

Examples

This example shows how to use a ParallelDeflateOutputStream to compress data. It reads a file, compresses it, and writes the compressed data to a second, output file.

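// Assumes fileToCompress (string) and WORKING_BUFFER_SIZE (int) are defined by the caller.
byte[] buffer = new byte[WORKING_BUFFER_SIZE];
int n;
string outputFile = fileToCompress + ".compressed";
using (System.IO.Stream input = System.IO.File.OpenRead(fileToCompress))
{
    using (var raw = System.IO.File.Create(outputFile))
    {
        using (Stream compressor = new ParallelDeflateOutputStream(raw))
        {
            // Read returns 0 at the end of the input.
            while ((n = input.Read(buffer, 0, buffer.Length)) != 0)
            {
                compressor.Write(buffer, 0, n);
            }
        } // Leaving the using block closes the compressor and flushes the remaining data.
    }
}
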
Dim buffer As Byte() = New Byte(4096) {}
Dim n As Integer = -1
Dim outputFile As String = (fileToCompress & ".compressed")
Using input As Stream = File.OpenRead(fileToCompress)
    Using raw As FileStream = File.Create(outputFile)
        Using compressor As Stream = New ParallelDeflateOutputStream(raw)
            Do While (n <> 0)
                If (n > 0) Then
                    compressor.Write(buffer, 0, n)
                End If
                n = input.Read(buffer, 0, buffer.Length)
            Loop
        End Using
    End Using
End Using

Remarks

This stream compresses data written into it via the DEFLATE algorithm (see RFC 1951), and writes out the compressed byte stream.

The instance will use the default compression level, the default buffer sizes and the default number of threads and buffers per thread.

This class is similar to DeflateStream, except that it uses multiple worker threads to perform the DEFLATE. On a multi-CPU or multi-core computer, this class can perform significantly better than the single-threaded DeflateStream, particularly for larger streams. How large? Anything over 10 MB is a good candidate for parallel compression.

ParallelDeflateOutputStream(Stream, CompressionLevel)

Create a ParallelDeflateOutputStream using the specified CompressionLevel.

public ParallelDeflateOutputStream(Stream stream, CompressionLevel level)

Parameters

stream Stream

The stream to which compressed data will be written.

level CompressionLevel

A tuning knob to trade speed for effectiveness.

Remarks

See the ParallelDeflateOutputStream(Stream) constructor for example code.

ParallelDeflateOutputStream(Stream, CompressionLevel, CompressionStrategy, bool)

Create a ParallelDeflateOutputStream using the specified CompressionLevel and CompressionStrategy, and specifying whether to leave the captive stream open when the ParallelDeflateOutputStream is closed.

public ParallelDeflateOutputStream(Stream stream, CompressionLevel level, CompressionStrategy strategy, bool leaveOpen)

Parameters

stream Stream

The stream to which compressed data will be written.

level CompressionLevel

A tuning knob to trade speed for effectiveness.

strategy CompressionStrategy

By tweaking this parameter, you may be able to optimize the compression for data with particular characteristics.

leaveOpen bool

true to leave the captive stream open after the ParallelDeflateOutputStream is closed; false to close it along with the ParallelDeflateOutputStream.

Remarks

See the ParallelDeflateOutputStream(Stream) constructor for example code.
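
As a rough illustration of the leaveOpen parameter, the sketch below compresses into a MemoryStream and keeps using that stream after the compressor is closed. The class and method names (LeaveOpenExample, CompressToMemory) are hypothetical and not part of the library; only the constructor, Write() and Close() come from this page.

```csharp
using System;
using System.IO;
using Ionic.Zlib;

// Hypothetical helper for illustration; not part of DotNetZip.
public static class LeaveOpenExample
{
    // Compresses 'data' and returns the still-open MemoryStream, rewound to the start.
    public static MemoryStream CompressToMemory(byte[] data)
    {
        var output = new MemoryStream();
        using (var compressor = new ParallelDeflateOutputStream(
                   output, CompressionLevel.Default, CompressionStrategy.Default, true))
        {
            compressor.Write(data, 0, data.Length);
            // Close() flushes all pending compressed data. Because leaveOpen
            // is true, 'output' is not closed along with the compressor.
            compressor.Close();
        }

        // Setting Position would throw ObjectDisposedException on a closed stream.
        output.Position = 0;
        return output;
    }

    public static void Main()
    {
        using (MemoryStream compressed = CompressToMemory(new byte[200000]))
        {
            Console.WriteLine("Compressed length: " + compressed.Length);
        }
    }
}
```

With leaveOpen set to false, the caller would instead have to treat the captive stream as closed once the compressor has been closed.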

ParallelDeflateOutputStream(Stream, bool)

Create a ParallelDeflateOutputStream and specify whether to leave the captive stream open when the ParallelDeflateOutputStream is closed.

public ParallelDeflateOutputStream(Stream stream, bool leaveOpen)

Parameters

stream Stream

The stream to which compressed data will be written.

leaveOpen bool

true to leave the captive stream open after the ParallelDeflateOutputStream is closed; false to close it along with the ParallelDeflateOutputStream.

Remarks

See the ParallelDeflateOutputStream(Stream) constructor for example code.

Properties

BufferSize

The size of the buffers used by the compressor threads.

public int BufferSize { get; set; }

Property Value

int

Remarks

The default buffer size is 128 KB. The application can set this value at any time, but it takes effect only if set before the first call to Write().

Larger buffer sizes imply larger memory consumption but allow more efficient compression. Smaller buffer sizes consume less memory but may result in less effective compression. For example, with the default buffer size of 128 KB, the compression delivered is within 1% of the compression delivered by the single-threaded DeflateStream. On the other hand, a BufferSize of 8 KB can result in a compressed data stream that is 5% larger than that delivered by the single-threaded DeflateStream. Excessively small buffer sizes can also cause the speed of the ParallelDeflateOutputStream to drop, because of the larger thread scheduling overhead of dealing with many small buffers.

The total amount of storage space allocated for buffering will be (N*S*2), where N is the number of buffer pairs and S is the size of each buffer (this property). Each pair consists of 2 buffers used by the compressor, one for input and one for output. By default, DotNetZip allocates 4 buffer pairs per CPU core, so if your machine has 4 cores, the number of buffer pairs used will be 16. If you accept the default value of this property, 128 KB, then the ParallelDeflateOutputStream will use 16 * 2 * 128 KB of buffer memory in total, or 4 MB, in blocks of 128 KB. If you set this property to 64 KB, then the total will be 16 * 2 * 64 KB of buffer memory, or 2 MB.
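
The arithmetic above can be restated as a small calculation. This is only a sketch of the (N*S*2) formula from the remarks; the BufferMemoryEstimate class is a hypothetical helper and does not query the library.

```csharp
using System;

// Hypothetical helper for illustration; it only restates the N * S * 2 formula.
public static class BufferMemoryEstimate
{
    // bufferPairs: N, the number of buffer pairs in use.
    // bufferSize:  S, the value of the BufferSize property, in bytes.
    public static long TotalBytes(int bufferPairs, int bufferSize)
    {
        // Each pair holds two buffers: one for input and one for output.
        return (long)bufferPairs * bufferSize * 2;
    }

    public static void Main()
    {
        const int KB = 1024;
        const int MB = 1024 * KB;
        int pairs = 4 * 4; // 4 buffer pairs per core on a 4-core machine

        Console.WriteLine(TotalBytes(pairs, 128 * KB) / MB); // prints 4
        Console.WriteLine(TotalBytes(pairs, 64 * KB) / MB);  // prints 2
    }
}
```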

BytesProcessed

The total number of uncompressed bytes processed by the ParallelDeflateOutputStream.

public long BytesProcessed { get; }

Property Value

long

Remarks

This value is meaningful only after a call to Close().

CanRead

Indicates whether the stream supports Read operations.

public override bool CanRead { get; }

Property Value

bool

Remarks

Always returns false.

CanSeek

Indicates whether the stream supports Seek operations.

public override bool CanSeek { get; }

Property Value

bool

Remarks

Always returns false.

CanWrite

Indicates whether the stream supports Write operations.

public override bool CanWrite { get; }

Property Value

bool

Remarks

Returns true if the provided stream is writable.

Crc32

The CRC32 for the data that was written out, prior to compression.

public int Crc32 { get; }

Property Value

int

Remarks

This value is meaningful only after a call to Close().
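
Because BytesProcessed and Crc32 are meaningful only after Close(), a caller would typically read them once the stream has been closed. The sketch below shows that ordering; the CompressionStats class and its Compress method are hypothetical names used for illustration.

```csharp
using System;
using System.IO;
using Ionic.Zlib;

// Hypothetical helper for illustration; not part of DotNetZip.
public static class CompressionStats
{
    // Compresses 'data' into 'output' and returns the statistics reported by the
    // compressor. 'output' is left open so the caller keeps ownership of it.
    public static (long BytesProcessed, int Crc32) Compress(byte[] data, Stream output)
    {
        using (var compressor = new ParallelDeflateOutputStream(output, true))
        {
            compressor.Write(data, 0, data.Length);

            // Both properties are documented as meaningful only after Close().
            compressor.Close();
            return (compressor.BytesProcessed, compressor.Crc32);
        }
    }

    public static void Main()
    {
        byte[] data = new byte[300000];
        using (var output = new MemoryStream())
        {
            var stats = Compress(data, output);
            Console.WriteLine("Uncompressed bytes: " + stats.BytesProcessed);
            Console.WriteLine("CRC32: 0x" + stats.Crc32.ToString("X8"));
        }
    }
}
```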

Length

Reading this property always throws a NotSupportedException.

public override long Length { get; }

Property Value

long

MaxBufferPairs

The maximum number of buffer pairs to use.

public int MaxBufferPairs { get; set; }

Property Value

int

Remarks

This property sets an upper limit on the number of memory buffer pairs to create. The implementation of this stream allocates multiple buffers to facilitate parallel compression. As each buffer fills up, this stream uses ThreadPool.QueueUserWorkItem() to compress those buffers in a background threadpool thread. After a buffer is compressed, it is re-ordered and written to the output stream.

A higher number of buffer pairs enables a higher degree of parallelism, which tends to increase the speed of compression on multi-CPU computers. On the other hand, a higher number of buffer pairs also implies larger memory consumption, more active worker threads, and higher CPU utilization for any compression. This property enables the application to limit its memory consumption and CPU utilization depending on requirements.

For each compression "task" that occurs in parallel, there are 2 buffers allocated: one for input and one for output. This property sets a limit on the number of pairs. The total amount of storage space allocated for buffering will then be (N*S*2), where N is the number of buffer pairs and S is the size of each buffer (BufferSize). By default, DotNetZip allocates 4 buffer pairs per CPU core, so if your machine has 4 cores, and you retain the default buffer size of 128 KB, then the ParallelDeflateOutputStream will use 4 * 4 * 2 * 128 KB of buffer memory in total, or 4 MB, in blocks of 128 KB. If you then set this property to 8, the total will be 8 * 2 * 128 KB of buffer memory, or 2 MB.

CPU utilization will also go up with additional buffers, because a larger number of buffer pairs allows a larger number of background threads to compress in parallel. If you find that parallel compression is consuming too much memory or CPU, you can adjust this value downward.

The default value is 16. Different values may deliver better or worse results, depending on your priorities and the dynamic performance characteristics of your storage and compute resources.

This property is not the number of buffer pairs to use; it is an upper limit. An illustration: Suppose you have an application that uses the default value of this property (which is 16), and it runs on a machine with 2 CPU cores. In that case, DotNetZip will allocate 4 buffer pairs per CPU core, for a total of 8 pairs. The upper limit specified by this property has no effect.

The application can set this value at any time, but it is effective only before the first call to Write(), which is when the buffers are allocated.
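
The upper-limit rule described above can be sketched as a minimum of two numbers. This is an assumption drawn from the remarks (4 pairs per core, capped by MaxBufferPairs), not a copy of the library's internal logic, and BufferPairEstimate is a hypothetical helper.

```csharp
using System;

// Hypothetical helper for illustration; based only on the remarks above.
public static class BufferPairEstimate
{
    // The remarks state that DotNetZip allocates 4 buffer pairs per CPU core.
    private const int PairsPerCore = 4;

    // Returns the number of buffer pairs expected for a machine with 'cores'
    // CPU cores when MaxBufferPairs is set to 'maxBufferPairs'.
    public static int EffectivePairs(int cores, int maxBufferPairs)
    {
        return Math.Min(PairsPerCore * cores, maxBufferPairs);
    }

    public static void Main()
    {
        Console.WriteLine(EffectivePairs(2, 16)); // prints 8: the limit has no effect
        Console.WriteLine(EffectivePairs(4, 8));  // prints 8: the limit halves the default
        Console.WriteLine(EffectivePairs(Environment.ProcessorCount, 16));
    }
}
```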

Position

Returns the current position of the output stream.

public override long Position { get; set; }

Property Value

long

Remarks

Because the output gets written by a background thread, the value may change asynchronously. Setting this property always throws a NotSupportedException.

Strategy

The ZLIB strategy to be used during compression.

public CompressionStrategy Strategy { get; }

Property Value

CompressionStrategy

Methods

Close()

Close the stream.

public override void Close()

Remarks

You must call Close on the stream to guarantee that all of the data written in has been compressed, and the compressed data has been written out.

Dispose()

Dispose the object.

public void Dispose()

Remarks

Because ParallelDeflateOutputStream is IDisposable, the application must call this method when finished using the instance.

This method is generally called implicitly upon exit from a using scope in C# (Using in VB).

Dispose(bool)

The Dispose method.

protected override void Dispose(bool disposing)

Parameters

disposing bool

Indicates whether the Dispose method was invoked by user code.

Flush()

Flush the stream.

public override void Flush()

Read(byte[], int, int)

This method always throws a NotSupportedException.

public override int Read(byte[] buffer, int offset, int count)

Parameters

buffer byte[]

The buffer into which data would be read, IF THIS METHOD ACTUALLY DID ANYTHING.

offset int

The offset within that data array at which to insert the data that is read, IF THIS METHOD ACTUALLY DID ANYTHING.

count int

The number of bytes to read, IF THIS METHOD ACTUALLY DID ANYTHING.

Returns

int

Nothing. It always throws.

Reset(Stream)

Resets the stream for use with another stream.

public void Reset(Stream stream)

Parameters

stream Stream

The new output stream for this era.

Examples

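// Assumes listOfFiles is a collection of file paths supplied by the caller.
byte[] buffer = new byte[8192]; // arbitrary working buffer size
int n;
ParallelDeflateOutputStream deflater = null;
foreach (var inputFile in listOfFiles)
{
    string outputFile = inputFile + ".compressed";
    using (System.IO.Stream input = System.IO.File.OpenRead(inputFile))
    {
        using (var outStream = System.IO.File.Create(outputFile))
        {
            if (deflater == null)
                deflater = new ParallelDeflateOutputStream(outStream,
                                                           CompressionLevel.BestCompression,
                                                           CompressionStrategy.Default,
                                                           true);
            deflater.Reset(outStream);

            while ((n = input.Read(buffer, 0, buffer.Length)) != 0)
            {
                deflater.Write(buffer, 0, n);
            }
            // Close flushes the remaining compressed data. Because leaveOpen is true,
            // outStream stays open and the deflater can be Reset for the next file.
            deflater.Close();
        }
    }
}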

Remarks

Because the ParallelDeflateOutputStream is expensive to create, it has been designed so that it can be recycled and re-used. You have to call Close() on the stream first, then you can call Reset() on it, to use it again on another stream.

Seek(long, SeekOrigin)

This method always throws a NotSupportedException.

public override long Seek(long offset, SeekOrigin origin)

Parameters

offset long

The offset to seek to.... IF THIS METHOD ACTUALLY DID ANYTHING.

origin SeekOrigin

The reference specifying how to apply the offset.... IF THIS METHOD ACTUALLY DID ANYTHING.

Returns

long

Nothing. It always throws.

SetLength(long)

This method always throws a NotSupportedException.

public override void SetLength(long value)

Parameters

value long

The new value for the stream length.... IF THIS METHOD ACTUALLY DID ANYTHING.

Write(byte[], int, int)

Write data to the stream.

public override void Write(byte[] buffer, int offset, int count)

Parameters

buffer byte[]

The buffer holding data to write to the stream.

offset int

The offset within that data array at which to find the first byte to write.

count int

The number of bytes to write.

Remarks

To use the ParallelDeflateOutputStream to compress data, create a ParallelDeflateOutputStream, passing a writable output stream. Then call Write() on that ParallelDeflateOutputStream, providing uncompressed data as input. The data sent to the output stream will be the compressed form of the data written.

To decompress data, use the DeflateStream class.
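
A minimal round-trip sketch of the two remarks above: compress with ParallelDeflateOutputStream, then decompress with the DeflateStream from the same Ionic.Zlib namespace. The RoundTrip class and its method names are hypothetical and only serve the illustration.

```csharp
using System;
using System.IO;
using System.Text;
using Ionic.Zlib;

// Hypothetical helper for illustration; not part of DotNetZip.
public static class RoundTrip
{
    // Compresses 'data' with multiple worker threads.
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var compressor = new ParallelDeflateOutputStream(output, true))
            {
                compressor.Write(data, 0, data.Length);
                compressor.Close(); // flush everything before reading 'output'
            }
            return output.ToArray();
        }
    }

    // Decompresses with the single-threaded Ionic.Zlib.DeflateStream.
    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var inflater = new DeflateStream(input, CompressionMode.Decompress))
        using (var result = new MemoryStream())
        {
            inflater.CopyTo(result);
            return result.ToArray();
        }
    }

    public static void Main()
    {
        byte[] original = Encoding.UTF8.GetBytes(new string('a', 500000));
        byte[] restored = Decompress(Compress(original));
        Console.WriteLine("Round trip preserved length: " + (restored.Length == original.Length));
    }
}
```

Here DeflateStream and CompressionMode resolve to the Ionic.Zlib types because System.IO.Compression is not imported; a project that imports both namespaces would need to qualify the names.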