Class ParallelDeflateOutputStream
A class for compressing streams using the Deflate algorithm with multiple threads.
public class ParallelDeflateOutputStream : Stream, IAsyncDisposable, IDisposable
- Inheritance
-
ParallelDeflateOutputStream
Remarks
This class performs DEFLATE compression through writing. For more information on the Deflate algorithm, see IETF RFC 1951, "DEFLATE Compressed Data Format Specification version 1.3."
This class is similar to DeflateStream, except that it is for compression only and uses multiple worker threads to perform the DEFLATE. On a multi-CPU or multi-core computer, the performance of this class can be significantly higher than that of the single-threaded DeflateStream, particularly for larger streams. How large? Anything over 10 MB is a good candidate for parallel compression.
The tradeoff is that this class uses more memory and more CPU than the vanilla DeflateStream, and is also less efficient as a compressor. For large files, the compressed data stream can be less than 1% larger than the one produced by the vanilla DeflateStream. For smaller files the difference can be larger. The difference will also be larger if you set the BufferSize lower than the default value. Your mileage may vary. Finally, for small files, the ParallelDeflateOutputStream can be much slower than the vanilla DeflateStream, because of the overhead associated with using the thread pool.
Constructors
ParallelDeflateOutputStream(Stream)
Create a ParallelDeflateOutputStream.
public ParallelDeflateOutputStream(Stream stream)
Parameters
stream (Stream): The stream to which compressed data will be written.
Examples
This example shows how to use a ParallelDeflateOutputStream to compress data. It reads a file, compresses it, and writes the compressed data to a second, output file.
byte[] buffer = new byte[WORKING_BUFFER_SIZE];
int n = -1;
String outputFile = fileToCompress + ".compressed";
using (System.IO.Stream input = System.IO.File.OpenRead(fileToCompress))
{
    using (var raw = System.IO.File.Create(outputFile))
    {
        using (Stream compressor = new ParallelDeflateOutputStream(raw))
        {
            while ((n = input.Read(buffer, 0, buffer.Length)) != 0)
            {
                compressor.Write(buffer, 0, n);
            }
        }
    }
}
Dim buffer As Byte() = New Byte(4096) {}
Dim n As Integer = -1
Dim outputFile As String = (fileToCompress & ".compressed")
Using input As Stream = File.OpenRead(fileToCompress)
    Using raw As FileStream = File.Create(outputFile)
        Using compressor As Stream = New ParallelDeflateOutputStream(raw)
            Do While (n <> 0)
                If (n > 0) Then
                    compressor.Write(buffer, 0, n)
                End If
                n = input.Read(buffer, 0, buffer.Length)
            Loop
        End Using
    End Using
End Using
Remarks
This stream compresses data written into it via the DEFLATE algorithm (see RFC 1951), and writes out the compressed byte stream.
The instance will use the default compression level, the default buffer sizes and the default number of threads and buffers per thread.
This class is similar to DeflateStream, except that this implementation uses multiple worker threads to perform the DEFLATE. On a multi-CPU or multi-core computer, the performance of this class can be significantly higher than that of the single-threaded DeflateStream, particularly for larger streams. How large? Anything over 10 MB is a good candidate for parallel compression.
ParallelDeflateOutputStream(Stream, CompressionLevel)
Create a ParallelDeflateOutputStream using the specified CompressionLevel.
public ParallelDeflateOutputStream(Stream stream, CompressionLevel level)
Parameters
stream (Stream): The stream to which compressed data will be written.
level (CompressionLevel): A tuning knob to trade speed for effectiveness.
Remarks
See the ParallelDeflateOutputStream(Stream) constructor for example code.
ParallelDeflateOutputStream(Stream, CompressionLevel, CompressionStrategy, bool)
Create a ParallelDeflateOutputStream using the specified CompressionLevel and CompressionStrategy, and specifying whether to leave the captive stream open when the ParallelDeflateOutputStream is closed.
public ParallelDeflateOutputStream(Stream stream, CompressionLevel level, CompressionStrategy strategy, bool leaveOpen)
Parameters
stream (Stream): The stream to which compressed data will be written.
level (CompressionLevel): A tuning knob to trade speed for effectiveness.
strategy (CompressionStrategy): By tweaking this parameter, you may be able to optimize the compression for data with particular characteristics.
leaveOpen (bool): true if the application would like the captive stream to remain open after the ParallelDeflateOutputStream is closed.
Remarks
See the ParallelDeflateOutputStream(Stream) constructor for example code.
ParallelDeflateOutputStream(Stream, bool)
Create a ParallelDeflateOutputStream and specify whether to leave the captive stream open when the ParallelDeflateOutputStream is closed.
public ParallelDeflateOutputStream(Stream stream, bool leaveOpen)
Parameters
stream (Stream): The stream to which compressed data will be written.
leaveOpen (bool): true if the application would like the captive stream to remain open after the ParallelDeflateOutputStream is closed.
Remarks
See the ParallelDeflateOutputStream(Stream) constructor for example code.
Properties
BufferSize
The size of the buffers used by the compressor threads.
public int BufferSize { get; set; }
Property Value
Remarks
The default buffer size is 128 KB. The application can set this value at any time, but it is effective only before the first Write().
Larger buffer sizes imply larger memory consumption but allow more efficient compression. Smaller buffer sizes consume less memory but may result in less effective compression. For example, using the default buffer size of 128 KB, the compression delivered is within 1% of the compression delivered by the single-threaded DeflateStream. On the other hand, using a BufferSize of 8 KB can result in a compressed data stream that is 5% larger than that delivered by the single-threaded DeflateStream. Excessively small buffer sizes can also cause the speed of the ParallelDeflateOutputStream to drop, because of the larger thread scheduling overhead of dealing with many small buffers.
The total amount of storage space allocated for buffering will be (N*S*2), where N is the number of buffer pairs, and S is the size of each buffer (this property). Each pair consists of 2 buffers used by the compressor, one for input and one for output. By default, DotNetZip allocates 4 buffer pairs per CPU core, so if your machine has 4 cores, then the number of buffer pairs used will be 16. If you accept the default value of this property, 128 KB, then the ParallelDeflateOutputStream will use 16 * 2 * 128 KB of buffer memory in total, or 4 MB, in blocks of 128 KB. If you set this property to 64 KB, then the total will be 16 * 2 * 64 KB of buffer memory, or 2 MB.
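The N*S*2 arithmetic can be checked with a few lines of C#. This is only a sketch: the BufferMath class and its TotalBufferBytes helper are hypothetical names introduced for illustration and are not part of DotNetZip.

```csharp
using System;

public static class BufferMath
{
    // Total buffer memory in bytes: N pairs * 2 buffers per pair * S bytes per buffer.
    // Hypothetical helper, not a DotNetZip API.
    public static long TotalBufferBytes(int bufferPairs, int bufferSize)
    {
        return (long)bufferPairs * 2 * bufferSize;
    }

    public static void Main()
    {
        const int KB = 1024;

        // 16 pairs with the default 128 KB BufferSize.
        Console.WriteLine(TotalBufferBytes(16, 128 * KB)); // 4194304 bytes (4 MB)

        // 16 pairs with BufferSize set to 64 KB.
        Console.WriteLine(TotalBufferBytes(16, 64 * KB));  // 2097152 bytes (2 MB)
    }
}
```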
BytesProcessed
The total number of uncompressed bytes processed by the ParallelDeflateOutputStream.
public long BytesProcessed { get; }
Property Value
Remarks
This value is meaningful only after a call to Close().
CanRead
Indicates whether the stream supports Read operations.
public override bool CanRead { get; }
Property Value
Remarks
Always returns false.
CanSeek
Indicates whether the stream supports Seek operations.
public override bool CanSeek { get; }
Property Value
Remarks
Always returns false.
CanWrite
Indicates whether the stream supports Write operations.
public override bool CanWrite { get; }
Property Value
Remarks
Returns true if the provided stream is writable.
Crc32
The CRC32 for the data that was written out, prior to compression.
public int Crc32 { get; }
Property Value
Remarks
This value is meaningful only after a call to Close().
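As a rough illustration of reading Crc32 and BytesProcessed only after Close(), the sketch below compresses an in-memory buffer and then inspects both properties. It assumes the class lives in the Ionic.Zlib namespace (the page does not state the namespace), and the CrcAfterClose class and Summarize method are hypothetical names used only for this example.

```csharp
using System;
using System.IO;
using Ionic.Zlib; // assumption: the DotNetZip namespace that holds ParallelDeflateOutputStream

public static class CrcAfterClose
{
    // Compresses data into memory and returns the values that become
    // meaningful once Close() has been called.
    public static (long BytesProcessed, int Crc32, long CompressedLength) Summarize(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            // leaveOpen = true keeps the MemoryStream usable after Close().
            var compressor = new ParallelDeflateOutputStream(output, true);
            compressor.Write(data, 0, data.Length);

            // Close() waits for the worker threads and writes out the final blocks.
            compressor.Close();

            return (compressor.BytesProcessed, compressor.Crc32, output.Length);
        }
    }

    public static void Main()
    {
        var data = new byte[200000];
        for (int i = 0; i < data.Length; i++) data[i] = (byte)(i % 251);

        var summary = Summarize(data);
        Console.WriteLine("Uncompressed bytes: " + summary.BytesProcessed);
        Console.WriteLine("CRC32: 0x" + summary.Crc32.ToString("X8"));
        Console.WriteLine("Compressed bytes: " + summary.CompressedLength);
    }
}
```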
Length
Reading this property always throws a NotSupportedException.
public override long Length { get; }
Property Value
MaxBufferPairs
The maximum number of buffer pairs to use.
public int MaxBufferPairs { get; set; }
Property Value
Remarks
This property sets an upper limit on the number of memory buffer pairs to create. The implementation of this stream allocates multiple buffers to facilitate parallel compression. As each buffer fills up, this stream uses ThreadPool.QueueUserWorkItem() to compress those buffers in a background threadpool thread. After a buffer is compressed, it is re-ordered and written to the output stream.
A higher number of buffer pairs enables a higher degree of parallelism, which tends to increase the speed of compression on multi-CPU computers. On the other hand, a higher number of buffer pairs also implies larger memory consumption, more active worker threads, and higher CPU utilization for any compression. This property enables the application to limit its memory consumption and CPU utilization depending on requirements.
For each compression "task" that occurs in parallel, there are 2 buffers allocated: one for input and one for output. This property sets a limit for the number of pairs. The total amount of storage space allocated for buffering will then be (N*S*2), where N is the number of buffer pairs and S is the size of each buffer (BufferSize). By default, DotNetZip allocates 4 buffer pairs per CPU core, so if your machine has 4 cores, and you retain the default buffer size of 128 KB, then the ParallelDeflateOutputStream will use 4 * 4 * 2 * 128 KB of buffer memory in total, or 4 MB, in blocks of 128 KB. If you then set this property to 8, the total will be 8 * 2 * 128 KB of buffer memory, or 2 MB.
CPU utilization will also go up with additional buffers, because a larger number of buffer pairs allows a larger number of background threads to compress in parallel. If you find that parallel compression is consuming too much memory or CPU, you can adjust this value downward.
The default value is 16. Different values may deliver better or worse results, depending on your priorities and the dynamic performance characteristics of your storage and compute resources.
This property is not the number of buffer pairs to use; it is an upper limit. An illustration: Suppose you have an application that uses the default value of this property (which is 16), and it runs on a machine with 2 CPU cores. In that case, DotNetZip will allocate 4 buffer pairs per CPU core, for a total of 8 pairs. The upper limit specified by this property has no effect.
The application can set this value at any time, but it is effective only before the first call to Write(), which is when the buffers are allocated.
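The relationship between the core count and this upper limit can be sketched as below. The sketch assumes, based only on the remarks above, that the number of pairs actually allocated is the smaller of 4 pairs per core and MaxBufferPairs; the BufferPairMath class and its methods are hypothetical and not part of DotNetZip.

```csharp
using System;

public static class BufferPairMath
{
    // Taken from the remarks above: 4 buffer pairs are allocated per CPU core.
    private const int PairsPerCore = 4;

    // Assumed rule: the per-core allocation, capped by the MaxBufferPairs limit.
    public static int EffectiveBufferPairs(int cpuCores, int maxBufferPairs)
    {
        return Math.Min(PairsPerCore * cpuCores, maxBufferPairs);
    }

    // N * S * 2: each pair holds one input and one output buffer.
    public static long TotalBufferBytes(int bufferPairs, int bufferSize)
    {
        return (long)bufferPairs * 2 * bufferSize;
    }

    public static void Main()
    {
        const int KB = 1024;

        // 2 cores with the default limit of 16: the limit has no effect.
        Console.WriteLine(EffectiveBufferPairs(2, 16)); // 8

        // 4 cores with the limit lowered to 8: the limit applies.
        int pairs = EffectiveBufferPairs(4, 8);
        Console.WriteLine(pairs);                              // 8
        Console.WriteLine(TotalBufferBytes(pairs, 128 * KB));  // 2097152 bytes (2 MB)
    }
}
```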
Position
Returns the current position of the output stream.
public override long Position { get; set; }
Property Value
Remarks
Because the output gets written by a background thread, the value may change asynchronously. Setting this property always throws a NotSupportedException.
Strategy
The ZLIB strategy to be used during compression.
public CompressionStrategy Strategy { get; }
Property Value
Methods
Close()
Close the stream.
public override void Close()
Remarks
You must call Close on the stream to guarantee that all of the data written in has been compressed, and the compressed data has been written out.
Dispose()
Dispose the object
public void Dispose()
Remarks
Because ParallelDeflateOutputStream is IDisposable, the application must call this method when finished using the instance.
This method is generally called implicitly upon exit from a using scope in C# (Using in VB).
Dispose(bool)
The Dispose method
protected override void Dispose(bool disposing)
Parameters
disposing (bool): Indicates whether the Dispose method was invoked by user code.
Flush()
Flush the stream.
public override void Flush()
Read(byte[], int, int)
This method always throws a NotSupportedException.
public override int Read(byte[] buffer, int offset, int count)
Parameters
buffer (byte[]): The buffer into which data would be read, IF THIS METHOD ACTUALLY DID ANYTHING.
offset (int): The offset within that data array at which to insert the data that is read, IF THIS METHOD ACTUALLY DID ANYTHING.
count (int): The number of bytes to read, IF THIS METHOD ACTUALLY DID ANYTHING.
Returns
- int
nothing. It always throws.
Reset(Stream)
Resets the stream for use with another stream.
public void Reset(Stream stream)
Parameters
stream (Stream): The new output stream to which compressed data will be written.
Examples
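byte[] buffer = new byte[WORKING_BUFFER_SIZE];
int n;
ParallelDeflateOutputStream deflater = null;
foreach (var inputFile in listOfFiles)
{
    string outputFile = inputFile + ".compressed";
    using (System.IO.Stream input = System.IO.File.OpenRead(inputFile))
    {
        using (var outStream = System.IO.File.Create(outputFile))
        {
            // Create the compressor once; leaveOpen is true so that the
            // enclosing using block remains responsible for outStream.
            if (deflater == null)
                deflater = new ParallelDeflateOutputStream(outStream,
                                                           CompressionLevel.BestCompression,
                                                           CompressionStrategy.Default,
                                                           true);
            // Point the recycled compressor at the new output stream.
            deflater.Reset(outStream);
            while ((n = input.Read(buffer, 0, buffer.Length)) != 0)
            {
                deflater.Write(buffer, 0, n);
            }
            // Close() must be called before the next Reset(), so that all
            // pending data is compressed and written out.
            deflater.Close();
        }
    }
}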
Remarks
Because the ParallelDeflateOutputStream is expensive to create, it has been designed so that it can be recycled and re-used. You have to call Close() on the stream first, then you can call Reset() on it, to use it again on another stream.
Seek(long, SeekOrigin)
This method always throws a NotSupportedException.
public override long Seek(long offset, SeekOrigin origin)
Parameters
offset (long): The offset to seek to.... IF THIS METHOD ACTUALLY DID ANYTHING.
origin (SeekOrigin): The reference specifying how to apply the offset.... IF THIS METHOD ACTUALLY DID ANYTHING.
Returns
- long
nothing. It always throws.
SetLength(long)
This method always throws a NotSupportedException.
public override void SetLength(long value)
Parameters
value (long): The new value for the stream length.... IF THIS METHOD ACTUALLY DID ANYTHING.
Write(byte[], int, int)
Write data to the stream.
public override void Write(byte[] buffer, int offset, int count)
Parameters
buffer (byte[]): The buffer holding data to write to the stream.
offset (int): The offset within that data array to find the first byte to write.
count (int): The number of bytes to write.
Remarks
To use the ParallelDeflateOutputStream to compress data, create a ParallelDeflateOutputStream, passing a writable output stream. No CompressionMode is needed, because this class only compresses. Then call Write() on that ParallelDeflateOutputStream, providing uncompressed data as input. The data sent to the output stream will be the compressed form of the data written.
To decompress data, use the DeflateStream class.
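A round trip might look like the sketch below, which compresses with ParallelDeflateOutputStream and decompresses with DeflateStream. It assumes both classes live in the Ionic.Zlib namespace and that the DeflateStream meant here is DotNetZip's own, constructed with CompressionMode.Decompress. The RoundTrip class and its methods are hypothetical names for illustration.

```csharp
using System;
using System.IO;
using System.Linq;

public static class RoundTrip
{
    // Compress a byte array in memory using multiple worker threads.
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            // leaveOpen = true so the MemoryStream can be read after Close().
            var compressor = new Ionic.Zlib.ParallelDeflateOutputStream(output, true);
            compressor.Write(data, 0, data.Length);
            compressor.Close(); // guarantees all compressed data has been written out
            return output.ToArray();
        }
    }

    // Decompress with the single-threaded DeflateStream (assumed: Ionic.Zlib.DeflateStream).
    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var inflater = new Ionic.Zlib.DeflateStream(input, Ionic.Zlib.CompressionMode.Decompress))
        using (var result = new MemoryStream())
        {
            inflater.CopyTo(result);
            return result.ToArray();
        }
    }

    public static void Main()
    {
        var original = new byte[300000];
        for (int i = 0; i < original.Length; i++) original[i] = (byte)(i % 97);

        byte[] restored = Decompress(Compress(original));
        Console.WriteLine(restored.SequenceEqual(original)
            ? "Round trip succeeded"
            : "Round trip FAILED");
    }
}
```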