This question started for me when I had to handle files that could be either compressed or uncompressed, and I needed to do so transparently.

If you look online, there seems to be only one answer to this, and it's the one I posted on StackOverflow. Here is some more context on what that answer does and what the problem with Boost::Iostreams is.


The reason there was no online answer is not obvious. First, zlib can handle compressed and uncompressed streams in C on the fly. So there should be no reason why the Iostreams decompressor has any problem.

The reason stems from the fact that the decompressor doesn't delegate the header parsing to zlib, but does it manually. And there is no option for a missing header: the decompressor just breaks and stops in that case.

So while many GNU tools can handle text files or gz-compressed files without a specific option, Boost::Iostreams throws an exception at you telling you to change your stream stack.

This is not very maintainable either. If you have to open your file a first time just to check whether it is compressed, then build your stack and open the file again, it feels like a lot of work for nothing. And it is. Now that we have cloud storage that charges for each access, doubling the number of requests is simply not sustainable.
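To make the waste concrete, here is a minimal sketch of the naive approach (my own illustration; is_gzip_file is a hypothetical helper name): one open just to sniff the two gzip magic bytes, before a second open to actually read.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: opens the file a first time only to sniff the
// two-byte gzip magic number (0x1f 0x8b).
bool is_gzip_file(const std::string& path) {
    std::ifstream probe(path, std::ios::binary);
    char magic[2] = {0, 0};
    probe.read(magic, 2);
    return probe.gcount() == 2
        && magic[0] == '\x1f'
        && magic[1] == static_cast<char>(0x8b);
}
// The caller must then open `path` a *second* time with the right stream
// stack -- two requests per file, which is what we want to avoid.
```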

The solution

My solution comes from stealing code from the decompressor itself. First, I wanted to just read the first two characters and then wrap them in a fixed array that I would read again, either with the decompressor or simply by calling read on the parent stream. Unfortunately, the only candidate in Boost::Iostreams, basic_array_source, doesn't provide a read interface, and it would have been tough to switch to the main stream afterwards.

I also tried implementing the seekable interface, which was a huge pain. Parent filters and sources cannot be told to seek back (even if they have the capability, like a simple ifstream), and you have to declare your full stack as seekable. Which means that your own filter also has to implement the seekable API (which is impossible if you don't have random access, as in a compressed file!). The problem is that even if this works for files, it will not work for other kinds of streams, like the Google Storage Client API. That one will silently skip the current buffer and then throw an exception in a parallel thread, aborting your program. Just horrible.

So instead, I reused the peekable_source private class from the decompressor. The latter already sometimes has to read data and put it back on the main stream. It could have sought back, but instead it keeps a small string buffer that it drains first when data is requested. And this works so well that I wonder why it's not part of the main API.

using namespace boost::iostreams;

template <typename Source>
struct PeekableSource {
    typedef char char_type;
    struct category : source_tag, peekable_tag { };

    explicit PeekableSource(Source& src, const std::string& putback = "")
            : src_(src), putback_(putback), offset_(0)
    { }

    std::streamsize read(char* s, std::streamsize n) {
        std::streamsize result = 0;

        // Copy characters from the putback buffer first
        std::streamsize pbsize =
                static_cast<std::streamsize>(putback_.size());
        if (offset_ < pbsize) {
            result = (std::min)(n, pbsize - offset_);
            std::char_traits<char>::copy(
                    s, putback_.data() + offset_, result);
            offset_ += result;
            if (result == n)
                return result;
        }

        // Then read the remaining characters from src_
        std::streamsize amt =
                boost::iostreams::read(src_, s + result, n - result);
        return amt != -1 ?
               result + amt :
               result ? result : -1;
    }

    void putback(const std::string& s) {
        putback_.replace(0, offset_, s);
        offset_ = 0;
    }

    Source&          src_;
    std::string      putback_;
    std::streamsize  offset_;
};
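If you are curious how the putback buffer behaves, here is a standalone sketch of the same idea (my own illustration, not Boost code): a tiny source that serves bytes from a putback string before falling through to the underlying data.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Minimal illustration of the putback-buffer trick, without Boost:
// read() drains the putback string first, then the underlying data.
struct ToyPeekableSource {
    std::string data;     // stands in for the parent stream
    std::size_t pos = 0;  // read position in `data`
    std::string putback;  // bytes handed back by a previous peek

    std::size_t read(char* s, std::size_t n) {
        std::size_t result = std::min(n, putback.size());
        std::copy(putback.begin(), putback.begin() + result, s);
        putback.erase(0, result);
        // Then fall through to the "real" stream for the rest.
        std::size_t amt = std::min(n - result, data.size() - pos);
        std::copy(data.begin() + pos, data.begin() + pos + amt, s + result);
        pos += amt;
        return result + amt;
    }
};
```

Peeking is then just reading a few bytes and assigning them back to putback; the next read() returns them again, exactly like the decompressor's private helper does.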

And now we can simply use this to peek at the first two characters of our input stream, check whether they are the gzip magic bytes, and then delegate the actual read either to the decompressor or to the parent source:

struct GzDecompressor {
    typedef char                        char_type;
    typedef multichar_input_filter_tag  category;

    gzip_decompressor m_decompressor;
    bool m_initialized{false};
    bool m_is_compressed{false};
    std::string m_putback;

    template <typename Source>
    void init(Source& src) {
        std::string data(2, '\0');
        std::streamsize amt = boost::iostreams::read(src, &data[0], 2);
        if (amt < 2)
            data.resize(amt < 0 ? 0 : amt);
        // 0x1f 0x8b is the two-byte gzip magic number
        m_is_compressed = data.size() == 2
            && data[0] == static_cast<char>(0x1f)
            && data[1] == static_cast<char>(0x8b);
        // Hand the sniffed bytes back so the real read sees them again
        src.putback(data);
        m_initialized = true;
    }

    template <typename Source>
    std::streamsize read(Source& src, char* s, std::streamsize n) {
        PeekableSource<Source> peek(src, m_putback);
        if (!m_initialized) {
            init(peek);
        }
        if (m_is_compressed) {
            return m_decompressor.read(peek, s, n);
        }
        return boost::iostreams::read(peek, s, n);
    }
};

As we still go through the main read calls, this filter is almost transparent to the user and should have no measurable impact on performance.

What I deeply regret is that the Iostreams decompressor doesn't offer an option to do this natively.
