This question started for me when I had to handle files that could be either compressed or uncompressed, and I needed to do so transparently.

If you look online, there may be only one answer to that, and it is on StackOverflow, where I answered it. Here is some more context on what the answer does and what the problem with Boost::Iostreams is.

Context

The reason why there is no online answer is not very obvious. First, zlib can handle compressed and uncompressed streams in C on the fly. So there should be no reason why the Iostreams decompressor would have any problem.

The reason stems from the fact that the decompressor doesn’t delegate the header parsing to zlib, but does it manually. And there is no option for a missing header: it will just break and stop in that case.

So while lots of GNU tools can handle text files or gz-compressed files without a specific option, Boost::Iostreams throws an exception at you, telling you to change your stream stack.

This is not very maintainable. For instance, if you have to open your file first to check whether it is compressed, then create your stack and open the file again, it feels like a lot of work for nothing. And it is. With cloud storage that charges for each access, doubling the number of requests is not sustainable.

The solution

My solution comes from stealing code from the decompressor itself. First, I wanted to just read the first two characters and then wrap them in a fixed array that I would read again either with the decompressor, or simply by calling read on the parent stream. Unfortunately, the only candidate object in Boost::Iostreams, basic_array_source, doesn’t provide a read interface, and it would have been tough to switch to the main stream afterwards.

I also tried implementing the seekable interface, which was a huge pain. Parent filters and sources cannot be told to seek back (even if they have the capability, like a simple ifstream), and you have to tell your full stack to be seekable. This means that your own filter also has to implement the seekable API (which is impossible if you don’t have random access, like in a compressed file!). The problem is that even if it works for files, it will not work for other kinds of streams, like with the Google Storage Client API. This one will silently skip the current buffer and then throw an exception in a parallel thread, aborting your program. Just horrible.

So instead, I reused the peekable_source private class from the decompressor. The latter already had to sometimes read data and put it back into the main stream. It could have sought back, but instead, it has a small string buffer that it uses when data is requested. And this works so well that I wondered why it’s not part of the main API.

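#include <algorithm>
#include <ios>
#include <string>
#include <boost/iostreams/categories.hpp>
#include <boost/iostreams/filter/gzip.hpp>
#include <boost/iostreams/operations.hpp>

using namespace boost::iostreams;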
 
template<typename Source>
struct PeekableSource {
    typedef char char_type;
    struct category : source_tag, peekable_tag { };
    explicit PeekableSource(Source& src, const std::string& putback = "")
            : src_(src), putback_(putback), offset_(0)
    { }
    std::streamsize read(char* s, std::streamsize n)
    {
        std::streamsize result = 0;
 
        // Copy characters from putback buffer
        std::streamsize pbsize =
                static_cast<std::streamsize>(putback_.size());
        if (offset_ < pbsize) {
            result = (std::min)(n, pbsize - offset_);
            BOOST_IOSTREAMS_CHAR_TRAITS(char)::copy(
                    s, putback_.data() + offset_, result);
            offset_ += result;
            if (result == n)
                return result;
        }
 
        // Read characters from src_
        std::streamsize amt =
                boost::iostreams::read(src_, s + result, n - result);
        return amt != -1 ?
               result + amt :
               result ? result : -1;
    }
    void putback(const std::string& s)
    {
        putback_.replace(0, offset_, s);
        offset_ = 0;
    }
 
    Source&          src_;
    std::string      putback_;
    std::streamsize  offset_;
};
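
To try the put-back idea without Boost, here is a minimal sketch of the same logic on top of a plain std::istream. The class name PutbackReader is hypothetical and not part of Boost::Iostreams or of the code above. It only illustrates how the small string buffer is served before the underlying stream is touched again.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for PeekableSource: it wraps a std::istream instead
// of a Boost.Iostreams source, so it builds with the standard library alone.
class PutbackReader {
public:
    explicit PutbackReader(std::istream& in) : in_(in) {}

    // Serve bytes from the put-back buffer first, then from the stream.
    // Returns the number of bytes copied, or -1 when nothing is left.
    std::streamsize read(char* s, std::streamsize n) {
        assert(n >= 0);
        if (n == 0) return 0;

        std::streamsize result = 0;
        const std::streamsize pending =
            static_cast<std::streamsize>(putback_.size()) - offset_;
        if (pending > 0) {
            result = std::min(n, pending);
            std::copy_n(putback_.data() + offset_, result, s);
            offset_ += result;
            if (result == n) return result;
        }

        // The buffer is exhausted: continue with the underlying stream.
        in_.read(s + result, n - result);
        const std::streamsize amt = in_.gcount();
        if (amt == 0 && result == 0) return -1;
        return result + amt;
    }

    // Replace the consumed prefix of the buffer with `s`, so that the next
    // read() returns `s` before anything else.
    void putback(const std::string& s) {
        putback_.replace(0, static_cast<std::size_t>(offset_), s);
        offset_ = 0;
    }

private:
    std::istream& in_;
    std::string putback_;
    std::streamsize offset_ = 0;
};
```

Reading two bytes, handing them to putback(), and reading again returns the same two bytes followed by the rest of the stream. The stream itself is never asked to seek.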

And now we can simply use this to peek at the first two characters of our input stream to see if they are a gz file or not, and then delegate the actual read either to the decompressor or the parent source:

struct GzDecompressor {
    typedef char              char_type;
    typedef multichar_input_filter_tag  category;
 
    gzip_decompressor m_decompressor;
    bool m_initialized{false};
    bool m_is_compressed{false};
    std::string m_putback;
 
    template<typename Source>
    void init(Source& src) {
        std::string data;
        data.push_back(get(src));
        data.push_back(get(src));
        m_is_compressed = data[0] == static_cast<char>(0x1f) && data[1] == static_cast<char>(0x8b);
        src.putback(data);
        m_initialized = true;
    }
 
    template<typename Source>
    std::streamsize read(Source& src, char* s, std::streamsize n) {
        PeekableSource<Source> peek(src, m_putback);
        if (!m_initialized) {
            init(peek);
        }
 
        if (m_is_compressed) {
            return m_decompressor.read(peek, s, n);
        }
 
        return boost::iostreams::read(peek, s, n);
    }
};
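
The two-byte test in init can also be tried on its own. The sketch below assumes a plain std::istream rather than a Boost::Iostreams source, and the function name starts_with_gzip_magic is hypothetical. It checks for the gzip magic number (0x1f 0x8b) and leaves the stream positioned where it started, relying only on the single character of put-back that standard streams normally provide.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: returns true when the next two bytes of `in` are the
// gzip magic number (0x1f 0x8b). The read position is left unchanged.
bool starts_with_gzip_magic(std::istream& in) {
    using traits = std::char_traits<char>;

    const int first = in.get();  // consume byte 1
    if (traits::eq_int_type(first, traits::eof())) {
        in.clear();  // empty input: reset the EOF/fail flags for the caller
        return false;
    }

    const int second = in.peek();  // look at byte 2 without consuming it
    in.clear();                    // peek() may set eofbit on a 1-byte input
    in.unget();                    // give byte 1 back to the stream

    return first == 0x1f && second == 0x8b;
}
```

A caller could use the result to decide whether to route the stream through a decompressor. Only one byte is ever consumed and returned here, which is why a std::istream can do without the separate put-back buffer that the Boost source needs.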

As we still go through the main read calls, this filter is almost transparent to the user and should not have any impact on performance.

What I deeply regret is that the Iostreams decompressor does not have an option to do so natively.


ATK is updated to 3.1.0 with heavy code refactoring. Old C++ standards are now dropped and it now requires a fully C++17-compliant compiler.

The main difference for filter support is that explicit SIMD filters using libsimdpp have been dropped while tr2::simd becomes standard and supported by gcc, clang and Visual Studio.


Today, I’m presenting at the ADC my work on analog modelling for the past year.

I will make a more detailed post later this year, but I’d like to put some teasers here. SPICE net lists are an efficient way of representing electronic circuits, and there are several very good free and paid simulators. Unfortunately, they are not easy to integrate in a VST plugin.

Audio ToolKit now has a sister project around this topic. The lite version is also licensed under the BSD license and can generate a dynamic filter from a net list. The full project is now also capable of generating a static filter, with a source file (compiled in memory) that can be manually tuned.

Future work on this project will include different solvers for the static filter, as well as a tuner that will be able to drop entries in the Jacobian (full entries or component contributions for a given pin) in the Newton-Raphson solver.

This entry is part 1 of 5 in the series Travelling in LLVM land

I have tried to find the proper recipes to compile C++ code on the fly with clang and LLVM. It’s actually not that easy to achieve if you are not targeting LLVM Intermediate Representation, and unfortunately, the code here, working for LLVM 7, may not work for LLVM 8. Or 6.


This entry is part 2 of 6 in the series Analog modelling

After my previous post on SPICE modelling in Python, I need a good supporting example to work up to on-the-fly compilation in C++. This schema will also require some changes to support more than simple nodal analysis, so this now becomes Modified Nodal Analysis with state equations.


Address Sanitizer: alternative to valgrind

This entry is part 2 of 5 in the series Travelling in LLVM land

Recently, at work, I encountered a strange bug with GCC 7.2 and clang 6 (I didn’t test it with Visual Studio 2017 for different reasons). The bug was not visible on “old” compilers like gcc 4, Visual Studio 2013 or even Intel Compiler 2017. In debug mode, everything was fine, but in release mode, the application crashed. But not always at the same location.


I’m happy to announce the update of ATK Side-Chain Compressor based on the Audio Toolkit and JUCE. It is available on Windows (AVX compatible processors) and OS X (min. 10.9, SSE4.2) in different formats.

This update changes storage format and allows linked channels to be steered by a mix of power coming from each channel, each passing through its own attack-release filter. It enables more creative workflows with makeup gain specific to each channel. The rest of the plugin works as before, with an optional Middle/Side processing as well as side-chain working either on each channel separately or in middle/side.

This plugin requires the universal runtime on Windows, which is automatically deployed with Windows Update (see this discussion on the JUCE forum). If you don’t have it installed, please check the Microsoft website.
