Thursday, February 4, 2016

Raw loops vs. standard algorithms revisited

I was recently reading a Post on Meeting C++ about raw loops vs. STL algorithms.

I didn't think his solution using STL algorithms took full advantage of what the library really has to offer. He did use std::mismatch to find the point at which items mismatched, which is clearly a good thing. The rest of the code, however, looked to me like he was thinking primarily in terms of loops, and merely changed the syntax from a for loop to use std::for_each instead.

At least in my opinion, this doesn't really gain much. At least in my observation, using `std::for_each` is most often a mistake. It's still little more than a generic loop. The name of algorithm used should usually give me a good idea of what some code is doing, but `std::for_each` really doesn't--I have to look at the content of the "loop" to get any idea of what it's really doing.

In this case, the basic idea (simplifying a tiny bit) is to take the paths to two files, and find a relative path from one to the other. The two paths have already been parsed into components held as strings in a vector, so our job is to generate a path with "../" for all the leading components that are identical, then copy the components from one of the vectors into the output after that (and finally append a file name on the end).

Just going from that description, some choices seem fairly obvious, at least to me. To generate the leading part of the path, we probably want to use `std::generate`. Then to copy the remaining elements, we probably want to use `std::copy`. Although it may not be quite as obvious just from the wording, `std::fill_n` actually works out a little better than `std::generate_n`.

Our result is going to be in the form of a string. We want to append strings together to create the whole output. One obvious way to do that is to use an `std::stringstream`. So, let's start by doing a tiny bit of fudging to turn this into a standalone example (I don't think this has a significant effect on what we're doing). Rather than extract the vectors from a cache as he did, let's start with a couple of fixed vectors:


    std::vector a = { "dir1", "dir2", "dir2_1", "p" };
    std::vector b = { "dir1", "dir2", "dir2_2", "p" };

Then we do the step that (I think) he got right:


    auto pos_p = std::mismatch(a.begin(), a.end(), b.begin(), b.end());

From there, we can define our stringstream, and an iterator to give us access to it in a way that's friendly to standard algorithms:


std::ostringstream path;

std::ostream_iterator(path, "/");

Then we want to generate the leading part, then copy the trailing part of the result path:


std::fill_n(out, pos_p.first - a.begin(), "..");
std::copy(pos_p.second, b.end(), out);

At least in my opinion, this makes it much more apparent that the first part of the path is composed of instances of ".." separated by "/", and the second part is composed of elements from the second vector (again, separated by "/").

This hasn't shortened the code a lot (maybe not at all), but it has (at least in my opinion) made the actual intent much clearer.

For better or worse, I've seen a fair number of arguments/claims that using a stringstream and an ostream_iterator can produce substantially slower code that doing the job by appending data directly to a string. I haven't tried to run a benchmark to verify this, but for the moment, let's assume you were working with a compiler for which it really was true (and true to an extent that the slow-down was a serious concern).

If so, we could still use roughly the same structure--we just have to define our own iterator to handle appending things to a string instead of inserting them into a stringstream. We lose some versatility (we can only insert things that are already strings, rather than handling all sorts of conversions to strings like a stream can), but maybe we gain enough speed for that to be worthwhile. So, if we decide this is really worth doing, implementing it is pretty straightforward (moreso than most people seem to think).

Our iterator to append to a string (with a little front-end function to avoid having to pass template parameters explicitly) can look something like this:


#include 

template 
class appender_t : public std::iterator{
    String *s;
    String sep;
public:
    appender_t(String &s, String sep) : s(&s), sep(sep) { }

    appender_t operator=(String const &add) {
        *s += add + sep;
        return *this;
    }

    appender_t operator++() { return *this; }

    appender_t &operator *() { return *this; }
};

template 
appender_t appender(String &s, String sep) {
    return appender_t(s, sep);
}

Most of this is boilerplate--the only parts that really implement any logic at all are the constructor and `operator=`. The constructor is still pretty trivial: it just saves a pointer to the two strings used to initialize it. The assignment operator isn't much more complex--it just says when a string is written through the iterator, we append that string plus the specified separator to the string we stored.

Using this, the rest of the code barely changes at all. We define our output and iterator a tiny bit differently:


    std::string path;

    auto out = appender(path, "/"s);

From there, however, virtually nothing has changed (which is pretty much the point of using iterators to start with--they isolate the algorithm from the container, so algorithms can work with different kinds of containers.

So anyway, using that we can eliminate the pesky `std::stringstream` intermediary, so we're just appending directly to a string, just like we were to start with--but now doing it with algorithms that actually fit the task at hand.

A complete, working copy of the final code is available on Coliru.

I should probably add that this code is purely an attempt at producing exactly the same result as the code in the original article. As Luigi Ballabio was kind enough to point out in a comment, that's not really the right thing to do.