Wednesday, August 29, 2012

C++ dynamic arrays

I've seen dozens (at least) of questions and proposed answers about how to create a dynamic array in C++. The typical reply recommends using the array form of the `new` operator to allocate space, something on this order (but usually less complete):


template
class dynamic_array {
    T *data;
    size_t size;
public:
    dynamic_array() :data(0), size(0) {}

    dynamic_array(T *init, size_t i_size) :size(i_size) {
        data = new T[size];
        for (size_t i=0; i<size; i++)
            data[i] = init[i];
   }

    bool add(T const &new_item) {
        T *temp = new T[size+1];
        for (size_t i=0; i<size; i++)
            temp[i] = data[i];
        ++size;
        delete [] data;
        data = temp;
    }
     
    T &operator[](size_t index) {
         assert(index < size);
         return data[index];
    }
    // We also want a const version of operator[].
    // We also probably want something like get()
    // in std::vector, that throws if index is out of bounds.

    size_t get_size() { return size; }

    void clear() { size=0; delete [] data; }

    ~dynamic_array() { delete [] data; }
};



Now, it's certainly true that you could do worse than this. I believe it manages memory correctly (doesn't leak memory or create dangling pointers), and it is dynamic (to some degree -- it allows adding, but not currently removing, elements dynamically). While it's not currently written to be exception safe, that's mostly to keep it from getting too long -- making it exception safe wouldn't be particularly difficult.

It still has some pretty serious problems though. First of all, it requires that T be default constructible and assignable. Assignable isn't so bad (though in C++11, we'd prefer movable), but default constructible is kind of a pain, especially since we never actually *use* any default objects -- but when we expand the allocation, we have no real choice but to default construct the objects, then assign each value from the old array into the new one. That can be a pretty serious burden if creating a default object is slow, and worse still if assigning objects is slow as well. Since it's a template, we don't know much about T. Either or both might be true.

That points to a much bigger problem: adding an item to the array is slower than we'd like, and the bigger it gets, the slower adding items gets. In computer science terms, adding a single item has complexity that's linear on the current size of the array.

Unfortunately, we can't just do a minor rewrite inside of add() to improve the situation. In fact, the basic premise of how this dynamic array works turns out to be wrong. The only way to fix the biggest problem is to start over with something that works a bit differently from the beginning.

Our big problem here is that when/if we use `new whatever[size]`, we not only get enough space to store the specified number of objects, but code is inserted to create that number of objects in the storage. Generally speaking, we don't want that. Unless we change the interface at least a little bit, there's never a circumstance in which we really want objects in that memory. Even if they're created, immediately after allocating those objects, we're going to assign some other object to that position anyway. We'd really prefer to copy construct the new objects from the old objects (or, in C++11, move-construct). This way a new object is created with the correct value instead of being created with a value we know is wrong, then immediately assigning the correct value to the new object.

We'd also prefer to avoid re-allocating space every time we add a new item. We really want to allocate a chunk of raw memory, turn that raw memory into new objects as they're added, and only allocate more space once in a while, when our memory is all used up.

Using new T[size] won't let us do either of those though. Instead, we need to use something (usually operator new, but malloc would also work) that lets us allocate raw memory, then use placement new to create objects in that raw memory.


template
class dynamic_array {
    T *data;
    size_t allocated;
size_t in_use;

size_t round_up(size_t input) {
size_t ret = 1;
while (ret < input)
ret <<= 1;
return ret;
}

void alloc(T *init, size_t current_size, size_t new_size) {
char *temp = (char *)operator new(new_size * sizeof(T));
allocated = new_size;
for (size_t i=0; i<current_size; i++)
new(temp+i*sizeof(T)) T(init[i]);
in_use = current_size;
data = (T *)temp;
}

void destroy(size_t b, size_t e) {
for ( ;b!=e; ++b)
data[b].~T();
}

public:

    dynamic_array() :data(0), allocated(0), in_use(0) {
alloc(0, 0, 10);
}

    dynamic_array(T *init, size_t i_size) {
alloc(init, i_size, round_up(i_size));
}

    void add(T new_item) {
if (in_use == allocated)
alloc(data, allocated, allocated * 3 / 2);
new ((char *)data + in_use++ * sizeof(T)) T(new_item);
    }

    ~dynamic_array() {
destroy(0, in_use);
operator delete(data);
}

    T &operator[](size_t index) {
        assert(index < in_use);
        return data[index];
    }

    size_t size() { return in_use; }

    void clear() {
destroy(0, in_use);
in_use = 0;
}
};

This is starting to run pretty long, so I'll leave further embellishment of this class for a later post. This is still quite a ways short of what std::vector already provides though -- thus the standard advice to just use a standard container if at all possible. Even for something as simple as a dynamic array, the amount of code involved becomes fairly substantial. For a more complex container like std::set or std::map, the savings is greater still.

No comments:

Post a Comment