String Concatenation and Immutable Strings / Speeding Spidermonkey

Many garbage-collected languages feature “immutable” strings — once a value is assigned, it can’t be manipulated without creating a new instance, with obvious performance implications.

This means that appending onto a string generally means allocating enough room for the resulting string, copying the data from both strings to the new memory, and then updating the assignment variable.

This isn’t a big deal for the odd concatenation here and there, and usually has no material impact. But when you’re in large loops building strings (for instance “hand” making some XML by iterating through some data, or constructing a large HTML table), string-concatenation is often a ghastly slow affair.

Consider the following:

myString = myString + "A";

As visualized above, generally that would cause the runtime to create a new memory allocation large enough to hold the concatenated length, then copy the first value and then the second value to the new buffer.

In a loop, where a string is continually adding new content to the end of itself — a very common need — the performance will generally take quadratic time (as it’s repeatedly allocating and copying a larger and larger string).

I’ve encountered this frequently enough in .NET, deciding when the ugliness and heft of a StringBuilder is necessitated versus when to just suck up the terrible concatenation performance, however this came up recently in a web project requiring a large amount of display data to be dynamically constructed on the client end.

JavaScript doesn’t have a native string builder (though you can emulate one with arrays), so I needed to evaluate how basic string concatenation compared with some of the string builder similes for disparate cases. I was quickly reminded that while Firefox doesn’t have great string concatenation performance, it absolutely annihilates Internet Explorer (incl. 7) doing the same.

Try this basic benchmark in both browsers to see for yourself: On the machine I’m typing this on, Firefox completes 131,072 iterations in 1984ms, while Internet Explorer had only completed 32768 iterations after 11078ms (not only did just 1/4 the iterations take 5 times as long, but the iterations it did complete were the smaller, much easier ones, making the disparity much greater than it appears).

I became curious how Firefox implemented string concatenation so performantly, and from that to extrapolate how the Internet Explorer team screwed it up so royally, and honestly to see if I could easily tease even more performance out of it. Given that we have the source (and the ability to easily change it and build it to test differences), I jumped into js/jsstr.c in the Firefox source to take a look (all code subject to the MPL and the Mozilla Foundation retains all rights).

JSString *
js_ConcatStrings(JSContext *cx, JSString *left, JSString *right)
{
    size_t rn, ln, lrdist, n;
    jschar *rs, *ls, *s;
    JSDependentString *ldep;    /* non-null if left should become dependent */
    JSString *str;
    if (JSSTRING_IS_DEPENDENT(right)) {
        rn = JSSTRDEP_LENGTH(right);
        rs = JSSTRDEP_CHARS(right);
    } else {
        rn = right->length;
        rs = right->chars;
    }
    if (rn == 0)
        return left;
    if (JSSTRING_IS_DEPENDENT(left) ||
        !(*js_GetGCThingFlags(left) & GCF_MUTABLE)) {
        /* We must copy if left does not own a buffer to realloc. */
        ln = JSSTRING_LENGTH(left);
        if (ln == 0)
            return right;
        ls = JSSTRING_CHARS(left);
        s = (jschar *) JS_malloc(cx, (ln + rn + 1) * sizeof(jschar));
        if (!s)
            return NULL;
        js_strncpy(s, ls, ln);
        ldep = NULL;
    } else {
        /* We can realloc left's space and make it depend on our result. */
        ln = left->length;
        if (ln == 0)
            return right;
        ls = left->chars;
        s = (jschar *) JS_realloc(cx, ls, (ln + rn + 1) * sizeof(jschar));
        if (!s)
            return NULL;
        /* Take care: right could depend on left! */
        lrdist = (size_t)(rs - ls);
        if (lrdist < ln)
            rs = s + lrdist;
        left->chars = ls = s;
        ldep = JSSTRDEP(left);
    }
    js_strncpy(s + ln, rs, rn);
    n = ln + rn;
    s[n] = 0;
    str = js_NewString(cx, s, n, GCF_MUTABLE);
    if (!str) {
        /* Out of memory: clean up any space we (re-)allocated. */
        if (!ldep) {
            JS_free(cx, s);
        } else {
            s = JS_realloc(cx, ls, (ln + 1) * sizeof(jschar));
            if (s)
                left->chars = s;
        }
    } else {
        /* Morph left into a dependent prefix if we realloc'd its buffer. */
        if (ldep) {
            JSPREFIX_SET_LENGTH(ldep, ln);
            JSPREFIX_SET_BASE(ldep, str);
#ifdef DEBUG
          {
            JSRuntime *rt = cx->runtime;
            JS_RUNTIME_METER(rt, liveDependentStrings);
            JS_RUNTIME_METER(rt, totalDependentStrings);
            JS_LOCK_RUNTIME_VOID(rt,
                (rt->strdepLengthSum += (double)ln,
                 rt->strdepLengthSquaredSum += (double)ln * (double)ln));
          }
#endif
        }
    }
    return str;
}

The key to Mozilla’s performance is this line, and the surrounding concept of “dependant” strings:

s = (jschar *) JS_realloc(cx, ls, (ln + rn + 1) * sizeof(jschar));

In a nutshell, when you perform the operation A = B + C, where B and C are strings, it reallocates the memory assigned to B — depending upon the memory manager and fragmentation, this can often be done in place with no memory copy — setting B to indicate that it is now a dependant string (e.g. “don’t free the associated memory when you garbage collect B”), then copying C to the tail.

String Concatenation “http://www.yafla.com/dforbes/images/string_concat1.gif” />

 

String Concatenation “http://www.yafla.com/dforbes/images/string_concat3.gif” height=
“93” width=”494″ />

A and B are now pointing to the same location in memory. In a loop where a string is adding to itself, it is just constantly realloc’ing — copying to a new memory location when the memory manager can’t expand the requested amount in place, otherwise just doing a land grab of memory trailing the string — and copying the new data on the end.

Changing this bit of code to do a straight new malloc and copy on all concatenations yielded an executable with Internet Explorer-like glacially slow string concats, exactly as expected.

Which made me curious: What if we predictively accommodated the often growing nature of strings, allocating a bit of extra space at the outset, making it more likely that realloc’s could be accommodated in place? In the grand scheme of things, the memory taken by JavaScript strings is marginally small, so a little extra padding on result-of-concatenation strings shouldn’t hurt.

I implemented the following function in js/jsstr.c.

size_t js_GetNextSize(size_t desired_size)
{
  int max = 0, i;

  for (i = 1; i != JSSTRFLAG_DEPENDENT; i <<= 1)
  {
    if (desired_size & i)
    {
      max = i;
    }
  }
  if (max)
  {
    return max << 1;
  } else
  {
    return desired_size;
  }
}

That’s nothing more than a trivial implementation of ceil(log2(n)), however I tried to implement in the most transparent, cross-platform way possible.

Now I changed the realloc line referenced above to read:

s = (jschar *) JS_realloc(cx, ls, js_GetNextSize((ln + rn + 1) * sizeof(jschar)));

All reallocs will be to the next power of 2 of the requested size — if a realloc demands 37 bytes, it’ll allocate 64 bytes, and if it demands 7 bytes then it’ll allocate 8 — trading a relatively small amount of memory usage for processing time and memory bandwidth.

Re-running the now shows my custom build of Firefox completing 131,072 iterations in 60 to 90ms (versus 1984ms before, so it completed it in less than 1/20th the time of the already speedy Firefox. Internet Explorer would take minutes to perform that many iterations). Firefox memory usage has not changed to a measurable degree, even during heavy browsing.

This change — allocating a little extra space to accommodate realloc calls — is likely the technique used in Opera 9, as this turbo version of Firefox was neck and neck with the speedy Opera 9 on this bit of functionality.

Now to delve through other parts of the excellent JavaScript code…