EnumSet and EnumMap

February 14, 2017

Alan Laser

 

This article discusses java.util.EnumSet and java.util.EnumMap from Java’s standard libraries.

What are they?

EnumSet and EnumMap are compact, efficient implementations of the Set and Map interfaces. They have the constraint that their elements/keys come from a single enum type.

Like HashSet and HashMap, they are modifiable.

In contrast to HashSet, EnumSet:

  • Consumes less memory, usually.
  • Is faster at all the things a Set can do, usually.
  • Iterates over elements in a predictable order (the declaration order of the element type’s enum constants).
  • Rejects null elements.

In contrast to HashMap, EnumMap:

  • Consumes less memory, usually.
  • Is faster at all the things a Map can do, usually.
  • Iterates over entries in a predictable order (the declaration order of the key type’s enum constants).
  • Rejects null keys.

If you’re wondering how this is possible, I encourage you to look at the source code:

  • EnumSet: A bit vector of the ordinals of the elements in the Set. This is the abstract superclass of RegularEnumSet and JumboEnumSet.
  • RegularEnumSet: An EnumSet whose bit vector is a single primitive long, which is enough to handle any enum type with 64 or fewer constants. (A minimal sketch of the bit-vector idea follows this list.)
  • JumboEnumSet: An EnumSet whose bit vector is a long[] array with however many slots the enum type needs: two slots for 65 to 128 constants, three slots for 129 to 192 constants, and so on.
  • EnumMap: A flat array of the Map’s values, indexed by the ordinals of their keys.
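
To illustrate the bit-vector idea, here is a minimal standalone sketch (not the JDK’s code) of how a set of enum constants can be packed into a single primitive long, which works for any enum type with 64 or fewer constants:

import java.time.Month;

class BitVectorSketch {
  public static void main(String[] args) {
    long elements = 0L;                               // the empty set

    elements |= 1L << Month.MARCH.ordinal();          // add MARCH
    elements |= 1L << Month.JULY.ordinal();           // add JULY

    boolean hasJuly =
        (elements & (1L << Month.JULY.ordinal())) != 0;
    System.out.println(hasJuly);                      // true

    elements &= ~(1L << Month.MARCH.ordinal());       // remove MARCH
    System.out.println(Long.bitCount(elements));      // 1 (the set's size)
  }
}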

EnumSet and EnumMap cheat! They use privileged code like this:


/**
* Returns all of the values comprising E.
* The result is uncloned, cached, and shared by all callers.
*/
private static <E extends Enum<E>> E[] getUniverse(Class<E> elementType) {
return SharedSecrets.getJavaLangAccess()
                    .getEnumConstantsShared(elementType);
}
                    

If you want all the Month constants, you might call Month.values(), giving you a Month[] array. There is a single backing array instance of those Month constants living in memory somewhere (a private field in the Class object for Month), but it wouldn’t be safe to pass that array directly to every caller of values(). Imagine if someone modified that array! Instead, values() creates a fresh clone of the array for each caller.

EnumSet and EnumMap get to skip that cloning step. They have direct access to the backing array.
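
You can observe the defensive copying yourself; each call to values() returns a distinct array, while the constants inside it are shared:

import java.time.Month;

class ValuesCloneDemo {
  public static void main(String[] args) {
    Month[] first = Month.values();
    Month[] second = Month.values();

    System.out.println(first == second);        // false: each call clones the backing array
    System.out.println(first[0] == second[0]);  // true: the enum constants themselves are shared
  }
}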

Effectively, no third-party versions of these classes can be as efficient. Third-party libraries that provide enum-specialized collections tend to delegate to EnumSet and EnumMap. It’s not that the library authors are lazy or incapable; delegating is the correct choice for them.

When should they be used?

Historically, Enum{Set,Map} were recommended as a matter of safety, taking better advantage of Java’s type system than the alternatives.

Prefer enum types and Enum{Set,Map} over int flags.

Effective Java goes into detail about this use case for Enum{Set,Map} and enum types in general. If you write a lot of Java code, then you should read that book and follow its advice.

Before enum types existed, people would declare flags as int constants. Sometimes the flags would be powers of two and combined into sets using bitwise arithmetic:


static final int OVERLAY_STREETS  = 1 << 0;
static final int OVERLAY_ELECTRIC = 1 << 1;
static final int OVERLAY_PLUMBING = 1 << 2;
static final int OVERLAY_TERRAIN  = 1 << 3;

void drawCityMap(int overlays) { ... }

drawCityMap(OVERLAY_STREETS | OVERLAY_PLUMBING);
                    

Other times the flags would start at zero and count up by one, and they would be used as array indexes:


static final int MONSTER_SLIME    = 0;
static final int MONSTER_GHOST    = 1;
static final int MONSTER_SKELETON = 2;
static final int MONSTER_GOLEM    = 3;

int[] kills = getMonstersSlain();

if (kills[MONSTER_SLIME] >= 10) { ... }
                    

These approaches got the job done for many people, but they were somewhat error-prone and difficult to maintain.

When enum types were introduced to the language, Enum{Set,Map} came with them. Together they were meant to provide better tooling for problems previously solved with int flags. We would say, “Don’t use int flags, use enum constants. Don’t use bitwise arithmetic for sets of flags, use EnumSet. Don’t use arrays for mappings of flags, use EnumMap.” This was not because the enum-based solutions were faster than int flags — they were probably slower — but because the enum-based solutions were easier to understand and implement correctly.
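
For contrast, here is roughly what the earlier int-flag examples look like when rewritten with enum types and Enum{Set,Map} (the Overlay and Monster types here are made up for illustration):

import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

class EnumFlagsExample {
  enum Overlay { STREETS, ELECTRIC, PLUMBING, TERRAIN }
  enum Monster { SLIME, GHOST, SKELETON, GOLEM }

  // The set of flags becomes a type-safe Set instead of bitwise arithmetic on ints.
  void drawCityMap(Set<Overlay> overlays) { /* ... */ }

  void example() {
    drawCityMap(EnumSet.of(Overlay.STREETS, Overlay.PLUMBING));

    // The array indexed by int constants becomes a Map keyed by enum constants.
    Map<Monster, Integer> kills = new EnumMap<>(Monster.class);
    kills.put(Monster.SLIME, 12);
    if (kills.getOrDefault(Monster.SLIME, 0) >= 10) { /* ... */ }
  }
}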

Fast forward to today, I don’t see many people using int flags anymore (though there are notable exceptions). We’ve had enum types in the language for more than a decade. We’re all using enum types here and there, we’re all using the collections framework. At this point, while Effective Java’s advice regarding Enum{Set,Map} is still valid, I think most people will never have a chance to put it into practice.

Today, we’re using enum types in the right places, but we’re forgetting about the collection types that came with them.

Prefer Enum{Set,Map} over Hash{Set,Map} as a performance optimization.

  • Prefer EnumSet over HashSet when the elements come from a single enum type.
  • Prefer EnumMap over HashMap when the keys come from a single enum type.

Should you refactor all of your existing code to use Enum{Set,Map} instead of Hash{Set,Map}? No.

Your code that uses Hash{Set,Map} isn’t wrong. Migrating to Enum{Set,Map} might make it faster. That’s it.
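
In practice the migration is usually a one-line change at the construction site; for example (DayOfWeek chosen arbitrarily):

import java.time.DayOfWeek;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class Migration {
  // Before: general-purpose hash-based collections.
  Set<DayOfWeek> closedDaysBefore = new HashSet<>();
  Map<DayOfWeek, String> openingHoursBefore = new HashMap<>();

  // After: enum-specialized equivalents with the same Set/Map contracts.
  Set<DayOfWeek> closedDays = EnumSet.noneOf(DayOfWeek.class);
  Map<DayOfWeek, String> openingHours = new EnumMap<>(DayOfWeek.class);
}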

If you’ve ever used primitive collection libraries like fastutil or Trove, then it may help to think of Enum{Set,Map} like those primitive collections. The difference is that Enum{Set,Map} are specialized for enum types, not primitive types, and you can use them without depending on any third-party libraries.

Enum{Set,Map} don’t have identical semantics to Hash{Set,Map}, so please don’t make blind, blanket replacements in your existing code.

Instead, try to remember these classes for next time. If you can make your code more efficient for free, then why not go ahead and do that, right?

If you use IntelliJ IDEA, you can have it remind you to use Enum{Set,Map} with inspections:

  • Analyze – Run inspection by name – “Set replaceable with EnumSet” or “Map replaceable with EnumMap”

…or…

  • File – Settings – Editor – Inspections – Java – Performance issues – “Set replaceable with EnumSet” or “Map replaceable with EnumMap”

SonarQube can also remind you to use Enum{Set,Map}:

  • S1641: “Sets with elements that are enum values should be replaced with EnumSet”
  • S1640: “Maps with keys that are enum values should be replaced with EnumMap”

For immutable versions of Enum{Set,Map}, see Sets.immutableEnumSet and Maps.immutableEnumMap from Guava (and their collector counterparts, Sets.toImmutableEnumSet and Maps.toImmutableEnumMap).

If you don’t want to use Guava, then wrap the modifiable Enum{Set,Map} instances in Collections.unmodifiableSet(set) or Collections.unmodifiableMap(map) and throw away the direct references to the modifiable collections.
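
The wrap-and-discard pattern might look like this:

import java.time.Month;
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

class ImmutableWithoutGuava {
  static final Set<Month> SUMMER;

  static {
    EnumSet<Month> modifiable = EnumSet.of(Month.JUNE, Month.JULY, Month.AUGUST);
    SUMMER = Collections.unmodifiableSet(modifiable);
    // No direct reference to 'modifiable' escapes, so SUMMER is effectively immutable.
  }
}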

The resulting collections may be less efficient when it comes to operations like containsAll and equals than their counterparts in Guava, which may in turn be less efficient than the raw modifiable collections themselves.

Could the implementations be improved?

Since they can’t be replaced by third-party libraries, Enum{Set,Map} had better be as good as possible! They’re good already, but they could be better.

Enum{Set,Map} have missed out on potential upgrades since Java 8. New methods were added in Java 8 to Set and Map (or higher-level interfaces like Collection and Iterable). While the default implementations of those methods are correct, we could do better with overrides in Enum{Set,Map}.

This issue is tracked as JDK-8170826.

Specifically, these methods should be overridden:

  • {Regular,Jumbo}EnumSet.forEach(action)
  • {Regular,Jumbo}EnumSet.iterator().forEachRemaining(action)
  • {Regular,Jumbo}EnumSet.spliterator()
  • EnumMap.forEach(action)
  • EnumMap.{keySet,values,entrySet}().forEach(action)
  • EnumMap.{keySet,values,entrySet}().iterator().forEachRemaining(action)
  • EnumMap.{keySet,values,entrySet}().spliterator()

I put sample implementations on GitHub in case you’re curious what these overrides might look like. They’re all pretty straightforward.

Rather than walk through each implementation in detail, I’ll share some high-level observations about them.

  • The optimized forEach and forEachRemaining methods are roughly 50% better than the defaults (in terms of operations per second).
  • EnumMap.forEach(action) benefits the most, becoming twice as fast as the default implementation.
  • The iterable.forEach(action) method is popular. Optimizing it tends to affect a large audience, which increases the likelihood that the optimization (even if small) is worthwhile. (I’d claim that iterable.forEach(action) is too popular, and I’d suggest that the traditional enhanced for loop should be preferred over forEach except when the argument to forEach can be written as a method reference. That’s a topic for another discussion, though.)
  • The iterator.forEachRemaining(action) method is more important than it seems. Few people use it directly, but many people use it indirectly through streams. The default spliterator() delegates to the iterator(), and the default stream() delegates to the spliterator(). In the end, stream traversal may delegate to iterator().forEachRemaining(...). Given the popularity of streams, optimizing this method is a good idea!
  • The iterable.spliterator() method is critical when it comes to stream performance, but writing a custom Spliterator from scratch is a non-trivial task. I recommend this approach:
    • Check whether the characteristics of the default spliterator are correct for your collection (often the defaults are too conservative — for example, EnumSet’s spliterator is currently missing the ORDERED, SORTED, and NONNULL characteristics). If they’re not correct, then provide a trivial override of the spliterator that uses Spliterators.spliterator(collection, characteristics) to define the correct characteristics. (A sketch of such an override appears after this list.)
    • Don’t go further than that until you’ve read through the implementation of that spliterator, and you understand how it works, and you’re confident that you can do better. In particular, your tryAdvance(action) and trySplit() should both be better. Write a benchmark afterwards to confirm your assumptions.
  • The map.forEach(action) method is extremely popular and is almost always worth overriding. This is especially true for maps like EnumMap that create their Entry objects on demand.
  • It’s usually possible to share code across the forEach and forEachRemaining methods. If you override one, you’re already most of the way there to overriding the others.
  • I don’t think it’s worthwhile to override collection.removeIf(filter) in any of these classes. For RegularEnumSet, where it seemed most likely to be worthwhile, I couldn’t come up with a faster implementation than the default.
  • Enum{Set,Map} could provide faster hashCode() implementations than the ones they currently inherit from AbstractSet and AbstractMap, but I don’t think that would be worthwhile. In general, I don’t think optimizing the hashCode() of collections is worthwhile unless it can somehow become a constant-time (O(1)) operation, and even then it is questionable. Collection hash codes aren’t used very often.
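
As a concrete example of the “trivial override” step above, here is a user-land approximation (not JDK code): it traverses the same elements as set.spliterator(), but reports the characteristics an EnumSet could legitimately claim. Spliterators.spliterator(collection, characteristics) does the real work and adds SIZED and SUBSIZED automatically.

import java.util.EnumSet;
import java.util.Spliterator;
import java.util.Spliterators;

class EnumSetSpliterators {
  // SORTED together with a null comparator means "natural order", which for
  // enum constants is declaration order.
  static <E extends Enum<E>> Spliterator<E> spliterator(EnumSet<E> set) {
    return Spliterators.spliterator(
        set,
        Spliterator.ORDERED | Spliterator.SORTED
            | Spliterator.DISTINCT | Spliterator.NONNULL);
  }
}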

Could the APIs be improved?

The implementation-level changes I’ve described are purely beneficial. There is no downside other than a moderate increase in lines of code, and the new lines of code aren’t all that complicated. (Even if they were complicated, this is java.util! Bring on the micro-optimizations.)

Since the existing code is already so good, though, changes of this nature have limited impact. Cutting one third or one half of the execution time from an operation that’s already measured in nanoseconds is a good thing but not game-changing. I suspect that those changes will cause exactly zero users of the JDK to write their applications differently.

The more tantalizing, meaningful, and dangerous changes are the realm of the APIs.

I think that Enum{Set,Map} are chronically underused. They have a bit of a PR problem. Some developers don’t know these classes exist. Other developers know about these classes but don’t bother to reach for them when the time comes. It’s just not a priority for them. That’s totally understandable, but… There’s avoiding premature optimization and then there’s throwing away performance for no reason — performance nihilism? Maybe we can win their hearts with API-level changes.

No one should have to go out of their way to use Enum{Set,Map}. Ideally it should be easier than using Hash{Set,Map}. The EnumSet.allOf(elementType) method is a great example. If you want a Set containing all the enum constants of some type, then EnumSet.allOf(elementType) is the best solution and the easiest solution.

The high-level JDK-8145048 tracks a couple of ideas for improvements in this area. In the following sections, I expand on these ideas and discuss other API-level changes.

Add immutable Enum{Set,Map} (maybe?)

In a recent conversation on Twitter about JEP 301: Enhanced Enums, Joshua Bloch and Brian Goetz referred to theoretical immutable Enum{Set,Map} types in the JDK.

Joshua Bloch also discussed the possibility of an immutable EnumSet in Effective Java:

“The one real disadvantage of EnumSet is that it is not, as of release 1.6, possible to create an immutable EnumSet, but this will likely be remedied in an upcoming release. In the meantime, you can wrap an EnumSet with Collections.unmodifiableSet, but conciseness and performance will suffer.”

When he said “performance will suffer”, he was probably referring to the fact that certain bulk operations of EnumSet won’t execute as quickly when inside a wrapper collection (tracked as JDK-5039214). Consider RegularEnumSet.equals(object):


public boolean equals(Object o) {
    if (!(o instanceof RegularEnumSet))
        return super.equals(o);

    RegularEnumSet<?> es = (RegularEnumSet<?>)o;
    if (es.elementType != elementType)
        return elements == 0 && es.elements == 0;

    return es.elements == elements;
}
                    

It’s optimized for the case that the argument is another instance of RegularEnumSet. In that case the equality check boils down to a comparison of two primitive long values. Now that’s fast!

If the argument to equals(object) was not a RegularEnumSet but instead a Collections.unmodifiableSet wrapper, that code would fall back to its slow path.

Guava’s approach is similar to the Collections.unmodifiableSet one, although Guava does a bit better in terms of unwrapping the underlying Enum{Set,Map} and delegating to the super-fast optimized paths.

If your application deals exclusively with Guava’s immutable Enum{Set,Map} wrappers, you should get the full benefit of those optimized paths from the JDK. If you mix and match Guava’s collections with the JDK’s though, the results won’t be quite as good. (RegularEnumSet doesn’t know how to unwrap Guava’s ImmutableEnumSet, so a comparison in that direction would invoke the slow path.)

If immutable Enum{Set,Map} had full support in the JDK, however, it would not have those same limitations. RegularEnumSet and friends can be changed.

What should be done in the JDK?

I spent a long time and tested a lot of code trying to come up with an answer to this. Sadly the end result is:

I don’t know.

Personally, I’m content to use Guava for this. I’ll share some observations I made along the way.

Immutable Enum{Set,Map} won’t be faster than mutable Enum{Set,Map}.

The current versions of Enum{Set,Map} are really, really good. They’ll be even better once they override the defaults from Java 8.

Sometimes, having to support mutability comes with a tax on efficiency. I don’t think this is the case with Enum{Set,Map}. At best, immutable versions of these classes will be exactly as efficient as the mutable ones.

The more likely outcome is that immutable versions will come with a small penalty to performance by expanding the Enum{Set,Map} ecosystem.

Take RegularEnumSet.equals(object) for example. Each time we create a new type of EnumSet, are we going to change that code to add a new instanceof check for our new type? If we add the check, we make that code worse at handling everything except our new type. If we don’t add the check, we… still make that code worse! It’s less effective than it used to be; more EnumSet instances trigger the slow path.

Classes like Enum{Set,Map} have a userbase that is more sensitive to changes in performance than average users. If adding a new type causes some call site to become megamorphic, we might have thrown their carefully-crafted assumptions regarding performance out the window.

If we decide to add immutable Enum{Set,Map}, we should do so for reasons unrelated to performance.

As an exception to the rule, an immutable EnumSet containing all constants of a single enum type would be really fast.

RegularEnumSet sets such a high bar for efficiency. There is almost no wiggle room in Set operations like contains(element) for anyone else to be faster. Here’s the source code for RegularEnumSet.contains(element):


public boolean contains(Object e) {
    if (e == null)
        return false;
    Class<?> eClass = e.getClass();
    if (eClass != elementType && eClass.getSuperclass() != elementType)
        return false;

    return (elements & (1L << ((Enum<?>)e).ordinal())) != 0;
}
                    

If you can’t do contains(element) faster than that, you’ve already lost. Your EnumSet is probably worthless.

There is a worthy contender, which I’ll call FullEnumSet. It is an EnumSet that (always) contains every constant of a single enum type. Here is one way to write that class:



import java.util.function.Consumer;
import java.util.function.Predicate;

class FullEnumSet<E extends Enum<E>> extends EnumSet<E> {

  // TODO: Add a static factory method somewhere.
  FullEnumSet(Class<E> elementType, Enum<?>[] universe) {
    super(elementType, universe);
  }

  @Override
  @SuppressWarnings("unchecked")
  public Iterator<E> iterator() {
    // TODO: Avoid calling Arrays.asList.
    //       The iterator class can be shared and used directly.
    return Arrays.asList((E[]) universe).iterator();
  }

  @Override
  public Spliterator<E> spliterator() {
    return Spliterators.spliterator(
        universe,
        Spliterator.ORDERED |
            Spliterator.SORTED |
            Spliterator.IMMUTABLE |
            Spliterator.NONNULL |
            Spliterator.DISTINCT);
  }

  @Override
  public int size() {
    return universe.length;
  }

  @Override
  public boolean contains(Object e) {
    if (e == null)
      return false;

    Class<?> eClass = e.getClass();
    return eClass == elementType || eClass.getSuperclass() == elementType;
  }

  @Override
  public boolean containsAll(Collection<?> c) {
    if (!(c instanceof EnumSet))
      return super.containsAll(c);

    EnumSet<?> es = (EnumSet<?>) c;
    return es.elementType == elementType || es.isEmpty();
  }

  @Override
  @SuppressWarnings("unchecked")
  public void forEach(Consumer<? super E> action) {
    int i = 0, n = universe.length;
    if (i >= n) {
      Objects.requireNonNull(action);
      return;
    }
    do action.accept((E) universe[i]);
    while (++i < n);
  }

  @Override void addAll()               {throw uoe();}
  @Override void addRange(E from, E to) {throw uoe();}
  @Override void complement()           {throw uoe();}

  @Override public boolean add(E e)                          {throw uoe();}
  @Override public boolean addAll(Collection<? extends E> c) {throw uoe();}
  @Override public void    clear()                           {throw uoe();}
  @Override public boolean remove(Object e)                  {throw uoe();}
  @Override public boolean removeAll(Collection<?> c)        {throw uoe();}
  @Override public boolean removeIf(Predicate<? super E> f)  {throw uoe();}
  @Override public boolean retainAll(Collection<?> c)        {throw uoe();}

  private static UnsupportedOperationException uoe() {
    return new UnsupportedOperationException();
  }

  // TODO: Figure out serialization.
  //       Serialization should preserve these qualities:
  //         - Immutable
  //         - Full
  //         - Singleton?
  //       Maybe it's a bad idea to extend EnumSet?
  private static final long serialVersionUID = 0;
}
                    

FullEnumSet has many desirable properties. Of note:

  • contains(element) only needs to check the type of the argument to know whether it’s a member of the set.
  • containsAll(collection) is extremely fast when the argument is an EnumSet (of any kind); it boils down to comparing the element types of the two sets. It follows that equals(object) is just as fast in that case, since equals delegates the hard work to containsAll.
  • Since all the elements are contained in one flat array with no empty spaces, conditions are ideal for iterating and for splitting (splitting efficiency is important in the context of parallel streams).
  • It beats RegularEnumSet in all important metrics:
    • Query speed (contains(element), etc.)
    • Iteration speed
    • Space consumed

Asking for the full set of enum constants of some type is a very common operation. See: every user of values(), elementType.getEnumConstants(), and EnumSet.allOf(elementType). I bet the vast majority of those users do not modify (their copy of) that set of constants. A class that is specifically tailored to that use case has a good chance of being worthwhile.

Since it’s immutable, the FullEnumSet of each enum type could be a lazy-initialized singleton.
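
One way to get that lazily-initialized per-type singleton is to cache it with ClassValue; the sketch below uses EnumSet.allOf as a stand-in for constructing the hypothetical FullEnumSet:

import java.util.EnumSet;

class FullEnumSets {
  // ClassValue lazily computes and caches one value per Class object.
  private static final ClassValue<EnumSet<?>> CACHE =
      new ClassValue<EnumSet<?>>() {
        @Override
        protected EnumSet<?> computeValue(Class<?> type) {
          @SuppressWarnings({"unchecked", "rawtypes"})
          EnumSet<?> full = EnumSet.allOf((Class) type); // stand-in for new FullEnumSet(...)
          return full;
        }
      };

  @SuppressWarnings("unchecked")
  static <E extends Enum<E>> EnumSet<E> full(Class<E> elementType) {
    return (EnumSet<E>) CACHE.get(elementType);
  }
}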

Should immutable Enum{Set,Map} reuse existing code, or should they be rewritten from scratch?

As I said earlier, the immutable versions of these classes aren’t going to be any faster. If they’re built from scratch, that code is going to look near-identical to the existing code. There would be a painful amount of copy and pasting, and I would not envy the people responsible for maintaining that code in the future.

Suppose we want to reuse the existing code. I see two general approaches:

  1. Do what Guava did, basically. Create unmodifiable wrappers around modifiable Enum{Set,Map}. Both the wrappers and the modifiable collections should be able to unwrap intelligently to take advantage of the existing optimizations for particular Enum{Set,Map} types (as in RegularEnumSet.equals(object)).
  2. Extend the modifiable Enum{Set,Map} classes with new classes that override modifier methods to throw UnsupportedOperationException. Optimizations that sniff for particular Enum{Set,Map} types (as in RegularEnumSet.equals(object)) remain exactly as effective as before without changes.

Of those two, I prefer the Guava-like approach. Extending the existing classes raises some difficult questions about the public API, particularly with respect to serialization.

What’s the public API for immutable Enum{Set,Map}? What’s the immutable version of EnumSet.of(e1, e2, e3)?

Here’s where I gave up.

  • Should we add public java.util.ImmutableEnum{Set,Map} classes?
  • If not, where do we put the factory methods, and what do we name them? EnumSet.immutableOf(e1, e2, e3)? EnumSet.immutableAllOf(Month.class)? Yuck! (Clever synonyms like “having” and “universeOf” might be even worse.)
  • Are the new classes instances of Enum{Set,Map} or do they exist in an unrelated class hierarchy?
  • If the new classes do extend Enum{Set,Map}, how is serialization affected? Do we add an “isImmutable” bit to the current serialized forms? Can that be done without breaking backwards compatibility?

Good luck to whoever has to produce the final answers to those questions.

That’s enough about this topic. Let’s move on.

Add factory methods

JDK-8145048 mentions the possibility of adding factory methods in Enum{Set,Map} to align them with Java 9’s Set and Map factories. EnumSet already has a varargs EnumSet.of(...) factory method, but EnumMap has nothing like that.

It would be nice to be able to declare EnumMap instances like this, for some reasonable number of key-value pairs:


Map<DayOfWeek, String> dayNames =
    EnumMap.of(
        DayOfWeek.MONDAY,    "lunes",
        DayOfWeek.TUESDAY,   "martes",
        DayOfWeek.WEDNESDAY, "miércoles",
        DayOfWeek.THURSDAY,  "jueves",
        DayOfWeek.FRIDAY,    "viernes",
        DayOfWeek.SATURDAY,  "sábado",
        DayOfWeek.SUNDAY,    "domingo");
                    

Users could use EnumMap’s copy constructor in conjunction with Java 9’s Map factory methods to achieve the same result less efficiently…


Map<DayOfWeek, String> dayNames =
    new EnumMap<>(
        Map.of(
            DayOfWeek.MONDAY,    "lunes",
            DayOfWeek.TUESDAY,   "martes",
            DayOfWeek.WEDNESDAY, "miércoles",
            DayOfWeek.THURSDAY,  "jueves",
            DayOfWeek.FRIDAY,    "viernes",
            DayOfWeek.SATURDAY,  "sábado",
            DayOfWeek.SUNDAY,    "domingo"));
                    

…but the more we give up efficiency like that, the less EnumMap makes sense in the first place. A reasonable person might start to question why they should bother with EnumMap at all — just get rid of the new EnumMap<>(...) wrapper and use Map.of(...) directly.

Speaking of that EnumMap(Map) copy constructor, the fact that it may throw IllegalArgumentException when provided an empty Map leads people to use this pattern instead:


Map<DayOfWeek, String> copy = new EnumMap<>(DayOfWeek.class);
copy.putAll(otherMap);
                    

We could give them a shortcut:


Map<DayOfWeek, String> copy = new EnumMap<>(DayOfWeek.class, otherMap);
                    

Similarly, to avoid an IllegalArgumentException from EnumSet.copyOf(collection), I see code like this:


Set<Month> copy = EnumSet.noneOf(Month.class);
copy.addAll(otherCollection);
                    

We could give them a shortcut too:


Set<Month> copy = EnumSet.copyOf(Month.class, otherCollection);
                    

Existing code may define mappings from enum constants to values as standalone functions. Maybe the users of that code would like to view those (function-based) mappings as Map objects.

To that end, we could give people the means to generate an EnumMap from a Function:


Locale locale = Locale.forLanguageTag("es-MX");

Map<DayOfWeek, String> dayNames =
    EnumMap.map(DayOfWeek.class,
                day -> day.getDisplayName(TextStyle.FULL, locale));

// We could interpret the function returning null to mean that the
// key is not present.  That would allow this method to support
// more than the "every constant is a key" use case while dropping
// support for the "there may be present null values" use case,
// which is probably a good trade.
                    

We could provide a similar factory method for EnumSet, accepting a Predicate instead of a Function:


Set<Month> shortMonths =
    EnumSet.filter(Month.class,
                   month -> month.minLength() < 31);
                    

This functionality could be achieved less efficiently and more verbosely with streams. Again, the more we give up efficiency like that, the less sense it makes to use Enum{Set,Map} in the first place. I acknowledge that there is a cost to making API-level changes like the ones I’m discussing, but I feel that we are solidly in the “too little API-level support for Enum{Set,Map}” part of the spectrum and not even close to approaching the opposite “API bloat” end.

I don’t mean to belittle streams. There should also be more support for Enum{Set,Map} in the stream API.

Add collectors

Code written for Java 8+ will often produce collections using streams and collectors rather than invoking collection constructors or factory methods directly. I don’t think it would be outlandish to estimate that one third of collections are produced by collectors. Some of these collections will be (or could be) Enum{Set,Map}, and more could be done to serve that use case.

Collectors with these signatures should exist somewhere in the JDK:


public static <T extends Enum<T>>
Collector<T, ?, EnumSet<T>> toEnumSet(
    Class<T> elementType)

public static <T, K extends Enum<K>, U>
Collector<T, ?, EnumMap<K, U>> toEnumMap(
    Class<K> keyType,
    Function<? super T, ? extends K> keyMapper,
    Function<? super T, ? extends U> valueMapper)

public static <T, K extends Enum<K>, U>
Collector<T, ?, EnumMap<K, U>> toEnumMap(
    Class<K> keyType,
    Function<? super T, ? extends K> keyMapper,
    Function<? super T, ? extends U> valueMapper,
    BinaryOperator<U> mergeFunction)
                    

Similar collectors can be obtained from the existing collector factories in the Collectors class (specifically toCollection(collectionSupplier) and toMap(keyMapper, valueMapper, mergeFunction, mapSupplier)) or by using Collector.of(...), but that requires a little more effort on the users’ part, adding a bit of extra friction to using Enum{Set,Map} that we don’t need.
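
For reference, this is what those existing factories look like in use today (Month and the mapping functions here are arbitrary):

import java.time.Month;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TodayCollectors {
  public static void main(String[] args) {
    EnumSet<Month> shortMonths = Stream.of(Month.values())
        .filter(m -> m.minLength() < 31)
        .collect(Collectors.toCollection(() -> EnumSet.noneOf(Month.class)));

    EnumMap<Month, Integer> minLengths = Stream.of(Month.values())
        .collect(Collectors.toMap(
            m -> m,                        // keyMapper
            Month::minLength,              // valueMapper
            (a, b) -> a,                   // mergeFunction (keys are unique here)
            () -> new EnumMap<>(Month.class)));

    System.out.println(shortMonths);
    System.out.println(minLengths);
  }
}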

I referenced these collectors from Guava earlier in this article: Sets.toImmutableEnumSet() and Maps.toImmutableEnumMap(keyMapper, valueMapper).

They do not require the Class object argument, making them easier to use than the collectors that I proposed. The reason the Guava collectors can do this is that they produce ImmutableSet and ImmutableMap, not EnumSet and EnumMap. One cannot create an Enum{Set,Map} instance without having the Class object for that enum type. In order to have a collector that reliably produces Enum{Set,Map} (even when the stream contains zero input elements to grab the Class object from), the Class object must be provided up front.

We could provide similar collectors in the JDK that would produce immutable Set and Map instances. For streams with no elements, the collectors would produce Collections.emptySet() or Collections.emptyMap(). For streams with at least one element, the collectors would produce an Enum{Set,Map} instance wrapped by Collections.unmodifiable{Set,Map}.

The signatures would look like this:


public static <T extends Enum<T>>
Collector<T, ?, Set<T>> toImmutableEnumSet()

public static <T, K extends Enum<K>, U>
Collector<T, ?, Map<K, U>> toImmutableEnumMap(
    Function<? super T, ? extends K> keyMapper,
    Function<? super T, ? extends U> valueMapper)

public static <T, K extends Enum<K>, U>
Collector<T, ?, Map<K, U>> toImmutableEnumMap(
    Function<? super T, ? extends K> keyMapper,
    Function<? super T, ? extends U> valueMapper,
    BinaryOperator<U> mergeFunction)
                    

I’m not sure that those collectors are worthwhile. I might never recommend them over their counterparts in Guava.

The StreamEx library also provides a couple of interesting enum-specialized collectors, such as MoreCollectors.toEnumSet(elementType).

They’re interesting because they are potentially short-circuiting. With MoreCollectors.toEnumSet(elementType), when the collector can determine that it has encountered all of the elements of that enum type (which is easy — the set of already-collected elements can be compared to EnumSet.allOf(elementType)), it stops collecting. These collectors may be well-suited for streams having a huge number of elements (or having elements that are expensive to compute) mapping to a relatively small set of enum constants.
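
Usage might look something like the sketch below. MoreCollectors.toEnumSet(elementType) is the StreamEx collector named above; the rest of the example, including collecting through a StreamEx stream (which, as I understand it, is what lets the short-circuiting take effect), is my own assumption:

import java.time.Month;
import java.util.EnumSet;
import java.util.stream.Stream;

import one.util.streamex.MoreCollectors;
import one.util.streamex.StreamEx;

class ShortCircuitSketch {
  public static void main(String[] args) {
    // A large stream that maps down to only twelve distinct constants.
    Stream<Month> source =
        Stream.generate(() -> Month.of(1 + (int) (Math.random() * 12))).limit(1_000_000);

    // Collected through StreamEx, the collector can stop consuming input
    // once every Month constant has been seen.
    EnumSet<Month> seen = StreamEx.of(source).collect(MoreCollectors.toEnumSet(Month.class));
    System.out.println(seen);
  }
}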

I don’t know how feasible it is to port these StreamEx collectors to the JDK. As I understand it, the concept of short-circuiting collectors is not supported by the JDK. Adding support may necessitate other changes to the stream and collector APIs.

Be navigable? (No)

Over the years, many people have suggested that Enum{Set,Map} should implement the NavigableSet and NavigableMap interfaces. Every enum type is Comparable, so it’s technically possible. Why not?

I think the Navigable{Set,Map} interfaces are a poor fit for Enum{Set,Map}.

Those interfaces are huge! Implementing Navigable{Set,Map} would bloat the size of Enum{Set,Map} by 2-4x (in terms of lines of code). It would distract them from their core focus and strengths. Supporting the navigable API would most likely come with a non-zero penalty to runtime performance.

Have you ever looked closely at the specified behavior of methods like subSet and subMap, specifically when they might throw IllegalArgumentException? Those contracts impose a great deal of complexity for what seems like undesirable behavior. Enum{Set,Map} could take a stance on those methods similar to Guava’s ImmutableSortedSet and ImmutableSortedMap: acknowledge the contract of the interface but do something else that is more reasonable instead…

I say forget about it. If you want navigable collections, use TreeSet and TreeMap (or their thread-safe cousins, ConcurrentSkipListSet and ConcurrentSkipListMap). The cross-section of people who need the navigable API and the efficiency of enum-specialized collections must be very small.

There are few cases where the Comparable nature of enum types comes into play at all. In practice, I expect that the ordering of most enum constants is arbitrary (with respect to intended behavior).

I’ll go further than that; I think that making all enum types Comparable in the first place was a mistake.

  • Which ordering of Collector.Characteristics is “natural”, [CONCURRENT,UNORDERED] or [UNORDERED,CONCURRENT]?
  • Which is the “greater” Thread.State, WAITING or TIMED_WAITING?
  • FileVisitOption.FOLLOW_LINKS is “comparable” — to what? (There is no other FileVisitOption.)
  • How many instances of RoundingMode are in the “range” from FLOOR to CEILING?
    
    import java.math.RoundingMode;
    import java.util.EnumSet;
    import java.util.Set;
    
    class RangeTest {
      public static void main(String[] args) {
        Set<RoundingMode> range =
            EnumSet.range(RoundingMode.FLOOR,
                          RoundingMode.CEILING);
        System.out.println(range.size());
      }
    }
    
    // java.lang.IllegalArgumentException: FLOOR > CEILING
                                

There are other enum types where questions like that actually make sense, and those should be Comparable.

  • Is Month.JANUARY “before” Month.FEBRUARY? Yes.
  • Is TimeUnit.HOURS “larger” than TimeUnit.MINUTES? Yes.

Implementing Comparable or not should have been a choice for authors of individual enum types. To serve people who really did want to sort enum constants by declaration order for whatever reason, we could have automatically provided a static Comparator from each enum type:


Comparator<JDBCType> c = JDBCType.declarationOrder();
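
No such declarationOrder() method exists, but the equivalent comparator is easy to build today with what the JDK already provides (JDBCType chosen arbitrarily):

import java.sql.JDBCType;
import java.util.Comparator;

class DeclarationOrder {
  // Compares by ordinal, i.e. by declaration order. (For enums, which are
  // Comparable, this matches their natural order.)
  static final Comparator<JDBCType> BY_DECLARATION =
      Comparator.comparingInt(JDBCType::ordinal);
}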
                    

It’s too late for that now. Let’s not double down on the original mistake by making Enum{Set,Map} navigable.

Conclusion

EnumSet and EnumMap are cool collections, and you should use them!

They’re already great, but they can become even better with changes to their private implementation details. I propose some ideas here. If you want to find out what happens in the JDK, the changes (if there are any) should be noted in JDK-8170826.

API-level changes are warranted as well. New factory methods and collectors would make it easier to obtain instances of Enum{Set,Map}, and immutable Enum{Set,Map} could be better-supported. I propose some ideas here, but if there are any actual changes made then they should be noted in JDK-8145048.

Framework Benchmarks Round 13

November 16, 2016

Nate Brady

Round 13 of the ongoing Web Framework Benchmarks project is here! The project now features 230 framework implementations (of our JSON serialization test) and includes new entrants on platforms as diverse as Kotlin and Qt. Yes, that Qt. We also congratulate the ASP.NET team for the most dramatic performance improvement we’ve ever seen, making ASP.NET Core a top performer.

View Round 13 results.

The large filters panel on our results web site is a testament to the ever-broadening spectrum of options for web developers. What a great time to be building web apps! A great diversity of frameworks means there are likely many options that provide high performance while meeting your language and productivity requirements.

Good fortunes

As the previous round—Round 12—was wrapping up, we were unfortunately rushed as the project’s physical hardware environment was being decommissioned. But good fortune was just around the corner, thanks to the lucky number 13!

New hardware and cloud environments

For Round 13, we have all new test environments, for both physical hardware and the virtualized public cloud.

Microsoft has provided the project with Azure credits, so starting with Round 13, the cloud environment is on Azure D3v2 instances. Previous rounds’ cloud tests were run on AWS.

Meanwhile, ServerCentral has provided the project a trio of physical servers in one of their development lab environments with 10 gigabit Ethernet. Starting with Round 13, the physical hardware environment is composed of a Dell R910 application server (4x 10-Core E7-4850 CPUs) and a Dell R420 database server (2x 4-Core E5-2406 CPUs).

We’d like to extend huge thanks to ServerCentral and Microsoft for generously supporting the project!

We recognize that as a result of these changes, Round 13 is not easy to directly compare to Round 12. Although changing the test environments was not intentional, it was necessary. We believe the results are still as valuable as ever. An upside of this environment diversity is visibility into the ways various frameworks and platforms work with the myriad variables of cores, clock speed, and virtualization technologies. For example, our new physical application server has twice as many HT cores as the previous environment, but the CPUs are older, so there is an interesting balance of higher concurrency but potentially lower throughput. In aggregate, the Round 13 results on physical hardware are generally lower due to the older CPUs, all else being equal.

Many fixes to long-broken tests

Along with the addition of new frameworks, Round 13 also marks a sizeable decrease in the number of existing framework tests that have failed to execute properly in previous rounds. This is largely the result of a considerable community effort over the past few months to identify and fix dozens of frameworks, some of which we haven’t been able to successfully test since 2014.

Continuous benchmarking

Round 13 is the first round conducted with what we’re calling Continuous Benchmarking. Continuous Benchmarking is the notion of setting up the test environment to automatically reset to a clean state, pull the latest from the source repository, prepare the environment, execute the test suite, deliver results, and repeat.

There are many benefits of Continuous Benchmarking. For example:

  • At any given time, we can grab the most recent results and mark them as a preview or final for an official Round. This should allow us to accelerate the delivery of Rounds.
  • With some additional work, we will be able to capture and share results as they are made available. This should give participants in the project much quicker insight into how their performance tuning efforts are playing out in our test environment. Think of it as continuous integration but for benchmark results. Our long-term goal is to provide a results viewer that plots performance results over time.
  • Any changes that break the test environment as a whole or a specific framework’s test implementation should be visible much earlier. Prior to Continuous Benchmarking, breaking changes were often not detected until a preview run.

Microsoft’s ASP.NET Core

We consider ourselves very fortunate that our project has received the attention that it has from the web framework community. It has become a source of a great pride for our team. Among every reaction and piece of feedback we’ve received, our very favorite kind is when a framework maintainer recognizes a performance deficiency highlighted by this project and then works to improve that performance. We love this because we think of it as a small way of improving performance of the whole web, and we are passionate about performance.

Round 13 is especially notable for us because we are honored that Microsoft has made it a priority to improve ASP.NET’s performance in these benchmarks, and in so doing, improve the performance of all applications built on ASP.NET.

Thanks to Microsoft’s herculean performance tuning effort, ASP.NET—in the new cross-platform friendly form of ASP.NET Core—is now a top performer in our Plaintext test, making it among the fastest platforms at the fundamentals of web request routing. The degree of improvement is absolutely astonishing, going from 2,120 requests per second on Mono in Round 11 to 1,822,366 requests per second on ASP.NET Core in Round 13. That’s an approximately 85,900% improvement, and that doesn’t even account for Round 11’s hardware being faster than our new hardware. That is not a typo, it’s 859 times faster! We believe this to be the most significant performance improvement that this project has ever seen.

By delivering cross-platform performance alongside their development toolset, Microsoft has made C# and ASP.NET one of the most interesting web development platforms available. We have a brief message to those developers who have avoided Microsoft’s web stack thinking it’s “slow” or that it’s for Windows only: ASP.NET Core is now wicked sick fast at the fundamentals and is improving in our other tests. Oh, and of course we’re running it on Linux. You may be thinking about the Microsoft of 10 years ago.

The best part, in our opinion, is that Microsoft is making performance a long-term priority. There is room to improve on our other more complex tests such as JSON serialization and Fortunes (which exercises database connectivity, data structures, encoding of unsafe text, and templating). Microsoft is taking on those challenges and will continue to improve the performance of its platform.

Our Plaintext test has historically been a playground for the ultra-fast Netty platform and several lesser-known/exotic platforms. (To be clear, there is nothing wrong with being exotic! We love them too!) Microsoft’s tuning work has brought a mainstream platform into the frontrunners. That achievement stands on its own. We congratulate the Microsoft .NET team for a massive performance improvement and for making ASP.NET Core a mainstream option that has the performance characteristics of an acutely-tuned fringe platform. It’s like an F1 car that anyone can drive. We should all be so lucky.

 


 

I want to combine the elements of multiple Stream instances into a single Stream. What’s the best way to do this?

This article compares a few different solutions.

Stream.concat(a, b)

The JDK provides Stream.concat(a, b) for concatenating two streams.

void exampleConcatTwo() {
  Stream<String> a = Stream.of("one", "two");
  Stream<String> b = Stream.of("three", "four");
  Stream<String> out = Stream.concat(a, b);
  out.forEach(System.out::println);
  // Output:
  // one
  // two
  // three
  // four
}

What if we have more than two streams?

We could use Stream.concat(a, b) multiple times. With three streams we could write Stream.concat(Stream.concat(a, b), c).

To me that approach is depressing at three streams, and it rapidly gets worse as we add more streams.

Reduce

Alternatively, we can use reduce to perform the multiple incantations of Stream.concat(a, b) for us. The code adapts elegantly to handle any number of input streams.

void exampleReduce() {
  Stream<String> a = Stream.of("one", "two");
  Stream<String> b = Stream.of("three", "four");
  Stream<String> c = Stream.of("five", "six");
  Stream<String> out = Stream.of(a, b, c)
      .reduce(Stream::concat)
      .orElseGet(Stream::empty);
  out.forEach(System.out::println);
  // Output:
  // one
  // two
  // three
  // four
  // five
  // six
}

Be careful using this pattern! Note the warning in the documentation of Stream.concat(a, b):

Use caution when constructing streams from repeated concatenation. Accessing an element of a deeply concatenated stream can result in deep call chains, or even StackOverflowError.

It takes quite a few input streams to trigger this problem, but it is trivial to demonstrate:

void exampleStackOverflow() {
  List<Stream<String>> inputs = new AbstractList<Stream<String>>() {
    @Override
    public Stream<String> get(int index) {
      return Stream.of("one", "two");
    }

    @Override
    public int size() {
      return 1_000_000; // try changing this number
    }
  };
  Stream<String> out = inputs.stream()
      .reduce(Stream::concat)
      .orElseGet(Stream::empty);
  long count = out.count(); // probably throws
  System.out.println("count: " + count); // probably never reached
}

On my workstation, this method throws StackOverflowError after several seconds of churning.

What’s going on here?

We can think of the calls to Stream.concat(a, b) as forming a binary tree. At the root is the concatenation of all the input streams. At the leaves are the individual input streams. Let’s look at the trees for up to five input streams as formed by our reduce operation.

Two streams:

concat(a, b)

Three streams:

concat(concat(a, b), c)

Four streams:

concat(concat(concat(a, b), c), d)

Five streams:

concat(concat(concat(concat(a, b), c), d), e)

The trees are perfectly unbalanced! Each additional input stream adds one layer of depth to the tree and one layer of indirection to reach all the other streams. This can have a noticeable negative impact on performance. With enough layers of indirection we’ll see a StackOverflowError.

Balance

If we’re worried that we’ll concatenate a large number of streams and run into the aforementioned problems, we can balance the tree. This is as if we’re optimizing an O(n) algorithm into an O(log n) one. We won’t totally eliminate the possibility of StackOverflowError, and there may be other approaches that perform even better, but this should be quite an improvement over the previous solution.

void exampleBalance() {
  Stream<String> a = Stream.of("one", "two");
  Stream<String> b = Stream.of("three", "four");
  Stream<String> c = Stream.of("five", "six");
  Stream<String> out = concat(a, b, c);
  out.forEach(System.out::println);
  // Output:
  // one
  // two
  // three
  // four
  // five
  // six
}

@SafeVarargs
static <T> Stream<T> concat(Stream<T>... in) {
  return concat(in, 0, in.length);
}

static <T> Stream<T> concat(Stream<T>[] in, int low, int high) {
  switch (high - low) {
    case 0: return Stream.empty();
    case 1: return in[low];
    default:
      int mid = (low + high) >>> 1;
      Stream<T> left = concat(in, low, mid);
      Stream<T> right = concat(in, mid, high);
      return Stream.concat(left, right);
  }
}

Flatmap

There is another way to concatenate streams that is built into the JDK, and it does not involve Stream.concat(a, b) at all. It is flatMap.

void exampleFlatMap() {
  Stream<String> a = Stream.of("one", "two");
  Stream<String> b = Stream.of("three", "four");
  Stream<String> c = Stream.of("five", "six");
  Stream<String> out = Stream.of(a, b, c).flatMap(s -> s);
  out.forEach(System.out::println);
  // Output:
  // one
  // two
  // three
  // four
  // five
  // six
}

This generally outperforms the solutions based on Stream.concat(a, b) when each input stream contains fewer than 32 elements. Past 32 elements per stream, flatMap performs comparatively worse and worse as the element count rises.

flatMap avoids the StackOverflowError issue but it comes with its own set of quirks. For example, it interacts poorly with infinite streams. Calling findAny on the concatenated stream may cause the program to enter an infinite loop, whereas the other solutions would terminate almost immediately.

void exampleInfiniteLoop() {
  Stream<String> a = Stream.generate(() -> "one");
  Stream<String> b = Stream.generate(() -> "two");
  Stream<String> c = Stream.generate(() -> "three");
  Stream<String> out = Stream.of(a, b, c).flatMap(s -> s);
  Optional<String> any = out.findAny(); // infinite loop
  System.out.println(any); // never reached
}

(The infinite loop is an implementation detail. This could be fixed in the JDK without changing the contract of flatMap.)

Also, flatMap forces its input streams into sequential mode even if they were originally parallel. The outermost concatenated stream can still be made parallel, and we will be able to process elements from distinct input streams in parallel, but the elements of each individual input stream must all be processed sequentially.

Analysis

Let me share a few trends that I’ve noticed when dealing with streams and stream concatenation in general, having written a fair amount of code in Java 8 by now.

  • There have been maybe one dozen cases where I’ve needed to concatenate streams. That’s not all that many, so no matter how good the solution is, it’s not going to have much of an impact for me.
  • In all but one of those one dozen cases, I needed to concatenate exactly two streams, so Stream.concat(a, b) was sufficient.
  • In the remaining case, I needed to concatenate exactly three streams. I was not even close to the point where StackOverflowError would become an issue. Stream.concat(Stream.concat(a, b), c) would have worked just fine, although I went with flatMap because I felt that it was easier to read.
  • I have never needed to concatenate streams in performance-critical sections of code.
  • I use infinite streams very rarely. When I do use them, it is obvious in context that they are infinite. And so concatenating infinite streams together and then asking a question like findAny on the result is just not something that I would be tempted to do. That particular issue with flatMap seems like one that I’ll never come across.
  • I use parallel streams very rarely. I think I’ve only used them twice in production code. It is almost never the case that going parallel improves performance, and even when it might improve performance, it is unlikely that processing them in the singleton ForkJoinPool.commonPool() is how I will want to manage that work. The issue with flatMap forcing the input streams to be sequential seems very unlikely to be a real problem for me.
  • Let’s suppose that I do want to concatenate parallel streams and have them processed in parallel. If I have eight input streams on an eight core machine, and each stream has roughly the same number of elements, the fact that flatMap forces the individual streams to be sequential will not degrade performance for me at all. All eight cores will be fully utilized, each core processing one of the eight input streams. If I have seven input streams on that same machine, I will see only slightly degraded performance. With six, slightly more degraded, and so on.

What’s the takeaway from all this? Here is my advice:

For two input streams, use:
Stream.concat(a, b)

For more than two input streams, use:
Stream.of(a, b, c, ...).flatMap(s -> s)

That solution is good enough…

Overboard

…but what if we’re not satisfied with “good enough”? What if we want a solution that’s really fast no matter the size and shape of the input and doesn’t have any of the quirks of the other solutions?

It is a bit much to inline in a blog article, so take a look at StreamConcatenation.java for the source code.

This implementation is similar to Stream.concat(a, b) in that it uses a custom Spliterator, except this implementation handles any number of input streams.

It performs quite well. It does not outperform every other solution in every scenario (flatMap is generally better for very small input streams), but it never performs much worse and it scales nicely with the number and size of the input streams.

Benchmark

I wrote a JMH benchmark to compare the four solutions discussed in this article. The benchmark uses each solution to concatenate a variable number of input streams with a variable number of elements per stream, then iterates over the elements of the concatenated stream. Here is the raw JMH output from my workstation and a prettier visualization of the benchmark results.

 


 

Mangling JSON numbers

July 5, 2016

Alan Laser

If we have a long (64-bit integer) that we serialize into JSON, we might be in trouble if JavaScript consumes that JSON. JavaScript has the equivalent of double (64-bit floating point) for its numbers, and double cannot represent the same set of numbers as long. If we are not careful, our long is mangled in transit.

Consider 2^53 + 1. We can store that number in a long but not a double. Above 2^53, double does not have the bits required to represent every integer, creating gaps between the integers it can represent. 2^53 + 1 is the first integer to fall in one of these gaps. We can store 2^53 or 2^53 + 2 in a double, but 2^53 + 1 does not fit.
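
The gap is easy to demonstrate from the Java side:

class DoublePrecisionDemo {
  public static void main(String[] args) {
    long safe = 1L << 53;      // 9007199254740992, i.e. 2^53
    long unsafe = safe + 1;    // 9007199254740993, i.e. 2^53 + 1

    System.out.println((long) (double) safe);    // 9007199254740992 (round-trips exactly)
    System.out.println((long) (double) unsafe);  // 9007199254740992 (the +1 is lost)
  }
}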

If we store 2^53 + 1 in a long and that number is meant to be precise, then we should avoid encoding it as a JSON number and sending it to a JavaScript client. The instant that client invokes JSON.parse they are doomed — they see a different number.

The JSON format does not mandate a particular number precision, but the application code on either side usually does. See also: Re: [Json] Limitations on number size?

This problem only occurs with very large numbers. Perhaps all the numbers we use are safe. Are we actually mangling our numbers? Probably not…

…but will we know? Will anything blow up, or will our application be silently, subtly wrong?

I suspect that when this problem does occur, it goes undetected for longer than it should. In the remainder of this article, we examine potential improvements to our handling of long.

Failing fast

We can change the way we serialize long into JSON.

When we encounter a long, we can require that the number fits into a double without losing information. If no information would be lost, we serialize the long as usual and move on. If information would be lost, we throw an exception and cause serialization to fail. We detonate immediately at the source of the error rather than letting it propagate around, doing who knows what.

Here is a utility method that can be used for this purpose:


public static void verifyLongFitsInDouble(long x) {
  double result = x;
  if (x != (long) result || x == Long.MAX_VALUE) {
    throw new IllegalArgumentException("Overflow: " + x);
  }
}

This approach appeals to me because it is unobtrusive. The check can be made in one central location, no changes to our view classes or client-side code are required, and it only throws exceptions in the specific cases where our default behavior is wrong.
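
As one possible sketch of that central location, assuming Jackson 2.x is the serializer in use (the module and class names below are mine, not part of this article):

import java.io.IOException;

import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.JsonSerializer;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializerProvider;
import com.fasterxml.jackson.databind.module.SimpleModule;

class SafeLongs {
  // The same check as above, repeated here so the sketch is self-contained.
  static void verifyLongFitsInDouble(long x) {
    double result = x;
    if (x != (long) result || x == Long.MAX_VALUE) {
      throw new IllegalArgumentException("Overflow: " + x);
    }
  }

  static ObjectMapper newSafeMapper() {
    JsonSerializer<Long> safeLong = new JsonSerializer<Long>() {
      @Override
      public void serialize(Long value, JsonGenerator gen, SerializerProvider serializers)
          throws IOException {
        verifyLongFitsInDouble(value);  // fail fast instead of silently mangling
        gen.writeNumber(value.longValue());
      }
    };
    SimpleModule module = new SimpleModule();
    module.addSerializer(Long.class, safeLong);
    module.addSerializer(Long.TYPE, safeLong);  // also cover primitive long fields
    return new ObjectMapper().registerModule(module);
  }
}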

A number that should be safe

Consider the number 2^62, which spelled out in base ten is 4611686018427387904. This number fits in both a long and a double. It passes our verifyLongFitsInDouble check. Theoretically we can send it from a Java server to a JavaScript client via JSON and both sides see exactly the same number.

To convince ourselves that this number is safe, we examine various representations of this number in Java and JavaScript:


// In Java
long x = 1L << 62;
System.out.println(Long.toString(x));    // 4611686018427387904
System.out.println(Double.toString(x));  // 4.6116860184273879E18
// 100000000000000000000000000000000000000000000000000000000000000
System.out.println(Long.toString(x, 2)); 
  
// In JavaScript
var x = Math.pow(2, 62);
console.log(x.toString());               // 4611686018427388000
console.log(x.toExponential());          // 4.611686018427388e+18
console.log(x.toFixed());                // 4611686018427387904
// 100000000000000000000000000000000000000000000000000000000000000
console.log(x.toString(2));

The output of x.toString() in JavaScript is suspicious. Do we really have the right number? We do, but we print it lazily.

x.toString() is similar in spirit to x.toExponential() and Double.toString(double) from Java. These algorithms essentially print significant digits, from most significant to least, until the output is unambiguously closer to this floating point number than any other floating point number. (And that is true here. The next lowest floating point number is 2^62 - 512, the next highest is 2^62 + 1024, and 4611686018427388000 is closer to 2^62 than either of those two nearby numbers.) See also: ES6 specification for ToString(Number)

x.toFixed() and the base two string give us more confidence that we have the correct number.

Verifying our assumptions with code

If 262 really is a safe number, we should be able to send it from the server to the client and back again. To verify that this number survives a round trip, we create an HTTP server with two kinds of endpoints:

  • GET endpoints that serialize a Java object into a JSON string like {"x":number}, where the number is a known constant (2^62). The number and the JSON string are printed to stdout. The response is that JSON string.
  • POST endpoints that deserialize a client-provided JSON string like {"x":number} into a Java object. The number and JSON string are printed to stdout. We hope that the number printed here is the same as the known constant (2^62) used in our GET endpoints.

Any server-side web framework or HTTP server will do. We happen to use JAX-RS in our example code.

Behavior may differ between JSON (de)serialization libraries, so we test two: Gson and Jackson.

In total the server provides four endpoints, each named after the JSON serialization library used by that endpoint:


GET   /gson
POST  /gson
GET   /jackson
POST  /jackson

In the JavaScript client, we:

  • Loop through each library-specific pair of GET/POST endpoints.
  • Make a request to the GET endpoint.
  • Use JSON.parse to deserialize the response text (a JSON string) into a JavaScript object.
  • Use JSON.stringify to serialize that JavaScript object back into a JSON string.
  • Print each of the following to the console:
    • the incoming JSON string
    • the number contained in the JavaScript object, using x.toString()
    • the number contained in the JavaScript object, using x.toFixed()
    • the outgoing JSON string
  • Make a request to the POST endpoint, providing the (re)serialized JSON string as the request body.

Here is the server-side Java code:


package test;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.gson.Gson;

import javax.ws.rs.Consumes;
import javax.ws.rs.GET;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import java.io.IOException;

@Path("/")
public final class JsonResource {

  public static final class Payload {
    public long x;
  }

  private static final long EXPECTED_NUMBER = 1L << 62;

  @GET
  @Path("gson")
  @Produces("application/json")
  public String getGson() {
    Payload object = new Payload();
    object.x = EXPECTED_NUMBER;
    String json = new Gson().toJson(object);
    System.out.println("GET   /gson     outgoing number:  "
        + object.x);
    System.out.println("GET   /gson     outgoing JSON:    " 
        + json);
    return json;
  }

  @POST
  @Path("gson")
  @Consumes("application/json")
  public void postGson(String json) {
    Payload object = new Gson().fromJson(json, Payload.class);
    System.out.println("POST  /gson     incoming JSON:    " 
        + json);
    System.out.println("POST  /gson     incoming number:  "
        + object.x);
  }

  @GET
  @Path("jackson")
  @Produces("application/json")
  public String getJackson() throws IOException {
    Payload object = new Payload();
    object.x = EXPECTED_NUMBER;
    String json = new ObjectMapper().writeValueAsString(object);
    System.out.println("GET   /jackson  outgoing number:  "
        + object.x);
    System.out.println("GET   /jackson  outgoing JSON:    "
        + json);
    return json;
  }

  @POST
  @Path("jackson")
  @Consumes("application/json")
  public void postJackson(String json) throws IOException {
    Payload object = new ObjectMapper().readValue(json, Payload.class);
    System.out.println("POST  /jackson  incoming JSON:    "
        + json);
    System.out.println("POST  /jackson  incoming number:  "
        + object.x);
  }
}

Here is the client-side JavaScript code:


[ "/gson", "/jackson" ].forEach(function(endpoint) {
  function handleResponse() {
    var incomingJson = this.responseText;
    var object = JSON.parse(incomingJson);
    var outgoingJson = JSON.stringify(object);
    console.log(endpoint + " incoming JSON: " + incomingJson);
    console.log(endpoint + " number toString: " + object.x);
    console.log(endpoint + " number toFixed: " + object.x.toFixed());
    console.log(endpoint + " outgoing JSON: " + outgoingJson);
    var post = new XMLHttpRequest();
    post.open("POST", endpoint);
    post.setRequestHeader("Content-Type", "application/json");
    post.send(outgoingJson);
  };
  var get = new XMLHttpRequest();
  get.addEventListener("load", handleResponse);
  get.open("GET", endpoint);
  get.send();
});

The results are disappointing

Here is the server-side output:


GET   /gson     outgoing number:  4611686018427387904
GET   /gson     outgoing JSON:    {"x":4611686018427387904}
POST  /gson     incoming JSON:    {"x":4611686018427388000}
POST  /gson     incoming number:  4611686018427388000
GET   /jackson  outgoing number:  4611686018427387904
GET   /jackson  outgoing JSON:    {"x":4611686018427387904}
POST  /jackson  incoming JSON:    {"x":4611686018427388000}
POST  /jackson  incoming number:  4611686018427388000

Here is the client-side output:


/gson incoming JSON: {"x":4611686018427387904}
/gson number toString: 4611686018427388000
/gson number toFixed: 4611686018427387904
/gson outgoing JSON: {"x":4611686018427388000}
/jackson incoming JSON: {"x":4611686018427387904}
/jackson number toString: 4611686018427388000
/jackson number toFixed: 4611686018427387904
/jackson outgoing JSON: {"x":4611686018427388000}

Both of our POST endpoints print the wrong number. Yuck!

We do send the correct number to JavaScript, which we can verify by looking at the output of x.toFixed() in the console. Something bad happens between when we print x.toFixed() and when we print the number out on the server.

Why is our code wrong?

Maybe there is a particular line of our own code where we can point our finger and say, “Aha! You are wrong!” Maybe it is an issue with our architecture.

There are many ways we could choose to address this problem (or not), and what follows is certainly not an exhaustive list.

“We call JSON.parse then JSON.stringify. We should echo back the original JSON string.”

This avoids the problem but is nothing like a real application. The test code is standing in for an application that gets the payload object from the server, uses it as an object throughout, then later/maybe makes a request back to the server containing some or all of the data from that object.

In practice, most applications will not even see the JSON.parse call. The call will be hidden. The front-end framework will do it, $.getJSON will do it, etc.

“We use JSON.stringify. We should write an alternative to JSON.stringify that produces an exact representation of our number.”

JSON.stringify delegates to x.toString(). If we never use JSON.stringify, and instead we use something like x.toFixed() to print numbers like this, we can avoid this problem.

This is probably infeasible in practice.

If we need to produce JSON from JavaScript, of course we expect that JSON.stringify will be involved. As with JSON.parse, most calls happen at a distance in a library rather than our own application code.

Besides, if we really plan to avoid x.toString(), we must do so everywhere. This is hopeless.

Suppose we commit to avoiding x.toString() and we have user objects that each have a numeric id field. We can no longer write Mustache or Handlebars templates like this:


<div id="user{{id}}">    {{! functionally wrong }}
  <p>ID: {{id}}</p>      {{! visually wrong }}
  <p>Name: {{name}}</p>
</div>

We can no longer write functions like this:


function updateEmailAddress(user, newEmail) {
  // Oops, we failed for user #2^62!
  var url = "/user/" + user.id + "/email";

  // Tries to update the wrong user (and fails, hopefully)
  $.post(url, { email: newEmail });
}

It is extremely unlikely that we will remember to avoid x.toString() everywhere. It is much more likely that we will forget and end up with incorrect behavior all over the place.

“We treat the number as a long literal in the POST handlers. We should treat the number as a double literal.”

If we parse the number as a double and cast it to a long, we produce the correct result in all test cases.

Such a cast should be guarded with a check similar to our verifyLongFitsInDouble(long) code from earlier. Here is a utility method that can be used for this purpose:


public static void verifyDoubleFitsInLong(double x) {
  long result = (long) x;
  // Double.compare also rejects NaN. The Long.MAX_VALUE check covers doubles
  // at or above 2^63, which saturate to Long.MAX_VALUE when cast to long.
  if (Double.compare(x, result) != 0 || result == Long.MAX_VALUE) {
    throw new IllegalArgumentException("Overflow: " + x);
  }
}
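
A deserialization helper built on that check might look like the following sketch (the parseLongViaDouble name is ours, not part of the test code):


// Parse the incoming JSON number as a double, verify it survives the cast,
// then cast to long. Double.parseDouble("4611686018427388000") yields the
// double 4.611686018427388E18, which casts to 4611686018427387904, the value
// the GET endpoint originally sent.
public static long parseLongViaDouble(String jsonNumber) {
  double value = Double.parseDouble(jsonNumber);
  verifyDoubleFitsInLong(value);
  return (long) value;
}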

What if the client really does mean to send us precisely the integer 4611686018427388000? If we parse it as a double then cast it to a long, we mangle the intended number!

Here it is worth considering who we actually talk to as we design our APIs. If we only talk to JavaScript clients, then we only receive numbers that fit in double because that is all our clients have. Oftentimes these APIs are internal and the API authors are the same people as the client code authors. In cases like that it is reasonable to make assumptions about who is calling us, even if technically some other caller could use our API, because we make no claim to support other callers.

If our API is designed to be public and usable by any client, we should document our behavior with respect to number precision. verifyLongFitsInDouble(long) and verifyDoubleFitsInLong(double) are tricky to communicate, so we may prefer a simpler rule…

“We permit some values of long outside of the range -2^53 < x < 2^53. We should reject values outside of that range even when they fit in double.”

In other words, perform a bounds check on every long number that we (de)serialize. If the absolute value of that number is less than 2^53, then we (de)serialize that number as usual; otherwise we throw an exception.

JavaScript clients may find this range familiar, with built-in constants to express its bounds: Number.MIN_SAFE_INTEGER and Number.MAX_SAFE_INTEGER.
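
A minimal sketch of that check on the Java side (the method name is our own):


// Rejects any long outside JavaScript's safe integer range,
// i.e. outside -2^53 < x < 2^53.
public static void verifySafeInteger(long x) {
  long limit = 1L << 53;  // 9007199254740992
  if (x <= -limit || x >= limit) {
    throw new IllegalArgumentException("Outside safe integer range: " + x);
  }
}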

This approach is less permissive than our verifyLongFitsInDouble(long) and verifyDoubleFitsInLong(double) utility methods from earlier. Those methods permit every number in this range and then more. Those methods permit numbers whose adjacent values are invalid, meaning the range of valid inputs is not contiguous.

Advantages of the less permissive approach include:

  • It is easier to express in documentation. verifyLongFitsInDouble(long) and verifyDoubleFitsInLong(double) would permit 2^55 + 8 but not 2^55 + 4. Understanding the reason for that is more difficult than understanding that neither of those numbers is permitted with the |x| < 2^53 approach.
  • If we are actually serializing numbers like 2^55 + 8, it is likely that we are trying to serialize nearby numbers that cannot be stored in double. Permitting the extra numbers may only mask the underlying problem: this data should not be serialized into JSON numbers.

“We encode a long as a JSON number. We should encode it as a JSON string.”

Encoding the number as a string avoids this problem.

Twitter provides string representations of its numeric ids for this reason.

This is easy to accomplish on the server. JSON serialization libraries provide a way to adopt this convention without changing the field types of our Java classes. Our Payload class keeps using long for its field, but any time the server serializes that field into JSON, it surrounds the numeric literal with quotation marks.
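
With Jackson, for example, this can be expressed as an annotation on the Payload class from the server code above, while the Java field stays a long (a sketch; Gson can achieve the same with a TypeAdapter registered for long):


import com.fasterxml.jackson.databind.annotation.JsonSerialize;
import com.fasterxml.jackson.databind.ser.std.ToStringSerializer;

public static final class Payload {
  // Still a long in Java; written as "4611686018427387904" in the JSON output.
  @JsonSerialize(using = ToStringSerializer.class)
  public long x;
}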

How viable is this approach for the client? If the number is only being used as an identifier—passed between functions as-is, compared using the === operator, used as a key in maps—then treating it as a string makes a lot of sense. If we are lucky, the client-side code is identical between the string-using and number-using versions.

If the number is used in arithmetic or passed to libraries that expect numbers, then this solution becomes less practical.

“We use JSON as the serialization format. We should use some other serialization format.”

The JSON format is not to blame for our problems, but it allows us to be sloppy.

When we use JSON we lose information about our numbers. We do not lose the values of the numbers, but we do lose the types, which tell us the precision.

A different serialization format such as Protobuf might have forced us to clarify how precise our numbers are.

“There is no problem.”

We could declare that there is no problem. Our code breaks when provided with obscenely large numbers as input, but we simply do not use numbers that large and we never will. And even though our numbers are never this large, we still want to use long in the Java code because that is convenient for us. Other Java libraries produce or consume long numbers, and we want to use those libraries without casting.

I suspect this is the solution that most people choose (conscious of that choice or not), and it is often not a bad solution. We really do not encounter this problem most of the time. There are other problems we could spend our time solving.

Numbers smaller in magnitude than 2^53 do not trigger this problem. Where are our long numbers coming from, and how likely are they to fall outside that range?

Auto-incrementing primary keys in a SQL database
Will we insert more than 9,007,199,254,740,992 rows into one table? Knowing nothing at all about our theoretical application, I will venture a guess: “No.”
Epoch millisecond timestamps
2^53 milliseconds has us covered for ±300,000 years, roughly. Are we dealing with dates outside of that range? If we are, perhaps epoch milliseconds are a poor choice for units and we should solve that problem with our units first.
Randomly-generated, unbounded long numbers
The majority of these do not fit in double. If we send these to JavaScript via JSON numbers, we will have a bad time. Are we actually doing that?
User-provided, unbounded long numbers
Most of these numbers should not trigger problems, but some will. The solution may be to add bounds checking on input, filtering out misbehaving numbers before they are used.

No matter what solution (or non-solution) we choose, we should make our choice deliberately. Being oblivious is not the answer.

Performance competition is a good thing

February 24, 2016

Nate Brady

We love this!

If you’ve not been watching the ASP.NET team’s community standups, you have missed some surprisingly transparent, interesting, and oftentimes funny updates from a major web application framework development team. Every time I watch one I marvel, this is Microsoft? Clearly we’re seeing a new Microsoft.

If you’re not watching, you would have also missed how much emphasis the ASP.NET team is putting on performance.

Recently, they reached 1.15 million plaintext requests per second from ASP.NET Core within their test environment. As Ben Adams writes in his article detailing the achievement, that is 23 times better than prior to the start of optimization work within the framework!

Big congratulations are in order. Not only for the specific achievement, or for the continued performance tuning to come, but also for what it represents: a concerted effort to make the platform provide application developers as much performance headroom as possible.

As we discussed in our previous entry, a high performance platform/framework gives application developers the freedom to build more quickly by deferring performance tuning within the application’s domain. We’ll have more to say on that in the future.

For the time being, we wanted to join the celebration of the ASP.NET team and toot our own horn a bit. We’re proud of this from our own point of view because the Framework Benchmarks project inspired Microsoft’s team to focus on performance. They could have dismissed it as unimportant, but instead they saw the value and attacked performance with conviction.

We started the Framework Benchmarks project to collect data about performance. As the project has matured, we’ve realized it has a new, perhaps even more important reason for being: encouraging both application developers and framework creators to think more about performance. We have been absolutely floored by how many GitHub-hosted projects aim to join or improve their positioning within the project. Or even win outright, though just aiming for the high-performance tier is a reasonable goal for us mere mortals.

We deeply feel that competition of this sort is a good thing. It helps motivate performance improvement across the web framework ecosystem and that in turn improves performance of thousands of web applications. It makes us tremendously happy to see so many people striving to build the best performance into their frameworks and platforms.

So congratulations to the ASP.NET team for giving performance such attention. And to everyone else doing the same in their respective frameworks. And thank you to everyone who has and continues to participate in the benchmarks project. May all your requests be fulfilled in mere milliseconds!

“It was running fine…”

In our performance consulting work, we often hear variations of the following: “Our web application was running fine with a few hundred users. Now when we run a promotion with our new partner and get a thousand users coming in at one time, it grinds to a halt.”

We’ve heard this from startup founders, product managers, development team leads, CTOs, and others who see their product gaining traction, but simultaneously see performance falling off a cliff. Often this situation is characterized as a “good problem to have” until you’re the technical person who needs to solve the problem—and quickly. User experience is suffering and it’s the worst possible time with the product taking off.

We don’t know just how often this occurs, but judging from the calls we get there are lots of anecdotal examples. So, why does this happen? Well, there are a number of technical reasons for applications suffering performance issues. Too often though, it’s the result of performance needs not being properly assessed at the start of work. The “good problem to have” mentality led to a collective blind eye to performance along the way.

Our goal in this article is to paint a realistic picture of how to think about performance early in the life of an application.

What do we mean by performance?

Web application performance is a broad discipline. Here, when we speak of performance, we are specifically referring to the speed of the application in providing responses to requests. This article is focused on the server side, but we encourage you to also look at your application holistically.

Speaking of the server, high performance encompasses the following attributes:

  • Low latency. A high-performance application will minimize latency and provide responses to user actions very quickly. Generally speaking, the quicker the better: ~0ms is superb; 100ms is good; 500ms is okay; a few seconds is bad; and several seconds may be approaching awful.
  • High throughput. A high-performance application will be able to service a large number of requests simultaneously, staying ahead of the incoming demand so that any short-term queuing of requests remains short-term. Commonly, high-throughput and low-latency go hand-in-hand.
  • Performing well across a spectrum of use-cases. A high-performance application functions well with a wide variety of request types (some cheap, some expensive). A well-performing application will not slow, block, or otherwise frustrate the fulfillment of requests based on other users’ concurrent activities. Conversely, a low-performance application might have an architectural bottleneck through which many request types flow (e.g., a single-threaded search engine).
  • Generally respecting the value of users’ time. Overall, performance is in the service of user expectations. A user knows low performance when they see it, and unfortunately, they won’t usually tell you if the performance doesn’t meet their expectations; they’ll just leave.
  • Scales with concurrency. A high-performance application provides sufficient capacity to handle a reasonable amount of concurrent usage. Handling 1 user is nothing; 5 concurrently is easy; 500 is good; 50,000 is hard.
  • Scales with data size. In many applications, network effects mean that as usage grows, so does the amount of “hot” data in play. A high-performance application is designed to perform well with a reasonable amount of hot data. A match-making system with 200 entities is trivial; 20,000 entities is good; 2,000,000 entities is hard.
  • Performs modestly complex calculations without requiring complex architecture. A high-performance application (notably, one based on a high-performance platform) will be capable of performing modestly complex algorithmic work on behalf of users without necessarily requiring the construction of a complex system architecture. (Though to be clear, no platform is infinitely performant; so at some point an algorithm’s complexity will require more complex architecture.)

What are your application’s performance needs?

It helps to start by determining whether your application needs, or will ever need, to consider performance more than superficially.
We routinely see three situations with respect to application performance needs:

  • Applications with known high-performance needs. These are applications that, for example, expect to see large data or complex algorithmic work from day one. Examples would be applications that match users based on non-trivial math, or make recommendations based on some amount of analysis of past and present behavior, or process large documents on the users’ behalf. You should go through each of the aspects in the previous section to consider your application’s performance characteristics.
  • Applications with known low-performance needs. Some applications are built for the exclusive use of a small number of users with a known volume of data. In these scenarios, the needed capacity can be calculated fairly simply and isn’t expected to change during the application’s usable lifetime. Examples would be intranet or special-purpose B2B applications.
  • Applications with as-yet unknown performance needs. We find a lot of people don’t know really how their application will be used or how many people will be using it. Either they haven’t put a lot of thought into performance matters, the complexity of key algorithms isn’t yet known, or the business hasn’t yet explored user interest levels.

If your application is in the known low-performance tier, the only advantage of high-performance foundation technologies (all else being equal) would be reducing long term hosting costs. Congratulations, you can stop reading here!

But for the rest of us, those lucky or unlucky enough to know we need high performance, or uncertain whether performance will be a concern, the remainder of this article applies: performance should be on our minds early. Either we know performance is important, or we do not yet know and want to avoid unnecessary pain if it turns out to be important. In either case, it is in our best interest to spend a modest amount of time and thought to plan accordingly.

Sometimes we will hear retrospectives suggesting an application had been working acceptably for a few months but eventually bogged down. Therefore, for new projects with unknown performance needs, we often advise selecting technology platforms that do not unnecessarily constrain your optimization efforts if and when performance does become a problem. Give yourself the luxury of scaling without replatforming, by tuning within your application’s domain.

Performance in your technology selection process

Many of us at TechEmpower are interested in performance and how it affects user experience. Personal anecdotes of frustration with slow-responding applications are legion. Meanwhile, Amazon famously claimed 1% fewer sales for each additional 100ms of latency. And retail sites are not alone; all sites will silently lose visitors who are made to wait. Todd Hoff’s comprehensive 2009 blog entry on the costs of latency in web applications is still relevant today.


We created our Web Framework Benchmarks project because we’ve run into situations where well known frameworks seem to cause significant performance pain for the applications built upon them. The nature of our business sees us working with a broad spectrum of technologies and we wanted to know roughly what to expect, performance wise, from each.

High-performance and high-scalability technologies are so numerous that understanding the landscape can feel overwhelming. In our Framework Benchmarks project, as of this writing, we have about 300 permutations covering 145 frameworks, and a huge universe of possible permutations we don’t presently cover. With so many noteworthy options, and with the resulting performance data at hand, we feel quite strongly:

Performance should be part of your technology selection process. There are two important elements to this statement: should and part of.

  1. Performance should be considered because it affects user experience both in responsiveness and in resilience to expected and unexpected load. Early high-performance capacity affords you the luxury of deferring more complicated architectural decisions until later in your project’s lifespan. The data from our Framework Benchmarks project suggests that the pain of early performance problems can be avoided either in whole or in part by making informed framework and platform decisions.
  2. On the flip side, performance should be just part of your selection process. Performance is important, but it’s definitely not everything. Several other aspects (which we will review in more detail in an upcoming post) may be equally or more important to your selection process. Do not make decisions based exclusively on a technology having “won” a performance comparison such as ours.

“A Good Problem to Have.” Really?

A common refrain is to characterize problems that surface when your business is ramping up as good problems to have because their existence tacitly confirms your success. Though the euphemism is well-meaning, when your team is stressed out coping with “good problem” fires, you should forgive them when they don’t quite share the sentiment.

Unfortunately, if you’re only starting to think about performance after experiencing a problem, the resolution options are fewer because you have already made selections and accrued performance technical debt—call it performance debt if you will. We often hear, “Had I known it would be used this way, I would have selected something higher performance.”

While the industry has a bunch of survivor stories suggesting projects built on low-performance technology can and do succeed, not all teams find themselves with the luxury to re-platform their products to resolve performance problems, as re-platforming is a costly and painful proposition.

We therefore routinely advise leads of new projects to consider performance early. Use the performance characteristics of the languages, libraries, and frameworks you are considering as a filter during the architectural process.

Bandages

Some may still have doubts because, so the line of thinking goes, when performance problems come up you can just put a reverse proxy like Varnish in front of your application and call it a day.

Reverse proxies are terrific for what they do: caching and serving static or semi-static content. If your web application is a blog or a news site, a reverse proxy can act as a bandage to cover a wide set of performance problems inherent in the back-end system. This article is not intended to attack any particular technologies, but an obvious example is WordPress. Reverse-proxies and other front-end caching mechanisms are commonplace in WordPress deployments.

However, most web applications are not blogs or news sites. Most applications deal with dynamic information and create personalized responses. And even if your MVP is “basically a news site with a few other bells and whistles,” you should consider where you’re going after MVP. As your functionality grows beyond serving a bunch of static content, using a reverse proxy as a bandage to enable selecting a low-performance platform may end up a regrettable decision.

But to be clear, if you’re building a site where the vast majority of responses can be pre-computed and cached, you can use a reverse proxy paired with a low-performance platform. You can also use a high-performance platform without a reverse proxy. All else being equal, our preference is the architecturally simpler option that avoids adding a reverse proxy as another potential point of failure.

It’s reasonable to consider the future

Performance is not the be-all, end-all. However, technology evolution has given modern developers a wide spectrum of high-performance options, many of which can be highly-productive as well. It is reasonable, and we believe valuable, to consider performance needs early. Your future self will likely thank you.

As we and our collaborators prepare Round 9 of our Framework Benchmarks project, we have had an epiphany:

With high-performance software, a single modern server processes over 1 million HTTP requests per second.

Five months ago, Google talked about load-balancing to achieve 1 million requests per second. We understand their excitement is about the performance of their load balancer [1]. Part of what we do is performance consulting—so we are routinely deep in request-per-second data—and we recognized a million requests per second as an impressive milestone.

But fast-forward to today, where we see the same response rate from a single server. We had been working with virtual servers and our modest workstations for so long that these data were a bit of a surprise.

The mind immediately begins painting a world of utter simplicity, where our applications’ scores of virtual servers are rendered obsolete. Especially poignant is the reduced architectural complexity that an application can reap if its performance requirement can be satisfied by a single server. You probably still want at least two servers for resilience, but even after accounting for resilience, your architectural complexity will likely remain simpler than with hundreds of instances.

Our project’s new hardware

For Round 9 of our benchmarks project, Peak Hosting has generously provided us with a number of Dell R720xd servers each powered by dual Xeon E5-2660 v2 CPUs and 10-gigabit Ethernet. Loaded up with disks, these servers are around $8,000 a piece direct from Dell. Not cheap.

But check out what they can do:

techempower@lg01:~$ wrk -d 30 -c 256 -t 40 http://10.0.3.2:8080/byte
Running 30s test @ http://10.0.3.2:8080/byte
  40 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   247.05us    3.52ms 624.37ms   99.90%
    Req/Sec    27.89k     6.24k   50.22k    71.15%
  31173283 requests in 29.99s, 3.83GB read
  Socket errors: connect 0, read 0, write 0, timeout 9
Requests/sec: 1039305.27
Transfer/sec:    130.83MB

This is output from wrk testing a single server running Undertow using conditions similar to Google’s test (1-byte response body, no HTTP pipelining, no special request headers). 1.039 million requests per second.

Obviously there are myriad variables that make direct comparison to Google’s achievement an impossibility. Nevertheless, achieving a million HTTP requests per second over a network without pipelining to a single server says something about the capacity of modern hardware.

It’s possible even higher numbers would be reported had we tested a purpose-built static web server such as nginx. Undertow is the lightweight Java web application server used in WildFly. It just happens to be quite quick at HTTP. Here’s the code we used for this test:

import io.undertow.server.HttpHandler;
import io.undertow.server.HttpServerExchange;
import io.undertow.util.Headers;

public class ByteHandler implements HttpHandler {
  private static final String TEXT_PLAIN = "text/plain";
  private static final String aByte = "a";

  @Override
  public void handleRequest(HttpServerExchange exchange)
      throws Exception {
    // Set the Content-Type header and send the one-byte body.
    exchange.getResponseHeaders().put(
        Headers.CONTENT_TYPE, TEXT_PLAIN);
    exchange.getResponseSender().send(aByte);
  }
}
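
For completeness, here is one way such a handler might be mounted in an embedded Undertow server; this is our own minimal sketch, not the configuration used in the benchmark environment:


import io.undertow.Undertow;

public class ByteServer {
  public static void main(String[] args) {
    // Listen on port 8080 on all interfaces and route every request to ByteHandler.
    Undertow.builder()
        .addHttpListener(8080, "0.0.0.0")
        .setHandler(new ByteHandler())
        .build()
        .start();
  }
}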

In Round 9 (coming soon, we swear!), you’ll be able to see the other test types on Peak’s hardware alongside our i7 workstations and the EC2 instances we’ve tested in all previous rounds. Spoiler: I feel bad for our workstations.

Incidentally, if you think $8,000 is not cheap, you might want to run the monthly numbers on 200 virtual server instances. Yes, on-demand capacity and all the usual upsides of cloud deployments are real. But the simplified system architecture and cost advantage of high-performance options deserve some time in the sun.

1. Not only that, Google expressly said they were not using this exercise to demonstrate the capacity of their instances but rather to showcase their load balancer’s performance. However, the scenario they created achieved massive request-per-second scale by load balancing hundreds of instances. We are simply providing a counter-point that the massive scale achieved by hundreds of instances can be trivially mimicked by a single modern server with modern tools. The capacity of a single server may not be surprising to some, but it may come as a surprise to others.

Framework Benchmarks Round 1

March 28, 2013

Nate Brady

How much does your framework choice affect performance? The answer may surprise you.

Authors’ Note: We’re using the word “framework” loosely to refer to platforms, micro-frameworks, and full-stack frameworks. We have our own personal favorites among these frameworks, but we’ve tried our best to give each a fair shot.

Show me the winners!

We know you’re curious (we were too!) so here is a chart of representative results.

Whoa! Netty, Vert.x, and Java servlets are fast, but we were surprised how much faster they are than Ruby, Django, and friends. Before we did the benchmarks, we were guessing there might be a 4x difference. But a 40x difference between Vert.x and Ruby on Rails is staggering. And let us simply draw the curtain of charity over the Cake PHP results.

If these results were surprising to you, too, then read on so we can share our methodology and other test results. Even better, maybe you can spot a place where we mistakenly hobbled a framework and we can improve the tests. We’ve done our best, but we are not experts in most of them so help is welcome!

Motivation

Among the many factors to consider when choosing a web development framework, raw performance is easy to objectively measure. Application performance can be directly mapped to hosting dollars, and for a start-up company in its infancy, hosting costs can be a pain point. Weak performance can also cause premature scale pain, user experience degradation, and associated penalties levied by search engines.

What if building an application on one framework meant that at the very best your hardware is suitable for one tenth as much load as it would be had you chosen a different framework? The differences aren’t always that extreme, but in some cases, they might be. It’s worth knowing what you’re getting into.

Simulating production environments

For this exercise, we aimed to configure every framework according to the best practices for production deployments gleaned from documentation and popular community opinion. Our goal is to approximate a sensible production deployment as accurately as possible. For each framework, we describe the configuration approach we’ve used and cite the sources recommending that configuration.

We want it all to be as transparent as possible, so we have posted our test suites on GitHub.

Results

We ran each test on EC2 and our i7 hardware. See Environment Details below for more information.

JSON serialization test

First up is plain JSON serialization on Amazon EC2 large instances. This is repeated in the introduction, above.

The high-performance Netty platform takes a commanding lead for JSON serialization on EC2. Since Vert.x is built on Netty, it too achieved full saturation of the CPU cores and impressive numbers. In third place is plain Java Servlets running on Caucho’s Resin Servlet container. Plain Go delivers the best showing for a non-JVM framework.
We expected a fairly wide field, but we were surprised to see results that span four orders of magnitude.

Dedicated hardware

Here is the same test on our Sandy Bridge i7 hardware.

On our dedicated hardware, plain Servlets take the lead with over 210,000 requests per second. Vert.x remains strong but tapers off at higher concurrency levels despite being given eight workers, one for each HT core.

Database access test (single query)

How many requests can be handled per second if each request is fetching a random record from a data store? Starting again with EC2.

For database access tests, we considered dropping Cake to constrain our EC2 costs. This test exercises the database driver and connection pool and illustrates how well each scales with concurrency. Compojure makes a respectable showing but plain Servlets paired with the standard connection pool provided by MySQL is strongest at high concurrency. Gemini is using its built-in connection pool and lightweight ORM.
It’s worth pausing to appreciate that this shows an EC2 Large instance can query a remote MySQL instance at least 8,800 times per second, putting aside the additional work of each query being part of an HTTP request and response cycle.

Dedicated hardware

The dedicated hardware impresses us with its ability to process nearly 100,000 requests per second with one query per request. JVM frameworks are especially strong here thanks to JDBC and efficient connection pools. In this test, we suspect Vert.x is being hit very hard by its connectivity to MongoDB. We are especially interested in community feedback related to tuning these MongoDB numbers.

Database access test (multiple queries)

The following tests are all run at 256 concurrency and vary the number of database queries per request. The tests are 1, 5, 10, 15, and 20 queries per request. The 1-query samples, leftmost on the line charts, should be similar (within sampling error) to the single-query test above.

As expected, as we increase the number of queries per request, the lines converge to zero. However, looking at the 20-queries bar chart, roughly the same ranked order we’ve seen elsewhere is still in play, demonstrating the headroom afforded by higher-performance frameworks.
We were surprised by the performance of Raw PHP in this test. We suspect the PHP MySQL driver and connection pool are particularly well tuned. However, the penalty for using an ORM on PHP is severe.

Dedicated hardware

The dedicated hardware produces numbers nearly ten times greater than EC2 with the punishing 20 queries per request. Again, Raw PHP makes an extremely strong showing, but PHP with an ORM and Cake—the only PHP framework in our test—are at the opposite end of the spectrum.

How we designed the tests

This exercise aims to provide a “baseline” for performance across the variety of frameworks. By baseline we mean the starting point, from which any real-world application’s performance can only get worse. We aim to know the upper bound being set on an application’s performance per unit of hardware by each platform and framework.

But we also want to exercise some of the frameworks’ components, such as their JSON serializers and data-store/database mapping. While each test boils down to a measurement of the number of requests per second that can be processed by a single server, we are exercising a sample of the components provided by modern frameworks, so we believe it’s a reasonable starting point.

For the data-connected test, we’ve deliberately constructed the tests to avoid any framework-provided caching layer. We want this test to require repeated requests to an external service (MySQL or MongoDB, for example) so that we exercise the framework’s data mapping code. Although we expect that the external service is itself caching the small number of rows our test consumes, the framework is not allowed to avoid the network transmission and data mapping portion of the work.

Not all frameworks provide components for all of the tests. For these situations, we attempted to select a popular best-of-breed option.

Each framework was tested using 2^3 to 2^8 (8, 16, 32, 64, 128, and 256) request concurrency. On EC2, weighttp was configured to use two threads (one per core) and on our i7 hardware, it was configured to use eight threads (one per HT core). For each test, weighttp simulated 100,000 HTTP requests with keep-alives enabled.

For each test, the framework was warmed up by running a full test prior to capturing performance numbers.

Finally, for each framework, we collected the framework’s best performance across the various concurrency levels for plotting as peak bar charts.

We used two machines for all tests, configured in the following roles:

  • Application server. This machine is responsible for hosting the web application exclusively. Note, however, that when community best practices specified use of a web server in front of the application container, we had the web server installed on the same machine.
  • Load client and database server. This machine is responsible for generating HTTP traffic to the application server using weighttp and also for hosting the database server. In all of our tests, the database server (MySQL or MongoDB) used very little CPU time, and weighttp was not starved of CPU resources. In the database tests, the network was being used to provide result sets to the application server and to provide HTTP responses in the opposite direction. However, even with the quickest frameworks, network utilization was lower in database tests than in the plain JSON tests, so this is unlikely to be a concern.

Ultimately, a three-machine configuration would dismiss the concern of double-duty for the second machine. However, we doubt that the results would be noticeably different.

The Tests

We ran three types of tests. Not all tests were run for all frameworks. See details below.

JSON serialization

For this test, each framework simply responds with the following object, encoded using the framework’s JSON serializer.

{"message" : "Hello, World!"}

The response content type is set to application/json. If the framework provides no JSON serializer, a best-of-breed serializer for the platform is selected. For example, on the Java platform, Jackson was used for frameworks that do not provide a serializer.
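
For reference, the essence of this test in plain Java with Jackson looks something like the following sketch (not taken from any of the framework-specific implementations):


import java.util.Collections;
import java.util.Map;
import com.fasterxml.jackson.databind.ObjectMapper;

public class JsonTest {
  private static final ObjectMapper MAPPER = new ObjectMapper();

  // Returns {"message":"Hello, World!"} as the response body.
  public static String helloWorldJson() throws Exception {
    Map<String, String> message =
        Collections.singletonMap("message", "Hello, World!");
    return MAPPER.writeValueAsString(message);
  }
}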

Database access (single query)

In this test, we use the ORM of choice for each framework to grab one simple object selected at random from a table containing 10,000 rows. We use the same JSON serialization tested earlier to serialize that object as JSON. Caveat: when the data store provides data as JSON in situ (such as with the MongoDB tests), no transcoding is done; the string of JSON is sent as-is.

As with JSON serialization, we’ve selected a best-of-breed ORM when the framework is agnostic. For example, we used Sequelize for the JavaScript MySQL tests.

We tested with MySQL for most frameworks, but where MongoDB is more conventional (as with node.js), we tested that instead or in addition. We also did some spot tests with PostgreSQL but have not yet captured any of those results in this effort. Preliminary results showed RPS performance about 25% lower than with MySQL. Since PostgreSQL is considered favorable from a durability perspective, we plan to include more PostgreSQL testing in the future.

Database access (multiple queries)

This test repeats the work of the single-query test with an adjustable queries-per-request parameter. Tests are run at 5, 10, 15, and 20 queries per request. Each query selects a random row from the same table exercised in the previous test, and the resulting array is then serialized to JSON as the response.

This test is intended to illustrate how all frameworks inevitably will converge to zero requests per second as the complexity of each request increases. Admittedly, especially at 20 queries per request, this particular test is unnaturally database heavy compared to real-world applications. Only grossly inefficient applications or uncommonly complex requests would make that many database queries per request.

Environment Details

Hardware
  • Two Intel Sandy Bridge Core i7-2600K workstations with 8 GB memory each (early 2011 vintage) for the i7 tests
  • Two Amazon EC2 m1.large instances for the EC2 tests
  • Switched gigabit Ethernet
Load simulator
Databases
Ruby
JavaScript
PHP
Operating system
Web servers
Python
Go
Java / JVM

Notes

  • For the database tests, any framework with the suffix “raw” in its name is using its platform’s raw database connectivity without an object-relational map (ORM) of any flavor. For example, servlet-raw is using raw JDBC. All frameworks without the “raw” suffix in their name are using either the framework-provided ORM or a best-of-breed for the platform (e.g., ActiveRecord).

Code examples

You can find the full source code for all of the tests on Github. Below are the relevant portions of the code to fetch a configurable number of random database records, serialize the list of records as JSON, and then send the JSON as an HTTP response.

Cake

View on Github


public function index() {
    $query_count = $this->request->query('queries');
    if ($query_count == null) {
        $query_count = 1;
    }
    $arr = array();
    for ($i = 0; $i < $query_count; $i++) {
        $id = mt_rand(1, 10000);
        $world = $this->World->find('first', array('conditions' =>
            array('id' => $id)));
        $arr[] = array("id" => $world['World']['id'], "randomNumber" =>
            $world['World']['randomNumber']);
    }
    $this->set('worlds', $arr);
    $this->set('_serialize', array('worlds'));
}

Compojure

View on Github


(defn get-world []
  (let [id (inc (rand-int 9999))] ; Num between 1 and 10,000
    (select world
            (fields :id :randomNumber)
            (where {:id id }))))

(defn run-queries [queries]
  (vec ; Return as a vector
    (flatten ; Make it a list of maps
      (take
        queries ; Number of queries to run
        (repeatedly get-world))))) ; Lazily call get-world as needed

Django

View on Github

def db(request):
    queries = int(request.GET.get('queries', 1))
    worlds = []
    for i in range(queries):
        worlds.append(World.objects.get(id=random.randint(1, 10000)))
    return HttpResponse(serializers.serialize("json", worlds), mimetype="application/json")

Express

View on Github


app.get('/mongoose', function(req, res) {
    var queries = req.query.queries || 1,
        worlds = [],
        queryFunctions = [];

    for (var i = 1; i <= queries; i++ ) {
        queryFunctions.push(function(callback) {
            MWorld.findOne({ id: (Math.floor(Math.random() * 10000) + 1 )})
            .exec(function (err, world) {
                worlds.push(world);
                callback(null, 'success');
            });
        });
    }

    async.parallel(queryFunctions, function(err, results) {
        res.send(worlds);
    });
});

Gemini

View on Github


@PathSegment
public boolean db() {
    final Random random = ThreadLocalRandom.current();
    final int queries = context().getInt("queries", 1, 1, 500);
    final World[] worlds = new World[queries];
    for (int i = 0; i < queries; i++) {
        worlds[i] = store.get(World.class, random.nextInt(DB_ROWS) + 1);
    }
    return json(worlds);
}

Grails

View on Github

def db() {
    def random = ThreadLocalRandom.current()
    def queries = params.queries ? params.int('queries') : 1
    def worlds = []

    for (int i = 0; i < queries; i++) {
        worlds.add(World.read(random.nextInt(10000) + 1))
    }

    render worlds as JSON
}

Node.js

View on Github


if (path === '/mongoose') {
    var queries = 1;
    var worlds = [];
    var queryFunctions = [];
    var values = url.parse(req.url, true);

    if (values.query.queries) {
        queries = values.query.queries;
    }
    res.writeHead(200, {'Content-Type': 'application/json; charset=UTF-8'});

    for (var i = 1; i <= queries; i++) {
        queryFunctions.push(function(callback) {
            MWorld.findOne({ id: (Math.floor(Math.random() * 10000) + 1 )})
                .exec(function (err, world) {
                    worlds.push(world);
                    callback(null, 'success');
                });
        });
    }

    async.parallel(queryFunctions, function(err, results) {
        res.end(JSON.stringify(worlds));
    });
}

PHP (Raw)

View on Github


$query_count = 1;
if (!empty($_GET)) {
    $query_count = $_GET["queries"];
}
$arr = array();
$statement = $pdo->prepare("SELECT * FROM World WHERE id = :id");
for ($i = 0; $i < $query_count; $i++) {
    $id = mt_rand(1, 10000);
    $statement->bindValue(':id', $id, PDO::PARAM_INT);
    $statement->execute();
    $row = $statement->fetch(PDO::FETCH_ASSOC);
    $arr[] = array("id" => $id, "randomNumber" => $row['randomNumber']);
}
echo json_encode($arr);

PHP (ORM)

View on Github


$query_count = 1;
if (!empty($_GET)) {
    $query_count = $_GET["queries"];
}
$arr = array();
for ($i = 0; $i < $query_count; $i++) {
    $id = mt_rand(1, 10000);
    $world = World::find_by_id($id);
    $arr[] = $world->to_json();
}
echo json_encode($arr);

Play

View on Github

public static Result db(Integer queries) {
    final Random random = ThreadLocalRandom.current();
    final World[] worlds = new World[queries];

    for (int i = 0; i < queries; i++) {
        worlds[i] = World.find.byId((long)(random.nextInt(DB_ROWS) + 1));
    }

    return ok(Json.toJson(worlds));
}

Rails

View on Github


def db
  queries = params[:queries] || 1
  results = []

  (1..queries.to_i).each do
    results << World.find(Random.rand(10000) + 1)
  end

  render :json => results
end

Servlet

View on Github


        res.setHeader(HEADER_CONTENT_TYPE, CONTENT_TYPE_JSON);

final DataSource source = mysqlDataSource;
int count = 1;

try {
    count = Integer.parseInt(req.getParameter("queries"));
} catch (NumberFormatException nfexc) {
    // Handle exception
}

final World[] worlds = new World[count];
final Random random = ThreadLocalRandom.current();

try (Connection conn = source.getConnection()) {
    try (PreparedStatement statement = conn.prepareStatement(
            DB_QUERY,
            ResultSet.TYPE_FORWARD_ONLY,
            ResultSet.CONCUR_READ_ONLY)) {

        for (int i = 0; i < count; i++) {
            final int id = random.nextInt(DB_ROWS) + 1;
            statement.setInt(1, id);

            try (ResultSet results = statement.executeQuery()) {
                if (results.next()) {
                    worlds[i] = new World(id, results.getInt("randomNumber"));
                }
            }
        }
    }
} catch (SQLException sqlex) {
    System.err.println("SQL Exception: " + sqlex);
}

try {
    mapper.writeValue(res.getOutputStream(), worlds);
} catch (IOException ioe) {
    }
	

Sinatra

View on Github

get '/db' do
  queries = params[:queries] || 1
  results = []

  (1..queries.to_i).each do
    results << World.find(Random.rand(10000) + 1)
  end

  results.to_json
end

Spring

View on Github


@RequestMapping(value = "/db")
public Object index(HttpServletRequest request,
                    HttpServletResponse response, Integer queries) {

    if (queries == null) {
        queries = 1;
    }

    final World[] worlds = new World[queries];
    final Random random = ThreadLocalRandom.current();
    final Session session = HibernateUtil.getSessionFactory().openSession();

    for(int i = 0; i < queries; i++) {
        worlds[i] = (World)session.byId(World.class).load(random.nextInt(DB_ROWS) + 1);
    }

    session.close();

    try {
        new MappingJackson2HttpMessageConverter().write(
        worlds, MediaType.APPLICATION_JSON,
        new ServletServerHttpResponse(response));
    } catch (IOException e) {
        // Handle exception
    }

    return null;
}
	

Tapestry

View on Github


StreamResponse onActivate() {
    int queries = 1;
    String qString = this.request.getParameter("queries");

    if (qString != null) {
        queries = Integer.parseInt(qString);
    }

    if (queries <= 0) {
        queries = 1;
    }

    final World[] worlds = new World[queries];
    final Random rand = ThreadLocalRandom.current();

    for (int i = 0; i < queries; i++) {
        worlds[i] = (World)session.get(World.class, new Integer(rand.nextInt(DB_ROWS) + 1));
    }

    String response = "";

    try {
        response = HelloDB.mapper.writeValueAsString(worlds);
    } catch (IOException ex) {
        // Handle exception
    }

    return new TextStreamResponse("application/json", response);
}
	

Vert.x

View on Github


private void handleDb(final HttpServerRequest req) {
    int queriesParam = 1;
    try {
        queriesParam = Integer.parseInt(req.params().get("queries"));
    } catch(Exception e) {
    }
    final DbHandler dbh = new DbHandler(req, queriesParam);
    final Random random = ThreadLocalRandom.current();
    for (int i = 0; i < queriesParam; i++) {
        this.getVertx().eventBus().send(
            "hello.persistor",
            new JsonObject()
                .putString("action", "findone")
                .putString("collection", "world")
                .putObject("matcher", new JsonObject().putNumber("id",
                (random.nextInt(10000) + 1))), dbh);
    }
}

class DbHandler implements Handler<Message<JsonObject>> {
    private final HttpServerRequest req;
    private final int queries;
    private final List<Object> worlds = new CopyOnWriteArrayList<>();

    public DbHandler(HttpServerRequest request, int queriesParam) {
        this.req = request;
        this.queries = queriesParam;
    }

    @Override
    public void handle(Message<JsonObject> reply) {
        final JsonObject body = reply.body;

        if ("ok".equals(body.getString("status"))) {
            this.worlds.add(body.getObject("result"));
        }

        if (this.worlds.size() == this.queries) {
            try {
                final String result = mapper.writeValueAsString(worlds);
                final int contentLength = result
                    .getBytes(StandardCharsets.UTF_8).length;
                this.req.response.putHeader("Content-Type",
                    "application/json; charset=UTF-8");
                this.req.response.putHeader("Content-Length", contentLength);
                this.req.response.write(result);
                this.req.response.end();
            } catch (IOException e) {
                req.response.statusCode = 500;
                req.response.end();
            }
        }
    }
}

Wicket

View on Github


protected ResourceResponse newResourceResponse(Attributes attributes) {
    final int queries = attributes.getRequest().getQueryParameters()
        .getParameterValue("queries").toInt(1);
    final World[] worlds = new World[queries];
    final Random random = ThreadLocalRandom.current();
    final ResourceResponse response = new ResourceResponse();
    response.setContentType("application/json");
    response.setWriteCallback(new WriteCallback() {
        public void writeData(Attributes attributes) {
            final Session session = HibernateUtil.getSessionFactory()
                .openSession();
            for (int i = 0; i < queries; i++) {
                worlds[i] = (World)session.byId(World.class)
                    .load(random.nextInt(DB_ROWS) + 1);
            }
            session.close();
            try {
                attributes.getResponse().write(HelloDbResponse.mapper
                    .writeValueAsString(worlds));
            } catch (IOException ex) {
            }
        }
    });
    return response;
}

Expected questions

We expect that you might have a bunch of questions. Here are some that we’re anticipating. But please contact us if you have a question we’re not dealing with here or just want to tell us we’re doing it wrong.

  1. “You configured framework x incorrectly, and that explains the numbers you’re seeing.” Whoops! Please let us know how we can fix it, or submit a Github pull request, so we can get it right.
  2. “Why weighttp?” Although many web performance tests use ApacheBench from Apache to generate HTTP requests, we have opted to use weighttp from the lighttpd team. ApacheBench remains a single-threaded tool, meaning that for higher-performance test scenarios, ApacheBench itself is a limiting factor. weighttp is essentially a multithreaded clone of ApacheBench. If you have a recommendation for an even better benchmarking tool, please let us know.
  3. “Doesn’t benchmarking on Amazon EC2 invalidate the results?” Our opinion is that doing so confirms precisely what we’re trying to test: performance of web applications within realistic production environments. Selecting EC2 as a platform also allows the tests to be readily verified by anyone interested in doing so. However, we’ve also executed tests on our Core i7 (Sandy Bridge) workstations running Ubuntu 12.04 as a non-virtualized sanity check. Doing so confirmed our suspicion that the ranked order and relative performance across frameworks is mostly consistent between EC2 and physical hardware. That is, while the EC2 instances were slower than the physical hardware, they were slower by roughly the same proportion across the spectrum of frameworks.
  4. “Why include this Gemini framework I’ve never heard of?” We have included our in-house Java web framework, Gemini, in our tests. We’ve done so because it’s of interest to us. You can consider it a stand-in for any relatively lightweight minimal-locking Java framework. While we’re proud of how it performs among the well-established field, this exercise is not about Gemini. We routinely use other frameworks on client projects and we want this data to inform our recommendations for new projects.
  5. “Why is JRuby performance all over the map?” During the evolution of this project, in some test runs, JRuby would slightly edge out traditional Ruby, and in some cases—with the same test code—the opposite would be true. We also don’t have an explanation for the weak performance of Sinatra on JRuby, which is no better than Rails. Ultimately we’re not sure about the discrepancy. Hopefully an expert in JRuby can help us here.
  6. “Framework X has in-memory caching, why don’t you use that?” In-memory caching, as provided by Gemini and some other frameworks, yields higher performance than repeatedly hitting a database, but isn’t available in all frameworks, so we omitted in-memory caching from these tests.
  7. “What about other caching approaches, then?” Remote-memory or near-memory caching, as provided by Memcached and similar solutions, also improves performance and we would like to conduct future tests simulating a more expensive query operation versus Memcached. However, curiously, in spot tests, some frameworks paired with Memcached were conspicuously slower than other frameworks directly querying the authoritative MySQL database (recognizing, of course, that MySQL had its entire data-set in its own memory cache). For simple “get row ID n” and “get all rows” style fetches, a fast framework paired with MySQL may be faster and easier to work with versus a slow framework paired with Memcached.
  8. “Do all the database tests use connection pooling?” Sadly Django provides no connection pooling and in fact closes and re-opens a connection for every request. All the other tests use pooling.
  9. “What is Resin? Why aren’t you using Tomcat for the Java frameworks?” Resin is a Java application server. The GPL version that we used for our tests is a relatively lightweight Servlet container. We recommend Caucho Resin for Java deployments even though we found Tomcat easier to configure. We ultimately dropped Tomcat from our tests because Resin was slightly faster across all frameworks.
  10. “Why don’t you test framework X?” We’d love to, if we can find the time. Even better, craft the test yourself and submit a Github pull request so we can get it in there faster!
  11. “Why doesn’t your test include more substantial algorithmic work, or building an HTML response with a server-side template?” Great suggestion. We hope to in the future!
  12. “Why are you using a (slightly) old version of framework X?” It’s nothing personal! We tried to keep everything fully up-to-date, but with so many frameworks it became a never-ending game of whack-a-mole. If you think an update will affect the results, please let us know (or submit a Github pull request) and we’ll get it updated!

Conclusion

Let go of your technology prejudices.

We think it is important to know about as many good tools as possible to help make the best choices you can. Hopefully we’ve helped with one aspect of that.

Thanks for sticking with us through all of this! We had fun putting these tests together, and experienced some genuine surprises with the results. Hopefully others find it interesting too. Please let us know what you think or submit Github pull requests to help us out.

About TechEmpower

We provide web and mobile application development services and are passionate about application performance. Read more about what we do.

Everything about Java 8

March 27, 2013

Alan Laser

The following post is a comprehensive summary of the developer-facing changes coming in Java 8. As of March 18, 2014, Java 8 is now generally available.

I used preview builds of IntelliJ IDEA for my IDE. It had the best support for the Java 8 language features at the time I went looking. You can find those builds here: IntelliJ IDEA EAP.

Interface improvements

Interfaces can now define static methods. For instance, a naturalOrder method was added to java.util.Comparator:

public static <T extends Comparable<? super T>>
Comparator<T> naturalOrder() {
    return (Comparator<T>)
        Comparators.NaturalOrderComparator.INSTANCE;
}

A common scenario in Java libraries is that, for some interface Foo, there is a companion utility class Foos with static methods for generating or working with Foo instances. Now that static methods can exist on interfaces, in many cases the Foos utility class can go away (or be made package-private), with its public methods moving onto the interface instead.
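
For illustration, here is a minimal sketch with a hypothetical Foo interface:

public interface Foo {
    String name();

    // Previously this factory would have lived in a separate Foos utility class.
    static Foo of(String name) {
        return new Foo() {
            @Override
            public String name() {
                return name;
            }
        };
    }
}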

Additionally, more importantly, interfaces can now define default methods. For instance, a forEach method was added to java.lang.Iterable:

public default void forEach(Consumer<? super T> action) {
    Objects.requireNonNull(action);
    for (T t : this) {
        action.accept(t);
    }
}

In the past it was essentially impossible for Java libraries to add methods to interfaces. Adding a method to an interface would mean breaking all existing code that implements the interface. Now, as long as a sensible default implementation of a method can be provided, library maintainers can add methods to these interfaces.

In Java 8, a large number of default methods have been added to core JDK interfaces. I’ll discuss many of them later.

Why can’t default methods override equals, hashCode, and toString?

An interface cannot provide a default implementation for any of the methods of the Object class. In particular, this means one cannot provide a default implementation for equals, hashCode, or toString from within an interface.

This seems odd at first, given that some interfaces actually define their equals behavior in documentation. The List interface is an example. So, why not allow this?

Brian Goetz gave four reasons in a lengthy response on the Project Lambda mailing list. I’ll only describe one here, because that one was enough to convince me:

It would become more difficult to reason about when a default method is invoked. Right now it’s simple: if a class implements a method, that always wins over a default implementation. Since all instances of interfaces are Objects, all instances of interfaces have non-default implementations of equals/hashCode/toString already. Therefore, a default version of these on an interface is always useless, and it may as well not compile.

For further reading, see this explanation written by Brian Goetz: response to “Allow default methods to override Object’s methods”

Functional interfaces

A core concept introduced in Java 8 is that of a “functional interface”. An interface is a functional interface if it defines exactly one abstract method. For instance, java.lang.Runnable is a functional interface because it only defines one abstract method:

public abstract void run();

Note that the “abstract” modifier is implied because the method lacks a body. It is not necessary to specify the “abstract” modifier, as this code does, in order to qualify as a functional interface.

Default methods are not abstract, so a functional interface can define as many default methods as it likes.

A new annotation, @FunctionalInterface, has been introduced. It can be placed on an interface to declare the intention of it being a functional interface. It will cause the interface to refuse to compile unless you’ve managed to make it a functional interface. It’s sort of like @Override in this way; it declares intention and doesn’t allow you to use it incorrectly.
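
For example, this hypothetical interface compiles only as long as it declares exactly one abstract method:

@FunctionalInterface
public interface StringTransformer {
    String transform(String input); // the single abstract method

    // Adding a second abstract method here would be a compile-time error,
    // because @FunctionalInterface enforces the single-abstract-method rule.
}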

Lambdas

An extremely valuable property of functional interfaces is that they can be instantiated using lambdas. Here are a few examples of lambdas:

Comma-separated list of inputs with specified types on the left, a block with a return on the right:

(int x, int y) -> { return x + y; }

Comma-separated list of inputs with inferred types on the left, a return value on the right:

(x, y) -> x + y

Single parameter with inferred type on the left, a return value on the right:

x -> x * x

No inputs on left (official name: “burger arrow”), return value on the right:

() -> x

Single parameter with inferred type on the left, a block with no return (void return) on the right:

x -> { System.out.println(x); }

Static method reference:

String::valueOf

Non-static method reference:

Object::toString

Capturing method reference:

x::toString

Constructor reference:

ArrayList::new

You can think of method reference forms as shorthand for the other lambda forms.

Method reference     Equivalent lambda expression
String::valueOf      x -> String.valueOf(x)
Object::toString     x -> x.toString()
x::toString          () -> x.toString()
ArrayList::new       () -> new ArrayList<>()

Of course, methods in Java can be overloaded. Classes can have multiple methods with the same name but different parameters. The same goes for its constructors. ArrayList::new could refer to any of its three constructors. The method it resolves to depends on which functional interface it’s being used for.
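
For example, both of these assignments are valid, and the target type determines which ArrayList constructor is chosen:

Supplier<List<String>> emptyList = ArrayList::new;     // resolves to new ArrayList<>()
IntFunction<List<String>> sizedList = ArrayList::new;  // resolves to new ArrayList<>(int)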

A lambda is compatible with a given functional interface when their “shapes” match. By “shapes”, I’m referring to the types of the inputs, outputs, and declared checked exceptions.

To give a couple of concrete, valid examples:

Comparator<String> c = (a, b) -> Integer.compare(a.length(),
                                                 b.length());

A Comparator<String>’s compare method takes two strings as input, and returns an int. That’s consistent with the lambda on the right, so this assignment is valid.

Runnable r = () -> { System.out.println("Running!"); };

A Runnable’s run method takes no arguments and does not have a return value. That’s consistent with the lambda on the right, so this assignment is valid.

The checked exceptions (if present) in the abstract method’s signature matter too. The lambda can only throw a checked exception if the functional interface declares that exception in its signature.
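
For example, java.util.concurrent.Callable declares its call() method as throws Exception, so a lambda targeting Callable may throw a checked exception (the file path here is just for illustration):

// Allowed: Callable.call() declares "throws Exception"
Callable<byte[]> reader = () -> Files.readAllBytes(Paths.get("data.txt"));

// Not allowed: Runnable.run() declares no checked exceptions
// Runnable broken = () -> { throw new IOException(); };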

Capturing versus non-capturing lambdas

Lambdas are said to be “capturing” if they access a non-static variable or object that was defined outside of the lambda body. For example, this lambda captures the variable x:

int x = 5;
return y -> x + y;

In order for this lambda declaration to be valid, the variables it captures must be “effectively final”. So, either they must be marked with the final modifier, or they must not be modified after they’re assigned.

Whether a lambda is capturing or not has implications for performance. A non-capturing lambda is generally going to be more efficient than a capturing one. Although this is not defined in any specifications (as far as I know), and you shouldn’t count on it for a program’s correctness, a non-capturing lambda only needs to be evaluated once. From then on, it will return an identical instance. Capturing lambdas need to be evaluated every time they’re encountered, and currently that performs much like instantiating a new instance of an anonymous class.

What lambdas don’t do

There are a few features that lambdas don’t provide, which you should keep in mind. They were considered for Java 8 but were not included, for simplicity and due to time constraints.

Non-final variable capture – If a variable is assigned a new value, it can’t be used within a lambda. The “final” keyword is not required, but the variable must be “effectively final” (discussed earlier). This code does not compile:

int count = 0;
List<String> strings = Arrays.asList("a", "b", "c");
strings.forEach(s -> {
    count++; // error: can't modify the value of count
});

Exception transparency – If a checked exception may be thrown from inside a lambda, the functional interface must also declare that checked exception in its abstract method’s signature. The exception is not propagated to the containing method. This code does not compile:

void appendAll(Iterable<String> values, Appendable out)
        throws IOException { // doesn't help with the error
    values.forEach(s -> {
        out.append(s); // error: can't throw IOException here
                       // Consumer.accept(T) doesn't allow it
    });
}

There are ways to work around this, where you can define your own functional interface that extends Consumer and sneaks the IOException through as a RuntimeException. I tried this out in code and found it to be too confusing to be worthwhile.
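
One version of that workaround might look something like this (IOConsumer is a hypothetical name; it surfaces the IOException as an UncheckedIOException, which is a RuntimeException):

@FunctionalInterface
interface IOConsumer<T> extends Consumer<T> {
    void acceptThrows(T t) throws IOException;

    @Override
    default void accept(T t) {
        try {
            acceptThrows(t);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// The cast selects IOConsumer as the lambda's target type:
void appendAll(Iterable<String> values, Appendable out) {
    values.forEach((IOConsumer<String>) out::append);
}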

Control flow (break, early return) – In the forEach examples above, a traditional continue is possible by placing a “return;” statement within the lambda. However, there is no way to break out of the loop or return a value as the result of the containing method from within the lambda. For example:

final String secret = "foo";
boolean containsSecret(Iterable<String> values) {
    values.forEach(s -> {
        if (secret.equals(s)) {
            ??? // want to end the loop and return true, but can't
        }
    });
}

For further reading about these issues, see this explanation written by Brian Goetz: response to “Checked exceptions within Block<T>”

Why abstract classes can’t be instantiated using a lambda

An abstract class, even if it declares only one abstract method, cannot be instantiated with a lambda.

Two examples of classes with one abstract method are Ordering and CacheLoader from the Guava library. Wouldn’t it be nice to be able to declare instances of them using lambdas like this?

Ordering<String> order = (a, b) -> ...;
CacheLoader<String, String> loader = (key) -> ...;

The most common argument against this was that it would add to the difficulty of reading a lambda. Instantiating an abstract class in this way could lead to execution of hidden code: that in the constructor of the abstract class.

Another reason is that it throws out possible optimizations for lambdas. In the future, it may be the case that lambdas are not evaluated into object instances. Letting users declare abstract classes with lambdas would prevent optimizations like this.

Besides, there’s an easy workaround. Actually, the two example classes from Guava already demonstrate this workaround. Add factory methods to convert from a lambda to an instance:

Ordering<String> order = Ordering.from((a, b) -> ...);
CacheLoader<String, String> loader =
    CacheLoader.from((key) -> ...);

For further reading, see this explanation written by Brian Goetz: response to “Allow lambdas to implement abstract classes”

java.util.function

Package summary: java.util.function

As demonstrated earlier with Comparator and Runnable, interfaces already defined in the JDK that happen to be functional interfaces are compatible with lambdas. The same goes for any functional interfaces defined in your own code or in third party libraries.

But there are certain forms of functional interfaces that are widely, commonly useful, which did not exist previously in the JDK. A large number of these interfaces have been added to the new java.util.function package. Here are a few:

  • Function<T, R> – take a T as input, return an R as output
  • Predicate<T> – take a T as input, return a boolean as output
  • Consumer<T> – take a T as input, perform some action and don’t return anything
  • Supplier<T> – with nothing as input, return a T
  • BinaryOperator<T> – take two T’s as input, return one T as output, useful for “reduce” operations

Primitive specializations for most of these exist as well. They’re provided in int, long, and double forms. For instance:

  • IntConsumer – take an int as input, perform some action and don’t return anything

These exist for performance reasons, to avoid boxing and unboxing when the inputs or outputs are primitives.
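
To make those shapes concrete, here are a few example assignments:

Function<String, Integer> length = s -> s.length();
Predicate<String> isEmpty = s -> s.isEmpty();
Consumer<String> print = s -> System.out.println(s);
Supplier<List<String>> newList = () -> new ArrayList<>();
BinaryOperator<Integer> sum = (a, b) -> a + b;
IntConsumer printInt = i -> System.out.println(i); // no boxing of the int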

java.util.stream

Package summary: java.util.stream

The new java.util.stream package provides utilities “to support functional-style operations on streams of values” (quoting the javadoc). Probably the most common way to obtain a stream will be from a collection:

Stream<T> stream = collection.stream();

A stream is something like an iterator. The values “flow past” (analogy to a stream of water) and then they’re gone. A stream can only be traversed once, then it’s used up. Streams may also be infinite.

Streams can be sequential or parallel. They start off as one and may be switched to the other using stream.sequential() or stream.parallel(). The actions of a sequential stream occur in serial fashion on one thread. The actions of a parallel stream may be happening all at once on multiple threads.

So, what do you do with a stream? Here is the example given in the package javadocs:

int sumOfWeights = blocks.stream().filter(b -> b.getColor() == RED)
                                  .mapToInt(b -> b.getWeight())
                                  .sum();

Note: The above code makes use of a primitive stream, and a sum() method is only available on primitive streams. There will be more detail on primitive streams shortly.

A stream provides a fluent API for transforming values and performing some action on the results. Stream operations are either “intermediate” or “terminal”.

  • Intermediate – An intermediate operation keeps the stream open and allows further operations to follow. The filter and map methods in the example above are intermediate operations. The return type of these methods is Stream; they return the current stream to allow chaining of more operations.
  • Terminal – A terminal operation must be the final operation invoked on a stream. Once a terminal operation is invoked, the stream is “consumed” and is no longer usable. The sum method in the example above is a terminal operation.

Usually, dealing with a stream will involve these steps:

  1. Obtain a stream from some source.
  2. Perform one or more intermediate operations.
  3. Perform one terminal operation.

It’s likely that you’ll want to perform all those steps within one method. That way, you know the properties of the source and the stream and can ensure that it’s used properly. You probably don’t want to accept arbitrary Stream<T> instances as input to your method because they may have properties you’re ill-equipped to deal with, such as being parallel or infinite.
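
Put together, a typical pipeline reads like this (assume people is a collection of a hypothetical Person type):

long adults = people.stream()                      // 1. obtain a stream from a source
                    .filter(p -> p.getAge() >= 18) // 2. perform intermediate operations
                    .count();                      // 3. perform one terminal operation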

There are a couple more general properties of stream operations to consider:

  • Stateful – A stateful operation imposes some new property on the stream, such as uniqueness of elements, or a maximum number of elements, or ensuring that the elements are consumed in sorted fashion. These are typically more expensive than stateless intermediate operations.
  • Short-circuiting – A short-circuiting operation potentially allows processing of a stream to stop early without examining all the elements. This is an especially desirable property when dealing with infinite streams; if none of the operations being invoked on a stream are short-circuiting, then the code may never terminate (see the short example after this list).
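
For instance, this small sketch terminates even though the source stream is infinite, because findFirst is short-circuiting:

int firstOverOneThousand = IntStream.iterate(1, x -> x * 2)
                                    .filter(x -> x > 1000)
                                    .findFirst()
                                    .getAsInt(); // 1024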

Here are short, general descriptions for each Stream method. See the javadocs for more thorough explanations. Links are provided below for each overloaded form of the operation.

Intermediate operations:

  • filter [1] – Exclude all elements that don’t match a Predicate.
  • map [1] [2] [3] [4] – Perform a one-to-one transformation of elements using a Function.
  • flatMap [1] [2] [3] [4] – Transform each element into zero or more elements by way of another Stream.
  • peek [1] – Perform some action on each element as it is encountered. Primarily useful for debugging.
  • distinct [1] – Exclude all duplicate elements according to their .equals behavior. This is a stateful operation.
  • sorted [1] [2] – Ensure that stream elements in subsequent operations are encountered according to the order imposed by a Comparator. This is a stateful operation.
  • limit [1] – Ensure that subsequent operations only see up to a maximum number of elements. This is a stateful, short-circuiting operation.
  • skip [1] – Ensure that subsequent operations do not see the first n elements. This is a stateful operation.

Terminal operations:

  • forEach [1] – Perform some action for each element in the stream.
  • toArray [1] [2] – Dump the elements in the stream to an array.
  • reduce [1] [2] [3] – Combine the stream elements into one using a BinaryOperator.
  • collect [1] [2] – Dump the elements in the stream into some container, such as a Collection or Map.
  • min [1] – Find the minimum element of the stream according to a Comparator.
  • max [1] – Find the maximum element of the stream according to a Comparator.
  • count [1] – Find the number of elements in the stream.
  • anyMatch [1] – Find out whether at least one of the elements in the stream matches a Predicate. This is a short-circuiting operation.
  • allMatch [1] – Find out whether every element in the stream matches a Predicate. This is a short-circuiting operation.
  • noneMatch [1] – Find out whether zero elements in the stream match a Predicate. This is a short-circuiting operation.
  • findFirst [1] – Find the first element in the stream. This is a short-circuiting operation.
  • findAny [1] – Find any element in the stream, which may be cheaper than findFirst for some streams. This is a short-circuiting operation.

As noted in the javadocs, intermediate operations are lazy. Only a terminal operation will start the processing of stream elements. At that point, no matter how many intermediate operations were included, the elements are then consumed in (usually, but not quite always) a single pass. (Stateful operations such as sorted() and distinct() may require a second pass over the elements.)

Streams try their best to do as little work as possible. There are micro-optimizations such as eliding a sorted() operation when it can determine the elements are already in order. In operations that include limit(x) or substream(x,y), a stream can sometimes avoid performing intermediate map operations on the elements it knows aren’t necessary to determine the result. I’m not going to be able to do the implementation justice here; it’s clever in lots of small but significant ways, and it’s still improving.

Returning to the concept of parallel streams, it’s important to note that parallelism is not free. It’s not free from a performance standpoint, and you can’t simply swap out a sequential stream for a parallel one and expect the results to be identical without further thought. There are properties to consider about your stream, its operations, and the destination for its data before you can (or should) parallelize a stream. For instance: Does encounter order matter to me? Are my functions stateless? Is my stream large enough and are my operations complex enough to make parallelism worthwhile?

There are primitive-specialized versions of Stream for ints, longs, and doubles:

  • IntStream
  • LongStream
  • DoubleStream

One can convert back and forth between an object stream and a primitive stream using the primitive-specialized map and flatMap functions, among others. To give a few contrived examples:

List<String> strings = Arrays.asList("a", "b", "c");
strings.stream()                    // Stream<String>
       .mapToInt(String::length)    // IntStream
       .asLongStream()              // LongStream
       .mapToDouble(x -> x / 10.0)  // DoubleStream
       .boxed()                     // Stream<Double>
       .mapToLong(x -> 1L)          // LongStream
       .mapToObj(x -> "")           // Stream<String>
       ...

The primitive streams also provide methods for obtaining basic numeric statistics about the stream as a data structure. You can find the count, sum, min, max, and mean of the elements all from one terminal operation.
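
For example:

IntSummaryStatistics stats = IntStream.rangeClosed(1, 100).summaryStatistics();
stats.getCount();   // 100
stats.getSum();     // 5050
stats.getMin();     // 1
stats.getMax();     // 100
stats.getAverage(); // 50.5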

There are no primitive versions for the rest of the primitive types because it would have required an unacceptable amount of bloat in the JDK. IntStream, LongStream, and DoubleStream were deemed useful enough to include, and streams of other numeric primitives can be represented using these three via widening primitive conversion.

One of the most confusing, intricate, and useful terminal stream operations is collect. It introduces a new interface called Collector. This interface is somewhat difficult to understand, but fortunately there is a Collectors utility class for generating all sorts of useful Collectors. For example:

List<String> strings = values.stream()
                             .filter(...)
                             .map(...)
                             .collect(Collectors.toList());

If you want to put your stream elements into a Collection, Map, or String, then Collectors probably has what you need. It’s definitely worthwhile to browse through the javadoc of that class.
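
To give a couple more small examples (assume words is a List<String>):

String joined = words.stream()
                     .collect(Collectors.joining(", ", "[", "]"));

Map<Boolean, List<String>> longVersusShort = words.stream()
    .collect(Collectors.partitioningBy(w -> w.length() > 5));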

Generic type inference improvements

Summary of proposal: JEP 101: Generalized Target-Type Inference

This was an effort to improve the ability of the compiler to determine generic types where it was previously unable to. There were many cases in previous versions of Java where the compiler could not figure out the generic types for a method in the context of nested or chained method invocations, even when it seemed “obvious” to the programmer. Those situations required the programmer to explicitly specify a “type witness”. It’s a feature of generics that surprisingly few Java programmers know about (I’m saying this based on personal interactions and reading StackOverflow questions). It looks like this:

// In Java 7:
foo(Utility.<Type>bar());
Utility.<Type>foo().bar();

Without the type witnesses, the compiler might fill in <Object> as the generic type, and the code would fail to compile if a more specific type was required instead.

Java 8 improves this situation tremendously. In many more cases, it can figure out a more specific generic type based on the context.

// In Java 8:
foo(Utility.bar());
Utility.foo().bar();

This one is still a work in progress, so I’m not sure how many of the examples listed in the proposal will actually be included for Java 8. Hopefully it’s all of them.

java.time

Package summary: java.time

The new date/time API in Java 8 is contained in the java.time package. If you’re familiar with Joda Time, it will be really easy to pick up. Actually, I think it’s so well-designed that even people who have never heard of Joda Time should find it easy to pick up.

Almost everything in the API is immutable, including the value types and the formatters. No more worrying about exposing Date fields or dealing with thread-local date formatters.

The intermingling with the legacy date/time API is minimal. It was a clean break; the main bridges to the old types are a handful of conversion methods such as Date.toInstant(), Date.from(Instant), and Calendar.toInstant().

The new API prefers enums over integer constants for things like months and days of the week.

So, what’s in it? The package-level javadocs do an excellent job of explaining the additional types. I’ll give a brief rundown of some noteworthy parts.

Extremely useful value types:

Less useful value types:

Other useful types:

  • DateTimeFormatter – for converting datetime objects to strings
  • ChronoUnit – for figuring out the amount of time between two points, e.g. ChronoUnit.DAYS.between(t1, t2)
  • TemporalAdjuster – e.g. date.with(TemporalAdjusters.firstDayOfMonth())

The new value types are, for the most part, supported by JDBC. There are minor exceptions, such as ZonedDateTime which has no counterpart in SQL.
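
To give a feel for the API, here is a short sketch using a few of the types mentioned above:

LocalDate today = LocalDate.now();
LocalDate firstOfMonth = today.with(TemporalAdjusters.firstDayOfMonth());
long daysSoFarThisYear = ChronoUnit.DAYS.between(
    LocalDate.of(today.getYear(), Month.JANUARY, 1), today);
String formatted = today.format(DateTimeFormatter.ISO_LOCAL_DATE); // e.g. "2013-03-27"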

Collections API additions

The fact that interfaces can define default methods allowed the JDK authors to make a large number of additions to the collection API interfaces. Default implementations for these are provided on all the core interfaces, and more efficient or well-behaved overridden implementations were added to all the concrete classes, where applicable.

Here’s a list of the new methods:

  • Iterable.forEach
  • Iterator.forEachRemaining
  • Collection.removeIf
  • Collection.spliterator
  • Collection.stream
  • Collection.parallelStream
  • List.replaceAll
  • List.sort
  • Map.getOrDefault
  • Map.forEach
  • Map.replaceAll
  • Map.putIfAbsent
  • Map.remove(key, value)
  • Map.replace
  • Map.computeIfAbsent
  • Map.computeIfPresent
  • Map.compute
  • Map.merge

Also, Iterator.remove() now has a default, throwing implementation, which makes it slightly easier to define unmodifiable iterators.

Collection.stream() and Collection.parallelStream() are the main gateways into the stream API. There are other ways to generate streams, but those are going to be the most common by far.

The addition of List.sort(Comparator) is fantastic. Previously, the way to sort an ArrayList was this:

Collections.sort(list, comparator);

That code, which was your only option in Java 7, was frustratingly inefficient. It would dump the list into an array, sort the array, then use a ListIterator to insert the array contents into the list in new positions.

The default implementation of List.sort(Comparator) still does this, but concrete implementing classes are free to optimize. For instance, ArrayList.sort invokes Arrays.sort on the ArrayList’s internal array. CopyOnWriteArrayList does the same.

Performance isn’t the only potential gain from these new methods. They can have more desirable semantics, too. For instance, sorting a Collections.synchronizedList() is an atomic operation using list.sort. You can iterate over all its elements as an atomic operation using list.forEach. Previously this was not possible.

Map.computeIfAbsent makes working with multimap-like structures easier:

// Index strings by length:
Map<Integer, List<String>> map = new HashMap<>();
for (String s : strings) {
    map.computeIfAbsent(s.length(),
                        key -> new ArrayList<String>())
       .add(s);
}

// Although in this case the stream API may be a better choice:
Map<Integer, List<String>> map = strings.stream()
    .collect(Collectors.groupingBy(String::length));

Concurrency API additions

ForkJoinPool.commonPool() is the structure that handles all parallel stream operations. It is intended as an easy, good way to obtain a ForkJoinPool/ExecutorService/Executor when you need one.

ConcurrentHashMap<K, V> was completely rewritten. Internally it looks nothing like the version that was in Java 7. Externally it’s mostly the same, except it has a large number of bulk operation methods: many forms of reduce, search, and forEach.

ConcurrentHashMap.newKeySet() provides a concurrent java.util.Set implementation. It is essentially another way of writing Collections.newSetFromMap(new ConcurrentHashMap<T, Boolean>()).

StampedLock is a new lock implementation that can probably replace ReentrantReadWriteLock in most cases. It performs better than RRWL when used as a plain read-write lock. It also provides an API for “optimistic reads”, where you obtain a weak, cheap version of a read lock, do the read operation, then check afterwards if your lock was invalidated by a write. There’s more detail about this class and its performance in a set of slides put together by Heinz Kabutz (starting about half-way through the set of slides): “Phaser and StampedLock Presentation”
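
A sketch of the optimistic-read pattern:

class Point {
    private final StampedLock lock = new StampedLock();
    private double x, y; // written elsewhere under lock.writeLock()

    double distanceFromOrigin() {
        long stamp = lock.tryOptimisticRead(); // cheap; does not block writers
        double currentX = x, currentY = y;
        if (!lock.validate(stamp)) {           // a write happened; fall back to a real read lock
            stamp = lock.readLock();
            try {
                currentX = x;
                currentY = y;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return Math.hypot(currentX, currentY);
    }
}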

CompletableFuture<T> is a nice implementation of the Future interface that provides a ton of methods for performing (and chaining together) asynchronous tasks. It relies on functional interfaces heavily; lambdas are a big reason this class was worth adding. If you are currently using Guava’s Future utilities, such as Futures, ListenableFuture, and SettableFuture, you may want to check out CompletableFuture as a potential replacement.
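
A small sketch of the chaining style (fetchQuote is a hypothetical method that returns a price):

CompletableFuture
    .supplyAsync(() -> fetchQuote("GOOG"))           // runs on ForkJoinPool.commonPool() by default
    .thenApply(quote -> quote * 100)                 // transform the result
    .thenAccept(total -> System.out.println(total))  // consume it
    .exceptionally(t -> { t.printStackTrace(); return null; });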

IO/NIO API additions

Most of these additions give you ways to obtain java.util.stream.Stream from files and InputStreams. They’re a bit different from the streams you obtain from regular collections, though. For one, they may throw UncheckedIOException. Also, they are streams that need to be closed explicitly with stream.close(); streams implement AutoCloseable and can therefore be used in try-with-resources statements.
Streams also have an onClose(Runnable) intermediate operation that I didn’t list in the earlier section about streams. It allows you to attach handlers to a stream that execute when it is closed. Here is an example:

// Print the lines in a file, then "done"
try (Stream<String> lines = Files.lines(path, UTF_8)) {
    lines.onClose(() -> System.out.println("done"))
         .forEach(System.out::println);
}

Reflection and annotation changes

Annotations are allowed in more places, e.g. List<@Nullable String>. The biggest impact of this is likely to be for static analysis tools such as Sonar and FindBugs.

This JSR 308 website does a better job of explaining the motivation for these changes than I could possibly do: “Type Annotations (JSR 308) and the Checker Framework”

Nashorn JavaScript Engine

Summary of proposal: JEP 174: Nashorn JavaScript Engine

I did not experiment with Nashorn so I know very little beyond what’s described in the proposal above. Short version: It’s the successor to Rhino. Rhino is old and a little bit slow, and the developers decided they’d be better off starting from scratch.

Other miscellaneous additions to java.lang, java.util, and elsewhere

There is too much there to talk about, but I’ll pick out a few noteworthy items.

ThreadLocal.withInitial(Supplier<T>) makes declaring thread-local variables with initial values much nicer. Previously you would supply an initial value like this:

ThreadLocal<List<String>> strings =
    new ThreadLocal<List<String>>() {
        @Override
        protected List<String> initialValue() {
             return new ArrayList<>();
        }
    };

Now it’s like this:

ThreadLocal<List<String>> strings =
    ThreadLocal.withInitial(ArrayList::new);

Optional<T> appears in the stream API as the return value for methods like min/max, findFirst/Any, and some forms of reduce. It’s used because there might not be any elements in the stream, and it provides a fluent API for handling the “some result” versus “no result” cases. You can provide a default value, throw an exception, or execute some action only if the result exists.
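
For example (assume strings is a List<String>):

Optional<String> longest = strings.stream()
                                  .max(Comparator.comparingInt(String::length));

String result = longest.orElse("(none)");                     // provide a default value
longest.ifPresent(s -> System.out.println("longest: " + s));  // act only if a result exists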

It’s very, very similar to Guava’s Optional class. It’s nothing at all like Option in Scala, nor is it trying to be, and the name similarity there is purely coincidental.

Aside: it’s interesting that Java 8’s Optional and Guava’s Optional ended up being so similar, despite the absurd amount of debate that occurred over its addition to both libraries.

“FYI…. Optional was the cause of possibly the single greatest conflagration on the internal Java libraries discussion lists ever.”

Kevin Bourrillion in response to “Some new Guava classes targeted for release 10”

“On a purely practical note, the discussions surrounding Optional have exceeded its design budget by several orders of magnitude.”

Brian Goetz in response to “Optional require(s) NonNull”

StringJoiner and String.join(...) are long, long overdue. They are so long overdue that the vast majority of Java developers likely have already written or have found utilities for joining strings, but it is nice for the JDK to finally provide this itself. Everyone has encountered situations where joining strings is required, and it is a Good Thing™ that we can now express that through a standard API that every Java developer (eventually) will know.
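
For example:

String joined = String.join(", ", "alpha", "beta", "gamma");  // "alpha, beta, gamma"

StringJoiner joiner = new StringJoiner(", ", "[", "]");
joiner.add("alpha").add("beta").add("gamma");
String bracketed = joiner.toString();                         // "[alpha, beta, gamma]"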

Comparator provides some very nice new methods for doing chained comparisons and field-based comparisons. For example:

people.sort(
    Comparator.comparing(Person::getLastName)
        .thenComparing(Person::getFirstName)
        .thenComparing(
            Person::getEmailAddress,
            Comparator.nullsLast(CASE_INSENSITIVE_ORDER)));

These additions provide good, readable shorthand for complex sorts. Many of the use cases served by Guava’s ComparisonChain and Ordering utility classes are now served by these JDK additions. And for what it’s worth, I think the JDK versions read better than the functionally-equivalent versions expressed in Guava-ese.

More?

There are lots of various small bug fixes and performance improvements that were not covered in this post. But they are appreciated too!

This post was intended to cover every single language-level and API-level change coming in Java 8. If any were missed, it was an error that should be corrected. Please let me know if you discover an omission.

Storage Worries

January 8, 2013

Alan Laser

 

In the 1980s, high-tech companies stored information about their customers on their sophisticated and high-cost computer equipment. Back then, such practices were rare outside of relatively large companies. Thirty years later, it’s so commonplace that there are numerous services to store customer data for you “in the cloud.”

A layperson could be excused for thinking that, by 2012, questions about how to store data on computers had been fully worked out. The established options have indeed remained consistent for years, even decades. The commonplace language for structuring, storing, finding, and fetching data, SQL, has been with us a very long time. So long, in fact, that layers of conventional thinking have built up around it. More recently, a healthy desire to shake off that conventional thinking and re-imagine data storage has emerged.

Data storage is enjoying a deserved renaissance thanks to the research and hands-on work of a number of people who together are informally known as the NoSQL movement.

The good stuff

The objectives of each NoSQL implementation vary, but generally, the appeal to developers (like us) comes from these common advantages versus traditional SQL:

  • Thank goodness, there is no SQL language to deal with. The APIs are purpose-built for modern notions of structure, store, find, and retrieve. That usually means one fewer layer of translation between stored data and live in-memory data.
  • Clients and servers generally communicate over human-readable protocols like HTTP. This makes us happy because we know this protocol and we don’t know old database protocols like TDS.
  • As our server(s) reach their request-processing limits, we can theoretically add more with relatively little pain.
  • Data is automatically duplicated according to tunable rules so that we don’t panic (severely) when a server blows up.
  • Some implementations can distribute complex queries to multiple servers automatically.

These advantages need to be evaluated in context. For us, the context is building technology solutions that meet business objectives for our clients. From that perspective, they reduce into:

  • A fresh approach to integration with applications, which in some cases may decrease programmer effort. Decreased effort translates into lower implementation costs.
  • Less cumbersome resolution to future scale events.

Our clients rightfully don’t care how hip using a NoSQL server makes us feel. They don’t care precisely how NoSQL may reduce the pain of future scaling, but knowing that future scale is somewhat less painful does give them some comfort.

The not-so-good stuff

This context sheds some light on disadvantages that are often downplayed when building an application in-house. As mentioned earlier, internal teams are afforded the luxury (whether or not their bosses would necessarily agree) to adopt new technologies with less consideration of risk.

Higher-than-average risk aversion and a horizon of hand-off to an internal team mean we consider the following:

  • There are few standards (yet) with NoSQL.
  • Community popularity is slowly coalescing but volatility remains.
  • Finding the right team in the future may be difficult for our client.
  • The reality is that most clients don’t have a reasonable expectation of scale that suggests NoSQL.
  • Real-time performance characteristics may be an issue.

Let’s start with scale. The lure of smooth, no-pain scaling will appeal to anyone who has been through efforts to scale traditional databases. Talking about scale is important even early on in a project’s lifecycle. However, the usage level where scaling a traditional database server by “throwing better hardware at it” stops being practical is quite high. A single modern high-performance server can process a tremendous amount of user data, at least from the point of view of a start-up company.

Favoring ease of scalability in selection of a data storage platform makes sense if the scale plans are real. But if scale plans are imaginary, hopeful, ambitious, or wishful thinking, ease of scalability is not as important. Everyone wants to believe they will be the next Facebook. But more likely you’ll be the next site with ten thousand users working really hard to get to your first one hundred thousand.

In that 10,000 users to 100,000 users bracket, many applications’ entire data set fits in system memory. Performance is going to be reasonably quick (at least in terms of basic get and put operations) with any kind of database.

How about ease of development? Working with NoSQL databases can be more “fluent” because the native APIs of NoSQL discard database legacy and focus on basic verbs like put and get. As a result, developer efficiency may be slightly improved with a NoSQL platform.

Traditional relational databases are encumbered by an impedance mismatch between in-memory objects and relational data entities. This mismatch led to object-relational mapping tools (ORMs) and subsequent debates between fans and critics of ORMs. However, lightweight ORMs reduce the most commonplace database operations into interfaces as fluent as those offered by NoSQL.

Ultimately, it’s essentially a wash in terms of comparing the level of effort. NoSQL may enjoy a small advantage in the form of developer happiness: most developers innately like working with new, cool things.

Where do we land?

Clearly, the decision is predicated on the specifics of the application. Systems with pre-existing large scale are generally well-fitted to NoSQL. Systems with anticipated but undefined analytical needs are generally better served by SQL. Often we select traditional SQL databases because we want clients to be well-positioned to deal with unknowns and traditional databases are not performance slouches.

In other words, with fairly vanilla application requirements and scale targets, a traditional SQL database avoids some degree of risk, and risk moderation is compelling even in small doses.

If and when the client wants to build an internal team, it will be easier to find developers with the necessary experience.

Traditional platforms are more stable. For example, although the risk of being abandoned by a popular NoSQL platform is low, there is virtually no risk of MySQL, Postgres, or Microsoft SQL Server being outright abandoned in the next decade. That sort of huge horizon is unnecessary, but still comforting.

Finally, scaling a traditional database may be slightly more difficult than a NoSQL option, but not sufficiently to be a factor unless scale-growth concerns are well justified.