java.util.EnumSet
and java.util.EnumMap
from Java’s standard libraries.
- What are they?
- When should they be used?
- Could the implementations be improved?
- Could the APIs be improved?
- Conclusion
What are they?
EnumSet
and
EnumMap
are compact, efficient implementations of the Set
and Map
interfaces. They have the constraint that their elements/keys come from a single enum
type.
Like HashSet
and HashMap
, they are modifiable.
In contrast to HashSet, EnumSet:
- Consumes less memory, usually.
- Is faster at all the things a Set can do, usually.
- Iterates over elements in a predictable order (the declaration order of the element type’s enum constants).
- Rejects null elements.
In contrast to HashMap, EnumMap:
- Consumes less memory, usually.
- Is faster at all the things a Map can do, usually.
- Iterates over entries in a predictable order (the declaration order of the key type’s enum constants).
- Rejects null keys.
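The ordering and null-handling points can be seen in a few lines. This is a minimal sketch using DayOfWeek from java.time; the class and method names are made up for illustration.

```java
import java.time.DayOfWeek;
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class; only the java.util and java.time types are real APIs.
class EnumCollectionBasics {

    // Insert out of order, then read back: iteration follows declaration order.
    static List<DayOfWeek> iterationOrder() {
        Set<DayOfWeek> days = EnumSet.noneOf(DayOfWeek.class);
        days.add(DayOfWeek.FRIDAY);
        days.add(DayOfWeek.MONDAY);
        days.add(DayOfWeek.WEDNESDAY);
        return new ArrayList<>(days);
    }

    // EnumSet throws NullPointerException when asked to add null.
    static boolean setRejectsNull() {
        Set<DayOfWeek> days = EnumSet.noneOf(DayOfWeek.class);
        try {
            days.add(null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    // EnumMap throws NullPointerException for a null key (null values are allowed).
    static boolean mapRejectsNullKey() {
        Map<DayOfWeek, String> names = new EnumMap<>(DayOfWeek.class);
        names.put(DayOfWeek.MONDAY, null);
        try {
            names.put(null, "nobody");
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(iterationOrder()); // [MONDAY, WEDNESDAY, FRIDAY]
        System.out.println(setRejectsNull());
        System.out.println(mapRejectsNullKey());
    }
}
```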
If you’re wondering how this is possible, I encourage you to look at the source code:
- EnumSet: A bit vector of the ordinals of the elements in the Set. This is an abstract superclass of RegularEnumSet and JumboEnumSet.
- RegularEnumSet: An EnumSet whose bit vector is a single primitive long, which is enough to handle all enum types having 64 or fewer constants.
- JumboEnumSet: An EnumSet whose bit vector is a long[] array, which is allocated however many slots are necessary for the given enum type. Two slots are allocated for 128 or fewer constants, three slots for 192 or fewer constants, etc.
- EnumMap: A flat array of the Map’s values indexed by the ordinals of their keys.
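To make the bit-vector idea concrete, here is a deliberately tiny sketch in the spirit of RegularEnumSet. It is not the JDK's code: TinyMonthSet and slotsFor are hypothetical names, the set is hard-wired to Month, and it skips everything a real Set needs (iteration, removal, type checks).

```java
import java.time.Month;

// Hypothetical, simplified illustration of a single-long bit vector.
class TinyMonthSet {
    // Bit i is set when the Month constant with ordinal i is present.
    private long bits;

    // Returns true if the set changed, mirroring Set.add.
    boolean add(Month month) {
        long before = bits;
        bits |= 1L << month.ordinal();
        return bits != before;
    }

    boolean contains(Month month) {
        return (bits & (1L << month.ordinal())) != 0;
    }

    int size() {
        return Long.bitCount(bits);
    }

    // Number of 64-bit slots a long[] bit vector would need for an enum type
    // with the given number of constants (the "jumbo" case described above).
    static int slotsFor(int constantCount) {
        return (constantCount + 63) >>> 6;
    }

    public static void main(String[] args) {
        TinyMonthSet set = new TinyMonthSet();
        set.add(Month.JANUARY);
        set.add(Month.JUNE);
        System.out.println(set.contains(Month.JUNE)); // true
        System.out.println(set.size());               // 2
        System.out.println(slotsFor(192));            // 3
    }
}
```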
EnumSet
and EnumMap
cheat! They use privileged code like this:
/**
 * Returns all of the values comprising E.
 * The result is uncloned, cached, and shared by all callers.
 */
private static <E extends Enum<E>> E[] getUniverse(Class<E> elementType) {
    return SharedSecrets.getJavaLangAccess()
                        .getEnumConstantsShared(elementType);
}
If you want all the Month
constants, you might call Month.values()
, giving you a Month[]
array. There is a single backing array instance of those Month
constants living in memory somewhere (a private field in the Class
object for Month
), but it wouldn’t be safe to pass that array directly to every caller of values()
. Imagine if someone modified that array! Instead, values()
creates a fresh clone of the array for each caller.
EnumSet
and EnumMap
get to skip that cloning step. They have direct access to the backing array.
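The defensive copy made by values() is easy to observe from ordinary code. A small sketch (the class name is hypothetical):

```java
import java.time.Month;

// Hypothetical demo class showing that values() hands out a fresh array each time.
class ValuesCloneDemo {

    // Two calls never return the same array instance.
    static boolean freshArrayPerCall() {
        return Month.values() != Month.values();
    }

    // Scribbling on our copy does not affect what later callers see.
    static Month firstAfterTampering() {
        Month[] mine = Month.values();
        mine[0] = Month.DECEMBER;
        return Month.values()[0];
    }

    public static void main(String[] args) {
        System.out.println(freshArrayPerCall());   // true
        System.out.println(firstAfterTampering()); // JANUARY
    }
}
```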
Effectively, no third-party versions of these classes can be as efficient. Third-party libraries that provide enum
-specialized collections tend to delegate to EnumSet
and EnumMap
. It’s not that the library authors are lazy or incapable; delegating is the correct choice for them.
When should they be used?
Historically, Enum{Set,Map}
were recommended as a matter of safety, taking better advantage of Java’s type system than the alternatives.
Prefer enum
types and Enum{Set,Map}
over int
flags.
Effective Java goes into detail about this use case for Enum{Set,Map}
and enum
types in general. If you write a lot of Java code, then you should read that book and follow its advice.
Before enum
types existed, people would declare flags as int
constants. Sometimes the flags would be powers of two and combined into sets using bitwise arithmetic:
static final int OVERLAY_STREETS = 1 << 0;
static final int OVERLAY_ELECTRIC = 1 << 1;
static final int OVERLAY_PLUMBING = 1 << 2;
static final int OVERLAY_TERRAIN = 1 << 3;
void drawCityMap(int overlays) { ... }
drawCityMap(OVERLAY_STREETS | OVERLAY_PLUMBING);
Other times the flags would start at zero and count up by one, and they would be used as array indexes:
static final int MONSTER_SLIME = 0;
static final int MONSTER_GHOST = 1;
static final int MONSTER_SKELETON = 2;
static final int MONSTER_GOLEM = 3;
int[] kills = getMonstersSlain();
if (kills[MONSTER_SLIME] >= 10) { ... }
These approaches got the job done for many people, but they were somewhat error-prone and difficult to maintain.
When enum
types were introduced to the language, Enum{Set,Map}
came with them. Together they were meant to provide better tooling for problems previously solved with int
flags. We would say, “Don’t use int
flags, use enum
constants. Don’t use bitwise arithmetic for sets of flags, use EnumSet
. Don’t use arrays for mappings of flags, use EnumMap
.” This was not because the enum
-based solutions were faster than int
flags — they were probably slower — but because the enum
-based solutions were easier to understand and implement correctly.
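For comparison, here is roughly what the two int-flag examples look like when rewritten with enum types. The Overlay and Monster enums and the method names are invented for this sketch, mirroring the constants used earlier.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical rewrite of the int-flag examples using enum types.
class FlagsWithEnums {
    enum Overlay { STREETS, ELECTRIC, PLUMBING, TERRAIN }
    enum Monster { SLIME, GHOST, SKELETON, GOLEM }

    // A Set<Overlay> replaces the bitwise-OR'ed int argument.
    static String drawCityMap(Set<Overlay> overlays) {
        return "drawing " + overlays;
    }

    // A Map<Monster, Integer> replaces the int[] indexed by flag value.
    static boolean hasSlainTenSlimes(Map<Monster, Integer> kills) {
        return kills.getOrDefault(Monster.SLIME, 0) >= 10;
    }

    public static void main(String[] args) {
        System.out.println(drawCityMap(EnumSet.of(Overlay.STREETS, Overlay.PLUMBING)));

        Map<Monster, Integer> kills = new EnumMap<>(Monster.class);
        kills.put(Monster.SLIME, 12);
        System.out.println(hasSlainTenSlimes(kills)); // true
    }
}
```

The compiler now rejects a Monster passed where an Overlay is expected, which the int versions could not do.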
Fast forward to today, I don’t see many people using int
flags anymore (though there are notable exceptions). We’ve had enum
types in the language for more than a decade. We’re all using enum
types here and there, we’re all using the collections framework. At this point, while Effective Java’s advice regarding Enum{Set,Map}
is still valid, I think most people will never have a chance to put it into practice.
Today, we’re using enum
types in the right places, but we’re forgetting about the collection types that came with them.
Prefer Enum{Set,Map}
over Hash{Set,Map}
as a performance optimization.
- Prefer EnumSet over HashSet when the elements come from a single enum type.
- Prefer EnumMap over HashMap when the keys come from a single enum type.
Should you refactor all of your existing code to use Enum{Set,Map}
instead of Hash{Set,Map}
? No.
Your code that uses Hash{Set,Map}
isn’t wrong. Migrating to Enum{Set,Map}
might make it faster. That’s it.
If you’ve ever used primitive collection libraries like fastutil or Trove, then it may help to think of Enum{Set,Map}
like those primitive collections. The difference is that Enum{Set,Map}
are specialized for enum
types, not primitive types, and you can use them without depending on any third-party libraries.
Enum{Set,Map}
don’t have identical semantics to Hash{Set,Map}
, so please don’t make blind, blanket replacements in your existing code.
Instead, try to remember these classes for next time. If you can make your code more efficient for free, then why not go ahead and do that, right?
If you use IntelliJ IDEA, you can have it remind you to use Enum{Set,Map}
with inspections:
- Analyze – Run inspection by name – “Set replaceable with EnumSet” or “Map replaceable with EnumMap”
…or…
- File – Settings – Editor – Inspections – Java – Performance issues – “Set replaceable with EnumSet” or “Map replaceable with EnumMap”
SonarQube can also remind you to use Enum{Set,Map}
:
- S1641: “Sets with elements that are enum values should be replaced with EnumSet”
- S1640: “Maps with keys that are enum values should be replaced with EnumMap”
For immutable versions of Enum{Set,Map}
, see the following methods from Guava:
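- Factory methods: Sets.immutableEnumSet(...) and Maps.immutableEnumMap(map)
- Collectors: Sets.toImmutableEnumSet() and Maps.toImmutableEnumMap(...)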
If you don’t want to use Guava, then wrap the modifiable Enum{Set,Map}
instances in Collections.unmodifiableSet(set)
or Collections.unmodifiableMap(map)
and throw away the direct references to the modifiable collections.
The resulting collections may be less efficient when it comes to operations like containsAll
and equals
than their counterparts in Guava, which may in turn be less efficient than the raw modifiable collections themselves.
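A minimal sketch of the JDK-only wrapping approach, assuming a hypothetical helper that builds the set and never lets the modifiable reference escape:

```java
import java.time.Month;
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper class; the wrapping technique itself is plain JDK.
class UnmodifiableEnumSets {

    static Set<Month> midYearMonths() {
        // The modifiable EnumSet stays local to this method, so nobody
        // can change the contents behind the wrapper's back.
        EnumSet<Month> modifiable = EnumSet.range(Month.JUNE, Month.AUGUST);
        return Collections.unmodifiableSet(modifiable);
    }

    public static void main(String[] args) {
        Set<Month> months = midYearMonths();
        System.out.println(months); // [JUNE, JULY, AUGUST]
        try {
            months.add(Month.SEPTEMBER);
        } catch (UnsupportedOperationException expected) {
            System.out.println("read-only");
        }
    }
}
```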
Could the implementations be improved?
Since they can’t be replaced by third-party libraries, Enum{Set,Map}
had better be as good as possible! They’re good already, but they could be better.
Enum{Set,Map}
have missed out on potential upgrades since Java 8. New methods were added in Java 8 to Set
and Map
(or higher-level interfaces like Collection
and Iterable
). While the default implementations of those methods are correct, we could do better with overrides in Enum{Set,Map}
.
This issue is tracked as JDK-8170826.
Specifically, these methods should be overridden:
{Regular,Jumbo}EnumSet.forEach(action)
{Regular,Jumbo}EnumSet.iterator().forEachRemaining(action)
{Regular,Jumbo}EnumSet.spliterator()
EnumMap.forEach(action)
EnumMap.{keySet,values,entrySet}().forEach(action)
EnumMap.{keySet,values,entrySet}().iterator().forEachRemaining(action)
EnumMap.{keySet,values,entrySet}().spliterator()
I put sample implementations on GitHub in case you’re curious what these overrides might look like. They’re all pretty straightforward.
Rather than walk through each implementation in detail, I’ll share some high-level observations about them.
- The optimized forEach and forEachRemaining methods are roughly 50% better than the defaults (in terms of operations per second). EnumMap.forEach(action) benefits the most, becoming twice as fast as the default implementation.
- The iterable.forEach(action) method is popular. Optimizing it tends to affect a large audience, which increases the likelihood that the optimization (even if small) is worthwhile. (I’d claim that iterable.forEach(action) is too popular, and I’d suggest that the traditional enhanced for loop should be preferred over forEach except when the argument to forEach can be written as a method reference. That’s a topic for another discussion, though.)
- The iterator.forEachRemaining(action) method is more important than it seems. Few people use it directly, but many people use it indirectly through streams. The default spliterator() delegates to the iterator(), and the default stream() delegates to the spliterator(). In the end, stream traversal may delegate to iterator().forEachRemaining(...). Given the popularity of streams, optimizing this method is a good idea!
- The iterable.spliterator() method is critical when it comes to stream performance, but writing a custom Spliterator from scratch is a non-trivial task. I recommend this approach:
  - Check whether the characteristics of the default spliterator are correct for your collection (oftentimes the defaults are too conservative; for example, EnumSet’s spliterator is currently missing the ORDERED, SORTED, and NONNULL characteristics). If they’re not correct, then provide a trivial override of the spliterator that uses Spliterators.spliterator(collection, characteristics) to define the correct characteristics.
  - Don’t go further than that until you’ve read through the implementation of that spliterator, and you understand how it works, and you’re confident that you can do better. In particular, your tryAdvance(action) and trySplit() should both be better. Write a benchmark afterwards to confirm your assumptions.
- The map.forEach(action) method is extremely popular and is almost always worth overriding. This is especially true for maps like EnumMap that create their Entry objects on demand.
- It’s usually possible to share code across the forEach and forEachRemaining methods. If you override one, you’re already most of the way there to overriding the others.
- I don’t think it’s worthwhile to override collection.removeIf(filter) in any of these classes. For RegularEnumSet, where it seemed most likely to be worthwhile, I couldn’t come up with a faster implementation than the default.
- Enum{Set,Map} could provide faster hashCode() implementations than the ones they currently inherit from AbstractSet and AbstractMap, but I don’t think that would be worthwhile. In general, I don’t think optimizing the hashCode() of collections is worthwhile unless it can somehow become a constant-time (O(1)) operation, and even then it is questionable. Collection hash codes aren’t used very often.
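The "trivial override" step of the spliterator recommendation can be sketched outside the JDK with a wrapper. DescribedEnumSet is a hypothetical class for illustration only; in the JDK the override would presumably live inside EnumSet itself.

```java
import java.util.AbstractSet;
import java.util.EnumSet;
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;

// Hypothetical read-through wrapper that reports richer spliterator
// characteristics than the default Set.spliterator() does.
class DescribedEnumSet<E extends Enum<E>> extends AbstractSet<E> {
    private final EnumSet<E> delegate;

    DescribedEnumSet(EnumSet<E> delegate) {
        this.delegate = delegate;
    }

    @Override
    public Iterator<E> iterator() {
        return delegate.iterator();
    }

    @Override
    public int size() {
        return delegate.size();
    }

    @Override
    public boolean contains(Object o) {
        return delegate.contains(o);
    }

    @Override
    public Spliterator<E> spliterator() {
        // Same iterator-based spliterator as the default, but with the
        // characteristics that hold for enum sets: elements are distinct,
        // non-null, and encountered in their natural (declaration) order.
        return Spliterators.spliterator(
            this,
            Spliterator.DISTINCT
                | Spliterator.ORDERED
                | Spliterator.SORTED
                | Spliterator.NONNULL);
    }
}
```

Spliterators.spliterator(collection, characteristics) also reports SIZED and SUBSIZED on its own, so those need not be listed.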
Could the APIs be improved?
The implementation-level changes I’ve described are purely beneficial. There is no downside other than a moderate increase in lines of code, and the new lines of code aren’t all that complicated. (Even if they were complicated, this is java.util
! Bring on the micro-optimizations.)
Since the existing code is already so good, though, changes of this nature have limited impact. Cutting one third or one half of the execution time from an operation that’s already measured in nanoseconds is a good thing but not game-changing. I suspect that those changes will cause exactly zero users of the JDK to write their applications differently.
The more tantalizing, meaningful, and dangerous changes are the realm of the APIs.
I think that Enum{Set,Map}
are chronically underused. They have a bit of a PR problem. Some developers don’t know these classes exist. Other developers know about these classes but don’t bother to reach for them when the time comes. It’s just not a priority for them. That’s totally understandable, but… There’s avoiding premature optimization and then there’s throwing away performance for no reason — performance nihilism? Maybe we can win their hearts with API-level changes.
No one should have to go out of their way to use Enum{Set,Map}
. Ideally it should be easier than using Hash{Set,Map}
. The EnumSet.allOf(elementType)
method is a great example. If you want a Set
containing all the enum
constants of some type, then EnumSet.allOf(elementType)
is the best solution and the easiest solution.
The high-level JDK-8145048 tracks a couple of ideas for improvements in this area. In the following sections, I expand on these ideas and discuss other API-level changes.
Add immutable Enum{Set,Map}
(maybe?)
In a recent conversation on Twitter about JEP 301: Enhanced Enums, Joshua Bloch and Brian Goetz referred to theoretical immutable Enum{Set,Map}
types in the JDK.
Not sure I see the point. ImmutableEnumSet (a library change) seems far more valuable and easier to implement. Am I missing something? https://t.co/mLL3cb8etc
— Joshua Bloch (@joshbloch) December 7, 2016
a) not either/or; b) not solving same prob; c) gen enum easier than you think; d) we’d gladly take contrib of IES!
— Brian Goetz (@BrianGoetz) December 7, 2016
Still open to IE{S,M} contributions!
— Brian Goetz (@BrianGoetz) December 7, 2016
Joshua Bloch also discussed the possibility of an immutable EnumSet
in Effective Java:
“The one real disadvantage of EnumSet is that it is not, as of release 1.6, possible to create an immutable EnumSet, but this will likely be remedied in an upcoming release. In the meantime, you can wrap an EnumSet with Collections.unmodifiableSet, but conciseness and performance will suffer.”
When he said “performance will suffer”, he was probably referring to the fact that certain bulk operations of EnumSet
won’t execute as quickly when inside a wrapper collection (tracked as JDK-5039214). Consider RegularEnumSet.equals(object)
:
public boolean equals(Object o) {
    if (!(o instanceof RegularEnumSet))
        return super.equals(o);
    RegularEnumSet<?> es = (RegularEnumSet<?>)o;
    if (es.elementType != elementType)
        return elements == 0 && es.elements == 0;
    return es.elements == elements;
}
It’s optimized for the case that the argument is another instance of RegularEnumSet
. In that case the equality check boils down to a comparison of two primitive long
values. Now that’s fast!
If the argument to equals(object)
was not a RegularEnumSet
but instead a Collections.unmodifiableSet
wrapper, that code would fall back to its slow path.
Guava’s approach is similar to the Collections.unmodifiableSet
one, although Guava does a bit better in terms of unwrapping the underlying Enum{Set,Map}
and delegating to the super-fast optimized paths.
If your application deals exclusively with Guava’s immutable Enum{Set,Map}
wrappers, you should get the full benefit of those optimized paths from the JDK. If you mix and match Guava’s collections with the JDK’s though, the results won’t be quite as good. (RegularEnumSet
doesn’t know how to unwrap Guava’s ImmutableEnumSet
, so a comparison in that direction would invoke the slow path.)
If immutable Enum{Set,Map}
had full support in the JDK, however, it would not have those same limitations. RegularEnumSet
and friends can be changed.
What should be done in the JDK?
I spent a long time and tested a lot of code trying to come up with an answer to this. Sadly the end result is:
I don’t know.
Personally, I’m content to use Guava for this. I’ll share some observations I made along the way.
Immutable Enum{Set,Map}
won’t be faster than mutable Enum{Set,Map}
.
The current versions of Enum{Set,Map}
are really, really good. They’ll be even better once they override the defaults from Java 8.
Sometimes, having to support mutability comes with a tax on efficiency. I don’t think this is the case with Enum{Set,Map}
. At best, immutable versions of these classes will be exactly as efficient as the mutable ones.
The more likely outcome is that immutable versions will come with a small penalty to performance by expanding the Enum{Set,Map}
ecosystem.
Take RegularEnumSet.equals(object)
for example. Each time we create a new type of EnumSet
, are we going to change that code to add a new instanceof
check for our new type? If we add the check, we make that code worse at handling everything except our new type. If we don’t add the check, we… still make that code worse! It’s less effective than it used to be; more EnumSet
instances trigger the slow path.
Classes like Enum{Set,Map}
have a userbase that is more sensitive to changes in performance than average users. If adding a new type causes some call site to become megamorphic, we might have thrown their carefully-crafted assumptions regarding performance out the window.
If we decide to add immutable Enum{Set,Map}
, we should do so for reasons unrelated to performance.
As an exception to the rule, an immutable EnumSet
containing all constants of a single enum
type would be really fast.
RegularEnumSet
sets such a high bar for efficiency. There is almost no wiggle room in Set
operations like contains(element)
for anyone else to be faster. Here’s the source code for RegularEnumSet.contains(element)
:
public boolean contains(Object e) {
    if (e == null)
        return false;
    Class<?> eClass = e.getClass();
    if (eClass != elementType && eClass.getSuperclass() != elementType)
        return false;
    return (elements & (1L << ((Enum<?>)e).ordinal())) != 0;
}
If you can’t do contains(element)
faster than that, you’ve already lost. Your EnumSet
is probably worthless.
There is a worthy contender, which I’ll call FullEnumSet
. It is an EnumSet
that (always) contains every constant of a single enum
type. Here is one way to write that class:
// Assumes this class is declared in package java.util: EnumSet's constructor,
// its elementType and universe fields, and its abstract methods addAll(),
// addRange(from, to), and complement() are all package-private.
import java.util.function.Consumer;
import java.util.function.Predicate;

class FullEnumSet<E extends Enum<E>> extends EnumSet<E> {

    // TODO: Add a static factory method somewhere.
    FullEnumSet(Class<E> elementType, Enum<?>[] universe) {
        super(elementType, universe);
    }

    @Override
    @SuppressWarnings("unchecked")
    public Iterator<E> iterator() {
        // TODO: Avoid calling Arrays.asList.
        //       The iterator class can be shared and used directly.
        return Arrays.asList((E[]) universe).iterator();
    }

    @Override
    public Spliterator<E> spliterator() {
        return Spliterators.spliterator(
            universe,
            Spliterator.ORDERED |
            Spliterator.SORTED |
            Spliterator.IMMUTABLE |
            Spliterator.NONNULL |
            Spliterator.DISTINCT);
    }

    @Override
    public int size() {
        return universe.length;
    }

    @Override
    public boolean contains(Object e) {
        if (e == null)
            return false;
        Class<?> eClass = e.getClass();
        return eClass == elementType || eClass.getSuperclass() == elementType;
    }

    @Override
    public boolean containsAll(Collection<?> c) {
        if (!(c instanceof EnumSet))
            return super.containsAll(c);
        EnumSet<?> es = (EnumSet<?>) c;
        return es.elementType == elementType || es.isEmpty();
    }

    @Override
    @SuppressWarnings("unchecked")
    public void forEach(Consumer<? super E> action) {
        int i = 0, n = universe.length;
        if (i >= n) {
            // Nothing to iterate over, but a null action must still be rejected.
            Objects.requireNonNull(action);
            return;
        }
        do action.accept((E) universe[i]);
        while (++i < n);
    }

    // Every modifier method is rejected; the set is always full and immutable.
    @Override void addAll()                                         {throw uoe();}
    @Override void addRange(E from, E to)                           {throw uoe();}
    @Override void complement()                                     {throw uoe();}
    @Override public boolean add(E e)                               {throw uoe();}
    @Override public boolean addAll(Collection<? extends E> c)      {throw uoe();}
    @Override public void clear()                                   {throw uoe();}
    @Override public boolean remove(Object e)                       {throw uoe();}
    @Override public boolean removeAll(Collection<?> c)             {throw uoe();}
    @Override public boolean removeIf(Predicate<? super E> f)       {throw uoe();}
    @Override public boolean retainAll(Collection<?> c)             {throw uoe();}

    private static UnsupportedOperationException uoe() {
        return new UnsupportedOperationException();
    }

    // TODO: Figure out serialization.
    //       Serialization should preserve these qualities:
    //         - Immutable
    //         - Full
    //         - Singleton?
    //       Maybe it's a bad idea to extend EnumSet?
    private static final long serialVersionUID = 0;
}
FullEnumSet
has many desirable properties. Of note:
- contains(element) only needs to check the type of the argument to know whether it’s a member of the set.
- containsAll(collection) is extremely fast when the argument is an EnumSet (of any kind); it boils down to comparing the element types of the two sets. It follows that equals(object) is just as fast in that case, since equals delegates the hard work to containsAll.
- Since all the elements are contained in one flat array with no empty spaces, conditions are ideal for iterating and for splitting (splitting efficiency is important in the context of parallel streams).
- It beats RegularEnumSet in all important metrics:
  - Query speed (contains(element), etc.)
  - Iteration speed
  - Space consumed
Asking for the full set of enum
constants of some type is a very common operation. See: every user of values()
, elementType.getEnumConstants()
, and EnumSet.allOf(elementType)
. I bet the vast majority of those users do not modify (their copy of) that set of constants. A class that is specifically tailored to that use case has a good chance of being worthwhile.
Since it’s immutable, the FullEnumSet
of each enum
type could be a lazy-initialized singleton.
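The lazy singleton-per-type idea can be approximated today without touching the JDK. The sketch below uses ClassValue as the per-class cache and an unmodifiable wrapper around EnumSet.allOf in place of a real FullEnumSet; FullSets and its method names are hypothetical, and the wrapper does not get the fast contains or containsAll paths described above.

```java
import java.time.Month;
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical cache of one immutable "full" set per enum type.
class FullSets {

    // ClassValue computes the value lazily, at most once per class in the
    // common case, and keeps it associated with that class.
    private static final ClassValue<Set<?>> CACHE = new ClassValue<Set<?>>() {
        @Override
        protected Set<?> computeValue(Class<?> type) {
            return Collections.unmodifiableSet(allOf(type));
        }
    };

    // Raw types are needed because ClassValue only hands us a Class<?>.
    @SuppressWarnings({"unchecked", "rawtypes"})
    private static Set<?> allOf(Class<?> type) {
        return EnumSet.allOf((Class) type);
    }

    // Safe in practice: the cached set for Class<E> only ever contains E constants.
    @SuppressWarnings("unchecked")
    static <E extends Enum<E>> Set<E> of(Class<E> type) {
        return (Set<E>) CACHE.get(type);
    }

    public static void main(String[] args) {
        Set<Month> months = FullSets.of(Month.class);
        System.out.println(months.size());                      // 12
        System.out.println(months == FullSets.of(Month.class)); // true
    }
}
```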
Should immutable Enum{Set,Map}
reuse existing code, or should they be rewritten from scratch?
As I said earlier, the immutable versions of these classes aren’t going to be any faster. If they’re built from scratch, that code is going to look near-identical to the existing code. There would be a painful amount of copy and pasting, and I would not envy the people responsible for maintaining that code in the future.
Suppose we want to reuse the existing code. I see two general approaches:
- Do what Guava did, basically. Create unmodifiable wrappers around modifiable Enum{Set,Map}. Both the wrappers and the modifiable collections should be able to unwrap intelligently to take advantage of the existing optimizations for particular Enum{Set,Map} types (as in RegularEnumSet.equals(object)).
- Extend the modifiable Enum{Set,Map} classes with new classes that override modifier methods to throw UnsupportedOperationException. Optimizations that sniff for particular Enum{Set,Map} types (as in RegularEnumSet.equals(object)) remain exactly as effective as before without changes.
Of those two, I prefer the Guava-like approach. Extending the existing classes raises some difficult questions about the public API, particularly with respect to serialization.
What’s the public API for immutable Enum{Set,Map}
? What’s the immutable version of EnumSet.of(e1, e2, e3)
?
Here’s where I gave up.
- Should we add public java.util.ImmutableEnum{Set,Map} classes?
- If not, where do we put the factory methods, and what do we name them? EnumSet.immutableOf(e1, e2, e3)? EnumSet.immutableAllOf(Month.class)? Yuck! (Clever synonyms like “having” and “universeOf” might be even worse.)
- Are the new classes instances of Enum{Set,Map} or do they exist in an unrelated class hierarchy?
- If the new classes do extend Enum{Set,Map}, how is serialization affected? Do we add an “isImmutable” bit to the current serialized forms? Can that be done without breaking backwards compatibility?
Good luck to whoever has to produce the final answers to those questions.
That’s enough about this topic. Let’s move on.
Add factory methods
JDK-8145048 mentions the possibility of adding factory methods in Enum{Set,Map}
to align them with Java 9’s Set
and Map
factories. EnumSet
already has a varargs EnumSet.of(...)
factory method, but EnumMap
has nothing like that.
It would be nice to be able to declare EnumMap
instances like this, for some reasonable number of key-value pairs:
Map<DayOfWeek, String> dayNames =
EnumMap.of(
DayOfWeek.MONDAY, "lunes",
DayOfWeek.TUESDAY, "martes",
DayOfWeek.WEDNESDAY, "miércoles",
DayOfWeek.THURSDAY, "jueves",
DayOfWeek.FRIDAY, "viernes",
DayOfWeek.SATURDAY, "sábado",
DayOfWeek.SUNDAY, "domingo");
Users could use EnumMap’s copy constructor in conjunction with Java 9’s Map factory methods to achieve the same result less efficiently…
Map<DayOfWeek, String> dayNames =
new EnumMap<>(
Map.of(
DayOfWeek.MONDAY, "lunes",
DayOfWeek.TUESDAY, "martes",
DayOfWeek.WEDNESDAY, "miércoles",
DayOfWeek.THURSDAY, "jueves",
DayOfWeek.FRIDAY, "viernes",
DayOfWeek.SATURDAY, "sábado",
DayOfWeek.SUNDAY, "domingo"));
…but the more we give up efficiency like that, the less EnumMap
makes sense in the first place. A reasonable person might start to question why they should bother with EnumMap
at all — just get rid of the new EnumMap<>(...)
wrapper and use Map.of(...)
directly.
Speaking of that EnumMap(Map)
copy constructor, the fact that it may throw IllegalArgumentException
when provided an empty Map
leads people to use this pattern instead:
Map<DayOfWeek, String> copy = new EnumMap<>(DayOfWeek.class);
copy.putAll(otherMap);
We could give them a shortcut:
Map<DayOfWeek, String> copy = new EnumMap<>(DayOfWeek.class, otherMap);
Similarly, to avoid an IllegalArgumentException
from EnumSet.copyOf(collection)
, I see code like this:
Set<Month> copy = EnumSet.noneOf(Month.class);
copy.addAll(otherCollection);
We could give them a shortcut too:
Set<Month> copy = EnumSet.copyOf(Month.class, otherCollection);
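Until something like those shortcuts exists, the two-step patterns can be tucked into small static helpers. The class and method names below are hypothetical; the bodies are just the patterns shown above.

```java
import java.util.Collection;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

// Hypothetical helpers that copy into Enum{Set,Map} without throwing
// IllegalArgumentException when the source happens to be empty.
class EnumCopies {

    static <K extends Enum<K>, V> EnumMap<K, V> enumMapCopy(
            Class<K> keyType, Map<? extends K, ? extends V> source) {
        EnumMap<K, V> copy = new EnumMap<>(keyType);
        copy.putAll(source);
        return copy;
    }

    static <E extends Enum<E>> EnumSet<E> enumSetCopy(
            Class<E> elementType, Collection<? extends E> source) {
        EnumSet<E> copy = EnumSet.noneOf(elementType);
        copy.addAll(source);
        return copy;
    }
}
```

Both helpers accept an empty HashMap or ArrayList, which the existing EnumMap(Map) constructor and EnumSet.copyOf(collection) reject.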
Existing code may define mappings from enum
constants to values as standalone functions. Maybe the users of that code would like to view those (function-based) mappings as Map
objects.
To that end, we could give people the means to generate an EnumMap
from a Function
:
Locale locale = Locale.forLanguageTag("es-MX");
Map<DayOfWeek, String> dayNames =
EnumMap.map(DayOfWeek.class,
day -> day.getDisplayName(TextStyle.FULL, locale));
// We could interpret the function returning null to mean that the
// key is not present. That would allow this method to support
// more than the "every constant is a key" use case while dropping
// support for the "there may be present null values" use case,
// which is probably a good trade.
We could provide a similar factory method for EnumSet
, accepting a Predicate
instead of a Function
:
Set<Month> shortMonths =
EnumSet.filter(Month.class,
month -> month.minLength() < 31);
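Neither EnumMap.map nor EnumSet.filter exists in the JDK; they are proposals. As a rough sketch of the intended behavior, here are user-level equivalents in a hypothetical EnumFactories class. The map version adopts the "null means absent" interpretation floated in the comment above, which is an assumption rather than a settled design.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical stand-ins for the proposed EnumMap.map and EnumSet.filter.
class EnumFactories {

    // Builds a map by applying the function to every constant of the key type.
    // A null result is treated as "no mapping for this key".
    static <K extends Enum<K>, V> EnumMap<K, V> map(
            Class<K> keyType, Function<? super K, ? extends V> valueFunction) {
        EnumMap<K, V> result = new EnumMap<>(keyType);
        for (K key : keyType.getEnumConstants()) {
            V value = valueFunction.apply(key);
            if (value != null) {
                result.put(key, value);
            }
        }
        return result;
    }

    // Builds a set of the constants of the element type that match the predicate.
    static <E extends Enum<E>> EnumSet<E> filter(
            Class<E> elementType, Predicate<? super E> predicate) {
        EnumSet<E> result = EnumSet.noneOf(elementType);
        for (E element : elementType.getEnumConstants()) {
            if (predicate.test(element)) {
                result.add(element);
            }
        }
        return result;
    }
}
```

A JDK implementation could read the shared constants array directly; getEnumConstants() used here clones it on every call.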
This functionality could be achieved less efficiently and more verbosely with streams. Again, the more we give up efficiency like that, the less sense it makes to use Enum{Set,Map}
in the first place. I acknowledge that there is a cost to making API-level changes like the ones I’m discussing, but I feel that we are solidly in the “too little API-level support for Enum{Set,Map}
” part of the spectrum and not even close to approaching the opposite “API bloat” end.
I don’t mean to belittle streams. There should also be more support for Enum{Set,Map}
in the stream API.
Add collectors
Code written for Java 8+ will often produce collections using streams and collectors rather than invoking collection constructors or factory methods directly. I don’t think it would be outlandish to estimate that one third of collections are produced by collectors. Some of these collections will be (or could be) Enum{Set,Map}
, and more could be done to serve that use case.
Collectors with these signatures should exist somewhere in the JDK:
public static <T extends Enum<T>>
Collector<T, ?, EnumSet<T>> toEnumSet(
Class<T> elementType)
public static <T, K extends Enum<K>, U>
Collector<T, ?, EnumMap<K, U>> toEnumMap(
Class<K> keyType,
Function<? super T, ? extends K> keyMapper,
Function<? super T, ? extends U> valueMapper)
public static <T, K extends Enum<K>, U>
Collector<T, ?, EnumMap<K, U>> toEnumMap(
Class<K> keyType,
Function<? super T, ? extends K> keyMapper,
Function<? super T, ? extends U> valueMapper,
BinaryOperator<U> mergeFunction)
Similar collectors can be obtained from the existing collector factories in the Collectors
class (specifically toCollection(collectionSupplier)
and toMap(keyMapper, valueMapper, mergeFunction, mapSupplier)
) or by using Collector.of(...)
, but that requires a little more effort on the users’ part, adding a little bit of extra friction to using Enum{Set,Map}
that we don’t need.
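For reference, this is approximately the "little more effort" required today: the proposed signatures implemented on top of Collectors.toCollection and the four-argument Collectors.toMap. EnumCollectors is a hypothetical holder class, and the duplicate-key behavior of the three-argument overload is my assumption, modeled on the two-argument Collectors.toMap.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical user-level versions of the proposed collectors.
class EnumCollectors {

    static <T extends Enum<T>> Collector<T, ?, EnumSet<T>> toEnumSet(
            Class<T> elementType) {
        return Collectors.toCollection(() -> EnumSet.noneOf(elementType));
    }

    // Assumption: duplicate keys are an error, as with Collectors.toMap(k, v).
    static <T, K extends Enum<K>, U> Collector<T, ?, EnumMap<K, U>> toEnumMap(
            Class<K> keyType,
            Function<? super T, ? extends K> keyMapper,
            Function<? super T, ? extends U> valueMapper) {
        return toEnumMap(keyType, keyMapper, valueMapper, (a, b) -> {
            throw new IllegalStateException("Duplicate key");
        });
    }

    static <T, K extends Enum<K>, U> Collector<T, ?, EnumMap<K, U>> toEnumMap(
            Class<K> keyType,
            Function<? super T, ? extends K> keyMapper,
            Function<? super T, ? extends U> valueMapper,
            BinaryOperator<U> mergeFunction) {
        return Collectors.toMap(
            keyMapper, valueMapper, mergeFunction, () -> new EnumMap<>(keyType));
    }
}
```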
I referenced these collectors from Guava earlier in this article:
Sets.toImmutableEnumSet()
Maps.toImmutableEnumMap(keyMapper, valueMapper)
Maps.toImmutableEnumMap(keyMapper, valueMapper, mergeFunction)
They do not require the Class
object argument, making them easier to use than the collectors that I proposed. The reason the Guava collectors can do this is that they produce ImmutableSet
and ImmutableMap
, not EnumSet
and EnumMap
. One cannot create an Enum{Set,Map}
instance without having the Class
object for that enum
type. In order to have a collector that reliably produces Enum{Set,Map}
(even when the stream contains zero input elements to grab the Class
object from), the Class
object must be provided up front.
We could provide similar collectors in the JDK that would produce immutable Set
and Map
instances. For streams with no elements, the collectors would produce Collections.emptySet()
or Collections.emptyMap()
. For streams with at least one element, the collectors would produce an Enum{Set,Map}
instance wrapped by Collections.unmodifiable{Set,Map}
.
The signatures would look like this:
public static <T extends Enum<T>>
Collector<T, ?, Set<T>> toImmutableEnumSet()
public static <T, K extends Enum<K>, U>
Collector<T, ?, Map<K, U>> toImmutableEnumMap(
Function<? super T, ? extends K> keyMapper,
Function<? super T, ? extends U> valueMapper)
public static <T, K extends Enum<K>, U>
Collector<T, ?, Map<K, U>> toImmutableEnumMap(
Function<? super T, ? extends K> keyMapper,
Function<? super T, ? extends U> valueMapper,
BinaryOperator<U> mergeFunction)
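A rough sketch of how the set-producing collector could behave, written against the public API only. ImmutableEnumCollectors is a hypothetical class, and buffering into a List first is a shortcut for brevity; a real implementation would more likely accumulate straight into an EnumSet once the first element reveals the enum type.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical sketch of a Class-free collector producing a read-only Set.
class ImmutableEnumCollectors {

    static <T extends Enum<T>> Collector<T, ?, Set<T>> toImmutableEnumSet() {
        return Collectors.collectingAndThen(
            Collectors.<T>toList(),
            list -> freeze(list));
    }

    private static <T extends Enum<T>> Set<T> freeze(List<T> elements) {
        if (elements.isEmpty()) {
            // No element to take the Class object from, so no EnumSet.
            return Collections.emptySet();
        }
        // EnumSet.copyOf infers the enum type from the first element.
        return Collections.unmodifiableSet(EnumSet.copyOf(elements));
    }
}
```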
I’m not sure that those collectors are worthwhile. I might never recommend them over their counterparts in Guava.
The StreamEx library also provides a couple of interesting enum
-specialized collectors:
MoreCollectors.toEnumSet(elementType)
MoreCollectors.groupingByEnum(keyType, classifier, downstreamCollector)
They’re interesting because they are potentially short-circuiting. With MoreCollectors.toEnumSet(elementType)
, when the collector can determine that it has encountered all of the elements of that enum
type (which is easy — the set of already-collected elements can be compared to EnumSet.allOf(elementType)
), it stops collecting. These collectors may be well-suited for streams having a huge number of elements (or having elements that are expensive to compute) mapping to a relatively small set of enum
constants.
I don’t know how feasible it is to port these StreamEx collectors to the JDK. As I understand it, the concept of short-circuiting collectors is not supported by the JDK. Adding support may necessitate other changes to the stream and collector APIs.
Be navigable? (No)
Over the years, many people have suggested that Enum{Set,Map}
should implement the NavigableSet
and NavigableMap
interfaces. Every enum
type is Comparable
, so it’s technically possible. Why not?
I think the Navigable{Set,Map}
interfaces are a poor fit for Enum{Set,Map}
.
Those interfaces are huge! Implementing Navigable{Set,Map}
would bloat the size of Enum{Set,Map}
by 2-4x (in terms of lines of code). It would distract them from their core focus and strengths. Supporting the navigable API would most likely come with a non-zero penalty to runtime performance.
Have you ever looked closely at the specified behavior of methods like subSet
and subMap
, specifically when they might throw IllegalArgumentException
? Those contracts impose a great deal of complexity for what seems like undesirable behavior. Enum{Set,Map}
could take a stance on those methods similar to Guava’s ImmutableSortedSet
and ImmutableSortedMap
: acknowledge the contract of the interface but do something else that is more reasonable instead…
I say forget about it. If you want navigable collections, use TreeSet
and TreeMap
(or their thread-safe cousins, ConcurrentSkipListSet
and ConcurrentSkipListMap
). The cross-section of people who need the navigable API and the efficiency of enum
-specialized collections must be very small.
There are few cases where the Comparable
nature of enum
types comes into play at all. In practice, I expect that the ordering of most enum
constants is arbitrary (with respect to intended behavior).
I’ll go further than that; I think that making all enum
types Comparable
in the first place was a mistake.
- Which ordering of Collector.Characteristics is “natural”, [CONCURRENT, UNORDERED] or [UNORDERED, CONCURRENT]?
- Which is the “greater” Thread.State, WAITING or TIMED_WAITING?
- FileVisitOption.FOLLOW_LINKS is “comparable” to what? (There is no other FileVisitOption.)
- How many instances of RoundingMode are in the “range” from FLOOR to CEILING?
  import java.math.RoundingMode;
  import java.util.EnumSet;
  import java.util.Set;
  class RangeTest {
      public static void main(String[] args) {
          Set<RoundingMode> range =
              EnumSet.range(RoundingMode.FLOOR, RoundingMode.CEILING);
          System.out.println(range.size());
      }
  }
  // java.lang.IllegalArgumentException: FLOOR > CEILING
There are other enum
types where questions like that actually make sense, and those should be Comparable
.
- Is Month.JANUARY “before” Month.FEBRUARY? Yes.
- Is TimeUnit.HOURS “larger” than TimeUnit.MINUTES? Yes.
Implementing Comparable
or not should have been a choice for authors of individual enum
types. To serve people who really did want to sort enum
constants by declaration order for whatever reason, we could have automatically provided a static Comparator
from each enum
type:
Comparator<JDBCType> c = JDBCType.declarationOrder();
It’s too late for that now. Let’s not double down on the original mistake by making Enum{Set,Map}
navigable.
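For what it’s worth, a declaration-order comparator like the imagined declarationOrder() is a one-liner in user code. DeclarationOrder below is a hypothetical helper; since every enum type is in fact Comparable by ordinal, it orders constants the same way Comparator.naturalOrder() does.

```java
import java.time.Month;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: an explicit comparator by declaration order.
class DeclarationOrder {

    static <E extends Enum<E>> Comparator<E> comparator() {
        return Comparator.comparingInt((E constant) -> constant.ordinal());
    }

    public static void main(String[] args) {
        List<Month> months = new ArrayList<>(
            Arrays.asList(Month.MARCH, Month.JANUARY, Month.FEBRUARY));
        months.sort(DeclarationOrder.<Month>comparator());
        System.out.println(months); // [JANUARY, FEBRUARY, MARCH]
    }
}
```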
Conclusion
EnumSet
and
EnumMap
are cool collections, and you should use them!
They’re already great, but they can become even better with changes to their private implementation details. I propose some ideas here. If you want to find out what happens in the JDK, the changes (if there are any) should be noted in JDK-8170826.
API-level changes are warranted as well. New factory methods and collectors would make it easier to obtain instances of Enum{Set,Map}
, and immutable Enum{Set,Map}
could be better-supported. I propose some ideas here, but if there are any actual changes made then they should be noted in JDK-8145048.