If we have a long (64-bit integer) that we serialize into JSON, we might be in trouble if JavaScript consumes that JSON. JavaScript has the equivalent of double (64-bit floating point) for its numbers, and double cannot represent the same set of numbers as long. If we are not careful, our long is mangled in transit.
Consider 2^53 + 1. We can store that number in a long but not a double. A double has a 53-bit significand, so above 2^53 it does not have the bits required to represent every integer, creating gaps between the integers it can represent. 2^53 + 1 is the first integer to fall in one of these gaps. We can store 2^53 or 2^53 + 2 in a double, but 2^53 + 1 does not fit.
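A minimal Java sketch makes the gap visible. It converts 2^53, 2^53 + 1 and 2^53 + 2 to double and back. It also parses the decimal literal for 2^53 + 1 with Double.parseDouble, which here stands in for any JSON parser that reads numbers into doubles, as JSON.parse does. The class and method names are illustrative only.

```java
class DoubleGaps {
    static final long TWO_TO_53 = 1L << 53; // 9007199254740992

    // Converts x to the nearest double and back, revealing what the double kept.
    static long roundTrip(long x) {
        double d = x; // widening conversion rounds to the nearest representable double
        return (long) d;
    }

    public static void main(String[] args) {
        for (long x = TWO_TO_53; x <= TWO_TO_53 + 2; x++) {
            long back = roundTrip(x);
            System.out.println(x + " -> " + back + (x == back ? " (exact)" : " (rounded)"));
        }
        // Parsing the decimal literal into a double collapses it the same way.
        System.out.println((long) Double.parseDouble("9007199254740993")); // 9007199254740992
    }
}
```

2^53 + 1 lies exactly halfway between its two neighbors, and round-to-nearest-even sends it down to 2^53.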
If we store 2^53 + 1 in a long and that number is meant to be precise, then we should avoid encoding it as a JSON number and sending it to a JavaScript client. The instant that client invokes JSON.parse, they are doomed: they see a different number.
The JSON format does not mandate a particular number precision, but the application code on either side usually does. See also: Re: [Json] Limitations on number size?
This problem only occurs with very large numbers. Perhaps all the numbers we use are safe. Are we actually mangling our numbers? Probably not…
…but will we know? Will anything blow up, or will our application be silently, subtly wrong?
I suspect that when this problem does occur, it goes undetected for longer than it should. In the remainder of this article, we examine potential improvements to our handling of long.
Failing fast
We can change the way we serialize long into JSON.
When we encounter a long, we can require that the number fits into a double without losing information. If no information would be lost, we serialize the long as usual and move on. If information would be lost, we throw an exception and cause serialization to fail. We detonate immediately at the source of the error rather than letting it propagate around, doing who knows what.
Here is a utility method that can be used for this purpose:
public static void verifyLongFitsInDouble(long x) {
    double result = x;
    if (x != (long) result || x == Long.MAX_VALUE) {
        throw new IllegalArgumentException("Overflow: " + x);
    }
}
This approach appeals to me because it is unobtrusive. The check can be made in one central location, no changes to our view classes or client-side code are required, and it only throws exceptions in the specific cases where our default behavior is wrong.
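The small harness below uses only the JDK to run verifyLongFitsInDouble on a few inputs. It also explains the Long.MAX_VALUE clause. Long.MAX_VALUE (2^63 - 1) widens to the double 2^63, and Java's narrowing cast saturates out-of-range doubles back to Long.MAX_VALUE, so the round-trip comparison alone would wrongly accept it. The passes helper is hypothetical, added only for this demonstration.

```java
class VerifyLongDemo {
    // Copied from the method above: throws if x cannot be represented exactly as a double.
    static void verifyLongFitsInDouble(long x) {
        double result = x;
        if (x != (long) result || x == Long.MAX_VALUE) {
            throw new IllegalArgumentException("Overflow: " + x);
        }
    }

    // Hypothetical helper for this demo: true if the check above does not throw.
    static boolean passes(long x) {
        try {
            verifyLongFitsInDouble(x);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        long[] samples = { 1L << 62, (1L << 53) + 1, Long.MAX_VALUE, Long.MIN_VALUE };
        for (long x : samples) {
            System.out.println(x + " passes: " + passes(x));
        }
        // Why the Long.MAX_VALUE clause exists: the cast back to long saturates,
        // so Long.MAX_VALUE appears to survive the round trip through 2^63.
        System.out.println((long) (double) Long.MAX_VALUE == Long.MAX_VALUE); // true
    }
}
```

Long.MIN_VALUE passes because -2^63 is a power of two and therefore exactly representable as a double.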
A number that should be safe
Consider the number 2^62, which spelled out in base ten is 4611686018427387904. This number fits in both a long and a double. It passes our verifyLongFitsInDouble check. In theory, we can send it from a Java server to a JavaScript client via JSON, and both sides see exactly the same number.
To convince ourselves that this number is safe, we examine various representations of this number in Java and JavaScript:
// In Java
long x = 1L << 62;
System.out.println(Long.toString(x)); // 4611686018427387904
System.out.println(Double.toString(x)); // 4.6116860184273879E18
// 100000000000000000000000000000000000000000000000000000000000000
System.out.println(Long.toString(x, 2));
// In JavaScript
var x = Math.pow(2, 62);
console.log(x.toString()); // 4611686018427388000
console.log(x.toExponential()); // 4.611686018427388e+18
console.log(x.toFixed()); // 4611686018427387904
// 100000000000000000000000000000000000000000000000000000000000000
console.log(x.toString(2));
The output of x.toString() in JavaScript is suspicious. Do we really have the right number? We do, but we print it lazily.
x.toString() is similar in spirit to x.toExponential() and to Java's Double.toString(double). These algorithms essentially print significant digits, from most significant to least, until the output is unambiguously closer to this floating point number than to any other floating point number. (And that is true here. The next lower floating point number is 2^62 - 512, the next higher is 2^62 + 1024, and 4611686018427388000, which is only 96 above 2^62, is closer to 2^62 than to either of those two nearby numbers.) See also: ES6 specification for ToString(Number)
x.toFixed() and the base two string give us more confidence that we have the correct number.
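The same facts can be checked from Java. Math.nextDown and Math.nextUp return the adjacent doubles. new BigDecimal(double) yields the exact binary value, which plays roughly the role that x.toFixed() plays in the JavaScript console above. This is a sketch. gapBelow and gapAbove are names made up for the example, and they assume their argument is exactly representable as a double.

```java
import java.math.BigDecimal;

class NeighborsOf2To62 {
    // Distance from x down to the next lower double (x must be exactly representable).
    static long gapBelow(long x) {
        return x - (long) Math.nextDown((double) x);
    }

    // Distance from x up to the next higher double (x must be exactly representable).
    static long gapAbove(long x) {
        return (long) Math.nextUp((double) x) - x;
    }

    public static void main(String[] args) {
        long x = 1L << 62;
        System.out.println(gapBelow(x)); // 512
        System.out.println(gapAbove(x)); // 1024
        // The shortest decimal form parses back to exactly the same double.
        System.out.println(Double.parseDouble("4611686018427388000") == (double) x); // true
        // BigDecimal shows every digit of the double's exact value.
        System.out.println(new BigDecimal((double) x).toPlainString()); // 4611686018427387904
    }
}
```

The spacing doubles at 2^62 because that is where the exponent steps up: just below it adjacent doubles are 2^9 apart, just above it 2^10.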
Verifying our assumptions with code
If 2^62 really is a safe number, we should be able to send it from the server to the client and back again. To verify that this number survives a round trip, we create an HTTP server with two kinds of endpoints:
- GET endpoints that serialize a Java object into a JSON string like {"x":number}, where the number is a known constant (2^62). The number and the JSON string are printed to stdout. The response is that JSON string.
- POST endpoints that deserialize a client-provided JSON string like {"x":number} into a Java object. The number and JSON string are printed to stdout. We hope that the number printed here is the same as the known constant (2^62) used in our GET endpoints.
Any server-side web framework or HTTP server will do. We happen to use JAX-RS in our example code.
Behavior may differ between JSON (de)serialization libraries, so we test two: Gson and Jackson.
In total the server provides four endpoints, each named after the JSON serialization library used by that endpoint:
GET /gson
POST /gson
GET /jackson
POST /jackson
In the JavaScript client, we:
- Loop through each library-specific pair of GET/POST endpoints.
- Make a request to the GET endpoint.
- Use JSON.parse to deserialize the response text (a JSON string) into a JavaScript object.
- Use JSON.stringify to serialize that JavaScript object back into a JSON string.
- Print each of the following to the console:
  - the incoming JSON string
  - the number contained in the JavaScript object, using x.toString()
  - the number contained in the JavaScript object, using x.toFixed()
  - the outgoing JSON string
- Make a request to the POST endpoint, providing the (re)serialized JSON string as the request body.
Here is the server-side Java code:
package test;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.google.gson.Gson;
import javax.ws.rs.Consumes;
import javax.ws.rs.GET;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import java.io.IOException;

@Path("/")
public final class JsonResource {

    public static final class Payload {
        public long x;
    }

    private static final long EXPECTED_NUMBER = 1L << 62;

    @GET
    @Path("gson")
    @Produces("application/json")
    public String getGson() {
        Payload object = new Payload();
        object.x = EXPECTED_NUMBER;
        String json = new Gson().toJson(object);
        System.out.println("GET /gson outgoing number: "
                + object.x);
        System.out.println("GET /gson outgoing JSON: "
                + json);
        return json;
    }

    @POST
    @Path("gson")
    @Consumes("application/json")
    public void postGson(String json) {
        Payload object = new Gson().fromJson(json, Payload.class);
        System.out.println("POST /gson incoming JSON: "
                + json);
        System.out.println("POST /gson incoming number: "
                + object.x);
    }

    @GET
    @Path("jackson")
    @Produces("application/json")
    public String getJackson() throws IOException {
        Payload object = new Payload();
        object.x = EXPECTED_NUMBER;
        String json = new ObjectMapper().writeValueAsString(object);
        System.out.println("GET /jackson outgoing number: "
                + object.x);
        System.out.println("GET /jackson outgoing JSON: "
                + json);
        return json;
    }

    @POST
    @Path("jackson")
    @Consumes("application/json")
    public void postJackson(String json) throws IOException {
        Payload object = new ObjectMapper().readValue(json, Payload.class);
        System.out.println("POST /jackson incoming JSON: "
                + json);
        System.out.println("POST /jackson incoming number: "
                + object.x);
    }
}
Here is the client-side JavaScript code:
[ "/gson", "/jackson" ].forEach(function(endpoint) {
    function handleResponse() {
        var incomingJson = this.responseText;
        var object = JSON.parse(incomingJson);
        var outgoingJson = JSON.stringify(object);
        console.log(endpoint + " incoming JSON: " + incomingJson);
        console.log(endpoint + " number toString: " + object.x);
        console.log(endpoint + " number toFixed: " + object.x.toFixed());
        console.log(endpoint + " outgoing JSON: " + outgoingJson);
        var post = new XMLHttpRequest();
        post.open("POST", endpoint);
        post.setRequestHeader("Content-Type", "application/json");
        post.send(outgoingJson);
    }
    var get = new XMLHttpRequest();
    get.addEventListener("load", handleResponse);
    get.open("GET", endpoint);
    get.send();
});
The results are disappointing
Here is the server-side output:
GET /gson outgoing number: 4611686018427387904
GET /gson outgoing JSON: {"x":4611686018427387904}
POST /gson incoming JSON: {"x":4611686018427388000}
POST /gson incoming number: 4611686018427388000
GET /jackson outgoing number: 4611686018427387904
GET /jackson outgoing JSON: {"x":4611686018427387904}
POST /jackson incoming JSON: {"x":4611686018427388000}
POST /jackson incoming number: 4611686018427388000
Here is the client-side output:
/gson incoming JSON: {"x":4611686018427387904}
/gson number toString: 4611686018427388000
/gson number toFixed: 4611686018427387904
/gson outgoing JSON: {"x":4611686018427388000}
/jackson incoming JSON: {"x":4611686018427387904}
/jackson number toString: 4611686018427388000
/jackson number toFixed: 4611686018427387904
/jackson outgoing JSON: {"x":4611686018427388000}
Both of our POST endpoints print the wrong number. Yuck!
We do send the correct number to JavaScript, which we can verify by looking at the output of x.toFixed() in the console. Something bad happens between when we print x.toFixed() and when we print the number out on the server.
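Comparing the two digit strings pinpoints the step. JSON.stringify writes the shortest digits that identify the double, 4611686018427388000, and that string still names 2^62 when read as a double. Our Payload.x field is a long, though, so both libraries read the literal as an exact integer instead. In this sketch, Double.parseDouble and Long.parseLong stand in for those two readings, not for the libraries' actual code paths.

```java
class WhereItBreaks {
    static final String FROM_SERVER = "4611686018427387904"; // exact digits of 2^62
    static final String FROM_CLIENT = "4611686018427388000"; // digits JSON.stringify produced

    public static void main(String[] args) {
        // As doubles, the two literals denote one and the same value...
        System.out.println(Double.parseDouble(FROM_SERVER) == Double.parseDouble(FROM_CLIENT)); // true
        // ...but as longs they are different integers, 96 apart.
        System.out.println(Long.parseLong(FROM_CLIENT) - Long.parseLong(FROM_SERVER)); // 96
    }
}
```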
Why is our code wrong?
Maybe there is a particular line of our own code where we can point our finger and say, “Aha! You are wrong!” Maybe it is an issue with our architecture.
There are many ways we could choose to address this problem (or not), and what follows is certainly not an exhaustive list.
“We call JSON.parse then JSON.stringify. We should echo back the original JSON string.”
This avoids the problem but is nothing like a real application. The test code is standing in for an application that gets the payload object from the server, uses it as an object throughout, and later, perhaps, makes a request back to the server containing some or all of the data from that object.
In practice, most application code will never call JSON.parse directly. The call will be hidden: the front-end framework will do it, $.getJSON will do it, and so on.
“We use JSON.stringify. We should write an alternative to JSON.stringify that produces an exact representation of our number.”
JSON.stringify converts numbers to strings the same way x.toString() does. If we never use JSON.stringify, and instead use something like x.toFixed() to print numbers like this, we can avoid this problem.
This is probably infeasible in practice.
If we need to produce JSON from JavaScript, of course we expect that JSON.stringify will be involved. As with JSON.parse, most calls happen at a distance in a library rather than in our own application code.
Besides, if we really plan to avoid x.toString(), we must do so everywhere. This is hopeless.
Suppose we commit to avoiding x.toString() and we have user objects that each have a numeric id field. We can no longer write Mustache or Handlebars templates like this:
<div id="user{{id}}"> {{! functionally wrong }}
<p>ID: {{id}}</p> {{! visually wrong }}
<p>Name: {{name}}</p>
</div>
We can no longer write functions like this:
function updateEmailAddress(user, newEmail) {
    // Oops, we failed for user #2^62!
    var url = "/user/" + user.id + "/email";
    // Tries to update the wrong user (and fails, hopefully)
    $.post(url, { email: newEmail });
}
It is extremely unlikely that we will remember to avoid x.toString() everywhere. It is much more likely that we will forget and end up with incorrect behavior all over the place.
“We treat the number as a long literal in the POST handlers. We should treat the number as a double literal.”
If we parse the number as a double and cast it to a long, we produce the correct result in all test cases.
Such a cast should be guarded with a check similar to our verifyLongFitsInDouble(long) code from earlier. Here is a utility method that can be used for this purpose:
public static void verifyDoubleFitsInLong(double x) {
    long result = (long) x;
    if (Double.compare(x, result) != 0 || result == Long.MAX_VALUE) {
        throw new IllegalArgumentException("Overflow: " + x);
    }
}
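One way to apply this on the server is to read the JSON number as a double first and then convert it, guarded by verifyDoubleFitsInLong. In the sketch below, parseJsonNumberAsLong is a hypothetical helper. Double.parseDouble stands in for whatever double-typed read the JSON library would perform, for example deserializing into a double field. Double.parseDouble also accepts strings that JSON does not, such as "NaN", but the guard rejects NaN and infinities anyway.

```java
class ParseViaDouble {
    // Copied from the method above: throws unless x is an integer that a long can hold exactly.
    static void verifyDoubleFitsInLong(double x) {
        long result = (long) x;
        if (Double.compare(x, result) != 0 || result == Long.MAX_VALUE) {
            throw new IllegalArgumentException("Overflow: " + x);
        }
    }

    // Hypothetical helper: interprets a JSON number literal the way a JavaScript
    // client does (as a double), then narrows it to a long only if that is exact.
    static long parseJsonNumberAsLong(String literal) {
        double d = Double.parseDouble(literal);
        verifyDoubleFitsInLong(d);
        return (long) d;
    }

    public static void main(String[] args) {
        // Both spellings of 2^62 now produce the same long.
        System.out.println(parseJsonNumberAsLong("4611686018427388000")); // 4611686018427387904
        System.out.println(parseJsonNumberAsLong("4611686018427387904")); // 4611686018427387904
        try {
            parseJsonNumberAsLong("1.5"); // not an integer, so the guard throws
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Overflow: 1.5
        }
    }
}
```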
What if the client really does mean to send us precisely the integer 4611686018427388000? If we parse it as a double then cast it to a long, we mangle the intended number!
Here it is worth considering who we actually talk to as we design our APIs. If we only talk to JavaScript clients, then we only receive numbers that fit in double, because that is all our clients have. Often these APIs are internal, and the API authors are the same people as the client code authors. It is reasonable in cases like that to make assumptions about who is calling us, even if technically some other caller could use our API, because we make no claim to support other callers.
If our API is designed to be public and usable by any client, we should document our behavior with respect to number precision. verifyLongFitsInDouble(long) and verifyDoubleFitsInLong(double) are tricky to communicate, so we may prefer a simpler rule…
“We permit some values of long outside of the range -2^53 < x < 2^53. We should reject values outside of that range even when they fit in double.”
In other words, perform a bounds check on every long number that we (de)serialize. If the absolute value of that number is less than 2^53, then we (de)serialize that number as usual; otherwise we throw an exception.
JavaScript clients may find this range familiar, with built-in constants to express its bounds: Number.MIN_SAFE_INTEGER and Number.MAX_SAFE_INTEGER.
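Here is a minimal sketch of that bounds check in Java. MAX_SAFE_INTEGER and MIN_SAFE_INTEGER mirror the JavaScript constants, and requireSafeInteger is a hypothetical name. The check compares against both bounds instead of using Math.abs, because Math.abs(Long.MIN_VALUE) overflows and stays negative.

```java
class SafeIntegerCheck {
    // Mirrors JavaScript's Number.MAX_SAFE_INTEGER (2^53 - 1); the minimum is its negation.
    static final long MAX_SAFE_INTEGER = (1L << 53) - 1;
    static final long MIN_SAFE_INTEGER = -MAX_SAFE_INTEGER;

    // Returns x unchanged if -2^53 < x < 2^53, otherwise throws.
    static long requireSafeInteger(long x) {
        if (x < MIN_SAFE_INTEGER || x > MAX_SAFE_INTEGER) {
            throw new IllegalArgumentException("Outside safe integer range: " + x);
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(requireSafeInteger(MAX_SAFE_INTEGER)); // 9007199254740991
        try {
            requireSafeInteger(1L << 62); // fits in a double, but is rejected anyway
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Applied at the same central serialization point as verifyLongFitsInDouble, this check rejects 2^62 even though that value survives the trip into a double.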
This approach is less permissive than our verifyLongFitsInDouble(long) and verifyDoubleFitsInLong(double) utility methods from earlier. Those methods permit every number in this range and more besides. Some of the extra numbers they permit have neighbors that they reject, meaning the set of valid inputs is not contiguous.
Advantages of the less permissive approach include:
- It is easier to express in documentation. verifyLongFitsInDouble(long) and verifyDoubleFitsInLong(double) would permit 2^55 + 8 but not 2^55 + 4. Understanding the reason for that is more difficult than understanding that neither of those numbers is permitted with the |x| < 2^53 approach.
- If we are actually serializing numbers like 2^55 + 8, it is likely that we are trying to serialize nearby numbers that cannot be stored in double. Permitting the extra numbers may only mask the underlying problem: this data should not be serialized into JSON numbers.
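The non-contiguous shape of the permissive rule is easy to see by scanning the longs just above 2^55, where adjacent doubles are 8 apart. fitsInDouble below is the verifyLongFitsInDouble condition rewritten to return a boolean. passingOffsets is a hypothetical helper for the demonstration.

```java
import java.util.ArrayList;
import java.util.List;

class NonContiguous {
    // Same condition as verifyLongFitsInDouble, returned as a boolean instead of throwing.
    static boolean fitsInDouble(long x) {
        return x == (long) (double) x && x != Long.MAX_VALUE;
    }

    // Offsets k in [0, maxOffset] for which base + k passes the check.
    static List<Long> passingOffsets(long base, int maxOffset) {
        List<Long> result = new ArrayList<>();
        for (int k = 0; k <= maxOffset; k++) {
            if (fitsInDouble(base + k)) {
                result.add((long) k);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Near 2^55 doubles are 8 apart, so only every eighth long passes.
        System.out.println(passingOffsets(1L << 55, 16)); // [0, 8, 16]
    }
}
```

Every long from 2^55 + 1 through 2^55 + 7 is rejected, while its neighbors on both sides are accepted. Under the |x| < 2^53 rule, the entire scan is rejected.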
“We encode a long as a JSON number. We should encode it as a JSON string.”
Encoding the number as a string avoids this problem.
Twitter provides string representations of its numeric ids for this reason.
This is easy to accomplish on the server. JSON serialization libraries provide a way to adopt this convention without changing the field types of our Java classes. Our Payload class keeps using long for its field, but any time the server serializes that field into JSON, it surrounds the numeric literal with quotation marks.
How viable is this approach for the client? If the number is only being used as an identifier (passed between functions as-is, compared using the === operator, used as a key in maps), then treating it as a string makes a lot of sense. If we are lucky, the client-side code is identical between the string-using and number-using versions.
If the number is used in arithmetic or passed to libraries that expect numbers, then this solution becomes less practical.
“We use JSON as the serialization format. We should use some other serialization format.”
The JSON format is not to blame for our problems, but it allows us to be sloppy.
When we use JSON we lose information about our numbers. We do not lose the values of the numbers, but we do lose the types, which tell us the precision.
A different serialization format such as Protobuf might have forced us to clarify how precise our numbers are.
“There is no problem.”
We could declare that there is no problem. Our code breaks when provided with obscenely large numbers as input, but we simply do not use numbers that large and we never will. And even though our numbers are never this large, we still want to use long in the Java code because that is convenient for us. Other Java libraries produce or consume long numbers, and we want to use those libraries without casting.
I suspect this is the solution that most people choose (conscious of that choice or not), and it is often not a bad solution. We really do not encounter this problem most of the time. There are other problems we could spend our time solving.
Numbers smaller in magnitude than 2^53 do not trigger this problem. Where are our long numbers coming from, and how likely are they to fall outside that range?
- Auto-incrementing primary keys in a SQL database
  - Will we insert more than 9,007,199,254,740,992 rows into one table? Knowing nothing at all about our theoretical application, I will venture a guess: “No.”
- Epoch millisecond timestamps
  - 2^53 milliseconds has us covered for ±300,000 years, roughly. Are we dealing with dates outside of that range? If we are, perhaps epoch milliseconds are a poor choice of units, and we should solve that problem with our units first.
- Randomly generated, unbounded long numbers
  - The vast majority of these do not fit in double. If we send these to JavaScript via JSON numbers, we will have a bad time. Are we actually doing that?
- User-provided, unbounded long numbers
  - Most of these numbers should not trigger problems, but some will. The solution may be to add bounds checking on input, filtering out misbehaving numbers before they are used.
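Two of the estimates above are quick to check with a back-of-the-envelope sketch. The names are hypothetical, and it assumes a mean Gregorian year of 365.2425 days. It computes how many years 2^53 milliseconds spans, which comes to a little over 285,000. It also samples java.util.Random with a fixed seed to estimate what fraction of random long values survive a round trip through double. By a rough count of the exactly representable values, that fraction should come out well under one percent.

```java
import java.util.Random;

class LongSources {
    static final long TWO_TO_53 = 1L << 53;

    // Approximate span of 2^53 milliseconds in years (mean Gregorian year assumed).
    static double yearsCoveredBy2To53Millis() {
        double millisPerYear = 365.2425 * 24 * 60 * 60 * 1000;
        return TWO_TO_53 / millisPerYear;
    }

    // Fraction of pseudo-random longs that a double represents exactly.
    static double fractionOfRandomLongsThatFit(long seed, int samples) {
        Random random = new Random(seed);
        int fits = 0;
        for (int i = 0; i < samples; i++) {
            long x = random.nextLong();
            // Same condition as verifyLongFitsInDouble.
            if (x == (long) (double) x && x != Long.MAX_VALUE) {
                fits++;
            }
        }
        return (double) fits / samples;
    }

    public static void main(String[] args) {
        System.out.printf("2^53 ms is about %.0f years%n", yearsCoveredBy2To53Millis());
        System.out.printf("random longs that fit: %.2f%%%n", 100 * fractionOfRandomLongsThatFit(42L, 100_000));
    }
}
```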
No matter what solution (or non-solution) we choose, we should make our choice deliberately. Being oblivious is not the answer.