SUMMARY

  • some support for JSON was added to XSLT 3.0 + XPath/XQuery 3.1
  • unfortunately, JSON number types are handled as IEEE double, subjecting the data to loss of numeric precision
  • I am considering writing a set of custom functions based on Java BigDecimal instead of IEEE double

Q: In order to support numeric precision beyond that offered by IEEE double, is it reasonable for me to consider cloning the JSON support in Saxon 9.8 HE and building a set of customized functions which use BigDecimal instead of IEEE double?

DETAIL

I need to perform a number of transformations of JSON data. XSLT 3.0 + XPath 3.1 + XQuery 3.1 have some support for JSON through json-to-xml + parse-json.

https://www.w3.org/TR/xpath-functions-31/#json-functions
https://www.saxonica.com/papers/xmlprague-2016mhk.pdf

I have hit a significant snag related to the treatment of numeric data types. My JSON data includes numeric values that exceed the precision of IEEE double-precision floats. In Java, my numeric values need to be processed using BigDecimal.
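
For example, here is a minimal Java illustration of the loss (the literal is just a made-up high-precision value, not taken from my real data):

    import java.math.BigDecimal;

    public class PrecisionLoss {
        public static void main(String[] args) {
            String lexical = "3.141592653589793238462643383279";

            // Routing the value through a primitive double silently drops digits.
            double d = Double.parseDouble(lexical);
            System.out.println(d);                    // prints 3.141592653589793

            // Constructing a BigDecimal from the lexical form keeps every digit.
            System.out.println(new BigDecimal(lexical));
        }
    }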

https://www.w3.org/TR/xpath-functions-31/#json-to-xml-mapping states

Information may however be lost if (a) JSON numbers are not exactly representable as double-precision floating point ...

In addition, I have taken a look at the Saxon 9.8 HE reference implementation source for ./ma/json/JsonParser.java and confirmed that the private method parseNumericLiteral() returns a primitive double.

I am considering cloning the Saxon 9.8 HE JSON support code and using it as the basis for a set of customized functions which use Java BigDecimal instead of double in order to retain numeric precision through the transformations.
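
Something along these lines (a minimal sketch only; the names here are mine, not Saxon's, and the real parser is obviously far more involved):

    import java.math.BigDecimal;

    // Sketch of the idea: keep the lexical form of the JSON number and build a
    // BigDecimal from it instead of a primitive double. Every valid JSON number
    // literal (optional sign, digits, optional fraction, optional e/E exponent)
    // is also a valid BigDecimal literal, so no re-scanning is needed.
    public final class ExactJsonNumber {

        public static BigDecimal parseNumericLiteral(String lexical) {
            return new BigDecimal(lexical);
        }

        public static void main(String[] args) {
            System.out.println(parseNumericLiteral("3.141592653589793238462643383279"));
            System.out.println(parseNumericLiteral("1E400")); // no overflow to infinity
        }
    }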

Q: In order to support numeric precision beyond that offered by IEEE double, is it reasonable for me to consider cloning the JSON support in Saxon 9.8 HE and building a set of customized functions which use BigDecimal instead of IEEE double?

Q: Are you aware of any unforeseen issues which I may encounter?

The XML data model defines decimal numbers as having any finite precision. https://www.w3.org/TR/xmlschema-2/#decimal

The JSON data model defines numbers as having any finite precision. https://tools.ietf.org/html/rfc7159#page-6

Not surprisingly, both warn of potential interoperability issues with extended-precision numeric values.

Q: What was the rationale for explicitly defining the JSON number type in XPath/XQuery as IEEE double?

THE END

Javascript/ECMAScript numbers are IEEE doubles, so I suppose JSON numbers are as well. Are there other JSON mappings that implement JSON numbers as xs:decimal or Java BigDecimal? – Martin Honnen
It is true that Javascript/ECMAScript numbers are IEEE doubles. However, www.json.org places no restrictions on numeric precision. More importantly, RFC 7159 explicitly discusses higher-precision numbers, while warning of potential interoperability problems. The Java Jackson parser supports using BigDecimal for numeric values. – Michael Howard

1 Answer

This is what the RFC says:

This specification allows implementations to set limits on the range
and precision of numbers accepted. Since software that implements
IEEE 754-2008 binary64 (double precision) numbers [IEEE754] is
generally available and widely used, good interoperability can be
achieved by implementations that expect no more precision or range
than these provide, in the sense that implementations will
approximate JSON numbers within the expected precision. A JSON
number such as 1E400 or 3.141592653589793238462643383279 may
indicate potential interoperability problems, since it suggests that
the software that created it expects receiving software to have
greater capabilities for numeric magnitude and precision than is
widely available.

That, to my mind, is a pretty clear warning: it says that although the JSON grammar allows arbitrary precision in numeric values, you can't rely on JSON consumers to retain that precision, and it follows that if you want to convey high-precision numeric values, it would be better to convey them as strings.
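
For example (a sketch using the Jackson parser mentioned in the comments above; the field name "amount" is invented purely for illustration):

    import java.math.BigDecimal;
    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class StringVersusNumber {
        public static void main(String[] args) throws Exception {
            ObjectMapper mapper = new ObjectMapper();

            // A bare JSON number is read into a double-backed node by default,
            // so the extra digits are silently dropped by this consumer.
            JsonNode asNumber = mapper.readTree(
                    "{\"amount\": 3.141592653589793238462643383279}");
            System.out.println(asNumber.get("amount").asText()); // 3.141592653589793

            // The string form keeps its lexical value; the consumer decides when
            // (and whether) to convert it to BigDecimal.
            JsonNode asString = mapper.readTree(
                    "{\"amount\": \"3.141592653589793238462643383279\"}");
            System.out.println(new BigDecimal(asString.get("amount").asText()));
        }
    }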

The rules for fn:json-to-xml and fn:xml-to-json need to be read carefully:

The fn:json-to-xml function creates an element whose string value is lexically the same as the JSON representation of the number. The fn:xml-to-json function generates a JSON representation that is the result of casting the (typed or untyped) value of the node to xs:double and then casting the result to xs:string. Leading and trailing whitespace is accepted. Since JSON does not impose limits on the range or precision of numbers, these rules mean that conversion from JSON to XML will always succeed, and will retain full precision in the lexical representation unless the data model implementation is one that reconstructs the string value from the typed value. In the reverse direction, conversion from XML to JSON may fail if the value is infinity or NaN, or if the string value is such that casting to xs:double produces positive or negative infinity.

Although I probably wrote these words, I'm not sure I recall the exact rationale for why the decision was made this way, but it does suggest that the matter received careful thought. I suspect the thinking was that when you consume JSON, you should try to preserve all the information that is present in the input, but when you generate JSON, you should try to generate something that will be acceptable to all consumers. (The famous maxim about being liberal in what you accept and conservative in what you produce.)

Your analysis of the Saxon source isn't quite correct. You say:

the private method parseNumericLiteral() returns a primitive double.

which is true enough; but the original lexical representation is retained, and when the parser communicates the value to a JsonReceiver, it passes both the Java double and the string representation, so the JsonReceiver has access to both (which is needed for a correct implementation of fn:json-to-xml).
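
If you do clone the code, that lexical string is the thing to hold on to. Purely as an illustration (this is not Saxon's actual JsonReceiver interface, just a hypothetical callback with the same "double plus lexical string" shape), a precision-preserving implementation could rebuild the value from the string:

    import java.math.BigDecimal;

    // Hypothetical callback, not Saxon's real JsonReceiver: the parser supplies
    // both the double approximation and the original lexical form.
    interface NumericLiteralHandler {
        void numericLiteral(double approximate, String lexical);
    }

    public class ExactHandlerDemo {
        public static void main(String[] args) {
            NumericLiteralHandler handler = (approximate, lexical) -> {
                // Ignore the double and rebuild the value from the lexical form.
                BigDecimal exact = new BigDecimal(lexical);
                System.out.println(exact + "  (double approximation: " + approximate + ")");
            };
            handler.numericLiteral(3.141592653589793, "3.141592653589793238462643383279");
        }
    }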