3
votes

I have timecodes with the structure hh:mm:ss.SSS, for which I have my own class implementing the Temporal interface. It has a custom field, TimecodeHour, that allows hour values greater than 23. I want to parse these timecodes with DateTimeFormatter. The hour part is optional (it can be omitted, and hours can be greater than 24); as a regex: (\d*\d\d:)?\d\d:\d\d\.\d\d\d

For the purpose of this Question my custom Field can be replaced with the normal HOUR_OF_DAY Field.

My current Formatter

DateTimeFormatter UNLIMITED_HOURS = new DateTimeFormatterBuilder()
    .appendValue(ChronoField.HOUR_OF_DAY, 2, 2,SignStyle.NEVER)
    .appendLiteral(':')
    .parseDefaulting(TimecodeHour.HOUR, 0)
    .toFormatter(Locale.ENGLISH);
DateTimeFormatter TIMECODE = new DateTimeFormatterBuilder()
    .appendOptional(UNLIMITED_HOURS)
    .appendValue(MINUTE_OF_HOUR, 2)
    .appendLiteral(':')
    .appendValue(SECOND_OF_MINUTE, 2)
    .appendFraction(MILLI_OF_SECOND, 3, 3, true)
    .toFormatter(Locale.ENGLISH);

Timecodes with an hour value parse as expected, but values with the hours omitted throw an exception:

java.time.format.DateTimeParseException: Text '20:33.123' could not be parsed at index 5

I assume that, because hour and minute have the same pattern, the parser starts at the front and captures the minute value for the optional hour section. Is this right, and how can I solve it?

3
What is TimecodeHour.HOUR? Doesn't seem to be part of the JDK. – Michael
@Michael TimecodeHour is my own class; a custom field allowing values for HOUR greater than 23. – Tobias Wohlfarth
Please include it in the question, so that people can copy your code into their IDE and run it. – Michael
Does a TimecodeHour represent an amount of time, e.g., a duration, rather than a time of day? If so, don’t use any Temporal for it. If you want to make your own class, it may implement TemporalAmount. You may also be happy with just using the Duration class. – Ole V.V.

3 Answers

3
votes

I started to suspect that 20:33.123 wasn’t meant to indicate a time of day between 20 and 21 minutes past midnight. Maybe rather an amount of time, a little longer than 20 minutes. If this is correct, use a Duration for it.

Unfortunately java.time does not include means for parsing and formatting a Duration in other than ISO 8601 format. This leaves us with at least three options:

  1. Use a third-party library. Time4J offers an elegant solution, see below. Joda-Time has its PeriodFormatter class. Apache may also offer facilities for parsing and formatting of durations.
  2. Convert your string to ISO 8601 format before parsing with Duration.parse().
  3. Write your own parser.

I was thinking that we’re too lazy for 3. and that Joda-Time is getting dated, so I want to pursue options 1. and 2. here, option 1. in the Time4J variant.

A regex for adapting to ISO 8601

ISO 8601 format for a duration feels unusual at first, but is straightforward. PT20M33.123S means 20 minutes 33.123 seconds.

public static Duration parse(String timeCodeString) {
    String iso8601 = timeCodeString
            .replaceFirst("^(\\d{2,}):(\\d{2}):(\\d{2}\\.\\d{3})$", "PT$1H$2M$3S")
            .replaceFirst("^(\\d{2}):(\\d{2}\\.\\d{3})$", "PT$1M$2S");
    return Duration.parse(iso8601);
}

Let’s try it out:

    System.out.println(parse("20:33.123"));
    System.out.println(parse("123:20:33.123"));

Output is:

PT20M33.123S
PT123H20M33.123S

My two calls to replaceFirst first handle the case with hours, then the case without hours. Only one of them can match a given string, so either way a string that matches your regex is converted to ISO 8601 format, which the Duration class then parses. As you can see, Duration also prints ISO 8601 format back. Formatting it differently is not hard, though; search for how.
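For the opposite direction, here is a minimal sketch of formatting a Duration back into the timecode layout. It assumes Java 9 or later (for toMinutesPart() and friends), a non-negative duration, and that the hours part should be dropped when it is zero. The class and method names are my own, not from the question.

```java
import java.time.Duration;
import java.util.Locale;

class TimecodeFormat {

    // Formats a non-negative Duration as [h...h:]mm:ss.SSS.
    // Hours are taken from toHours(), so they are not capped at 23.
    static String format(Duration d) {
        long hours = d.toHours();
        int minutes = d.toMinutesPart();   // Java 9+
        int seconds = d.toSecondsPart();   // Java 9+
        int millis = d.toMillisPart();     // Java 9+
        // Assumption: omit the hours part when it is zero, mirroring the optional hours in the input.
        return hours > 0
                ? String.format(Locale.ROOT, "%02d:%02d:%02d.%03d", hours, minutes, seconds, millis)
                : String.format(Locale.ROOT, "%02d:%02d.%03d", minutes, seconds, millis);
    }

    public static void main(String[] args) {
        System.out.println(format(Duration.parse("PT20M33.123S")));     // 20:33.123
        System.out.println(format(Duration.parse("PT123H20M33.123S"))); // 123:20:33.123
    }
}
```

Anything below a millisecond is simply truncated here; round first if that matters for your timecodes.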

Time4J

The Time4J library offers a really elegant solution, very much along the same line of thought as yours. All we really need is this formatter:

private static final Formatter<ClockUnit> TIME_CODE_PARSER 
        = Duration.formatter(ClockUnit.class, "[###hh:mm:ss.fff][mm:ss.fff]");

Simply use like this:

    System.out.println(TIME_CODE_PARSER.parse("20:33.123"));
    System.out.println(TIME_CODE_PARSER.parse("123:20:33.123"));

Output is:

PT20M33,123000000S
PT123H20M33,123000000S

The Time4J Duration class also prints ISO 8601 format. It appears that it uses a comma as the decimal separator, as is preferred in ISO 8601, and that it prints 9 decimals on the seconds even when some of them are 0.

In the format pattern string ###hh means 2 to 5 digit hours, and fff means three digits of decimal fraction of second.

Anything wrong with your approach?

Was there anything wrong with your approach? ChronoField.HOUR_OF_DAY means just that: hour of day. 0 is midnight, 12 is noon and 23 is near the end of the day. This is not what you want, so yes, you are using the wrong means. While you can probably get it to work, anyone maintaining your code after you will find it confusing and will probably have a hard time making modifications in line with your intentions.


2
votes

Try two optional parts (one with hours, the other without), as in:

var formatter = new DateTimeFormatterBuilder()
    .optionalStart()
      .appendValue(HOUR_OF_DAY, 2, 4, SignStyle.NEVER).appendLiteral(":")
      .appendValue(MINUTE_OF_HOUR, 2).appendLiteral(":")
      .appendValue(SECOND_OF_MINUTE, 2)
    .optionalEnd()
    .optionalStart()
      .parseDefaulting(HOUR_OF_DAY, 0)
      .appendValue(MINUTE_OF_HOUR, 2).appendLiteral(":")
      .appendValue(SECOND_OF_MINUTE, 2)
    .optionalEnd()
    .toFormatter(Locale.ENGLISH);

I do not know about TimecodeHour, so I used HOUR_OF_DAY to test (I was also too lazy to include the fractions).
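For completeness, a sketch of the same idea with the fraction added to both optional sections, using the appendFraction call from the question. HOUR_OF_DAY still stands in for the custom field, so only hours up to 23 will resolve to a LocalTime; the class name is hypothetical.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.SignStyle;
import java.util.Locale;

import static java.time.temporal.ChronoField.HOUR_OF_DAY;
import static java.time.temporal.ChronoField.MILLI_OF_SECOND;
import static java.time.temporal.ChronoField.MINUTE_OF_HOUR;
import static java.time.temporal.ChronoField.SECOND_OF_MINUTE;

class TwoOptionalSections {

    static final DateTimeFormatter TIMECODE = new DateTimeFormatterBuilder()
            // First alternative: the full form with hours.
            .optionalStart()
                .appendValue(HOUR_OF_DAY, 2, 4, SignStyle.NEVER).appendLiteral(':')
                .appendValue(MINUTE_OF_HOUR, 2).appendLiteral(':')
                .appendValue(SECOND_OF_MINUTE, 2)
                .appendFraction(MILLI_OF_SECOND, 3, 3, true)
            .optionalEnd()
            // Second alternative: no hours; only tried with effect when the first one failed.
            .optionalStart()
                .parseDefaulting(HOUR_OF_DAY, 0)
                .appendValue(MINUTE_OF_HOUR, 2).appendLiteral(':')
                .appendValue(SECOND_OF_MINUTE, 2)
                .appendFraction(MILLI_OF_SECOND, 3, 3, true)
            .optionalEnd()
            .toFormatter(Locale.ENGLISH);

    public static void main(String[] args) {
        System.out.println(LocalTime.parse("20:33.123", TIMECODE));    // 00:20:33.123
        System.out.println(LocalTime.parse("12:20:33.123", TIMECODE)); // 12:20:33.123
    }
}
```

This works because a failed optional section is rolled back as a whole: when the first section hits '.' where it expects the second ':', the parser returns to index 0 and the second section gets its chance.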

1
votes

I think fundamentally the problem is that it gets stuck going down the wrong path. It sees a field of length 2, which we know is the minutes but it believes is the hours. Once it believes the optional section is present, when we know it's not, the whole thing is destined to fail.

This is provable by changing the minimum hour length to 3.

.appendValue(TimecodeHour.HOUR, 3, 4, SignStyle.NEVER)

It now knows that the "20" cannot be hours, since hours requires at least 3 digits. With this small change, it now parses correctly, whether the optional section is present or not.
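Here is a small sketch of that experiment, assuming HOUR_OF_DAY in place of TimecodeHour.HOUR (as the question allows) and therefore hours no greater than 23. I also moved parseDefaulting to the end of the outer formatter so that the hour default is applied when the optional section is skipped; that placement is my choice, not the asker's.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.SignStyle;
import java.util.Locale;

import static java.time.temporal.ChronoField.HOUR_OF_DAY;
import static java.time.temporal.ChronoField.MILLI_OF_SECOND;
import static java.time.temporal.ChronoField.MINUTE_OF_HOUR;
import static java.time.temporal.ChronoField.SECOND_OF_MINUTE;

class ThreeDigitHours {

    // Hours need at least 3 digits, so a 2-digit minute value can no longer be mistaken for them.
    static final DateTimeFormatter HOURS = new DateTimeFormatterBuilder()
            .appendValue(HOUR_OF_DAY, 3, 4, SignStyle.NEVER)
            .appendLiteral(':')
            .toFormatter(Locale.ENGLISH);

    static final DateTimeFormatter TIMECODE = new DateTimeFormatterBuilder()
            .appendOptional(HOURS)
            .appendValue(MINUTE_OF_HOUR, 2)
            .appendLiteral(':')
            .appendValue(SECOND_OF_MINUTE, 2)
            .appendFraction(MILLI_OF_SECOND, 3, 3, true)
            // Applied only if no hour was parsed above.
            .parseDefaulting(HOUR_OF_DAY, 0)
            .toFormatter(Locale.ENGLISH);

    public static void main(String[] args) {
        System.out.println(LocalTime.parse("20:33.123", TIMECODE));     // 00:20:33.123
        System.out.println(LocalTime.parse("012:20:33.123", TIMECODE)); // 12:20:33.123
    }
}
```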

So presuming that the hours field really does need to be between 2 and 4 digits, I think you're stuck with having to implement a workaround. For example, count the number of colons in the string and pick a different formatter depending on whether there are one or two. Using a delimiter other than a colon after the hours would also work.
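A rough sketch of the colon-counting workaround, again with HOUR_OF_DAY standing in for the custom field (so the demo is limited to hours up to 23). The class, constant and method names are made up for illustration.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.SignStyle;
import java.util.Locale;

import static java.time.temporal.ChronoField.HOUR_OF_DAY;
import static java.time.temporal.ChronoField.MILLI_OF_SECOND;
import static java.time.temporal.ChronoField.MINUTE_OF_HOUR;
import static java.time.temporal.ChronoField.SECOND_OF_MINUTE;

class ColonCountingParser {

    // Used when the text has two colons: hours:minutes:seconds.millis
    static final DateTimeFormatter WITH_HOURS = new DateTimeFormatterBuilder()
            .appendValue(HOUR_OF_DAY, 2, 4, SignStyle.NEVER).appendLiteral(':')
            .appendValue(MINUTE_OF_HOUR, 2).appendLiteral(':')
            .appendValue(SECOND_OF_MINUTE, 2)
            .appendFraction(MILLI_OF_SECOND, 3, 3, true)
            .toFormatter(Locale.ENGLISH);

    // Used otherwise: minutes:seconds.millis, with the hour defaulting to 0
    static final DateTimeFormatter WITHOUT_HOURS = new DateTimeFormatterBuilder()
            .appendValue(MINUTE_OF_HOUR, 2).appendLiteral(':')
            .appendValue(SECOND_OF_MINUTE, 2)
            .appendFraction(MILLI_OF_SECOND, 3, 3, true)
            .parseDefaulting(HOUR_OF_DAY, 0)
            .toFormatter(Locale.ENGLISH);

    static LocalTime parse(String text) {
        // Decide up front which layout we have, so the parser never has to guess.
        long colons = text.chars().filter(c -> c == ':').count();
        DateTimeFormatter formatter = colons == 2 ? WITH_HOURS : WITHOUT_HOURS;
        return LocalTime.parse(text, formatter);
    }

    public static void main(String[] args) {
        System.out.println(parse("20:33.123"));    // 00:20:33.123
        System.out.println(parse("12:20:33.123")); // 12:20:33.123
    }
}
```

Malformed input still ends in a DateTimeParseException from whichever formatter was picked, so no extra validation is needed around the colon count.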

The parser logic has undergone quite a few bug fixes over the various Java versions since it was introduced - as you can imagine, there are so many potential edge cases - so I was hopeful using a recent version of Java would make this problem disappear. Unfortunately, it seems even in Java 16, the behaviour is still the same.