1
votes

I have a cloud-backed file system built with FUSE for macOS. Apparently this file system treats file names that differ only in their Unicode normalization form as different files, so you can have one file whose name is in NFC and another with the same name in NFD.

So my question is: is it possible to read these two file names as two different strings from Java? File.getName(), File.getPath(), Path.getFileName(), etc. all seem to return the NFC-normalized string for both files, so the two names compare as equal.

1 Answer

1
votes

I think I found the answer while writing the question. It looks like there is one way to get the file name in its original normalization form, and that is through the Path.toUri() method. Interestingly enough, File.toURI() does not do the same.

Here is an example that works:

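    // Files.list returns a stream that holds the directory open, so close it when done
    try (java.util.stream.Stream<java.nio.file.Path> files = Files.list(Paths.get("/path/to/my/folder"))) {
        files.filter(Files::isRegularFile)
                // hex-dump the UTF-8 bytes of each path so the NFC and NFD names show up as different
                .forEach(f -> System.out.println(org.apache.commons.codec.binary.Hex.encodeHex(f.toUri().getPath().getBytes(java.nio.charset.StandardCharsets.UTF_8))));
    }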

And here is one that doesn't:

    File folder = new File("/path/to/my/folder");
    File[] listOfFiles = folder.listFiles(); // null if the folder cannot be read
    if (listOfFiles != null) {
        for (File file : listOfFiles) {
            if (file.isFile()) {
                // in my tests File.toURI() gave the same bytes for both files
                System.out.println(org.apache.commons.codec.binary.Hex.encodeHex(file.toURI().getPath().getBytes(java.nio.charset.StandardCharsets.UTF_8)));
            }
        }
    }
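
To make the hex output easier to interpret, here is a minimal sketch using only the JDK (java.text.Normalizer) that shows how the same visible name differs between NFC and NFD and how to tell which form a string is in. The class name NormalizationCheck, the helpers hexOf and formOf, and the sample name "é.txt" are made up for illustration; they are not part of the setup described above.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class NormalizationCheck {

    // Hex-encode the UTF-8 bytes of a string without external libraries.
    static String hexOf(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Report which normalization form(s) a name is already in.
    static String formOf(String name) {
        boolean nfc = Normalizer.isNormalized(name, Normalizer.Form.NFC);
        boolean nfd = Normalizer.isNormalized(name, Normalizer.Form.NFD);
        if (nfc && nfd) return "NFC+NFD"; // e.g. plain ASCII names
        if (nfc) return "NFC";
        if (nfd) return "NFD";
        return "mixed";
    }

    public static void main(String[] args) {
        String composed = "\u00e9.txt";    // hypothetical name: é as one code point (NFC)
        String decomposed = "e\u0301.txt"; // same name: e + combining acute accent (NFD)

        // The two strings look identical when printed but are not equal.
        System.out.println(composed.equals(decomposed));
        System.out.println(formOf(composed) + " " + hexOf(composed));
        System.out.println(formOf(decomposed) + " " + hexOf(decomposed));

        // Normalizing both to one form makes them compare equal again.
        String a = Normalizer.normalize(composed, Normalizer.Form.NFC);
        String b = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        System.out.println(a.equals(b));
    }
}
```

Passing the string from Path.toUri().getPath() to formOf should then show which of the two files you are looking at, assuming the names contain characters that actually differ between the two forms.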