
I have a Perl string containing Unicode characters, and I want to create a file with this string as its filename. It should work on Windows, Linux and Mac, regardless of the locale in use. Here is my code:

use strict;
use warnings FATAL => 'all';

use Encode::Locale;
use Encode;

# ファイル.c
my $file = "\x{30D5}\x{30A1}\x{30A4}\x{30EB}.c";

$file = encode(locale_fs => $file);

open(my $filehdl, '>', $file) or die("Unable to create file: $!");
close($filehdl);

I use the encode function because, according to this answer:

Perl treats file names as opaque strings of bytes. They need to be encoded as per your "locale"'s encoding (ANSI code page).

However, this code fails with the following error:

Unable to create file: Invalid argument at .\perl.pl line 15.

I took a closer look at how the string is encoded by encode:

my $rep = sprintf '%v02X', $file;
print($rep);

This prints:

3F.3F.3F.3F.2E.63

In my current locale (CP-1252), this corresponds to ????.c. Each Unicode character has been replaced by a question mark. I think that is expected, because the characters in my string cannot be represented in the CP-1252 encoding.
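The substitution described above can be detected before calling open. The following is a minimal sketch, assuming the target encoding is known by name; `fits_encoding` is a hypothetical helper, not part of Encode or Encode::Locale. It relies on the `Encode::FB_CROAK` check mode, which makes `encode` die instead of silently inserting the substitution character.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: true if every character of $string can be
# represented in $encoding, false otherwise.
sub fits_encoding {
    my ($string, $encoding) = @_;
    # FB_CROAK makes encode() die on an unmappable character;
    # LEAVE_SRC keeps encode() from modifying $string in place.
    my $bytes = eval {
        encode($encoding, $string, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return defined $bytes ? 1 : 0;
}

my $file = "\x{30D5}\x{30A1}\x{30A4}\x{30EB}.c";

# Without a check mode, unmappable characters become '?' (0x3F),
# which reproduces the bytes shown in the question.
printf "%vX\n", encode('cp1252', $file);    # 3F.3F.3F.3F.2E.63

print fits_encoding($file, 'cp1252') ? "representable\n" : "not representable\n";
```

Checking first turns the confusing "Invalid argument" from open into an explicit "this name cannot be expressed in the current encoding" condition that the program can report or handle.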

So, my question is: is there a way to create a file with a name containing Unicode characters?

You simply cannot. There is no guarantee that the filesystem supports Unicode, and every OS has its own rules about which characters are allowed. We are not ready for it. - Giacomo Catenazzi
@GiacomoCatenazzi You mean that even if I'm sure my OS can create the file (which is the case if I use the File Explorer directly), I can't write portable code to do that, right? - Pierre
It's not clear what you imagine that should mean. If the OS and the filesystem support UTF-8, using that should be straightforward; but of course, the assumption that they do isn't portable. - tripleee
I would be surprised if there is a portable way (and one that is also portable within the same OS, across different users, different file systems [e.g. USB stick, external disks], etc.). But let's see if somebody has a solution. - Giacomo Catenazzi
If you are on Windows, see also Win32::Unicode::File and this question. - Håkon Hægland
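For the case tripleee's comment describes, where the OS and the filesystem accept UTF-8 names (typical on Linux and macOS, but an assumption here, not a guarantee), a minimal sketch could look like the following. `create_utf8_named_file` is a hypothetical helper name, and the temporary directory is only there to keep the example self-contained. This does not address Windows, where a wide-character API such as the module mentioned above would be needed.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: creates an empty file named $name inside $dir.
# ASSUMPTION: the filesystem accepts UTF-8 encoded bytes as a file name.
sub create_utf8_named_file {
    my ($dir, $name) = @_;
    # Perl passes file names to the OS as bytes, so encode explicitly.
    my $path = File::Spec->catfile($dir, encode('UTF-8', $name));
    open(my $fh, '>', $path) or die("Unable to create file: $!");
    close($fh);
    return $path;
}

my $dir  = tempdir(CLEANUP => 1);    # scratch directory, removed at exit
my $path = create_utf8_named_file($dir, "\x{30D5}\x{30A1}\x{30A4}\x{30EB}.c");
print "created\n" if -e $path;
```

Hard-coding UTF-8 sidesteps the locale entirely, which is why it succeeds where locale_fs resolved to CP-1252 and failed; the trade-off is exactly the portability concern raised in the comments.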