
I'm wondering whether anyone has any insight on converting an array of character codes to Unicode characters, and searching them with a regex.

If you have

var a = [0,1,2,3]

you can use a loop to convert them into a string containing the first four control characters in Unicode.

However, if you then want to create a regex

"(X)+"

where X is character code 3 converted to its Unicode equivalent, the search never seems to work. If I check the length of the string, it's correct, and .* returns all the characters in the string. But I'm having difficulty constructing a regex to search the string when all I have to begin with is the character codes. Any advice?

Edit:

//Original attempt (the match fails):
var a = [0,1,2,3,0x111];
var str = "";

for(var i = 0; i < a.length; i++) {
    str += String.fromCharCode(a[i]);
}

var r = [0x111];
var reg = "";

reg += "(";
for(var i = 0; i < r.length; i++) {
    var hex = r[i].toString(16);
    reg += "\\x" + hex;
}
reg += ")";

var res = str.match(RegExp(reg))[0];

Edit:

//Working code:
var a = [0,1,2,3,0x111];
var str = "";

for(var i = 0; i < a.length; i++) {
    str += String.fromCharCode(a[i]);
}

var r = [3,0x111];
var reg = "";

reg += "(";
for(var i = 0; i < r.length; i++) {
    var hex = r[i].toString(16);
    reg += ((hex.length > 2) ? "\\u" : "\\x") + ("0000" + hex).slice((hex.length > 2) ? -4 : -2);
}
reg += ")";

var res = str.match(RegExp(reg))[0];
Can you post a short code example of exactly what you are trying to do, a minimal example that we can look at rather than guess at? In its current form, this question is very difficult to answer. - Jeremy J Starcher
I hope the above edit is sufficient to get the basic idea across, though in the actual application it will be significantly more sophisticated. - AaronF

1 Answer


With changes to a few details, the example can be made to work.

Assuming that you are interested in printable Unicode characters in general, and not specifically the first four control characters, the test vector a for the string "hello" would be:

var a = [104, 101, 108, 108, 111]; // hello
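As a quick sanity check that these codes really spell "hello", the array can be turned into a string in a single call instead of a loop. This is only a sketch; it relies on String.fromCharCode accepting any number of code units.

```javascript
// Character codes for "hello" (the same test vector as above).
var a = [104, 101, 108, 108, 111];

// String.fromCharCode accepts any number of UTF-16 code units,
// so apply() can pass the whole array in one call.
var str = String.fromCharCode.apply(null, a);

console.log(str); // hello
```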

If you want to match both 'l' characters:

var r = [108, 108]

When you construct your regular expression, the character code must be written in hexadecimal and zero-padded, because \x expects exactly two hex digits:

reg += "\\x" + ("0" + r[i].toString(16)).slice(-2);
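To see why the padding matters, here is a small sketch using character code 3 from the question. The variable names are only for illustration, and the behavior shown for the unpadded pattern is what browsers and Node.js do for a regex without the "u" flag.

```javascript
var code = 3;

// Unpadded: the pattern source is \x3, which is not a complete hex escape.
// Without the "u" flag it is treated as a literal "x" followed by "3".
var unpadded = new RegExp("\\x" + code.toString(16));

// Padded: the pattern source is \x03, which matches the character with code 3.
var padded = new RegExp("\\x" + ("0" + code.toString(16)).slice(-2));

var target = String.fromCharCode(code);

console.log(unpadded.test(target)); // false
console.log(unpadded.test("x3"));   // true
console.log(padded.test(target));   // true
```

The same two-digit limit explains the failure with 0x111: the source \x111 is read as \x11 followed by a literal "1".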

After that, you should see the results you expect. Note that \x only covers codes up to 0xFF; larger codes such as 0x111 need the four-digit \u form.
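Putting the pieces together, one possible approach is to emit every code as a four-digit \u escape, which avoids choosing between \x and \u. This is a minimal sketch, not the only way to do it: codesToRegExp is a hypothetical helper name, and it assumes every code is a UTF-16 code unit in the range 0 to 0xFFFF.

```javascript
// Hypothetical helper: builds a RegExp that matches the given sequence of
// character codes. Assumes each code is a UTF-16 code unit (0 to 0xFFFF).
function codesToRegExp(codes, flags) {
    var source = "";
    for (var i = 0; i < codes.length; i++) {
        // \uXXXX takes exactly four hex digits, so pad on the left with zeros.
        source += "\\u" + ("0000" + codes[i].toString(16)).slice(-4);
    }
    return new RegExp("(" + source + ")", flags);
}

// Build the test string from the codes used in the question.
var a = [0, 1, 2, 3, 0x111];
var str = String.fromCharCode.apply(null, a);

// Search for code 3 immediately followed by code 0x111.
var match = str.match(codesToRegExp([3, 0x111]));

console.log(match !== null);  // true
console.log(match[0].length); // 2
```

Because the pattern is built only from escapes, codes that happen to be regex metacharacters (for example 0x2E, the dot) are matched literally without extra escaping.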