Space and non-breaking space - chr(32) and chr(160) - are equated in arraysearch( of data array


#1

In seeking the equivalent of arrayboth( of a data array, I used something like this code:

local a,b,c,d
a = dataarray("*"+chr(32),"am","pr")
b = dataarray("am","*"+chr(32),"*"+chr(160),"pr")
c = arrayfilter(b,{?(arraycontains(a,import()),import(),"")})
message exportdataarray(c,";")

It gives the result “am;* ;* ;pr”, which includes both the space and the non-breaking space, when only the former is common to both arrays.

I’m using data arrays because, with the data I’m processing, text arrays cause other, previously reported problems - see Contains operator vs. diacriticals (was The nature of the separator appears to affect the result of a formulafill).

I seem to be out of luck either way. Jim found a solution to the earlier problem by choosing a different separator but there’s no guarantee that the Unicode problem won’t arise again with a different data set.


#2

A better example would be

local a
a = dataarray("*"+chr(32),"am","pr")
if arraycontains(a,"*"+chr(160))
    message "true"
else
    message "false"
endif

Arrayfilter( really has nothing to do with this. It’s arraycontains( that is equating those characters, which in turn means that arraysearch( is doing it. Arraycontains( is a custom function that uses arraysearch( in its formula.

A better title for this topic would be

Space and non-breaking space - chr(32) and chr(160) - are equated in arraysearch( of data array


#3

I did some research, and arraysearch( uses Apple’s localizedCompare: method to compare the text. This ensures that text is compared properly in your language. Apparently this Apple method treats different forms of spaces as identical.

For Michael’s application, he would clearly prefer an exact comparison rather than a localized comparison. But then I’m sure that would be wrong for other users. Perhaps ultimately I’ll need to add multiple versions for every combination, but that could get overwhelming.


#4

Point taken, Dave.

I confess to being a bit baffled by this - I thought I was asking the simple question “does chr(32) match chr(160)?” and I thought the answer was pretty obvious. This, coupled with the earlier problem (Contains operator vs. diacriticals), is very worrying - one wonders what other text comparison problems are lurking out there.

I have a solution in this instance - instead of comparing two text strings, I’ll compare the sequences of their Unicode values. So “does "* " match "* "?” (where the two spaces are different) will become “does “42-32” match “42-160”?” and, hopefully, it won’t. It adds an overhead, especially when I get to comparing much longer strings, but it seems unavoidable.
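
A rough, untested sketch of that idea, assuming characterfilter( hands each character to its formula through import() and asc( returns the character's Unicode value. The variable names and the trailing "-" separator are only illustrative choices, not anything Panorama requires:

```
local first,second,firstcodes,secondcodes
first = "*"+chr(32)
second = "*"+chr(160)
// Assumption: turn each character into its Unicode value followed by "-",
// so the two strings are intended to become "42-32-" and "42-160-"
firstcodes = characterfilter(first,{str(asc(import()))+"-"})
secondcodes = characterfilter(second,{str(asc(import()))+"-"})
// The encoded strings contain only digits and hyphens, so the kind of
// space in the original text should no longer influence the comparison
if firstcodes = secondcodes
    message "true"
else
    message "false"
endif
```

The separator matters: without it, the values 4 and 232 would run together and look the same as 42 and 32.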


#5

You might find something like this useful.

local a,b,c,d
a = dataarray(texttobinary("*"+chr(32)),texttobinary("am"),texttobinary("pr"))
b = dataarray(texttobinary("am"),texttobinary("*"+chr(32)),texttobinary("*"+chr(160)),texttobinary("pr"))
c = arrayfilter(b,{?(arraycontains(a,import()),binarytotext(import()),"")})
message exportdataarray(c,";")

Texttobinary( can convert one string at a time, rather than one character at a time.


#6

Excellent idea, Dave. Since the values are all converted to binary, the arraycontains( will be using a binary comparison, which will only match on an exact match, not a localized match. It might even be a bit faster.


#7

An excellent idea indeed - thanks Dave. Can I just check that, for each of two arrays to be compared with the arrayboth( function, I would do the following:

Array1 = arrayfilter(Array1,Sep,{texttobinary(import())})

#8

Looking at it again, that doesn’t look right. I think I’ll have to start at the arraybuild( stage.


#9

It will need to be dataarraybuild(. You can’t put binary data in a text array.


#10

I realise that - my lazy shorthand misled you.