Arraydifference( function does not register a duplicate element

michael · July 16, 2019, 3:14am

If array A contains one more element than array B, you would expect the arraydifference( function to identify the extra element. If, however, the extra element is a duplicate of an existing element, the function returns an empty string. Is this how it should be?

If it is, is there a function that would facilitate the search for an extra duplicate?

admin · July 16, 2019, 6:32am

The arraydifference( documentation says:

This function creates a new array from two existing arrays. The new array contains only items that are in the first array but not the second array.

Perhaps it would be clearer if it said “the new array contains only values that are in the first array but not the second array.” That is what is really going on, so it doesn’t matter if a value appears more than once. If the value is in both arrays, it won’t be included in the result, no matter how many duplicates there are. It has worked this way for, I think, at least 15 years, and for me at least this function has been very useful.
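To make the value-based behaviour concrete, here is a minimal Python sketch of the semantics described above. It is only a model, not Panorama code: the name `array_difference`, the comma separator, and the first-appearance ordering of the result are assumptions made for illustration.

```python
def array_difference(array_one: str, array_two: str, sep: str = ",") -> str:
    """Return each distinct value of array_one that never appears in array_two."""
    # Values present anywhere in the second array, ignoring empty elements.
    second = {v for v in array_two.split(sep) if v}
    seen = set()
    result = []
    for value in array_one.split(sep):
        # Keep a value only once, and only if the second array lacks it.
        if value and value not in second and value not in seen:
            seen.add(value)
            result.append(value)
    return sep.join(result)


# The extra "b" is a duplicate of a value that is also in the second array,
# so nothing is reported.
print(repr(array_difference("a,b,b,c", "a,b,c")))  # ''
# "d" is missing from the second array, so it is reported once.
print(array_difference("a,b,d,d", "a,b,c"))  # d
```

Under this reading, the empty result in the original question is expected: the comparison is between sets of values, not between element counts.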

Sorry, there’s not a simple function that does this. I think you’d have to roll your own with a loop. Or maybe it can be done with a complicated formula, which would be faster. The current arraydifference( function is implemented with a complicated formula, which I’ve listed below. Perhaps that will provide inspiration.
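As a rough idea of what the roll-your-own loop could look like, here is a Python sketch of a count-aware difference. It is an illustration of the logic only, not a Panorama procedure; the name `extra_elements` and the comma separator are hypothetical.

```python
from collections import Counter


def extra_elements(array_one: str, array_two: str, sep: str = ",") -> str:
    """Return the elements of array_one left over after pairing each one
    with a matching element of array_two (duplicates are counted)."""
    # How many copies of each value the second array can still "absorb".
    remaining = Counter(v for v in array_two.split(sep) if v)
    extras = []
    for value in array_one.split(sep):
        if not value:
            continue  # skip empty elements
        if remaining[value] > 0:
            remaining[value] -= 1  # matched against one copy in array_two
        else:
            extras.append(value)  # no copy left to match: this one is extra
    return sep.join(extras)


print(extra_elements("a,b,b,c", "a,b,c"))  # b
```

The same loop translates fairly directly to any language with a dictionary or counting structure: decrement a count for each match and collect whatever cannot be matched.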

ignore("",cache(arraystrip(arrayfilter(cache(arraysort(arraydeduplicate(arraystrip(arrayone,cache(thesep,"thearraysepchar")),thesep)+thesep+arraydeduplicate(arraystrip(arraytwo,thesep),thesep),thesep),"tempboth"),thesep,{?(import() = array(tempboth,seq()-1,thearraysepchar),import(),"")}),thesep),"comboarray"))+?(comboarray="",arrayone,arraystrip(arrayfilter(arrayone,thesep,{?(arraycontains(comboarray,import(),thearraysepchar),"",import())}),thesep))

gary · July 16, 2019, 5:58pm

I’ve got to be missing something basic here, but couldn’t this formula simply be:

arraystrip(arrayfilter(arrayone,thesep,{?(arraysearch(arraytwo,import(),1,thesep),"",import())}),thesep)

And to make an arraycompositedifference( function that gives not only the items in arrayone that are not in arraytwo but also the items that are in arraytwo but not in arrayone:

arraystrip(arrayfilter(arrayone,thesep,{?(arraysearch(arraytwo,import(),1,thesep)=0,import(),"")})+thesep+arrayfilter(arraytwo,thesep,{?(arraysearch(arrayone,import(),1,thesep),"",import())}),thesep)

So what am I missing or misinterpreting this time?

admin · July 16, 2019, 7:32pm

I’m not sure that you are. There is often more than one way to skin a cat. I think your arraydifference( formula probably works, but I would have to do extensive testing to be sure. I would also want to do performance testing on arrays of various sizes and compositions of values. I definitely think the current formula would be significantly faster for arrays that contain a lot of duplicate values.

Actually I think your formula isn’t quite going to work if there are duplicate values in the first array? In that situation, isn’t it going to return duplicate values in the output array? Perhaps I didn’t make myself clear – it was intentional that arraydifference( returns a list of values that are in the first array but not in the second. It is supposed to return only one occurrence of each value, no matter how many times that value appears in the original arrays. I think your formula could work the same way simply by adding an arraydeduplicate( function around the whole thing.
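The duplicate issue can be seen in a small Python model of the filter-only approach. This is a sketch under assumptions, not Panorama code: `filter_difference` and `deduplicate` are hypothetical names, and this `deduplicate` keeps first-appearance order, which may not match the ordering of the real arraydeduplicate( function.

```python
def filter_difference(array_one: str, array_two: str, sep: str = ",") -> str:
    """Keep every element of array_one that is not found in array_two."""
    second = {v for v in array_two.split(sep) if v}
    return sep.join(v for v in array_one.split(sep) if v and v not in second)


def deduplicate(array: str, sep: str = ",") -> str:
    """Drop repeated values, keeping the first occurrence of each."""
    return sep.join(dict.fromkeys(v for v in array.split(sep) if v))


filtered = filter_difference("x,a,x,y", "a")
print(filtered)               # x,x,y  (the duplicate "x" survives the filter)
print(deduplicate(filtered))  # x,y    (one occurrence of each value)
```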

Your second function looks like it would probably work, though again I think an arraydeduplicate( is needed. I don’t like the name arraycompositedifference(, maybe arraynotinboth(?
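For reference, the "not in both" idea with deduplication applied might be modelled in Python as follows. The name `array_not_in_both` simply borrows the suggestion above; the comma separator and the result ordering (values unique to the first array, then values unique to the second) are assumptions of this sketch.

```python
def array_not_in_both(array_one: str, array_two: str, sep: str = ",") -> str:
    """Return each distinct value that appears in exactly one of the arrays."""
    first = [v for v in array_one.split(sep) if v]
    second = [v for v in array_two.split(sep) if v]
    first_values, second_values = set(first), set(second)
    only_first = [v for v in first if v not in second_values]
    only_second = [v for v in second if v not in first_values]
    # dict.fromkeys removes repeats while keeping first-appearance order.
    return sep.join(dict.fromkeys(only_first + only_second))


print(array_not_in_both("a,b,b,d", "b,c,c"))  # a,d,c
```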

Please take all of my comments with a grain of salt. I don’t have any memory of writing the arraydifference( function; I think it must have been an adaptation of different code from Panorama 6. I would really have to dig in very closely to work on this code again now. It does work now, so I would be very leery of making any change without spending a lot of time testing.

admin · July 16, 2019, 8:45pm

So it turns out that there is a bug in arraydifference( in Panorama X: it doesn’t handle duplicate values correctly (Panorama 6 did work).

michael · July 17, 2019, 12:40am

That’s certainly given me something to work on and I’m pleased to have indirectly discovered a bug.