Wanted: a non-sorting `arraydifference(` function


#1

A side effect of the arrayboth( and arraydifference( functions is that the input parameters are sorted. Does anybody have an arraydifference( function that doesn’t sort?


#2

I incorrectly tagged this post as Panorama X - I actually want it for Panorama 6.


#3

I’m not sure what you mean by “the input parameters are sorted”; the input parameters aren’t changed at all. It’s the output that is sorted.

That said, sorting isn’t a side effect, it is integral to how these functions work. A “non-sorting” version of these functions would be MUCH slower, for large arrays it could be hundreds or even thousands of times slower.

Also, a “non-sorting” version would also bring up the question of what order the output should be in. If the two input arrays are in different orders, there is no general correct answer to that question. Sure, for some small arrays there could be a non-sorted version that might be the obvious answer to a human, but that would only be because the data was actually sorted in some way that wasn’t alphabetical but was recognizable by humans.


#4

Depends how pedantic you want to be. The parameters are concatenated and that concatenation is sorted - I see that as sorting the parameters.

Well, last time I looked, I was human. My data isn’t sorted, its occurrence is according to the date on which it was acquired and I want it to remain in that order.

I’ll modify the code and see how much it slows down - there are 250,000 records, each of which requires an arraydifference operation in a formulafill statement so it will be a good test.


#5

Ok, you are talking about what the code does internally.

If you simply take out the arraydeduplicate and arraysort statements, it will undoubtedly run quite a bit faster. However, it will no longer reliably produce a list of differences. The two arrayfilter statements further down in the procedure rely on the fact that the data is sorted. They won’t produce a correct list of differences if the data is not sorted.

Your request is literally impossible. You have two arrays, and there is no way to tell where an item in the first array should be placed in relation to an item in the second array. (At least there is no way to tell from the arrays themselves; you may have additional information outside the arrays that can resolve that, but it is not available to the arraydifference function.)

Suppose you have this array:

dog
cat
bird

And this array:

zebra
lion
elephant

The arraydifference of these two arrays will be an array with six elements. But if not ordered, what is the order that should be used?? Where does zebra go – before or after dog, or some other spot? There simply is no such thing as “remaining in the same order” for this operation.


#6

The arraydifference( function removes from the first array those elements which also occur in the second array. If they are not sorted, the remnants of the first array will still be in the desired order.


#7

It also removes from the second array those elements which also occur in the first array, which is where the impossibility lies. Perhaps in your data, all elements in the second array will always be in the first array. If so, that means that the task is not impossible. However, simply removing the sorting code from the arraydifference code is not going to get the job done. It will take entirely new code to do this, since you won’t be able to rely on duplicate values always being next to each other in the array.


#8

You must be looking at a different function to me. The output from the arraydifference( function I see is a new array which leaves both of the input arrays unchanged.

The original ProVUE code took 24 minutes to process 250,000 records. My modified code took 29 minutes.

I replaced this code:

    arraydeduplicate arraystrip(a1,sep),a1,sep
    arraydeduplicate arraystrip(a2,sep),a2,sep
    both=a1+sep+a2
    arraysort both,both,sep
    
    arrayfilter both,both,sep,?(import() = array(both,seq()-1,sep),import(),"")
    arraystrip both,sep
    
    if both<>""
        arrayfilter a1,diff,sep,?(arraycontains(both,import(),sep),"",import())
        arraystrip diff,sep
    else
        diff=a1
    endif

with this:

    arrayfilter a1,diff,sep,?(arraysearch(a2,import(),1,sep)>0,"",import())
    arraystrip diff,sep

and it works.
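
For comparison, here is a minimal sketch of the same order-preserving operation in Python rather than Panorama. The function name `ordered_difference` and the default separator are hypothetical, chosen only for illustration. It keeps the items of the first array that do not occur in the second, in their original order, and drops empty elements much as arraystrip does.

```python
def ordered_difference(a1: str, a2: str, sep: str = ",") -> str:
    """Return the elements of a1 that are absent from a2, keeping a1's order."""
    # Build a set from the second array so each membership test
    # is a hash lookup instead of a scan through a2.
    exclude = set(a2.split(sep))
    # Keep items not found in a2; skip empty elements.
    kept = [item for item in a1.split(sep) if item and item not in exclude]
    return sep.join(kept)


print(ordered_difference("43,2,3,4,5,04", "3,4,5,6,7"))  # 43,2,04
```

Like the two-line replacement, this sketch does not remove duplicates within the first array, so it is not identical to the original procedure when the first array contains repeated items.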


#9

For the benefit of anyone reading this thread in the future, Michael’s code is definitely NOT a replacement for the arraydifference function. It may be just what Michael needs for his current application, but it is not the same operation. His code produces a list of items that are in the first array that are not in the second array. But it does not include items that are in the second array that are not in the first array. The arraydifference function included with Panorama results in a list of items that are in either the first or second array, but not both. If Michael’s function was going to be included in Panorama, it would probably be named something like itemsonlyinfirstarray(.


#10

OK, I was fooled by this description:

“This statement compares two arrays. The result is a list of elements that are included in the first array but not the second.”

which I interpreted as what I wanted to do.


#11

The acid test of course is that

message arraydifference( "43,2,3,4,5,04","3,4,5,6,7",",")

gives the result, “04,2,43”
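
The result can be checked against the two definitions discussed in this thread with a small Python sketch. The helper names `one_sided` and `symmetric` are hypothetical, and plain string sorting is assumed to stand in for Panorama's sort order. Only the one-sided definition reproduces “04,2,43”; a symmetric difference would also contain 6 and 7.

```python
def one_sided(a1: str, a2: str, sep: str = ",") -> str:
    """Sorted, deduplicated items that are in a1 but not in a2."""
    return sep.join(sorted(set(a1.split(sep)) - set(a2.split(sep))))


def symmetric(a1: str, a2: str, sep: str = ",") -> str:
    """Sorted, deduplicated items that are in exactly one of a1 and a2."""
    return sep.join(sorted(set(a1.split(sep)) ^ set(a2.split(sep))))


first, second = "43,2,3,4,5,04", "3,4,5,6,7"
print(one_sided(first, second))  # 04,2,43
print(symmetric(first, second))  # 04,2,43,6,7
```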


#12

Actually, you weren’t fooled – rather, I have egg all over my face. Somehow I had an incorrect recollection of what this function did. The documentation is correct, and so are you.

In addition to dealing with my embarrassment, I also need to check to make sure I have not used arraydifference( anywhere incorrectly, based on my faulty understanding of its operation … time passes … ok, it looks like the four places it is used in Panorama libraries are correct.

My apologies for my faulty advice.


#13

That’s one demerit point to you compared with a gazillion for me.