Finding duplicate values in the same row

CooperT · October 17, 2020, 5:33pm

I am trying to develop procedures that can tally the votes in a rank order voting system. One of the steps in tallying the vote is to check for overvotes, that is, did the voter mark two different candidates as their first preference. (Or second preference once their first preference has been eliminated, etc. )

So I want the most efficient code to test a single row to see if the value n appears more than once, that is, is there a duplicate vote on the ballot? Can anyone do better than this formula?

arraysize(arraystrip(arrayfilter(exportline(),tab(),
    {?(val(import())=n,import(),"")}),tab()),tab())>1

This formula exports the row to a tab separated array, filters the array to leave only the value I want to test for, strips the blank array elements, and then counts the remaining elements. If it is greater than 1, I found a duplicate. Theoretically, this formula may have to be applied to large number of rows (you could have millions of votes if you were trying to do this with a large voter group in a state), so it better be fast.

I have the feeling that I may be missing some PanX function that might make this much easier.

dave · October 17, 2020, 5:44pm

I think you want to check out the arrraydeduplicate( function.

arraysize(arraystrip(exportline()tab()),tab())>arraysize(arraydeduplicate(arraystrip(exportline()tab()),tab())

dave · October 17, 2020, 6:08pm

My first formula is looking for the bottom line. Are there any duplicate votes? To check to see if a specific rank has been duplicated, you could use the rank itself as the separator in an arraysize( function. If the size is greater than 2, the separator appears more than once.

gary · October 17, 2020, 6:14pm

My version is slightly different from Dave’s since I’ve incorporated the cache( function so only one exportline() and one arraystrip( are used. It may be a little faster.

arraysize(cache(arraystrip(exportline(),tab()),"theLine"),tab())>arraysize(arraydeduplicate(theLine,tab()))

I have not tested this except for basic syntax.

CooperT · October 17, 2020, 8:29pm

Thanks for your ideas. I thought there must be other approaches.