I’ve used the Select Duplicates menu option to see that I have some duplicate records. I’d like to delete all but one of each. It looks like this isn’t possible by default.
I can see from the forum that it’s been discussed before; and there are hints that an easy way would be to sequence by group number; but I can’t quite figure out the best way to do this:
Long, long ago, before there was dirt, there was UnPropagate. In order for Select Duplicates to work, the records had to be sorted on the field you were checking for duplicates.
As I’m remembering it - you didn’t even need SelectDuplicates. You’d sort up on the field to cluster duplicate records together. Then you’d unpropagate which would empty that field for any records that had duplicate content. Then you’d select on that field not equal to empty, and finally RemoveUnselected. NOTE - this ASSUMES legitimate records would always have content in that field.
So six statements:
SelectAll
Field YourField
SortUp
Unpropagate
Select (YourField <> “”)
RemoveUnselected
Select Duplicates, on the other hand, selects those duplicate records so you can manually review them and choose which ones to delete. For example, if you have another field that contains a “date added”, you’d want to keep the most recent and you could see which one it was. Of course, in that case, you could also sortwithin on that entry date field after you’ve sorted on YourField - putting the most recent record on top of its cluster.
Thanks @designer! That works perfectly… And somehow, I managed to miss that there’s a specific help page about duplicate removal (I think I possibly searched for “duplicates” rather than duplicate):
There’s more info about dealing with existing empty fields; but for me, this wasn’t necessary as I had complete duplictes (even with primary key duplicated), so it was dead easy to do your steps above.
Here is some code I was playing with that will remove duplicate records that have the contents of all the fields matching exactly. I have included an undo capability as well as using nohow to speed things along.
startdatabasechange "ALLRECORDS","Remove Duplicates"
let startCount=info("records")
sortup
firstrecord
noshow
loop
let x=exportline()
downrecord
loop
if exportline()=x
deleterecord
stoploopif info("eof")
endif
stoploopif exportline()≠x
endloop
stoploopif info("eof")
endloop
firstrecord
showpage
endnoshow
nsnotify str(startCount-info("records"))+
pluralpattern(startCount-info("records")," record~ have been removed!")
By replacing the occurrences of exportline( with a formula you could narrow the search for duplicates to only specific fields (i.e. exportcell(«FirstName»)+tab()+exportcell(«LastName»)). Seems to work in my limited testing but you might want to try it on a copy even though there is the undo feature.
This code is not going to reliably remove all duplicate records. It will only remove duplicate records if they are adjacent. The single sortup statement at the top is not sufficient to make sure that all duplicate records are adjacent to each other (and what field is that supposed to sort? it’s not specified).
If you want to remove duplicates based on ALL fields being duplicate, that could be done using the findbelow statement with exportline(). It would be slow-ish, but would work 100%. I’ve got a lot on my plate so I’ll leave this as a hint without actually writing the code. Note - with this technique you wouldn’t need to sort at all.
You mean in the startdatabasechange statement? No, that doesn’t cause all records to be selected, and yes, adding selectall is also a good idea. The ALLRECORDS parameter indicates what the Undo will apply to, in this case to changes to all records.