Choosing the right link function

Orlando · November 17, 2018, 4:55pm

I have a target database that contains the full name and ID number (IDN) for that person. I have recipient database that only contains the full name - however, the spelling of the full name in the recipient may be different from that of the target: it may contain a middle initial, a middle name, a second last name, the spouse’s last name and so on. How may I do lookup for the IDN in the target database if any of the names in it matches any of the names in the recipient?

gary · November 17, 2018, 6:55pm

For the complicated nature of your needs I would look into the superarraybuild statement.

Since your query could return more than a single possibility it would probably be best to return a list containing both the names and IDs that match your formula so you can then select the exact one you need.

Orlando · November 17, 2018, 9:16pm

Can you please help me write the formula? Target contains only two fields, Name and IDN. Recipient only contains Name.

gary · November 18, 2018, 12:45am

Now that I look more closely at your current formatting in the recipient database I don’t really see any logical way to check for all these things with any kind of lookup or arraybuild. There are just too many variables and oddly formatted possibilities. Unfortunately it looks pretty much to me like a manual operation is needed to clean up that recipient database and maybe at least separate out the last names to a new field so that you would at least have a starting point for the search.

admin · November 18, 2018, 12:56am

Darn, I was looking forward to seeing what sort of magical solution Gary had in mind! Unfortunately I concur with Gary, I don’t think this can be done except by manually cleaning up the data.

michael · November 18, 2018, 1:08am

Unlikely, given that one file has over 4 million records and the other has about 1.4 million!

I’ll work with Orlando on this to get as near a solution as possible. And he wants it before the end of the weekend.

BruceDeB · November 18, 2018, 1:15am

There are some things that you can do to help you clean up the data. Divide the names by spaces, and then make some assumptions about them: That the first name is always first or after a comma, last name is last or before the comma, etc. Then you can sort them and look for duplicates. But there are no assurances that any particular method will work, and nothing will help you if you have identical names for different people.

michael · November 18, 2018, 1:26am

That’s a good approach for Anglo names Bruce but Orlando is dealing with Spanish names which have some quite complex rules and conventions for their construction - the last name is not necessarily the family name and is most frequently not so for women. One of my approaches has been to sort each name as a space-delimited array - I get a lot more matches that way.

Orlando · November 18, 2018, 3:20am

Thank you all … gracias Miguel!! Mission accomplished.

RAmeeti · November 18, 2018, 6:02pm

LOL. I was thinking the exact same thing.

RAmeeti · November 18, 2018, 6:09pm

Right on with this reality. I have a database of ~16,000 people, and within it are 5 'Charlie Brown’s, each known to be different people. That would translate to ~1,250 different 'Charlie Brown’s in the user’s database and then you add in the accidental unknown duplicates.