Country codes separated by comma into columns with full names?

Squirrel · February 16, 2024, 3:00pm

I have country codes in a column called ‘countries’. Some are multiple countries in the same field divided by a comma. I want to extract the full name of the country and insert it into a new column ‘country_long’. To accommodate the fields with more than one country, I’d like to automatically create the new columns ‘country_long_1’, ‘country_long_2’, etc and populate those with the other country codes in a series.

Is any of this possible through a script?

The final result will need to eventually also be exported into a csv or json file.

designer · February 16, 2024, 6:22pm

Squirrel, an example of the the data in the field would be helpful. I could imagine, but I might be wrong. Initially, you are saying that you have a column called countries that contains a country code. Then you say there could be multiple countries in the same field but from the previous statement would those be multiple country codes?

From your desire to extract the full country name, it sounds like you might have pairs of country code and country name, and the pairs are separated by a comma. If that’s the case, what are the country codes and their matching names separated by?

It might help to know if the codes are all the same length and any consistent alpha-numeric characters in their construction. Just the comma itself could be used as a delimiter - if it was well-behaved.

Without further description, it appears you are talking about a field that contains one pair - country code, country name - or multiple pairs - country code1, country name1, country code2, country name2, country code, country name 3.

Just at the beginning, it wouldn’t be difficult to find the record that has the most countries so you know ahead of time how many fields you’ll need to add. Then you just create them first and eliminate a lot of complexity from the distribution routine.

Just to say, I am always leery of actions that grow a database in fields, rather than records. If we had an example of what your exported file would look like, perhaps you might not need those extra columns.

I don’t know the number of records you are dealing with - all those details count - but two alternative methods are to simply build the data to be exported in a variable, inserting commas and carriage returns where needed. Or, as the data is extracted from the field in databaseA, it is added as a new record in databaseB (which just has two fields; one for the code and one for the country name), then export from DatabseB.

Another detail would be if this is going to be a continual operation or a “one-off” just to get the job done once. Those influence the elements of efficiency and elegance. If it’s just a one-off, any old code would do. If there are thousands of records and it will be run daily, one might spend a little more time working on the solution.

gary · February 16, 2024, 7:03pm

The country codes can consist of either 2 or 3 alphanumeric uppercase characters. Here is a small sample of a list I put together showing the 2 and 3 codes and the corresponding countries.

Squirrel · February 17, 2024, 12:44am

Hi Designer. Great point! The country codes are ISO Alpha 2. All countries are represented by two capital letters. CA for Canada, DE for Germany, etc. In some fields, there exist multiple country codes separated by a comma. Ex: CA, DE, MX, UK for Canada, Germany, Mexico and the UK.

It might help to know if the codes are all the same length and any consistent alpha-numeric characters in their construction. Just the comma itself could be used as a delimiter - if it was well-behaved.

They are all consistent in the spacing and use of comma. In some cases there is one country, in others a few and in other fields many. The countries, in this case, are showing where an organization has been active. Where things get sloppy is the order of the countries. For example: US, UK is also shown as UK, US

The list of country codes and standard can be found here. https://www.nationsonline.org/oneworld/country_code_list.htm

Without further description, it appears you are talking about a field that contains one pair - country code, country name - or multiple pairs - country code1, country name1, country code2, country name2, country code, country name 3.

The existing field contains the heading ‘countries’. That’s it. My thinking as to create columns for the country long form to be inserted so it is not changing the ISO country codes, but adding column(s) with the full spelling. Although re-writing the countries field with something like UK United Kingdom, CA Canada would also work. Pic below.

If it’s just a one-off, any old code would do. If there are thousands of records and it will be run daily, one might spend a little more time working on the solution.

Any old code will do. It’s not a one-off but it is not a regular high volume churn either. Currently there are 560 records and eventually that may be as high as 3500.

PS - I always feel like you are teaching me - both to be more concise in my descriptions and how to think around problems with a PanoramaX brain. It is much appreciated.

admin · February 17, 2024, 1:17am

Here’s one way this could be done. I’m assuming you have set up 6 fields country_long_1, country_long_2, country_long_3, etc.

setglobaldictionaryvalues "CountryList",
    "AF","Afghanistan",
    "AL","Albania",
    ....
    "US","United States",
for cnum,1,6
    field "country_long_"+str(cnum)
    formulafill globaldictionaryvalue("CountryList",strip(array(countries,cnum,",")))
endloop
deleteglobaldictionaryvalues "Countries"

This is just one way it could be done, there are others, but I think using a global dictionary would probably be the highest performance. Even with several thousand records this should run quite fast.

designer · February 17, 2024, 2:31am

Squirrel,
If you set up a dictionary of the country codes and their expanded names you will have a lot of versatility. So if you don’t know how to do that, it’s a really valuable lesson/exercise because the technique you will learn will help you in many future projects.

I can imagine a situation where you don’t need to fill a field with the expanded name at all - because you can use the country code to display the expanded name on the fly.

For example, if your dictionary was called countrycodename where each element had the code (US) for the key and the full name (United States) as the value, and you had a specific code in a variable ccode, then

getdictionaryvalue(countrycodename,code) would display the country name if it were in the formula of some display object. Something like that. I might be a little off; still learning myself.

I know that’s not what you are asking to do. Just letting you know the potential.

Others can be more specific. For now, a starting place is to look at Data Dictionaries in the Help (All Categories)

Note that Jim’s example used a maximum of 6 codes in a field for an example. So in addition to taking a little time to be familiar with dictionaries, it would be handy to have a procedure that told you the maximum number of expanded name fields you’ll need before hand and set them up first. We (forum members) can help with that too.

However, there is very little overhead penalty to have extra fields that are empty. So you could create more expanded fields greater than you’ll ever need. The unused one won’t “cost” very much - though for me, those extra fields would be like fingernails on the blackboard - just a generational thing; like worrying about the cost/duration of a long-distance call to Canada.

Finally, though a dictionary is one structure to hold a code/name pair, it is also possible to do the whole thing in an array structure with array-type commands.

What if you keep the Country field, but modify its content so you have both a code and its full name in the field - separated by a space? If you have more than one code, you’d have codename then carriage return and code2name2, etc.

The single field would show just the code and name. When there is more than one code, you’d have a separate line in the same field for each code/name pair. The code and name are separated by a space and the different code name pairs are separated by a carriage return.

If you need to strip off either the code of full name (which could have spaces), that would be easy using a text funnel.

The code would be myfield[1," "][1,-2]
The first bracket gives everything up to and including the first space - that would be the code and space, the second bracket gives everything up to the penultimate character - dropping off the space at the end.

and the name would be myfield[" ",-1[2,-1]
The first bracket gives everything from the first space to the end. The second bracket gives everything from the second character to the end of the string - dropping off the leading space.

The kids these days use fancy “regular expressions” for that kind of thing. The old-time-y text funnels are my jam.

That’s for a single code/name pair, with multiple code/names in the field it gets more … interesting.

Let’s see, I don’t want to introduce a gruesome image of pussy cats. There are a lot of ways to pluck leaves from an artichoke in PanX.

admin · February 18, 2024, 1:25am

Here’s a formula that will calculate the maximum number of countries in the countries field.

aggregate({arraysize(countries,",")},"max")

Definitely possible, but I’ll leave that to someone else on this holiday weekend.

Squirrel · February 18, 2024, 8:17pm

Thank you! I will give this a try!

Squirrel · February 18, 2024, 8:18pm

This might be helpful in the future, but currently all the data is being imported from other datasources. That is part of the issue. It comes as it comes and then I need to make it usable for multiple purposes.