Using semicolons instead of commas in European format CSV files

I have to correct myself: I have never faced this problem, because I have used TSV text files instead of CSV files all the time, when I was exporting data from Excel to Panorama 6 or X.

But Panorama has all the tools to change the separator character on the fly during the import. For all new customers it would certainly be more comfortable, when Panorama X would recognize semicolons or other separators automatically or with import presets.

This might be of some help:

IETF Tools: CSV Format

Yes, I am aware of that document. It’s better than nothing, but it is not an official standard. It even says right at the start that “It does not specify an Internet standard of any kind.” and later says “there is no formal specification in existence … this section documents the format that seems to be followed by most implementations.” It appears to have been created by a small company, independent of any standards body.

In any case, this document doesn’t mention semicolons at all, it seems to assume commas as the only delimiter.

I only brought it to the attention of the forum because it mentions that the text/csv MIME type was registered with the IANA. It appears to be a formal definition. For what that’s worth.

The IANA document mentioned (RFC 2048) is for MIME types – that document tells how to identify MIME types and what the process is for registering new MIME types, but is silent on the content of any particular mime type, which is what we are interested in. In fact, RFC 2048 does not even mention any particular mime types (text/plain, text/html, text/csv, image/jpg etc.)

Bottom line, there is no official standard for the CSV format. And even if there was a standard, it would not be of much help unless Microsoft adhered to it, and to a lesser extent, Google (Docs) and Apple (Numbers). So I guess we’ll just have to continue muddling along.

In Germany, Excel indeed uses semicolons when you save a worksheet as a CSV file. (I am quite sure it is using commas in the U.S…) So I think Excel is reacting to system settings for international formats e.g. German decimal commas.

In the absence of any standard, why not just make the delimiter an options setting, like the TapForms example in this thread?

Definitely not going to do that, except perhaps as a separate command for advanced users. Right now it works perfectly 99% of the time, for almost all customers, with no user UI. Adding a confusing dialog for EVERY user is a big step backwards in usability. Most users wouldn’t even know what to do with such a dialog. It needs to figure out how to do the right thing automatically.

I’m no Steve Jobs, but I try to think like him as much as I can for questions like this. I’m pretty sure he would not be a fan of adding an extra dialog in a situation like this to enable an option that is going to be used <1% of the time.

It would be a perfect solution to add the separator options to the existing Text Import wizard.

Jim,

Sorry - was not suggesting a dialog. Merely an option in settings. It’s out of the way and comma by default… but changeable by advanced users if needed.

Of course, if you can write a routine to parse the incoming data and automatically know what my delimiter is, then even better.

Karl

That’s a possibility, but I would anticipate technical support problems from customers that are upset that the import doesn’t work and don’t know and/or don’t mention how the setting is configured when inquiring for support, making troubleshooting difficult. So best to avoid that if possible.

Folks need to recall that PanX’s current implementation of csv involves more than just commas used as delimiters. There also is ‘special quote processing’ to deal with embedded commas, quotes, returns, etc., but not, IIRC, semicolons, within fields (which can be optionally turned off in importtext or importline.) That probably would also require adjustments and would become another can of worms I wouldn’t envy Jim for re-opening. TSV text avoids those problems.

Yes, Panorama X has a built in CSV “engine” that handles all that stuff, and it does so based on the IETF document mentioned earlier in this thread. Even though that isn’t actually a standard, it’s as close to one as we’ve got. That engine is used by the importtext and importline statements, and the importtext statement is used by the text import commands in the File menu (including New Database from Text File).

These statements allow a program to explicitly set the separator, or to let the engine try to automatically figure out what the separator is. When automatic mode is used, it looks for tabs, and if it finds any, it assumes tab separated, otherwise comma. That’s pretty easy. Identifying comma vs. semicolon will be quite a bit tricker. For example, this data contains both semicolons and commas, but probably semicolon is the separator. But not for sure – 7;widget;34 could be a valid cell value!

7;widget;34,12

Maybe if the system setting for a decimal point is “,” I can assume semicolons for the separator. Though that isn’t ideal because what if someone in the US receives a file generated in Europe, or the reverse? I guess you could temporarily change your system settings. Right now the analyzer only looks at the first line, I suppose it could look at multiple lines and see which separator gives a consistent number of columns. That sounds hard, though, and slower.

another can of worms

That’s for sure!!

While I’m at it, I guess the CSV export also needs to be changed to use semicolon separators depending on the decimal point separator.

No, Jim, there is no need to do that, because ALL applications you would use to import a CSV file are understanding the comma separated format. Here is for example the Import Settings dialog of Numbers with a sample CSV import file from Panorama X.

If the system decimal point setting is comma, shouldn’t the export use commas as decimal points in the numbers, and then use semicolon as the field separator? In other words, shouldn’t the export be localized based on the region settings, instead of always exporting using a US centric format?

That’s certainly true, but at the moment we have a one-directional problem only, since there is no easy way to import semicolon-separated CSV files into Panorama X.

There is no problem exporting CSV data (separated with commas) and importing them into Excel, Numbers, or other database applications.

I live in France and have been using Panorama since forever.

Every month I import a file from my bank’s site (a major French bank, actually the ex-state bank) with the activity on my account for the last month. I usually ask for a TSV file but this time I requested a “CSV” file. And I confirm that, indeed, the separator is a semicolon, which is logical as the decimal point is a comma, and there are lots of them.

Hope this helps.

Ellen

As a French Canadian user, I avoid this issue because my text files are always saved with the tab-delimited option. From the following screen shots, you will understand why.

  • CSV1 gives an idea of the original data.
  • CSV2 shows the “save as” options that I used (1, 2, 3, 4).
  • CSV3 compares the results of importing these files in Panorama X.

Hope this helps.

Thanks,
Nelson

CSV is standardized by RFC 4180 which clearly only uses the COMMA token as separator.

German numbers use a decimal point but a comma as thousands separator. The Dutch reverse this and the Swiss use single quotes for thousands. Neat. Needless to say Europe is a mess at least in this respect (they do use the handy metric system god bless them). No idea what Asia or Australia does.

So in other words you need to be prepared for anything and write or configure your software accordingly

RFC 4180 itself clearly states that it does not specify a standard of any kind. It’s just some notes that somebody put together. It is definitely NOT exactly followed by major software vendors including Microsoft, etc.

In the four years that this discussion was paused, Panorama was changed to so that by default it will change the delimiters based on the number separators specified in the Language & Region system settings.

If the decimal point is set to a comma, then Panorama uses semicolons as the default field separator, otherwise, it uses commas. This seems to have solved the problem. If you don’t want to use the defaults, you can write your own code and specify exactly what you want with the importtext statement.