I’ve just heard from a potential new customer in Germany who says that CSV files in Germany are normally separated by semicolons, not commas (since the commas are used as decimal points in numbers). This makes sense to me, but I’ve never heard of this before. Before I go willy nilly making a major change like this, I’d like to get further confirmation that this is really correct. I know there are some European users on this forum, can you confirm this? Ideally I would like to find some sort of standards documentation that would show this, but I don’t think there is an official standard for CSV.
There is a pretty extensive discussion of this subject here:
Indeed, the CSV format is not fully standardized. I append a screenshot from the import dialog of TapForms 5, where you can set and choose custom import presets for various import parameters (like column separators, day formats, text code like UTF8 or ASCII). Typical column separators are comma, tab, colon, semicolon, vertical slash. Excel has a somehow similar dialog for the import of text files.
But I would like to add that I never — in all the years — had to figure myself how to handle different column separator characters in Panorama or Panorama X. If I remember correctly, Panorama analyzes the first line of the import file and recognizes the separator character automatically.
(Edited: This applies for the disambiguation of tabs and commas as separators only. Pan X does not automatically recognize semicolons as separators.)
I have to correct myself: I have never faced this problem, because I have used TSV text files instead of CSV files all the time, when I was exporting data from Excel to Panorama 6 or X.
But Panorama has all the tools to change the separator character on the fly during the import. For all new customers it would certainly be more comfortable, when Panorama X would recognize semicolons or other separators automatically or with import presets.
Yes, I am aware of that document. It’s better than nothing, but it is not an official standard. It even says right at the start that “It does not specify an Internet standard of any kind.” and later says “there is no formal specification in existence … this section documents the format that seems to be followed by most implementations.” It appears to have been created by a small company, independent of any standards body.
In any case, this document doesn’t mention semicolons at all, it seems to assume commas as the only delimiter.
I only brought it to the attention of the forum because it mentions that the text/csv MIME type was registered with the IANA. It appears to be a formal definition. For what that’s worth.
The IANA document mentioned (RFC 2048) is for MIME types – that document tells how to identify MIME types and what the process is for registering new MIME types, but is silent on the content of any particular mime type, which is what we are interested in. In fact, RFC 2048 does not even mention any particular mime types (text/plain, text/html, text/csv, image/jpg etc.)
Bottom line, there is no official standard for the CSV format. And even if there was a standard, it would not be of much help unless Microsoft adhered to it, and to a lesser extent, Google (Docs) and Apple (Numbers). So I guess we’ll just have to continue muddling along.
In Germany, Excel indeed uses semicolons when you save a worksheet as a CSV file. (I am quite sure it is using commas in the U.S…) So I think Excel is reacting to system settings for international formats e.g. German decimal commas.
In the absence of any standard, why not just make the delimiter an options setting, like the TapForms example in this thread?
Definitely not going to do that, except perhaps as a separate command for advanced users. Right now it works perfectly 99% of the time, for almost all customers, with no user UI. Adding a confusing dialog for EVERY user is a big step backwards in usability. Most users wouldn’t even know what to do with such a dialog. It needs to figure out how to do the right thing automatically.
I’m no Steve Jobs, but I try to think like him as much as I can for questions like this. I’m pretty sure he would not be a fan of adding an extra dialog in a situation like this to enable an option that is going to be used <1% of the time.
It would be a perfect solution to add the separator options to the existing Text Import wizard.
Sorry - was not suggesting a dialog. Merely an option in settings. It’s out of the way and comma by default… but changeable by advanced users if needed.
Of course, if you can write a routine to parse the incoming data and automatically know what my delimiter is, then even better.
That’s a possibility, but I would anticipate technical support problems from customers that are upset that the import doesn’t work and don’t know and/or don’t mention how the setting is configured when inquiring for support, making troubleshooting difficult. So best to avoid that if possible.
Folks need to recall that PanX’s current implementation of csv involves more than just commas used as delimiters. There also is ‘special quote processing’ to deal with embedded commas, quotes, returns, etc., but not, IIRC, semicolons, within fields (which can be optionally turned off in
importline.) That probably would also require adjustments and would become another can of worms I wouldn’t envy Jim for re-opening. TSV text avoids those problems.
Yes, Panorama X has a built in CSV “engine” that handles all that stuff, and it does so based on the IETF document mentioned earlier in this thread. Even though that isn’t actually a standard, it’s as close to one as we’ve got. That engine is used by the
importline statements, and the
importtext statement is used by the text import commands in the File menu (including New Database from Text File).
These statements allow a program to explicitly set the separator, or to let the engine try to automatically figure out what the separator is. When automatic mode is used, it looks for tabs, and if it finds any, it assumes tab separated, otherwise comma. That’s pretty easy. Identifying comma vs. semicolon will be quite a bit tricker. For example, this data contains both semicolons and commas, but probably semicolon is the separator. But not for sure –
7;widget;34 could be a valid cell value!
Maybe if the system setting for a decimal point is “,” I can assume semicolons for the separator. Though that isn’t ideal because what if someone in the US receives a file generated in Europe, or the reverse? I guess you could temporarily change your system settings. Right now the analyzer only looks at the first line, I suppose it could look at multiple lines and see which separator gives a consistent number of columns. That sounds hard, though, and slower.
another can of worms
That’s for sure!!
While I’m at it, I guess the CSV export also needs to be changed to use semicolon separators depending on the decimal point separator.
No, Jim, there is no need to do that, because ALL applications you would use to import a CSV file are understanding the comma separated format. Here is for example the Import Settings dialog of Numbers with a sample CSV import file from Panorama X.
If the system decimal point setting is comma, shouldn’t the export use commas as decimal points in the numbers, and then use semicolon as the field separator? In other words, shouldn’t the export be localized based on the region settings, instead of always exporting using a US centric format?
That’s certainly true, but at the moment we have a one-directional problem only, since there is no easy way to import semicolon-separated CSV files into Panorama X.
There is no problem exporting CSV data (separated with commas) and importing them into Excel, Numbers, or other database applications.