Numeric Commas in .csv file And how to deal with them


#1

I receive a couple of files daily in .csv format. I have a procedure to open them with file load, do the parsing I need and import them into the working DB.
The problem is when they send files where one of the fields contains a number over 1000, they include a comma in the number, which breaks the import procedure.

This must be a common problem, so wondering the best way to deal with it. Is there a Pan setting to import everything as tab delimited. Or a grep-like way of checking and deleting the offending comma?


#2

There’s an easy way outside of Panorama: Open the .csv file in Excel or Numbers and format that column without a thousands separator. Save the file again as a .csv file. — In Excel you have the choice to save the file as a tab separated file.


#3

Going outside of Panorama is what I’m trying to avoid. Right now it’s Click a button, choose the file. 99% of the files Do Not have an offending field, so it would be a drag to have to open and inspect each first. If worse comes to worst, I can probably throw in a quick Applescript sub procedure, but I’m betting there is a way for Pan to do this.


#4

That’s why I generally prefer tab separated files when I transfer my data from Excel to Panorama, and I, too, like it to import data with one step only.
But that depends on the quality of the data. In a well-formed comma-separated file, values containing commas should be quoted, and I would not expect those cells to break the import into Panorama.
I do not know your import procedure, but for sure you can use importusing and a formula that deletes those commas inside cells during the import.


#5

What program is exporting csv data with commas in the data? That’s nutty.

However, if it is a properly formatted csv file, then any data with commas in it will be surrounded with quotes. For example:

Widget,"1,200.00"
Thingy,"4,500.00"

If Panorama X sees quotes like that, it will correctly import the data (Panorama 6 will not). If there are no quotes, I don’t think any program will correctly parse the data. How would any program know whether that was one number or two? (By the way, Panorama X will also export csv data with quotes if necessary.)

If there are no quotes, you would have to write a custom procedure to parse the data yourself. Panorama can certainly do that, but the trick is how to do the parsing. What you would need to do is load the text into a variable, then use a loop to process it line by line. If the parsing is simple, you might be able to do it with arrayfilter instead of a loop, which would be a lot faster.


#6

It’s from our credit card provider. I don’t know what they are using. I guess I need to talk to them about their formatting, though I’m surprised it hasn’t come up before.
Thanks
Martin McCaffery
martinmc@knology.net


#7

Look at the file in a text editor. It should show how the file is formatted.

TextWrangler has solved a lot of problems for me.


#8

Yup, did that yesterday. Turns out some of the cells are formatted $#### and some are $#,###. I was using array filter and array(import(),#,",” to parse the file. Worked fine as long as there wasn’t a comma in the number.
Going to explore importing directly in to a panorama file and correcting within there instead of an array.
Martin

Martin McCaffery
martinmc@knology.net


#9

Then you should contact your provider and tell them that their file is not formatted correctly. This is going to be a problem for anyone using the file.

It could be that the file is tsv, rather than csv. Did you check that the separators are actually commas?


#10

I still do not know if that .csv file is well-formed and the values containing commas are quoted or not. Martin did not say they are not quoted. So I would not blame the provider.
I think the problem is in Martin’s import procedure. If his procedure loads the .csv file into a text array, maybe the quotes are already eliminated, and then he cannot discern values from field delimiters.
As I said before, I would recommend to use the importusing statement. In the formula of this statement you can modify the imported “cells” on the fly.


#11

Kurt:
You got it correct. Am looking at importusing to load the db instead of an array.
Martin

Martin McCaffery
martinmc@knology.net


#12

There is an easy way to find out if the file is formed correctly or not. Drop it onto a spreadsheet or two and see if it comes out correctly.

If it does, then the problem is probably the delimiters. Panorama can handle that. Otherwise, it is a problem with the file.