Example of Regular Expressions

admin · June 22, 2017, 1:42am

I just fixed a bug using regular expressions, and I wanted to briefly share what I did as I think it really illustrates the power of regular expressions.

The bug was that the sample checkbook data used in the tutorials didn’t import properly in regions that used non-US date and number formats. I know, ugly Americans, right?

The raw sample data looks like this:

01/01/«YY»	100	Sparkletts	Office Supplies		14.20		32,827.78
01/03/«YY»	101	Blue Cross	Payroll	Health Insurance Group Xbg-872	975.00		31,852.78
01/04/«YY»	102	Valley Gas	Utilities	Heating	49.90		31,802.88
01/05/«YY»	103	AT&T	Telecom	Long Distance Phone Service	236.24		31,566.64

(The «YY» is automatically converted to the current year, so don’t worry about that.)

There are two problems – the data format is mm/dd/yy, but many countries use dd/mm/yy. Also, the money values use periods for decimal points, and commas for thousands separator. Again, doesn’t work in all countries.

Here is the new code I added. The sample text has already been loaded into a variable named * importPreviewRawData*

importPreviewRawData = 
	replace(
		regexreplace(importPreviewRawData,
			"\b([0-9]+)\.([0-9]+)\b","$1«DECIMAL_POINT»$2",
			"\b([0-9]+)\,([0-9]+)\b","$1«THOUSANDS_SEPARATOR»$2"),
		"«DECIMAL_POINT»",info("decimalseparator"),
		"«THOUSANDS_SEPARATOR»",info("thousandsseparator"))
if info("defaultdatepattern")="DD/MM/YYYY"
	importPreviewRawData = 
		regexreplace(importPreviewRawData,
			"([0-9][0-9])/([0-9][0-9])/«YY»","$2/$1/«YY»")
endif

This code uses the regexreplace( function to convert the decimal points and thousand separators to the correct symbols, and rearrange the dates if necessary, By using regular expressions, I was able to do this in a general way, so if I later add other sample files they should work automatically.

Unfortunately I don’t really have time right now to explain the actual regular expression patterns I used, though they are really quite simple. And I’m not sure I really came up with the most elegant solution possible. But I just wanted to illustrate what sort of thing is possible. In Panorama 6, this would have been extremely difficult to do (not impossible, but very tedious). By using regular expressions, I was able to get it working in a few minutes.

michael · June 22, 2017, 3:21am

For converting between European and non-European number formats, this simple function will toggle from one to the other if the number is a string (which is the only way it will have a thousands separator):

replacemultiple(Num,",;.;str(0)","str(0);,;.",";")

michael · June 22, 2017, 3:27am

Whoops! str(0) should be chr(0)

admin · June 22, 2017, 8:24am

@michael Yes, your formula will work fine if Num contains just a number by itself. I was in a hurry, so I didn’t explain that the code I posted will only convert periods and commas it finds inside numbers. So if the sample text contained

Nevada City, CA

that comma would not be affected. Nether would the period at the end of this sentence:

This is a test.

By using a regular expression, I was able to set it up so that only periods and commas that are inside numbers, for example 32,456.78 are converted, while other periods and commas that are in the data are not affected.

michael · June 22, 2017, 8:38am

That’s very clever - RegEx is the way to go.

pirouette · December 19, 2017, 4:35pm

Hi, this is my first post ever in a forum, so I hope I’m doing anything wrong.

I only know the regular expressions ‘dialect’ of BBEdit, and I have the same problem in my data: numbers with dots instead of commas for the decimal separator.
I used Find & Replace from the Search menu, selected “Regular Expression” and used (\d)(\.)(\d) and \1,\3 as my find and replace patterns respectively. When applied, the dot is replaced by a comma, but all amounts are off by 1.30 (1,30), so 20.00 became 21.30.

Can you explain this behaviour?

dave · December 19, 2017, 6:18pm

You need to use $1 and $3 instead of \1 and \3. In your example, the 0.0 in 20.00 was being replaced by 1,3. Your find should be (\d)(\.)(\d) and your replace should be $1,$3

pirouette · December 19, 2017, 6:39pm

Dear Dave,

Thank you for your quick reply. I guess I’ll have to learn a new dialect.