Reducing number of fields

I need to reduce the number of fields in my database. The database is for a charity which I successfully imported from Panorama 6. That database worked extremely well in Panorama 6. However, after I switched to Panorama X, I got the ‘spinning colour wheel of death’ and after contacting ProVUE support back in September, I was told that I had too many fields and should redesign my database to use vastly fewer fields, either by separating it into multiple databases or combining information in multiple fields. As I am a lecturer, I have not had time to redesign my database until now.

The database contains 3,256 records (a bloated mailing list with standard mailing list information – names, addresses, phone numbers, email addresses, etc.) with 859 fields. The reason for the high number of fields is that the database, in addition to the mailing list information, contains information on donations (for a much smaller sub-set of the total number of records). The information on donations is for each month going back to 1985 with additional fields for monthly amounts, yearly number of donations, yearly totals and yearly averages (16+ fields/year). The database also contains correspondence information (received, sent, date) for a small number of the total records.

I need to import donation information each month from a csv/tsv text file which I download from a server. I would like to have one place (i.e., different forms on the mailing list/main database) where I can both enter/edit monthly financial information as well as any correspondence. To do that, I need to see the name and address of the individual/organisation making the donation or with whom there is correspondence to process. I would also like to be able to have a form with a table with the monthly financial information for the last 10 years indicated by columns for each year and rows for each month. Additionally, for each donor, I would also like to be able to see the figures for the last 12 month period (not just calendar year counts, totals and averages) for: 1) number of donations, 2) the total amount of the donations and 3) the average donation . There are also several categories of donors in the database and I would like to be able to search/select a given category and see the information (counts, totals, averages) for that specific category.

Now to my questions. I am not a developer, but I’m not a novice with Panorama either. I’m looking for a fairly easy/straightforward solution, but it doesn’t have to be ‘pretty’. How might I reduce the number of fields? Could I somehow put the donations, and other figures for a given year into an array of some kind in order to reduce the number of fields? If so, how do I then easily access the information stored in the arrays. If that won’t work, is there an easy way to break the database into multiple databases and use lookup functions to pull information from the separate databases to display on the mailing list database? I had thought of creating separate databases for each year with the financial information for that year. Would that work? How do I easily access that information while in the main (mailing list) database? Will having multiple databases (minimum of 11) open at the same time impact memory use/speed? Is there a way to simultaneously open multiple (associated) databases when you open one (main) database, so that you can perform lookup operations while in the main database? Is there a way to lookup information from multiple databases at one time, or do the lookups each need to be separate operations?

Any other suggestions you have would be very gratefully received. Thank you in advance.

1 Like

Jim has said that the apple structure used to display the data sheet is optimized for displaying lots of rows, but not for lots of columns. 859 fields might run, but would likely be painfully slow.

Having multiple databases open at once wouldn’t be a problem with memory use or speed although it might cause screen clutter if all had open windows. Having some open ‘secret’ would leave them available for lookups without using screen space.

Multiple databases can currently be automatically opened with appropriate code placed in .Initialize procedures. Whenever 10.2 lands it offers a more elegant solution to that problem.

A truly ugly, but perhaps conceptually useful ‘solution’ would to add a field, say alpha, to your 859 and fill it with a unique value for each 3256 records. Then copy to 9 new databases 3256 records respectively of alpha plus fields 1-100, alpha plus 101-200,…, alpha plus 801-859. In theory they could be coded to crudely work together without the problem of one database choking on 859 fields. 101 fields may not be optimal, but have worked well enough for me to discourage changing databases. It’s likely your data will break down into smaller logical chunks. Many fields may not need all 3256 records. That’s fine so long as you maintain unique values to tie them to the rest of the data.

Instead of separate databases for each year’s donations could you have a field that specifies from which mailing list record it came, a donation_date field in date datatype and then one generic field for each those 16+ specific year fields you’d mentioned? You could mostly transfer it over procedurally, but after 35 years you’ll likely need some data clean up to get all to fit whatever new standardized format you picked. If you still have Pan6 running and are still more comfortable working in it you could split that off therein, then open the result into PanX. 18+ fields would run fine there even with 100,000s of records. You can then select by calendar year or by whatever date range you wished. The Crosstab Workshop and Summary Workshop under Help probably can give you the data displays you’re mentioning. You, like me, may find Jim’s post on how to combine/embed multiple Crosstabs or Summaries helpful for some of your catagorizations.

If you want to compare multiple selections’ data make a configurable form to display one then CloneWindows of it to configure the next.

Multiple fields can also be combined into text arrays stored in one field. Numerical and date data can be stored therein after conversion to text. Two, or even more, separators could organize the data. Plenty of PanX statements and functions exist to access just the desired data from such. Whether it’s worth the work to setup depends on your needs and skill. Fancier setups preferably move data in and out of arrays procedurally. As extra or missing ‘separator’ characters, especially easy with LF or CR, can goof your data. Getting old, manually entered, data ready to convert to a standardized text array format may take some work. But once setup a form could lookup and display just what you want. Data you’d routinely view, change, sort on the datasheet by field probably isn’t great to combine via text arrays. Multiple data bits you normally just view or select on a form could be combined as the form’s code, and your data input code, would do the needed work and all you’d miss would be extraneous fields.

It sounds to me like you might only need two databases – a mailing list and a donation database. It sounds like you’ve designed your database as a crosstab, and that you are continuously adding more fields month after month. Instead, I would recommend having a date field in your donation database, which I think will replace hundreds of fields in your current database. If you ever do need to display the data in the form of a crosstab, you can use the Crosstab Workshop to build that on the fly for you.

If I understand correctly, your database is currently very sparse – in other words, most of the cells are empty. Even if there weren’t performance problems (and Panorama 6 definitely did not perform well with that many fields either), a sparse database like that is very awkward to work with. Once you get this shifted over to a more normal configuration you are going to be much happier. However, since it took you decades to get into this situation, it’s going to take a bit of work to get out of it.

I’m really just restating some of what @JohnB posted above. His post contains excellent advice.

John,
Thank you very much. There is a lot here to process, but you have given me several new and helpful ideas to pursue. I can no longer run Pan6, since I upgraded to Catatlina 10.15.7 and a new MacBook Pro. It is just that I’ve not taken the time to really design the database as it should be. So, there may be some work ahead, but I think it will be well worth it. When I get into this, I’m sure I’ll be back for more advice to keep me moving forward.Thanks again for the help.

John,
Thanks again for your advice. I am working on re-designing my data base as two data bases, as Jim suggested: one for mailing list information and the other for donation information with a key field for linking the two. My question is do I set up the donation data base with the key (linking) field and a date field (for the date the donation was made), plus an amount field (for the donation amount)? And then access individual or groups via cross-tab analysis? If so, then how do I get my present data into that form? It is now spread across fields which represent each month from 1985 to the present associated with a given donor (as Jim indicated, somewhat like a cross-tab, but taking hundreds of fields). So how do I move the data from that horizontal alignment to a vertical alignment based basically on the date of the donation?
Thanks for any help you might be able to offer.

To do that automatically a custom program will have to be written. No one here can really provide specific advice for doing that since we are not familiar with the exact details of the field structures in your databases. If the field names in the original database were named in a systematic way then you can probably write a short loop to run through the fields and convert the data based on the field names. But I’m guessing that the field names were assigned in a somewhat haphazard way. If that is the case, the program will need special case code for every field. It won’t have super complicated logic, but it will be tedious to write. Since you will probably need several lines of code for each field, you’ll wind up with several thousand lines of code to convert over 800 fields. You must be very meticulous in writing this because any mistake will result in lost data. When the program is finished it will probably take several hours to run. I would suggest doing testing on just a handful of records at a time. I can easily imagine it would take several full time days to carefully write all the code. But that’s not too bad considering it took decades to get into this situation. I think you’ll be really pleased once you have the refactored data, it should be much easier to work with.

I think the data transfer to a donation database can be done somewhat faster than Jim fears. You are only going to transfer this once so a single step perfect program that is reusable is overkill. A repeated easily tweaked program transfers part of the data at a time might be quicker, albeit boring and tedious.

Since you can’t run Pan6 any longer either you eventually got the 859 field file converted to a sluggish PanX file or you’d previously exported it as a .csv or .tsv file. Either could be used, but the former is more straightforward. I’ve not tried 859 fields but I work with a 330 field one. There’s a massive monthly updated federal db, 330 fields and a few million records from which I need 23 fields for 2 states’ data. It’s distributed as a 6GB .csv file zipped to under 1GB. I use a shellscript to split the monster into 50k line pieces. I import a piece into my 330 field PanX db then select my states therein. Then use importdatabase into my 23 field PanX db with append, selected and matchingfields options. Loop for each piece. Works well so long as the correct field names match. I get through the whole thing in about 2 hours and end up with 300k 23 field records. I’d suggest a variation on that for you.

First, a little preparation. Add and fill a unique key field to your 859 db. Unless you’ve been compulsively precise naming all your monthly fields you are going to want a listing of their names. Run this one line program:
filesave info("desktopfolder")+"fieldnames.txt",info("fields")

It will be much easier to lookup and to work with your field names from that text file than from the slow db. Make an empty donation db with one field named identical to your key field in the leftmost position. Make a date field. You didn’t say whether your monthly data includes specific dates or a generic month date is implied from the field. If the latter you’ll supply dates via emptyfill in the date field after importing that month’s data. Make enough additional fields in appropriate data types to cover any needed month fields.

Because changing the full db is sluggish we’ll do the work, and write the procedure within within the donation db. Open the full db and select any records containing data in a given month. Change the appropriate field names in the donation db to match that month’s names in the full db. Then importdatabase the full db into the donation db with selected, append and matchingfields options. Repeat or each month. Once every month is imported change the field names to whatever you want for the future.

You can semi-automate this using copy/paste from your fieldname list, approximately like this:

let FN="paste ordered cr separated list of field names here"
window fulldb
select ?(array(FN,1,cr()),false(),not(emptycell(array(FN,1,cr()))) or ?(array(FN,2,cr()),false(),not(emptycell(array(FN,2,cr()))) or ?(array(F,3,cr()),false(),not(emptycell(array(FN,3,cr())))
;above selects records in which any of the field names in FN aren't empty
window donation db
field 2
fieldname ?(array(FN,1,cr())="","NOPE2",array(FN,1,cr()))
field 3
fieldname ?(array(FN,2,cr())="","NOPE3",array(FN,1,cr()))
field 4
fieldname ?(array(FN,3,cr())="","NOPE4",array(FN,1,cr()))
importdatabase fulldb,"existingdata","append","matchingfields","yes","selected","yes"

In principal all you’d have to do is paste the next month’s fieldname list into the let statement and rest would work for that month. Now it could use tweaks for different number of potential field names, may need to chevronquote() some or all of the fieldnames.

Once you’re sure everything transferred properly select the list of all the fields the full db no longer needs and use looparray to go to each and delete them in order.

If instead you’re starting with .csv or .tsv files you’ll need to prefix each line of the original with a sequence number and use that as your initial key field. Then use importtext with an appropriate rearrange to fill all your mailing db fields. Then use importtext with a rearrange including appropriate importcell() statements to append one month’s fields and stamp the date field, with one month per pass. You’ll end up with (number of original records)*(number of months) records. Then select and remove all the empty records. Getting the numbers straight to plug into the importcell() statements is the hard part.

Initially I’d just put the monthly data in the donation db and leave all the donor specific information in the mail db. Once set up it will be simple to move or copy anything you want in either direction via key field lookups. If you’ll be using something as a cross tab criteria it may make sense to have fields for that in both dbs eventually.

John,
Thanks once again for hanging in there with me. I am evidently not as knowledgable about PanX as I thought. I really do appreciate your helpful advice. It is helping me reshape my thinking regarding how to structure things.
Here’s how my 859 db is structured. For each record (name for which I have a unique number field for linking purposes) in my 859 db, I have 419 fields from 10/85 … 8/20 (where the field names are 10/85, etc.) strung out after all the fields containing contact information. Naturally, some names on the 859 db have not made any donation. Additionally, those who have made donations, do not always do so regularly. So, there are a lot of months with nothing recorded in the db – lots of empty space, which Jim says makes things difficult.
o How could I construct a loop which would move from the 10/85 field through all 419 fields and select all those records which are not empty? Once I had that, I could remove the unselected records and save the result as a new db which would contain only those records which actually contain donations which need to be re-formatted to a new db. But I am still not sure how to get those figures (which are numbers in fields whose names are not PanX dates, but imply the month and year), into a db, where each donation is a new record, identified by the unique key field and a date. If I could get to that point, I could use cross-tabs to come up with the other things I need (I believe).
Thanks again, in advance for any and all help you might be able to offer.

The monthly fields contain numbers, the amount of that month’s donation? Do some cells actually show zero and if so do you want to keep them or throw them out with cells showing empty?

You have one field for each of 419 months, 10/85 to 8/20. Are all named like that: two digits for every year, no leading zeros for the months, all slashes rather than hyphens and no spaces in their names? If so pulling out all your non-empty donation cells from this should be easy and quick and you’ve earned a prize for being unnaturally consistent over 35 years!

Make a new donor db in same folder as the 859 db. Give it three fields. The first with the identical name and datatype as your unique number field in the 859 db. The second named “Date” with a date datatype and an output pattern of “mm/dd”. The third named “Amount” with a floating point datatype and an output pattern of “#,.##”. Paste the following procedure into it:

let donor="name of your donor db"
let big="name of your 859 db"
let first=date("10/1/1985")
for i,0,418
    window big
    ;if you want keep cells showing zero uncomment the first select, if not uncomment the second
    ;Select not(emptycell(chevronquoted(datepattern(monthmath(first,i),"mm/yy"))))
    ;Select not(emptycell(chevronquoted(datepattern(monthmath(first,i),"mm/yy"))) or chevronquoted(datepattern(monthmath(first,i),"mm/yy"))=0)
    ifselect
    else
        window donor
        Field 3
        fieldname datepattern(monthmath(first,i),"mm/yy")
        importdatabase big,"existingdata","append","matchingfields","yes","selected","yes"
        if i=0
            select not(emptycell(unique))
            removeunselected
        endif
        Field Date
        emptyfill monthmath(first,i)
    endif
endloop
Field 3
fieldname "Amount"
Save

You’ll need to fill in the actual names of the 859 db and the donor db and uncomment which of the two select option you prefer. Then click run and let it go. I could be made to run faster with show and noshow, but then you wouldn’t see the progress as simply. For a one off it should be fast enough.

I haven’t tested it. The one potential problem, other than non-standardly named month fields, is that providing field names to functions via formulas can be tricky. I’m guessing this will work, if not it will fail fast! If you have non-standard field names it should run up to there then stop with an error. Standardize the offending field name, change let first=date("10/1/1985") to match it and rerun.

OK, I checked and the paired, commented out select statements need to be replaced with execute statements to work. The other place I’d used calculated field names seems to work. So instead of my code above try this:

let donor="name of your donor db"
let big="name of your 859 db"
let first=date("10/1/1985")
for i,0,418
    window big
    ;if you want keep cells showing zero uncomment the first execute, if not uncomment the second
;execute |||Select not(emptycell(«|||+datepattern(monthmath(first,i),"mm/yy")+|||»))||
;execute |||Select not(emptycell(«|||+datepattern(monthmath(first,i),"mm/yy")+|||») or «|||+datepattern(monthmath(first,i),"mm/yy")+|||»=0)|||

    ifselect
    else
        window donor
        Field 3
        fieldname datepattern(monthmath(first,i),"mm/yy")
        importdatabase big,"existingdata","append","matchingfields","yes","selected","yes"
        if i=0
            select not(emptycell(unique))
            removeunselected
        endif
        Field Date
        emptyfill monthmath(first,i)
    endif
endloop
Field 3
fieldname "Amount"
Save

Be certain to have backup copies of the 859 db before trying this! If it completes check as much as needed to reassure yourself all the donations transferred ok. Once confident the next step is to delete the 419 fields from the 859 db. That can also be looped. Just how ‘contact information’ is left needing 440 fields you haven’t said, but at least the width problem would be halved then.

John,
Once again, thank you for taking all this time to help me get things sorted. It’s 4:12 here in Scotland and already dark, but I’ve still got work to do. I’m an American working here in the UK.
I had been able to get my big db down to ‘only’ 521 fields by eliminating a lot of dead wood, but, hopefully, with your help, I can eliminate 419 more fields. I have renamed my big db, big, just for simplicity sake in using your code. The donor db is donor again for simplicity. I can always rename them later, if I so choose.
I have a couple of questions regarding the code you sent me. I understand most of it, but there are still a few things I’m uncertain about.

  1. What do you mean by “for i,0,418”? Am I to replace the ‘i’ with either a zero or 418? If so, where do I put what, since there are several 'i’s in your code?
  2. You have an endloop statement, but I don’t see a loop statement. Is that OK?
  3. How will the proper donation be placed with the proper donor (the unique field)? If the donation doesn’t match up with the right donor, it is all for nought.
  4. Do I place your code in the ‘code’ box in the field properties for the “Amount” field? Will it run automatically, or do I need to do something for it to run?
    Again, thanks for all the help! I really think this will work and be much more efficient.

“For” is a kind of loop, new to Panorama in 10.1, but ancient in other computer languages. So the endloop is for it. Look it up in Help. ‘i’ is the variable that holds its counter and is used as such in several places. It could be called anything but Help used “i” and I kept that. So the first time through the loop it adds i=0 months to the “first” variable’s staring date and adds one more month until in the last pass you are 418 months later.

By the magic of the “matchingfields” option of importdatabase. Code has the loop rename what will finally be named the “Amount” field in Donors sequentially to the name of your monthly fields and having the first field in Donors named identically to the unique field in the 859 db (and not having a field named Date in 859 db). So each time you import only data in those two fields with matching names get imported. Because importdatabase is using the “append” option for “existingdata” you keep adding months of data to it, just combined in one field. Each imported record carries its unique field data along.

Put this code in a new procedure window. You can use the Run button there to use it and once transferred use the Remove button to delete it. This is separate from whatever code you’ll write so the individual databases work together properly, along with any forms you design. This is just get your old data into a more serviceable form. Once you’re confident that data transferred ok I can give you a simpler loop to delete the then no longer needed 419 fields from the 859 db. Which should make whatever further you want to improve the organization of the mailing list etc left there more responsive. Check out Help on the Morph field dialog for some ideas on that. Once you’ve figured how you want data organized in your new PanX databases and gotten your legacy data into that form then you can figure out how to simplify and automate future actions.

John,
That was exactly what I needed to know. As it is ‘tea time’ now, I will run it tomorrow and let you know what happened. I am getting excited about the new possibilities and hope to explore PanX a wee bit more with regard to my bibliographic db which I use far more, once I get this donor db sorted. Thanks again.

If Will can get his hands on the Panorama 10.2 early release he will be able to link the files much more easily using the new relational tools that it provides. Makes using a boatload of lookups history. :partying_face:

2 posts were merged into an existing topic: BIG NEWS – Panorama X 10.2 Classes & Preview starts in January!

Downside of jumping to 10.2 is its relational tools are new concepts to some, including me. Feels to me that Will could use more time to absorb how PanX does things that are conceptionally similar in his familiar Pan6, before moving onto new concepts.

I’ve got a boatload of lookups, but I also have a boatload of experience working with them, knitting complex long procedures across numerous databases. Which currently work! Just not as quickly as I could wish. I have no experience working with the kind of databases who’s users have historically sneered down on Panorama as “not-relational.”

Jumping too fast into the new risks breaking the old. The logical solution is the coming course. Amongst his other virtues Jim is a good teacher. I’ll probably watch the videos after they’re recorded. The prior videos worked well for me. Got my recoded PanX conversions to run about as fast as my kludgy prior Pan6 programs! Then try them in small ways where they can’t hurt much, until I master their concepts. Then gradually blend them into my old code as time permits. Then probably dream up some now features to add.

John,
On my first attempt to run the code you had sent me, I realised that I needed to change my Language and region system preferences, since the way dates are configured here is different than in the US (here it is dd/mm/yy). When I tried again, I got the following error message: “Field or Variable [else] does not exist.” That is referring to the following lines in your code:
execute |||Select not(emptycell(«|||+datepattern(monthmath(first,i),“mm/yy”)+|||») or «|||+datepattern(monthmath(first,i),“mm/yy”)+|||»=0)|||

ifselect
else
    window donor
    Field 3

How do I get your code to work?
Thanks.

I haven’t been following this closely, but the lines you referred to looked strange to me, so I checked ifselect in help. “ifselect” starts a block of code that is terminated by either else or endif. There’s no code here between ifselect and else, so it seems as if Pan is trying to interpret else as part of that block of code. Shouldn’t it be

Ifselect

…[some instructions]

else

window donor

Field 3

Etc.

Yes, I’d goofed with the ifselect. I dream I can code the bullseye, but am satisfied if I can find it incrementally from what I first write. To do what I’d intended change it to:
if info("empty")

Keep the else and the rest.

Your region preferences shouldn’t affect this. Dates are stored as the same integers in all regions. Those preferences just affect their default format as text. In this that’s all pre-empted by the datepattern() functions

John,
You are way beyond me.
I changed the code as you instructed. However, I then received the following error message:
sizeof( function error: no field or variable named “1/85”.
What should I do now? We’re getting there, slowly but surely and I certainly very much appreciate your patience and help with this.