Importing of Excel .xlsx file as New File from Text File

I have a .xlsx file that I wanted to import into Pano X. When I found that the Open… command would not recognize the file, I then tried the New File from Text File option. Perhaps this was an overly imaginative bit of optimism that I thought might work but it did recognize the .xlsx file but alas it opened it with garbage. Either Pano X should open this type of file, recognizing the actual data format and doing a proper import, or it should ignore this type of file from the New File from Text File dialog choice options. Preferably recognize what it is and just deal with it as needed to give the user what we know they want.

I have recently saved Excel files as csv files and opened those with PanX.

I don’t think importing excel files directly into Pan 6 was an option either. I always had to first save as a csv file and then I could drag it into a Pan 6 db and then the Wizard would open and I would choose which columns to import, adding the data or making a new db. So I wouldn’t expect anything different at this time in Pan X. It would be nice if down the road one could do that, but I am guessing that is a lot of coding.

A .xlsx file is a quite complex package of .xml files in a folder structure. So I agree: It should be greyed out in the New File from Text dialog. But in fact, it can be opened with text editor apps, too. When those text editor applications are able to open .xlsx files, then I don’t think Panorama X can do anything to exclude .xlsx files from the Text File Open … dialog.

You can open a .xlsx file in TextWrangler and have a look at its structure. But for sure, this is not a suitable approach to transfer data from Excel to Panorama.

You should use CSV or TSV text files for the data transfer. Excel offers both formats in its Save as … dialog. — Or you copy the data in Excel and use the Import from Clipboard option in the Panorama X Text Import wizard.

In my view there are two issues here.

The first issue is whether or not there is any reliable way to determine that a file is a valid text file, and there isn’t. The only option would be to only allow a limited list of extensions : .txt, .csv, .tsv, .html, .css, etc. But what is that list? There could be hundreds or thousands of possible valid extensions for text file, and any list is sure to have missing items. I think it is better to just show all files. After all, the consequences are no big deal, you just get a database full of garbage. But if Panorama refuses to open a file that does contain valid text because it doesn’t use an extension on the list, I think that is worse.

You might then suggest that there be a list of excluded extensions. First of all, again, how would you come up with that list? But more importantly, the open file dialog is built using Apple’s code, and that only allows an inclusive list, not an exclusive one.

The second issue is Excel files specifically. It would be absolutelly awesome if Panorama could import and export Excel files directly, and that is definitely on the to-do list. But it is not a trivial project, and I think other projects have higher priority. I do have two libraries on my radar for possibly accomplishing this, one commercial, and one open source. Each appear to have plusses and minuses. So in the long run (at a minimum, after the server version), this will definitely happen. In the meantime, as has been suggested, either tab or comma separated text files have to be used as an intermediary.

I was aware of the typical method of having Excel export to a .csv but I was surprised to see Panorama allow just about anything in the ‘New Database from Text File…’ option with .xlsx files to choose from. It seemed like Pano X was setting the user up for failure. I try to not let the user choose something that I know will not work. It just isn’t nice to do.

Apple’s method for this kind of thing is shown in the following graphic.

The above graphic is from the ‘Choose an application to open the document’ dialog when choosing an app to assign to a file for opening of the file.

Apple initially displays a list of typical or recommended apps. If the user is not happy with the selection, they can choose the ‘All Applications’ pop down choice and then all apps will be enabled.

It was my thought that Pano X could initially enable the .txt, .csv, doc, or whatever list is typical and would likely result in success but also allow the brave souls to instead choose the greater list that would be enabled by use of the pop down for full access to all files.

Robert Ameeti

I think that is a good thought in a world where there are unlimited resources available for developing Panorama. We’re not in that world, and I think implementing this idea would be a lot more work than you imagine. First of all, just compiling the list of “typical” text files would be quite time consuming. It would be a huge list. There might also be performance problems in using such a large list – wouldn’t know that until it was actually implemented.

Secondly, adding extra widgets on an open or save panel is non-trivial. Doable, of course, but would take significant time.

Since I agree that this would be an improvement, I have entered it as a proposal for future consideration when development resources are less constrained (hey, I like that phrase, sorry, but I’m think I’ll start using it regularly!). The pop-up widget part would be implemented as a change to the choosefiledialog statement, so it would be available to any database author, not just ProVUE internally (the New>New Database From Text File command is actually implemented as a short procedure that uses choosefiledialog).

I knew you were aware of this but I tacked that on to acknowledge the earlier comments by others, and for the benefit of future readers that might not know about it.

At this point there is a triage process, because my development time is a zero sum game. Work on one area means some other area is not worked on. But it’s certainly good to file ideas like this for the future – Bitbucket has been a big help to me in keeping that reasonably organized and ensuring that just because something is dropped for now, it is not dropped forever. There have already been instances of items being implemented after sitting on the list for a year or more.

Out of curiosity about how difficult this would be, I looked up the .xlsx specifications, and found a pdf file of 290 pages of them from Microsoft. So I suspect that learning what is necessary to convert Excel to Panorama directly would be about as difficult a job as learning everything there is to know about Panorama. On that basis, I think we can wait.

Of course, if there are any Excel mavens reading this who are already familiar with the .xlsx specifications, it would probably be easy enough to write something that could be shared with us.

I’ve always used a Panorama procedure to identify all Excel files in a folder, then to open a very elementary VBA macro which saves the files as text which the Pan procedure then imports.

VBA has been replaced by something else but it still runs in MS Office 2011.

I’ve written lots of very sophisticated systems which import Excel files, process the data and export it as various fully-formatted Excel files - all at the click of a button. This is very useful for clients who are dedicated Excel users - and there are still a lot of them.

I’ve never used .csv files but text files work well so I don’t need to know.

VBA is still there in Office 2016 – with a less than perfect interface, but working. The only version of Office that came without Visual Basic was Office 2008. That was the reason I skipped that version and remained at Office 2004 for a long time until Office 2011 was released.

But Michael, I wonder about your “fully-formatted” Excel files: If you exported the data from Excel as text, processed the data and exported the data then back in Excel format, you have lost all Excel formulas and functions on the way and saved the values only. So the data will look nice in Excel, but it is “pretty text” only, not a working Excel file with formulas and functions. But O.K., maybe Excel was used here more as a database.

Actually, .csv files are text files. CSV stands for “comma separated values”, because within the text each value is separated from the next by a comma. That’s all there is too it.

I would never try to figure those specs out and write the code from scratch. I’ll have to either purchase a library or perhaps be able to incorporate an open source library. However, even when using someone else’s code, there’s still a lot of work in integrating it into my code, plus the time of researching into which pre-built library to choose.

That’s exactly what I was doing Kurt. The major client was the CSIRO (look it up in Wikiedia). They had subsidiary clients who gathered lots of data in a somewhat chaotic manner. My procedures processed the data and furnished the subsidiary clients and the federal government with reports that merged all of the inputs into Excel reports. None of the clients had any formulas or functions - just data, which is what I thought Robert was asking about.

In fact, I’m willing to bet that it’s possible to export formulas and functions as well if you want. VBA is as clunky as all get out but it’s surprisingly powerful.

I was going to leave this alone as I had had my chance to make a point but then you guys are pulling me back in. :slight_smile:

2 issues now…

  1. Most of the Mac world does in fact use Excel as their database. From little people doing a mailing list to large corporations collecting data. (And it often is in .xlsx despite my dislike of that format.) If Pano X has any real desire to grab new customers from that very large bulk of users, it would be very smart to be able to allow them to just have their files already full of data to be sucked in. There are tons of those people. Many, many times more of them than current Pano users. Don’t make them do anything to move from their spreadsheet to a real database. Make is slippery easy. That is how you get new users.

  2. As it pertains to that extremely large number of files that would import as they are actually text files, leave that very long tail alone. Ignore the .html, .css, and all those other dozens of files. 99% of the time, those wouldn’t be what the user really wants and would result in failure. My revised suggestion is to just offer perhaps .csv, txt, doc, xml, xmls, and leave the rest alone, only to be selectable by the expert who chooses the alternate choice from the pop down. Satisfy the masses. Even in this thread we are hearing about all the Pano users who have to do the twist without a full understanding of the files that we are trying to get our data from.

Robert Ameeti

I agree with your first point absolutely Robert but I don’t believe that Panorama is ever going to be able to import an .xls or .xlsx file directly and I see no way for Panorama to convert it to text.

I’m not sure what your second point is saying.

Michael, Apparently you’ve not had enough experience with the magic that Pano can achieve. I’ve always enjoyed the ability to make computers dance and sing with Panorama. While it may not be wise to hold your breath, do know that this capability is in our future. I can pretty much guarantee it.

Robert Ameeti

Actually, I hope that direct access of .xls and/or .xlsx files will happen eventually. I guess I wasn’t clear, but that is what my earlier post meant.

Here are two libraries that are on my radar for this:

http://openpyxl.readthedocs.org/en/default/

If anyone knows of any other libraries that could help with this, please let me know.