Server and Clients out of sync

Thunderstorms in the area on Friday caused a power outage at the building where the beta testing is underway, along with a loss of internet connection. The details, such as the timing and duration of the outage and the status of the server during those events, are impossible to obtain, but I was told that the PanX server lost power. (There is supposed to be a UPS preventing this, but I think it had not been configured properly.) Any number of conditions are possible.
After the outage was over and power and internet had been restored, I was alerted to the problem when I was told that one user could not access a record that had been added by another. Indeed, that was the case: there were records added by one user that were on that person’s computer but not on the server.
Interestingly, the person who added the record did not know about the problem and was not alerted to it. But later, when I tried to edit one of those records on her computer, I got a message saying it was not on the server. I was able to resolve some of these problems by copying and pasting the record, so that the new record was on the server and the old, non-synced record could be deleted from the user’s computer.
I am curious about how this can happen. I am sure that when a record was added by the user it was simultaneously added to the server, but it was apparently lost from the server. I suspect that there is some period of time when a record has been added to the server but has not yet been saved. If a loss of power occurs during that period, then the record could be on the user’s computer but not on the server, which is what we observed. When does the server save databases after a change has been made to a record?
BTW, there were at least five records like this, involving two different users, all created around the same time, within a half hour or so. One user was on the same subnet as the server and one was connected via VPN to the server’s network, and that connection was probably lost during or after the storm. I am still waiting to find out if there are more reports of lost records (today is the first work day since I learned of the problem).

By default, the server auto-saves changes after one second. This can be customized in the Preferences:Server panel; you can make the delay longer for better performance, at the risk of possibly losing data.
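
To illustrate the general idea, here is a minimal sketch in Python of how a delayed auto-save typically works (just the pattern, not Panorama’s actual code): each change restarts a countdown, and the database is written to disk once the countdown expires, so a longer delay means fewer disk writes but a larger window of unsaved changes.

```python
import threading

class DelayedSaver:
    """Hypothetical sketch of a delayed auto-save (not Panorama's actual code).

    A longer delay means fewer disk writes (better performance) but a larger
    window of unsaved changes if the process loses power or crashes.
    """

    def __init__(self, save_fn, delay_seconds=1.0):
        self.save_fn = save_fn      # callable that writes the database to disk
        self.delay = delay_seconds
        self._timer = None
        self._lock = threading.Lock()

    def record_changed(self):
        # Restart the countdown on every change, so a burst of edits
        # results in a single save once things quiet down.
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self.delay, self.save_fn)
            self._timer.start()
```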

Also, unless you uncheck the option, the server automatically saves all changes in a transaction journal. This happens immediately, with no delay. The journal holds all changes made since the last auto-save. When an auto-save happens, the journal is cleared. When the server launches and opens a database, it checks to see if there is a journal, indicating that changes were made after the last auto-save. If so, it automatically applies these changes.
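
Again as a rough sketch rather than Panorama’s actual implementation, the journal is essentially a write-ahead log: every change is appended to it immediately, the journal is cleared after a successful full save, and any leftover entries are replayed the next time the database is opened.

```python
import json
import os

class JournaledDatabase:
    """Hypothetical write-ahead journal sketch (not Panorama's actual code)."""

    def __init__(self, db_path, journal_path):
        self.db_path = db_path
        self.journal_path = journal_path
        self.records = self._load()

    def _load(self):
        records = {}
        if os.path.exists(self.db_path):
            with open(self.db_path) as f:
                records = json.load(f)
        # A leftover journal means the process stopped before the last save:
        # replay those changes on top of the saved data.
        if os.path.exists(self.journal_path):
            with open(self.journal_path) as f:
                for line in f:
                    entry = json.loads(line)
                    records[entry["id"]] = entry["data"]
        return records

    def apply_change(self, record_id, data):
        # Journal first, immediately and with no delay, then update memory.
        with open(self.journal_path, "a") as f:
            f.write(json.dumps({"id": record_id, "data": data}) + "\n")
        self.records[record_id] = data

    def save(self):
        # Full save of the in-memory data; the journal can then be cleared.
        with open(self.db_path, "w") as f:
            json.dump(self.records, f)
        if os.path.exists(self.journal_path):
            os.remove(self.journal_path)
```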

So, quite a bit of effort was made to guard against data loss. However, this only works for data that has actually been sent to the server while it was still running. Reviewing the client code, it appears that I didn’t do enough to ensure that the client handles server errors properly. Even if the server is down, the client can apparently still add a record and enter data, possibly with no error message appearing. So I suspect that is what happened on Friday: these records were never added to the server because the server was already down before they were created.

I have added a Bitbucket issue for this and will investigate this further in the near future.

Given your explanation, I don’t know how the situation that I found arose. The records that were ‘missing’ from the server would have been created one at a time over a period of at least several minutes, so they should have been saved, except for possibly some small amount of information. But given the lack of information about the sequence of events in the loss of power and internet, I don’t think there is any way to figure out what may have happened.
In any event, hopefully they fixed their problem with their UPS and we won’t see this situation again.
I see on the Server Preferences that there is a checkbox to ‘Save transaction journal’. What is the effect of checking the box?

No, I think that is still possible, because it appears to me that the client is not always reporting the errors it may get when communicating with the server. So users could merrily go on adding records and typing in data, with no clue that none of it is getting saved to the server. That’s what the Bitbucket issue is about – the client needs to report these errors immediately so that the user takes steps to rectify the situation, or at the very least stops entering data.

A working UPS isn’t a complete solution to this – you could still have a network problem, crashed server, etc. The fix is that the user needs to be alerted about these errors immediately.
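
For illustration only (the endpoint and function names below are made up, not Panorama’s client API), the fix amounts to making sure that every failed round trip to the server is reported to the user right away instead of being silently swallowed:

```python
import urllib.request
import urllib.error

# Hypothetical endpoint -- not a real Panorama X server URL.
SERVER_URL = "http://panx-server.example.com/api/add_record"

def add_record_on_server(payload: bytes, alert_user) -> bool:
    """Send a new record to the server and surface any failure immediately.

    payload is assumed to be an already-encoded record (e.g. JSON bytes);
    alert_user is a callable that shows a message to the user.
    """
    try:
        request = urllib.request.Request(SERVER_URL, data=payload, method="POST")
        with urllib.request.urlopen(request, timeout=5) as response:
            if response.status != 200:
                alert_user(f"Server rejected the new record (HTTP {response.status}).")
                return False
    except (urllib.error.URLError, OSError) as exc:
        # The record exists locally but was NOT saved on the server --
        # the user needs to know this right away.
        alert_user(f"Could not reach the server; the record was not saved there: {exc}")
        return False
    return True
```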

I mentioned that in my previous post. Please re-read it; it describes what the journal is.

It’s also described on the Advanced Server Settings help page.

The concept of a server transaction journal is also discussed in this video, though you would have to wade through a lot of other stuff to get to it. But you might find it entertaining.

Not sure that link is working; if not, try this: https://www.vimeo.com/368152303

Interesting: that embedded video is showing the same problem that occurs in the Panorama Video Training window. So I think this is a Vimeo problem; I’ll need to contact them.

Hi Jim,

I watched the video. Very interesting, and fortunately at a level that I could understand.

I have some questions, in case you have the time to respond:

Occasionally we have seen discussion on the forum of file corruption. How does this occur, and how often? I.e., errors in creating the transaction journal? Errors in saving the file to disk? Errors in reloading the transaction journal?

If corruption arises in the saved file, is there a way to isolate, repair, or remove it so it does not lead to persistent problems?

I know that there are different types of RAM (I mean the physical RAM hardware itself); are there important differences in RAM that lead one to choose one type over another for Panorama? Some are significantly more expensive, but maybe they are worth it. Are certain types of RAM typically used with servers because of certain features?

Smith Duggan is using a Mac Mini as the server. Are there more reliable choices because of differences in RAM, or other reasons to choose one computer over another?

BTW, I had one file (one of my very first Pan6 files, from the 1990s) that I thought was corrupt in some areas after importing it to PanX. If you looked at the data sheet and scrolled down, when it reached a certain segment of records, PanX would usually crash. I don’t know exactly how that problem was eventually solved; I tried various things to stop the crashing, and eventually something worked.

I don’t really know what discussion you are referring to. Whatever it was, it probably wasn’t server related, so the transaction journal would not be involved (there is no transaction journal for single user databases).

In the first decade of Panorama I would occasionally get corrupted files from users that couldn’t be recovered. Maybe a half dozen over a decade? I don’t think I’ve seen any since the turn of the century. In some cases it appeared to be user error; in other cases the cause was mysterious.

As for the general question of how corruption occurs, there’s no fixed answer. It could be a hardware issue like a problem with a RAM cell, a magnetic anomaly on a disk (spinning disks aren’t used much any more), or a problem with an SSD memory cell. Disk hardware uses CRC checksums to detect errors and correct small ones, which minimizes this problem, but it can’t be completely eliminated.
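
As a small illustration of the principle (the real checks happen inside the drive hardware, not in application code), a checksum such as a CRC catches even a single flipped bit when the data is read back:

```python
import zlib

data = b"a block of data written to disk"
stored_crc = zlib.crc32(data)            # checksum recorded when the data was written

# Later the block is read back, but with a single bit flipped.
corrupted = bytearray(data)
corrupted[5] ^= 0x01
if zlib.crc32(bytes(corrupted)) != stored_crc:
    print("CRC mismatch: what was read back is not what was written")
```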

Most computers don’t have any error correction/detection for RAM. However, this is possible; it is called ECC. The only Mac computer on which ECC is available is the Mac Pro. I think this sucks, because ECC RAM is only slightly more expensive than regular RAM, but it tends to be available only on high-end machines and computers built for servers (which Apple does not offer).

So, unless you are willing to spring for a Mac Pro, a Mac Mini has RAM just as good as any other Mac computer. A Mac Mini is what is used for provue.com.

Of course theoretically a software bug could cause corruption also. I don’t know of any such bug in Panorama X, but it’s possible.

I’ve seen studies that indicate that cosmic rays can easily trigger bit flips in RAM. ECC should generally take care of that, but almost no one seems to care.

If you google “bit rot” you’ll also find studies that indicate that what’s written to disk isn’t necessarily what gets read back from disk at a later date. No one seems to understand why this happens. Again, concern about this seems to be strictly academic – hand wringing at conferences etc. The ZFS file system is capable of at least detecting such errors, but it never caught on. It seems like when Apple created their shiny new APFS file system that would have been a great opportunity to add this capability, but Apple engineers apparently didn’t agree.
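
As a sketch of the idea behind ZFS-style verification, done here at the application level with SHA-256 since APFS doesn’t provide it, you record a checksum at write time and compare it at read time:

```python
import hashlib

def write_with_checksum(path: str, data: bytes) -> None:
    # Store the data plus a SHA-256 digest recorded at write time.
    with open(path, "wb") as f:
        f.write(data)
    with open(path + ".sha256", "w") as f:
        f.write(hashlib.sha256(data).hexdigest())

def read_and_verify(path: str) -> bytes:
    # Re-hash the data on read and compare with the recorded digest.
    with open(path, "rb") as f:
        data = f.read()
    with open(path + ".sha256") as f:
        expected = f.read().strip()
    if hashlib.sha256(data).hexdigest() != expected:
        raise IOError(f"{path}: contents no longer match the checksum recorded at write time")
    return data
```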

Corruption has been raised 116 times in this forum before this post. Whether any of those actually did involve corrupt files, I do not know. I was asking a more theoretical question about data integrity, not specifically related to servers or even Panorama. In your talk about in-memory databases, including the use of journals, I wondered whether there were data integrity issues associated with the various strategies.

I am going to pass on the Mac Pro.

BTW, I do know that cosmic rays are essential for starting high-intensity discharge lamps. The rays dislodge electrons in the presence of an electric field, which initiates the arcing in such lamps.

Really? On the Discourse forum? Wow. Anyway, when I said I didn’t know which discussion you were referring to, I meant because there have been so many. Though I think with Panorama X, many of these issues referred not to data, but to forms. Also, I think many people who bring up corruption don’t know what they are talking about. If something happens that they don’t understand, it must be “corruption.” Which isn’t to say corruption never happens; just a lot more smoke than fire, I think.

Not really.

In regard to journals, they wouldn’t help with corruption, unless the corruption caused a crash before the data was saved.

They are pretty sweet, but way too rich for my blood.

Really? That’s very cool. Never heard that before.

Did you know that the word “limelight” originated in the 1840s, when they used to use burning piles of lime in front of the stage for lighting? Apparently lime burns quite brightly and fairly white. I learned that many years ago when reading a book called “The Physics and Chemistry of Color.”