Hi Malaya,

First, by "Freeze", I think you mean executing ##class(Backup.General).ExternalFreeze().  That method pauses the write daemon at the end of a cycle.  This does not stop the journal daemon.  Journals are not written at the end of the write daemon cycle, they are written continually as you make updates.

The usage of the WIJ is described here: https://docs.intersystems.com/iris20253/csp/docbook/DocBook.UI.Page.cls?KEY=GCDI_wij.

Data blocks that are going to be written to databases are first written to the WIJ.  If there is a crash while the databases are being updated, some blocks may have made it to disk, and some not.  IRIS detects this and applies the blocks from the WIJ at the next startup.  

If you are getting warnings about MISMATCH.WIJ, contact the WRC for help.  There may be a problem with how you are backing up, or there may be another problem with your storage.

One other thing to keep in mind about journal files:

When you restore from a backup, you usually want to bring the system up to the current point in time. For that, you need journal files from the time of the backup up to the current time.  That means that some of the journal files you need aren't going to be in your backup, because they didn't exist when you took your snapshot.  You would need to account for this in your backup planning.
 

It looks like a number of people have contributed to that repo, both inside and outside of InterSystems.  Perhaps @Dmitry Maslennikov, @Evgeny Shvarov, or @Robert Cemper can help.

Those chained RUN commands are very common because they keep the layer count down, but it's a pain when they don't work 100%.

Or, if you like, improve what you can in your fork of this repo and propose a pull request to update it.  Get yourself some Developer Community cred for your effort!

I looked at your Dockerfile and the original one.  It looks like you are expected to pick your own base image; several are listed at the top, and they are old.

To resolve issues like this, I usually start by building up my Dockerfile one step at a time.  I break long RUN commands into individual steps, such as the complicated RUN command starting at line 29, which is failing even from the intersystems-community repo.  Consider trying to build the image you want outside of Docker Compose first: make sure it can build, then see if you can combine RUN commands, then call the build from Compose.
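A sketch of that workflow, with a hypothetical image tag:

```shell
# Build the image by itself first, outside Compose, so failures are isolated:
docker build -t my-iris-test .

# While debugging, split a long chained RUN in the Dockerfile into single
# steps, e.g. turn
#   RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*
# into one RUN per command; the failing layer is then easy to identify.

# Once the standalone build succeeds, let Compose drive the same Dockerfile:
docker compose build && docker compose up -d
```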

To add to what @Yaron Munz posted, it is important to note that journaling is an independent data flow that doesn't depend on the write daemon.  How deeply you need to go into understanding the data flow depends on why you are asking.  If you are trying to resolve a problem, reach out to the WRC.

Oliver, a segmentation fault means some process tried to use memory that doesn't belong to it, so the kernel terminates the process.  If you are building a new image, can you share your Dockerfile?  If you are following the startup steps from the README, can you provide all the output?

IRIS documentation on backup and restore mentions this for the contents of the snapshot:

"The snapshot is typically a snapshot of all file systems in use by the system. At a minimum, this includes all directories used by the InterSystems IRIS database in any way, such as the installation directory, database directories, journal and alternate journal directories, WIJ directory, and any directory containing external files used by the system." (https://docs.intersystems.com/iris20253/csp/docbook/DocBook.UI.Page.cls…)

As @AlexanderPettitt said, the target instance needs to be cleanly shut down. The snapshot needs to include everything IRIS expects to use; I'm not certain what is included in the snapshot from your post.

MISMATCH.WIJ is described here: https://docs.intersystems.com/iris20253/csp/docbook/DocBook.UI.Page.cls?KEY=GCDI_wij#GCDI_wij_recover_compare

If you are comparing the IRIS.WIJ from the target against the contents of the databases from the source (or vice versa) you will get mismatches. If you take the wrong action, you will affect the integrity of the data.  The specific details of this are important: such as how the snapshot was taken, what mounting an svol means, and whether or not you actually have some other problem beyond how the backup was taken.

This is a situation where the WRC can help.  I'd suggest starting there.

Erik Hemdal · Jun 10, 2025

Remember to publish the Web server port so Docker will allow traffic into the IRIS container.  It's not clear whether you did that step.  If you have a Docker Compose file or similar orchestration, it would help to see that.  If you are licensed for IRIS for Health you can also reach out to the WRC or your account team for help.
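A minimal sketch of the publish step (image and container names here are placeholders):

```shell
# Publish the internal web server port (52773 by default) so the host can
# reach the container:
docker run -d --name iris -p 52773:52773 my-iris-image

# The Docker Compose equivalent is a ports: entry on the service, e.g.
#   ports:
#     - "52773:52773"
```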

Erik Hemdal · Jan 2, 2025

Igor, from those messages, I would surmise that you had some sort of brief network disconnect which ultimately resolved itself.  The primary mirror member, of course, will continue to support your workload so availability would not be affected.

This doesn't look like a problem with IRIS, but rather a response to the conditions on the network.

You might also reach out to the WRC to determine if there is something about your specific version that is causing these brief disconnects to be reported.

Erik Hemdal · Nov 11, 2024

It most likely depends on what you are doing during build.  Without a license, you can have one connection to IRIS -- so that you can get it running and can activate your license key.  There have been some license changes in recent versions that might have changed some behavior you expect.

The WRC is the best place to start, as @Ben Spead mentioned.

Erik Hemdal · Aug 12, 2024

Cold backup is one of the options documented in the Data Integrity Guide.

But a lot depends on what you mean by "all data".  Don't forget installation files, the license key, journals, and any stream files you may have; on Windows, remember that installation puts some keys in the Registry.  How much of this you'll need to restore depends on the kind of trouble you encounter.  For example, if you want to restore a database but the instance is otherwise healthy, the Registry is likely in working order; if the entire server is destroyed, you have a more complicated recovery.

If you need to be able to bring the instance back to its latest state (roll-forward recovery), you'll need journal files that won't exist at the time you take the backup, so you may want to back up journal files separately during working hours.
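One way to sketch that separate journal copy; the source and destination paths here are hypothetical:

```shell
# backup_journals: copy journal files not yet present on the backup volume.
# Usage: backup_journals <journal-dir> <backup-dir>  (both paths are examples)
backup_journals() {
    src=$1
    dst=$2
    mkdir -p "$dst"
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        base=$(basename "$f")
        # Skip files already copied; -p preserves timestamps and modes.
        [ -f "$dst/$base" ] || cp -p "$f" "$dst/$base"
    done
}

# Example call with placeholder paths:
# backup_journals /iris/journal /nas/journal-backup
```

Run on a schedule during the day, this keeps the NAS copy of the journals close to current between snapshots.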

It would be a good idea to protect your installation .EXE file as well, so that you can install a fresh copy of the exact version you use.  Down the road, that version might no longer be available from the WRC.

Take care to preserve ownership and permissions on files so they are restored correctly from the NAS.
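For example, a tar round trip with -p on extract is one way to rehearse that modes survive a restore (the paths below are illustrative):

```shell
# Archive a directory with permissions recorded, and restore with -p so
# the recorded modes are applied on extraction.
archive_dir() {  # archive_dir <srcdir> <tarfile>
    tar -cpf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}
restore_dir() {  # restore_dir <tarfile> <destdir>
    mkdir -p "$2"
    tar -xpf "$1" -C "$2"
}
```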

Your account team can be a good resource.

Erik Hemdal · Jun 3, 2024

Your Windows user and your IRIS user may be (and often are) different.  You need to authenticate to IRIS as well, which is why you need uid and pwd.

Erik Hemdal · Apr 15, 2024

Robert Cemper has a good idea for getting the container running so you can do work.  Or you can use iris-main to tell IRIS not to start when you bring the container up.

The username and password for EmergencyId are ones you define for that session, so they can be whatever you choose.  See https://docs.intersystems.com/iris20241/csp/docbook/DocBook.UI.Page.cls… for the details.
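For reference, a sketch of how emergency access is typically invoked from the command line; the instance name and credentials are placeholders, and you should confirm the exact syntax for your version in the linked documentation:

```shell
# Start the instance in emergency access mode; the ID and password are
# temporary values you invent for this session (placeholders):
iris start IRIS EmergencyId=tempadmin,temppass

# Then connect and authenticate with that temporary identity:
iris terminal IRIS
```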

Knowing more about why you want emergency mode can help us get you a better answer.

Erik Hemdal · Mar 13, 2024

It's unusual that IRIS created a new WIJ file.  I would not expect that if the previous shutdown was clean.  You would be well-served to contact the WRC for help with recovery.

Erik Hemdal · Jan 10, 2024

Logan, I can't see the post from where you got the image, and I haven't found other questions from your account.  What is your overall goal, and how does Docker fit into what you are trying to do?

I'm guessing that you have a traditional Caché online backup (.CBK file).  Do you know the $ZVERSION for the instance that created the backup?

Erik Hemdal · Dec 26, 2023

I strongly agree with @Alexander Pettitt on this one.  Request help from the WRC to make sure you have a version containing necessary corrections.  The WRC engineers can advise you about whether your specific version is affected and how best to upgrade.

Erik Hemdal · Jul 24, 2023

Echoing what Vic and Dmitry have mentioned.  GARCOL cleans up large KILLs, so it's a database operation.  Your post seems to ask a different question, akin to Java object garbage collection. 

I've been quite surprised to see how much work gets done by IRIS processes with scant process memory; I suppose that's because it's so easy to use globals.

Understanding what you are experiencing and trying to do would help a lot.

Erik Hemdal · Jun 26, 2023

I ran a quick test, Luis, on my Windows 10 laptop with 32GB of memory.  Memory usage peaked at about 25GB and settled down to about 20GB.  When I deleted the container and stopped Docker, memory went down to about 10GB used.

I did not see obvious memory exhaustion in what is an informal test.  I don't have a good sense of what "good" memory usage should be like on this image.  If this is causing you trouble, reach out to WRC and they can get you more help.

Erik Hemdal · Mar 15, 2023

Rochdi, if you haven't created and practiced your own steps for doing journal restore, you might be best served by contacting the WRC or your account team at InterSystems.  Restoring from journals can involve many decisions about where the journals come from, where you are using them now, and several other items -- it can get complicated depending on exactly what you need to do.

Also, you can't restore a complete database from journals unless you have all the journals from the time you created the database, which isn't very likely if you have been using the database for more than a few days.  You also reported a very old version of Caché (2014.1), which might affect how you recover.

Certainly if you are in a crisis situation, contact the WRC for immediate help.

Erik Hemdal · Jan 27, 2023

Alexander is observant.  There are five expansions, but these look like retries, because the size completed doesn't change.  Expansion is stuck for some reason.  The WRC might be able to find out more from ^SYSLOG if this is continuing to occur. 

Erik Hemdal · Jan 19, 2023

Rochdi,
2.2TB is within the NTFS file size and partition size limits I could find.  Caché can grow a CACHE.DAT file to 32TB before reaching its software limit.

From what you've shared, I have a few ideas:

1. The NTFS partition is limited in size, causing the <FILEFULL>.

2. There is some Windows policy that is limiting the maximum file size to less than what NTFS allows.

3. You've encountered an incompatibility between Ensemble 2014.1 and Windows Server 2016 (which is not a supported platform for this version according to documentation -- it's too new).

Your best bet is probably contacting WRC for help in sorting this situation out.

Erik Hemdal · Dec 28, 2022

^INTEGRIT is the simplest way to check integrity.  Run the integrity check output to a file and contact WRC, as others have said, if you have support.  The most direct way to resolve database errors is to restore from a good backup and replay journals.  If you can't do that, the other alternatives almost always involve loss of information.  The WRC has specialists who understand database internals, and WRC always wants to investigate for the root cause of any database problems.

Erik Hemdal · Oct 21, 2022

Shane, I'm glad you figured a way forward.  VistA is special among applications.  If you run into more issues, it might be worth contacting the WRC for help.

Erik Hemdal · Oct 19, 2022

I second @Dmitry Maslennikov's question here.  Adding SSH to your container makes ongoing maintenance more complicated, and there may be alternatives that will be simpler for you long-term.
 

Erik Hemdal · Oct 5, 2022

Hi Paul, I think you might be better off talking with the WRC or your Linux distro's support folks, depending on whether this works at the Linux shell or not.  Is this trouble with Linux and Caché (start with your vendor), or just with Caché (ask the WRC)?  If you have a contact at VMS Software, they might have a reliable solution too.

Something like

PROCAUTO=/JRNDSK/ProcAuto_share

just sets up a variable, so you can replace the string '/JRNDSK/ProcAuto_share' with $PROCAUTO.  There's nothing fancier than that.  Linux doesn't have a notion of a "shortcut" like Windows does (although some Linux GUIs do give you shortcuts, just at the application level), and certainly nothing like VMS logicals.  Soft links and hard links are just different ways of giving a file another name at a different location, so there's nothing to pass in, really.
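A runnable illustration of both ideas, using throwaway paths instead of /JRNDSK:

```shell
# A shell variable is just text substitution:
PROCAUTO=$(mktemp -d)               # stands in for /JRNDSK/ProcAuto_share
echo hello > "$PROCAUTO/sample.txt"
cat "$PROCAUTO/sample.txt"          # prints hello

# A soft link is just another name for the same location:
LINK=$(mktemp -u)
ln -s "$PROCAUTO" "$LINK"
cat "$LINK/sample.txt"              # same file, reached through the link
```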

I still might not be clear on what you are trying to achieve though, and the details matter a bit -- things are a little different if you are using Samba or another shared filesystem and what you're trying to pass the link into.  There may be another way to reach your goal...or maybe not.  VMS is very different from Linux and you've bumped into one of those really nice VMS features that people miss when they move away.

If you are overall trying to migrate from Caché on VMS, I would certainly talk with your account team because there's a lot of learning gained from working with customers who have started that journey. 

Erik Hemdal · Sep 21, 2022

Hi Paul! Nice to see you on the DC.  These questions are hard because there's a lot of "plumbing" to think about.

If I understand things right, you have an environment variable for root set via a script file in /etc/profile.d, and that's working at the Linux shell.  But you don't see it at a Caché terminal even when you start the instance as root.

Caché daemons will run as the instance owner and user jobs would run with the privilege and environment of the user who logs in.  Otherwise, an ordinary user could open a Caché terminal and work with root privilege.  I'm writing this from memory, and Caché 2015.1 is pretty old now, so there might be some other details that are relevant (and which I'm forgetting).

Another issue is that when you use the sudo or su commands to "become root", you don't necessarily get root's environment unless you use the right options for that.  The manpages for sudo and su should help you figure this out.
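A quick way to see the environment-propagation issue in any shell; the same idea applies when su/sudo give you a new shell, since without a login environment (su -, sudo -i) you may not get root's profile:

```shell
# A child process only sees variables that were exported.
FOO=from_parent
sh -c 'echo "unexported: [$FOO]"'   # prints: unexported: []

export FOO
sh -c 'echo "exported: [$FOO]"'     # prints: exported: [from_parent]
```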

If you can tell us some more about what you are trying to accomplish with the link, we might be able to help further.  If you are trying to establish links to files that are established when the instance starts and persist, what you need might better be addressed with actions in SYSTEM^%ZSTART or ^ZSTU.  For that, the WRC might also be able to help.

Good luck.  I hope this at least gives you a trailhead to solve the trouble.

Erik Hemdal · Aug 1, 2022

If you are still having trouble, Phillip, contact the WRC and have a support advisor discuss how to recover the instance.  If you need assistance determining how to size your system for journaling, start with your account manager for help.