This behavior looks correct to me (but it's tricky).  The reason is that the string "2.2" is a number in canonical form, so it collates with the numeric subscripts. "1.0" is non-cananonical, so it's stored as a string subscript.  Sorts after operation is all about resolving subscript ordering.  You can convince yourself of this behavior by actually setting these as subscripts in a global or local variable and then ZWRITE'ing it.

The same reasoning is why "2.2" = 2.2 evaluates true but "1.0" = 1.0 is false.

Note, of course, that numeric conversion will happen as part of any arithmatic operation so "1.0" still functions as 1 in such operations.

If you have a true moment-in-time snapshot image of all the pieces of Caché (databases, WIJ, Journals, installation/manager directory, etc), then restoring that image is, to Caché, just as though the machine had crashed at that moment in time.  When the instance of Caché within that restored image starts up, all Caché's usual automatic recovery mechanisms that give you full protection against system crashes equivalently give you full protection in this restore scenario.

Whether a given snapshot can be considered crash-consistent comes from the underlying snapshotting technology, but in general that's what "snapshot" means.  The main consideration is that all of the filesystems involved in Caché are part of the same moment-in-time (sometimes referred to as a "consistency group").  It's no good if you take an image of the CACHE.DAT files from one moment in time with an image of the WIJ or Journals from another.

Most production sites wouldn't plan their backups this way because it means that the only operation you can do on the backup image is restore the whole thing and start Caché.  You can't take one CACHE.DAT from there and get it to a consistent state.  But, in the case of snapshots of a VM guest, this does come up a fair bit, since it's simple to take an image of a guest and start it on other hardware.  

Let me know if you have questions.

You will start the restore at that file that was switched to (your .003 file), and that file contains metadata that allows us to find the oldest open transaction to rollback.  The rollback as part of journal restore will scan backwards in the journal stream to find it if needed.  If you need to know what that oldest file will be, you can get it via the RequiredFile output parameter of ExternalFreeze() or by calling %SYS.Journal.File:RequiredForRecovery() before calling ExternalFreeze().  Again though, you don't need to start the journal restore from here, just have it (and the journal.log to find it) available at restore time.  So, if you're backing up and restoring all journals that are on the system, this basically takes care of itself.

Upon return from ExternalFreeze(), the CACHE.DAT files will contain all of the updates that occurred prior to when it was invoked.  Some of those updates may, in fact, be journaled in the file that was switched to (the .003 file in your example), though that doesn't really matter for your question.

BUT, you still need to do journal restore, in general, because the backup image may contain partially committed transactions and journal restore is what rolls them back, even if the image of journals that you have at restore time contains no newer records than the CACHE.DAT files do.  This is covered in the Restore section of documentation, which I recommend having a look at:  http://docs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY=...

There is an exception to this, and that is if you are a crash-consistent snapshot of the entire system, including all CACHE.DAT files, the manager directory, journals, and the WIJ.  In that case, all the crash-consistency guarantees that the WIJ and journals confer mean that when you start that restored image, the usual startup recovery actions will take care of any required roll forward and roll back from journals automatically.   In that scenario with crash-consistent snapshots, ExternalFreeze() wasn't even needed to begin with, because crash-consistent snapshot is by definition good enough.  However, ExternalFreeze() is typically used for planned external backups because it allows you to restore a subset of databases rather than requiring restore of the entire system.

%Library.Device class has GetMnemonicDirectory() and GetMnemonicRoutine()

A few comments:

1. Similar to what Alexey said, any time you're using a mix of data that is journaled and non-jounaled but also not temporary (will survive a restart), you have to remain keenly aware of recovery semantics.  After a crash and restart, the journaled data will be at a later point in time than the non-journaled data.  It's only pretty special cases where data is meant to persist across restarts, but doesn't really have to be as up to date as the rest for the integrity of the application.  This needs to be considered in the development cycle.

2. If using non-journaled databases, be aware of their recovery semantics; it can be a bit non-intuitive. Transactions are journaled there for satisfying rollback at runtime, but that journal information is not used during journal recovery or rollback at startup so transaction there are not atomic or durable (even if in synchronous commit mode) across restarts.  What this does get you is that all data in all the journaled databases are recovered to the same moment in time after a crash, regardless of whether they were in transaction or not.

3. Mirrored databases ignore the process ^%NOJRN flag discussed in e. (though it is honored for non-mirrored databases on mirror members).   

It's important to start by saying that mirroring already handles this automatically for the most common cases, and it is more the exceptional case that would require the original failover members to be rebuild after no-partner promotion.  As long as the original members really did go down in the disaster and the DR member is relatively up to date (a few seconds or even a few tens of seconds of data loss), then it is usually the case that the original members can reconcile automatically when they reconnect (as DR asyncs) to the new primary.  That's because the state of the CACHE.DAT files on disk did not advance past the journal position from which the DR member took over.   This is not a guarantee, but it is the case in most disasters for which this intends to cover.

The features Bob mentioned to survey other reachable members automatically helps make sure that the DR member becoming primary has all the data that is possibly available to it at the time (while not preventing it from becoming primary if it cannot).

The main case where this automatic reconciliation cannot happen is if the failover member(s) got isolated but did not crash, or at least did not crash right away.  In that case, if you choose to promote the DR member and accept this larger amount of data loss in the process, then indeed you expect the on-disk CACHE.DAT state to have advanced into a part of the journal that the DR member never had (and probably cannot get)

Regarding the enhancement you mention, there are no plans at the moment, though it's certainly a reasonable idea. 

This is the latest maintenance release and I know of no bug like this there, so this needs to be investigated to understand the error.  To your original question, there is nothing special you need to do to run SQL SELECT against an async mirror member, even when its databases are read-only.

There were problems like this in query compilation in some versions?  What error do you get and what version are you using?