Ray Fucillo · Sep 14, 2020

I said that only because managing database size with subscript-level mapping (SLM) can be painful operationally: you have to predict where the growth is going to be and coordinate a configuration change before the application starts using the new mapping range. I did not mean to imply that anything bad happens when you do this. In fact, if the growth of a global isn't bounded by some natural data lifespan, or some application-level archival process, then SLM is unavoidable with a sufficient rate of growth. By planning in advance for the growth, though, and starting the largest expected globals mapped to their own databases, you might stave that off for a long time.

Note: there's a little runtime cost to resolving SLM that doesn't exist for (whole) global mapping, but it's generally a noise-level cost unless you've generated a very complex set of mappings (more complex than you'd likely do as a manual configuration step).

Ray Fucillo · Sep 11, 2020

From my perspective, the main reason to run integrity check is so that if you ever did have database degradation, you know that you have a backup that you can recover from.  I've seen too many disasters of the form that corruption is discovered that predates any available backup.  For use cases that would never recover from backup or mirrored copies or the like for disaster recovery, you might reasonably argue that integrity check isn't worth the effort/cost.   

(As a detail, just accessing a corrupted global won't hang the system, but the system will hang if corruption causes a SET or KILL to fail in the middle of a multi-block update.)

Anyway, to your good thoughts about possible enhancements:

  • It turns out that one of my recent enhancements, as yet unreleased, did open up the possibility of a "pointer block only" check (as a side effect of a different goal). However, I don't think it's very valuable because pointer blocks make up a very small fraction of all blocks. For typical patterns of subscripts, there are in the neighborhood of 300-500 data blocks pointed to by a pointer block in 8KB databases, so you're talking about roughly 0.2-0.3% of all the blocks. I don't think you'd draw any meaningful conclusion from a clean check that didn't include data blocks. Don't be confused by the fact that most integrity check errors start with "Error while processing pointer block %d". That's just the way integrity check works. The vast majority of those are from a bottom pointer block and were found only because it read every data block under it to find the inconsistency.
  • We do actually have some protection against errors in the most recently written blocks (following a crash) via the Write Image Journal block comparison.  It's a totally different mechanism, but it is designed with the thought that when systems lose power, there's some history of drives dropping or corrupting the most recent writes, despite promises that they had already succeeded (via our very careful use of fsync() and similar mechanisms).
  • About piggy-backing on increment change tracking, it's an interesting idea, but again I worry that many of the failure modes that lead to corruption wouldn't necessarily get uncovered, and so it doesn't give the guarantee you need from integrity check in order to know that a backup image could be relied upon in a disaster.

Ray Fucillo · Sep 5, 2020

I think integrity check isn't the primary driver of that architectural decision, but it might be part of the consideration. Any single database is constrained to a max size of 2^32 blocks, so 32TB for the standard 8KB block size. There are practical reasons not to go anywhere near that high: backup/restore and other operational tasks on a single database may be more onerous; AIX/JFS2 has a 16TB file limit anyway; and integrity check has less ability to be parallelized if the huge database is also primarily a single global. (If you're running older versions, there are also a couple of bugs involving databases that have more than 2^31 blocks, all fixed in the latest maintenance kits.)

Given these and other considerations, I believe most sites shoot for max database sizes somewhere between 2 and 10 TB. So for 100TB we're talking about a few dozen databases. You'd hope that much data, especially if it's largely in active use, is spread over a significant number of different globals (e.g. many tables and their indices). Ideally you use global mappings in anticipation of such huge growth to organize the globals into databases and, as much as possible, avoid the need to use subscript level mapping (SLM) to manage growth of a single global across multiple databases. If growth is unbounded though (i.e. this isn't the sort of data that can eventually be moved to some separate archive structure), then subscript level mapping to map across these dozen or more databases becomes inevitable.

As for running integrity check on that much data, it will take some substantial time and you need to find the balance of how frequently you want to run it, how much storage bandwidth is reasonable for it to consume, and whether you can run it on an offline copy. Since the other factors I mentioned already push you toward having a multitude of separate databases (with any giant globals spread over some number of them via SLM), integrity check can be parallelized well.
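The size limit and database counts above are simple arithmetic, which can be checked from a terminal session. This is only a back-of-the-envelope sketch: the variable names are illustrative, and the 100TB total and the 2 to 10 TB per-database range are the figures from this discussion, not product limits.

```objectscript
 ; Back-of-the-envelope arithmetic only; ObjectScript evaluates strictly left to right.
 set maxBlocks=2**32      ; maximum number of blocks in a single database
 set blockBytes=8192      ; standard 8KB block size
 write "Max database size (TB): ",maxBlocks*blockBytes/(1024**4),!
 write "Databases for 100 TB at 10 TB each: ",100/10,!
 write "Databases for 100 TB at 2 TB each: ",100/2,!
```

The first line should print 32, and the last two give the 10 to 50 database range behind the "few dozen" estimate.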

Ray Fucillo · Mar 5, 2018

Short answer: yes, you can certainly do this if you want to and the result is valid.  The main downside, in my opinion, is that the backup is then dependent on more technology, so there are more things that could go wrong.  More on that later.

If you're going to do this though, you really don't want to end up with Online Backup as your backup solution. The problem with online backup is not consumption of resources, but time to restore. I thought you were going to say you wanted the DR system so that you could shut it down for a couple hours while you take a cold external backup. That would be a pretty good reason to do this.

Since mirrored databases record their journal location inside the database, they intrinsically know from what journal file they need to "catch up" (the mirror checkpoint info). Like all the usual backup solutions, the result is not transactionally consistent in and of itself, but requires journal restore following backup restore to get to a transactionally consistent state. Mirroring makes this easier via the aforementioned checkpoint and the automatic rollback as part of becoming primary. Of course it's the mirror journal files, not the DR's own journal files, that will be used for this, but they live in the same directory, so if you just back that up in the same backup, you'll have the right stuff if it ever came to restoring this.

Now more about those downsides. Backing up a replica means that you are subject to any problems with the replication. For example, if a database on the DR had a problem and we had to stop dejournaling to it, that could mean your backup isn't good. You'd worry a bit that you didn't notice because nobody is running on the DR system. Or if you add a database to the primary but forget to add the same to the DR, your backup wouldn't have it. These aren't meant to say this is a bad idea, but it is a consideration. You want to think a bit about what you're trying to protect against. You're talking about having a DR, so if you're restoring backup it means that something went wrong with both the primary and the DR. So is the backup of the DR good in that situation? If both are in the same physical location and you're backing up in case that location is destroyed, then you're protected. Or if you're backing up to handle the case of errant/malicious deletion of data, then you're protected.

I don't know what your situation is with the main server, but I'd be curious how the system architect expects backups to take place and how long a backup of the disks is expected to take. With large global buffers, ExternalFreeze() can be workable in some application environments even if the freeze will last many minutes. If your operating environment is such that good backups are an absolute must, you might be better off investing in getting external backup working over there.

Ray Fucillo · Sep 1, 2017

Ah, I think we found the confusion!  Canonical number and internal type are different concepts.  A canonical number can have internal string type.  An internal numeric type (int, float, double) will always be canonical.  What do you want your assert to say if your method did this...

 set $p(canonicaldata,",",2)=+$p(data,",",2)
 set test.Amount=$p(canonicaldata,",",2)

Now test.Amount is canonical, but also a string, so:

>w test.Amount=0.1,!,test.Amount=.1,!,test.Amount=".1"
1
1
1

What should your assert method say about that? OK or NOT OK? If OK, then you want it to test that actual=+expected. If not OK, then you want one of the tricks that breaks this abstraction.

Ray Fucillo · Sep 1, 2017

Continuing this with your example, I'm saying you have to consider what you'd want the following to return.

>s y=+x.Amount w y ; y is now canonical form, and internally a float
.1
>s $p(tmp,",",2)=y,y=$p(tmp,",",2) w y=0.1 ; y is still canonical so it's =, but internally a string
1
>w AssertNumberEquals(y,0.1)
???? what's this going to return

Ray Fucillo · Sep 1, 2017

This is just definitional. By "fail" I meant generate an assertion failure, and it will do so for any canonical number if it happens to be stored internally as a string. You've recently been saying this is what you want, so I accept that. This is going full circle again, but on the off chance that this is helpful to you or someone else, I'll take one last shot at explaining why I think that definition is not desirable. Consider that I write the following method:

ClassMethod foo() As %Float {
  set x=1.1 ; x is a number in canonical form
  set $piece(a,",",1)=x
  ; ... other stuff ...
  quit $piece(a,",",1)
}

This method is perfectly correct in returning a floating point number.  It will also be in canonical form, so that it will test as = against any other canonical copy of 1.1 that you have.  But your assertion code will say the return value of my method doesn't equal 1.1 because it happens to internally have string type.  You would tell me that I should change my code to return +$piece(a,",",1) instead, but that is strictly not necessary.  The difference is only visible if you break the typeless abstraction layer and find a trick (like you've done) to peek into the internals.

You can certainly define your requirement to be stricter than this, as you have, and say that you want to require that the number would act as a number in one of the special functions that can tell the difference ($LB, $ZH, $ZB(), dyn arrays). That's a fine definition, but it is special. So it comes down to where you check this assertion. Most COS programmers I know would not use the unary + in my method; rather they would use the unary + upon passing that value to one of the aforementioned special functions.

The definition I thought you were originally going for (when you liked sorts after) would be to accept any number that will evaluate as = to a copy of itself that had been passed through arithmetic operators, and for that the answer is to test value=+value.  (Side note: v=+v is better for this than sorts after $c(0) because it is invariant and meets my definition for things like "1111222233334444555566667777".)

Ray Fucillo · Aug 31, 2017

I promise this is the last thing I'll say on this topic :) But..

1. This has different results than John Murray's sorts-after suggestion that you originally liked so much.  And now that I understand what you're doing, I too like that suggestion much better (just make sure the local collation is what you want) since it at least plays by the COS rules.  The difference is that the method above will fail numbers in canonical form just because they happen to have string type under the covers.  John's suggestion will properly pass all canonical numbers regardless of how they came to be.

2. For anyone who might come along later and encounter this answer, we should warn them that this is for Sean's highly specialized purposes, relies on internal implementation details that may change, and in general is specifically intended to break an abstraction layer that COS otherwise provides.

Ray Fucillo · Aug 31, 2017

Hi Sean,

OK. I don't know of any direct way to access a variable's type.  Last little bit of food for thought...

Even if there were such a function, though, I'd consider it an internal detail that wouldn't necessarily be reliable. Take as a trivial example 'set x="1234",x=x+0'. Today, under the covers, x starts out as a string and then changes to an integer when it gets assigned the result of the addition operation. You could imagine a future where a compile- or run-time optimization notices that it can just leave x unchanged as its string type 1234. This is entirely an implementation detail and the optimization wouldn't violate any rules of the language. Note that in the case of 'set x="0.5",x=x+0', we would be obligated to leave x as having value ".5", not "0.5", due to the canonicalization rules, but even then we're not obligated to internally make it a floating point type rather than a string type.

Would we ever really do this?  I don't know.  Unfortunately because there are things like $LB and $ZHEX that expose bits of these internal details in some fashion, you'd worry about compatibility implications.  But fundamentally, the internal type is just a detail for the Caché virtual machine to manage internally in doing whatever it needs to do to present the typeless COS language to the application.

Ray Fucillo · Aug 30, 2017

Sean, I think your post reveals a couple misunderstandings that relate to this problem.  Let me comment on a couple, though at this point, I'm not sure how helpful I'm being to you...

> If "1.5"=1.5 is true, then arguably "0.5"=0.5 should also be true, but it is not. This means that developers should be wary of automatic equality coercion on floating point numbers.

It's very important to understand what's going on here because it's central to your question.  "1.5"=1.5 because 1.5 is a number in canonical form.  "0.5" does not equal 0.5 because 0.5 is a numeric literal, and so that literal 0.5 gets canonicalized before being evaluated in the equals.  This is exactly expected and well-defined and not really arguable.  Literals are one thing, but programs are going to most likely get both sides of the equality from some calculation, string extraction, or user input.  If one side of the equality was either a numeric literal or came through some numeric operation, then it is canonicalized, whereas the other side may or may not be, thus possibly failing the equality check unless you explicitly use the unary +. 
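The literal canonicalization described above can be seen directly from a terminal session. This is a minimal sketch; the variable name input is illustrative only and stands in for any value obtained from user input or string extraction.

```objectscript
 ; Terminal-style sketch; the variable name 'input' is illustrative only.
 write "1.5"=1.5,!        ; 1: the string "1.5" is already in canonical form
 write "0.5"=0.5,!        ; 0: the literal 0.5 is canonicalized to .5, which is not the string "0.5"
 set input="0.5"          ; e.g. a value that arrived as user input or from string extraction
 write input=.5,!         ; 0: still the non-canonical string "0.5"
 write +input=.5,!        ; 1: unary + canonicalizes the left side before the comparison
```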

> To make things a little more interesting, a persistent object will automatically coerce a %Float property to a true number value when saved. That's fine, but what if the developer is unaware that he / she is assigning a stringy float value and later performs a dirty check between another stringy float value and the now saved true float number. The code could potentially be tripped up into processing some dirty object logic when nothing has changed.

I understand exactly what you're saying here, but I want to make sure that this behavior doesn't seem mysterious. All that's going on here is that saving an object invokes %Normalize for all the object properties before saving. You can do the same any time you want if you have a need to do so. Remember though that COS is a typeless language, so developers should absolutely NOT expect to need to manage the type of their data. Consider that I store an integer as the second comma-delimited piece of a string. Now I have a %Integer method where I'll return that piece. All is well and I do not need to use the unary +. However, your sample assert method would generate a false positive failure because the number I returned in this way internally has string type. That's not correct though, and you should not be writing code to try to expose the internal type of local variables. The fact that certain special operations must expose the internal type (like the internal $listbuild structure, $zhex, and this dynamic array typing stuff) is a detail specific to those particular functions and shouldn't be considered a backdoor to imposing types on COS, which is typeless. (BTW, I'm not 100% convinced that it's correct for "1" to become a string in these dynamic arrays, but I'm not going to get into that!)

If I can interpret your goals more generally, it sounds like you're trying to impose a coding convention that, at certain places in your application, you want certain values to have been already normalized through the appropriate normalization for their datatype class, so that evaluation with the = operator can be used for logical equality. You're using %Float as a specific example of that, which is interesting in that it gets into how the language canonicalizes numbers. But one could easily imagine wanting the same thing for any arbitrary data type for which only the %Normalize method will do. If that's what you're really after, then you could easily write an AssertNormalizedValue(value,datatype) which generates an assertion failure if value'=$classmethod(datatype,"%Normalize",value)... or something like that.
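A rough sketch of that idea follows. It is hypothetical: it returns a boolean rather than raising an assertion failure, and it assumes the datatype class exposes its normalization as a class method named Normalize (the name I believe %Library.Float uses), which differs from the spelling in the paragraph above. Adjust the method name to whatever your datatype classes actually provide.

```objectscript
/// Hypothetical sketch. Returns 1 when 'value' is already in the form that the
/// datatype's normalization would produce, 0 otherwise.
ClassMethod AssertNormalizedValue(value As %String, datatype As %String) As %Boolean
{
    // Assumption: the datatype class exposes normalization as a class method
    // named Normalize; change the name here if your datatype classes differ.
    quit (value=$classmethod(datatype,"Normalize",value))
}
```

A test harness would then report a failure wherever this returns 0, for example when called with a stored value and "%Library.Float".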

Ray Fucillo · Aug 30, 2017

Sorry for all the duplicate replies... It won't seem to let me place the comment in the right place!

Ray Fucillo · Aug 29, 2017

See my other comment above, but I don't think relying on what the dynamic array implementation picks for a type to convert to is a great idea. I'd like to see you find a solution in the core of the (typeless) language. If you really are just trying to implement AssertNumericEquals(actual,expected), then that's simply 'if +actual'=+expected { FAILED }'. This will pass any value 'actual' that would evaluate in an arithmetic operation as the value 'expected' would. Similarly, if you are trying to implement AssertEqualsCanonicalNumber(actual,expected), then it's 'if actual'=+expected { FAILED }'. That one will pass the value 'actual' only if it is exactly the canonicalized expected value (and thus could be compared to that number with the = operator). If you want AssertIsCanonical(actual), that's 'if actual'=+actual { FAILED }'. That one, of course, will pass any number in its canonical form.
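The three checks above could be collected into a small helper class along these lines. This is a sketch only: the class name Demo.NumericAsserts is made up for illustration, and each method returns a boolean instead of raising a failure, so it can be wired into whatever test framework is in use.

```objectscript
/// Hypothetical helper class; the class name is illustrative only.
Class Demo.NumericAsserts Extends %RegisteredObject
{

/// 1 if 'actual' evaluates arithmetically to the same number as 'expected'.
ClassMethod AssertNumericEquals(actual, expected) As %Boolean
{
    quit (+actual=+expected)
}

/// 1 only if 'actual' is exactly the canonical form of 'expected'.
ClassMethod AssertEqualsCanonicalNumber(actual, expected) As %Boolean
{
    quit (actual=+expected)
}

/// 1 if 'actual' is a number in canonical form, whatever its internal type.
ClassMethod AssertIsCanonical(actual) As %Boolean
{
    quit (actual=+actual)
}

}
```

For example, with actual="0.10" and expected=.1, AssertNumericEquals returns 1 while AssertEqualsCanonicalNumber and AssertIsCanonical both return 0, because "0.10" is a valid number that is not in canonical form.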

Ray Fucillo · Aug 29, 2017

I'm not sure that I have a precise definition of what you are trying to achieve.  If you can define it, I might be able to help more.  However, there is some confusion in your example that I think needs clarification.

What you're dealing with here is the rules about canonical numbers. (x=+x) will indeed evaluate whether a number is in canonical form because the equals operator tests for exactly equal strings and the unary + converts a value to a number in canonical form. The reason your first example above returns true is just that you set x equal to a numeric literal, so it got converted to canonical form before it even got set into the variable. (If you look at the value in x, you will see that it does not have a leading zero.)

If you haven't read it before, this portion of the doc, "String-to-Number Conversion" (along with the linked references), is a pretty good treatment of this subject: http://docs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY…

Ray Fucillo · Aug 28, 2017

If you log the error with ^%ETN (as in DO BACK^%ETN, or LOG^%ETN, or exceptionobject.Log()), the SETs to the ^ERRORS global are done with the transaction "suspended", so that they do not roll back. These SETs get recorded in the journal so that they are recovered upon a system crash or restored in a journal restore, but they are omitted from rollback. In the future, we will be exposing this functionality for use in applications. As others have said, ^%NOJRN is not the answer because it is ignored for mirrored databases.
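A small sketch of that behavior, assuming a journaled database and a hypothetical application global ^Demo that holds no other data (the method name is also made up for illustration):

```objectscript
ClassMethod DemoErrorLogSurvivesRollback()
{
    tstart
    set ^Demo("order",1)="pending"   // ^Demo is a hypothetical application global
    try {
        set x=1/0                    // force a <DIVIDE> error
    } catch ex {
        do ex.Log()                  // logs via ^%ETN; these SETs are omitted from rollback
    }
    trollback                        // undoes the SET to ^Demo, but not the error log entry
    write $data(^Demo("order",1)),!  // expected 0 once the transaction is rolled back
}
```

After running it, the application data is gone but the logged error should still be visible in the application error log for that namespace.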

Ray Fucillo · Aug 28, 2017

I hesitate to comment on this because you know the answer, but it seems that if you're trying to determine if a value is a number in canonical form, it's hard to beat testing that (x=+x).

I don't think we should be so excited about the suggestion for sorts after $c(0), because that introduces dependencies on the current local collation strategy. Whatever answer you choose, I think you should require it to be invariant.

Ray Fucillo · Jun 8, 2017

Again, "1.0" is not a canonical number; "2.2" is.  Both are valid numbers, but only one is in canonical form.  So exactly what you quoted here is the reason for this behavior.

Since both are valid numbers, you don't have to use + for any function that evaluates them as numbers or as boolean.  You do have to use + any time you desire conversion to canonical form (like equality, array sorting, etc).

Ray Fucillo · Jun 7, 2017

This behavior looks correct to me (but it's tricky). The reason is that the string "2.2" is a number in canonical form, so it collates with the numeric subscripts. "1.0" is non-canonical, so it's stored as a string subscript. The sorts-after operation is all about resolving subscript ordering. You can convince yourself of this behavior by actually setting these as subscripts in a global or local variable and then ZWRITE'ing it.

The same reasoning is why "2.2" = 2.2 evaluates true but "1.0" = 1.0 is false.

Note, of course, that numeric conversion will happen as part of any arithmetic operation, so "1.0" still functions as 1 in such operations.
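The experiment suggested above might look like this in a terminal session. It is a sketch that uses a local array named a (any local or global would do) and assumes default collation.

```objectscript
 kill a
 set a("2.2")="canonical"        ; stored as the numeric subscript 2.2
 set a("1.0")="non-canonical"    ; stored as the string subscript "1.0"
 set a(10)="integer"
 zwrite a                        ; numeric subscripts (2.2, 10) come first, then "1.0"
 write "1.0"]]"2.2",!            ; 1: the string subscript sorts after the numeric one
 write "2.2"=2.2,!               ; 1
 write "1.0"=1.0,!               ; 0: the literal 1.0 is canonicalized to 1
 write "1.0"+1,!                 ; 2: arithmetic converts "1.0" to the number 1
```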

Ray Fucillo · Apr 5, 2017

If you have a true moment-in-time snapshot image of all the pieces of Caché (databases, WIJ, Journals, installation/manager directory, etc), then restoring that image is, to Caché, just as though the machine had crashed at that moment in time.  When the instance of Caché within that restored image starts up, all Caché's usual automatic recovery mechanisms that give you full protection against system crashes equivalently give you full protection in this restore scenario.

Whether a given snapshot can be considered crash-consistent comes from the underlying snapshotting technology, but in general that's what "snapshot" means.  The main consideration is that all of the filesystems involved in Caché are part of the same moment-in-time (sometimes referred to as a "consistency group").  It's no good if you take an image of the CACHE.DAT files from one moment in time with an image of the WIJ or Journals from another.

Most production sites wouldn't plan their backups this way because it means that the only operation you can do on the backup image is restore the whole thing and start Caché.  You can't take one CACHE.DAT from there and get it to a consistent state.  But, in the case of snapshots of a VM guest, this does come up a fair bit, since it's simple to take an image of a guest and start it on other hardware.  

Let me know if you have questions.

Ray Fucillo · Apr 5, 2017

You will start the restore at the file that was switched to (your .003 file), and that file contains metadata that allows us to find the oldest open transaction to roll back. The rollback as part of journal restore will scan backwards in the journal stream to find it if needed. If you need to know what that oldest file will be, you can get it via the RequiredFile output parameter of ExternalFreeze() or by calling %SYS.Journal.File:RequiredForRecovery() before calling ExternalFreeze(). Again though, you don't need to start the journal restore from here, just have it (and the journal.log to find it) available at restore time. So, if you're backing up and restoring all journals that are on the system, this basically takes care of itself.

Ray Fucillo · Apr 4, 2017

Upon return from ExternalFreeze(), the CACHE.DAT files will contain all of the updates that occurred prior to when it was invoked.  Some of those updates may, in fact, be journaled in the file that was switched to (the .003 file in your example), though that doesn't really matter for your question.

BUT, you still need to do journal restore, in general, because the backup image may contain partially committed transactions and journal restore is what rolls them back, even if the image of journals that you have at restore time contains no newer records than the CACHE.DAT files do.  This is covered in the Restore section of documentation, which I recommend having a look at:  http://docs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY…

There is an exception to this, and that is if you are restoring a crash-consistent snapshot of the entire system, including all CACHE.DAT files, the manager directory, journals, and the WIJ. In that case, all the crash-consistency guarantees that the WIJ and journals confer mean that when you start that restored image, the usual startup recovery actions will take care of any required roll forward and roll back from journals automatically. In that scenario with crash-consistent snapshots, ExternalFreeze() wasn't even needed to begin with, because a crash-consistent snapshot is by definition good enough. However, ExternalFreeze() is typically used for planned external backups because it allows you to restore a subset of databases rather than requiring restore of the entire system.
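For the planned external backup case, the overall shape is freeze, snapshot, thaw. The following is only a sketch under stated assumptions: the wrapper method name is hypothetical, it is meant to run in the %SYS namespace, both calls are made with default arguments, and the snapshot step itself is site-specific and deliberately left out.

```objectscript
/// Hypothetical wrapper; run in the %SYS namespace. Only ExternalFreeze() and
/// ExternalThaw() are real calls here; the snapshot step is site-specific.
ClassMethod FreezeSnapshotThaw() As %Status
{
    set sc=##class(Backup.General).ExternalFreeze()
    if $system.Status.IsError(sc) { quit sc }
    // ... trigger the external (storage or VM) snapshot here; not shown ...
    quit ##class(Backup.General).ExternalThaw()
}
```

Restoring a subset of databases from such a backup still requires the journal restore described above.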

Ray Fucillo · Feb 11, 2017

%Library.Device class has GetMnemonicDirectory() and GetMnemonicRoutine()

Ray Fucillo · Dec 28, 2016

A few comments:

1. Similar to what Alexey said, any time you're using a mix of data that is journaled and non-journaled but also not temporary (will survive a restart), you have to remain keenly aware of recovery semantics. After a crash and restart, the journaled data will be at a later point in time than the non-journaled data. It's only pretty special cases where data is meant to persist across restarts, but doesn't really have to be as up to date as the rest for the integrity of the application. This needs to be considered in the development cycle.

2. If using non-journaled databases, be aware of their recovery semantics; it can be a bit non-intuitive. Transactions are journaled there for satisfying rollback at runtime, but that journal information is not used during journal recovery or rollback at startup, so transactions there are not atomic or durable (even if in synchronous commit mode) across restarts. What this does get you is that all data in all the journaled databases are recovered to the same moment in time after a crash, regardless of whether they were in transaction or not.

3. Mirrored databases ignore the process ^%NOJRN flag discussed in e. (though it is honored for non-mirrored databases on mirror members).   

Ray Fucillo · Oct 31, 2016

It's important to start by saying that mirroring already handles this automatically for the most common cases, and it is more the exceptional case that would require the original failover members to be rebuilt after no-partner promotion. As long as the original members really did go down in the disaster and the DR member is relatively up to date (a few seconds or even a few tens of seconds of data loss), then it is usually the case that the original members can reconcile automatically when they reconnect (as DR asyncs) to the new primary. That's because the state of the CACHE.DAT files on disk did not advance past the journal position from which the DR member took over. This is not a guarantee, but it is the case in most disasters that this is intended to cover.

The features Bob mentioned, which survey other reachable members automatically, help make sure that the DR member becoming primary has all the data that is possibly available to it at the time (while not preventing it from becoming primary if it cannot).

The main case where this automatic reconciliation cannot happen is if the failover member(s) got isolated but did not crash, or at least did not crash right away. In that case, if you choose to promote the DR member and accept this larger amount of data loss in the process, then indeed you expect the on-disk CACHE.DAT state to have advanced into a part of the journal that the DR member never had (and probably cannot get).

Regarding the enhancement you mention, there are no plans at the moment, though it's certainly a reasonable idea. 

Ray Fucillo · Aug 18, 2016

This is the latest maintenance release and I know of no bug like this there, so this needs to be investigated to understand the error.  To your original question, there is nothing special you need to do to run SQL SELECT against an async mirror member, even when its databases are read-only.

Ray Fucillo · Aug 18, 2016

There were problems like this in query compilation in some versions. What error do you get and what version are you using?