Article
Developer Community Admin · Oct 21, 2015

The European Space Agency: Charting the Galaxy with the Gaia Satellite and InterSystems Caché

Abstract: The European Space Agency (ESA) has chosen InterSystems Caché as the database technology for the AGIS astrometric solution that will be used to analyze the celestial data captured by the Gaia satellite. The Gaia mission is to create an accurate phase map of about a billion celestial objects. During the mission, the AGIS solution will iteratively refine the accuracy of Gaia's spatial observations, ultimately achieving accuracies on the order of 20 microarcseconds. In preparation for the extreme data requirements of this project, InterSystems recently engaged in a proof-of-concept project that required 5 billion discrete Java objects of about 600 bytes each to be inserted into the Caché database within a span of 24 hours. Running on one 8-core Intel 64-bit processor with Red Hat Enterprise Linux 5.5, Caché successfully ingested all the data in 12 hours and 18 minutes, at an average insertion rate of 112,000 objects/second. Attachment: gaia-crunching-data-to-map-milky-way.pdf
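The quoted throughput is easy to sanity-check from the figures in the abstract; a quick Python calculation:

```python
# Sanity-check the average insertion rate quoted in the proof of concept.
objects = 5_000_000_000          # discrete Java objects inserted
seconds = 12 * 3600 + 18 * 60    # 12 hours 18 minutes = 44,280 seconds
rate = objects / seconds
print(round(rate))               # 112918, i.e. the ~112,000 objects/second quoted
```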
Announcement
Janine Perkins · May 5, 2016

Featured InterSystems Online Course: Configuring a Mobile Device for Application Deployment

Learn how to push an application to both an iOS and an Android device. This course discusses the process needed to push an application to both iOS and Android devices. Learners can choose which device they would like to learn about, or they can step through both to understand the settings and steps needed to push to both device types. Learn More.
Announcement
Janine Perkins · Jul 29, 2016

Featured InterSystems Online Course: Getting Started with FHIR in HealthShare

Learn why and when to use FHIR in your applications, common use cases for it, the general architecture of FHIR data, and the tools available to you in InterSystems HealthShare Health Connect. Learn More.

Thank you for the online course! Very much needed. The course videos below are not available; I'm getting the error "No playable video sources found": What Is FHIR?; FHIR vs. HL7v2, HL7v3, and CDA; FHIR Architecture. Also, it would be nice to get access to the custom code shown in "FHIR in HealthShare Demonstration", specifically "Summit.BPL.V2ToFHIR". I'm sure many developers would like to know how to convert good old HL7v2 messages to FHIR resources using Health Connect.

Videos are not available.

We've re-uploaded the videos and they are working again. Please try the course again and let us know if there are any further problems.
Announcement
Janine Perkins · Jan 3, 2017

Featured InterSystems Online Resource: Custom Business Components Learning Path

Announcing the Custom Business Components learning path! This learning path is designed for software developers who need to build custom business components for their productions. The learning path includes the following courses:
1. Building Custom Ensemble Messages
2. Building Custom Business Operations
3. Building BPL Business Processes
4. Building Custom Business Services
5. Coming Soon: Building Custom Business Services
Access the learning path.

We're really excited about this learning path as it's something many of you have asked for! Have you taken the Building and Managing HL7 Productions classroom course but now need to create custom components? This is a great next step! As always, please let us know if you have other suggestions for courses, learning paths, or best practices that we can incorporate in future learning content.
Announcement
Janine Perkins · Mar 15, 2016

Featured InterSystems Online Courses: Searching Messages Using the Message Viewer

Have you ever needed to find a record for a particular person in your inbound data stream? Searching messages will enable you to find messages using an array of search capabilities. Searching Messages Using the Message Viewer introduces the Message Viewer, details how to access it, and describes the process for modifying the message list display. Learn More.

Nice little course, thanks!
Announcement
Janine Perkins · May 31, 2016

Featured InterSystems Online Course: HealthShare Information Exchange Basics*

Learn the basics of HealthShare Information Exchange, its architecture, and common ways it is used. Find out how to perform a patient search, and identify the main parts of HealthShare Information Exchange and the main purposes of each. Learn More. *This course is available to our HealthShare customers only.
Announcement
Evgeny Shvarov · Sep 15, 2017

Join InterSystems Developer Meetup on 17th of October in UK, Birmingham!

Hi, Community! We are pleased to invite you to the InterSystems UK Developer Community Meetup on 17th of October! The UK Developer Community Meetup is an informal meeting of developers, engineers, and DevOps specialists to discuss successes and lessons learnt from those building and supporting solutions with InterSystems products. It is an excellent opportunity to meet and discuss new solutions with like-minded peers, and to find out what's new in InterSystems technology. The Meetup will take place on 17th of October from 5pm to 8pm at The Belfry, Sutton Coldfield, with food and beverages supplied. Your stories are very welcome! Here is the current agenda for the event:

5:00 pm: Dependencies and Complexity, @John.Murray (georgejames.com)
5:30 pm: Developing modern web applications with Caché, Web Components & JSON-RPC, @Sean.Connelly (memcog.com)
6:00 pm: Networking Coffee break
6:30 pm: Up Arrow Redux: Persistence as a Language Feature, @Rob.Tweed (mgateway.com)
7:00 pm: First class citizens of the container world, @Luca.Ravazzolo (InterSystems Product Manager)

If you want to be a presenter, please comment on this post below and we'll contact you. All sessions are now filled. Attendees are also invited to join us the following day for the UK Technology Summit, the annual gathering of the InterSystems community to discuss the technologies, strategies, and methodologies that will leverage what matters: competitive advantage and business growth. Register for the Meetup here (link to http://www3.intersystems.com/its2017/registration) and select UK Developer Community Meet Up.

The topic from @rob.tweed has been introduced. We have one free slot available! And we will have a session regarding containers from @Luca.Ravazzolo, InterSystems Product Manager. Come to the InterSystems Data Platform UK Meetup and the InterSystems UK Summit! We will have a live stream in two hours. Join! We are live now! If you have any questions for a presenter, you can ask them online.
To accompany the YouTube video I have posted the slide deck for my talk (the first one) here.

Slides from @Luca.Ravazzolo's session are available here.

The slide deck for my presentation on "data persistence as a language feature" is here: https://www.slideshare.net/robtweed/data-persistence-as-a-language-feature

Slide #7. You touched on a very sore subject. How I understand you!

My presentation made reference to a Google V8 API bottleneck issue. Here's the link to the bug tracker report: https://bugs.chromium.org/p/v8/issues/detail?id=5144#c1 and the detailed benchmark tests that illustrate the problem: https://bugs.chromium.org/p/v8/issues/attachmentText?aid=240024

Here are the slides from the DeepSee Web session.
Article
Константин Ерёмин · Sep 18, 2017

Search InterSystems documentation using iKnow and iFind technologies

The InterSystems DBMS has a built-in technology for working with unstructured data called iKnow, and a full-text search technology called iFind. We decided to take a dive into both and make something useful. As a result, we have DocSearch: a web application for searching InterSystems documentation using iKnow and iFind.

How Caché documentation works. Caché documentation is based on DocBook technology. It has a web interface (which includes a search that uses neither iFind nor iKnow). The articles themselves are stored in Caché classes, which allows us to run queries against this data and, of course, to create our own search tool.

What are iKnow and iFind. InterSystems iKnow is a technology for analyzing unstructured data that provides access to this data by indexing the sentences and entities in it. To start the analysis, you first need to create a domain (a storage for unstructured data) and load a text into it. The iFind technology is a module of the Caché DBMS for performing full-text search in Caché classes. iFind uses many iKnow classes for intelligent text search. To use iFind in your queries, you need to add a special iFind index to your Caché class. There are three types of iFind indexes, each offering all the functions of the previous type plus some additional ones:
- Basic index (%iFind.Index.Basic): supports searching for words and word combinations.
- Semantic index (%iFind.Index.Semantic): supports searching for iKnow entities.
- Analytic index (%iFind.Index.Analytic): supports all iKnow functions of the semantic index, as well as information about paths and word proximity.
Since the documentation classes are stored in a separate namespace, the installer also maps the relevant packages and globals to make those classes available in our namespace.
Installer code for the mapping:

XData Install [ XMLNamespace = INSTALLER ]
{
<Manifest>
  <!-- Specify the name of the namespace -->
  <IfNotDef Var="Namespace">
    <Var Name="Namespace" Value="DOCSEARCH"/>
    <Log Text="Set namespace to ${Namespace}" Level="0"/>
  </IfNotDef>
  <!-- Check whether the namespace exists -->
  <If Condition='(##class(Config.Namespaces).Exists("${Namespace}")=1)'>
    <Log Text="Namespace ${Namespace} already exists" Level="0"/>
  </If>
  <!-- Create the namespace -->
  <If Condition='(##class(Config.Namespaces).Exists("${Namespace}")=0)'>
    <Log Text="Creating namespace ${Namespace}" Level="0"/>
    <!-- Create the database -->
    <Namespace Name="${Namespace}" Create="yes" Code="${Namespace}" Ensemble="" Data="${Namespace}">
      <Log Text="Creating database ${Namespace}" Level="0"/>
      <!-- Map the specified classes and globals to the new namespace -->
      <Configuration>
        <Database Name="${Namespace}" Dir="${MGRDIR}/${Namespace}" Create="yes" MountRequired="false" Resource="%DB_${Namespace}" PublicPermissions="RW" MountAtStartup="false"/>
        <Log Text="Mapping DOCBOOK to ${Namespace}" Level="0"/>
        <GlobalMapping Global="Cache*" From="DOCBOOK" Collation="5"/>
        <GlobalMapping Global="D*" From="DOCBOOK" Collation="5"/>
        <GlobalMapping Global="XML*" From="DOCBOOK" Collation="5"/>
        <ClassMapping Package="DocBook" From="DOCBOOK"/>
        <ClassMapping Package="DocBook.UI" From="DOCBOOK"/>
        <ClassMapping Package="csp" From="DOCBOOK"/>
      </Configuration>
      <Log Text="End creating database ${Namespace}" Level="0"/>
    </Namespace>
    <Log Text="End creating namespace ${Namespace}" Level="0"/>
  </If>
</Manifest>
}

The domain required for iKnow is built upon the table containing the documentation. Since we use a table as the data source, we'll use SQL.Lister. The content field contains the documentation text, so let's specify it as the data field. The rest of the fields will be described in the metadata.
Installer code for creating the domain:

ClassMethod Domain(ByRef pVars, pLogLevel As %String, tInstaller As %Installer.Installer) As %Status
{
    #Include %IKInclude
    #Include %IKPublic
    set ns = $Namespace
    znspace "DOCSEARCH"
    // Create the domain, or stop if it already exists
    set dname="DocSearch"
    if (##class(%iKnow.Domain).Exists(dname)=1) {
        write "The ",dname," domain already exists",!
        zn ns
        quit
    }
    else {
        write "The ",dname," domain does not exist",!
        set domoref=##class(%iKnow.Domain).%New(dname)
        do domoref.%Save()
    }
    set domId=domoref.Id
    // The Lister finds the sources corresponding to the records in the query results
    set flister=##class(%iKnow.Source.SQL.Lister).%New(domId)
    set myloader=##class(%iKnow.Source.Loader).%New(domId)
    // Build the query
    set myquery="SELECT id, docKey, title, bookKey, bookTitle, content, textKey FROM SQLUser.DocBook"
    set idfld="id"
    set grpfld="id"
    // Specify the fields for data and metadata
    set dataflds=$LB("content")
    set metaflds=$LB("docKey", "title", "bookKey", "bookTitle", "textKey")
    // Put all the data into the Lister
    set stat=flister.AddListToBatch(myquery,idfld,grpfld,dataflds,metaflds)
    if stat '= 1 {
        write "The lister failed: ",$System.Status.DisplayError(stat)
        quit
    }
    // Start the analysis process
    set stat=myloader.ProcessBatch()
    if stat '= 1 {
        quit
    }
    set numSrcD=##class(%iKnow.Queries.SourceQAPI).GetCountByDomain(domId)
    write "Done",!
    write "Domain contains ",numSrcD," source(s)",!
    zn ns
    quit
}

To search the documentation, we use the %iFind.Index.Analytic index:

Index contentInd On (content) As %iFind.Index.Analytic(LANGUAGE = "en", LOWER = 1, RANKERCLASS = "%iFind.Rank.Analytic");

Here contentInd is the name of the index and content is the name of the field we are creating the index for.
The LANGUAGE = "en" parameter sets the language of the text. The LOWER = 1 parameter turns off case sensitivity. The RANKERCLASS = "%iFind.Rank.Analytic" parameter enables the TF-IDF result ranking algorithm. After adding and building such an index, it can be used in SQL queries. The general syntax for using iFind in SQL is:

SELECT * FROM TABLE WHERE %ID %FIND search_index(indexname, 'search_items', search_option)

After creating the %iFind.Index.Analytic index with these parameters, several SQL procedures named [table_name]_[index_name][ProcedureName] are generated. In our project, we use two of them:

DocBook_contentIndRank returns the result of the TF-IDF ranking algorithm for a query:

SELECT DocBook_contentIndRank(%ID, 'SearchString', 'SearchOption') Rank FROM DocBook WHERE %ID %FIND search_index(contentInd, 'SearchString', 'SearchOption')

DocBook_contentIndHighlight returns the search results with the matched words wrapped in the specified tag:

SELECT DocBook_contentIndHighlight(%ID, 'SearchString', 'SearchOption', 'Tags') Text FROM DocBook WHERE %ID %FIND search_index(contentInd, 'SearchString', 'SearchOption')

I will go into more detail later in the article. Here is what we have in the end.

Autocomplete in the search field. As you start entering text into the search field, the system suggests possible query variants to help you find the necessary information quicker. These suggestions are generated on the basis of the word (or its beginning) that you type. The system shows the ten best matching words or phrases. This process uses iKnow, namely the %iKnow.Queries.EntityAPI GetSimilar method.

Fuzzy string search. iFind supports fuzzy search for finding words that almost match the search query. This is achieved by measuring the Levenshtein distance between two words.
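The article doesn't show iFind's internals; as a rough illustration of the edit-distance measure the fuzzy search relies on, here is a minimal, generic Python implementation (not InterSystems' code):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimal number of one-character insertions, deletions, or
    substitutions needed to turn a into b (classic dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete from a
                            curr[j - 1] + 1,             # insert into a
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]

# A query with a single typo stays within the distance-1 budget
# that the documentation search uses (search_option '3:1'):
print(levenshtein("ifind", "ifund"))  # 1
```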
The Levenshtein distance is the minimal number of one-character changes (insertions, removals, or replacements) necessary for turning one word into another. It can be used for correcting typos, small variations in writing, and different grammatical forms (plural and singular, for example). In iFind SQL queries, the search_option parameter is responsible for fuzzy search: search_option = 3 denotes a Levenshtein distance of 2. To set a Levenshtein distance equal to n, you need to set the search_option parameter to '3:n'. The documentation search uses a Levenshtein distance of 1, so let's demonstrate how it works. Let's type "ifind" in the search field. Now let's try a fuzzy search by intentionally making a typo. As we can see, the search corrected the typo and found the necessary articles.

Complex searches. Since iFind supports complex queries with brackets and the AND, OR, and NOT operators, we were able to implement complex search functionality. Here's what you can specify in your query: a word, a word combination, one of several words, or exclusions. Fields can be filled one by one, or all at once. For example, let's find articles containing the word "iknow", the combination "rest api", and either "domain" or "UI". We can see that there are two such articles. Please note that the second one mentions Swagger UI, so we can modify the query so that it excludes articles that do not contain the word Swagger. As a result, we will find only one article.

Search results highlighting. As stated above, the iFind index generates the DocBook_contentIndHighlight procedure. Let's use the following:

SELECT DocBook_contentIndHighlight(%ID, 'search_items', '0', '<span class=""Illumination"">', 0) Text FROM DocBook

This returns the resulting text wrapped in the <span class=""Illumination""> tag, which helps to visually mark search results on the front end.

Search results ranking. iFind is capable of ranking results using the TF-IDF algorithm.
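As a rough illustration of what that ranking weight computes, here is a generic TF-IDF sketch in Python (a textbook smoothed variant, not the %iFind.Rank.Analytic implementation):

```python
import math

def tf_idf(term: str, doc: list, corpus: list) -> float:
    """Weight grows with the term's frequency in the document and
    shrinks with the number of documents that contain the term."""
    tf = doc.count(term) / len(doc)                 # term frequency
    df = sum(1 for d in corpus if term in d)        # document frequency
    idf = math.log(len(corpus) / (1 + df)) + 1      # smoothed inverse document frequency
    return tf * idf

docs = [["ifind", "index", "search", "ifind"],
        ["search", "viewer", "message"],
        ["search", "domain", "iknow"]]
# "ifind" appears in only one document, so it outweighs the ubiquitous "search":
print(tf_idf("ifind", docs[0], docs) > tf_idf("search", docs[0], docs))  # True
```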
TF-IDF is often used in text analysis and search tasks, for example as a criterion of the relevance of a document to a search query. In the result of the SQL query, the Rank field contains a weight that is proportional to the number of times the word is used in an article and inversely proportional to the frequency of the word's occurrence in other articles.

SELECT DocBook_contentIndRank(%ID, 'SearchString', 'SearchOption') Rank FROM DocBook WHERE %ID %FIND search_index(contentInd, 'SearchString', 'SearchOption')

Integration with the official documentation search. After installation, a "Search using iFind" button is added to the official documentation search. If the "Search words" field is filled, clicking the "Search using iFind" button takes you to the search results page. If the field is empty, you are taken to the new search page.

Installation:
1. Download the Installer.xml file from the latest release available on the corresponding page.
2. Import the downloaded Installer.xml file into the %SYS namespace and compile it.
3. Enter the following command in the terminal in the %SYS namespace: do ##class(Docsearch.Installer).setup(.pVars)
After that, the search will be available at localhost:[port]/csp/docsearch/index.html

Demo. An online demo of the search is available here.

Conclusion. This project demonstrates interesting and useful capabilities of the iFind and iKnow technologies that make data search more relevant. Any comments or suggestions will be highly appreciated. The entire source code, with the installer and the deployment guide, is available on GitHub.

Hi Konstantin, thanks for sharing your work, a nice application of iFind technology! If I can add a few ideas to make this more lightweight: rather than creating a domain programmatically, the recommended approach for a few versions now has been to use Domain Definitions.
They allow you to declare a domain in an XML format (not unlike the %Installer approach) and avoid a number of inconveniences in managing your domain in a reproducible way. From reading the article, I believe you're just using the iKnow domain for that one EntityAPI:GetSimilar() call to generate search suggestions. iFind has a similar feature, also exposed through SQL, through %iFind.FindEntities() and %iFind.FindWords(), depending on what kind of results you're looking for. See also this iFind demo. With that in place, you may even be able to skip those domains altogether :-)
Thanks, Benjamin

Thank you, Benjamin. I will keep your ideas in mind. Thanks, Konstantin Eremin.

Thanks for posting this, Konstantin. For a long time I have been wondering why InterSystems hadn't done this already.

I've had something simple running on my laptop a long time ago already, but the internal discussion on how to package it proved a little more complicated. Among other things, an iFind index requires an iKnow-enabled license (and more space!), which meant you couldn't simply include it in every kit. Also, for the ranking of DocBook results, applying proper weights based on the type of content (title / paragraph / sample / ...) was at least as important as the text search capabilities themselves. That latter piece has been well addressed in 2017.1, so DocBook search is in pretty good shape now. Blending in an easily deployable iFind option such as Konstantin published can only add to this! Thanks, Benjamin

Hi, Konstantin! I tried searching for $Case: the word is found, but a strange option shows up in the dropdown list of the search field. See the screenshot. What does it mean?

Hi, Evgeny! I used iKnow entities as the words in the dropdown list of the search field. iKnow thinks "$case( $extract( units, 1" is an entity, because it looks somewhat strange. But I would like to switch to %iFind.FindEntities() (an idea from Benjamin DeBoe's first comment) for the words in the dropdown list of the search field after a short time.
I think it will fix this.

iKnow was written to analyze English rather than ObjectScript, so you may see a few odd results coming out of code blocks. I believe you can add a WHERE clause excluding those records from the block table to avoid them.

Now I use %iFind.FindEntities to get the words in the dropdown list of the search field. Installation has become faster than before, because I no longer use the domain building process.

Hi, Konstantin! The problem with strange suggestions is fixed, but it doesn't suggest anything for $CASE now :) Did you put $CASE on a blacklist? :) I think suggestions for all COS commands and functions would be a good option for the search field (if possible, of course).

Hi, Evgeny! Yes, I agree with you about COS commands in the dropdown list of the search field. I had some problems with COS commands and functions, but now I have fixed it.

Hi Konstantin, can we install this project on Caché 2016.2, or does it need 2017? I tried to install offline (because my server cannot get through to GitHub (443)) and the installation failed with several errors. Maybe I need more specific instructions for an offline install? Uri

Hi Uri! You need Caché 2017. Konstantin

Hi, Konstantin! When I search documentation with your online tool, which version of the documentation does it work with? Would you please add the version of the product to the results, or somewhere? Thanks in advance!

Hi, Evgeny! I will add the version of the product to the results in the near future.

Hi, Evgeny! I have added the version of the documentation to the results. Konstantin

Thanks, Konstantin! And here is the link to the demo. Do you want to add an option to share the search? E.g. introduce a share-results button in the UI which would provide a URL with the search option added. It would be very handy if you want to share search results with a colleague.

Good day, I would very much like to install this example on my local instance. However, I cannot find Installer.xml on the "corresponding page". Which is the "corresponding page", please?
I downloaded the solution from GitHub, but there is no Installer.xml there either. I will appreciate it if you can point me to the "corresponding page" where the Installer.xml is. Thank you in advance.

Hi Elize! It's in the releases.
Announcement
Janine Perkins · Feb 23, 2016

Featured InterSystems Online Course: Integration Architecture: Ensemble and HealthShare

Are you going to be building an integration with Ensemble or HealthShare? Take the following course to learn about the core architecture of building an integration, the parts and pieces involved, and the most common ways that data flows through that architecture. Integration Architecture: Ensemble and HealthShare. Learn about the basic architecture of InterSystems' solution for integration. This course is for people who have purchased either Ensemble or HealthShare. It covers the main parts of an integration, each part's function, and how data flows through the architecture. Learn More.
Article
Evgeny Shvarov · Apr 16, 2016

Try catch block I usually use in InterSystems ObjectScript

Hi! I want to share with you a code snippet of the try-catch block I usually use in methods that should return a %Status.

    {
        try {
            $$$TOE(sc,StatusMethod())
        } catch e {
            set sc=e.AsStatus()
            do e.Log()
        }
        quit sc
    }

Here $$$TOE is a short form of the $$$THROWONERROR macro. Inside the macro, StatusMethod is any method you call that returns a %Status value. This value is placed into the sc variable. If sc contains an error, execution is routed to the catch block. You can wrap any %Status method calls in your code this way if you need to catch the errors coming from them. In the catch block I place my logic and two mandatory calls:

set sc=e.AsStatus() to get the status of the error.

do e.Log() to place the full error stack into the standard Application Error Log, which you can find in the Management Portal on this page: http://localhost:57772/csp/sys/op/UtilSysAppErrorNamespaces.csp?$NAMESPACE=SAMPLES&Recent=1

How do you handle errors in your COS logic?

I often do this too; I think this sort of pattern is a good way to make the error handling simple and consistent.

I'm not really a fan of status codes. They seem like a legacy item from the days prior to try/catch, when traps were used. Having to maintain code to handle possible exceptions AND "errors" means more control flow to check and handle problems through the various levels of code. Say that code above is within a method that's within another method that's called from the main part of the program. Now you need three levels of try/catch. That's a lot of extra lines of code (a code smell), lost errors, and performance degradation. Instead, I encourage my developers to use the fail-fast methodology. We use try/catch as high as possible in the code, or when absolutely necessary, like when something is prone to exceptions but shouldn't stop the overall program from running.

I agree with you. As mentioned, it is an option if you need to use %Status as a method result. And what about e.Log()? Do you use it to log errors, or something else?
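For readers coming from other languages, the pattern in the original snippet (convert an error status into an exception at the call site, then convert back to a status at the method boundary) can be sketched in Python; all names here are invented for illustration:

```python
class StatusError(Exception):
    """Carries an error status, the way e.AsStatus() recovers one in ObjectScript."""
    def __init__(self, status):
        super().__init__(status)
        self.status = status

def throw_on_error(status):
    """Analog of $$$THROWONERROR: re-raise any non-OK status as an exception."""
    if status != "OK":
        raise StatusError(status)
    return status

def some_status_method():
    """Hypothetical status-returning call."""
    return "ERROR: lock failed"

def caller():
    status = "OK"
    try:
        throw_on_error(some_status_method())
    except StatusError as e:
        status = e.status       # analog of set sc = e.AsStatus()
        # logging would go here (analog of do e.Log())
    return status               # analog of quit sc

print(caller())  # ERROR: lock failed
```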
Status codes still have a place alongside try/catch, in my opinion. They really only serve to indicate the ending state of the method called, which is not necessarily an error. I agree that throwing an exception for truly fatal errors is the best and most efficient error handling method. The issue is: what does "truly fatal" mean? There can be a lot of grey area to contend with. There are methods where the calling program needs to determine the correct response. For example, take a method that calculates commission on a sale. Clearly a failure there is a serious problem on a sales order. However, it is less of an issue on a quotation; in that case the process may simply desire to return an empty commissions structure. Placement of try/catch blocks is a separate conversation. Personally, I find using try/catch blocks for error handling to be clean and efficient. The programs are easier to read, and any recovery can be consolidated in one place, either in or right after the catch. I have found that any performance cost is unnoticeable in a typical transactional process. It surely beats adding IF statements to handle the flow. For readability and maintainability, I also dislike QUITing from a method or program in multiple places. So where is the "right" place for a try/catch? If I had to pick one general rule, I would say you should put the try/catch anywhere a meaningful recovery from the error/exception can be done, as close to the point where the error occurred as possible. In the above example of a commission calculation method, I would not put a try/catch in the method itself, since the method cannot perform any real recovery. However, I would put one in the sales order and quotation code. There are many ways to manage program flow in error/exception situations; try/catch, and Quit and Continue in loops, are a couple off the top of my head.
Used appropriately, they can create code that is robust, readable, and maintainable with little cost in performance.

I agree with both Nic and Rich. The issue with e.Log() is that it clutters up the error log with repetitive entries, each subsequent one with less detail than the prior. The same thing happens in Ensemble when errors bubble through the host jobs. The trick here is knowing when to log an error versus when to just bubble it up. Using Nic's method we lose the context of the stack, since the log isn't written until the entry method with the try/catch. Using your method we get noise in the logs, but at least the first entry has the detail we'll need. I believe the root problem here is re-throwing the status. An exception should represent something fatal and usually out of the application's control (e.g. FILEFULL), while a %Status is a call success indicator. To that end, your code could be refactored to just Quit on an error instead of throwing it. That way a single log entry is generated in the method that triggered the exception, and the stack is accurate. However, this doesn't work well in nested loops. In that case a Return would work, unless there is a cleanup chore (e.g. releasing locks, closing transactions, etc.). I haven't come up with a pattern for nested loops that doesn't clutter up the source with a bunch of extra $$$ISERR checks that are easily missed and lead to subtle bugs. Personally, I use your style without the logging, because:
- Every method uses the same control structure.
- It works with nested loops without extra thought.
- Errors can be ignored by simply invoking with Do instead of $$$TOE/$$$ThrowOnError.
- Cleanup code is easy to find or insert.
- Using ByRef/Output parameters makes it trivial to refactor to return more than one value.
I do lose the ability to see an accurate stack trace, but most of the time the line reference in the error is enough for me to quickly debug an issue, so it is an acceptable trade-off.
Only in the most trivial methods is the try/catch skipped. All that said, Nic's style is a solid approach too. By removing a bunch of boilerplate try/catch code, it lets the programmer focus on the logic and makes the codebase much easier on the eyes.

We use various methods and macros to log exceptions; it just depends on the situation. I see where you are coming from with having a status code returned to signal that a problem exists. Personally, I'd calculate commissions in a separate single-responsibility class that extends an abstract class for interface segregation. That class implements three public methods: Calculate(), HasError(), GetError().

    // calculate the commission:
    Set tCommission = tUsedCarCommission.Calculate()
    If tUsedCarCommission.HasError() // log the error or do something with it

It's very similar to what you'd do with traditional status types, but without having to deal with passing in references. Also, IMO, it's very clear what's going on if the commission has an error.

OK. But how do you call methods which return a %Status? Do you raise an error? Do you check the error status with an If? Or do you ignore the status altogether?

I will leave the logging issue alone, as I don't see it as the main point of the example; it could also be a thread by itself. The issue of using a bunch of $$$ISERR or other error condition checks is exactly why I like using Throw with try/catch. I disagree that it should only be for errors outside of the application's control, though it is true that most of the time you are dealing with a fatal error.
Fatal, that is, to the current unit of work being performed, not necessarily to the entire process. I will often use code like:

    set RetStatus = MyObj.Method()
    throw:$$$ISERR(RetStatus) ##class(%Exception.StatusException).CreateFromStatus(RetStatus)

The post-conditional on the Throw can take many forms; this is just one example. Where I put the try/catch depends on many factors, such as:
- Where do I want recovery to happen?
- Use of the method and my code
- Readability and maintainability of the code
- ...
In the case of the nested loops mentioned, I think this is a great way to abort the process and return to a point, whether in this program or one farther up the stack, where the process can be cleanly recovered or aborted.

Most of my methods look like this:

Method MyMethod() As %Status
{
    // Initialization - note locking and transactions only when needed
    Set sc = $$$OK
    Lock +^SomeGlobal:5
    If '$TEST Quit $$$ERROR($$$GeneralError,"Failed to obtain lock")
    Try {
        TSTART
        // When the error is significant
        $$$ThrowOnError(..SomeMethod())
        // When the error can be ignored
        Do ..SomeMethod()
        // When only certain errors apply
        Set sc = ..SomeMethod()
        If $$$ISERR(sc),$SYSTEM.Status.GetErrorText(sc)["Something" $$$ThrowStatus(sc)
        TCOMMIT
    } Catch e {
        TROLLBACK
        Set sc = e.AsStatus()
    }
    Lock -^SomeGlobal
    Quit sc
}

Well, I think a major question is: what do you use to return runtime information to your caller when you implement your own code? Do you return a %Status object, or something similar, or do you throw exceptions and not return anything? Most code snippets I have seen here make use of try/catch, but still return a status code. Personally, I prefer to use try/catch blocks and throw errors when I encounter issues at runtime. The try/catch philosophy is optimized for the case where everything goes well and exceptions are the, well, exception.
Handling a status object is not as clean from a code maintenance perspective (more lines of code within your logic), but it allows you to handle multiple different scenarios at once (it was okay/not okay, and this happened...). Obviously, this is my personal preference.

What do you do when you need to clean up things like locks? Put the unlock code both in the Catch and before the Quit?

Well, that depends on where you took the lock. In your previous example you take a lock right before the try block, so you can release it directly after the try/catch block. If you take a lock in your try block, you have to put the unlock code both in the catch block and at the end of the try block; in that case I would not place the unlock code outside of the try/catch block. This is a case where a try/catch/finally construct would definitely help, as you could place the unlock code in the finally block.

Other than locks, there are a few other cases where cleanup may be needed whether or not something goes wrong:

- Closing SQL cursors that have been opened
- Ensuring that the right IO device is in use and/or returning to the previous IO redirection state

There are probably more of these too.

Here's the convention we use for error handling, logging, and reporting in InSync (a large Caché-based application):

We have TSTART/TCOMMIT/TROLLBACK in a try/catch block at the highest level (typically a ClassMethod in a CSP/Zen page). There isn't much business logic in here; it'll call a method in a different package. If anything goes wrong in the business logic, an exception is thrown. The classes with the business logic don't have their own try/catch blocks unless one is needed to close SQL cursors, etc. in the event of an exception. After the cleanup is done, the exception is re-thrown. (Unfortunately, this means that cleanup code may be duplicated between the try and catch blocks, but there's typically not too much duplication.)
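Since ObjectScript has no finally block, the first pattern described above (lock taken before the Try block, released right after the Try/Catch) can be sketched as follows. This is a minimal illustration only; ^SomeGlobal and DoWork are placeholder names:

```
Method DoWork() As %Status
{
    Set sc = $$$OK
    // Acquire the lock before the Try block, with a 5-second timeout
    Lock +^SomeGlobal:5
    If '$TEST Quit $$$ERROR($$$GeneralError,"Failed to obtain lock")
    Try {
        // ... work that may throw ...
    }
    Catch e {
        Set sc = e.AsStatus()
    }
    // Cleanup runs on both the success and the error path,
    // because the Catch block falls through to this point
    Lock -^SomeGlobal
    Quit sc
}
```

If the lock were instead acquired inside the Try block, the `Lock -` would need to appear both at the end of the Try block and in the Catch block, as noted above.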
The classes with business logic also don't have their own TSTART/TCOMMIT/TROLLBACK commands, unless the business logic is a batch process in which parts of the process may fail and be corrected later without impacting the whole thing; such a case may also call for a nested try/catch to do the TROLLBACK if something goes wrong in part of the batch. In this case the error is recorded rather than re-throwing the exception.

We have our own type of exception (extending %Exception.AbstractException), and macros to create exceptions of this type from:

- Error %Status codes
- Error SQLCODEs and messages (SQLCODE = 100 can be treated as an error, "alert", or nothing)
- Other types of exceptions

Exceptions of our custom type can also be created to represent a general application error not related to one of those things: either a fatal error, or something the user can/should fix, e.g., invalid data or missing configuration. The macros for throwing these exceptions also allow the developer to provide a localizable, user-friendly message to explain what went wrong.

When an exception is caught in the top-level try/catch (or perhaps in a nested try/catch in a batch process), we have a macro that logs the exception and turns it into a user-friendly error message. This might just be a general message, like "An internal error occurred (log ID _______)" - the user should never see <UNDEFINED>, SQLCODE -124: DETAILS ABOUT SOME TABLE, etc.

Our persistent classes may include an XData block with localizable error messages corresponding to foreign and unique keys in the class and the types of violations of those keys. For %Status codes and SQLCODEs corresponding to foreign/unique key violations, the user-friendly error message is determined based on this metadata.

Logging for these exceptions is configurable; for example, exceptions representing something the user can/should fix are not logged by default, because they're not an error in the application itself.
Also, the log level is configurable - it might be all the gory detail from LOG^%ETN, or just the stack trace. Typically, verbose logging would only be enabled system-wide briefly for specific debugging tasks. For SQL errors, the SQL statement itself is logged if possible.

I thought this convention was too complicated when I first started working with it, but I have come to see that it is very elegant. One possible downside is that it relies on the convention that any method in a particular package (InSyncCode, in our case) might throw an exception - if that isn't respected in the calling code, there's a risk of a <THROW> error.

I mentioned the InSync approach previously on https://community.intersystems.com/post/message-error-csppage . Unfortunately, it's coupled with several parts of the application, so it'd be quite a bit of work to extract and publish the generally applicable parts. I'd like to do that at some point, though.

> $SYSTEM.Status.GetErrorText(sc)["Something"

This would not always work correctly in applications with users requesting content in several languages (for example, a web app). Why not use error codes?

 If $SYSTEM.Status.GetErrorCodes(sc)[$$$GeneralError $$$ThrowStatus(sc)

Timothy! Thanks for sharing this! It could be a standalone post, such as "Error and resource handling in a large Caché ObjectScript project". Thank you!

Agreed - GetErrorCodes() is the right thing to do from an I18N perspective. For more advanced error analysis, such as conversion of error %Status-es into user-friendly messages (as I described in another comment), $System.Status.DecomposeStatus will provide the parameters of the error message as well. These are substituted into the localizable string.
For example, here's a foreign key violation message from %DeleteId on a system running in Spanish:

 INSYNC>Set tSC = ##class(Icon.DB.CT.TipoDocumento).%DeleteId(50)
 INSYNC>k tErrorInfo d $System.Status.DecomposeStatus(tSC,.tErrorInfo) zw tErrorInfo
 tErrorInfo=1
 tErrorInfo(1)="ERROR #5831: Error de Foreign Key Constraint (Icon.DB.CC.AllowedGuaranteeTypes) sobre DELETE de objeto en Icon.DB.CT.TipoDocumento: Al menos existe 1 objeto con referencia a la clave CTTIPODOCUMENTOPK"
 tErrorInfo(1,"caller")="zFKTipoDocDelete+4^Icon.DB.CC.AllowedGuaranteeTypes.1"
 tErrorInfo(1,"code")=5831
 tErrorInfo(1,"dcode")=5831
 tErrorInfo(1,"domain")="%ObjectErrors"
 tErrorInfo(1,"namespace")="INSYNC"
 tErrorInfo(1,"param")=4
 tErrorInfo(1,"param",1)="Icon.DB.CC.AllowedGuaranteeTypes"
 tErrorInfo(1,"param",2)="Icon.DB.CT.TipoDocumento"
 tErrorInfo(1,"param",3)="DELETE"
 tErrorInfo(1,"param",4)="CTTIPODOCUMENTOPK"
 tErrorInfo(1,"stack")=...

The "param" array allows clean programmatic access to the details of the foreign key violation, independent of language. Of course, this level of detail in error messages may be subject to change across Caché versions, so this is a *great* thing to cover with unit tests if your application relies on it.

I prefer doing this more correctly, using the right API for matching status values:

 If $system.Status.Equals(sc,$$$ERRORCODE($$$GeneralError),$$$ERRORCODE($$$MoreSpecificStatusCode),...) {
     // Handle specific error(s)
 }

The use of the contains operator `[` can result in unexpected behaviour when you are checking for a 4-digit code but are handling a 3-digit code. Note that it's also safer to wrap status codes in `$$$ERRORCODE($$$TheErrorCode)`, but that may not be necessary depending on the specific context.

A rather subtle point that I haven't seen discussed here is how TSTART/TCOMMIT/TROLLBACK should be handled when triggering a TROLLBACK from code that may be part of a nested transaction.
Given that a lot of the code I write may be called from various contexts, and those calling contexts may already have open transactions, my preferred transaction pattern is as follows:

 Method SaveSomething() As %Status
 {
     Set tStatus = $$$OK
     Set tInitTLevel = $TLevel
     Try {
         // Validate input before opening the transaction so you can exit before incurring any major overhead
         TSTART
         // Do thing 1
         // Do thing 2
         // Check status or throw exception
         TCOMMIT
     }
     // Handle the exception locally due to local cleanup needs
     Catch ex {
         Set tStatus = ex.AsStatus()
     }
     While ($TLevel > tInitTLevel) {
         // Only roll back one transaction level at a time, so the caller
         // can decide whether the whole transaction should be rolled back
         TROLLBACK 1
     }
     Quit tStatus
 }

(Edited for formatting.)

What's the advantage of using the $$$ERRORCODE macro? The ^%qCacheObjectErrors global contains the same values.
Announcement
Anastasia Dyubaylo · Jun 14, 2017

Video of the Week: InterSystems iKnow Technology. A Cure for Clinician Frustration

Hi Community! Enjoy the video of the week about InterSystems iKnow Technology: A Cure for Clinician Frustration. In this video, learn why iKnow capabilities are critical for getting the most out of your investments in electronic health records and for improving information access for clinicians. You are very welcome to watch all the videos about iKnow in the dedicated iKnow playlist on the InterSystems Developers YouTube Channel. Enjoy!
Question
Tom Philippi · Jan 31, 2018

DSN does not show up on InterSystems Ensemble SQL Gateway configuration.

I am running InterSystems Ensemble 2016.2 on Ubuntu and trying to connect to a remote MS SQL Server database. So far, I have successfully configured my Ubuntu machine to connect to the remote MS SQL Server database using unixODBC. That is:

- The telnet connection works
- The tsql (test SQL) connection works
- The isql command successfully connects to SQL Server and I am able to execute queries on Ubuntu

The DSNs for the isql command are defined in /etc/odbc.ini and /etc/odbcinst.ini and should be available system-wide. The DSN in odbcinst.ini uses the Microsoft ODBC Driver 13 for SQL Server for Linux. However, when I access the SQL Gateway in the Management Portal, the DSN configured in /etc/odbc.ini does not show up. Does anyone know how I can expose my DSN defined in /etc/odbc.ini to Ensemble? I already tried creating a symlink in the /intersystems/mgr directory named cacheodbc.ini (as described here: https://groups.google.com/forum/#!topic/intersystems-public-cache/4__XchiaCQU), but so far no success :(.

The first thing I'd check are the permissions on these files. If you created them as root, they might not be readable for other users?
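For reference, a minimal unixODBC setup for this scenario might look like the following sketch. The DSN name, server, database, and driver path are placeholders; the actual driver path depends on where the msodbcsql package installed the library on your system:

```
# /etc/odbcinst.ini -- driver registration
[ODBC Driver 13 for SQL Server]
Description = Microsoft ODBC Driver 13 for SQL Server
Driver      = /opt/microsoft/msodbcsql/lib64/libmsodbcsql-13.1.so.9.2

# /etc/odbc.ini -- system-wide DSN
[MySqlServerDSN]
Driver   = ODBC Driver 13 for SQL Server
Server   = sqlserver.example.com,1433
Database = MyDatabase
```

One thing worth checking is whether the environment that starts the instance actually points at these files; with unixODBC, the ODBCINI and ODBCSYSINI environment variables control which ini files are read, so exporting them (e.g. ODBCINI=/etc/odbc.ini, ODBCSYSINI=/etc) before starting the instance is a reasonable thing to try.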
Article
Vasiliy Bondar · Oct 14, 2018

Configuring LDAP authentication in InterSystems Caché using Microsoft Active Directory

At first glance, the task of configuring LDAP authentication in Caché is not hard at all: the manual describes the process in just 6 paragraphs. On the other hand, if the LDAP server uses Microsoft Active Directory, there are a few non-obvious things that need to be configured on the LDAP server side. Those who don't do anything like that on a regular basis may get lost in Caché settings. In this article, we will describe the step-by-step process of setting up LDAP authentication and cover the diagnostic methods that can be used if something doesn't work as expected.

Configuration of the LDAP server

1. Create a user in Active Directory that we will use to connect to Caché and search for information in the LDAP database. This user must be located in the domain's root.
2. Create a special organizational unit for users who will be connecting to Caché and call it ldapCacheUsers.
3. Register users there.
4. Test the availability of the LDAP database using a tool called LdapAdmin. You can download it here.
5. Configure the connection to the LDAP server.
6. All right, we are connected now. Let's take a look at how it all works.
7. Since users that will be connecting to Caché are in the ldapCacheUsers unit, let's limit our search to this unit only.

Settings on the Caché side

8. The LDAP server is ready, so let's proceed to configuring the settings on the Caché side. Go to Management Portal -> System Administration -> Security -> System Security -> LDAP Options. Clear the "User attribute to retrieve default namespace", "User attribute to retrieve default routine" and "User attribute to retrieve roles" fields, since these attributes are not in the LDAP database yet.
9. Enable LDAP authentication in System Administration -> Security -> System Security -> Authentication/CSP Session Settings.
10. Enable LDAP authentication in services. The %Service_CSP service is responsible for connecting web applications; %Service_Console handles connections through the terminal.
11. Configure LDAP authentication in web applications.
12. For the time being, and for testing the connection, configure everything so that new users in Caché have full rights. To do this, assign the %All role to the user _PUBLIC. We will address this aspect later.
13. Try opening the configured web application; it should open without problems.
14. The terminal also opens.
15. After connecting, LDAP users will appear in the Caché user list.
16. The truth is, this configuration gives all new users complete access to the system. To close this security hole, we need to modify the LDAP database by adding an attribute that will store the name of the role to be assigned to users after they connect to Caché. Before that, make a backup copy of the domain controller to ensure that we don't break the entire network if something goes wrong during configuration.
17. To modify the Active Directory schema, install the Active Directory schema snap-in on the server where Active Directory is installed (it is not installed by default). Read the instructions here.
18. Create an attribute called intersystems-Roles, OID 1.2.840.113556.1.8000.2448.2.3, a case-sensitive, multi-valued string.
19. Then add this attribute to the class "user".
20. Let's now make it so that when we view the list of unit users, we see a "Role in InterSystems Cache" column. To do that, click Start -> Run and type "adsiedit.msc". Connect to the "Configuration" naming context.
21. Go to the CN=409, CN=DisplaySpecifiers, CN=Configuration container and choose a container type that will show additional user attributes when we view it. Let's choose unit-level display (OU) provided by the organizationalUnit-Display container. Find the extraColumns attribute in its properties and change its value to "intersystems-Roles,Role in IntersystemsCache,1,200,0". The rule for composing the attribute value is: attribute name, name of the destination column, display by default or not, column width in pixels, reserved value. One more comment: CN=409 denotes a language code (CN=409 for the English version, CN=419 for the Russian version of the console).
22. We can now fill out the name of the role that will be assigned to all users connecting to Caché. If your Active Directory is running on Windows Server 2003, you won't have any built-in tools for editing this field; you can use LdapAdmin (see step 4) to edit the value of this attribute. If you have a newer version of Windows, this attribute can be edited in the "Additional features" mode: the user will see an additional tab for editing attributes.
23. After that, specify the name of this attribute in the LDAP options in the Caché Management Portal.
24. Create an ldapRole with the necessary privileges.
25. Remove the %All role from the user _PUBLIC.
26. Everything is set up; let's try connecting to the system.
27. If it doesn't work right away, enable and set up auditing.
28. Audit settings.
29. Look at the error log in the audit database.

Conclusion

In reality, it often happens that configuring different roles for different users is not required for working in an application. If you only need to assign a particular set of permissions to users logging in to a web application, you can skip steps 16 through 23. All you need to do is add these roles on the "Application roles" tab in the web application settings and remove all types of authentication except LDAP. In this case, only users registered on the LDAP server can log in. When such a user logs in, Caché automatically assigns the roles required for working in this application.

I wanted to add that you certainly can create an attribute to list a user's roles as described here, and some sites do, but it's not the only way to configure LDAP authentication.
Many administrators find the group-based behavior enabled by the "Use LDAP Groups for Roles/Routine/Namespace" option easier to configure, so you should consider that option if you're setting up LDAP authentication. If you do use that option, many of the steps here will be different, including at least steps 17-23, where the attribute is created and configured.

Yes, I agree. Thanks for the addition.

Thank you for sharing. Good job.
Article
Mark Bolinsky · Oct 12, 2018

InterSystems IRIS Example Reference Architectures for Google Cloud Platform (GCP)

Google Cloud Platform (GCP) provides a feature-rich environment for Infrastructure-as-a-Service (IaaS), a cloud offering fully capable of supporting all InterSystems products, including the latest InterSystems IRIS Data Platform. Care must be taken, as with any platform or deployment model, to ensure all aspects of an environment are considered, such as performance, availability, operations, and management procedures. Specifics of each of those areas will be covered in this article. The following overview and details are provided by Google and can be found here.

Overview

GCP Resources

GCP consists of a set of physical assets, such as computers and hard disk drives, and virtual resources, such as virtual machines (VMs), that are contained in Google's data centers around the globe. Each data center location is in a global region. Each region is a collection of zones, which are isolated from each other within the region. Each zone is identified by a name that combines a letter identifier with the name of the region. This distribution of resources provides several benefits, including redundancy in case of failure and reduced latency by locating resources closer to clients. This distribution also introduces some rules about how resources can be used together.

Accessing GCP Resources

In cloud computing, physical hardware and software become services. These services provide access to the underlying resources. When you develop your InterSystems IRIS-based application on GCP, you mix and match these services into combinations that provide the infrastructure you need, and then add your code to enable the scenarios you want to build. Details of the available services can be found here.

Projects

Any GCP resources that you allocate and use must belong to a project. A project is made up of the settings, permissions, and other metadata that describe your applications.
Resources within a single project can work together easily, for example by communicating through an internal network, subject to the regions-and-zones rules. The resources that each project contains remain separate across project boundaries; you can only interconnect them through an external network connection.

Interacting with Services

GCP gives you three basic ways to interact with the services and resources.

Console

The Google Cloud Platform Console provides a web-based, graphical user interface that you can use to manage your GCP projects and resources. When you use the GCP Console, you create a new project or choose an existing project, and use the resources that you create in the context of that project. You can create multiple projects, so you can use projects to separate your work in whatever way makes sense for you. For example, you might start a new project if you want to make sure only certain team members can access the resources in that project, while all team members can continue to access resources in another project.

Command-line Interface

If you prefer to work in a terminal window, the Google Cloud SDK provides the gcloud command-line tool, which gives you access to the commands you need. The gcloud tool can be used to manage both your development workflow and your GCP resources. gcloud reference details can be found here.

GCP also provides Cloud Shell, a browser-based, interactive shell environment for GCP. You can access Cloud Shell from the GCP Console. Cloud Shell provides:

- A temporary Compute Engine virtual machine instance
- Command-line access to the instance from a web browser
- A built-in code editor
- 5 GB of persistent disk storage
- Pre-installed Google Cloud SDK and other tools
- Language support for Java, Go, Python, Node.js, PHP, Ruby and .NET
- Web preview functionality
- Built-in authorization for access to GCP Console projects and resources
Client Libraries

The Cloud SDK includes client libraries that enable you to easily create and manage resources. GCP client libraries expose APIs for two main purposes:

- App APIs provide access to services. App APIs are optimized for supported languages, such as Node.js and Python. The libraries are designed around service metaphors, so you can work with the services more naturally and write less boilerplate code. The libraries also provide helpers for authentication and authorization. Details can be found here.
- Admin APIs offer functionality for resource management. For example, you can use admin APIs if you want to build your own automated tools.

You can also use the Google API client libraries to access APIs for products such as Google Maps, Google Drive, and YouTube. Details of GCP client libraries can be found here.

InterSystems IRIS Sample Architectures

As part of this article, sample InterSystems IRIS deployments for GCP are provided as a starting point for your application-specific deployment. These can be used as a guideline for numerous deployment possibilities. This reference architecture demonstrates highly robust deployment options, from the smallest deployments to massively scalable workloads for both compute and data requirements.

High availability and disaster recovery options are covered in this document, along with other recommended system operations. It is expected that these will be modified by the individual to support their organization's standard practices and security policies. InterSystems is available for further discussions or questions about GCP-based InterSystems IRIS deployments for your specific application.

Sample Reference Architectures

The following sample architectures provide several different configurations with increasing capacity and capabilities.
Consider these examples of small development / production / large production / production-with-sharded-cluster configurations, which show the progression from a small, modest configuration for development efforts to massively scalable solutions with proper high availability across zones and multi-region disaster recovery. In addition, an example architecture is provided that uses the new sharding capabilities of InterSystems IRIS Data Platform for hybrid workloads with massively parallel SQL query processing.

Small Development Configuration

In this example, a minimal configuration is used to illustrate a small development environment capable of supporting up to 10 developers and 100GB of data. More developers and data can easily be supported by simply changing the virtual machine instance type and increasing the storage of the persistent disks as appropriate. This is adequate to support development efforts and become familiar with InterSystems IRIS functionality, along with Docker container building and orchestration if desired. High availability with database mirroring is typically not used with a small configuration; however, it can be added at any time if high availability is needed.

Small Configuration Sample Diagram

The sample diagram in Figure 2.1.1-a illustrates the table of resources in Figure 2.1.1-b. The gateways included are just examples and can be adjusted accordingly to suit your organization's standard network practices. The following resources within the GCP VPC project are provisioned as a minimum small configuration. GCP resources can be added or removed as required.

Small Configuration GCP Resources

A sample of small configuration GCP resources is provided in the following table. Proper network security and firewall rules need to be considered to prevent unwanted access into the VPC. Google provides network security best practices for getting started, which can be found here.
Note: VM instances require a public IP address to reach GCP services. While this practice might raise some concerns, Google recommends limiting the incoming traffic to these VM instances by using firewall rules. If your security policy requires truly internal VM instances, you will need to set up a NAT proxy manually on your network, and a corresponding route, so that the internal instances can reach the Internet. It is important to note that you cannot connect to a fully internal VM instance directly by using SSH. To connect to such internal machines, you must set up a bastion instance that has an external IP address and then tunnel through it. A bastion host can be provisioned to provide the external-facing point of entry into your VPC. Details of bastion hosts can be found here.

Production Configuration

In this example, a more sizable configuration is provided as an example production configuration that incorporates InterSystems IRIS database mirroring to support high availability and disaster recovery. Included in this configuration is a synchronous mirror pair of InterSystems IRIS database servers split between two zones within region-1 for automatic failover, and a third DR asynchronous mirror member in region-2 for disaster recovery in the unlikely event that an entire GCP region is offline. The InterSystems Arbiter and ICM server are deployed in a separate third zone for added resiliency. The sample architecture also includes a set of optional load-balanced web servers to support a web-enabled application. These web servers with the InterSystems Gateway can be scaled independently as needed.

Production Configuration Sample Diagram

The sample diagram in Figure 2.2.1-a illustrates the table of resources found in Figure 2.2.1-b. The gateways included are just examples and can be adjusted accordingly to suit your organization's standard network practices.
The following resources within the GCP VPC project are recommended as a minimum to support a production configuration. GCP resources can be added or removed as required.

Production Configuration GCP Resources

A sample of production configuration GCP resources is provided in the following tables.

Large Production Configuration

In this example, a massively scaled configuration is provided by expanding on the InterSystems IRIS capability to also introduce application servers using InterSystems' Enterprise Cache Protocol (ECP), providing massive horizontal scaling of users. An even higher level of availability is included in this example because ECP clients preserve session details even in the event of a database instance failover. Multiple GCP zones are used, with both ECP-based application servers and database mirror members deployed in multiple regions. This configuration is capable of supporting tens of millions of database accesses per second and multiple terabytes of data.

Large Production Configuration Sample Diagram

The sample diagram in Figure 2.3.1-a illustrates the table of resources in Figure 2.3.1-b. The gateways included are just examples and can be adjusted accordingly to suit your organization's standard network practices. Included in this configuration are a failover mirror pair, four or more ECP clients (application servers), and one or more web servers per application server. The failover database mirror pairs are split between two different GCP zones in the same region for fault-domain protection, with the InterSystems Arbiter and ICM server deployed in a separate third zone for added resiliency. Disaster recovery extends to a second GCP region and zone(s), similar to the earlier example. Multiple DR regions can be used with multiple DR async mirror member targets if desired. The following resources within the GCP VPC project are recommended as a minimum to support a large production deployment.
GCP resources can be added or removed as required.

Large Production Configuration GCP Resources

A sample of large production configuration GCP resources is provided in the following tables.

Production Configuration with InterSystems IRIS Sharded Cluster

In this example, a horizontally scaled configuration for hybrid workloads with SQL is provided by including the new sharded cluster capabilities of InterSystems IRIS, providing massive horizontal scaling of SQL queries and tables across multiple systems. Details of the InterSystems IRIS sharded cluster and its capabilities are discussed later in this article.

The sample diagram in Figure 2.4.1-a illustrates the table of resources in Figure 2.4.1-b. The gateways included are just examples and can be adjusted accordingly to suit your organization's standard network practices. Included in this configuration are four mirror pairs as the data nodes. Each failover database mirror pair is split between two different GCP zones in the same region for fault-domain protection, with the InterSystems Arbiter and ICM server deployed in a separate third zone for added resiliency.

This configuration allows all the database access methods to be available from any data node in the cluster. The large SQL table(s) data is physically partitioned across all data nodes to allow for massive parallelization of both query processing and data volume. Combining all these capabilities provides the ability to support complex hybrid workloads, such as large-scale analytical SQL querying with concurrent ingestion of new data, all within a single InterSystems IRIS Data Platform.

Note that in the above diagram, and in the "resource type" column in the table below, the term "Compute [Engine]" is a GCP term representing a GCP (virtual) server instance, as described further in section 3.1 of this document.
It does not represent or imply the use of "compute nodes" in the cluster architecture described later in this article. The following resources within the GCP VPC project are recommended as a minimum to support a sharded cluster. GCP resources can be added or removed as required.

Production with Sharded Cluster Configuration GCP Resources

A sample of sharded cluster configuration GCP resources is provided in the following table.

Introduction of Cloud Concepts

Google Cloud Platform (GCP) provides a feature-rich cloud environment for Infrastructure-as-a-Service (IaaS) fully capable of supporting all InterSystems products, including support for container-based DevOps with the new InterSystems IRIS Data Platform. Care must be taken, as with any platform or deployment model, to ensure all aspects of an environment are considered, such as performance, availability, system operations, high availability, disaster recovery, security controls, and other management procedures. This document will cover the three major components of all cloud deployments: compute, storage, and networking.

Compute Engines (Virtual Machines)

Within GCP there are several options available for compute engine resources, with numerous virtual CPU and memory specifications and associated storage options. One item to note within GCP: a reference to the number of vCPUs in a given machine type means that one vCPU is one hyper-thread on the physical host at the hypervisor layer. For the purposes of this document, n1-standard* and n1-highmem* instance types will be used, as they are the most widely available across GCP deployment regions. However, n1-ultramem* instance types are a great option for very large working datasets, keeping massive amounts of data cached in memory. Default instance settings, such as the instance availability policy and other advanced features, are used except where noted. Details of the various machine types can be found here.
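As an illustration only, a database server instance along the lines discussed above could be provisioned with the gcloud CLI roughly as follows. The project, zone, instance name, image, and disk size are placeholder assumptions, not sizing recommendations from this document:

```
# Create an n1-highmem-8 instance with an attached SSD persistent disk
gcloud compute instances create iris-db-01 \
    --project my-gcp-project \
    --zone us-east1-b \
    --machine-type n1-highmem-8 \
    --image-family rhel-7 \
    --image-project rhel-cloud \
    --create-disk=name=iris-db-01-data,type=pd-ssd,size=500GB
```

Instance type, disk type, and disk size should be chosen against the IOPS and throughput considerations discussed in the Disk Storage section.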
Disk Storage

The storage type most directly related to InterSystems products is the persistent disk type; however, local storage may be used for high levels of performance as long as the data availability restrictions are understood and accommodated. There are several other options, such as Cloud Storage (buckets); however, those are more specific to an individual application's requirements than to supporting the operation of InterSystems IRIS Data Platform.

Like most other cloud providers, GCP imposes limitations on the amount of persistent storage that can be associated with an individual compute engine. These limits include the maximum size of each disk, the number of persistent disks attached to each compute engine, and the IOPS per persistent disk, with an overall per-instance IOPS cap. In addition, there are IOPS limits per GB of disk space, so at times provisioning more disk capacity is required to achieve a desired IOPS rate. These limits may change over time and should be confirmed with Google as appropriate.

There are two persistent storage types for disk volumes: Standard Persistent and SSD Persistent disks. SSD Persistent disks are better suited for production workloads that require predictable low-latency IOPS and higher throughput. Standard Persistent disks are a more economical option for non-production development, test, or archive workloads. Details of the various disk types and limitations can be found here.

VPC Networking

A virtual private cloud (VPC) network is highly recommended to support the various components of InterSystems IRIS Data Platform and to provide proper network security controls, various gateways, routing, internal IP address assignments, network interface isolation, and access controls. An example VPC is detailed in the examples provided within this document. Details of VPC networking and firewalls can be found here.
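As a hedged example of the SSD persistent disks described above, a disk can be created and attached from the gcloud CLI. The disk and instance names are hypothetical:

```shell
# Create an SSD persistent disk and attach it to an existing instance.
# Disk size and names are examples; size also affects the available IOPS.
gcloud compute disks create iris-db-disk-1 \
    --size=500GB \
    --type=pd-ssd \
    --zone=us-east1-b

gcloud compute instances attach-disk iris-db-01 \
    --disk=iris-db-disk-1 \
    --zone=us-east1-b
```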
Virtual Private Cloud (VPC) Overview

GCP VPCs are slightly different from those of other cloud providers, allowing for simplicity and greater flexibility. A comparison of concepts can be found here.

Within a GCP project, several VPCs are allowed (currently a maximum of 5 per project), and there are two options for creating a VPC network: auto mode and custom mode. Details of each type are provided here. In most large cloud deployments, multiple VPCs are provisioned to isolate the various gateway types from application-centric VPCs, with VPC peering used for inbound and outbound communications. It is highly recommended to consult with your network administrator for details on allowable subnets and any organizational firewall rules of your company. VPC peering is not covered in this document.

In the examples provided in this document, a single VPC with three subnets is used to provide network isolation of the various components, for predictable latency and bandwidth, and for security isolation of the various InterSystems IRIS components.

Network Gateway and Subnet Definitions

Two gateways are provided in the example in this document to support both Internet and secure VPN connectivity. Each ingress point is required to have appropriate firewall and routing rules to provide adequate security for the application. Details on how to use routes can be found here.

Three subnets are used in the provided example architectures, dedicated for use with InterSystems IRIS Data Platform. The use of separate network subnets and network interfaces allows for flexibility in security controls and in bandwidth protection and monitoring for each of the three major components. Details on the various use cases can be found here. Details for creating virtual machine instances with multiple network interfaces can be found here.
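As a sketch, a custom-mode VPC with three dedicated subnets could be created as follows. The network name and CIDR ranges are hypothetical; confirm allowable ranges with your network administrator:

```shell
# Custom-mode VPC so subnets can be defined explicitly.
gcloud compute networks create iris-vpc --subnet-mode=custom

# One subnet each for user, shard, and mirror traffic (example ranges).
gcloud compute networks subnets create user-subnet \
    --network=iris-vpc --region=us-east1 --range=10.0.1.0/24
gcloud compute networks subnets create shard-subnet \
    --network=iris-vpc --region=us-east1 --range=10.0.2.0/24
gcloud compute networks subnets create mirror-subnet \
    --network=iris-vpc --region=us-east1 --range=10.0.3.0/24
```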
The subnets included in these examples:

User Space Network: for inbound connected users and queries
Shard Network: for inter-shard communications between the shard nodes
Mirroring Network: for high availability using synchronous replication and automatic failover of individual data nodes

Note: Failover synchronous database mirroring is only recommended between multiple zones with low-latency interconnects within a single GCP region. Latency between regions is typically too high to provide a positive user experience, especially for deployments with a high rate of updates.

Internal Load Balancers

Most IaaS cloud providers lack the ability to provide a Virtual IP (VIP) address, which is typically used in automatic database failover designs. To address this, several of the most commonly used connectivity methods, specifically ECP clients and Web Gateways, have been enhanced within InterSystems IRIS to no longer rely on VIP capabilities, making them mirror-aware and automatic.

Connectivity methods such as xDBC, direct TCP/IP sockets, or other direct connection protocols require the use of a VIP-like address. To support those inbound protocols, InterSystems database mirroring technology makes it possible to provide automatic failover for those connectivity methods within GCP, using a health check status page called mirror_status.cxw. The load balancer interacts with this page to achieve VIP-like functionality, directing traffic only to the active primary mirror member and thus providing a complete and robust high availability design within GCP. Details of using a load balancer to provide VIP-like functionality are provided here.
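As a sketch of how mirror_status.cxw can drive the load balancer, a GCP HTTP health check could be defined against it. The health check name and port below are hypothetical; the request path follows the convention in the InterSystems documentation, and only the acting primary returns a success status:

```shell
# Health check that polls the mirror status page; only the acting primary
# mirror member returns HTTP 200, so the load balancer sends traffic
# exclusively to the primary.
gcloud compute health-checks create http iris-mirror-check \
    --port=52773 \
    --request-path=/csp/bin/mirror_status.cxw \
    --check-interval=5s \
    --timeout=5s \
    --unhealthy-threshold=2 \
    --healthy-threshold=2
```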
Sample VPC Topology

Combining all the components together, the illustration in Figure 4.3-a demonstrates the layout of a VPC with the following characteristics:

Leverages multiple zones within a region for high availability
Provides two regions for disaster recovery
Utilizes multiple subnets for network segregation
Includes separate gateways for both Internet and VPN connectivity
Uses a cloud load balancer for IP failover between mirror members

Persistent Storage Overview

As discussed in the introduction, the use of GCP persistent disks, specifically the SSD persistent disk type, is recommended. SSD persistent disks are recommended because of the higher read and write IOPS rates and the low latency required for transactional and analytical database workloads. Local SSDs may be used in certain circumstances; however, beware that the performance gains of local SSDs come with certain trade-offs in availability, durability, and flexibility. Details of Local SSD data persistence can be found here, including the events in which Local SSD data is preserved and those in which it is not.

LVM Striping

Like other cloud providers, GCP imposes numerous storage limits on IOPS, space capacity, and the number of devices per virtual machine instance. Consult the GCP documentation for current limits, which can be found here. With these limits, LVM striping becomes necessary to maximize IOPS beyond that of a single disk device for a database instance. In the example virtual machine instances provided, the following disk layouts are recommended. Performance limits associated with SSD persistent disks can be found here.

Note: There is currently a maximum of 16 persistent disks per virtual machine instance, although GCP currently lists an increase to 128 as "(beta)", so this will be a welcome enhancement.

The benefit of LVM striping is that it spreads random IO workloads across more disk devices and their inherent disk queues.
Below is an example of how to use LVM striping with Linux for the database volume group. This example uses four disks in an LVM stripe with a physical extent (PE) size of 4MB. Alternatively, larger PE sizes can be used if needed.

Step 1: Create Standard or SSD Persistent Disks as needed
Step 2: Confirm the IO scheduler is NOOP for each of the disk devices using "lsblk -do NAME,SCHED"
Step 3: Identify the disk devices using "lsblk -do KNAME,TYPE,SIZE,MODEL"
Step 4: Create the volume group with the new disk devices
vgcreate -s 4M <vg name> <list of all disks just created>
example: vgcreate -s 4M vg_iris_db /dev/sd[h-k]
Step 5: Create the logical volume
lvcreate -n <lv name> -L <size of LV> -i <number of disks in volume group> -I 4M <vg name>
example: lvcreate -n lv_irisdb01 -L 1000G -i 4 -I 4M vg_iris_db
Step 6: Create the file system
mkfs.xfs -K <logical volume device>
example: mkfs.xfs -K /dev/vg_iris_db/lv_irisdb01
Step 7: Mount the file system
Edit /etc/fstab with the following mount entry:
/dev/mapper/vg_iris_db-lv_irisdb01 /vol-iris/db xfs defaults 0 0
mount /vol-iris/db

Using the above table, each of the InterSystems IRIS servers will have the following configuration: two disks for SYS, four disks for DB, two disks for primary journals, and two disks for alternate journals.

For growth, LVM allows devices and logical volumes to be expanded when needed without interruption. Consult the Linux documentation on best practices for ongoing management and expansion of LVM volumes.

Note: Enabling asynchronous IO for both the database and the write image journal files is highly recommended. See the following community article for details on enabling it on Linux: https://community.intersystems.com/post/lvm-pe-striping-maximize-hyper-converged-storage-throughput

Provisioning

New with InterSystems IRIS is InterSystems Cloud Manager (ICM). ICM carries out many tasks and offers many options for provisioning InterSystems IRIS Data Platform.
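To give a feel for the ICM workflow, a hypothetical ICM session might look like the following. The Docker image tag and the assumption that defaults.json and definitions.json have already been prepared are placeholders; consult the ICM documentation for exact usage:

```shell
# Start an ICM container (image tag is an example).
docker run --name icm -it intersystems/icm:latest /bin/bash

# Inside the container, with defaults.json and definitions.json prepared:
icm provision      # create the GCP infrastructure
icm run            # deploy InterSystems IRIS containers
icm ps             # check the status of deployed containers
icm unprovision    # tear down the infrastructure when finished
```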
ICM is provided as a Docker image that includes everything needed to provision a robust GCP cloud-based solution. ICM currently supports provisioning on the following platforms:

Google Cloud Platform (GCP)
Amazon Web Services including GovCloud (AWS / GovCloud)
Microsoft Azure Resource Manager including Government (ARM / MAG)
VMware vSphere (ESXi)

ICM and Docker can run from either a desktop/laptop workstation or from a centralized, dedicated, modest "provisioning" server with a centralized repository.

The role of ICM in the application lifecycle is: Define -> Provision -> Deploy -> Manage

Details for installing and using ICM with Docker can be found here.

Note: The use of ICM is not required for any cloud deployment. The traditional method of installation and deployment with tar-ball distributions is fully supported and available. However, ICM is recommended for ease of provisioning and management in cloud deployments.

Container Monitoring

ICM includes a basic monitoring facility using Weave Scope for container-based deployments. It is not deployed by default and needs to be specified in the defaults file using the Monitor field. Details for monitoring, orchestration, and scheduling with ICM can be found here. An overview of Weave Scope and its documentation can be found here.

High Availability

InterSystems database mirroring provides the highest level of availability in any cloud environment. There are also options for providing some virtual machine resiliency directly at the instance level. Details of the various policies available in GCP can be found here.

Earlier sections discussed how a cloud load balancer provides automatic IP address failover for a Virtual IP (VIP-like) capability with database mirroring. The cloud load balancer uses the mirror_status.cxw health check status page mentioned earlier in the Internal Load Balancers section.

There are two modes of database mirroring: synchronous with automatic failover, and asynchronous mirroring.
In this example, synchronous failover mirroring is covered. The details of mirroring can be found here.

The most basic mirroring configuration is a pair of failover mirror members in an arbiter-controlled configuration. The arbiter is placed in a third zone within the same region to protect against a potential zone outage impacting both the arbiter and one of the mirror members.

There are many ways mirroring can be set up, particularly in the network configuration. In this example, we will use the network subnets defined previously in the Network Gateway and Subnet Definitions section of this document. Example IP address schemes are provided in a following section; for the purposes of this section, only the network interfaces and designated subnets are depicted.

Disaster Recovery

InterSystems database mirroring extends the capability of high availability to also support disaster recovery in another GCP geographic region, supporting operational resiliency in the unlikely event of an entire GCP region going offline. How an application is to endure such outages depends on the recovery time objective (RTO) and recovery point objective (RPO). These provide the initial framework for the analysis required to design a proper disaster recovery plan. The following links provide a guide to the items to be considered when developing a disaster recovery plan for your application: https://cloud.google.com/solutions/designing-a-disaster-recovery-plan and https://cloud.google.com/solutions/disaster-recovery-cookbook

Asynchronous Database Mirroring

InterSystems IRIS Data Platform's database mirroring provides robust capabilities for asynchronously replicating data between GCP zones and regions to help support the RTO and RPO goals of your disaster recovery plan. Details of async mirror members can be found here.
Similar to the earlier high availability section, a cloud load balancer provides automatic IP address failover for a Virtual IP (VIP-like) capability for DR asynchronous mirroring as well, using the same mirror_status.cxw health check status page mentioned earlier in the Internal Load Balancers section.

In this example, DR asynchronous failover mirroring is covered, along with the introduction of the GCP Global Load Balancing service, which provides upstream systems and client workstations with a single anycast IP address regardless of which zone or region your InterSystems IRIS deployment is operating in.

One of the advantages of GCP is that the load balancer is a software-defined global resource, not bound to a given region. This allows for the unique capability of leveraging a single service across regions, since it is not an instance- or device-based solution. Details of GCP Global Load Balancing with a single anycast IP can be found here.

In the above example, the IP addresses of all three InterSystems IRIS instances are provided to the GCP Global Load Balancer, which directs traffic only to whichever mirror member is the acting primary, regardless of the zone or region in which it is located.

Sharded Cluster

InterSystems IRIS includes a comprehensive set of capabilities to scale your applications, which can be applied alone or in combination, depending on the nature of your workload and the specific performance challenges it faces. One of these, sharding, partitions both data and its associated cache across a number of servers, providing flexible, inexpensive performance scaling for queries and data ingestion while maximizing infrastructure value through highly efficient resource utilization. An InterSystems IRIS sharded cluster can provide significant performance benefits for a wide variety of applications, but especially for those with workloads that include one or more of the following:

High-volume or high-speed data ingestion, or a combination of both.
Relatively large data sets, queries that return large amounts of data, or both.
Complex queries that do large amounts of data processing, such as those that scan a lot of data on disk or involve significant compute work.

Each of these factors on its own influences the potential gain from sharding, but the benefit may be enhanced where they combine. For example, a combination of all three factors (large amounts of data ingested quickly, large data sets, and complex queries that retrieve and process a lot of data) makes many of today's analytic workloads very good candidates for sharding.

Note that these characteristics all have to do with data; the primary function of InterSystems IRIS sharding is to scale for data volume. However, a sharded cluster can also include features that scale for user volume, when workloads involving some or all of these data-related factors also experience a very high query volume from large numbers of users. Sharding can be combined with vertical scaling as well.

Operational Overview

The heart of the sharded architecture is the partitioning of data and its associated cache across a number of systems. A sharded cluster physically partitions large database tables horizontally, that is, by row, across multiple InterSystems IRIS instances called data nodes, while allowing applications to transparently access these tables through any node and still see the whole dataset as one logical union. This architecture provides three advantages:

Parallel processing: Queries are run in parallel on the data nodes, with the results merged, combined, and returned to the application as full query results by the node the application connected to, significantly enhancing execution speed in many cases.
Partitioned caching: Each data node has its own cache, dedicated to the sharded table data partition it stores, rather than a single instance's cache serving the entire data set, which greatly reduces the risk of overflowing the cache and forcing performance-degrading disk reads.

Parallel loading: Data can be loaded onto the data nodes in parallel, reducing cache and disk contention between the ingestion workload and the query workload and improving the performance of both.

Details of the InterSystems IRIS sharded cluster can be found here.

Elements of Sharding and Instance Types

A sharded cluster consists of at least one data node and, if needed for specific performance or workload requirements, an optional number of compute nodes. These two node types offer simple building blocks, presenting a simple, transparent, and efficient scaling model.

Data Nodes

Data nodes store data. At the physical level, sharded table[1] data is spread across all data nodes in the cluster, and non-sharded table data is physically stored on the first data node only. This distinction is transparent to the user, with the sole possible exception that the first node may have slightly higher storage consumption than the others; this difference is expected to become negligible, as sharded table data typically outweighs non-sharded table data by at least an order of magnitude.

Sharded table data can be rebalanced across the cluster when needed, typically after adding new data nodes. This moves "buckets" of data between nodes to approximate an even distribution of data.

At the logical level, non-sharded table data and the union of all sharded table data are visible from any node, so clients see the whole dataset regardless of which node they connect to. Metadata and code are also shared across all data nodes.

The basic architecture diagram for a sharded cluster simply consists of data nodes that appear uniform across the cluster.
Client applications can connect to any node and will experience the data as if it were local.

[1] For convenience, the term "sharded table data" is used throughout the document to represent "extent" data for any data model supporting sharding that is marked as sharded. The terms "non-sharded table data" and "non-sharded data" are used to represent data that is in a shardable extent not marked as such, or for a data model that simply doesn't support sharding yet.

Compute Nodes

For advanced scenarios where low latencies are required, potentially at odds with a constant influx of data, compute nodes can be added to provide a transparent caching layer for servicing queries.

Compute nodes cache data. Each compute node is associated with a data node for which it caches the corresponding sharded table data; in addition, it also caches non-sharded table data as needed to satisfy queries. Because compute nodes don't physically store any data and are meant to support query execution, their hardware profile can be tailored to suit those needs, for example by emphasizing memory and CPU and keeping storage to the bare minimum. Ingestion is forwarded to the data nodes, either directly by the driver (xDBC, Spark) or implicitly by the sharding manager code when "bare" application code runs on a compute node.

Sharded Cluster Illustrations

There are various combinations in which a sharded cluster can be deployed. The following high-level diagrams illustrate the most common deployment models. These diagrams omit the networking gateways and details in order to focus only on the sharded cluster components.

Basic Sharded Cluster

The following diagram shows the simplest sharded cluster, with four data nodes deployed in a single region and a single zone. A GCP Cloud Load Balancer is used to distribute client connections to any of the sharded cluster nodes.
In this basic model, there is no resiliency or high availability beyond what GCP provides for a single virtual machine and its attached SSD persistent storage. Two separate network interface adapters are recommended, providing network security isolation for the inbound client connections and bandwidth isolation between the client traffic and the sharded cluster communications.

Basic Sharded Cluster with High Availability

The following diagram shows the simplest mirrored sharded cluster, with four mirrored data nodes deployed in a single region and each node's mirror split between zones. A GCP Cloud Load Balancer is used to distribute client connections to any of the sharded cluster nodes.

High availability is provided through InterSystems database mirroring, which maintains a synchronously replicated mirror in a secondary zone within the region.

Three separate network interface adapters are recommended, providing network security isolation for the inbound client connections and bandwidth isolation between the client traffic, the sharded cluster communications, and the synchronous mirror traffic between the node pairs.

This deployment model also introduces the mirror arbiter, as described in an earlier section of this document.

Sharded Cluster with Separate Compute Nodes

The following diagram expands the sharded cluster for massive user/query concurrency with separate compute nodes and four data nodes. The Cloud Load Balancer server pool contains only the addresses of the compute nodes. Updates and data ingestion continue to go directly to the data nodes, as before, to sustain ultra-low-latency performance and to avoid interference and congestion of resources between the query/analytical workloads and real-time data ingestion.
With this model, the allocation of resources for compute/query and for ingestion can be fine-tuned and scaled independently, providing optimal "just-in-time" resources where needed and maintaining an economical yet simple solution, instead of unnecessarily wasting resources just to scale compute or data.

Compute nodes lend themselves to straightforward use of GCP auto scale grouping (aka Autoscaling), which allows automatic addition or deletion of instances from a managed instance group based on increased or decreased load. Autoscaling works by adding more instances to your instance group when there is more load (upscaling) and deleting instances when the need for instances is lowered (downscaling). Details of GCP Autoscaling can be found here.

Autoscaling helps cloud-based applications gracefully handle increases in traffic and reduces cost when the need for resources is lower. Simply define the autoscaling policy, and the autoscaler performs automatic scaling based on the measured load.

Backup Operations

There are multiple options available for backup operations. The following three options are viable for your GCP deployment with InterSystems IRIS. The first two options, detailed below, incorporate a snapshot-type procedure, which involves suspending database writes to disk prior to creating the snapshot and then resuming updates once the snapshot is successful.

The following high-level steps are taken to create a clean backup using either of the snapshot methods:

Pause writes to the database via the database External Freeze API call.
Create snapshots of the OS + data disks.
Resume database writes via the External Thaw API call.
Archive the backup to a backup location.

Details of the External Freeze/Thaw APIs can be found here.

Note: Sample scripts for backups are not included in this document; however, periodically check for examples posted to the InterSystems Developer Community: www.community.intersystems.com

The third option is InterSystems Online Backup.
This is an entry-level approach for smaller deployments with a very simple use case and interface. However, as databases increase in size, external backups with snapshot technology are recommended as a best practice, with advantages including the backup of external files, faster restore times, and an enterprise-wide view of data and management tools.

Additional steps, such as integrity checks, can be added at periodic intervals to ensure a clean and consistent backup.

The decision on which option to use depends on the operational requirements and policies of your organization. InterSystems is available to discuss the various options in more detail.

GCP Persistent Disk Snapshot Backup

Backup operations can be achieved using the GCP gcloud command-line API along with the InterSystems ExternalFreeze/Thaw API capabilities. This allows for true 24x7 operational resiliency and assurance of clean regular backups. Details for managing, creating, and automating GCP Persistent Disk Snapshots can be found here.

Logical Volume Manager (LVM) Snapshots

Alternatively, many of the third-party backup tools available on the market can be used by deploying individual backup agents within the VM itself and leveraging file-level backups in conjunction with Logical Volume Manager (LVM) snapshots. One of the major benefits of this model is the ability to perform file-level restores of either Windows- or Linux-based VMs.

A couple of points to note with this solution: since GCP and most other IaaS cloud providers do not provide tape media, all backup repositories are disk-based for short-term archiving, with the ability to leverage blob or bucket type low-cost storage for long-term retention (LTR). If using this method, it is highly recommended to use a backup product that supports de-duplication technologies, to make the most efficient use of disk-based backup repositories.
Some examples of these backup products with cloud support include, but are not limited to: Commvault, EMC Networker, HPE Data Protector, and Veritas NetBackup.

Note: InterSystems does not validate or endorse one backup product over another. The responsibility for choosing backup management software rests with the individual customer.

Online Backup

For small deployments, the built-in Online Backup facility is also a viable option. This InterSystems database online backup utility backs up data in database files by capturing all blocks in the databases and writing the output to a sequential file. This proprietary backup mechanism is designed to cause no downtime for users of the production system. Details of Online Backup can be found here.

In GCP, after the online backup has finished, the backup output file, and all other files in use by the system, must be copied to some other storage location outside of that virtual machine instance. Bucket/object storage is a good destination for this. There are two options for using a GCP Storage bucket:

Use the gcloud scripting APIs directly to copy and manipulate the newly created online backup (and other non-database) files. Details can be found here.
Mount a storage bucket as a file system and use it similarly to a persistent disk, even though Cloud Storage buckets are object storage. Details of mounting a Cloud Storage bucket using Cloud Storage FUSE can be found here.
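Combining the snapshot approach with the External Freeze/Thaw APIs, a minimal backup sketch might look like the following. The instance name, disk name, zone, paths, and bucket are hypothetical; ExternalFreeze is documented to return exit status 5 on success, but verify against the documentation for your version:

```shell
#!/bin/bash
# Hedged sketch of a snapshot backup using ExternalFreeze/Thaw.
# "IRIS" is the instance name; disk, zone, paths, and bucket are examples.

iris session IRIS -U %SYS "##Class(Backup.General).ExternalFreeze()"
if [ $? -eq 5 ]; then
    # Writes are frozen; snapshot the data disk.
    gcloud compute disks snapshot iris-db-disk-1 \
        --zone=us-east1-b \
        --snapshot-names=iris-db-$(date +%Y%m%d-%H%M)
    # Resume database writes.
    iris session IRIS -U %SYS "##Class(Backup.General).ExternalThaw()"
else
    echo "ExternalFreeze failed; aborting snapshot" >&2
    exit 1
fi

# Optionally copy any online backup (.cbk) files to bucket storage.
gsutil cp /backups/*.cbk gs://my-backup-bucket/
```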