Horita-san,

I'm not sure whether you mean the projection (table) itself is missing or the row you created through the API isn't showing up. This works fine for me, but in order to combine API calls with a domain definition, you have to set the allowCustomUpdates flag to true (it is off by default). See also the notes in this article on the dictionary builder demo.

When the flag is set to false, API methods like CreateDictionary() will return an error (passed by reference through their status argument) and the returned ID will be negative to indicate failure.
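
A minimal sketch of what that check could look like, assuming tDomainId holds the ID of a domain whose definition carries allowCustomUpdates="true" (the exact CreateDictionary() argument list may differ slightly between versions):

set tDictId = ##class(%iKnow.Matching.DictionaryAPI).CreateDictionary(tDomainId, "MyDict", "created through the API",, .tSC)
if (tDictId < 0) || $$$ISERR(tSC) {
    write !, "dictionary creation failed: ", $system.Status.GetErrorText(tSC)
}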

Hope this helps,
benjamin

There is a simple regression calculator that is used internally for similar trend line work, iirc. The class reference is not spectacularly elaborate, but it's fairly straightforward to use. First you use the add() method to load up points, and then the result() method calculates a simple trend line, returning the slope, intercept and correlation coefficient by reference and populating the Slope and Intercept properties:

USER>s stat = ##class(%DeepSee.extensions.utils.SimpleRegression).%New()

USER>w stat.add(0,1)
1
USER>w stat.add(1,2)
1
USER>w stat.result(.b,.y0,.r)
1
USER>zw b,y0,r
b=1
y0=1
r=1
USER>w stat.Slope
1

you can keep adding data and re-calculate:

USER>w stat.add(1,1)
1
USER>w stat.result(.b,.y0,.r)
1
USER>zw b,y0,r
b=.5
y0=1
r=.5

Hi Joe,

would you mind sharing some of your code (minus the API key values :-) ) for signing AWS REST calls? I've almost scratched my head off trying to find out why things still aren't working: my StringToSign and SigningKey appear to be correct, but the hash I create from them isn't. I can even reproduce the problem (aka "make the same mistake") using the sample Python code AWS provides.

Relevant but not working (and therefore less relevant) code:

Property AWSAccessKeyId As %String [ InitialExpression = "AKIDEXAMPLE" ];

Property AWSSecretAccessKey As %String [ InitialExpression = "wJalrXUtnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY" ];

Property Region As %String [ InitialExpression = "us-east-1" ];

Property Service As %String [ InitialExpression = "iam" ];

Method BuildAuthorizationHeader(pHttpRequest As %Net.HttpRequest, pOperation As %String = "", pURL As %String = "", Output pAuthorizationHeader As %String, pVerbose As %Boolean = 0) As %Status
{
set tSC = $$$OK
try {
if ..AWSAccessKeyId="" {
set tSC = $$$ERROR($$$GeneralError, "No AWS Access Key ID provided")
quit
}
if ..AWSSecretAccessKey="" {
set tSC = $$$ERROR($$$GeneralError, "No AWS Secret Access Key provided")
quit
}

set tAMZDateTime = $tr($zdatetime($h,8,7),":") // 20190319T151009Z
//set tAMZDateTime = "20150830T123600Z" // for AWS samples
set tAMZDate = $e(tAMZDateTime,1,8) // 20190319
set tLineBreak = $c(10)

set pOperation = $$$UPPER(pOperation)

// ensure the right date is set
do pHttpRequest.SetHeader("X-Amz-Date", tAMZDateTime)


// ************* TASK 1: CREATE A CANONICAL REQUEST *************
// http://docs.aws.amazon.com/general/latest/gr/sigv4-create-canonical-request.html

// Step 1 is to define the verb (GET, POST, etc.) -- inferred from pOperation

// Step 2: Create canonical URI--the part of the URI from domain to query 
// string (use '/' if no path)
set tCanonicalURL = $s($e(pURL,1)="/":pURL, $e(pURL,1)'="":"/"_pURL, 1:"/"_pHttpRequest.Location)


// Step 3: Create the canonical query string. In this example (a GET request),
// request parameters are in the query string. Query string values must
// be URL-encoded (space=%20). The parameters must be sorted by name.
// For this example, the query string is pre-formatted in the request_parameters variable.
set tQueryString = $piece(tCanonicalURL,"?",2,*)
set tCanonicalURL = $piece(tCanonicalURL,"?",1)

// TODO: append pHttpRequest.Params content?
// TODO: sort params!

// Step 4: Create the canonical headers and signed headers. Header names
// must be trimmed and lowercase, and sorted in code point order from
// low to high. Note that there is a trailing \n.
set tCanonicalHeaders = "content-type:" _ pHttpRequest.ContentType _ tLineBreak
_ "host:" _ pHttpRequest.Server _ tLineBreak
_ "x-amz-date:" _ tAMZDateTime _ tLineBreak

// Step 5: Create the list of signed headers. This lists the headers
// in the canonical_headers list, delimited with ";" and in alpha order.
// Note: The request can include any headers; canonical_headers and
// signed_headers lists those that you want to be included in the 
// hash of the request. "Host" and "x-amz-date" are always required.
set tSignedHeaders = "content-type;host;x-amz-date"

// Step 6: Create payload hash (hash of the request body content). For GET
// requests, the payload is an empty string ("").
if (pOperation = "GET") {
set tPayload = ""
else {
// TODO
set tPayload = ""
}
set tPayloadHash = ..Hex($SYSTEM.Encryption.SHAHash(256,$zconvert("","O","UTF8")))


// Step 7: Combine elements to create canonical request
set tCanonicalRequest = pOperation _ tLineBreak
_ tCanonicalURL _ tLineBreak
_ tQueryString _ tLineBreak
_ tCanonicalHeaders _ tLineBreak 
_ tSignedHeaders _ tLineBreak
_ tPayloadHash
set tCanonicalRequestHash = ..Hex($SYSTEM.Encryption.SHAHash(256, tCanonicalRequest))

w:pVerbose !!,"Canonical request:",!,$replace(tCanonicalRequest,tLineBreak,"<"_$c(13,10)),!!,"Hash: ",tCanonicalRequestHash,!

// ************* TASK 2: CREATE THE STRING TO SIGN*************
// Match the algorithm to the hashing algorithm you use, either SHA-1 or
// SHA-256 (recommended)
set tAlgorithm = "AWS4-HMAC-SHA256"
set tCredentialScope = tAMZDate _ "/" _ ..Region _ "/" _ ..Service _ "/" _ "aws4_request"
set tStringToSign = tAlgorithm _ tLineBreak 
_ tAMZDateTime _ tLineBreak 
_ tCredentialScope _ tLineBreak
_ tCanonicalRequestHash
w:pVerbose !!,"String to sign:",!,$replace(tStringToSign,tLineBreak,$c(13,10)),!

// ************* TASK 3: CALCULATE THE SIGNATURE *************
// Create the signing key using the function defined above.
// def getSignatureKey(key, dateStamp, regionName, serviceName):
set tSigningKey = ..GenerateSigningKey(tAMZDate)
w:pVerbose !!,"Signing key:",!,..Hex(tSigningKey),!

// Sign the string_to_sign using the signing_key
set tSignature = ..Hex($SYSTEM.Encryption.HMACSHA(256, tStringToSign, tSigningKey))


// ************* TASK 4: ADD SIGNING INFORMATION TO THE REQUEST *************
// The signing information can be either in a query string value or in 
// a header named Authorization. This code shows how to use a header.
// Create authorization header and add to request headers
set pAuthorizationHeader = tAlgorithm _ " Credential=" _ ..AWSAccessKeyId _ "/" _ tCredentialScope _ ", SignedHeaders=" _ tSignedHeaders _ ", Signature=" _ tSignature
w:pVerbose !!,"Authorization header:",!,pAuthorizationHeader,!!
}
catch (ex) {
set tSC = ex.AsStatus()
}
quit tSC
}

Method GenerateSigningKey(pDate As %String) As %String
{
set kDate = $SYSTEM.Encryption.HMACSHA(256, pDate, $zconvert("AWS4" _ ..AWSSecretAccessKey,"O","UTF8"))
    //w !,"kDate: ",..Hex(kDate)
    set kRegion = $SYSTEM.Encryption.HMACSHA(256, ..Region, kDate)
    //w !,"kRegion: ",..Hex(kRegion)
    set kService = $SYSTEM.Encryption.HMACSHA(256, ..Service, kRegion)
    //w !,"kService: ",..Hex(kService)
    set tSigningKey = $SYSTEM.Encryption.HMACSHA(256, "aws4_request", kService)
    //w !,"kSigning: ",..Hex(tSigningKey),! 
quit tSigningKey
}

ClassMethod Hex(pRaw As %String) As %String [ Internal ]
{
set out="", l=$l(pRaw)
for i=1:1:l {
set h = $zhex($ascii(pRaw,i))
// pad to two characters so bytes below 0x10 keep their leading zero
set:$l(h)=1 h = "0"_h
set out = out_h
}
quit $$$LOWER(out)
}

ClassMethod SimpleTest() As %Status
{
set tSC = $$$OK
try {
set tAdapter = ..%New()
set tAdapter.AWSAccessKeyId = "use yours"
set tAdapter.AWSSecretAccessKey = "not mine"

set tAdapter.Region = "us-east-1", tAdapter.Service = "iam"

set tRequest = ##class(%Net.HttpRequest).%New()
set tRequest.ContentType = "application/x-www-form-urlencoded"
set tRequest.ContentCharset = "utf-8"
set tRequest.Https = 1
set tRequest.SSLConfiguration = "SSL client" // simple empty SSL config
set tRequest.Server = "iam.amazonaws.com"

set tURL = "/?Action=ListUsers&Version=2010-05-08"

set tSC = tAdapter.BuildAuthorizationHeader(tRequest, "GET", tURL, .tAuthorization, 1)
quit:$$$ISERR(tSC)
set tRequest.Authorization = tAuthorization

set tSC = tRequest.Get(tURL)
quit:$$$ISERR(tSC)

do tRequest.HttpResponse.OutputToDevice()
}
catch (ex) {
set tSC = ex.AsStatus()
}
write:$$$ISERR(tSC) !!,$system.Status.GetErrorText(tSC),!
quit tSC
}

Hi Sean,

IRIS uses different port numbers than Caché and Ensemble, so port clashes are not an issue, but there are a few components that are typically shared across instances (e.g. the ISCAgent) where consecutive installations of IRIS and Caché might cause trouble. We're documenting these, along with other compatibility items of note (such as accessing one platform with the other's xDBC driver), in a guide that will be published shortly.

The general recommendation remains to stick to instances of the same platform (so either all IRIS or all Caché) on a single server. Note that the use of VMs or Containers of course ensures a proper separation of libraries and enables you to run all your favourite cluster setups from the same physical server.

I just realized you're only on Caché 2012, which doesn't support table-valued functions (the ability to SELECT from a function rather than having to use CALL), sorry.
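
To illustrate the difference, here's a hedged sketch using a hypothetical class query MyPkg.Utils_TopSellers projected as an SQL procedure:

// Caché 2012 and earlier: class queries projected with SqlProc can only be invoked as procedures
do ##class(%SQL.Statement).%ExecDirect(, "CALL MyPkg.Utils_TopSellers(2017)").%Display()

// later versions also allow SELECTing from them as table-valued functions,
// so the results can be filtered and joined like any other table
do ##class(%SQL.Statement).%ExecDirect(, "SELECT * FROM MyPkg.Utils_TopSellers(2017) WHERE Total > 100").%Display()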

On the other hand, I'd expect a BI tool like Logi to be capable of providing exactly this sort of UI-side labelling of columns, if not driving the entire YoY calculation. Not that I want to fend off the question, but if there's a full-fledged BI tool sitting on top of these results anyhow, let's make sure to use its full set of fledges :-)

Hi Robert,

in 2018.2, we're introducing a feature called "coordinated backup", which basically allows adding a checkpoint to the journal files of all participating instances so you can roll them back to a synchronized state. We were working on the docs for that feature just the other week, and it runs to four pages if you want the comprehensive answer to your question, so this is the simplified version :-)

Please note that we currently do not support cross-shard transactions on sharded tables. It's not a common requirement for the types of use cases our sharding implementation was designed for (typically more analytical queries), but we're happy to discuss specific scenarios in the context of a POC to see what guarantees can be provided through appropriate application & schema design.

thanks,

benjamin

Hi Eduard,

for this sort of querying (and many other uses outside straight API calls), you can use the SQL projections generated for your domain, as documented in %iKnow.Tables.Utils. That'll generate a column for your Views metadata field on the table containing Source information, which you can then join to the Part (entity occurrence) table to filter the ones containing the requested entity.
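
Something along these lines, as a hedged sketch; the package, table and column names below are placeholders, since the actual names depend on how the projections were generated for your domain:

set tSQL = "SELECT s.ExternalId, s.Views "
         _ "FROM MyPkg.Source s "
         _ "JOIN MyPkg.Part p ON p.SourceId = s.SourceId "
         _ "WHERE p.Entity = 'the requested entity'"
do ##class(%SQL.Statement).%ExecDirect(, tSQL).%Display()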

Hope this helps,
benjamin

Hi Eduard,

looking at your code, there seem to be a few small things that may each contribute to not seeing the results you were expecting:

  1. the MaintenanceAPI:GetBlackListElements() call returns its results as result(n) = $lb(id, string), with n just an incrementing integer representing the row number. At the other end, the ContainsEntityFilter expects array(string) or a $listbuild(string1, string2, ...). So your filter might be selecting sources containing the strings "1", "2", etc. (see the conversion sketch after this list)
  2. SourceAPI:GetByDomain() returns result(n) = $lb(sourceID, externalID). That source ID is an internally generated integer ID that has no link to your source table Text.Data. The external ID is typically composed of the group field and identifier field you selected when loading from a SQL table, so depending on how you set up your domain, it may indeed be the ID field of your Text.Data table. It looks like you have the "simple external IDs" feature switched on, which is why your external IDs consist of just the identifier field, making things easier indeed (but usually only useful/safe when loading from a single table!). Note that this is slightly different for DeepSee-managed domains, where the source ID equals the external ID and corresponds to DeepSee's fact ID, but ignore this confusing comment if you're not using DeepSee.
  3. Finally, and likely irrelevant: you're passing in $$$YES when initializing filterNot. I'm not sure where you're loading that macro from, but that argument should be a %Boolean with a value of 1 to work as expected; a string value would translate to a %Boolean with value 0.
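
As a sketch for the first point, this is how you could convert the blacklist output into a plain list of strings for the filter (assuming the usual signatures for GetBlackListElements() and the ContainsEntityFilter constructor):

set tSC = ##class(%iKnow.Utils.MaintenanceAPI).GetBlackListElements(.tElements, tDomainId, tBlackListId)
set tStrings = "", i = ""
for {
    set i = $order(tElements(i), 1, tRow)
    quit:i=""
    set tStrings = tStrings _ $lb($list(tRow, 2))  // keep the string, drop the ID
}
set tFilter = ##class(%iKnow.Filters.ContainsEntityFilter).%New(tDomainId, tStrings)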

Hope this helps,

benjamin

We currently don't support analytic windowing functions (the PARTITION BY syntax), but have been looking into them for a future release. MATCH_RECOGNIZE is certainly one of the more advanced ones in that bucket. Is this the specific one you need, or do you have scenarios that would be served by core windowing functionality, excluding the pattern matching piece?

Or is it the pattern matching, and not so much the windowing, you're looking for?

I'm afraid we don't support the SQL PIVOT command, so unless you can enumerate the response codes as columns explicitly, you can only organise them as rows. If you control the application code, you could of course first run a query selecting all response codes and then generate the lengthy SQL call that includes a separate column for each response code. Something like SUM(CASE bRecord.ResponseCode WHEN 'response code 1' THEN 1 ELSE 0 END) AS ResponseCode1Count should work fairly well.
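
As a hedged sketch of that two-step approach (table and column names are placeholders for your schema):

set tCodes = ##class(%SQL.Statement).%ExecDirect(, "SELECT DISTINCT ResponseCode FROM MyApp.BRecord ORDER BY 1")
set tColumns = "", n = 0
while tCodes.%Next() {
    set n = n + 1
    set tColumns = tColumns _ $select(tColumns = "": "", 1: ", ")
                 _ "SUM(CASE ResponseCode WHEN '" _ tCodes.%Get("ResponseCode") _ "' THEN 1 ELSE 0 END) AS Code" _ n
}
do ##class(%SQL.Statement).%ExecDirect(, "SELECT " _ tColumns _ " FROM MyApp.BRecord").%Display()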

Horita-san,

The SetParameter() method requires that your domain have an ID assigned, which it gets automatically as soon as you call %Save() for the first time. Note that it should have returned an error in the sequence of commands you pasted, but that error went unnoticed because of the do syntax.
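
A minimal sketch of that order of operations, checking each returned %Status (the parameter name here is just an example):

set tDomain = ##class(%iKnow.Domain).%New("MyDomain")
set tSC = tDomain.%Save()  // assigns the domain ID
if $$$ISERR(tSC) { do $system.Status.DisplayError(tSC) quit }
set tSC = tDomain.SetParameter("DefaultConfig", "MyConfig")
if $$$ISERR(tSC) { do $system.Status.DisplayError(tSC) }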

In general, it should be more convenient to work with Domain Definitions than to use the %iKnow.Domain API directly.

Thanks,
benjamin

Hi Steve,

hadn't seen this question until just now, but I have to admit we're a bit storage-hungry with iKnow. If you generate the default full set of indices, a moderately-sized domain will need up to 25x the original dataset size (measured as raw text) to fit everything, so a 1GB corpus of raw text can grow to roughly 25GB fully indexed. This can drop to half that size (12x) if you forsake all non-essential indices, but that will prevent a number of queries from running smoothly or, in some cases, disable them completely.

For iFind, the numbers depend on the type of index: count on factors of 2x, 7x and 15x for Basic, Semantic and Analytic indices, respectively. Of course there's a difference in functionality between all these options, and it's best to start from a set of functional requirements and then look at which particular approach covers them.

These numbers are somewhat conservative maximums and, as Eduard already suggested, you may see different (lower) numbers depending on the nature of your data. A more detailed sizing guide is available on request.

Thanks,
benjamin

Thanks John,

indeed, you'd need a proper license in order to work with iKnow. If the method referred to above returns 0, please contact your sales representative to request a temporary trial license and appropriate assistance for implementing your use case.

Also, iKnow doesn't come as a separate namespace. You can create (regular) namespaces as you prefer and use them to store iKnow domain data. You may also need to enable your web application for iKnow, which is disabled by default for security reasons, in the same way DeepSee is. See this paragraph here for more details.
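
If you prefer to set that flag programmatically, here's a hedged sketch; the property name may differ between versions, so please verify it against the Security.Applications class documentation:

// run from the %SYS namespace
zn "%SYS"
set tSC = ##class(Security.Applications).Get("/csp/mynamespace", .tProps)
set tProps("iKnowEnabled") = 1
set tSC = ##class(Security.Applications).Modify("/csp/mynamespace", .tProps)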

Hi Eduard,

you can define iFind indices on calculated fields, so if you point your field calculation to a function that strips out the HTML, you should be fine. The HTML converter in iKnow was built for a slightly different purpose, but it can be used here:

Property HtmlText As %String(MAXLEN="");

Property PlainText As %String(MAXLEN="") [ Calculated, ReadOnly, SqlComputed, SqlComputeCode = { set {PlainText} = ##class(%iKnow.Source.Converter.Html).StripTags({HtmlText}) } ];
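
With those properties in place, the index itself uses the regular iFind syntax (using the Basic index type here just as an example):

Index PlainTextIdx On (PlainText) As %iFind.Index.Basic;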

Regards,
benjamin

Hi Benjamin,

The (patented) magic of iKnow is the way it identifies concepts in sentences. That work happens in a library shipped as a binary, which we refer to as the iKnow engine and which is used by both the iKnow APIs and iFind indices. Most of what happens with that engine's output is not nearly as much rocket science and, as Eduard indicated, its COS source code can usually be consulted for clues on how it works, if you're adventurous.

The two options of the GetSimilar() query both work by looking at the top concepts of the reference source and then looking for other sources that contain them as well, using frequency and dominance, respectively, to weight in-source relevance. So not much rocket science, and only support for full matches at this point.

This said, iKnow offers you the building blocks to build much more advanced things, quite possibly inspired by your academic research, leveraging the concept level that is unique to iKnow in identifying what a text is really about. For example, you can build vectors of entity frequency or dominance and look for cosine similarity in that vector space (see the sketch below), or you can leverage topic modelling. Many of these will require quite a bit of computation, though, and actual result quality may depend a bit on the nature of the texts you're dealing with, which is why we chose to stick to very simple things in the kit for now.
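
As a small illustration of the vector-space idea, here's a hedged sketch of a cosine similarity helper. The entity-frequency vectors are assumed to have been collected into plain arrays subscripted by entity ID (e.g. pA(entId) = frequency); that representation is not an existing API, just something you'd build yourself:

ClassMethod Cosine(ByRef pA, ByRef pB) As %Double
{
    set (tDot, tNormA, tNormB) = 0
    set tId = ""
    for {
        set tId = $order(pA(tId), 1, tFreqA)
        quit:tId=""
        set tNormA = tNormA + (tFreqA * tFreqA)
        set tDot = tDot + (tFreqA * $get(pB(tId), 0))
    }
    set tId = ""
    for {
        set tId = $order(pB(tId), 1, tFreqB)
        quit:tId=""
        set tNormB = tNormB + (tFreqB * tFreqB)
    }
    quit $select((tNormA = 0) || (tNormB = 0): 0, 1: tDot / $zsqr(tNormA * tNormB))
}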

However, you can find two (slightly) more advanced options in demos we have published online:

  • The iKnow Investigator demo is part of your SAMPLES namespace and offers, next to the SourceAPI:GetSimilar() option, also an implementation that builds a dictionary from your reference source, matches it against your target sources and looks for the ones with the highest aggregated match score, which accounts for partial matches as well.
  • In the iFind Search Portal demo, the bottom of the record viewer popup displays a list of similar records as well. This one is populated based on a bit of iFind juggling implemented in the Demo.SearchPortal.Utils class, leveraging full entity matches only, but easy to extend to weigh in words as well.

In both cases, there's a myriad of options to refine these algorithms, but all at a certain compute cost, given the high dimensionality introduced by the iKnow entity (and actually even word) level. If you have further ideas or, better yet, sample code to achieve better similar document lists, we'd be thrilled to read about it here on the community ;o)

Thanks,
benjamin

Hi Edward,

the thing that comes closest here would be the %iKnow.Queries.SourceAPI:GetSimilar() query, which, for a given seed document, looks for the most similar ones in a domain, optionally constrained by a filter object. The results of that query include a figure like the one you're looking for, expressing how many entities were new in the seed document vs the corpus it's being compared against. Although that particular calculation isn't available as an atomic function, a simple way to get to what you want would be to use the %iKnow.Filters.SourceIdFilter and just compare against an individual document.

If you prefer to write more code :o), you can just look up the entities in the one document and compare them against those in the others through the %iKnow.Objects.EntityInSourceDetails SQL projection.
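
For example, here's a hedged sketch of that comparison through the projection; the column names may differ slightly between versions, so please check them against the generated view:

set tSQL = "SELECT e.EntUniId FROM %iKnow_Objects.EntityInSourceDetails e "
         _ "WHERE e.DomainId = ? AND e.SourceId = ? "
         _ "AND NOT EXISTS (SELECT 1 FROM %iKnow_Objects.EntityInSourceDetails o "
         _ "WHERE o.DomainId = e.DomainId AND o.EntUniId = e.EntUniId "
         _ "AND o.SourceId <> e.SourceId)"
set tResult = ##class(%SQL.Statement).%ExecDirect(, tSQL, tDomainId, tSrcId)
while tResult.%Next() { write !, tResult.%Get("EntUniId") }

This lists the entities unique to the one document; comparing counts then gives you the novelty figure mentioned above.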

Regards,

benjamin