%Startswith or LIKE?

Question

Hao Ma · Feb 24, 2020

#Performance #SQL #Caché #InterSystems IRIS

Could anyone please explain why Caché has a %STARTSWITH function while it supports ANSI SQL "LIKE"?

Thank you very much.

Discussion (26)

Robert Cemper · Feb 24, 2020

%STARTSWITH relates better to Caché's internal structures and is faster at larger scale.

2 0

Joel Solon · Mar 3, 2020

I did a little bit more research.

~~Maybe %STARTSWITH 'abc' was at one time faster than the equivalent predicate LIKE 'abc%'.~~
The quote comes from the FOR SOME %ELEMENT predicate documentation. This predicate can be used with Collections and an old feature called Free Text Search. ~~The quote was actually only meant to apply to the Free Text Search usage.~~
I've tested %STARTSWITH 'abc' and LIKE 'abc%' today using FOR SOME %ELEMENT with Collections and Free Text Search. The code is identical.

~~Conclusion: the quote will be removed from the documentation since it's no longer true.~~

Thanks, @Vitaliy.Serdtsev, for making me realize that I should have been testing with placeholders rather than fixed values to the right of %STARTSWITH or LIKE. I was testing with Embedded SQL; with fixed values, my earlier statements are true. But if the query itself uses placeholders (? or host variables), or the WHERE clause is parameterized automatically (thanks, @Eduard Lebedyuk, for mentioning that) then the generated code differs, and LIKE sometimes does do an extra (slightly slower) comparison, because at runtime, LIKE could get a simple pattern ("abc%") or a complex one ("a_b%g_i") and the code has to cope with those possibilities.
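To make the "extra comparison" concrete, here is a minimal Python sketch of the idea. It is not the code Caché or IRIS generates; the function names `starts_with` and `like_runtime` are hypothetical, and the regular-expression fallback is only one assumed way to handle the general case.

```python
import re


def starts_with(value: str, prefix: str) -> bool:
    # %STARTSWITH-style check: one prefix comparison, no pattern analysis.
    return value.startswith(prefix)


def like_runtime(value: str, pattern: str) -> bool:
    # Parameterized LIKE: the pattern is only known at run time,
    # so it has to be inspected before it can be applied.
    body = pattern[:-1]
    if pattern.endswith("%") and "%" not in body and "_" not in body:
        # Simple pattern such as "abc%": reduces to a prefix comparison.
        return value.startswith(body)
    # Complex pattern such as "a_b%g_i": fall back to a general matcher.
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None
```

With a fixed literal, the pattern inspection in `like_runtime` could be done once at compile time, which is consistent with the generated code being identical in that case.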

New conclusion: the quote will be clarified so that it mentions placeholders/parameterization and moved to the %STARTSWITH and LIKE documentation, instead of being buried in FOR SOME %ELEMENT.

And thanks to @Hao Ma for bringing this up!

0 0

Vitaliy Serdtsev · Feb 27, 2020

> Conclusion: the quote will be removed from the documentation since it's no longer true.

In that case, the documentation for %STARTSWITH should also be marked DEPRECATED, with the recommendation "use LIKE 'XXX%'".

I also did an analysis for Caché 2018.1:

Class del.t Extends %Persistent
{

Index ip On p;

Property p As %VarString;

/// d ##class(del.t).Fill()
ClassMethod Fill(N = 1000000)
{
  d DISABLE^%NOJRN
  k ^del.tD,^del.tI

  f i=1:1:N s ^del.tD(i)=$lb("","test"_i)
  s ^del.tD=N
  d ENABLE^%NOJRN

  d ..%BuildIndices(,,,$$$NO)
  d $system.SQL.TuneTable($classname(),$$$YES)
  d $system.OBJ.Compile($classname(),"cu-d")
}
}

Although the query plans shown in the SMP are exactly the same, the metrics differ:

select count(*) from del.t where p like 'test7%'
Row count: 1 Performance: 0.291 seconds 333340 global references 2000537 lines executed

select count(*) from del.t where p %startswith 'test7'
Row count: 1 Performance: 0.215 seconds 333340 global references 1889349 lines executed
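As a rough reading of these two measurements, the following Python sketch computes the relative differences. The numbers are copied from the output above; the helper name `pct_less` is made up for the example, and a single timing run like this is noisy, so the elapsed-time figure in particular should be treated with caution.

```python
# Metrics copied from the two SMP runs above.
like_run = {"seconds": 0.291, "global_refs": 333340, "lines": 2000537}
startswith_run = {"seconds": 0.215, "global_refs": 333340, "lines": 1889349}


def pct_less(baseline: float, other: float) -> float:
    # Percentage by which `other` is smaller than `baseline`.
    return (baseline - other) / baseline * 100


for metric in like_run:
    diff = pct_less(like_run[metric], startswith_run[metric])
    print(f"{metric}: %STARTSWITH is {diff:.1f}% lower than LIKE")
```

The global references are identical, so the difference comes entirely from the lines executed per row (about 5.6% fewer for %STARTSWITH), not from data access.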

For the next two queries, the generated INT code is identical:

&sql(select * from del.t where p like 'test7%')
&sql(select * from del.t where p %startswith 'test7')

But for these two, the generated code already differs, which is why the metrics in the SMP differ:

&sql(select * from del.t where p like :a)
&sql(select * from del.t where p %startswith :a)

In IRIS 2020.1 the handling of embedded SQL was changed, but I can't check that.

0 0

Robert Cemper · Feb 26, 2020

I'm surprised you don't see the obvious performance difference between looking for something of known length at the beginning of a string and scanning a possibly rather long string for some bytes somewhere in it, possibly with composite patterns such as %AB%CD%.

keep in mind: Caché is built for speed, not for the comfort of the average programmer

1 0

score 2 · Answer 1 · 2020-03-03T09:26:04-05:00

%STARTSWITH is not faster or slower when comparing apples to apples.

LIKE can find a substring wherever it occurs, and has multi-character and single-character wildcards. %STARTSWITH is looking only at the beginning of the string, so it's equivalent to LIKE 'ABC%'.

Updating to match another updated post lower on this page. If the comparison string is parameterized, LIKE sometimes does an extra check, so %STARTSWITH will be slightly faster.

When the comparison string ('ABC%' and 'ABC') is fixed, the code that checks LIKE 'ABC%' is exactly the same as the code that checks %STARTSWITH 'ABC'.

score 0 · Answer 2 · 2022-03-17T10:08:21-04:00

Well, the result is not always the same: I just found out that you should be careful when working with :variables, for example in %SQLQuery:

select 1 where 'well...' %startswith :myvar

returns 1 row when myvar is null,

whereas

select 1 where 'well...' like :myvar||'%';

returns no row when myvar is null.

Using IRIS for Windows (x86-64) 2021.1 (Build 215) Wed Jun 9 2021 09:56:33 EDT
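The difference can be modelled with SQL's three-valued logic. The Python sketch below only reproduces the behaviour reported in this post; it is not taken from the IRIS implementation. All function names are hypothetical. It assumes that `||` with a NULL operand yields NULL (standard SQL behaviour, consistent with the result above) and that %STARTSWITH treats a NULL substring like an empty one (inferred from the observed row).

```python
from typing import Optional


def sql_concat(a: Optional[str], b: Optional[str]) -> Optional[str]:
    # Assumed SQL-style ||: NULL (None) on either side gives NULL.
    if a is None or b is None:
        return None
    return a + b


def like_prefix(value: Optional[str], pattern: Optional[str]) -> Optional[bool]:
    # Handles only patterns of the form 'prefix%' (enough for this example).
    # A NULL operand makes the predicate unknown (None).
    if value is None or pattern is None:
        return None
    if pattern.endswith("%"):
        return value.startswith(pattern[:-1])
    return value == pattern


def startswith_observed(value: Optional[str], prefix: Optional[str]) -> Optional[bool]:
    # Models the behaviour reported above: a NULL prefix acts like an empty prefix.
    if value is None:
        return None
    return value.startswith(prefix or "")


def row_returned(predicate: Optional[bool]) -> bool:
    # WHERE keeps a row only when the predicate is true; unknown filters it out.
    return predicate is True


myvar = None
print(row_returned(startswith_observed("well...", myvar)))            # %STARTSWITH :myvar
print(row_returned(like_prefix("well...", sql_concat(myvar, "%"))))   # LIKE :myvar||'%'
```

With a non-null value such as `myvar = "we"`, both predicates in this model agree.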

score 0 · Answer 3 · 2022-03-17T10:12:59-04:00

Malte Schnack · Mar 17, 2022

> When the comparison string ('ABC%' and 'ABC') is fixed.

Ah, OK.

0 0

score 1 · Answer 4 · 2020-02-24T06:05:07-05:00

If possible, it's recommended to use the Caché-specific predicates rather than the ANSI SQL ones, as they will normally be faster. Execute the same queries in the Management Portal on large tables and you can verify this for yourself.

The list of these predicates is here (Caché): https://cedocs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY=RSQL_PREDICATE_CONDITONS

score 0 · Answer 5 · 2020-02-24T21:39:14-05:00

Hao Ma · Feb 24, 2020

I was told that before, but I never knew it was written in the documentation. Thanks.

0 0

score 0 · Answer 6 · 2020-02-27T04:09:43-05:00

Hi Vitaliy!

Why can't you check it? The Docker version of IRIS 2020.1 is available to everyone as a Community Edition.

The cloud version, Try IRIS, is available too, though it is 2019.3.

score 0 · Answer 7 · 2020-02-27T04:18:40-05:00

Vitaliy Serdtsev · Feb 27, 2020

Hi Evgeny!

I can't check for technical reasons. The Docker version does not suit me.

0 0

score 0 · Answer 8 · 2020-02-27T04:39:00-05:00

Understood. Just curious: does Try IRIS work for you for testing purposes? Or are there any issues with Try IRIS? If there are, we could fix them.

score 0 · Answer 9 · 2020-02-27T07:52:38-05:00

Vitaliy Serdtsev · Feb 27, 2020

Yes, Try IRIS (which is 2019.3) works without problems.

0 0

score 0 · Answer 10 · 2020-02-27T18:37:43-05:00

> In that case, the documentation for %STARTSWITH should also be marked DEPRECATED, with the recommendation "use LIKE 'XXX%'".
>
> select count(*) from del.t where p like 'test7%'
> Row count: 1 Performance: 0.291 seconds 333340 global references 2000537 lines executed
>
> select count(*) from del.t where p %startswith 'test7'
> Row count: 1 Performance: 0.215 seconds 333340 global references 1889349 lines executed

I'm not sure what you mean here. %STARTSWITH executed fewer lines so why would we recommend LIKE instead?

score 0 · Answer 11 · 2020-02-28T01:32:30-05:00

Forget it.
Right now, all other things being equal, %STARTSWITH is slightly faster than LIKE. This point is buried deep in the documentation, and it seems to apply only to FOR SOME %ELEMENT.
If the special case for LIKE does get sped up, the documentation will still need to be corrected or supplemented.

score 0 · Answer 12 · 2020-02-26T11:51:12-05:00

I have never heard of anyone issuing the blanket statement that InterSystems predicates are faster or slower than the ANSI standard ones. I don't think there are that many predicates that have similar functionality. As I said in a different comment, %STARTSWITH 'abc' is 100% equivalent to LIKE 'abc%'. InterSystems also provides %MATCHES and %PATTERN, but they are different.

score 2 · Answer 13 · 2020-02-24T22:17:04-05:00

Thank you all for your replies. I had heard that %STARTSWITH performs better, but I never knew it was in the online documentation. However, I am a little confused: instead of recommending that users use something they are not familiar with, why not make LIKE faster?

score 1 · Answer 14 · 2020-02-26T02:56:20-05:00

I think the author meant that the simplest queries of the form
like 'text%'
should automatically work as, or be converted to,
%startswith 'text'

score 1 · Answer 15 · 2020-02-26T07:42:20-05:00

To answer that: before a query is compiled, all literal arguments are parameterized:

like 'text%'

becomes

like ?

so we can't really replace LIKE with %STARTSWITH at the code generation step (there is the brackets argument syntax, I suppose).
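The substitution described above can be sketched roughly as follows. This is a minimal Python illustration, not how the InterSystems SQL preprocessor is implemented; the function name `parameterize` and the regular expression are assumptions made for the example.

```python
import re
from typing import List, Tuple


def parameterize(sql: str) -> Tuple[str, List[str]]:
    # Replace every single-quoted literal with ? and collect the literal values.
    args: List[str] = []

    def replace_literal(match: "re.Match[str]") -> str:
        # A doubled quote inside a literal stands for one quote character.
        args.append(match.group(1).replace("''", "'"))
        return "?"

    text = re.sub(r"'((?:[^']|'')*)'", replace_literal, sql)
    return text, args


simple = parameterize("select * from del.t where p like 'text%'")
complex_ = parameterize("select * from del.t where p like 'a_b%g_i'")
print(simple)
print(complex_)
```

Both statements end up with the same text, `... like ?`, so in this model the code compiled for it cannot know in advance whether the run-time pattern will be a plain prefix or something more complex.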

score 0 · Answer 16 · 2020-02-26T07:52:03-05:00

Vitaliy Serdtsev · Feb 26, 2020

And at the code execution step?

0 0

score 0 · Answer 17 · 2020-02-26T04:23:59-05:00

Thanks, Robert!

> keep in mind: Caché is built for speed, not for the comfort of the average programmer

We work hard to make IRIS not only fast but also comfortable for any backend, full-stack, or AI developer. Pinging @Raj Singh, our Product Manager for Developer Experience.

score 1 · Answer 18 · 2020-03-03T05:43:34-05:00

I think good sense is the key.

Today's compiler optimizers detect most common expressions and generate better-performing code, but we can't be obsessed with readability.

A good post: Performance vs Readability

score 0 · Answer 19 · 2020-02-26T04:25:30-05:00

Thanks, Hao Ma, this sounds reasonable. Inviting @Benjamin De Boe and @Raj Singh to comment on this.

score 0 · Answer 20 · 2020-03-02T06:59:58-05:00

Andre Wessels · Mar 2, 2020

Could %CONTAINS be included in this discussion?

0 0

score 0 · Answer 21 · 2020-03-02T08:11:12-05:00

Evgeny Shvarov · Mar 2, 2020

Sure, Andre!

What's wrong with %CONTAINS?

I think it deserves a separate question/discussion

0 0

score 4 · Answer 22 · 2020-02-26T03:04:28-05:00

Quote from the documentation:

For performance reasons, the predicate %STARTSWITH 'abc' is preferable to the equivalent predicate LIKE 'abc%'. ^proof