Robert Cemper · May 8, 2018 2m read

Manual Setup of Sharding

To get a feeling for sharding, I decided to set up a cluster "by my own hands":
1 master (Win10) and 2 shard "slaves" on Ubuntu64.
Installation went fine, and I started to follow an excellent online training with no issues.

> Introduction to Sharding in InterSystems IRIS online course
>>> InterSystems: ISC1125 Sharding Basics: Planning and Deploying
>>>>> 2. Deploying and Using the Cluster

All was fine as described

But creating a Shard Table failed at that point!
{ I skip all fruitless attempts }

Diagnosing what went wrong:

MASTER>s sc=$SYSTEM.Sharding.ListShards()
Shard   Host            Port    Namespc Mirror  Role    VIP
1   51773   SHARD
2   51773   SHARD
MASTER>zw sc

This still looks good, but:

MASTER>s sc=$SYSTEM.Sharding.VerifyShards()
MASTER>do $system.OBJ.DisplayError(sc)
ERROR #9355: 2 shards failed verification
ERROR #9354: Shard 1 failed verification
ERROR #5002: ObjectScript error: <SUBSCRIPT>getConnection+28^%SYS.BigData.ECP *() Subscript 1 is ""
ERROR #9354: Shard 2 failed verification
ERROR #5002: ObjectScript error: <SUBSCRIPT>getConnection+28^%SYS.BigData.ECP *() Subscript 1 is ""

First extra lesson learned:

MASTER>s sc=$system.Sharding.SetOption("MASTER","AutoVerify",1)

Now verification runs directly after adding a shard.
That's highly useful for the first steps.
After a hint from @Michael Braam I checked my local network.

And that was written nowhere (or I just didn't find it):
Hosts used by hostname must be visible and reachable from your MASTER and Shard servers.
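
A quick way to test this requirement on each machine is to ask the operating system to resolve every hostname in the cluster. The following is a minimal sketch in Python, not part of the original setup; `unresolved` is a hypothetical helper name, and it only checks name resolution, not whether the InterSystems port is reachable.

```python
import socket

def unresolved(hostnames):
    """Return the hostnames that this machine cannot resolve to an IP address."""
    failed = []
    for name in hostnames:
        try:
            socket.gethostbyname(name)
        except OSError:  # socket.gaierror (resolution failure) is a subclass of OSError
            failed.append(name)
    return failed
```

Run on the master and on every shard server, a call such as `unresolved(["ubuntu64c", "ubuntu64d"])` should return an empty list; any name it returns is one that machine cannot see.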

Instead of setting up a private DNS server just for this case, I decided to set up
hosts files on each machine:
     Win10 >>>>  C:\Windows\System32\drivers\etc\hosts
    ubuntu >>>> /etc/hosts
    ubuntu >>>> /etc/hostname


Now I could add Shards with "real" server names resulting in a working setup.

MASTER>s sc=$SYSTEM.Sharding.ListShards() zw sc

Shard   Host            Port    Namespc Mirror  Role    VIP
1       ubuntu64c       51773   SHARD
2       ubuntu64d       51773   SHARD
MASTER>s sc=$SYSTEM.Sharding.VerifyShards() zw sc


Now creating sharded tables was as easy as shown in the online training.
I didn't investigate whether setting up the hosts file on the ShardMaster only would have been sufficient,
as it was just a copy/paste to have it on all involved servers and no further headache.
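
To narrow down which direction actually needs the name resolution, each machine can try a plain TCP connection to the others by hostname. This is a minimal sketch in Python under stated assumptions: `can_connect` is a hypothetical helper, and 51773 is the port shown in the ListShards() output above. A successful connect only shows that the name resolves and the port is reachable, not that sharding itself is configured correctly.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers resolution failures, refused connections, and timeouts
        return False
```

For example, `can_connect("ubuntu64c", 51773)` run on the master tests the master-to-shard direction; running the same check on a shard against the master's hostname tests the reverse.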


Alternate solution:

NO handcrafted hosts files on the sharding data server ("slave").

Instead, pass the explicit IP address of the shard master to the "slave" with $system.Sharding.SetOption(), like this:

SHARD1>s sc=$system.Sharding.SetOption(,"MasterIPAddress","") zw sc
SHARD1>s sc=$system.Sharding.SetOption(,"AutoVerify",1) zw sc
SHARD1>zw sc
SHARD1>s sc =$system.Sharding.AssignShard("SHARD1","",51773,"SHARD1") w $l(sc)
SHARD1>do $system.Sharding.ListShards()
Shard   Host            Port    Namespc Mirror  Role    VIP
1   51773   SHARD1

It turned out that the "slave" had failed on DNS resolution of the shard master's hostname.

Robert, this is great to hear. 

Glad you were able to follow the online learning as well. We are in the process of building a sharding exercise into the Hands on with Cloud Manager Experience, so that people can see a fully sharded system and test it out. We would be provisioning using InterSystems Cloud Manager, which takes care of all of the networking for you.

Let us know what other feedback you have. 

Doug Foster, Manager Online Education

Right, @Douglas Foster, about InterSystems Cloud Manager. On top of the networking, it configures all your shards, ECP, and mirroring for you :-)

An additional warning:

When I executed $system.Sharding.VerifyShards(), it failed with a <COMMAND> error.
Analysis of the return status indicated something like "cannot connect to server".

After hacking around for some hours I realized, just by accident, that my TELNET port on the ShardMaster wasn't on 23 anymore.
The reason for the change was a parallel installation on the same machine.

Setting it back to 23 (the default) fixed my problem.



Honestly, I have no idea how Telnet relates to Sharding but for me, it fixed my problem.

Sorry, this was "fake news".

Changing back to the assumed bad config didn't show the error anymore.
I couldn't reproduce it.