How to remove a job PID from a production

Question

Ryan Hulslander · Jun 3, 2016

#Business Service #System Administration #Ensemble

Problem:

A file-based business service uses a local path on a Linux machine that is actually a mounted CIFS share. The mount is "soft" and is configured not to cache data. However, there are times when the remote system offering up the share (a Windows machine, I believe) gets bounced or otherwise hangs, and the business service in the Ensemble production then hangs as well.

Unmounting the network share doesn't affect it, no process kill command affects it, and even a "kill -9" on the process from outside of Cache does nothing. So the "Update Production" button stays lit and the job never goes away, even after connectivity to the shared mount point is restored at the system level.

Question: the job is gonzo and won't ever come back. Is there a way to "surgically remove" the PID from the list of jobs the production "thinks" are there? I don't care if the PID is zombied or, for whatever reason, not responding. I just want to make the production "forget" that the PID is part of the running production. That way I can start another instance of the business service until a bounce of the production or a reboot of the machine can be scheduled and performed.

Any advice is welcome! Thanks!

Discussion (4)

John Murray · Jun 6, 2016

When you write "no process kill command affects it", are you referring to commands issued from within the Ensemble portal, or from a command prompt?

Also, what is the $ZV string of your Ensemble?

Pete Greskoff · Jun 8, 2016

It sounds to me like something could be wrong with the parent process of the one you are trying to kill. If you haven't already, I strongly suggest opening a WRC issue for this. It would be worth trying to get a trace (ltrace or strace) of both this process and its parent to see what is going on at the OS level.

score 0 · Answer 1 · 2016-06-06T13:49:56-04:00

Both - inside and outside (using "kill -9" in Linux). So just going out and whacking whatever temp globals/tables that tell Ensemble the job is there should be safe to do.

$ZV is "Cache for UNIX (Red Hat Enterprise Linux for x86-64) 2014.1.2 (Build 753U) Tue Jul 22 2014 11:25:14 EDT"

Thanks!

score 1 · Answer 2 · 2016-07-30T01:34:35-04:00

Hi Ryan,

You'll need to make sure the process is terminated at the OS level. Then you can use the following command to unregister the PID from Ensemble:

Do ##class(Ens.Job).UnRegister("<config name>",<PID>)

Restarting the production should also work, but you might need to force the production to shut down.
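
For the first step (confirming the process is really gone at the OS level before unregistering it), a small check along these lines may help. This is only a minimal Python sketch and not part of the original answer: the helper names `pid_exists` and `process_state` are hypothetical, and the state lookup assumes a Linux `/proc` filesystem. One thing it can reveal: a Linux process in uninterruptible sleep (state `D`), which can happen while it is blocked on I/O against an unreachable network mount, typically does not act on signals, SIGKILL included, until the blocking call returns. That may be why "kill -9" appeared to do nothing.

```python
import os


def pid_exists(pid: int) -> bool:
    """Return True if a process with this PID still exists at the OS level."""
    if pid <= 0:
        # 0 and negative values address process groups, not a single process.
        raise ValueError("pid must be a positive integer")
    try:
        os.kill(pid, 0)  # signal 0 only checks existence/permission; nothing is sent
    except ProcessLookupError:
        return False     # no such process
    except PermissionError:
        return True      # process exists but is owned by another user
    return True


def process_state(pid: int):
    """Return the Linux state letter ('R', 'S', 'D', 'Z', ...) or None if unavailable."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None      # process is gone, or /proc is not available
    # The command name sits in parentheses and may contain spaces or ')',
    # so split on the last ')' and take the first field after it.
    return data.rsplit(")", 1)[1].split()[0]


if __name__ == "__main__":
    pid = os.getpid()    # replace with the PID of the stuck job
    print(pid, pid_exists(pid), process_state(pid))
```

If `pid_exists` returns False, the process is gone and unregistering it is consistent with the advice above. If the state is `D`, the process is still waiting inside the kernel, and a trace as suggested earlier in the thread is probably the next thing to look at.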

HTH,

Wilber