Hi Wolf,

I ran into a similar problem a few weeks ago: IRIS on Linux, no Studio.

My solution: a custom install on my local Windows 10 desktop with only Studio (and the ODBC drivers).
Nice side effect: the new IRIS Studio also talks seamlessly to my local Caché.

My assumption: as it works for an isolated Linux server, it should work for Docker as well (if no firewall blocks you ;-) )

The Hive documentation on String Types says:

Varchar

Varchar types are created with a length specifier (between 1 and 65535), which defines the maximum number of characters allowed in the character string. If a string value being converted/assigned to a varchar value exceeds the length specifier, the string is silently truncated. Character length is determined by the number of code points contained by the character string.
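A minimal Python sketch of the Varchar rule quoted above: assigning a longer string silently truncates it to the length specifier, counted in code points. `to_varchar` is a hypothetical helper that only mimics the documented behavior; it is not a Hive API.

```python
def to_varchar(value: str, length: int) -> str:
    """Mimic Hive's VARCHAR(length) assignment: silently truncate to `length` code points."""
    if not 1 <= length <= 65535:
        raise ValueError("varchar length must be between 1 and 65535")
    # Python str slicing counts code points, matching how Hive measures character length
    return value[:length]


print(to_varchar("Caf\u00e9 latte", 4))    # prints Café
print(len(to_varchar("\u00e9" * 300, 255)))  # prints 255
```

No error is raised for the overlong value; the extra characters are simply lost, which is why an ID column built this way needs a length you are sure is large enough.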

Related to your ERROR, this tells us that STRING has no size limit; in our terms it is a STREAM.

So unless you convert the STRING to VARCHAR (preferably VARCHAR(255)), you won't be able to use an alphanumeric ID.
You may, of course, add an artificial numbering of type BIGINT to be used as the ID.

In any case, with only STRING columns I'd call this a text file rather than a table usable from SQL.

If you can't touch the original source, you may need to write your own loader:

  • read the Hive "table" sequentially, row by row
  • insert each row into a manually designed table/class with an automatic ID
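The loader steps can be sketched in Python as follows. This is a minimal sketch under assumptions: the Hive rows arrive as an iterable of (name, descr) tuples (in practice, for example, from an ODBC or JDBC cursor), and sqlite3 stands in for the target Caché/IRIS table. `load_rows`, `hive_copy` and its columns are hypothetical names.

```python
import sqlite3
from typing import Iterable, Tuple


def load_rows(rows: Iterable[Tuple[str, str]], target: sqlite3.Connection) -> int:
    """Copy (name, descr) rows one by one into a table with an automatic numeric ID."""
    target.execute(
        """CREATE TABLE IF NOT EXISTS hive_copy (
               id    INTEGER PRIMARY KEY AUTOINCREMENT,  -- artificial ID, assigned on insert
               name  VARCHAR(255),
               descr VARCHAR(255))"""
    )
    count = 0
    for name, descr in rows:  # read the source sequentially, row by row
        # SQLite does not enforce VARCHAR lengths, so truncate explicitly to 255 characters
        target.execute(
            "INSERT INTO hive_copy (name, descr) VALUES (?, ?)",
            (name[:255], descr[:255]),
        )
        count += 1
    target.commit()
    return count


conn = sqlite3.connect(":memory:")
load_rows([("alpha", "first"), ("beta", "second")], conn)
print(conn.execute("SELECT id, name FROM hive_copy ORDER BY id").fetchall())
# prints [(1, 'alpha'), (2, 'beta')]
```

The ID is generated by the target table, so the source never needs a usable key; the explicit truncation mirrors the VARCHAR(255) conversion suggested above.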

The error message indicates that you are trying ECP access to a mirrored database.

Not all mirror configurations allow ECP access; what matters is read/write vs. read-only access and synchronous vs. asynchronous mirror members.
Using a virtual IP address adds another level of complexity.

Check your configuration against the rules in Configuring ECP Connections to a Mirror.

The diagrams in the Mirroring documentation may help you identify your situation.

A persistent class also has a Storage definition describing the structure of the stored data.
In addition, access code for the SQL projection is generated when the class is compiled.
In the Inspector of Caché Studio you will find a number of SQL-specific parameters that control how your class and its properties are presented to SQL.
The whole model of describing Objects + SQL tables is known as Unified Data Architecture.

This link tells you more: Objects, SQL, and the Unified Data Architecture

Your snippet from the full WSDL uses the XML namespace prefixes s0: and s1:. These prefixes must be declared in the <definitions> element at the top of the WSDL,

similar to:

<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
             xmlns:mime="http://schemas.xmlsoap.org/wsdl/mime/"
             xmlns:s="http://www.w3.org/2001/XMLSchema"
             xmlns:s0="http://tempuri.org"
             xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
             xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             targetNamespace="http://tempuri.org">
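To check a WSDL for prefixes that are used but never declared, a small Python sketch can collect the declared prefixes with the parser's `start-ns` events and compare them with the prefixes found in `message="prefix:name"` attributes. `undeclared_prefixes` is a hypothetical helper, and the regex scan is a simplification that only looks at `message` attributes.

```python
import io
import re
import xml.etree.ElementTree as ET


def undeclared_prefixes(wsdl_text: str) -> set:
    """Return prefixes used in message="prefix:name" attributes that are never declared."""
    # 'start-ns' events yield (prefix, uri) for every xmlns declaration in the document
    declared = {
        prefix
        for _event, (prefix, _uri) in ET.iterparse(io.StringIO(wsdl_text), events=("start-ns",))
    }
    # Prefixes inside attribute values are not checked by the XML parser, so scan for them
    used = set(re.findall(r'message="([A-Za-z_][\w.-]*):', wsdl_text))
    return used - declared


sample = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" xmlns:s0="http://tempuri.org">
  <message name="myOperationSoapIn"/>
  <portType name="myPortType">
    <operation name="myOperation">
      <input message="s0:myOperationSoapIn"/>
      <output message="s1:myOperationSoapOut"/>
    </operation>
  </portType>
</definitions>"""
print(undeclared_prefixes(sample))  # prints {'s1'}
```

The document itself parses without complaint, because the parser does not interpret prefixes inside attribute values; only a WSDL consumer would later fail to resolve s1.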

I'm also somewhat surprised to see this in your operation:

            <input message="s0:myOperationSoapIn"/>
            <output message="s0:myOperationSoapOut"/>
            <fault message="s0:MyFault" name="MyFault"/>

This means you have 1 input message but 2 different output messages.
On the Caché side:
- the input triggers a ClassMethod
- it either returns something or returns an error

I'm not aware of any default logic in the generated code that returns a fault message.
Typically, success/failure is signaled as part of the output message.
As a consequence, you may need to modify the generated code manually to split
the return message into 2 different message types.
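To see at a glance which messages an operation declares, a short Python sketch can list the input, output and fault messages of one portType operation. `operation_messages` is a hypothetical helper; it only looks at the WSDL 1.1 namespace and relies on binding operations not carrying a `message` attribute.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"


def operation_messages(wsdl_text: str, operation: str) -> dict:
    """Map 'input'/'output'/'fault' to the message names declared for one portType operation."""
    root = ET.fromstring(wsdl_text)
    result = {}
    for op in root.iter(f"{{{WSDL_NS}}}operation"):
        if op.get("name") != operation:
            continue
        for child in op:
            kind = child.tag.split("}")[-1]  # strip the namespace URI from the tag
            # binding operations have input/output/fault children too, but without 'message'
            if kind in ("input", "output", "fault") and child.get("message"):
                result.setdefault(kind, []).append(child.get("message"))
    return result


sample = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" xmlns:s0="http://tempuri.org">
  <portType name="myPortType">
    <operation name="myOperation">
      <input message="s0:myOperationSoapIn"/>
      <output message="s0:myOperationSoapOut"/>
      <fault message="s0:MyFault" name="MyFault"/>
    </operation>
  </portType>
</definitions>"""
print(operation_messages(sample, "myOperation"))
```

A result with both an `output` and a `fault` entry is the case described above: one request, two possible reply message types.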

Of course, there is some guessing involved, as you didn't publish the full WSDL.

If the definition you get with ?WSDL already contains the missing parts, then
it may have been changed since your SOAP service in Caché was generated.

In that case, just generate your SOAP service again from the new WSDL under a different class name, and then
either
- use the new set of classes
or
- check the differences against your current services and adapt them manually.
I'd assume it isn't much more than a property or a serial class. No magic.
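If both generated versions are exported as text, a unified diff makes the differences easy to spot. A minimal Python sketch, assuming the class sources are available as strings; `class_diff` and the `Demo.Response` class are hypothetical examples, not generated output.

```python
import difflib


def class_diff(old_source: str, new_source: str, name: str = "MyService") -> str:
    """Unified diff between the class currently in use and the one regenerated from the new WSDL."""
    return "".join(
        difflib.unified_diff(
            old_source.splitlines(keepends=True),
            new_source.splitlines(keepends=True),
            fromfile=f"{name} (current)",
            tofile=f"{name} (regenerated)",
        )
    )


old = "Class Demo.Response Extends (%RegisteredObject, %XML.Adaptor)\n{\nProperty Result As %String;\n}\n"
new = old.replace("Property Result As %String;\n", "Property Result As %String;\nProperty Status As %String;\n")
print(class_diff(old, new, "Demo.Response"))
```

Lines starting with `+` are what the new WSDL added; in the example, a single property.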

This indicates that the GROUP BY is too slow to answer in time.

If the selectivity of field1 is very low (many distinct values), this may take quite a while:
the closer field1 is to unique, the longer it takes, as it produces a large result set.

You may allow a longer timeout, but please don't ask me how; some other experts may know.

If your GCT.PA_Data table is huge, you may slice it into pieces by year, ID, ... whatever seems useful:

Example:

SELECT field1 F1, count(field2) CntF2
FROM (
     SELECT field1, field2 FROM GCT.PA_Data
     WHERE field1 IS NOT NULL AND ID BETWEEN 1 AND 100000
) GROUP BY field1

This is not the final solution but a way to understand the limits of your server.
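Taken one step further, the slices can be run one after the other and the partial counts merged, since COUNT is additive across disjoint ID ranges. A minimal Python sketch, assuming sqlite3 as a stand-in database and a hypothetical table `pa_data` (id, field1, field2) in place of GCT.PA_Data; `grouped_counts_in_slices` is a hypothetical helper, and `max_id` would come from something like SELECT MAX(ID).

```python
import sqlite3
from collections import Counter


def grouped_counts_in_slices(conn: sqlite3.Connection, max_id: int, slice_size: int = 100_000) -> Counter:
    """Run the GROUP BY slice by slice (by ID range) and merge the partial counts."""
    totals = Counter()
    for start in range(1, max_id + 1, slice_size):
        end = min(start + slice_size - 1, max_id)
        rows = conn.execute(
            """SELECT field1, COUNT(field2) FROM pa_data
               WHERE field1 IS NOT NULL AND id BETWEEN ? AND ?
               GROUP BY field1""",
            (start, end),
        )
        for field1, cnt in rows:
            totals[field1] += cnt  # counts from disjoint slices simply add up
    return totals


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pa_data (id INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT)")
conn.executemany("INSERT INTO pa_data VALUES (?, ?, ?)", [(i, f"k{i % 3}", "x") for i in range(1, 11)])
conn.execute("INSERT INTO pa_data VALUES (11, NULL, 'x')")  # NULL field1 is excluded
print(dict(sorted(grouped_counts_in_slices(conn, max_id=11, slice_size=3).items())))
```

Each individual query stays small, so you can tune `slice_size` until a single slice answers within your timeout.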

BTW:
An index on field1 might be useful anyway. Example:

Index ff On field1 [ Data = field2 ];

With this construct, your query only needs to access this index
instead of the full records, and all NULL values of field1 are already grouped together.

It is similar to the "materialized view" that other DBs offer.
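SQLite has no [ Data = ... ] clause, but a composite index on (field1, field2) gives a comparable effect: the GROUP BY can be answered from the index alone. A minimal sketch with sqlite3 as a stand-in and a hypothetical table `pa_data`; it is an analogy to the Caché index, not the same storage mechanism.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pa_data (id INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT)")
# field2 is carried inside the index, comparable in spirit to [ Data = field2 ]
conn.execute("CREATE INDEX ff ON pa_data (field1, field2)")

QUERY = "SELECT field1, COUNT(field2) FROM pa_data GROUP BY field1"
# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail text
plan = " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY))
print(plan)
```

The printed plan should name a scan USING COVERING INDEX ff, meaning the table rows are never read and no extra sort is needed for the GROUP BY; the exact wording differs between SQLite versions.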