Access denied at dataFrame.show()
I'm new to InterSystems IRIS and am trying to follow this guide ( https://community.intersystems.com/post/k-means-clustering-iris-dataset ) on k-means clustering. I've followed all the instructions up to the point of showing the data, where `dataFrame.show()` fails with `java.sql.SQLException: Access Denied`.
I'm using IRIS 2018.12.609.0 on Windows 10 Pro 64-bit. Python is 3.6.6 and PySpark is 2.3.1, both installed from Anaconda (Python 3.6.6 |Anaconda, Inc.| (default, Jun 28 2018, 11:27:44) [MSC v.1900 64 bit (AMD64)] on win32).
Along the way I ran into problems with the Hadoop winutils binary, which I solved using the information here ( https://stackoverflow.com/questions/19620642/failed-to-locate-the-winuti... ): I installed Hadoop WinUtils 2.2.0 and assigned the proper permissions to `/tmp/hive`. I also installed the samples from https://github.com/intersystems/Samples-Data-Mining, and I can see the data from the Management Portal.
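For anyone hitting the same winutils issue, the permission fix I applied was roughly the following (the `HADOOP_HOME` path is from my setup, adjust to wherever you unpacked winutils):

```shell
:: Windows cmd; winutils.exe must be in %HADOOP_HOME%\bin
set HADOOP_HOME=C:\hadoop
%HADOOP_HOME%\bin\winutils.exe chmod 777 \tmp\hive
```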
PySpark is started with the following command line and produces the following output (I've trimmed some of the logs):
```
(myenv) C:\Users\MyUser>pyspark --jars C:\InterSystems\IRIS\dev\java\lib\JDK18\intersystems-gateway-3.0.0.jar,C:\InterSystems\IRIS\dev\java\lib\JDK18\intersystems-jdbc-3.0.0.jar,C:\InterSystems\IRIS\dev\java\lib\JDK18\intersystems-spark-1.0.0.jar,C:\InterSystems\IRIS\dev\java\lib\JDK18\intersystems-uima-1.0.0.jar,C:\InterSystems\IRIS\dev\java\lib\JDK18\intersystems-utils-3.0.0.jar,C:\InterSystems\IRIS\dev\java\lib\JDK18\intersystems-xep-3.0.0.jar
Python 3.6.6 |Anaconda, Inc.| (default, Jun 28 2018, 11:27:44) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
2018-09-12 12:41:02 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/

Using Python version 3.6.6 (default, Jun 28 2018 11:27:44)
SparkSession available as 'spark'.
>>> sc.setLogLevel("DEBUG")
>>> dataFrame = spark.read.format("com.intersystems.spark").option("url", "IRIS://localhost:51773/SAMPLES").option("user", "_SYSTEM").option("password", "sys").option("dbtable", "DataMining.IrisDataset").load()
...
2018-09-12 12:46:31 DEBUG Native:58 - select "DB_ID", "NAME", "DB_LOCATION_URI", "DESC", "OWNER_NAME", "OWNER_TYPE" FROM "DBS" where "NAME" = <'global_temp'>
...
2018-09-12 12:46:31 WARN  ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
...
```
```
2018-09-12 12:44:23 DEBUG DefaultSource:92 - createRelation(Map(dbtable -> DataMining.IrisDataset, url -> IRIS://localhost:51773/SAMPLES, user -> _SYSTEM, password -> sys))
2018-09-12 12:44:23 DEBUG Master:177 - Evaluating on IRIS://127.0.0.1:51773/SAMPLES: SELECT * FROM (SELECT * FROM (DataMining.IrisDataset)) WHERE 1=0
2018-09-12 12:44:23 INFO  package:108 - Connecting to IRIS://127.0.0.1:51773/SAMPLES from DESKTOP-9QIKMHH with {}
...
2018-09-12 12:41:51 WARN  ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
>>> dataFrame.show()
2018-09-12 12:59:10 WARN  SizeEstimator:66 - Failed to check whether UseCompressedOops is set; assuming yes
[Stage 0:>                                                          (0 + 1) / 1]
2018-09-12 12:59:11 ERROR Executor:91 - Exception in task 0.0 in stage 0.0 (TID 0)
java.sql.SQLException: [InterSystems IRIS JDBC] Communication link failure: Access Denied
	at com.intersystems.jdbc.IRISConnection.connect(IRISConnection.java:1096)
	...
Caused by: java.sql.SQLException: Access Denied
	at com.intersystems.jdbc.IRISConnection.connect(IRISConnection.java:1018)
	...
2018-09-12 12:59:11 WARN  TaskSetManager:66 - Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.sql.SQLException: [InterSystems IRIS JDBC] Communication link failure: Access Denied
	at com.intersystems.jdbc.IRISConnection.connect(IRISConnection.java:1096)
	...
```
```
2018-09-12 12:59:11 ERROR TaskSetManager:70 - Task 0 in stage 0.0 failed 1 times; aborting job
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Ivan\AppData\Local\conda\conda\envs\atavia\lib\site-packages\pyspark\sql\dataframe.py", line 350, in show
    print(self._jdf.showString(n, 20, vertical))
  File "C:\Users\Ivan\AppData\Local\conda\conda\envs\atavia\lib\site-packages\pyspark\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1257, in __call__
  File "C:\Users\Ivan\AppData\Local\conda\conda\envs\atavia\lib\site-packages\pyspark\sql\utils.py", line 63, in deco
    return f(*a, **kw)
  File "C:\Users\Ivan\AppData\Local\conda\conda\envs\atavia\lib\site-packages\pyspark\python\lib\py4j-0.10.7-src.zip\py4j\protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o37.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.sql.SQLException: [InterSystems IRIS JDBC] Communication link failure: Access Denied
	at com.intersystems.jdbc.IRISConnection.connect(IRISConnection.java:1096)
	...
```
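For reference, here is the failing read in isolation, with the connection settings gathered into one dict (the values are exactly the ones from the session above, nothing new):

```python
# Connection settings as used in the failing session above.
IRIS_OPTIONS = {
    "url": "IRIS://localhost:51773/SAMPLES",
    "user": "_SYSTEM",
    "password": "sys",
    "dbtable": "DataMining.IrisDataset",
}

def read_iris_table(spark, options=IRIS_OPTIONS):
    """Build the DataFrame whose .show() raises Access Denied."""
    return (spark.read
            .format("com.intersystems.spark")
            .options(**options)
            .load())
```

Note that the `load()` itself succeeds (the schema query `SELECT ... WHERE 1=0` runs fine); only the executor-side read triggered by `.show()` fails.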
I am not 100% sure the whole installation is correct, but checking against the guide at https://sigdelta.com/blog/how-to-install-pyspark-locally/ I can successfully run its test code.
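In case it helps anyone spot a mismatch, here is a small helper (my own addition, not from the guide) that splits the `IRIS://host:port/NAMESPACE` connection URL into its parts, so each one can be checked against the instance's actual superserver port and namespace in the Management Portal:

```python
def parse_iris_url(url):
    """Split an IRIS://host:port/NAMESPACE connection URL into its parts."""
    scheme, rest = url.split("://", 1)      # "IRIS", "host:port/NAMESPACE"
    hostport, namespace = rest.split("/", 1)
    host, port = hostport.split(":", 1)
    return {
        "scheme": scheme,
        "host": host,
        "port": int(port),
        "namespace": namespace,
    }

print(parse_iris_url("IRIS://localhost:51773/SAMPLES"))
# {'scheme': 'IRIS', 'host': 'localhost', 'port': 51773, 'namespace': 'SAMPLES'}
```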