Hi all. We are going to find duplicates in a dataset using Apache Spark Machine Learning algorithms.

Note: I have done the following on Ubuntu 18.04, Python 3.6.5, Zeppelin 0.8.0, Spark 2.1.1

Introduction

In previous articles we have done the following:

0   0 1
0

comments

222

views

0

rating

Hi all.

I want to insert my dataframe into InterSystems IRIS. So, I tried to do this:

df = spark.read.load("/home/imported-openssh-key/zeppelin-0.8.0-bin-all/bin/resultData3/DF.json", format="json")
df.write.format("com.intersystems.spark").\
option("url", "IRIS://localhost:51773/DEDUPL").\
option("user", "********").option("password", "********").\
option("dbtable", "try.test1").save()

And got this error:

Last answer 21 September 2018 Last comment 21 September 2018
+ 1   0 1
301

views

+ 1

rating

Hi all. Today we are going to upload a ML model into IRIS Manager and test it.

Note: I have done the following on Ubuntu 18.04, Apache Zeppelin 0.8.0, Python 3.6.5.

Introduction

These days many available different tools for Data Mining enable you to develop predictive models and analyze the data you have with unprecedented ease. InterSystems IRIS Data Platform provide a stable foundation for your big data and fast data applications, providing interoperability with modern DataMining tools. 

Last comment 17 May 2019
+ 5   2 3
487

views

+ 5

rating