Article
· Apr 18, 2023 2m read

AI generated text detection using IntegratedML

In recent years, artificial intelligence technologies for text generation have advanced significantly. Text generation models based on neural networks can now produce texts that are almost indistinguishable from texts written by humans.
ChatGPT is one such service. It is a very large neural network trained on a large corpus of texts, and it can generate texts on various topics and adapt them to a given context.

This creates a new task: developing ways to tell texts written by people apart from texts generated by artificial intelligence (AI).

There are two main methods for AI-written text recognition:

  • Machine learning algorithms that analyze the statistical characteristics of the text;
  • Cryptographic methods that can help determine the authorship of the text.

In general, the task of AI text recognition is difficult but important.

I am happy to present an application that recognizes texts generated by AI. During development, I took advantage of InterSystems Cloud SQL and IntegratedML, whose benefits include:

  • Fast and efficient data queries;
  • A user-friendly interface for people who are not experts in databases or machine learning;
  • Scalability and flexibility to quickly adjust ML models according to requirements.
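For readers who have not used IntegratedML: models are created, trained, and queried with SQL statements. The sketch below only builds those statements as strings, so it can be read without a database connection. The model, column, and table names (`AITextModel`, `is_ai`, `TextFeaturesTrain`, `TextFeaturesTest`) are hypothetical and not taken from the application; only the `CREATE MODEL`, `TRAIN MODEL`, `VALIDATE MODEL`, and `PREDICT` keywords come from IntegratedML itself.

```python
import re

# Accepts plain or schema-qualified SQL identifiers, e.g. "SQLUser.TextFeatures".
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def _check(name):
    """Reject anything that is not a simple identifier before it reaches SQL."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"not a valid SQL identifier: {name!r}")
    return name


def integratedml_statements(model, label, train_table, test_table):
    """Build IntegratedML statements for one create/train/validate/predict cycle.

    All four names are placeholders chosen by the caller (hypothetical here).
    """
    model, label = _check(model), _check(label)
    train_table, test_table = _check(train_table), _check(test_table)
    return [
        # Declare which column to predict; the other columns of the table act as features.
        f"CREATE MODEL {model} PREDICTING ({label}) FROM {train_table}",
        # Fit the model on the table named in CREATE MODEL.
        f"TRAIN MODEL {model}",
        # Compute validation metrics on held-out rows.
        f"VALIDATE MODEL {model} FROM {test_table}",
        # Apply the trained model row by row.
        f"SELECT PREDICT({model}) AS predicted_label FROM {test_table}",
    ]


if __name__ == "__main__":
    for statement in integratedml_statements(
        "AITextModel", "is_ai", "TextFeaturesTrain", "TextFeaturesTest"
    ):
        print(statement)
```

Each statement would then be executed through whatever SQL client is connected to the Cloud SQL deployment; that part is left out because it depends on the account and driver in use.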

To develop and train the model, I used an open dataset of 35 thousand texts. Half of the texts were written by a large number of human authors, and the other half were generated by AI with ChatGPT.

Configuration used for GPT model:

model="text-curie-001"
temperature=0.7
max_tokens=300
top_p=1
frequency_penalty=0.4
presence_penalty=0.1

Next, I defined about 20 basic text features on which the model was then trained. Here are some of the features I used:

  • Characters count
  • Words count
  • Average word length
  • Sentences count
  • Average sentence length
  • Unique words count
  • Stop words count
  • Unique words ratio
  • Punctuation count
  • Punctuation ratio
  • Questions count
  • Exclamations count
  • Digits count
  • Capital letters count
  • Repeated words count
  • Unique bigrams count
  • Unique trigrams count
  • Unique fourgrams count
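To make the list above concrete, here is a minimal sketch of how such features could be computed from a raw text. It is not the application's actual code: the tokenization regex, the sentence splitter, the tiny stop-word list, and the exact definition of each feature (for example, "repeated words" as the number of distinct words that occur more than once) are all assumptions made for illustration.

```python
import re
import string
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real list would be much longer).
STOP_WORDS = {"a", "an", "the", "and", "or", "but", "of", "to", "in", "on",
              "is", "are", "it", "this", "that"}


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text):
    """Compute simple statistical features of a text (definitions are assumptions)."""
    words = re.findall(r"[A-Za-z0-9']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(words)
    punctuation = sum(ch in string.punctuation for ch in text)
    n_chars, n_words, n_sent = len(text), len(words), len(sentences)
    return {
        "chars_count": n_chars,
        "words_count": n_words,
        "avg_word_length": sum(map(len, words)) / n_words if n_words else 0.0,
        "sentences_count": n_sent,
        "avg_sentence_length": n_words / n_sent if n_sent else 0.0,
        "unique_words_count": len(counts),
        "stop_words_count": sum(c for w, c in counts.items() if w in STOP_WORDS),
        "unique_words_ratio": len(counts) / n_words if n_words else 0.0,
        "punctuation_count": punctuation,
        "punctuation_ratio": punctuation / n_chars if n_chars else 0.0,
        "questions_count": text.count("?"),
        "exclamations_count": text.count("!"),
        "digits_count": sum(ch.isdigit() for ch in text),
        "capital_letters_count": sum(ch.isupper() for ch in text),
        # Number of distinct words that appear more than once.
        "repeated_words_count": sum(1 for c in counts.values() if c > 1),
        "unique_bigrams_count": len(set(ngrams(words, 2))),
        "unique_trigrams_count": len(set(ngrams(words, 3))),
        "unique_fourgrams_count": len(set(ngrams(words, 4))),
    }


if __name__ == "__main__":
    for name, value in extract_features("Is this real? Yes, it is real!").items():
        print(f"{name}: {value}")
```

Each text then becomes one row of numeric columns plus a label, which is the tabular shape a SQL-based training step expects.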

As a result, I got a simple application that you can use for your own tasks or just for fun.

This is what it looks like:

[image]

To try the application, you can use the online demo or run it locally with your own Cloud SQL account.

Also, this application participates in the contest. If you like it, vote for it.

If you are interested, you are welcome to discuss this app in the comments.
 

Discussion (1)

Hi Oleh,

Your video is available on InterSystems Developers YouTube:

⏯️ Intersystems IRIS AI text detection

https://www.youtube.com/embed/rkvxNwLR-Lw

Great work!