Article
· Mar 11 8m read

Generating meaningful test data using Gemini

We all know that having a set of proper test data before deploying an application to production is crucial for ensuring its reliability and performance. It allows to simulate real-world scenarios and identify potential issues or bugs before they impact end-users. Moreover, testing with representative data sets allows to optimize performance, identify bottlenecks, and fine-tune algorithms or processes as needed. Ultimately, having a comprehensive set of test data helps to deliver a higher quality product, reducing the likelihood of post-production issues and enhancing the overall user experience. 

In this article, let's look at how one can use generative AI, namely Gemini by Google, to generate (hopefully) meaningful data for the properties of multiple objects. To do this, I will use the RESTful service to generate data in a JSON format and then use the received data to create objects.

This leads to an obvious question: why not use the methods from %Library.PopulateUtils to generate all the data? Well, the answer is quite obvious as well if you've seen the list of methods of the class - there aren't many methods that generate meaningful data.

So, let's get to it.

Since I'll be using the Gemini API, I will need to generate the API key first since I don't have it beforehand. To do this, just open aistudio.google.com/app/apikey and click on Create API key.

and create an API key in a new project

After this is done, you just need to write a REST client to get and transform data and come up with a query string to a Gemini AI. Easy peasy 😁

For the ease of this example, let's work with the following simple class

Class Restaurant.Dish Extends (%Persistent, %JSON.Adaptor)
{
Property Name As %String;
Property Description As %String(MAXLEN = 1000);
Property Category As %String;
Property Price As %Float;
Property Currency As %String;
Property Calories As %Integer;
}

In general, it would be really simple to use the built-in %Populate mechanism and be done with it. But in bigger projects you will get a lot of properties which are not so easily automatically populated with meaningful data.

Anyway, now that we have the class, let's think about the wording of a query to Gemini. Let's say we write the following query:

{"contents": [{
    "parts":[{
      "text": "Write a json object that contains a field Dish which is an array of 10 elements. Each element contains Name, Description, Category, Price, Currency, Calories of the Restaurant Dish."}]}]}

If we send this request to https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent?key=APIKEY we will get something like:

 
Spoiler

Already not bad. Not bad at all! Now that I have the wording of my query, I need to generate it as automatically as possible, call it and process the result.

Next step - generating the query. Using the very useful article on how to get the list of properties of a class we can generate automatically most of the query.

ClassMethod GenerateClassDesc(classname As %String) As %String
{
    set cls=##class(%Dictionary.CompiledClass).%OpenId(classname,,.status)
    set x=cls.Properties
    set profprop = $lb()
    for i=3:1:x.Count() {
        set prop=x.GetAt(i)
        set $list(profprop, i-2) = prop.Name        
    }
    quit $listtostring(profprop, ", ")
}

ClassMethod GenerateQuery(qty As %Numeric) As %String [ Language = objectscript ]
{
    set classname = ..%ClassName(1)
    set str = "Write a json object that contains a field "_$piece(classname, ".", 2)_
        " which is an array of "_qty_" elements. Each element contains "_
        ..GenerateClassDesc(classname)_" of a "_$translate(classname, ".", " ")_". "
    quit str
}

When dealing with complex relationships between classes it may be easier to use the object constructor to link different objects together or to use a built-in mechanism of %Library.Ppulate.

Following step is to call the Gemini RESTful service and process the resulting JSON.

ClassMethod CallService() As %String
{
 Set request = ..GetLink()
 set query = "{""contents"": [{""parts"":[{""text"": """_..GenerateQuery(20)_"""}]}]}"
 do request.EntityBody.Write(query)
 set request.ContentType = "application/json"
 set sc = request.Post("v1beta/models/gemini-pro:generateContent?key=<YOUR KEY HERE>")
 if $$$ISOK(sc) {
    Set response = request.HttpResponse.Data.Read()	 
    set p = ##class(%DynamicObject).%FromJSON(response)
    set iter = p.candidates.%GetIterator()
    do iter.%GetNext(.key, .value, .type ) 
    set iter = value.content.parts.%GetIterator()
    do iter.%GetNext(.key, .value, .type )
    set obj = ##class(%DynamicObject).%FromJSON($Extract(value.text,8,*-3))
    
    set dishes = obj.Dish
    set iter = dishes.%GetIterator()
    while iter.%GetNext(.key, .value, .type ) {
        set dish = ##class(Restaurant.Dish).%New()
        set sc = dish.%JSONImport(value.%ToJSON())
        set sc = dish.%Save()
    }    
 }
}

Of course, since it's just an example, don't forget to add status checks where necessary.

Now, when I run it, I get a pretty impressive result in my database. Let's run a SQL query to see the data.

The description and category correspond to the name of the dish. Moreover, prices and calories look correct as well. Which means that I actually get a database, filled with reasonably real looking data. And the results of the queries that I'm going to run are going to resemble the real results.

Of course, a huge drawback of this approach is the necessity of writing a query to a generative AI and the fact that it takes time to generate the result. But the actual data may be worth it. Anyway, it is for you to decide 😉

 
P.S.

P.P.S. The first image is how Gemini imagines the "AI that writes a program to create test data" 😆

Discussion (3)3
Log in or sign up to continue

Here is a video to illustrate the article: Generating meaningful test data using Gemini

https://www.youtube.com/embed/00g64yDMamw
[This is an embedded link, but you cannot view embedded content directly on the site because you have declined the cookies necessary to access it. To view embedded content, you would need to accept all cookies in your Cookies Settings]