ChatGPT on Trial: A Tester’s Experience

Generative AI has been accused of being an overhyped toy or a black swan coming for our jobs. As a tester who has incorporated this technology into my daily routine, I put ChatGPT on trial to see whether it’s a useful tool for testers and developers or just a flashy pretender.
Some of the uses that come to mind are writing better unit tests, saving time on development and producing more robust code. It can also be invaluable if you are working on a personal project or if your testing resources are limited.
But how trustworthy and helpful is it, really?
I put it to the test for several use cases:
- Test data generation (symbol sequences valid for tests)
- Code generation (machine executable commands valid for scripted testing/test automation)
- Test idea/case/strategy generation (human executable commands valid for testing activities)
Test Data
Defining test data is an important part of the testing process, particularly in unit tests where specific inputs are needed to validate code functionality. Yet it often involves monotonous work when you know exactly what to include or when the data follows a repetitive pattern.
In such cases you can get the desired data by providing a prompt like, “I need a list of cities starting with the letters ‘XXX’” or “create a sequence of Unicode symbols containing a specific character set.”
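Here is a minimal sketch of how that kind of generated data might land in a parametrized unit test. The city list is illustrative ChatGPT output, and normalize_city is a hypothetical stand-in for the function under test:

```python
import pytest

# Illustrative ChatGPT output for the prompt
# "I need a list of cities starting with the letters 'Sa'".
GENERATED_CITIES = ["San Diego", "Santiago", "Sarajevo", "Salvador"]

def normalize_city(name: str) -> str:
    """Hypothetical stand-in for the function under test."""
    return name.strip().title()

@pytest.mark.parametrize("city", GENERATED_CITIES)
def test_normalized_city_keeps_prefix(city):
    assert normalize_city(city).startswith("Sa")
```

But how does it fare with more realistic testing tasks?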
Simple Text Modifications, Replacement
Let’s consider a scenario where we need to incorporate an existing expression language into a new part of the system. Given the different usage context in this new location, the expression language syntax should be transformed before it can be used.
From a developer’s perspective, a “quick and dirty” approach to cover this functionality with unit test validation would be to use existing test data. I used examples from a product documentation website as the input. The idea is for ChatGPT to take the original expressions and modify them so they can serve as test data for our unit tests.
I asked ChatGPT to extract the expressions from the product documentation website using the Webpilot plugin, then transform each one following the example below:
In the output, ChatGPT returned expected results for each expression, so I could drop them straight into my tests:
As you can see, ChatGPT did its job well, aside from one slip: Boolean literals are case-sensitive, so (True) should be replaced with (true).
This could have been done with some custom code or Excel, but we spared ourselves the monotonous job of copying a lot of data from a table, freeing us up for more complex work.
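To make that concrete, here is a minimal sketch of what dropping those original/transformed pairs into a parametrized test might look like. The expression pairs and the transform_expression converter are hypothetical stand-ins, not our actual product syntax:

```python
import pytest

# Hypothetical original/transformed pairs of the kind ChatGPT returned.
# Note the manual fix: ChatGPT emitted (True), but Boolean literals are
# case-sensitive, so the expected value is (true).
EXPRESSION_PAIRS = [
    ("IF(Status = 'Open', 1, 0)", "if(status == 'open', 1, 0)"),
    ("AND(Active, Visible)", "and(active, visible)"),
    ("NOT(True)", "not(true)"),
]

def transform_expression(expr: str) -> str:
    """Toy stand-in for the real syntax converter under test."""
    return expr.lower().replace("=", "==")

@pytest.mark.parametrize("original, expected", EXPRESSION_PAIRS)
def test_expression_is_transformed(original, expected):
    assert transform_expression(original) == expected
```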
In more complex cases, with a couple more prompts you can make ChatGPT iterate over different types of data, other expressions or combinations of them.
Simulate a User
So we know it works when dabbling with formulas or attributes and creating data to use in tests. Now let’s take it one step further. Since we are testing syntax that could be generated by a user, let’s try to simulate the user, or at least mimic one, and then implement those “user” ideas with expression syntax.
We are on shaky ground here. Simulating a user can be part of the testing process, but it’s insanely context-dependent. Reasonably applied, this approach could be beneficial for startups, solo developers, proof-of-concept projects or small companies struggling with resources.
For simplicity, let’s assume that the machine is sufficiently proficient with the subject area to give us ideas and that the general context allows us to make this experiment.
We will use two consecutive prompts here. First, we will ask for ideas:
In the next prompt, we will explain the rules of expression generation to ChatGPT and ask it to generate expressions based on the concepts it generated earlier:
The final output contains generated concepts and expressions.
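If you want to script this two-step flow rather than run it in the chat UI, a minimal sketch with the OpenAI Python client might look like the following. The model name, domain and prompt wording are illustrative, not the exact prompts from my experiment:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str) -> str:
    """Send a single user prompt and return the model's reply."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Prompt 1: ask for user-level ideas in the target domain.
ideas = ask("List five metrics a logistics manager might want to calculate.")

# Prompt 2: explain the expression rules and ask for one expression per idea.
expressions = ask(
    "Expressions use the form FUNC(arg1, arg2) with operators ==, and, or.\n"
    f"Write one expression for each of these concepts:\n{ideas}"
)
print(expressions)
```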
So, it seems we can simulate test data, potentially based on general knowledge about the target industry. While it’s a fun experiment, the real question is whether this would be useful in everyday work.
Expect Mistakes
Some of the expressions ChatGPT provided in the previous example are wrong. It put constructions in the wrong places, mismatched symbols such as quotation marks (“) and braces ({), and so on.
That is frustrating, but luckily our goal is to test and explore possibilities — after all, error handling is an important quality characteristic of good software.
We can even use the system’s fallibility to our advantage and ask ChatGPT to generate potentially “harmful” constructions.
This prompt helped me find some CSS constructions that caused trouble for our validator.
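To give a flavor of how such generated “harmful” data can be wired into a test, here is a minimal sketch. The CSS fragments are illustrative, and validate_css is a toy stand-in for our real validator:

```python
import pytest

# Illustrative "harmful" CSS constructions of the kind ChatGPT can generate:
# unclosed blocks, legacy injections, absurdly long values.
HARMFUL_CSS = [
    "body { color: red",                      # unclosed block
    "div { width: expression(alert(1)); }",   # legacy IE expression() injection
    "p { margin: " + "9" * 10_000 + "px; }",  # absurdly long value
    "@import url('javascript:alert(1)');",    # dangerous @import target
]

def validate_css(css: str) -> bool:
    """Toy stand-in for the real validator: reject obviously bad input."""
    if len(css) > 1_000 or css.count("{") != css.count("}"):
        return False
    return not any(bad in css for bad in ("expression(", "javascript:"))

@pytest.mark.parametrize("css", HARMFUL_CSS)
def test_validator_rejects_harmful_css(css):
    # The point is graceful rejection, not a crash or silent acceptance.
    assert validate_css(css) is False
```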
Code
One of the initial use cases OpenAI focused on was code generation; OpenAI Codex has already been in beta for three years. So let’s test whether ChatGPT can generate scripts for testing activities. If you’re a developer, you’ve likely already experimented with generating code and formed your own opinions about it. However, if you’re a tester with limited coding experience, there are specific use cases where generative AI could be particularly beneficial:
API
Let’s see how ChatGPT acquits itself when we ask it to generate test scripts based on an API’s published documentation.
The output contains some simple checks and even a cURL example that we can grab and try out in Postman:
After placing the code in Postman, we can see that only the first check passed and the others failed:
It seems the tests failed because the structure of the API response differed from what the tests expected: ChatGPT had failed to correctly parse the response structure from the given website link. After I provided the actual response, it regenerated the code, and now all the tests pass.
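For readers who prefer scripted checks over Postman, here is a rough Python equivalent of those simple checks using the requests library. The endpoint and the expected response fields are hypothetical:

```python
import requests

# Hypothetical endpoint standing in for the documented API.
BASE_URL = "https://api.example.com/v1"

def test_list_users_contract():
    resp = requests.get(f"{BASE_URL}/users", timeout=10)

    # Check 1: the call succeeds (the only check that passed on the first try).
    assert resp.status_code == 200

    # Checks 2 and 3: response structure. These are the kind of checks that
    # failed until ChatGPT saw the real payload instead of guessing it.
    body = resp.json()
    assert isinstance(body.get("data"), list)
    assert all("id" in user for user in body["data"])
```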
Can such simple checks help us? Probably only in the very initial stages. More complex end-to-end scenarios require more debugging and tuning effort, and it may be easier to write the tests yourself, using ChatGPT only as an assistant.
Test Automation
The scenario is pretty much the same when we talk about applying ChatGPT for test automation. It’s able to write the simplest checks, but I would like to stress another interesting possibility.
I tested ChatGPT on one of our projects that required test scripts for a web interface whose architecture was not well suited to the test automation framework we were using.
We decided to explore other test frameworks and used ChatGPT to help transfer the code from one automation framework to another with minimal time lost on debugging. We quickly found a framework that fit and successfully adopted it.
Once again, the task was not to create but to transform existing code, and generative AI proved an effective tool for that.
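As an illustration of that kind of transformation, here is a hedged before/after sketch: a trivial Selenium login test and a ChatGPT-style translation of it to Playwright. The URL and selectors are hypothetical, and these are not necessarily the frameworks from our project:

```python
# Original Selenium version (Selenium 4 API), shown as a comment:
#
#     from selenium import webdriver
#     from selenium.webdriver.common.by import By
#
#     driver = webdriver.Chrome()
#     driver.get("https://app.example.com/login")
#     driver.find_element(By.ID, "email").send_keys("qa@example.com")
#     driver.find_element(By.ID, "submit").click()
#     assert "Dashboard" in driver.title
#     driver.quit()

# ChatGPT-style translation to Playwright's sync API:
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://app.example.com/login")
    page.fill("#email", "qa@example.com")
    page.click("#submit")
    assert "Dashboard" in page.title()
    browser.close()
```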
Ideas for Testing
Now we come to the most controversial topic in the entire study: Can ChatGPT be creative and provide us with testing ideas? To test it, I explained the product functionality and asked the system to give me some suggestions:
The answer contained some testing terminology and mentioned types of testing that could be used in this scenario.
However, it did not address some of the more specific questions I asked about risks affecting the client experience, nor did it provide any particular insight. It was more like a listicle of available testing types that could be found via a simple Google search.
Of course, this answer can be used as a heuristic for compiling and generating more suitable test ideas. Or you can describe the functionality in more detail until ChatGPT’s output fits your expectations; after all, “Asking the right question is half the answer.”
But my recommendations (besides the obvious one: have a tester on your team) would be:
- Use proven heuristic methods such as this model.
- Appeal to quality standards.
- Gain a deep understanding of the context at hand and acknowledge its inherent limitations.
- Uncover and highlight potential risks associated with the product, generate test ideas and implement practical actions to address them.
The Verdict
While generative AI has its limitations, it presents opportunities for those who act cautiously and are aware of its strengths, weaknesses and biases. It’s a tool, and like any tool, its effectiveness depends on how it’s used.
By leveraging generative AI, you can:
- Generate diverse test data for both unit and end-to-end tests, ensuring desired coverage and saving time on repetitive tasks.
- Simulate user interactions with your product, exploring potential paths and use cases and uncovering issues.
- Generate test ideas that help identify different testing scenarios and improve the quality of your software.
So step into the virtual courtroom and give generative AI a fair trial.