Testing ChatGPT as a Pair Programming Partner

Written by pairprogramming | Published 2025/05/25
Tech Story Tags: human-ai-collaboration | pair-programming | copulas | prompt-engineering | monte-carlo-simulation | stochastic-dependence-modeling | natural-language-processing | no-code-statistical-modeling

TL;DR: This section simulates pair programming with ChatGPT, where the human plays navigator and the AI writes code for statistical modeling. Starting with simple tasks, the complexity increases, exposing both ChatGPT’s coding strengths and reasoning limitations. The team evaluates its performance on tasks like density evaluation, ML estimation, sampling, visualization, and parallelization using the Clayton copula. Despite inconsistencies in reasoning, ChatGPT proves helpful as a coding assistant.

Table of Links

Abstract and 1 Introduction

2 Methodology

2.1 The task

2.2 The communication protocol

2.3 The copula family

3 Pair programming with ChatGPT

3.1 Warm up

3.2 The density

3.3 The estimation

3.4 The sampling

3.5 The visualization

3.6 The parallelization

4 Summary and discussion

5 Conclusion and Acknowledgments

Appendix A: The solution in Python

Appendix B: The solution in R

References

3 Pair programming with ChatGPT

Pair programming (Williams, 2001) is a software development technique in which two programmers work together at one workstation. One, the driver, writes code, while the other, the navigator, reviews each line of code as it is typed in and considers the “strategic” direction of the work. In what follows, we, the human part of the pair, put ourselves in the role of the navigator, giving specific tasks to the driver, ChatGPT, the AI part of the pair.

We begin prompting ChatGPT with tasks that require rather few lines of code, in order not to overwhelm the reader with the amount of results while still demonstrating as many of ChatGPT’s abilities as possible. Then we gradually move to larger tasks until all of them are implemented. Our prompts are set in sans serif. ChatGPT’s responses are set in verbatim if the response is code, and in italics otherwise; we indicate this explicitly at the first occurrence. Note that as ChatGPT is a rather loquacious LLM, we mostly limit the length of its responses, as otherwise we would be overwhelmed with answers of unnecessary detail. For the same reason, when asking ChatGPT for code, we mostly omit its comments on the produced code. Also note that when commenting on ChatGPT’s responses, we speak of it as if it were a human, e.g., it “understands”, “knows” or “is aware of” something; this should be interpreted by the reader in the sense that ChatGPT produced a response that (typically very well) mimics a corresponding human reaction. As ChatGPT’s responses are by default non-deterministic, i.e., given the same prompt again, the response might slightly differ, we re-generate the response for each of our prompts three times, and if these responses are factually different from each other, we indicate it accordingly. Finally, note that the whole interaction is conducted in one chat window. Once we observe that ChatGPT starts to forget the previous context, for the reasons described in Section 2.2, we re-introduce the context as described in the same section.

In the rest of this section, we first investigate ChatGPT’s knowledge of the topic under consideration, and then we prompt it to generate code for evaluating the density of the Clayton copula, for the ML estimation of its parameter, for sampling from the copula, for visualizing an example Monte Carlo experiment, and for optimizing the code for parallel computation.
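Before the transcript begins, the following minimal Python sketch (our own illustration, not ChatGPT’s output and not the full solution of Appendix A) shows what the first three of these tasks amount to for the bivariate Clayton copula with θ > 0; the function names and the choice of SciPy’s minimize_scalar are ours.

import numpy as np
from scipy.optimize import minimize_scalar

def clayton_log_density(u, v, theta):
    # Log-density of the bivariate Clayton copula, valid for theta > 0.
    s = u**(-theta) + v**(-theta) - 1.0
    return (np.log(1.0 + theta)
            - (theta + 1.0) * (np.log(u) + np.log(v))
            - (2.0 + 1.0 / theta) * np.log(s))

def clayton_mle(u, v):
    # ML estimate of theta from paired pseudo-observations in (0, 1).
    neg_loglik = lambda theta: -np.sum(clayton_log_density(u, v, theta))
    return minimize_scalar(neg_loglik, bounds=(1e-6, 50.0), method="bounded").x

def clayton_sample(n, theta, rng=None):
    # Sample n pairs via the Marshall-Olkin (frailty) algorithm, theta > 0.
    rng = np.random.default_rng() if rng is None else rng
    v = rng.gamma(shape=1.0 / theta, scale=1.0, size=n)  # gamma frailty variable
    e = rng.exponential(size=(n, 2))                     # two unit exponentials
    return (1.0 + e / v[:, None])**(-1.0 / theta)        # psi(E / V)

# Quick sanity check: recover theta from a simulated sample.
sample = clayton_sample(1000, theta=2.0)
print(clayton_mle(sample[:, 0], sample[:, 1]))

For θ = 2, the printed estimate should land close to 2; the bounded scalar optimizer is used here merely as a convenient stand-in for whatever optimizer would be chosen in the actual session.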

3.1 Warm up

We see that ChatGPT can save our time by quickly and concisely summarizing basic facts about the topic of our interest. We can also limit the size of the answer, a constraint satisfied by this 91-word-long answer. The information about the positive dependence probably follows from what we have already stated before: negatively dependent models are rarely used in practice, which is probably reflected in ChatGPT’s training data. However, several details of the answer can be questioned. In lines 3 and 4, “random variables” instead of just “variables” would be more precise. From the last sentence, it follows that an Archimedean copula can be expressed as the generator function for some symmetric distributions. This is at least confusing, as Archimedean copulas are rather a particular class of copulas admitting a certain functional form based on so-called generator functions. Finally, symmetric distributions have a precise meaning: such a distribution is unchanged when, in the continuous case, its probability density function is reflected around a vertical line at some value of the random variable represented by the distribution. Whereas Archimedean copulas possess a kind of symmetry following from their exchangeability, they are not symmetric distributions in this sense.
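For the reader’s convenience, the standard bivariate Archimedean form and its Clayton special case can be written as follows (a textbook fact restated by us, using the generator parameterization with ψ_θ(t) = (1 + t)^(−1/θ)):

\[
C(u_1, u_2) = \psi\bigl(\psi^{-1}(u_1) + \psi^{-1}(u_2)\bigr), \qquad
\psi_\theta(t) = (1 + t)^{-1/\theta}, \qquad
C_\theta(u_1, u_2) = \bigl(u_1^{-\theta} + u_2^{-\theta} - 1\bigr)^{-1/\theta}, \quad \theta > 0,
\]

so the generator is a building block of the copula, not the density of some symmetric distribution.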

To investigate the limits of ChatGPT’s knowledge, let us prompt it with two further questions. Based on the previous response, we can speculate that it has limited knowledge of Clayton models with negative dependence.
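For reference, and independently of ChatGPT’s answers, the bivariate Clayton family does cover negative dependence:

\[
C_\theta(u_1, u_2) = \bigl(\max\{u_1^{-\theta} + u_2^{-\theta} - 1,\, 0\}\bigr)^{-1/\theta}, \qquad
\theta \in [-1, \infty) \setminus \{0\},
\]

where θ ∈ [−1, 0) yields negative dependence and the limit θ → 0 corresponds to independence.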

We prompted ChatGPT to answer the same question three times and got contradicting answers. The first answer is correct; however, after asking again, ChatGPT changed its mind. Before commenting on that, let us try once again, with a slightly more complex concept.

If (U1, U2) ∼ C, then the survival copula of C is the distribution of (1 − U1, 1 − U2), and thus the properties of the lower tail of C are the properties of the upper tail of its survival copula (see the formulas below). Hence, we got an incorrect answer. After asking again, we got the following response.
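To spell the argument out in formulas (a standard fact we add here for completeness), the survival copula and the tail-dependence coefficients satisfy

\[
\hat{C}(u_1, u_2) = u_1 + u_2 - 1 + C(1 - u_1, 1 - u_2), \qquad
\lambda_L(C) = \lambda_U(\hat{C}), \qquad
\lambda_U(C) = \lambda_L(\hat{C}),
\]

and, in particular, the Clayton copula with θ > 0 has λ_L = 2^(−1/θ) and λ_U = 0.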

Again, we got contradicting answers. Based on this observation, the reader could raise the following question.

This response confirms what we have seen so far; hence, any user should take these limitations into account with the utmost seriousness and be extremely careful when asking ChatGPT for some reasoning. The examples above also illustrate well that the current version of ChatGPT is definitely not an appropriate tool for reasoning, which is also observed by Frieder et al. (2023) and Bang et al. (2023). However, this by no means implies that it cannot serve as a helpful AI partner for pair programming.

Author:

(1) Jan Górecki, Department of Informatics and Mathematics, Silesian University in Opava, Univerzitní náměstí 1934/3, 733 40 Karviná, Czech Republic (gorecki@opf.slu.cz).


This paper is available on arxiv under CC BY 4.0 DEED license.

