Front Artif Intell. 2023 May 24;6:1199350.
doi: 10.3389/frai.2023.1199350. eCollection 2023.

Human-like problem-solving abilities in large language models using ChatGPT


Graziella Orrù et al. Front Artif Intell.

Abstract

Background: The field of Artificial Intelligence (AI) has undergone a major shift in recent years owing to the development of new Machine Learning (ML) models such as the Generative Pre-trained Transformer (GPT). GPT and its chat-based variants have achieved unprecedented levels of accuracy in most computerized language-processing tasks.

Aim: This study investigated the problem-solving abilities of ChatGPT on two sets of verbal insight problems whose performance levels had previously been established in a sample of human participants.

Materials and methods: A total of 30 problems, labeled as "practice problems" and "transfer problems", were administered to ChatGPT. Each of ChatGPT's answers was scored "1" if correct and "0" if incorrect, giving a maximum possible score of 15 for each of the two sets. The solution rate for each problem, obtained from a sample of 20 human subjects, was used to assess ChatGPT's performance and compare it with that of the human subjects.
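The abstract does not spell out how the per-problem solution rates were turned into outcome probabilities. One plausible reading treats each problem's human solution rate as an independent probability of success. Under that assumption the total score follows a Poisson binomial distribution, and its mode is the "most probable outcome" for the human sample. The sketch below is written under that assumption only. The function names and the rates in the usage line are hypothetical, not the authors' code or data.

```python
from typing import List, Sequence


def score_answers(correct: Sequence[bool]) -> int:
    """Total score: 1 point per correct answer, 0 per incorrect one."""
    return sum(1 for c in correct if c)


def total_score_distribution(solution_rates: Sequence[float]) -> List[float]:
    """Return P(total == k) for k = 0..n (Poisson binomial distribution).

    Assumes problem i is solved independently with probability
    solution_rates[i], e.g. its solution rate in the human sample.
    """
    dist = [1.0]  # with no problems, a total of 0 is certain
    for p in solution_rates:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1.0 - p)  # problem missed: total unchanged
            new[k + 1] += prob * p      # problem solved: total + 1
        dist = new
    return dist


def most_probable_score(solution_rates: Sequence[float]) -> int:
    """Mode of the total-score distribution."""
    dist = total_score_distribution(solution_rates)
    return max(range(len(dist)), key=dist.__getitem__)


if __name__ == "__main__":
    rates = [0.9, 0.6, 0.3]  # hypothetical solution rates, not the study's data
    print(most_probable_score(rates))  # 2
```

The distribution is built one problem at a time, so it costs O(n²) operations and avoids enumerating all 2^n answer patterns.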

Results: The study highlighted that ChatGPT can be trained in out-of-the-box thinking and showed potential in solving verbal insight problems. The global performance of ChatGPT equalled the most probable outcome for the human sample in the practice set, in the transfer set, and in the two sets combined. Additionally, ChatGPT's answer combinations were among the 5% most probable outcomes for the human sample, both for the practice problems and for the pooled problem set. These findings demonstrate that ChatGPT's performance on both sets of problems was in line with the mean success rate of the human subjects, indicating that it performed reasonably well.
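The following approach checks whether a particular pattern of right and wrong answers is among the most probable ones for its total score:

1. Enumerate every pattern with that score.
2. Compute each pattern's probability from the solution rates.
3. Locate the observed pattern within that distribution.

The sketch below assumes independent problems. Its helper names are hypothetical. A percentile rank of at least 95 is only one way to operationalize "among the 5% most probable outcomes", not necessarily the authors' exact criterion.

```python
from itertools import combinations
from math import prod
from typing import List, Sequence


def pattern_probability(pattern: Sequence[int], rates: Sequence[float]) -> float:
    """P(pattern) for independent problems; pattern[i] is 1 if problem i is correct."""
    return prod(p if x else 1.0 - p for x, p in zip(pattern, rates))


def same_score_probabilities(k: int, rates: Sequence[float]) -> List[float]:
    """Probabilities of every answer pattern with exactly k correct answers."""
    n = len(rates)
    probs = []
    for solved in combinations(range(n), k):  # C(n, k) patterns in total
        solved_set = set(solved)
        pattern = [1 if i in solved_set else 0 for i in range(n)]
        probs.append(pattern_probability(pattern, rates))
    return probs


def percentile_rank(pattern: Sequence[int], rates: Sequence[float]) -> float:
    """Percentage of same-score patterns that are no more probable than `pattern`."""
    target = pattern_probability(pattern, rates)
    probs = same_score_probabilities(sum(pattern), rates)
    return 100.0 * sum(1 for q in probs if q <= target) / len(probs)


if __name__ == "__main__":
    rates = [0.9, 0.6, 0.3]  # hypothetical solution rates
    print(percentile_rank([1, 1, 0], rates))  # 100.0
```

Exhaustive enumeration is cheap for a 15-problem set. The number of patterns grows combinatorially with n, so a larger pooled set may call for sampling instead.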

Conclusions: The transformer architecture and self-attention mechanism in ChatGPT may have helped it prioritize inputs when making predictions, contributing to its potential in verbal insight problem-solving. ChatGPT's potential in solving insight problems highlights the importance of incorporating AI into psychological research. However, open challenges remain, and further research is required to fully understand AI's capabilities and limitations in verbal problem-solving.
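Self-attention, named above as a possible contributor, lets each token weight every other token by the similarity of their representations. The sketch below implements single-head scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, in plain Python. It deliberately omits the learned projection matrices, multiple heads and causal masking used in GPT-style models. It therefore illustrates the general mechanism, not ChatGPT's actual implementation. The toy input vectors are made up.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Tuple[Matrix, Matrix]:
    """Return (softmax(Q K^T / sqrt(d_k)) V, attention weights)."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # similarity of each query to each key
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights  # each output is a weighted mix of values


if __name__ == "__main__":
    # Toy "token" vectors (made up); self-attention uses the same sequence for Q, K, V.
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    out, w = scaled_dot_product_attention(x, x, x)
    for row in w:
        print([round(p, 3) for p in row])
```

Each row of attention weights sums to 1. A token therefore "prioritizes" the tokens most similar to it, which is the intuition the authors appeal to.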

Keywords: AI; Artificial Intelligence; ChatGPT; NLP; machine learning; problem-solving.


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Encoding and decoding components in a machine translation application.
Figure 2
Encoder and decoder structure and sublayers.
Figure 3
Human sample outcome probabilities: the probability of each possible total score (i.e., number of correct answers, range 0–15 for each individual set) is shown for the Practice set (A), the Transfer set (B), and the pooled set [(C), Practice + Transfer Problems]. In each plot, the outcome with the highest probability is highlighted in black. Notably, the most probable total score equals ChatGPT's score for each set of problems and for the pooled set.
Figure 4
Answer patterns equalling the ChatGPT total score: for each set of problems, all possible answer combinations yielding a score equal to ChatGPT's are presented [Practice set (A) and Transfer set (B)]. In each matrix, rows correspond to the possible combinations and columns to the answers. Each matrix element identifies a possible answer within a combination (black = correct, white = wrong). The matrix for the pooled set is not presented. The number of possible combinations exceeded 86,000,000, so the image would have been unintelligible.
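Each matrix in Figure 4 has one row per answer pattern with exactly k correct answers out of n problems, so its number of rows is a binomial coefficient. For a 15-problem set this is at most 6,435. For the pooled 30-problem set it grows into the tens or hundreds of millions. The pooled values below are illustrative for particular values of k, since the abstract does not report ChatGPT's pooled score.

```latex
N(n, k) = \binom{n}{k} = \frac{n!}{k!\,(n-k)!}, \qquad
\max_k \binom{15}{k} = \binom{15}{7} = \binom{15}{8} = 6\,435,
\binom{30}{12} = \binom{30}{18} = 86\,493\,225, \qquad
\binom{30}{15} = 155\,117\,520
```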
Figure 5
Distributions of answer combination probabilities equalling ChatGPT's score: for each set of problems, a scatterplot shows the distribution of probabilities for all possible answer combinations that yield ChatGPT's total score. The 5th, 50th, and 95th percentiles of the distribution are highlighted by black horizontal lines. A black dot marks the probability of ChatGPT's own answer combination. For ease of visualization, each plot shows a downsampled set of combinations, with probabilities on a logarithmic scale (y-axis). (A–C) refer, respectively, to the Practice Problems, Transfer Problems, and Pooled Problems sets.

