Comparative Analysis of OpenAI ChatGPT Models: GPT-4o, ChatGPT-1, and ChatGPT-3 Mini

2025-03-02 14:40:23.407248

Introduction to Generative AI Model Evaluation

This chapter compares three prominent OpenAI chat models: GPT-4o, ChatGPT-1, and ChatGPT-3 Mini. The models are evaluated on a range of tasks designed to test different reasoning and creative abilities.

Generative AI: A type of artificial intelligence that focuses on creating new content, such as text, images, audio, and code, rather than simply analyzing or acting on existing data. These models learn patterns from training data and then use this learning to generate novel outputs that resemble the data they were trained on.

The evaluation is structured around five diverse prompt categories, each designed to challenge the models in specific areas:

Current Affairs and General Knowledge: Assessing the models’ ability to understand and analyze recent events.
Competitive Mathematics: Testing logical reasoning and problem-solving skills in a mathematical context.
Code Generation: Evaluating the models’ capacity to generate functional and well-documented code.
Logical Reasoning: Examining the models’ proficiency in solving classic logic puzzles.
Creative Image Generation: Exploring the models’ capabilities in transforming textual descriptions into visual content (where applicable) and generating effective prompts for external tools.

This comparative study aims to identify the strengths and limitations of each model and to indicate their suitability for various educational and practical applications.

Experimental Design: Prompt Categories and Model Responses

To rigorously assess the capabilities of GPT-4o, ChatGPT-1, and ChatGPT-3 Mini, we employed a series of carefully designed prompts.

Prompt: In the context of generative AI, a prompt is a text input given to a model to instruct it on the desired output. It serves as the starting point and guiding instruction for the model’s generation process.

Each model was presented with identical prompts within each of the five categories. The responses were then analyzed based on accuracy, clarity, detail, and efficiency. The following sections detail the prompts used and the observed responses from each model.
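The procedure of giving every model the same prompt in each category can be pictured as a small collection loop. The sketch below is illustrative only: the function name `collect_responses`, the `CRITERIA` tuple, and the idea of representing each model as a plain callable are assumptions made for this example and do not describe how the evaluation in this chapter was actually run.

```python
from typing import Callable, Dict

# The four criteria named in the text; scoring itself is left to a human reviewer.
CRITERIA = ("accuracy", "clarity", "detail", "efficiency")


def collect_responses(
    prompts: Dict[str, str],
    models: Dict[str, Callable[[str], str]],
) -> Dict[str, Dict[str, str]]:
    """Send each category's prompt to every model.

    `models` maps a model name to a hypothetical callable that takes a
    prompt string and returns the response text. The result is keyed by
    category first, then by model name.
    """
    return {
        category: {name: ask(prompt) for name, ask in models.items()}
        for category, prompt in prompts.items()
    }
```

Keeping the prompts in a single mapping makes it harder to accidentally give one model a reworded prompt, which is the point of the identical-prompt design.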

General Knowledge and Reasoning: Delhi Assembly Elections

This section evaluates the models’ ability to process and analyze current events, specifically focusing on the political landscape of Delhi, India.

Prompt: Discuss the outcome of the most recent Delhi assembly elections. Which party emerged victorious, and by how many seats did they win? Additionally, explain the major factors that contributed to this result and the implications it may have on Delhi’s political landscape.

GPT-4o Response and Analysis:

GPT-4o reported that in the elections held on February 5th, 2025, the Bharatiya Janata Party (BJP) secured a commanding victory, winning 48 out of 70 seats. It highlighted this as the BJP’s return to power in Delhi after 27 years.

GPT-4o delivered a comprehensive and current analysis of the Delhi assembly elections. Its ability to incorporate live data through the web search feature is what kept the answer up to date.

Web search feature: A functionality in some advanced AI models that allows them to access and process information from the internet in real-time. This enables the model to provide responses based on the most up-to-date information available online.

ChatGPT-1 Response and Analysis:

ChatGPT-1 provided details from the 2020 Delhi elections, stating that the Aam Aadmi Party (AAP), led by Arvind Kejriwal, secured 62 out of 70 seats, while the BJP managed only eight seats.

While ChatGPT-1 delivered a detailed historical analysis, its response was based on older data because web search was disabled. As a result, it missed the 2025 election results.

Historical analysis: The examination and interpretation of past events and trends. In the context of AI models, it refers to the model’s ability to access and process information from past datasets and events.

ChatGPT-3 Mini Response and Analysis:

ChatGPT-3 Mini reported that in the most recent elections, the BJP secured a decisive victory by winning 48 out of 70 seats, while the Aam Aadmi Party won only 22 seats. Its analysis was detailed and well sourced, highlighting key contributing factors such as anti-incumbency, corruption controversies, and effective campaigning strategies.

Anti-incumbency: A sentiment against a political party or individual currently in power, often arising from dissatisfaction with their performance or policies. This sentiment can significantly influence election outcomes.

Comparative Analysis:

GPT-4o and ChatGPT-3 Mini performed best in this category, both reporting the 2025 election results. GPT-4o’s real-time data integration via its web search feature was particularly noteworthy. ChatGPT-1, while providing a detailed response, was limited by its reliance on older data, which shows how much current affairs tasks depend on access to current information.

Competitive Mathematics

This segment assesses the models’ ability to solve mathematical problems using algebraic principles and to provide clear, step-by-step explanations.

Prompt: Suppose two numbers A and B satisfy the conditions A + B = 5 and AB = 6. Using these values, find A³ + B³. Please provide a step-by-step explanation of your method and ensure you use the appropriate algebraic identities.

Algebraic identities: Equations that are always true for all values of the variables. These identities are fundamental tools in algebra for simplifying expressions and solving equations. In this context, the relevant identity is A³ + B³ = (A + B)³ - 3AB(A + B).
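As a worked instance of this identity, substituting the values from the prompt (A + B = 5 and AB = 6) gives the answer all three models were expected to reach:

```latex
\begin{align*}
A^3 + B^3 &= (A + B)^3 - 3AB(A + B) \\
          &= 5^3 - 3 \cdot 6 \cdot 5 \\
          &= 125 - 90 \\
          &= 35
\end{align*}
```

The result can be cross-checked directly: A and B are the roots of x² - 5x + 6 = 0, namely 2 and 3, and 2³ + 3³ = 8 + 27 = 35.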

GPT-4o Response and Analysis:

GPT-4o delivered a clear step-by-step explanation and arrived at the correct answer of 35.

ChatGPT-1 Response and Analysis:

ChatGPT-1 delivered a fast, detailed, and accurate explanation, also reaching the correct answer of 35. It was noted to be particularly clear and methodical in its approach, starting with basic equations, recalling the relevant algebraic identity, and systematically substituting values.

For this problem, ChatGPT-1 even outpaced GPT-4o in clarity: its clear structure and methodical approach made the solution easy to follow.

ChatGPT-3 Mini Response and Analysis:

ChatGPT-3 Mini also provided the correct answer of 35 and executed the computation swiftly. However, while the response was efficient, its explanation was not as detailed as those provided by GPT-4o and ChatGPT-1, and in particular fell short of ChatGPT-1’s breakdown.

Comparative Analysis:

All three models successfully solved the mathematical problem and arrived at the correct answer. ChatGPT-1 stood out for its exceptionally clear and systematic step-by-step explanation, making it highly accessible for educational purposes. GPT-4o also provided a strong explanation, while ChatGPT-3 Mini, although efficient, offered a less detailed breakdown of the solution process.

Code Generation: Tic-Tac-Toe Game

This section evaluates the models’ ability to generate functional and understandable code in Python, specifically for creating a GUI-based Tic-Tac-Toe game using the Tkinter library.

GUI (Graphical User Interface): A type of user interface that allows users to interact with electronic devices through visual icons, menus, and graphical elements, rather than text-based commands.

Tkinter: A standard Python library for creating graphical user interfaces (GUIs). It provides a set of tools and widgets for building desktop applications with windows, buttons, labels, and other interactive elements.

Prompt: Write a Python program that builds a GUI-based Tic-Tac-Toe game using the Tkinter library.

GPT-4o Response and Analysis:

GPT-4o handled our code generation prompt effectively. The Python code it generated was functional and well-commented, making it easy to follow how each component works, and when run, the game worked as expected.

ChatGPT-1 Response and Analysis:

ChatGPT-1 also generated functional Python code for the Tic-Tac-Toe game. The detailed comments and clear method breakdown make it easy to understand how the game board is created, how user interactions are managed, and how the win/draw conditions are verified.

ChatGPT-3 Mini Response and Analysis:

ChatGPT-3 Mini produced a Python script that was easy to understand and well-documented, utilizing docstrings and inline comments to explain each part of the program. The code was functional, and the game ran correctly.

Docstrings (Documentation Strings): Strings used to document Python code. They are written within triple quotes and are used to explain what a function, class, module, or method does.

Inline comments: Explanatory notes written within the code itself, typically following a hash symbol (#) in Python. They are used to clarify specific lines or sections of code and improve readability.

The docstrings and inline comments in ChatGPT-3 Mini’s script cover each part of the program, from initializing the 3x3 game board and handling button clicks to checking for wins or draws and resetting the game.

Comparative Analysis:

All three models successfully generated functional Tic-Tac-Toe games using Tkinter. GPT-4o’s code was noted for its object-oriented approach and thorough comments, giving it a slight edge in clarity and structure. ChatGPT-1 also produced highly readable code with clear explanations. ChatGPT-3 Mini impressed with its generation speed and well-documented code, although its structural clarity was slightly less pronounced than that of GPT-4o and ChatGPT-1.

Object-oriented approach: A programming paradigm that organizes code around “objects,” which are instances of classes. This approach emphasizes data and methods that operate on that data, promoting modularity, reusability, and maintainability in code.
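To make the terms in this section concrete (an object-oriented structure, docstrings, inline comments, and win/draw checks), here is a minimal sketch of how such a game might be organized. It is not the output of any of the three models; the names `Board`, `WIN_LINES`, and `launch_gui` are chosen for illustration. The game rules live in a class with no GUI dependency, and the Tkinter window is built in a separate function that needs a display to run.

```python
"""Minimal Tic-Tac-Toe sketch (illustrative; not any model's actual output)."""

WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


class Board:
    """Game state for a 3x3 board stored as a flat list of nine cells."""

    def __init__(self):
        self.reset()

    def reset(self):
        """Clear the board and give the first move to X."""
        self.cells = [""] * 9
        self.current = "X"

    def play(self, index):
        """Place the current mark; return False if the move is not allowed."""
        if self.cells[index] or self.winner():
            return False
        self.cells[index] = self.current
        self.current = "O" if self.current == "X" else "X"
        return True

    def winner(self):
        """Return "X" or "O" if a line is complete, otherwise None."""
        for a, b, c in WIN_LINES:
            if self.cells[a] and self.cells[a] == self.cells[b] == self.cells[c]:
                return self.cells[a]
        return None

    def is_draw(self):
        """A draw is a full board with no winner."""
        return all(self.cells) and self.winner() is None


def launch_gui():
    """Open a Tkinter window wired to a Board (requires a display)."""
    import tkinter as tk
    from tkinter import messagebox

    board = Board()
    root = tk.Tk()
    root.title("Tic-Tac-Toe")
    buttons = []

    def on_click(i):
        if not board.play(i):
            return  # ignore clicks on occupied cells
        buttons[i].config(text=board.cells[i])
        result = board.winner()
        if result or board.is_draw():
            messagebox.showinfo("Game over", f"{result} wins!" if result else "Draw!")
            board.reset()
            for button in buttons:
                button.config(text="")  # clear the grid for the next game

    for i in range(9):
        button = tk.Button(root, text="", width=6, height=3,
                           command=lambda i=i: on_click(i))
        button.grid(row=i // 3, column=i % 3)
        buttons.append(button)
    root.mainloop()
```

Calling `launch_gui()` starts the game. Separating `Board` from the window code is one common design choice: the rules can be checked without opening a window, which is harder when game state is stored only in the button labels.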

Logical Reasoning: Bridge and Torch Puzzle

This section tests the models’ logical reasoning abilities using the classic Bridge and Torch puzzle.

Prompt: Four people need to cross a bridge at night. They only have one torch, and the bridge is narrow, so at most two people can cross at a time. The torch must be used for every crossing. Person A takes 1 minute to cross, Person B takes 2 minutes, Person C takes 5 minutes, and Person D takes 10 minutes. When two people cross together, they move at the pace of the slower person. What is the minimum time required for all four people to cross the bridge? Provide a step-by-step explanation of your solution.

GPT-4o Response and Analysis:

GPT-4o outlined the puzzle, detailed a step-by-step strategy, and correctly identified the optimal solution as 17 minutes.

Optimal solution: The best possible solution to a problem, often in terms of minimizing time, cost, or resources, or maximizing efficiency or effectiveness. In this puzzle, it refers to the solution that achieves the minimum total crossing time.

GPT-4o began by outlining the puzzle, then detailed a step-by-step strategy and, after working through the calculation, gave the answer as 17 minutes, which is the minimum time required for all four to cross.

ChatGPT-1 Response and Analysis:

ChatGPT-1 clearly outlined the puzzle, detailing crossing times and constraints, and also arrived at the optimal solution of 17 minutes. Its explanation highlighted the strategy of using the fastest person for return trips and sending the slowest people together to minimize total time.

Constraints: Limitations or restrictions that must be considered when solving a problem. In the Bridge and Torch puzzle, constraints include the number of people who can cross at once, the need for the torch, and individual crossing times.

ChatGPT-1 began by clearly outlining the puzzle, detailing the crossing times for each individual along with the constraints and the rule that the slower person’s pace governs each crossing. It then arrived at the answer of 17 minutes.

ChatGPT-3 Mini Response and Analysis:

ChatGPT-3 Mini also correctly identified the optimal solution of 17 minutes. Its explanation emphasized using the fastest individuals as shuttles for the torch to minimize extra trips and provided a clear and concise breakdown of the optimal sequence of moves.

ChatGPT-3 Mini started by outlining the key idea: using the fastest individuals as shuttles for the torch to minimize extra trips. It then detailed the optimal sequence of moves and added up the crossing times to give a total of 17 minutes, which is the minimum time required for all four to cross.

Comparative Analysis:

All three models successfully solved the Bridge and Torch puzzle and arrived at the optimal 17-minute solution. GPT-4o provided a comprehensive but slightly slower response. ChatGPT-1 and ChatGPT-3 Mini offered concise yet thorough explanations, demonstrating strong logical reasoning skills and efficient problem-solving.
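The 17-minute figure can be checked independently by searching every possible sequence of crossings. The sketch below uses a shortest-path search (Dijkstra's algorithm) over states of the form "who is still on the near side, and where is the torch". The function name `min_crossing_time` and the state encoding are choices made for this example, not something taken from the models' answers.

```python
import heapq
from itertools import combinations


def min_crossing_time(times):
    """Return the minimum total time for everyone to cross the bridge.

    `times` lists each person's crossing time. A state is a pair
    (people still on the near side, whether the torch is on the near side).
    """
    people = frozenset(range(len(times)))
    start = (people, True)
    best = {start: 0}
    heap = [(0, 0, start)]  # (cost so far, tie-breaker, state)
    counter = 1
    while heap:
        cost, _, (near, torch_near) = heapq.heappop(heap)
        if not near:
            return cost  # everyone has crossed
        if cost > best[(near, torch_near)]:
            continue  # stale queue entry
        # Only people on the torch's side can move, one or two at a time.
        side = near if torch_near else people - near
        for size in (1, 2):
            for group in combinations(sorted(side), size):
                moved = frozenset(group)
                new_near = near - moved if torch_near else near | moved
                state = (new_near, not torch_near)
                # A pair moves at the pace of its slower member.
                new_cost = cost + max(times[p] for p in group)
                if new_cost < best.get(state, float("inf")):
                    best[state] = new_cost
                    heapq.heappush(heap, (new_cost, counter, state))
                    counter += 1
    return None
```

For the puzzle's times, `min_crossing_time([1, 2, 5, 10])` returns 17. One schedule that achieves it: A and B cross (2 minutes), A returns (1), C and D cross (10), B returns (2), and A and B cross again (2), for a total of 2 + 1 + 10 + 2 + 2 = 17.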

Creative Image Generation: Futuristic Cityscape

This final section explores the models’ creative capabilities in image generation, specifically requesting a futuristic cityscape at dusk.

Prompt: Generate an image of a futuristic cityscape at dusk.

GPT-4o Response and Analysis:

GPT-4o directly generated an image featuring towering skyscrapers with neon lights, flying vehicles, and reflective rain-soaked streets, effectively capturing a cyberpunk aesthetic.

Cyberpunk aesthetic: A subgenre of science fiction characterized by a dystopian future setting with advanced technology combined with social breakdown. Visually, it often features neon lights, towering skyscrapers, advanced technology mixed with urban decay, and themes of rebellion and societal control.

GPT-4o produced an image featuring towering skyscrapers with neon lights, flying vehicles crossing the skyline, and reflective rain-soaked streets. The result has a distinctly cyberpunk look that fits the futuristic cityscape we asked for.

ChatGPT-1 and ChatGPT-3 Mini Response and Analysis:

ChatGPT-1 and ChatGPT-3 Mini, while not capable of direct image generation, can craft detailed prompts that can be used with external image generation platforms like Adobe Firefly, Midjourney, or DALL-E.

While ChatGPT-1 and ChatGPT-3 Mini cannot generate images themselves, they can still craft detailed prompts that you can feed into any image generation AI platform.

Image generation AI platform: A software or online service that uses artificial intelligence to create images from textual descriptions or other inputs. Examples include Adobe Firefly, Midjourney, and DALL-E.

Adobe Firefly, Midjourney, DALL-E: Popular examples of image generation AI platforms. Adobe Firefly is developed by Adobe, Midjourney by the independent research lab of the same name, and DALL-E by OpenAI.

Comparative Analysis:

GPT-4o demonstrated a significant advantage in this category by directly generating an image that effectively matched the prompt’s description. ChatGPT-1 and ChatGPT-3 Mini, while unable to generate images themselves, showcased their ability to create detailed prompts that could be used with external image generation tools, highlighting their value in guiding creative processes even without direct image output capabilities.

Overall Performance Comparison and Conclusion

Across the five diverse prompts, each model demonstrated unique strengths and limitations.

GPT-4o excelled in current affairs analysis due to its web search feature and demonstrated strong performance in mathematics, code generation, logical reasoning, and image generation. Its image generation capability is a distinct advantage compared to the other models tested.
ChatGPT-1 impressed with its clarity and methodical approach in mathematics, code generation, and logical reasoning. While limited by its lack of real-time data access, it provided accurate and detailed responses in areas not requiring up-to-the-minute information.
ChatGPT-3 Mini showed efficiency and speed in providing correct answers, particularly in mathematics and logical reasoning. Its code generation was functional and well-documented. While its explanations were sometimes less detailed than those of the other models, it showed strong overall performance.

In conclusion, GPT-4o emerges as the most versatile model in this comparison, particularly due to its real-time data access and image generation capabilities. ChatGPT-1 stands out for its clarity and methodical approach, making it potentially valuable for educational applications requiring detailed explanations. ChatGPT-3 Mini offers a balance of speed and accuracy, making it a practical choice for tasks where efficiency is paramount.

Further Exploration

To build practical experience with these models, try the prompts presented in this chapter yourself. Experiment with variations of the prompts and explore how the models respond to different types of queries. Share your findings in online communities to contribute to the growing body of knowledge about generative AI models.