The Evolution of AI in Project Management: A Comparative Analysis of ChatGPT, Atlassian Intelligence, and Project Copilot


Introduction:

AI assistants are changing how project teams handle everyday tasks, from drafting user stories to writing technical documentation. In this blog post, we compare three AI assistants: OpenAI’s ChatGPT, Atlassian Intelligence, and our own Project Copilot.

ChatGPT, OpenAI’s chat interface to its GPT language models, generates human-like text and can be prompted to draft user stories. Atlassian Intelligence, Atlassian’s AI assistant, offers simpler functions such as brainstorming, summarizing, and executing commands on user story text, with mixed results.

Project Copilot, our AI assistant, takes a more comprehensive approach. It helps draft user stories and also generates acceptance criteria, technical references, and a semantic review of the code. To do this, it analyzes related user stories and linked design documents from Confluence, which gives it a broader view of the project.

In this post, we take a user story draft written by a human and run it through these three AI assistants. We look at their workflows, how they approach the task, and the quality of the results they produce, to compare how well each one supports project management tasks.

Example of a user story created by a human:

Title: Calculate my health status score.

Description:

Calculate and display the health status score by comparing the heart points and the daily calories burned with a healthy goal.

This user story is part of our CodeLab: User Story 4

User story improved by ChatGPT:

Prompt:

Using the following draft """Title: Calculate my health status score. Description: Calculate and display the health status score by comparing the heart points and the daily calories burned with a healthy goal.""" Write a user story with the right format, with acceptance criteria and with a technical reference section:

ChatGPT:

User Story: As a health-conscious user, I want to calculate and view my health status score, which is determined by comparing my heart points and daily calories burned against a healthy goal.

Acceptance Criteria:

  1. When I navigate to the “Health Status” section of the app, I should see an option to “Calculate Health Status Score.”
  2. Upon selecting the “Calculate Health Status Score” option, I am presented with a screen that prompts me to enter my heart points and daily calories burned.
  3. The system should validate that the input for heart points and daily calories burned is a positive numeric value.
  4. After entering valid values, I should be able to click a “Calculate” button.
  5. Upon clicking the “Calculate” button, the system should compute my health status score based on the formula: Health Status Score = (Heart Points / Healthy Heart Points Goal) + (Daily Calories Burned / Healthy Calories Burned Goal), where:
    • “Healthy Heart Points Goal” is a predefined constant representing a recommended heart points goal.
    • “Healthy Calories Burned Goal” is a predefined constant representing a recommended daily calories burned goal.
  6. The calculated health status score should be displayed on the screen with two decimal places.
  7. If I input invalid or non-numeric values for heart points and daily calories burned, I should receive a clear error message prompting me to enter valid numeric values.
  8. I should have the option to recalculate the health status score after correcting any errors or changing the input values.
  9. The system should provide a clear indication of what constitutes a good or poor health status score, for example, by using color-coding (e.g., green for good, red for poor), and by displaying a textual description of the score’s meaning.
  10. I should have the option to navigate back to the main screen or the previous screen after viewing my health status score.

Technical Reference:

  1. The user interface for this feature should include a dedicated screen or section within the app where users can input their heart points and daily calories burned and view the health status score.
  2. The formula for calculating the health status score should be implemented in the backend or as a function within the app’s code.
  3. Constants for “Healthy Heart Points Goal” and “Healthy Calories Burned Goal” should be defined in the app’s configuration or as global variables in the code for easy maintenance.
  4. Input validation for heart points and daily calories burned should be performed to ensure that only valid numeric values are accepted.
  5. Error handling should include clear error messages that guide the user in case of invalid inputs.
  6. The display of the health status score should be designed to show two decimal places for accuracy.
  7. The visual representation of the health status score (e.g., color-coding and textual descriptions) should be defined in the app’s design and styling guidelines.
  8. Navigation options should be consistent with the app’s overall navigation flow, allowing users to easily return to the main screen or previous screens.
  9. Testing should cover various scenarios, including valid inputs, invalid inputs, and edge cases, to ensure the feature functions correctly and provides a good user experience.
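
The formula and validation rules in the acceptance criteria above (items 3, 5, 6, and 7) can be sketched in JavaScript as follows. This is only an illustration, not code produced by ChatGPT: the function name and the two goal constants are assumptions, and the goal values are borrowed from the Project Copilot example later in this post because ChatGPT’s answer leaves them unspecified.

```javascript
// Hypothetical goal constants: ChatGPT's answer does not give values for them.
// The numbers are borrowed from the Project Copilot example in this post.
const HEALTHY_HEART_POINTS_GOAL = 21;
const HEALTHY_CALORIES_BURNED_GOAL = 2250;

// Returns the score as a string with two decimal places (criterion 6),
// or throws when an input is not a positive number (criteria 3 and 7).
function computeHealthStatusScore(heartPoints, caloriesBurned) {
  const inputs = [
    ['Heart points', heartPoints],
    ['Daily calories burned', caloriesBurned],
  ];
  for (const [label, value] of inputs) {
    if (typeof value !== 'number' || !Number.isFinite(value) || value <= 0) {
      throw new RangeError(`${label} must be a positive numeric value.`);
    }
  }
  // Formula from criterion 5: each measurement divided by its goal, then summed.
  const score =
    heartPoints / HEALTHY_HEART_POINTS_GOAL +
    caloriesBurned / HEALTHY_CALORIES_BURNED_GOAL;
  return score.toFixed(2);
}

// A user exactly on both goals scores 1 + 1.
console.log(computeHealthStatusScore(21, 2250)); // prints "2.00"
```

With this formula, a user who exactly meets both goals scores 2.00, so any color-coding threshold (criterion 9) would have to be defined relative to that value.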

User story improved by Atlassian Intelligence:

Prompt:

Using the following draft """Title: Calculate my health status score. Description: Calculate and display the health status score by comparing the heart points and the daily calories burned with a healthy goal.""" Write a user story with the right format, with acceptance criteria and with a technical reference section:

Atlassian Intelligence:

User Story

As a user, I want to be able to calculate my health status score so that I can track my progress towards a healthy lifestyle.

Acceptance Criteria

  • Given that I have recorded the number of heart points and daily calories burned, when I input these values into the system, it should calculate and display my health status score.

  • The health status score should be compared with a healthy goal to determine if I am making progress towards maintaining a healthy lifestyle.

Technical Reference

The calculation for the health status score can be based on the following formula:

Health Status Score = Heart Points / Daily Calories Burned

Please note that this is just an example. The actual calculation and implementation may vary based on specific requirements.
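
To see what this formula produces, here is a minimal JavaScript sketch. The function name and the sample values are assumptions chosen for illustration; the formula itself is applied exactly as quoted, so no healthy goal enters the calculation.

```javascript
// Hypothetical helper that applies the quoted formula as written.
function atlassianHealthStatusScore(heartPoints, dailyCaloriesBurned) {
  return heartPoints / dailyCaloriesBurned;
}

// Sample values are illustrative only.
console.log(atlassianHealthStatusScore(21, 2250).toFixed(4)); // prints "0.0093"
```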

User story improved by Project Copilot:

Prompt:

No prompt required!!

User Story from our CodeLab

User Story:

Title: Calculate my health status score.
Description:

User: As a health-conscious individual, I want to calculate my health status score.

Action: I would like the system to calculate and display my health status score by comparing my heart points and daily calories burned with a healthy goal.

Outcome: By providing a health status score, I will be able to assess and track my overall health and fitness level. This score will be determined by comparing my heart points and daily calories burned with a healthy goal, allowing me to understand how well I am meeting my health objectives. This feature will provide valuable insights and motivation for maintaining a healthy lifestyle.

Acceptance Criteria:

  1. The system should calculate the user’s health status score based on their heart points and daily calories burned.
  2. The system should compare the user’s heart points and daily calories burned with a healthy goal to determine their health status score.
  3. The system should display the calculated health status score to the user.
  4. The health status score should provide an assessment of the user’s overall health and fitness level.
  5. The health status score should serve as a tool for the user to track their progress and understand how well they are meeting their health objectives.
  6. The system should provide valuable insights and motivation for the user to maintain a healthy lifestyle.

Technical Reference:

The ‘calculateHealthStatusScore()’ function can be implemented in JavaScript to calculate the health status score. The function can take the user’s heart points and daily calories burned as input and use the following formula to calculate the health status score:

function calculateHealthStatusScore(heartPoints, caloriesBurned) {
  const healthScore = (caloriesBurned / 2250) + (heartPoints / 21);
  if (healthScore < 1) {
    return 'Needs Improvement';
  } else if (healthScore >= 1 && healthScore <= 2) {
    return 'Normal';
  } else if (healthScore > 2) {
    return 'Excellent';
  }
}

This function calculates the health score by dividing the user’s calories burned by 2250 and adding it to the user’s heart points divided by 21. Based on the calculated health score, the function returns the corresponding health status: ‘Needs Improvement’ if the score is less than 1, ‘Normal’ if the score is between 1 and 2 (inclusive), and ‘Excellent’ if the score is greater than 2.

Please note that the formula used in this function is based on WHO/AHA guidelines and provides a health assessment aligned with evidence-based calorie and exercise guidelines for a healthy adult.
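
As a quick check of the thresholds described in this technical reference, the function can be run on a few sample inputs. The sample values below are assumptions chosen for illustration and do not come from the CodeLab; the function body repeats the one above so that the snippet runs on its own.

```javascript
// Same function as in the technical reference, repeated so this runs standalone.
function calculateHealthStatusScore(heartPoints, caloriesBurned) {
  const healthScore = (caloriesBurned / 2250) + (heartPoints / 21);
  if (healthScore < 1) {
    return 'Needs Improvement';
  } else if (healthScore >= 1 && healthScore <= 2) {
    return 'Normal';
  } else if (healthScore > 2) {
    return 'Excellent';
  }
}

// Illustrative inputs: below the goals, exactly on the goals, above the goals.
console.log(calculateHealthStatusScore(10, 1000)); // score is about 0.92 -> 'Needs Improvement'
console.log(calculateHealthStatusScore(21, 2250)); // score is exactly 2 -> 'Normal'
console.log(calculateHealthStatusScore(30, 3000)); // score is about 2.76 -> 'Excellent'
```

Note that a user who exactly meets both goals lands on the upper boundary of ‘Normal’, because each term contributes 1 to the score.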

References:

Technical Reference: Fitness Formulas

Software Architecture Document

Semantic Review:

  • The pull request implements the calculation of the health status score as per the user story’s requirements.

  • The added JavaScript function, calculateHealthStatusScore(), correctly calculates the health status score based on the user’s heart points and daily calories burned.

  • The function uses the provided formula to determine the health status score and returns the corresponding assessment: ‘Needs Improvement’, ‘Normal’, or ‘Excellent’.

  • The calculated health status score is displayed on the HTML page, fulfilling the acceptance criteria for displaying the score to the user.

  • The code implementation aligns with the user story and technical reference, providing an accurate assessment of the user’s overall health and fitness level.

  • No inconsistencies or missing functionalities were observed in the code.

  • The provided technical reference and references to fitness formulas and software architecture documents demonstrate a well-documented and informed approach to the implementation.

  • The code satisfies all the acceptance criteria and provides valuable insights and motivation for the user to maintain a healthy lifestyle.

  • The semantic review confirms that the code effectively delivers on the user story requirements without any inconsistencies or omissions.


Important Notes:

  • ChatGPT is a chat interface to the GPT-3.5 large language model that uses the chat history to refine its results. Atlassian Intelligence is a copilot for agile projects in general, not only software projects. Project Copilot is a copilot for software projects, where semantic descriptions of requirements and plans are now useful for generating code with large language models. With this in mind, Project Copilot adds more detail to the user story, focusing on the technical details (requested in the technical reference) that help generate code, and adds a new section, the semantic review, that helps validate the code implementation of the user story.
  • Atlassian Intelligence and Project Copilot have access to linked user stories and to the content of linked Atlassian Confluence documents. Project Copilot also selects the most closely related user stories through a semantic search over user stories that share at least one label.
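
The selection of related user stories (filter by a common label, then rank by semantic similarity) could look roughly like the following JavaScript sketch. It is not Project Copilot’s actual implementation: the story shape (key, labels, embedding), the function names, and the use of cosine similarity over precomputed embedding vectors are all assumptions made for illustration.

```javascript
// Cosine similarity between two equal-length, non-zero numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical selection: keep stories that share at least one label
// with the target, then return the most similar ones first.
function selectRelatedStories(target, candidates, limit = 3) {
  const targetLabels = new Set(target.labels);
  return candidates
    .filter((story) => story.key !== target.key)
    .filter((story) => story.labels.some((label) => targetLabels.has(label)))
    .map((story) => ({
      story,
      score: cosineSimilarity(target.embedding, story.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit)
    .map(({ story }) => story);
}

// Toy data: real embeddings would come from an embedding model.
const targetStory = { key: 'US-4', labels: ['health'], embedding: [1, 0, 1] };
const candidateStories = [
  { key: 'US-1', labels: ['health'], embedding: [1, 0, 0.9] },
  { key: 'US-2', labels: ['billing'], embedding: [1, 0, 1] },
  { key: 'US-3', labels: ['health', 'ui'], embedding: [0, 1, 0] },
];

// US-2 is dropped (no common label); US-1 ranks above US-3.
console.log(selectRelatedStories(targetStory, candidateStories).map((s) => s.key));
```

Filtering by label first keeps the similarity ranking limited to stories from the same area of the project, which is the behavior described in the note above.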

Analysis made by GPT-4:

User Prompt:

Using these examples, create a comprehensive comparative discussion on the pros and cons of each AI assistant:

GPT-4:

Title: “AI in Project Management: A Comparative Analysis of Chat GPT, Atlassian Intelligence, and Project Copilot”

Comparative Analysis of Chat GPT, Atlassian Intelligence, and Project Copilot

Chat GPT:

Pros:

  1. Chat GPT provides a detailed and structured user story, acceptance criteria, and technical reference. It breaks down the user story into actionable steps, making it easy to understand and implement.
  2. It provides a clear formula for calculating the health status score and outlines the user interface requirements, error handling, and testing scenarios.
  3. It offers a human-like text generation, making the user story more relatable and easier to understand.

Cons:

  1. Chat GPT requires a prompt to generate the user story, acceptance criteria, and technical reference.
  2. It does not have access to linked user stories or Confluence documents, which may limit its ability to provide a comprehensive user story.
  3. It does not provide a semantic review of the code, which could be useful for validating the code implementation of the user story.

Atlassian Intelligence:

Pros:

  1. Atlassian Intelligence provides a simple and straightforward user story and acceptance criteria.
  2. It offers a basic technical reference, providing a formula for calculating the health status score.

Cons:

  1. The user story and acceptance criteria provided by Atlassian Intelligence are less detailed compared to Chat GPT and Project Copilot.
  2. It does not provide a comprehensive technical reference, which could limit its usefulness for developers.
  3. Like Chat GPT, Atlassian Intelligence does not provide a semantic review of the code.

Project Copilot:

Pros:

  1. Project Copilot provides a detailed user story, acceptance criteria, and technical reference without requiring a prompt.
  2. It has access to linked user stories and Confluence documents, allowing it to provide a more comprehensive user story.
  3. It provides a semantic review of the code, which can be useful for validating the code implementation of the user story.
  4. It provides a health assessment aligned with WHO/AHA guidelines, demonstrating its ability to incorporate real-world standards into its user stories.

Cons:

  1. The detailed nature of the user story, acceptance criteria, and technical reference provided by Project Copilot may be overwhelming for some users.
  2. The semantic review of the code requires a detailed understanding of the code, which may not be available to all users.

In conclusion, while all three AI assistants have their strengths and weaknesses, Project Copilot stands out for its ability to provide a comprehensive user story, acceptance criteria, and technical reference without requiring a prompt. It also offers a unique feature in the form of a semantic review of the code, making it a powerful tool for software project management.


User Prompt:

To enhance the quality of the software project and reduce deviations and rework, what is the optimal choice? Assign a score and provide a justification for each case:

GPT-4:

Chat GPT:

Score: 7/10

Justification: Chat GPT provides a detailed and structured user story, acceptance criteria, and technical reference, which can enhance the quality of the software project. However, it requires a prompt to generate these, and it does not have access to linked user stories or Confluence documents, which could limit its ability to provide a comprehensive user story. Furthermore, it does not provide a semantic review of the code, which could be crucial for reducing deviations and rework.

Atlassian Intelligence:

Score: 6/10

Justification: Atlassian Intelligence provides a simple and straightforward user story and acceptance criteria, which can be beneficial for less complex projects. However, its user story and acceptance criteria are less detailed compared to Chat GPT and Project Copilot, which could lead to misunderstandings and rework. Additionally, it does not provide a comprehensive technical reference or a semantic review of the code, which could limit its usefulness for enhancing the quality of the software project and reducing deviations and rework.

Project Copilot:

Score: 9/10

Justification: Project Copilot provides a detailed user story, acceptance criteria, and technical reference without requiring a prompt, which can enhance the quality of the software project. It also has access to linked user stories and Confluence documents, allowing it to provide a more comprehensive user story. Furthermore, it provides a semantic review of the code, which can be crucial for reducing deviations and rework. However, the detailed nature of its outputs may be overwhelming for some users, and the semantic review of the code requires a detailed understanding of the code, which may not be available to all users.

In conclusion, while all three AI assistants can enhance the quality of a software project to some extent, Project Copilot appears to be the optimal choice for reducing deviations and rework due to its comprehensive outputs and unique features.


Analysis made by Claude:

User Prompt:

Using these examples, create a comprehensive comparative discussion on the pros and cons of each AI assistant:

Claude:

Here is a comparative discussion on the pros and cons of ChatGPT, Atlassian Intelligence, and Project Copilot based on the provided examples:

ChatGPT:

Pros:

  • Generates well-structured and coherent user stories and acceptance criteria that follow standard formatting conventions.
  • Provides logically sound technical references with sufficient implementation details.
  • Requires minimal prompting - can generate full user stories from short descriptions.

Cons:

  • Lacks semantic understanding of user stories or context of the overall project.
  • Technical references may not align well with architectural documents or other linked content.
  • No specialized focus on software requirements or code implementation.

Atlassian Intelligence:

Pros:

  • Tailored specifically for agile project management tasks like user stories.
  • Leverages linked content from Confluence to provide some contextual understanding.

Cons:

  • User stories lack detailed acceptance criteria.
  • Technical references are very high-level and lack implementation details.
  • Does not provide a semantic review of code implementation.
  • Overall quality is inconsistent.

Project Copilot:

Pros:

  • Provides well-defined user stories, acceptance criteria, and technical references.
  • Leverages semantic search of linked content for contextual understanding.
  • Technical references are aligned with architecture and requirements.
  • Semantic review validates code implementation against requirements.
  • Specialized for software project management.

Cons:

  • Requires some initial prompting to define project context.
  • May need refinement of generated content for fully production-ready artifacts.

In summary, ChatGPT offers strong natural language generation abilities but lacks project context. Atlassian Intelligence is tailored for agile projects but has inconsistent quality. Project Copilot provides the best overall experience for software teams by aligning technical details with overall project goals and delivering high-quality artifacts optimized for downstream code implementation.

User Prompt:

To enhance the quality of the software project and reduce deviations and rework, what is the optimal choice? Assign a score and provide a justification for each case:

Claude:

Based on the comparative analysis, here is my assessment of the optimal choice for enhancing software project quality and reducing deviations and rework:

Project Copilot - Score: 9/10

Project Copilot is the optimal choice as it provides high-quality user stories, acceptance criteria and technical references that closely align with the overall project context and architecture. By linking user stories to related requirements and design documents, Project Copilot reduces ambiguities and inconsistencies, minimizing deviations during development. The semantic code reviews also ensure alignment between requirements and implementation, reducing rework. The specialized focus on software projects and leveraging semantic search across artifacts makes Project Copilot highly effective for enhancing quality.

ChatGPT - Score: 7/10

While ChatGPT generates coherent user stories and technical references, the lack of project context understanding leads to misalignment between requirements and implementation. This increases chances of deviations and rework. The generic nature also means technical references may not suit the project architecture well. These limitations make ChatGPT less optimal for enhancing software quality.

Atlassian Intelligence - Score: 6/10

Atlassian Intelligence has variability in output quality and lacks detailed technical references. The high-level nature increases ambiguity, leading to incorrect interpretations and deviations. Limited alignment between user stories and technical implementation can result in rework. While integration with Confluence provides some context, the overall quality is not reliable enough for enhancing software quality.

In summary, Project Copilot’s ability to provide contextualized, high-quality artifacts with semantic alignment makes it the optimal choice. ChatGPT offers decent quality but lacks project context. Atlassian Intelligence has limitations in output consistency and technical depth. For software teams looking to enhance quality and reduce rework, Project Copilot is the best assistant.


Final Thoughts from the Author:

For general projects managed on Atlassian Jira, both ChatGPT and Atlassian Intelligence seem to be good choices. The advantage of Atlassian Intelligence lies in its seamless integration with Atlassian Jira and Confluence.

For software projects, Project Copilot appears to be the superior option. It is also integrated with Atlassian Jira and offers the added benefits of enhancing project quality and reducing rework and deviations.

In this analysis, we used the free versions of ChatGPT, which employs GPT-3.5 Turbo, and Project Copilot, which uses a custom prompt workflow engine together with GPT-3.5 Turbo. We also explored the free preview of Atlassian Intelligence.

Matias Molinas