AI Pair Programmers:
A hands-on analysis

Which of these tools are best suited for enterprise-wide implementation?

Thought leaders and other industry practitioners constantly speculate about the far-reaching impact artificial intelligence will have on various industries, but practical use cases have been few and far between. The phrase ‘co-pilot’ has become the key metaphor for AI tools designed to assist and enhance human tasks, particularly in the context of technology and productivity. New tools are constantly being developed, some useful and others underwhelming.

One such tool is a developer assistant that promises to transform the way software is developed. These ‘AI pair programmers’, integrated into Integrated Development Environments (IDEs), can be used by software developers to generate and complete code, amongst other activities. They offer suggestions such as variable names and function calls, and automate repetitive tasks. They are aimed at improving productivity and accelerating the pace of software development.
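
To make this concrete, here is a minimal, hypothetical TypeScript illustration of how inline completion typically works: the developer writes a comment and a function signature, and the assistant proposes the body. The suggestion shown is illustrative, not the verbatim output of any specific tool.

    // Developer writes the comment and signature; the assistant suggests the body.
    // Calculate the total price of an order, applying a percentage discount.
    function calculateTotal(prices: number[], discountPercent: number): number {
      // --- A typical assistant suggestion begins here ---
      const subtotal = prices.reduce((sum, price) => sum + price, 0);
      return subtotal * (1 - discountPercent / 100);
    }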

With over 1,500 technology professionals and 150 running projects at Entelect, understanding which of these developer assistant tools provides meaningful value at an enterprise level is highly beneficial from a business perspective as well as a technical one. Because of this, we decided to conduct a hands-on analysis to determine which of these tools improve developer productivity and effectiveness enough to warrant enterprise-wide implementation.

Methodology

Measuring developer productivity is a complex task, and defining it remains elusive. With the intention of finding meaningful insights, not just numbers, our research focused on developers’ first-hand perspectives. This involved a qualitative, perception-based approach grounded in day-to-day practice. The goal is to use these insights as a baseline for a quantitative analysis down the road. This article, however, focuses on the qualitative metrics and insights.

The following questions, focused on the tools’ code-completion and suggestion features, were used to gather insights:

  • What can it be used for?
  • What is the learning curve like?
  • What is the quality of the suggestions? Are they accurate?
  • What bugs and issues did you encounter?
  • Were there any points of tension when engaging with the tool?
  • Where did it save you time?
  • What can’t it be used for?
  • Is it useful across different levels of experience?
  • What dangers and risks does it pose?
  • Would you continue using it?

The analysis was conducted by our Solution Architects in conjunction with our development teams. As the technical leadership in our organisation, Solution Architects focus on driving high-quality, relevant, and effective technology. Their extensive experience with software development at large corporations provides the context needed to understand the viability of these tools. The team conducted secondary research around the AI tools before engaging in a two-phase analysis.

Phase one

Phase one consisted of the practical application of the AI pair programmer assistants to internal projects and teams at Entelect. Projects were chosen based on the compatibility between the languages and IDEs the teams use and the capabilities of the code assistant tools available at the time of the analysis. The AI tools selected for evaluation were:

  • GitHub Copilot
  • Amazon Q (previously named AWS CodeWhisperer)
  • Tabnine
  • VS IntelliCode

Each team was assigned one of the four AI tools, which they used for roughly one month. After this, the most promising tool was chosen for phase two.

Phase one focused on understanding the quality of the code, barriers, risks, and usefulness across experience levels.

Phase two

In phase two, all the teams from phase one were allocated the most promising tool from the first round. This served to validate the opinions from phase one and to ensure the positive feedback wasn’t limited to a single team, but was a consistent sentiment across all teams involved in the analysis.

The focus for this stage of the analysis was to understand more closely what differentiated the tools and to unpack the viability of enterprise-wide implementation, specifically around the inline code-generation feature.

Key insights: Amazon Q Developer

Almost ready to be used, but not yet. While Amazon Q Developer fared better than half of the AI tools tested, there were glaring issues with code quality, and the tool generally produced more bugs and issues than benefits. That said, our developers remained interested in seeing how the tool develops in future.

Key insights: Tabnine

Our developers engaged with both the basic and pro versions of Tabnine, and it received decidedly negative reviews. The suggestions lacked consistency, the code it produced often didn’t work, and it saved the developers on the project no time.

Key insights: VS IntelliCode

While VS IntelliCode helps with smaller actions and saves some time, it is not particularly revolutionary, and it is not considered to be in the same category as the other AI tools. This is because VS IntelliCode forms part of IntelliSense, the autocomplete capability found in Visual Studio. VS IntelliCode performs ‘IntelliSense filtering’: it gives IntelliSense more context about the code, improving its suggestions.

Key insights: GitHub Copilot

GitHub Copilot was found to be an effective tool, providing fairly accurate, quality code that saved developers time.

Project phase plays a big role in how useful the tool is: it’s more helpful when writing large blocks of new code than when testing or fixing bugs. It provides the most value when generating standard, templated code, as opposed to features that require something more complex (see the sketch below). All things considered, the tool does what you expect it to, and it outperformed all the others.
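
As a sketch of the ‘standard and templated’ code where the tool shines, consider the kind of boilerplate data-access function an assistant can reliably produce from a short prompt. The names and endpoint below are hypothetical, and any generated code would still need review:

    // Hypothetical boilerplate of the sort assistants generate well:
    // a typed fetch helper following a well-worn pattern.
    interface User {
      id: number;
      name: string;
      email: string;
    }

    async function fetchUser(userId: number): Promise<User> {
      const response = await fetch(`https://api.example.com/users/${userId}`);
      if (!response.ok) {
        throw new Error(`Failed to fetch user ${userId}: status ${response.status}`);
      }
      return (await response.json()) as User;
    }

By contrast, novel business logic with domain-specific rules still needs to be designed and written by the developer.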

Concluding thoughts for phase one

All the tools required a level of interrogation of the quality of the code they suggested, with GitHub Copilot receiving the most positive feedback. However, this feedback came from only a single team. The second phase of the analysis set out to understand whether the experience provided by GitHub Copilot would be consistent across other projects and teams, and to compare it directly to the other tools tested in phase one. We decided to drop VS IntelliCode from the analysis at this stage: while it provided a decent amount of value, its capabilities weren’t truly comparable to those of the other tools.

Moving into phase two, the teams that tested Amazon Q Developer and Tabnine in phase one would use GitHub Copilot and compare the tools more directly.

Key insights: GitHub Copilot vs Amazon Q Developer

GitHub Copilot outperformed Amazon Q Developer, providing better-quality code suggestions that aligned with the existing codebase and performing better across a wider range of technologies.

Similarities

Both tools had a low learning curve, and in both cases the developers felt that the generated code required a level of interrogation to judge its usefulness and accuracy.

Differences

Developers felt that the quality of the code suggested by GitHub Copilot was significantly better and more accurate, with little duplication, which was an issue they had with Amazon Q Developer. Overall, Q Developer had significantly more bugs and friction points, such as unstable IDE plugins and the need to repeatedly log in with an AWS Builder ID. While the developers found Amazon Q Developer to be better at editing existing code, GitHub Copilot generated more meaningful and accurate code.

Key insights: GitHub Copilot vs Tabnine

GitHub Copilot was found to be superior to Tabnine. The developers found its performance more accurate and faster, without being invasive. The overall experience was also better, with GitHub Copilot delivering on its marketed capabilities.

Similarities

While the tools had similar capabilities on paper, that was the only real resemblance between them.

Differences

GitHub Copilot made good recommendations from the existing codebase. Its suggestions were aimed at finishing a line of code, whereas Tabnine’s suggestions were often for an entire block of code, which was found to be less helpful. The user experience of GitHub Copilot far outweighed the experience our developers had with Tabnine: Tabnine treated every newline as a prompt to suggest new code, which was both distracting and unnecessary, while Copilot only suggested code when appropriate. The friction points present negated any real value Tabnine could provide.

Conclusion

The fast-paced environment around these technologies means AI tools are constantly being updated and new ones introduced to the market. Several tools were launched or improved during our analysis; these conclusions are therefore based on our experience from 2023/2024.

While these tools have exciting potential, for the most part they fall short and lack the maturity to deliver on their promise of accelerating software development.

GitHub Copilot stands out as the most effective tool, providing high-quality code suggestions, improving developer productivity, and offering a positive user experience, while demonstrating adaptability across various project requirements and technologies. The teams used it for JavaScript, CSS, React, and TypeScript, and successfully generated code, comments, and unit tests with comments.
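
As an indication of what that looks like in practice, here is a hypothetical Jest-style unit test of the sort the teams generated, including the explanatory comments the tool tends to add. The module path and function names are illustrative, reusing the earlier pricing helper:

    // Hypothetical assistant-drafted unit tests for the earlier pricing helper.
    import { calculateTotal } from "./pricing"; // illustrative module path

    describe("calculateTotal", () => {
      it("applies a percentage discount to the sum of prices", () => {
        // 100 + 50 = 150; a 10% discount leaves 135.
        expect(calculateTotal([100, 50], 10)).toBe(135);
      });

      it("returns 0 for an empty list of prices", () => {
        // Summing an empty array yields 0, so the discounted total is 0.
        expect(calculateTotal([], 25)).toBe(0);
      });
    });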

Even with the positive outcomes for GitHub Copilot, organisations should still scrutinise whether the value it provides is worth the price tag of almost $19 per user per month, especially when considering it for enterprise-wide implementation. When it comes down to it, the tool is a supplement to writing code; it won’t simply 10x the productivity of an organisation’s teams. The tool could even do harm if used without discretion. Junior and less experienced developers will find it less useful, as correcting GitHub Copilot’s mistakes, bugs, or bad designs requires an understanding of best practices, and experienced developers must still exercise discernment and validation when integrating AI-generated code into their projects.
