Thought leaders and other industry practitioners are constantly speculating about the far-reaching impact Artificial Intelligence will have on various industries, but practical use cases have been few and far between. The phrase ‘co-pilot’ has become the key metaphor for AI tools designed to assist and enhance human tasks, particularly in the context of technology and productivity. New tools are constantly being developed, some useful and others less so.
One such tool is the developer assistant, which promises to transform the way software is developed. These ‘AI pair programmers’, integrated into Integrated Development Environments (IDEs), can be used by software developers to generate and complete code, amongst other activities. They offer suggestions such as variable names and function calls, and automate repetitive tasks, with the aim of improving productivity and accelerating the pace of software development.
With over 1,500 technology professionals and 150 running projects at Entelect, understanding which of these developer assistant tools provides meaningful value at an enterprise level is highly beneficial from both a business and a technical perspective. We therefore decided to conduct a hands-on analysis to determine which of these tools improve developer productivity and effectiveness for enterprise-wide implementation.
Methodology
Measuring developer productivity is a complex task, and defining it remains elusive. With the intention of finding meaningful insights, not just numbers, our research focused on developers’ first-hand perspective. This meant a qualitative, perception-based approach grounded in practical, day-to-day use. The goal is to use these insights as a baseline for a quantitative analysis down the road; this article, however, focuses on the qualitative metrics and insights.
These questions, focused on the tools’ code completion and suggestion features, were used to gather insights:
- What can it be used for?
- What is the learning curve like?
- What is the quality of the suggestions? Are they accurate?
- What bugs and issues did you encounter?
- Were there any points of tension when engaging with the tool?
- Where did it save you time?
- What can’t it be used for?
- Is it useful across different levels of experience?
- What dangers and risks does it pose?
- Would you continue using it?
The analysis was conducted by our Solution Architects in conjunction with our development teams. As the technical leadership in our organisation, Solution Architects focus on driving high-quality, relevant, and effective technology. Their extensive experience with software development at large corporations provides the context needed to understand the viability of these tools. The team conducted secondary research on the AI tools before engaging in a two-phase analysis.
Phase one
Phase one consisted of the practical application of the AI pair programmers to internal projects and teams at Entelect. Projects were chosen based on the compatibility between the languages and IDEs the teams use and the capabilities of the code assistant tools available at the time of the analysis. The AI tools selected for evaluation were:
- GitHub Copilot
- Amazon Q (previously named AWS CodeWhisperer)
- Tabnine
- VS IntelliCode
Each team was assigned one of the four AI tools, which it used for roughly one month. After this, the most promising tool was chosen for Phase two.
Phase one focused on understanding the quality of the code, the barriers and risks, and the usefulness of each tool across experience levels.
Phase two
In Phase two, all the teams from Phase one were allocated the most promising tool from the first round. This round served to validate the opinions from Phase one and to ensure it wasn’t just a single team that had positive feedback for this specific tool, but that the sentiment was consistent across all teams involved in the analysis.
The focus for this stage of the analysis was to understand more closely what differentiated the tools and to unpack the viability of enterprise-wide implementation, specifically around the inline code generation feature.
Amazon Q Developer
Key insights
Almost ready to be used, but not yet. While Amazon Q Developer fared better than half of the AI tools tested, there were glaring issues around code quality, with generally more bugs and issues than benefits. That said, our developers were still interested in seeing how the tool develops in future.
- Amazon Q Developer Free Tier – Free
- Amazon Q Developer Pro Tier – $19/user/month
The Pro tier includes the ability to centrally manage subscriptions through an admin console via AWS IAM Identity Center, and it provides higher limits than the free tier on more advanced features such as chat interactions. It also includes the Amazon Q Developer Agent for code transformation, along with security vulnerability and code quality scanning. The Customisation feature (currently in preview) is only available on the Pro tier.
With the Pro tier, Amazon ensures by default that your code is not used to train the underlying models for anyone but the user. Users on the free tier can manually opt out of data collection via the settings.
The developers used Q Developer to generate code on an existing code base, and found the accuracy and quality middling. The general sentiment was that it was only about 60% accurate, and it often suggested code similar to what already existed in the project instead of suggesting re-use of the existing code. The quality of the code generated through prompt engineering was mixed and required multiple iterations to get the desired results, consuming a lot of time.
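As a minimal, hypothetical TypeScript sketch of that duplication pattern (the helper, prompt, and suggestion here are our own illustration, not captured output from Q Developer):

```typescript
// A helper that already exists in the codebase.
export function formatCurrency(amount: number, currency: string = "ZAR"): string {
  return new Intl.NumberFormat("en-ZA", { style: "currency", currency }).format(amount);
}

// Comment prompt given to the assistant:
// "Return the order total formatted as a currency string for the invoice."
export function invoiceTotalLabel(total: number): string {
  // A typical first suggestion re-implements the formatting inline rather
  // than re-using formatCurrency above -- the duplication our teams reported.
  return new Intl.NumberFormat("en-ZA", {
    style: "currency",
    currency: "ZAR",
  }).format(total);
  // After review, the accepted version was simply: return formatCurrency(total);
}
```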
Amazon Q Developer introduced Customisations for Pro-licensed users in October 2023. This allows Amazon Q Developer to give developers suggestions that conform to their team’s internal libraries, proprietary algorithmic techniques, and enterprise code style, by training the Amazon Q Developer model on private codebases within an organisation.
We were unable to test this functionality during our analysis because Amazon Q Developer Pro is not supported in AWS opt-in regions such as South Africa. When this functionality becomes available in opt-in regions, it promises to provide better code suggestions for teams, with the AI model having been trained securely on their own private code repositories.
The learning curve was generally quite low. The tool mostly helped with debugging and was useful for larger blocks of code, saving typing time. However, developers still needed to spend a significant amount of time going through the code and correcting issues, because the suggestions often drew on outdated libraries.
The tool did not work for JavaScript unit tests, SQL queries or React / Material UI suggestions. The first code suggestion was consistently wrong, which cost time, and developers noted it would often make suggestions at inappropriate times, or suggest code with variables that did not exist in the codebase. Sometimes the generated code didn’t compile on the first try and required further investigation to correct. The tool often took a while to return suggestions, and developers had to keep refreshing the AWS user profile (Builder ID), which would often interrupt their workflow.
Tabnine
Key insights
Our developers engaged with both the Basic and Pro versions of Tabnine, and it received drastically negative reviews. The tool lacked consistency, rarely produced quality code that worked, and saved the developers on the project no time.
- Basic – Free
- Pro – $12/user/month, free for 90 days
- Enterprise – $39/user/month
Basic: This tier provides basic code completion; at the time of use it included only inline recommendations and code completion.
Pro: Pro provides more advanced code completion, ranging from inline suggestions to whole sections of code. At the time of our analysis, better code completion was its only differentiator; it now also includes customised completion based on your code base and security vulnerability scanning.
Enterprise: Enterprise provides the same features as the Pro tier, but as a SaaS or self-hosted solution in which recommendations are based on your entire codebase. You can also configure the recommendations around your organisational standards.
Tabnine Enterprise is a fully private, self-contained environment, with end-to-end encryption and zero data retention.
Our developers primarily used the tool to generate single lines of code and auto-complete existing lines of code. The quality of the code generated was consistently poor, and it would often introduce new classes and parameters that did not exist.
One redeeming quality noted for Tabnine was its ability to handle repetitive tasks such as writing test cases and use cases. This, however, required an initial test case to be written first as a baseline (a sketch of this pattern follows below).
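As a minimal sketch of that baseline-first pattern, assuming a Jest test suite (the function, names, and values are hypothetical, not code from the project), a developer writes the first test by hand and the assistant completes the repetitive variants:

```typescript
import { describe, it, expect } from "@jest/globals";

// Hypothetical function under test.
function discountFor(tier: "bronze" | "silver" | "gold"): number {
  return { bronze: 0.05, silver: 0.1, gold: 0.2 }[tier];
}

describe("discountFor", () => {
  // Baseline case written by hand; the assistant uses it as a template.
  it("gives bronze customers a 5% discount", () => {
    expect(discountFor("bronze")).toBe(0.05);
  });

  // Repetitive variants of the kind the assistant could complete from the baseline.
  it("gives silver customers a 10% discount", () => {
    expect(discountFor("silver")).toBe(0.1);
  });

  it("gives gold customers a 20% discount", () => {
    expect(discountFor("gold")).toBe(0.2);
  });
});
```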
There were instances where Tabnine stopped working for no apparent reason and wouldn’t make any suggestions. It often suggested deprecated libraries, or referenced code that had not been written and did not exist in the code base. Our developers also noticed it would override completed code, creating additional work for them in checking change logs.
VS IntelliCode
Key insights
While VS IntelliCode helps with smaller actions and saves some time, it is not particularly revolutionary, and we do not consider it to be in the same category as the other AI tools. This is because VS IntelliCode builds on IntelliSense, the autocomplete capability found in Visual Studio. IntelliCode does ‘IntelliSense filtering’, giving IntelliSense more context about the code and thereby improving the suggestions.
VS IntelliCode is accessed via Visual Studio and comes in two versions.
- Full version: This option is built into the full version of Visual Studio. To use it, you’ll need a Professional or Enterprise licence, starting at $45/user/month.
- Free version: This option is an extension for the free Visual Studio Code (VS Code).
The two options focus on different languages, and the free VS Code extension only does IntelliSense filtering; if you want the full code generation and refactoring functionality, you’ll need to use IntelliCode in Visual Studio.
All model execution happens on-device with no cloud involved, which means your code never leaves your machine. There is an option to upload method signatures to Microsoft’s servers in order to create a custom model for your team; this is a once-off upload and does not include the full codebase.
The code quality was considered quite good, with only occasionally incorrect suggestions in IntelliSense.
IntelliCode had a low learning curve and saved time on repetitive tasks such as refactoring and line completion. The tool seemed useful across experience levels, and it posed fewer dangers than the other tools.
IntelliCode cannot generate code from scratch like the other tools, generate unit tests, or explain code, making it a less impactful tool. It occasionally suggested invalid properties and was sometimes unreliable, but the risks associated with this were minor compared to the risks of the generative tools.
GitHub Copilot
Key insights
GitHub Copilot was found to be an effective tool, and provided fairly accurate, quality code that saved developers’ time.
Project phase plays a big role in how useful the tool is: it is more helpful when you are writing large blocks of code than when you are testing or fixing bugs. It provides the most value when generating standard, templated code, as opposed to features that require something more complex. All things considered, the tool does what you expect it to, and it outperformed all the others.
- Copilot Individual – $10/user/month: For individual developers.
- Copilot Business – $19/user/month: Copilot for teams.
- Copilot Enterprise – $39/user/month: Copilot for organisations.
The primary difference between the options is that Copilot Enterprise can access a larger context for code suggestions, including an organisation’s repositories and knowledge bases.
GitHub does not use business or enterprise data to train its models. It does store some transactional information, but this is used for system monitoring and debugging and is only retained for a short duration.
Our developers found the generated code to be quite accurate, and any abnormalities were easy to manage or remove from the code. It also promoted consistency, an important quality for producing the best results.
GitHub Copilot was successfully used for generating functions from natural language prompts, generating comments based on the location in the code, and generating unit tests with some prompting (a sketch of the comment-driven workflow follows below). With a low learning curve, GitHub Copilot saved time on function calls, common logic scenarios and repetitive code. The developers did not find it intrusive and had an overall positive experience with the tool, finding it increased their productivity, especially compared to their experience with the other tools in this article.
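For context, this is the style of comment-driven prompt our teams used. The function below is a hypothetical example of the kind of completion a developer might accept, not output captured from Copilot:

```typescript
// Prompt: "Count the business days (Mon-Fri) between two dates, excluding the end date."
function businessDaysBetween(start: Date, end: Date): number {
  let count = 0;
  const current = new Date(start); // copy so the caller's date isn't mutated
  while (current < end) {
    const day = current.getDay();
    if (day !== 0 && day !== 6) count++; // 0 = Sunday, 6 = Saturday
    current.setDate(current.getDate() + 1);
  }
  return count;
}

// Generated code still needs the usual review: edge cases such as
// start > end or public holidays remain the developer's responsibility.
console.log(businessDaysBetween(new Date("2024-03-04"), new Date("2024-03-11"))); // 5
```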
While this feature wasn’t available during our implementation analysis, it is worth noting the checks and balances GitHub Copilot has put in place around the licensing of certain code, with a configuration setting to allow or block suggestions that match public code. This feature emerged in response to concerns around copyright and the use of code in the public domain.
There were some issues with IntelliJ IDE suggestions competing with Copilot suggestions. The HTML capability didn’t seem to work, and the tool was occasionally slow to show suggestions, taking about 1-2 seconds to return results.
While we mentioned its low learning curve and ease of use, this could present some problems. Although our developers found it saved them time, they didn’t feel comfortable using it for anything they couldn’t do themselves, because they wouldn’t be able to tell whether the result was correct. As an engineer, you’d still need to validate the generated code and understand whether it is making the right calls.