GitHub Copilot: The New Era of Pair Programming

What is GitHub Copilot?

GitHub recently launched a new AI-powered tool that collaborates with people on their software development projects. Called GitHub Copilot, the tool suggests lines or entire functions as the coder types. Just as Gmail offers suggestions to complete parts of sentences while you type an email, Copilot offers suggestions based on comments or on the code being typed. It can also offer several alternative suggestions, from which the programmer can choose the best fit for a particular use.

How does Copilot work?

GitHub Copilot is powered by OpenAI Codex, a new AI system created by OpenAI. Codex was trained on publicly available source code and natural language, so it can work with both programming languages and human languages. The GitHub Copilot editor extension sends your comments and code to the GitHub Copilot service, which then uses OpenAI Codex to synthesize and suggest individual lines and whole functions.
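To make the comment-to-code workflow concrete, here is a hypothetical illustration: the programmer writes only the comment and the function signature, and the body is the kind of completion a tool like Copilot might propose. The body was written by hand as an example, not captured from Copilot, and the function name `is_palindrome` is an illustrative assumption.

```python
# Return True if the given string reads the same forwards and backwards,
# ignoring case and any non-alphanumeric characters.
def is_palindrome(text: str) -> bool:
    # Hypothetical suggested body (hand-written, not real Copilot output):
    # keep only letters and digits, lowercased, so punctuation and spaces are ignored.
    cleaned = [ch.lower() for ch in text if ch.isalnum()]
    # A palindrome is equal to its own reverse.
    return cleaned == cleaned[::-1]


if __name__ == "__main__":
    print(is_palindrome("A man, a plan, a canal: Panama"))
    print(is_palindrome("Copilot"))
```

Whatever body the tool suggests, the programmer still has to read and test it before accepting it.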

This is not the first AI-powered program synthesis tool. Earlier tools include GitHub's Natural Language Semantic Code Search (2018), which demonstrated finding code examples from plain English descriptions, and Tabnine, which has provided AI-powered code completion for a few years now. Copilot differs from these because it can generate whole multi-line functions, and even documentation and tests, based on the full context of a file of code. This is particularly exciting for many developers because it could lower the barrier to entry for coding.

The Flak Copilot Faced

Copilot has recently come under heavy criticism because Codex, the deep neural network language model that powers it, was trained on public code repositories on GitHub. Critics object that GitHub used all of this public repository code without its authors' permission. GitHub, however, stands firm, maintaining that it used the code as the company hosting it and that it has not sold or outsourced the code, which would be against company policy.

According to OpenAI's paper, Codex produces a correct solution on its first attempt only about 29% of the time (28.8% for the largest, 12-billion-parameter model on the paper's HumanEval benchmark). As seen in the demos, the code it writes is generally poorly factored and fails to take full advantage of existing solutions. Copilot's underlying model has read a vast share of GitHub's public code, tens of millions of repositories, including code from many of the world's best programmers. Even so, Copilot writes such crappy code because that is how language models work: they reflect how most people write code on average, and they have no sense of what is correct or what is good. By software standards, most code on GitHub is fairly old and was written, by and large, by average programmers. Copilot gives its best guess of what those programmers might write if they were writing the same file you are.
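The 29% figure is a pass@1 score: the share of benchmark problems for which a generated sample passes the problem's unit tests on the first try. The Codex paper generalizes this to pass@k with an unbiased estimator: generate n samples per problem, count the c that pass, and compute 1 − C(n−c, k) / C(n, k), the probability that at least one of k samples drawn from the n is correct. The sketch below implements that formula with Python's `math.comb`; the function name `pass_at_k` and the sample numbers are illustrative assumptions, not results from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: number of samples generated for the problem
    c: number of those samples that passed the unit tests
    k: number of samples a user is allowed to try
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples fail, any choice of k samples includes a passing one.
    if n - c < k:
        return 1.0
    # Probability that k samples drawn without replacement are all failures.
    all_fail = comb(n - c, k) / comb(n, k)
    return 1.0 - all_fail


if __name__ == "__main__":
    # Hypothetical problem: 10 samples generated, 3 of them correct.
    print(pass_at_k(10, 3, 1))  # pass@1 reduces to c / n
    print(pass_at_k(10, 3, 5))  # more attempts, higher chance of success
```

With a single attempt the estimate is just c / n, so a benchmark-wide pass@1 of about 29% means that a little under a third of first attempts were correct on average.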

Pair Programming—A Boon Or A Bane?

As coders and developers know best, most programming time is spent not on writing code but on designing, debugging, and maintaining it. When code is generated automatically, it's easy to end up with a lot more of it. With code template tools that is not a big problem, because to maintain, debug, or change the output you only need to modify the source from which the code is generated. Even then, things can get confusing when debugging, since the debugger and stack traces will generally point at the verbose generated code, not at the templated source.

With Copilot, we don't have any of these upsides. There is no template source to edit, so we nearly always have to modify the code that Copilot generates, and if we want to change how it works, we have to debug the generated code directly. As a result, developers end up learning less and more slowly, increasing technical debt, and introducing subtle bugs into their programs.

Copilot might be more useful for languages that are heavy on boilerplate and have limited metaprogramming functionality, such as Go. Copilot may also be particularly well suited to experienced programmers working in unfamiliar languages, since it can help them get the basic syntax right and point them to library functions and common idioms.

The thing we need to remember is that Copilot is an early preview of a very new technology that is going to get better and better. Many competitors will pop up in the coming months and years, and GitHub will no doubt release new and better versions of its own tool.

To see real improvements in program synthesis, we will need to go beyond just language models to a more holistic solution that incorporates best practices from human-computer interaction, software engineering, testing, and several other disciplines. Currently, Copilot feels like a product designed and implemented by machine learning researchers, rather than a complete solution incorporating all the needed domain expertise.

Written by: Anirudh Murthy

CyberManipal