To be honest, I’ve always been sceptical about AI-based autocomplete. When I program, I’m deeply focused on a problem, and I have my own preferences for how to write code. Suggestions that pop up here and there are a huge distraction for me. That’s why tools like Kite or TabNine have never worked for me. Copilot, a dev productivity tool from GitHub and OpenAI, however, has different user interfaces that are more accommodating to the user. But…
Natural language models like GPT-3 are good for language-related tasks, but for programming use cases they leave a lot of valuable information on the table, e.g. grammatical information that we can get from static analysis, compilation, etc. By applying transformations at the syntax tree (ST) level, we can use the grammar of a programming language to modify a codebase depending on our use case. I think this alternative approach, which doesn’t require character-by-character or line-by-line generation, is very promising. This is the approach that I truly believe in.
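To make the idea of a syntax tree transformation concrete, here is a minimal sketch using Python’s standard `ast` module (Python 3.9 or later, for `ast.unparse`). The class name `NoneComparisonFixer` and the helper `fix_none_comparisons` are hypothetical names chosen for illustration, and the rewrite rule itself (turning `== None` into `is None`) is just one small example of a grammar-aware edit, not a description of how any particular tool works.

```python
import ast


class NoneComparisonFixer(ast.NodeTransformer):
    """Illustrative transformer: rewrite `x == None` as `x is None`
    and `x != None` as `x is not None`."""

    def visit_Compare(self, node: ast.Compare) -> ast.AST:
        # Visit child nodes first so nested comparisons are handled too.
        self.generic_visit(node)
        new_ops = []
        for op, comparator in zip(node.ops, node.comparators):
            is_none = (
                isinstance(comparator, ast.Constant) and comparator.value is None
            )
            if is_none and isinstance(op, ast.Eq):
                new_ops.append(ast.Is())
            elif is_none and isinstance(op, ast.NotEq):
                new_ops.append(ast.IsNot())
            else:
                new_ops.append(op)
        node.ops = new_ops
        return node


def fix_none_comparisons(source: str) -> str:
    """Parse source into a syntax tree, transform it, and emit code again."""
    tree = ast.parse(source)
    tree = NoneComparisonFixer().visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


if __name__ == "__main__":
    print(fix_none_comparisons("if x == None:\n    y = 1\n"))
```

The point of the sketch is that the edit is expressed on tree nodes (a `Compare` node with an `Eq` operator), so it can never produce syntactically invalid code the way free-form text generation can. One trade-off worth noting: `ast.unparse` does not preserve comments or the original formatting, so a real tool would likely need a syntax tree that keeps that information.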
By working at the fundamental level of programming language grammar, we can develop a universal algorithm. More importantly, this approach scales to all programming languages because, under the hood, their grammars have a similar structure.
If you are operating at the level of syntax trees, adjusting to a specific programming language, like Python, JavaScript, Java or any other popular language, doesn’t present a real problem. The tree-sitter project is proof that maintaining parsers for many different programming languages is not an insurmountable problem.
Why do we even consider code generation? Is it really necessary for developer productivity?
Code generation is the next level of developer productivity
Let’s first reflect on what part of a developer’s work has the most value. It’s not the moment when they are typing characters on their keyboard. It’s thinking. It’s the decisions they have to make about system architecture and which trade-offs make sense. Basically, it’s high-level, abstract mental work.
With code generation tools, most boilerplate code could be generated automatically without human involvement, which unlocks the next level of productivity for developers and frees up time for deep thinking.
Let me give you an example. With React, you can run create-react-app, and it handles decisions that are very important but quite tedious. It generates a project structure from a well-accepted template that follows best practices, and it makes it easy to customize that structure for your own needs later on.
The same approach could be applied to code generation for different use cases, from suggesting code changes to generating actual code fixes. I think it could be especially useful for bug fixes.
When you program, you might not even realize that there is a bug in your code, especially when you deal with a new project and build something from scratch or interact with a new library or API. Bugs, in most cases, occur in interactions between different components. More importantly, there are bugs that occur due to human miscommunication. Those bugs are impossible to remove unless we replace all humans with machines (joking).
But the rest, roughly 80% of bugs, occur for very logical reasons: environment variables were set incorrectly, or a programmer is not familiar with a new library or API, etc. I believe that with the syntax tree transformations approach and a deep understanding of code grammar, generating code fixes will be possible in the near future.
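As a rough illustration of how a syntax tree can surface this kind of bug before a fix is generated, the sketch below walks a Python syntax tree and collects every `os.environ["NAME"]` lookup, then reports which of those names are missing from a given environment. The function names `find_env_lookups` and `missing_env_vars` are hypothetical, the code assumes Python 3.9 or later, and it only recognises the literal `os.environ["..."]` pattern, so treat it as a starting point rather than a complete checker.

```python
import ast
from typing import Mapping


def find_env_lookups(source: str) -> list[tuple[int, str]]:
    """Return (line number, variable name) for each `os.environ["NAME"]`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Subscript)
            and isinstance(node.value, ast.Attribute)
            and node.value.attr == "environ"
            and isinstance(node.value.value, ast.Name)
            and node.value.value.id == "os"
            # On Python 3.9+ the subscript key is stored directly in `slice`.
            and isinstance(node.slice, ast.Constant)
            and isinstance(node.slice.value, str)
        ):
            hits.append((node.lineno, node.slice.value))
    return sorted(hits)


def missing_env_vars(source: str, env: Mapping[str, str]) -> list[tuple[int, str]]:
    """Keep only the lookups whose variable is absent from `env`."""
    return [(line, name) for line, name in find_env_lookups(source) if name not in env]


if __name__ == "__main__":
    code = 'import os\nurl = os.environ["DATABASE_URL"]\n'
    print(missing_env_vars(code, env={}))
```

A tool built on this idea could pass `os.environ` as `env` and, for each reported line, propose a fix such as adding a default value or a clearer error message.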
What’s your take on code generation? What approach do you support?