An introduction to AST tooling

In this blog post, we take a closer look at Abstract Syntax Trees (ASTs) and how they are applied in the tools and libraries that you use in your day-to-day job. We are more interested in the pragmatic use of ASTs than in the academic theory behind language development tools such as compilers and parsers.

This is the first in a series of blog posts exploring the AST tooling built and used by the front-end developers at Freshworks for activities such as large-scale code migrations and refactorings.

What is an AST?

An AST is a tree representation of your source code. A useful analogy is to think of the AST as the Document Object Model (DOM) of your code. You manipulate the DOM to update the content of a web page: you add, update, and remove nodes, and the browser reflects those changes on the page. Similarly, you add, update, and remove AST nodes to modify your code in a structured and reliable way. The analogy is an oversimplification, but it makes the idea much easier to grasp.

It is called an abstract syntax tree because it does not capture every detail of your source text. For example, it typically does not store delimiters, punctuation, or whitespace.

How is an AST created?

This is how your code gets transformed into an AST. As the diagram shows, the code passes through two stages of processing before an AST is created: lexical analysis and syntax analysis. Let's see what each of them does.

[Image: AST one]

First, your code goes through a stage known as lexical analysis, performed by a tool called a lexical analyzer (also known as a scanner or lexer). It takes the source code as a string and splits it into a list of tokens.

Let’s consider a small JavaScript program that prints “hello world” to the console through a function.

[Image: AST two]
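To make the token list concrete, here is a minimal sketch of a lexical analyzer. The `tokenize` function, the sample program, and the token type names are our own assumptions for illustration; real lexers handle comments, numbers, operators, and many more cases.

```javascript
// Minimal, hand-written tokenizer for illustration only.
// Each rule pairs a token type with a pattern anchored at the start of the input.
function tokenize(source) {
  const rules = [
    ["whitespace", /^\s+/],
    ["keyword", /^(?:function|return|const|let|var)\b/],
    ["identifier", /^[A-Za-z_$][A-Za-z0-9_$]*/],
    ["string", /^"(?:[^"\\]|\\.)*"/],
    ["punctuator", /^[(){};.,]/],
  ];
  const tokens = [];
  let rest = source;
  while (rest.length > 0) {
    let matched = false;
    for (const [type, pattern] of rules) {
      const match = pattern.exec(rest);
      if (!match) continue;
      // Whitespace separates tokens but is not kept as a token itself.
      if (type !== "whitespace") tokens.push({ type, value: match[0] });
      rest = rest.slice(match[0].length);
      matched = true;
      break;
    }
    if (!matched) throw new SyntaxError(`Unexpected character: ${rest[0]}`);
  }
  return tokens;
}

// Assumed sample program; the one in the original screenshot may differ.
const source = 'function hello() { console.log("hello world"); }';
const tokens = tokenize(source);
console.log(tokens.map((t) => `${t.type}:${t.value}`).join("\n"));
```

Notice that the output is still a flat list: the lexer knows that `console` is an identifier, but it has no idea yet that it is part of a function call.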

The next stage is known as syntax analysis, performed by a tool called a syntax analyzer, more commonly known as a parser. It takes the tokens generated by the lexical analyzer and builds them into a tree structure: the abstract syntax tree.

[Image: AST 3]
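The following sketch shows the idea behind syntax analysis with a tiny recursive-descent parser. The `parse` function and the token shape (`type` and `value`) are assumptions made for this example, and the parser understands only statements like `console.log("hello world");`. The node names follow the ESTree conventions used by most JavaScript parsers.

```javascript
// Minimal recursive-descent parser for illustration only.
// Grammar: Program := (Expression ";")*
//          Expression := Primary ("." Identifier | "(" Arguments ")")*
function parse(tokens) {
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];
  const isPunct = (value) => {
    const token = peek();
    return Boolean(token) && token.type === "punctuator" && token.value === value;
  };
  const expect = (value) => {
    if (!isPunct(value)) throw new SyntaxError(`Expected "${value}"`);
    next();
  };

  function parseIdentifier() {
    const token = next();
    if (!token || token.type !== "identifier") throw new SyntaxError("Expected identifier");
    return { type: "Identifier", name: token.value };
  }

  function parsePrimary() {
    const token = peek();
    if (token && token.type === "string") {
      next();
      // Strip the quotes; escape sequences are not decoded in this sketch.
      return { type: "Literal", value: token.value.slice(1, -1), raw: token.value };
    }
    return parseIdentifier();
  }

  function parseExpression() {
    let node = parsePrimary();
    while (isPunct(".") || isPunct("(")) {
      if (next().value === ".") {
        node = { type: "MemberExpression", object: node, property: parseIdentifier(), computed: false };
      } else {
        const args = [];
        while (peek() && !isPunct(")")) {
          args.push(parseExpression());
          if (isPunct(",")) next();
        }
        expect(")");
        node = { type: "CallExpression", callee: node, arguments: args };
      }
    }
    return node;
  }

  const body = [];
  while (pos < tokens.length) {
    const expression = parseExpression();
    expect(";");
    body.push({ type: "ExpressionStatement", expression });
  }
  return { type: "Program", body };
}

// Tokens for: console.log("hello world");
const tokens = [
  { type: "identifier", value: "console" },
  { type: "punctuator", value: "." },
  { type: "identifier", value: "log" },
  { type: "punctuator", value: "(" },
  { type: "string", value: '"hello world"' },
  { type: "punctuator", value: ")" },
  { type: "punctuator", value: ";" },
];
const ast = parse(tokens);
console.log(JSON.stringify(ast, null, 2));
```

The parentheses, the dot, and the semicolon no longer appear in the result. Their meaning is captured by the shape of the tree, which is what makes the syntax tree "abstract".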

So now, the code that we have written is converted into something like the following screenshot. Since we are using JavaScript as an example, the AST is represented in JSON notation, which makes it easy for other tools in the JavaScript ecosystem to read and process the AST in whatever way they want. Other programming languages do not necessarily use JSON; the AST can be in whatever notation suits the language and its tools. For example, Ruby ASTs are commonly represented as S-expressions, while other tools might emit YAML and so on. It all depends on which parser you use and what output that parser emits. The parser also does not have to be written in the language it parses: you can use a parser written in JavaScript to analyze Ruby code, and vice versa.

[Image: ast]
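Because the AST is plain data, you can work with it like any other JSON document. The sketch below uses a hand-written, ESTree-style tree for an assumed sample program (the one in the screenshot may differ) and a hypothetical `walk` helper that visits every node. Real parsers add more fields, such as source locations.

```javascript
// ESTree-style AST for: function hello() { console.log("hello world"); }
// Written by hand for illustration; a real parser would generate it.
const ast = {
  type: "Program",
  sourceType: "script",
  body: [
    {
      type: "FunctionDeclaration",
      id: { type: "Identifier", name: "hello" },
      params: [],
      body: {
        type: "BlockStatement",
        body: [
          {
            type: "ExpressionStatement",
            expression: {
              type: "CallExpression",
              callee: {
                type: "MemberExpression",
                object: { type: "Identifier", name: "console" },
                property: { type: "Identifier", name: "log" },
                computed: false,
              },
              arguments: [{ type: "Literal", value: "hello world", raw: '"hello world"' }],
            },
          },
        ],
      },
    },
  ],
};

// The tree survives a JSON round trip, so it can be saved or passed between tools.
const restored = JSON.parse(JSON.stringify(ast));

// Generic depth-first walk: call `visit` for every object with a string `type`.
function walk(node, visit) {
  if (Array.isArray(node)) {
    node.forEach((child) => walk(child, visit));
  } else if (node && typeof node === "object") {
    if (typeof node.type === "string") visit(node);
    Object.values(node).forEach((child) => walk(child, visit));
  }
}

const nodeTypes = [];
walk(restored, (node) => nodeTypes.push(node.type));
console.log(nodeTypes.join("\n"));
```

A generic walk like this is the basis for most AST tools: a linter looks for particular node types, while a codemod also changes the nodes it finds.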

AST Explorer

AST Explorer is an online tool, built by engineers at Facebook, for exploring the ASTs generated by various language parsers. It supports a wide variety of languages and parsers. You can type in some code, see the AST it produces, and inspect the nodes and their data within the same browser window. You can also write transform functions that change or refactor code using the AST. For anyone who wants to understand ASTs and build tooling with them, this is one of the best tools to learn and work with.

[Image: AST]

Why are ASTs important to tooling?

Abstract Syntax Trees are the building blocks of language development tools because they offer a scalable way to manipulate source code. By representing your code as a tree, you can use all the well-understood traversal and manipulation techniques that come with tree data structures. This is far more reliable than manipulating source code as text fragments or strings, where advanced and complex changes quickly become a nightmare because plain character sequences carry no notion of syntax. If you want a convenient representation for changing your code without unintended consequences and side effects, the AST is the best choice. Once your source code is converted into an AST, almost any kind of modification can be expressed as adding, updating, or removing nodes.
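Here is a small sketch of the difference, using the task of renaming a variable. The sample source, the hand-written AST, and the `renameIdentifier` and `print` helpers are all assumptions made for this example. Real codemod tools use a full parser and printer, and also take scopes into account.

```javascript
const source = 'greet("name: " + name);';

// Naive text replacement also rewrites the word inside the string literal.
const viaText = source.replace(/name/g, "userName");

// Hand-written ESTree-style AST for the same source (a real parser would produce this).
const ast = {
  type: "Program",
  body: [
    {
      type: "ExpressionStatement",
      expression: {
        type: "CallExpression",
        callee: { type: "Identifier", name: "greet" },
        arguments: [
          {
            type: "BinaryExpression",
            operator: "+",
            left: { type: "Literal", value: "name: ", raw: '"name: "' },
            right: { type: "Identifier", name: "name" },
          },
        ],
      },
    },
  ],
};

// Rename every Identifier node with the given name; literals are never touched.
// Note: this sketch ignores scopes and property names, which real tools must handle.
function renameIdentifier(node, from, to) {
  if (Array.isArray(node)) {
    node.forEach((child) => renameIdentifier(child, from, to));
  } else if (node && typeof node === "object") {
    if (node.type === "Identifier" && node.name === from) node.name = to;
    Object.values(node).forEach((child) => renameIdentifier(child, from, to));
  }
}

// Tiny code generator covering only the node types used above.
function print(node) {
  switch (node.type) {
    case "Program":
      return node.body.map(print).join("\n");
    case "ExpressionStatement":
      return `${print(node.expression)};`;
    case "CallExpression":
      return `${print(node.callee)}(${node.arguments.map(print).join(", ")})`;
    case "BinaryExpression":
      return `${print(node.left)} ${node.operator} ${print(node.right)}`;
    case "Identifier":
      return node.name;
    case "Literal":
      return node.raw;
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

renameIdentifier(ast, "name", "userName");
const viaAst = print(ast);

console.log(viaText); // greet("userName: " + userName);
console.log(viaAst); // greet("name: " + userName);
```

The text-based version silently changes the message shown to the user, while the AST-based version changes only the variable.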

That’s why many language ecosystems define a standard AST format; for JavaScript, this is the community-maintained ESTree specification. Some languages, such as Python with its ast module, even ship built-in APIs to create and work with ASTs. Apart from that, there are usually external tools and libraries that make working with ASTs convenient and easy.

So where do we actually make use of ASTs? Simply put, they are used almost everywhere in the JavaScript ecosystem of tools and libraries. As you can see from this list, much of the tooling for JavaScript projects would not be possible without ASTs.

  • Syntax Highlighting
  • Code Completion
  • Static Analysis
  • Code Coverage
  • Minification
  • JIT Compilation
  • Source Maps
  • Compile-to-JS languages
  • Code Refactoring
  • Code Migrations, and much more.

Codemods

At Freshworks, we first came across AST tooling such as codemods when we were migrating our Ember code to newer versions. The Ember framework and its community provide a lot of codemods that make migrations easier and less time-consuming. We have already written a blog post about how we use codemods to migrate our codebases. You might want to check it out for a refresher on codemods, including why they make sense and why they should be a crucial part of your automated migration tooling. Other frameworks, such as React, Angular, and Vue, also recommend and provide codemods for upgrading from one version of the framework to the next.

Once we started using codemods for our migrations, our developers found them very helpful and time-saving when moving their code to newer versions of the framework. We soon needed more and more codemods to solve our migration issues. So we started to explore and write codemods that address our own code migration problems, while keeping them generic enough to open source and share with the larger community of JavaScript developers.

But some of our developers were not able to fully grasp what codemods are built on: ASTs and the related tools such as Babel, recast, and jscodeshift. So we decided to make these concepts easier to understand and the learning less painful. We built a set of tools around ASTs to help developers get familiar with ASTs and codemods. This series of blog posts will capture the essence and significance of those tools by taking a deep dive into their internal architecture and components.
