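Recently a project of ours needed to convert one syntax into another. I did not write the code, but I read part of it to learn, mainly how two libraries are used: htmlparser2 and babylon.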

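I won't say much about htmlparser2 because it is fairly simple. It works like an XML parser: it fires a series of callbacks while parsing, and you handle the parsed nodes inside those callbacks. This post focuses on babylon. I had rarely touched this kind of parser before, so this was a good chance to broaden my horizons.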

What is Babylon

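This question has to be answered first. Babylon is the JavaScript parser used in Babel. I happened to have skimmed a bit of compiler theory earlier: what babylon does is essentially lexical analysis and syntax analysis, that is, the parsing stages of a compiler front end.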

That is about as far as I can explain babylon; any deeper and I am out of my depth. Go home and dust off your copy of the "Dragon Book".

What is an AST

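If you already understand what an AST is, using babylon will pose no obstacle. I did not, so this post records which node types babylon's AST contains, so that later I can do different things for different nodes.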

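AST stands for Abstract Syntax Tree. Whatever language we write in, a compiler or interpreter typically turns the code into an AST first, so that it can check the syntax or perform optimizations. The AST is simply our code as a tree structure that is convenient for a program to process. The Wikipedia article on abstract syntax trees has a figure showing what one looks like.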

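Because an AST is easy for a program to work with, it is also fairly easy for us to understand. Converting code into an AST, however, is tedious work, and that is mainly what babylon does for us, so that we can operate on the AST directly in order to modify the source code.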
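To make "operate on the AST in order to modify the source" concrete, here is a minimal sketch. It does not call babylon: the tree is written by hand in the shape babylon produces for `item.a`, and `renameIdentifier` and `generate` are hypothetical helpers of mine, not babylon APIs.

```javascript
// Hand-written AST for `item.a` (position fields omitted).
// This is an assumed shape for illustration, not real parser output.
const ast = {
  type: "MemberExpression",
  object: { type: "Identifier", name: "item" },
  property: { type: "Identifier", name: "a" },
  computed: false
};

// Hypothetical helper: rename the identifier `from` to `to`, in place.
// A non-computed property name (the `a` in `item.a`) is left alone.
function renameIdentifier(node, from, to) {
  if (node.type === "Identifier" && node.name === from) {
    node.name = to;
  } else if (node.type === "MemberExpression") {
    renameIdentifier(node.object, from, to);
    if (node.computed) renameIdentifier(node.property, from, to);
  }
  return node;
}

// Hypothetical helper: print the two node types used here back as source text.
function generate(node) {
  switch (node.type) {
    case "Identifier":
      return node.name;
    case "MemberExpression":
      return node.computed
        ? generate(node.object) + "[" + generate(node.property) + "]"
        : generate(node.object) + "." + generate(node.property);
    default:
      throw new Error("unsupported node type: " + node.type);
  }
}

const before = generate(ast);
renameIdentifier(ast, "item", "data");
const after = generate(ast);
console.log(before, "->", after);
```

In a real project the parsing and printing steps would be handled by libraries; the point here is only that the edit itself is a plain object mutation.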

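I have to recommend a great tool here: AST Explorer. It makes it very easy to see what AST our code is turned into, so that we can then modify the AST with a clear goal. As a beginner, I also needed a companion site, JavaScript Reference, to make sense of it all.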

An example

{item.a}

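After parsing, it becomes the following (trimmed by me, of course, with each loc object collapsed to [Object]; the original was much longer)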

{
  "type": "File",
  "start": 0,
  "end": 8,
  "loc": [Object],
  "program": {
    "type": "Program",
    "start": 0,
    "end": 8,
    "loc": [Object],
    "sourceType": "module",
    "body": [
      {
        "type": "BlockStatement",
        "start": 0,
        "end": 8,
        "loc": [Object],
        "body": [
          {
            "type": "ExpressionStatement",
            "start": 1,
            "end": 7,
            "loc": [Object],
            "expression": {
              "type": "MemberExpression",
              "start": 1,
              "end": 7,
              "loc": [Object],
              "object": {
                "type": "Identifier",
                "start": 1,
                "end": 5,
                "loc": [Object],
                "name": "item"
              },
              "property": {
                "type": "Identifier",
                "start": 6,
                "end": 7,
                "loc": [Object],
                "name": "a"
              },
              "computed": false
            }
          }
        ]
      }
    ]
  },
  "comments": [],
  "tokens": [
    {
      "type": {
        "label": "{",
        "beforeExpr": true,
        "startsExpr": true,
        "rightAssociative": false,
        "isLoop": false,
        "isAssign": false,
        "prefix": false,
        "postfix": false,
        "binop": null
      },
      "start": 0,
      "end": 1,
      "loc": [Object]
    },
    {
      "type": {
        "label": "name",
        "beforeExpr": false,
        "startsExpr": true,
        "rightAssociative": false,
        "isLoop": false,
        "isAssign": false,
        "prefix": false,
        "postfix": false,
        "binop": null,
        "updateContext": null
      },
      "value": "item",
      "start": 1,
      "end": 5,
      "loc": [Object]
    },
    {
      "type": {
        "label": ".",
        "beforeExpr": false,
        "startsExpr": false,
        "rightAssociative": false,
        "isLoop": false,
        "isAssign": false,
        "prefix": false,
        "postfix": false,
        "binop": null,
        "updateContext": null
      },
      "start": 5,
      "end": 6,
      "loc": [Object]
    },
    {
      "type": {
        "label": "name",
        "beforeExpr": false,
        "startsExpr": true,
        "rightAssociative": false,
        "isLoop": false,
        "isAssign": false,
        "prefix": false,
        "postfix": false,
        "binop": null,
        "updateContext": null
      },
      "value": "a",
      "start": 6,
      "end": 7,
      "loc": [Object]
    },
    {
      "type": {
        "label": "}",
        "beforeExpr": false,
        "startsExpr": false,
        "rightAssociative": false,
        "isLoop": false,
        "isAssign": false,
        "prefix": false,
        "postfix": false,
        "binop": null
      },
      "start": 7,
      "end": 8,
      "loc": [Object]
    },
    {
      "type": {
        "label": "eof",
        "beforeExpr": false,
        "startsExpr": false,
        "rightAssociative": false,
        "isLoop": false,
        "isAssign": false,
        "prefix": false,
        "postfix": false,
        "binop": null,
        "updateContext": null
      },
      "start": 8,
      "end": 8,
      "loc": [Object]
    }
  ]
}

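Looks excessive, doesn't it? One line of code produces output as long as an essay. That is the charm of a programming language: it simplifies programming through extreme brevity and logic.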

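Two parts of the result above matter most: program and tokens. tokens is the set of lexical units produced by lexical analysis, and program is the AST. Every node in the AST has type, start, end and loc fields. The last three are easy to understand: they give the node's position in the source (start and end are character offsets, loc holds line and column). type is more involved. My understanding is that the set of type definitions determines a language; in other words, every piece of JavaScript syntax should have a corresponding type.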
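To get a feel for where the tokens come from, here is a deliberately tiny tokenizer sketch. It is not babylon's implementation: `tokenize` is a hypothetical function that only understands identifiers, whitespace and the punctuators `{`, `}` and `.`, and its token shape is a flattened version of what babylon prints.

```javascript
// Simplified tokenizer sketch (assumption: only names, whitespace and { } . occur).
function tokenize(source) {
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    const ch = source[pos];
    if (/\s/.test(ch)) {
      pos++; // skip whitespace
    } else if (/[A-Za-z_$]/.test(ch)) {
      // Consume a whole identifier and record where it started and ended.
      const start = pos;
      while (pos < source.length && /[A-Za-z0-9_$]/.test(source[pos])) pos++;
      tokens.push({ label: "name", value: source.slice(start, pos), start, end: pos });
    } else if ("{}.".includes(ch)) {
      tokens.push({ label: ch, start: pos, end: pos + 1 });
      pos++;
    } else {
      throw new SyntaxError("Unexpected character " + JSON.stringify(ch) + " at " + pos);
    }
  }
  // Like babylon, finish with an end-of-file token of zero width.
  tokens.push({ label: "eof", start: pos, end: pos });
  return tokens;
}

const tokens = tokenize("{item.a}");
console.log(tokens.map(t => t.label + "@" + t.start + "-" + t.end).join(" "));
```

The labels and offsets it produces for `{item.a}` line up with the tokens array shown earlier, which is a handy sanity check when reading babylon's output.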

The node types of the AST that babylon produces are based on ESTree, with a few modifications. For the full list, see Babylon AST node types.

Babylon AST Node Type

Below we look at what the different node types actually look like. The examples omit all position information.
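Since the whole point of knowing the node types is to do different things for different nodes, here is a rough sketch of dispatching on type. `traverse` is a hypothetical helper written for this post (a real project would more likely use a traversal library), and the `program` object is typed by hand from the `{item.a}` example with positions dropped.

```javascript
// Hypothetical helper: depth-first walk that calls visitors[node.type] for each node.
function traverse(node, visitors) {
  if (Array.isArray(node)) {
    node.forEach(child => traverse(child, visitors));
    return;
  }
  if (node === null || typeof node !== "object") return;
  // Only objects with a string `type` are AST nodes (a token's `type` is an object).
  if (typeof node.type === "string" && visitors[node.type]) {
    visitors[node.type](node);
  }
  for (const key of Object.keys(node)) {
    traverse(node[key], visitors);
  }
}

// Hand-written `program` for `{item.a}`, position fields dropped.
const program = {
  type: "Program",
  sourceType: "module",
  body: [{
    type: "BlockStatement",
    body: [{
      type: "ExpressionStatement",
      expression: {
        type: "MemberExpression",
        object: { type: "Identifier", name: "item" },
        property: { type: "Identifier", name: "a" },
        computed: false
      }
    }]
  }]
};

const names = [];
let memberCount = 0;
traverse(program, {
  Identifier(node) { names.push(node.name); },
  MemberExpression() { memberCount++; }
});
console.log(names, memberCount);
```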

Identifier

An identifier node. A simple variable such as a is an identifier.

// var a;

{
  "type": "Identifier",
  "name": "a"
}

Literals

RegExpLiteral

// /\w+/g

{
  "type": "ExpressionStatement",
  "expression": {
    "type": "RegExpLiteral",
    "extra": {
      "raw": "/\\w+/g"
    },
    "pattern": "\\w+",
    "flags": "g"
  }
}

NullLiteral

// null

{
  "type": "ExpressionStatement",
  "expression": {
    "type": "NullLiteral"
  }
}

StringLiteral

// var a = "a"

{
  "type": "StringLiteral",
  "extra": {
    "rawValue": "a",
    "raw": "\"a\""
  },
  "value": "a"
}

BooleanLiteral

// true

{
  "type": "ExpressionStatement",
  "expression": {
    "type": "BooleanLiteral",
    "value": true
  }
}

NumericLiteral

// 1

{
  "type": "ExpressionStatement",
  "expression": {
    "type": "NumericLiteral",
    "extra": {
      "rawValue": 1,
      "raw": "1"
    },
    "value": 1
  }
}
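One thing the listings above show is that each literal type stores its data a little differently: NullLiteral has no value at all, and RegExpLiteral splits into pattern and flags. A small sketch of turning these nodes back into JavaScript values, where `literalValue` is a hypothetical helper and the sample nodes are hand-written:

```javascript
// Hypothetical helper: map a babylon literal node to the JavaScript value it denotes.
function literalValue(node) {
  switch (node.type) {
    case "NullLiteral":
      return null; // NullLiteral carries no `value` field
    case "RegExpLiteral":
      return new RegExp(node.pattern, node.flags);
    case "StringLiteral":
    case "BooleanLiteral":
    case "NumericLiteral":
      return node.value;
    default:
      throw new TypeError("not a literal node: " + node.type);
  }
}

// Hand-written nodes matching the examples above (extra/position fields omitted).
const samples = [
  { type: "RegExpLiteral", pattern: "\\w+", flags: "g" },
  { type: "NullLiteral" },
  { type: "StringLiteral", value: "a" },
  { type: "BooleanLiteral", value: true },
  { type: "NumericLiteral", value: 1 }
];
const values = samples.map(literalValue);
console.log(values);
```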

Programs

{
  "type": "Program",
  "sourceType": "module",
  "body": [],
  "directives": []
}

Functions

// function a() {
//   return a;
// }

{
  "type": "FunctionDeclaration",
  "id": {
    "type": "Identifier",
    "name": "a"
  },
  "generator": false,
  "expression": false,
  "async": false,
  "params": [],
  "body": {}
}

Statements

ExpressionStatement

// a;

{
  "type": "ExpressionStatement",
  "expression": {}
}

BlockStatement

// { }

{
  "type": "BlockStatement",
  "body": [],
  "directives": []
}

EmptyStatement

// ;

{
  "type": "EmptyStatement"
}

DebuggerStatement

// debugger;

{
  "type": "DebuggerStatement"
}

WithStatement

// with (o) {}

{
  "type": "WithStatement",
  "object": {},
  "body": {}
}

Control flow

ReturnStatement

// function a() {return a}

{
  "type": "ReturnStatement",
  "argument": {}
}

LabeledStatement

// loop1:
//   a = 1;

{
  "type": "LabeledStatement",
  "body": {},
  "label": {
    "type": "Identifier",
    "name": "loop1"
  }
}

BreakStatement

// while(1){
//   break;
// }

{
  "type": "BreakStatement",
  "label": null
}

ContinueStatement

// while(1){
//   continue;
// }

{
  "type": "ContinueStatement",
  "label": null
}

Choice

IfStatement

// if(1){}

{
  "type": "IfStatement",
  "test": {},
  "consequent": {},
  "alternate": null
}

SwitchStatement

// switch(1) {
//  case 1: break;
// }


{
  "type": "SwitchStatement",
  "discriminant": {},
  "cases": []
}

SwitchCase

// switch(1) {
//  case 1: break;
// }

{
  "type": "SwitchCase",
  "consequent": [],
  "test": {}
}

Exceptions

ThrowStatement

// throw "myException";

{
  "type": "ThrowStatement",
  "argument": {}
}

TryStatement

// try {} catch (e) {}

{
  "type": "TryStatement",
  "block": {},
  "handler": {},
  "guardedHandlers": [],
  "finalizer": null
}

CatchClause

// try {} catch (e) {}

{
  "type": "CatchClause",
  "param": {},
  "body": {}
}

Loops

WhileStatement

// while(1) {}

{
  "type": "WhileStatement",
  "test": {},
  "body": {}
}

DoWhileStatement

// do{} while(1)

{
  "type": "DoWhileStatement",
  "body": {},
  "test": {}
}

ForStatement

// for (;;) {}

{
  "type": "ForStatement",
  "init": null,
  "test": null,
  "update": null,
  "body": {}
}

ForInStatement

// var obj = {a:1, b:2, c:3};
// for (var prop in obj) {
// }

{
  "type": "ForInStatement",
  "left": {},
  "right": {},
  "body": {}
}

ForOfStatement

// var arr = [1, 2, 3];
// for (var item of arr) {
// }

{
  "type": "ForOfStatement",
  "left": {},
  "right": {},
  "body": {}
}

ForAwaitStatement

Declarations

FunctionDeclaration

// function a() {}

{
  "type": "FunctionDeclaration",
  "id": {},
  "generator": false,
  "expression": false,
  "async": false,
  "params": [],
  "body": {}
}

VariableDeclaration

// var a;

{
  "type": "VariableDeclaration",
  "declarations": [],
  "kind": "var"
}

VariableDeclarator

// var a;

{
  "type": "VariableDeclarator",
  "id": {},
  "init": null
}
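The split between VariableDeclaration and VariableDeclarator is easy to mix up: the declaration holds the kind and a list of declarators, and each declarator holds one id/init pair. A small sketch over a hand-written tree for `var a, b = 1;`, where `declaredNames` is a hypothetical helper:

```javascript
// Hand-written AST for `var a, b = 1;` (position fields omitted).
const declaration = {
  type: "VariableDeclaration",
  kind: "var",
  declarations: [
    { type: "VariableDeclarator", id: { type: "Identifier", name: "a" }, init: null },
    {
      type: "VariableDeclarator",
      id: { type: "Identifier", name: "b" },
      init: { type: "NumericLiteral", value: 1 }
    }
  ]
};

// Hypothetical helper: list each declared name and whether it has an initializer.
// Only plain Identifier ids are handled; destructuring patterns are skipped.
function declaredNames(node) {
  return node.declarations
    .filter(d => d.id.type === "Identifier")
    .map(d => ({ kind: node.kind, name: d.id.name, initialized: d.init !== null }));
}

const declared = declaredNames(declaration);
console.log(declared);
```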

Misc

Decorator


Directive

// "a";

{
  "type": "Directive",
  "value": {}
}

DirectiveLiteral

// "a";

{
  "type": "DirectiveLiteral",
  "value": "a",
  "extra": {
    "raw": "\"a\"",
    "rawValue": "a"
  }
}

Expressions

Super

// class Base {
//   constructor() {
//   }
// }

// class Derivative extends Base {
//   constructor() {
//     super();
//   }
// }

{
  "type": "Super"
}

Import

ThisExpression

// class Base {
//   constructor() {
//   }
// }

// class Derivative extends Base {
//   constructor() {
//     super();
//   }
//
//   a() {
//     this.a = 1;
//   }
// }

{
  "type": "ThisExpression"
}

ArrowFunctionExpression

// a => {return a;}

{
  "type": "ArrowFunctionExpression",
  "id": null,
  "generator": false,
  "expression": false,
  "async": false,
  "params": [],
  "body": {}
}

YieldExpression

// function* foo(){
//   var index = 0;
//   while (index <= 2)
//     yield index++;
// }

{
  "type": "YieldExpression",
  "delegate": false,
  "argument": {}
}

AwaitExpression

// async function f2() {
//   var y = await 20;
//   console.log(y); // 20
// }
// f2();

{
  "type": "AwaitExpression",
  "argument": {}
}

ArrayExpression

// a = [];

{
  "type": "ArrayExpression",
  "elements": []
}

ObjectExpression

// a = {};

{
  "type": "ObjectExpression",
  "properties": []
}

ObjectMember

ObjectProperty

// a = {a:"1", b:"2"};

{
  "type": "ObjectProperty",
  "method": false,
  "shorthand": false,
  "computed": false,
  "key": {},
  "value": {}
}

ObjectMethod

// a = {
//   a(){}
// };

{
  "type": "ObjectMethod",
  "method": true,
  "shorthand": false,
  "computed": false,
  "key": {},
  "kind": "method",
  "id": null,
  "generator": false,
  "expression": false,
  "async": false,
  "params": [],
  "body": {}
}

RestProperty


SpreadProperty

// a = {
//   ...b
// }

{
  "type": "SpreadProperty",
  "argument": {}
}

FunctionExpression

// var a = function() {};

{
  "type": "FunctionExpression",
  "id": null,
  "generator": false,
  "expression": false,
  "async": false,
  "params": [],
  "body": {}
}
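FunctionDeclaration, FunctionExpression and ArrowFunctionExpression share the same fields (id, generator, async, params, body), so one routine can handle all three. A sketch, with `describeFunction` being a hypothetical helper and the sample nodes written by hand:

```javascript
// Hypothetical helper: summarize any of the three function node types.
function describeFunction(node) {
  const kinds = ["FunctionDeclaration", "FunctionExpression", "ArrowFunctionExpression"];
  if (!kinds.includes(node.type)) throw new TypeError("not a function node: " + node.type);
  // `id` is null for anonymous function expressions and for arrow functions.
  const name = node.id ? node.id.name : "(anonymous)";
  // Non-identifier params (rest elements, patterns) are shown by their node type.
  const params = node.params.map(p => (p.type === "Identifier" ? p.name : "<" + p.type + ">"));
  const prefix = (node.async ? "async " : "") + (node.generator ? "generator " : "");
  return prefix + name + "(" + params.join(", ") + ")";
}

// function a() {}
const decl = {
  type: "FunctionDeclaration", id: { type: "Identifier", name: "a" },
  generator: false, async: false, params: [], body: {}
};
// a => { return a; }
const arrow = {
  type: "ArrowFunctionExpression", id: null,
  generator: false, async: false,
  params: [{ type: "Identifier", name: "a" }], body: {}
};
const described = [describeFunction(decl), describeFunction(arrow)];
console.log(described);
```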

Unary operations

UnaryExpression

// a = ~1

{
  "type": "UnaryExpression",
  "operator": "~",
  "prefix": true,
  "argument": {},
  "extra": {}
}

UnaryOperator

UpdateExpression

// a ++

{
  "type": "UpdateExpression",
  "operator": "++",
  "prefix": false,
  "argument": {}
}

UpdateOperator

Binary operations

BinaryExpression

// var a = a + 1;

{
  "type": "BinaryExpression",
  "left": {},
  "operator": "+",
  "right": {}
}

BinaryOperator

AssignmentExpression

// a = 1;

{
  "type": "AssignmentExpression",
  "operator": "=",
  "left": {},
  "right": {}
}

AssignmentOperator

LogicalExpression

// a || 1;

{
  "type": "LogicalExpression",
  "left": {},
  "operator": "||",
  "right": {}
}

LogicalOperator
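Unary, binary and logical expressions all follow the same pattern: an operator string plus child nodes. That makes a tiny constant evaluator easy to sketch. `evaluate` below is a hypothetical helper that covers only a handful of operators; UpdateExpression and AssignmentExpression are left out because they would need a variable environment.

```javascript
// Hypothetical evaluator for constant expressions built from the node types above.
function evaluate(node) {
  switch (node.type) {
    case "NumericLiteral":
    case "BooleanLiteral":
      return node.value;
    case "UnaryExpression": {
      const v = evaluate(node.argument);
      if (node.operator === "~") return ~v;
      if (node.operator === "-") return -v;
      if (node.operator === "!") return !v;
      break;
    }
    case "BinaryExpression": {
      const l = evaluate(node.left);
      const r = evaluate(node.right);
      if (node.operator === "+") return l + r;
      if (node.operator === "-") return l - r;
      if (node.operator === "*") return l * r;
      break;
    }
    case "LogicalExpression": {
      // Short-circuit: the right side is only evaluated when needed.
      const l = evaluate(node.left);
      if (node.operator === "||") return l || evaluate(node.right);
      if (node.operator === "&&") return l && evaluate(node.right);
      break;
    }
  }
  throw new Error("unsupported node: " + node.type + " " + (node.operator || ""));
}

const num = value => ({ type: "NumericLiteral", value });

// Hand-written AST for `(~1 + 3) || 7`.
const expr = {
  type: "LogicalExpression",
  operator: "||",
  left: {
    type: "BinaryExpression",
    operator: "+",
    left: { type: "UnaryExpression", operator: "~", prefix: true, argument: num(1) },
    right: num(3)
  },
  right: num(7)
};
const result = evaluate(expr);
console.log(result);
```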

SpreadElement

// function myFunction(x, y, z) { }
// var args = [0, 1, 2];
// myFunction(...args);

{
  "type": "SpreadElement",
  "argument": {}
}

MemberExpression

// a.b

{
  "type": "MemberExpression",
  "object": {},
  "property": {},
  "computed": false
}

BindExpression

ConditionalExpression

// 1 ? 'a' : 'b';

{
  "type": "ConditionalExpression",
  "test": {},
  "consequent": {},
  "alternate": {}
}

CallExpression

// call()

{
  "type": "CallExpression",
  "callee": {},
  "arguments": []
}

NewExpression

// new Object();

{
  "type": "NewExpression",
  "callee": {},
  "arguments": []
}

SequenceExpression

Template Literals

TemplateLiteral

// `a`

{
  "type": "TemplateLiteral",
  "expressions": [],
  "quasis": []
}

TaggedTemplateExpression

// tag `a`

{
  "type": "TaggedTemplateExpression",
  "tag": {},
  "quasi": {}
}

TemplateElement

// `a`

{
  "type": "TemplateElement",
  "value": {},
  "tail": true
}
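The relationship between quasis and expressions is easier to see with a template that actually has a placeholder. In the ESTree design the two arrays interleave, with one more quasi than expressions, and each TemplateElement's value holds raw and cooked strings. The sketch below rebuilds the source from a hand-written tree for `hello ${name}!`; `printTemplate` is a hypothetical helper that only supports identifier placeholders.

```javascript
// Hand-written AST for the template literal `hello ${name}!` (positions omitted).
const template = {
  type: "TemplateLiteral",
  expressions: [{ type: "Identifier", name: "name" }],
  quasis: [
    { type: "TemplateElement", value: { raw: "hello ", cooked: "hello " }, tail: false },
    { type: "TemplateElement", value: { raw: "!", cooked: "!" }, tail: true }
  ]
};

// Hypothetical helper: interleave quasis and expressions back into source text.
// Assumption: every expression is a plain Identifier.
function printTemplate(node) {
  let out = "`";
  node.quasis.forEach((quasi, i) => {
    out += quasi.value.raw;
    // Every quasi except the tail is followed by the expression at the same index.
    if (!quasi.tail) out += "${" + node.expressions[i].name + "}";
  });
  return out + "`";
}

const printed = printTemplate(template);
console.log(printed);
```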

Patterns

ObjectPattern

ArrayPattern

RestElement

// function f(a, b, ...args) {}

{
  "type": "RestElement",
  "argument": {}
}

AssignmentPattern

Classes

ClassBody

// class a {}

{
  "type": "ClassBody",
  "body": []
}

ClassMethod

// class a {
//   b(){}
// }

{
  "type": "ClassMethod",
  "computed": false,
  "key": {},
  "static": false,
  "kind": "method",
  "id": null,
  "generator": false,
  "expression": false,
  "async": false,
  "params": [],
  "body": {}
}

ClassProperty

// class a {
//   b: "a"
// }

{
  "type": "ClassProperty",
  "computed": false,
  "key": {},
  "variance": null,
  "static": false,
  "typeAnnotation": {},
  "value": null
}

ClassDeclaration

// class a {
//   b: "a"
// }


{
  "type": "ClassDeclaration",
  "id": {},
  "superClass": null,
  "body": {}
}

ClassExpression

// var c = class a {
//   b: "a"
// }

{
  "type": "ClassExpression",
  "id": {},
  "superClass": null,
  "body": {}
}
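The class nodes nest as ClassDeclaration (or ClassExpression), then ClassBody, then a list of members. A short sketch of walking that nesting to list method names, using a hand-written tree for `class a { b() {} static c() {} }`; `listMethods` is a hypothetical helper.

```javascript
// Hand-written AST for `class a { b() {} static c() {} }` (most fields omitted).
const classNode = {
  type: "ClassDeclaration",
  id: { type: "Identifier", name: "a" },
  superClass: null,
  body: {
    type: "ClassBody",
    body: [
      {
        type: "ClassMethod", kind: "method", static: false, computed: false,
        key: { type: "Identifier", name: "b" }, params: []
      },
      {
        type: "ClassMethod", kind: "method", static: true, computed: false,
        key: { type: "Identifier", name: "c" }, params: []
      }
    ]
  }
};

// Hypothetical helper: list non-computed methods as "Class.method".
// `id` can be null for an anonymous ClassExpression.
function listMethods(node) {
  const className = node.id ? node.id.name : "(anonymous)";
  return node.body.body
    .filter(member => member.type === "ClassMethod" && !member.computed)
    .map(member => (member.static ? "static " : "") + className + "." + member.key.name);
}

const methods = listMethods(classNode);
console.log(methods);
```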

MetaProperty

Modules

ModuleDeclaration

ModuleSpecifier

Imports

ImportDeclaration

// import { cube, foo } from 'my-module';

{
  "type": "ImportDeclaration",
  "specifiers": [],
  "importKind": "value",
  "source": {}
}

ImportSpecifier

// import { cube, foo } from 'my-module';

{
  "type": "ImportSpecifier",
  "imported": {},
  "local": {}
}

ImportDefaultSpecifier

// import myDefault from "my-module";

{
  "type": "ImportDefaultSpecifier",
  "local": {}
}

ImportNamespaceSpecifier

// import * as myModule from "my-module";

{
  "type": "ImportNamespaceSpecifier",
  "local": {}
}
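The three specifier types differ only in what they say about the imported side: a named import has both imported and local, while default and namespace imports have only local. A sketch that normalizes them, using a hand-written tree for `import myDefault, { cube, foo as bar } from "my-module";` (`importedBindings` is a hypothetical helper):

```javascript
const id = name => ({ type: "Identifier", name });

// Hand-written AST for: import myDefault, { cube, foo as bar } from "my-module";
const importNode = {
  type: "ImportDeclaration",
  importKind: "value",
  source: { type: "StringLiteral", value: "my-module" },
  specifiers: [
    { type: "ImportDefaultSpecifier", local: id("myDefault") },
    { type: "ImportSpecifier", imported: id("cube"), local: id("cube") },
    { type: "ImportSpecifier", imported: id("foo"), local: id("bar") }
  ]
};

// Hypothetical helper: describe each local binding and what it maps to in the module.
function importedBindings(node) {
  return node.specifiers.map(spec => {
    switch (spec.type) {
      case "ImportDefaultSpecifier":
        return { local: spec.local.name, imported: "default" };
      case "ImportNamespaceSpecifier":
        return { local: spec.local.name, imported: "*" };
      case "ImportSpecifier":
        return { local: spec.local.name, imported: spec.imported.name };
      default:
        throw new TypeError("unexpected specifier: " + spec.type);
    }
  });
}

const bindings = importedBindings(importNode);
console.log(importNode.source.value, bindings);
```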

Exports

ExportNamedDeclaration

// export { myFunction }

{
  "type": "ExportNamedDeclaration",
  "declaration": null,
  "specifiers": [],
  "source": null,
  "exportKind": "value"
}

ExportSpecifier

// export { myFunction }

{
  "type": "ExportSpecifier",
  "local": {},
  "exported": {}
}

ExportDefaultDeclaration

// export default { myFunction };

{
  "type": "ExportDefaultDeclaration",
  "declaration": {}
}

ExportAllDeclaration

// export * from "my-module";

{
  "type": "ExportAllDeclaration",
  "source": {}
}
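As a closing sketch, here is how the export nodes above could be scanned to list what a module exposes. `exportedNames` is a hypothetical helper, the `body` array is written by hand, and the sketch ignores named exports that carry a declaration (such as `export var a`).

```javascript
const ident = name => ({ type: "Identifier", name });

// Hand-written Program body for:
//   export { myFunction as fn };
//   export * from "my-module";
//   export default {};
const body = [
  {
    type: "ExportNamedDeclaration", declaration: null, source: null,
    specifiers: [{ type: "ExportSpecifier", local: ident("myFunction"), exported: ident("fn") }]
  },
  { type: "ExportAllDeclaration", source: { type: "StringLiteral", value: "my-module" } },
  { type: "ExportDefaultDeclaration", declaration: { type: "ObjectExpression", properties: [] } }
];

// Hypothetical helper: collect the names a module exports, in source order.
function exportedNames(statements) {
  const names = [];
  for (const node of statements) {
    if (node.type === "ExportNamedDeclaration") {
      node.specifiers.forEach(spec => names.push(spec.exported.name));
    } else if (node.type === "ExportDefaultDeclaration") {
      names.push("default");
    } else if (node.type === "ExportAllDeclaration") {
      names.push("* from " + node.source.value);
    }
  }
  return names;
}

const exported = exportedNames(body);
console.log(exported);
```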