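Babylon parser in practice
A recent project needed to convert one syntax into another. I didn't write the code, but I read parts of it to learn from it, and mainly picked up how to use two libraries: htmlparser2 and babylon.
I won't say much about htmlparser2; it is fairly simple. It works like an XML parser: while parsing, it fires a series of callbacks, and you handle the parsed nodes inside those callbacks. This post focuses on babylon. I had rarely worked with parsers like it before, so this was a good chance to broaden my horizons.
What is babylon
This question has to be answered first. Babylon is the JavaScript parser used by Babel. I happened to have skimmed some compiler theory before, and what babylon does is essentially lexical analysis and syntax analysis (parsing), that is, the front end of a compiler.
Well, that is as far as I can explain babylon; anything deeper is beyond me. For more, go home and dust off your copy of the "Dragon Book".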
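What is an AST
If you understand what an AST is, you won't have much trouble using babylon. I didn't, so this post records which node types babylon's AST has, so that later I can handle each kind of node differently.
AST stands for Abstract Syntax Tree. Whatever language we write in, the compiler or interpreter typically turns the code into an AST first, so that it can check the syntax, perform optimizations and so on. An AST is a tree representation of the code we write, shaped so that a computer can process it easily. Here is a figure borrowed from Wikipedia showing what an AST looks like: [figure: an example AST, from Wikipedia].
An AST is structured for computers to work with, and it is also reasonably easy for us to understand. Turning code into an AST, however, is tedious work. Babylon does that conversion for us, so we can operate on the AST directly in order to modify the source code.
One tool I have to recommend here is AST Explorer. It shows you exactly what AST your code turns into, so you can then modify the AST with a clear goal in mind. As a beginner, I also needed a companion site to make sense of it all: the JavaScript Reference.
An example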
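To get a feel for what "turning code into an AST" involves, here is a toy parser. It is a hypothetical sketch, in no way babylon's implementation, and handles only dotted chains such as item.a.b. It produces Identifier and MemberExpression nodes in the same shape babylon uses (type, start, end, plus object/property/computed); the name parseMemberChain is made up for illustration.

```javascript
// Hypothetical mini parser: turns a dotted chain such as "item.a.b" into
// nested MemberExpression/Identifier nodes shaped like babylon's output.
// Only identifiers and "." are supported; offsets are character indices.
function parseMemberChain(source) {
  const re = /[A-Za-z_$][\w$]*/y; // sticky regex: matches only at re.lastIndex

  function identifierAt(pos) {
    re.lastIndex = pos;
    const m = re.exec(source);
    if (!m) throw new SyntaxError(`Expected identifier at ${pos}`);
    return { type: "Identifier", start: pos, end: pos + m[0].length, name: m[0] };
  }

  let node = identifierAt(0);
  let pos = node.end;
  while (pos < source.length) {
    if (source[pos] !== ".") throw new SyntaxError(`Unexpected "${source[pos]}" at ${pos}`);
    const property = identifierAt(pos + 1);
    // Each "." wraps everything parsed so far as the new object,
    // so a.b.c nests as (a.b).c, i.e. left-associative.
    node = { type: "MemberExpression", start: 0, end: property.end, object: node, property, computed: false };
    pos = property.end;
  }
  return node;
}

console.log(JSON.stringify(parseMemberChain("item.a")));
```

For "item.a" this yields a MemberExpression whose object is the Identifier item and whose property is the Identifier a, which mirrors the expression node in the babylon output below (minus loc, and with offsets counted from 0 because there is no leading brace).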
{item.a}
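After parsing, it becomes the following (trimmed by me, of course; the raw output is much longer, and [Object] stands for a collapsed location object):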
{
  "type": "File",
  "start": 0,
  "end": 8,
  "loc": [Object],
  "program": {
    "type": "Program",
    "start": 0,
    "end": 8,
    "loc": [Object],
    "sourceType": "module",
    "body": [
      {
        "type": "BlockStatement",
        "start": 0,
        "end": 8,
        "loc": [Object],
        "body": [
          {
            "type": "ExpressionStatement",
            "start": 1,
            "end": 7,
            "loc": [Object],
            "expression": {
              "type": "MemberExpression",
              "start": 1,
              "end": 7,
              "loc": [Object],
              "object": {
                "type": "Identifier",
                "start": 1,
                "end": 5,
                "loc": [Object],
                "name": "item"
              },
              "property": {
                "type": "Identifier",
                "start": 6,
                "end": 7,
                "loc": [Object],
                "name": "a"
              },
              "computed": false
            }
          }
        ]
      }
    ]
  },
  "comments": [],
  "tokens": [
    {
      "type": {
        "label": "{",
        "beforeExpr": true,
        "startsExpr": true,
        "rightAssociative": false,
        "isLoop": false,
        "isAssign": false,
        "prefix": false,
        "postfix": false,
        "binop": null
      },
      "start": 0,
      "end": 1,
      "loc": [Object]
    },
    {
      "type": {
        "label": "name",
        "beforeExpr": false,
        "startsExpr": true,
        "rightAssociative": false,
        "isLoop": false,
        "isAssign": false,
        "prefix": false,
        "postfix": false,
        "binop": null,
        "updateContext": null
      },
      "value": "item",
      "start": 1,
      "end": 5,
      "loc": [Object]
    },
    {
      "type": {
        "label": ".",
        "beforeExpr": false,
        "startsExpr": false,
        "rightAssociative": false,
        "isLoop": false,
        "isAssign": false,
        "prefix": false,
        "postfix": false,
        "binop": null,
        "updateContext": null
      },
      "start": 5,
      "end": 6,
      "loc": [Object]
    },
    {
      "type": {
        "label": "name",
        "beforeExpr": false,
        "startsExpr": true,
        "rightAssociative": false,
        "isLoop": false,
        "isAssign": false,
        "prefix": false,
        "postfix": false,
        "binop": null,
        "updateContext": null
      },
      "value": "a",
      "start": 6,
      "end": 7,
      "loc": [Object]
    },
    {
      "type": {
        "label": "}",
        "beforeExpr": false,
        "startsExpr": false,
        "rightAssociative": false,
        "isLoop": false,
        "isAssign": false,
        "prefix": false,
        "postfix": false,
        "binop": null
      },
      "start": 7,
      "end": 8,
      "loc": [Object]
    },
    {
      "type": {
        "label": "eof",
        "beforeExpr": false,
        "startsExpr": false,
        "rightAssociative": false,
        "isLoop": false,
        "isAssign": false,
        "prefix": false,
        "postfix": false,
        "binop": null,
        "updateContext": null
      },
      "start": 8,
      "end": 8,
      "loc": [Object]
    }
  ]
}
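Pretty dramatic, isn't it? A single line of code turned into something the length of a paper. That is the charm of programming languages: they use extreme simplicity and rigorous logic to make programming easier.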
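The result above has two important parts: program and tokens. tokens is the collection of lexical units produced by lexical analysis, and program is the AST. Notice that every structure in the AST has the fields type, start, end and loc. The last three are easy to understand: they record where the node sits in the source code (start and end are character offsets, while loc holds line and column positions). type is more involved. My understanding is that the set of type definitions determines a language; in other words, every piece of JavaScript syntax should have a corresponding type.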
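To make lexical analysis concrete, the sketch below is a toy lexer. It is hypothetical and far simpler than babylon's tokenizer: it only understands identifiers, {, } and ., and its token objects keep just a label plus offsets. For the input {item.a} it emits the same sequence of labels and start/end offsets as the tokens array above.

```javascript
// Toy lexer for a tiny subset of JavaScript: identifiers, "{", "}" and ".".
// Hypothetical sketch; babylon's real tokenizer covers the whole language
// and attaches many more flags to each token type.
function tokenize(source) {
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    const ch = source[pos];
    if (/\s/.test(ch)) {
      pos++; // whitespace separates tokens but produces none
    } else if (ch === "{" || ch === "}" || ch === ".") {
      tokens.push({ type: { label: ch }, start: pos, end: pos + 1 });
      pos++;
    } else if (/[A-Za-z_$]/.test(ch)) {
      const start = pos;
      while (pos < source.length && /[\w$]/.test(source[pos])) pos++;
      tokens.push({ type: { label: "name" }, value: source.slice(start, pos), start, end: pos });
    } else {
      throw new SyntaxError(`Unexpected character "${ch}" at ${pos}`);
    }
  }
  // Like the output above, finish with a zero-width "eof" token
  tokens.push({ type: { label: "eof" }, start: pos, end: pos });
  return tokens;
}

console.log(tokenize("{item.a}").map(t => t.type.label).join(" ")); // { name . name } eof
```

A parser then consumes this flat token list and builds the nested program tree from it.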
The node types in babylon's AST are based on ESTree, with a few modifications; the full list is in Babylon AST node types.
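Because a node's type decides what it is, code that processes an AST usually dispatches on type. Babylon itself only produces the tree, so below is a minimal, hypothetical visitor: the name visit and the handler-object shape are assumptions for illustration, not babylon's API. It walks any plain-object tree and calls the handler registered for each node's type.

```javascript
// Minimal visitor sketch: pre-order walk over a babylon/ESTree-style tree,
// calling handlers[node.type](node) whenever a handler exists for that type.
function visit(node, handlers) {
  if (Array.isArray(node)) {
    node.forEach(child => visit(child, handlers));
    return;
  }
  if (node === null || typeof node !== "object") return;
  // Token objects have an object-valued `type`, so require a string here
  if (typeof node.type === "string" && Object.prototype.hasOwnProperty.call(handlers, node.type)) {
    handlers[node.type](node);
  }
  for (const key of Object.keys(node)) {
    if (key === "loc") continue; // position info, not child nodes
    visit(node[key], handlers);
  }
}

// The expression statement from the {item.a} example, positions omitted
const ast = {
  type: "ExpressionStatement",
  expression: {
    type: "MemberExpression",
    object: { type: "Identifier", name: "item" },
    property: { type: "Identifier", name: "a" },
    computed: false,
  },
};

const names = [];
visit(ast, {
  Identifier(node) { names.push(node.name); },
});
console.log(names); // [ 'item', 'a' ]
```

Adding a handler per node type is one way to "do different things for different nodes"; the catalogue below is essentially the list of keys such a handler object could use.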
Babylon AST Node Type
Below we look at what each node type actually looks like. All position information is omitted from the examples.
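Trimming position information by hand gets tedious. A hypothetical helper (not part of babylon; treating exactly start, end and loc as position keys is an assumption) can copy a node recursively while dropping them:

```javascript
// Keys treated as position info in this sketch
const POSITION_KEYS = new Set(["start", "end", "loc"]);

// Recursively copy an AST value, omitting position keys at every level.
function stripPositions(value) {
  if (Array.isArray(value)) return value.map(stripPositions);
  if (value === null || typeof value !== "object") return value;
  const out = {};
  for (const [key, child] of Object.entries(value)) {
    if (!POSITION_KEYS.has(key)) out[key] = stripPositions(child);
  }
  return out;
}

const identifier = {
  type: "Identifier",
  start: 1,
  end: 5,
  loc: { start: { line: 1, column: 1 }, end: { line: 1, column: 5 } },
  name: "item",
};
console.log(JSON.stringify(stripPositions(identifier))); // {"type":"Identifier","name":"item"}
```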
Identifier
An identifier node. A plain variable such as a is an identifier.
// var a;
{
"type": "Identifier",
"name": "a"
}
Literals
RegExpLiteral
// /\w+/g
{
"type": "ExpressionStatement",
"expression": {
"type": "RegExpLiteral",
"extra": {
"raw": "/\\w+/g"
},
"pattern": "\\w+",
"flags": "g"
}
}
NullLiteral
// null
{
"type": "ExpressionStatement",
"expression": {
"type": "NullLiteral"
}
}
StringLiteral
// var a = "a"
{
"type": "StringLiteral",
"extra": {
"rawValue": "a",
"raw": "\"a\""
},
"value": "a"
}
BooleanLiteral
// true
{
"type": "ExpressionStatement",
"expression": {
"type": "BooleanLiteral",
"value": true
}
}
NumericLiteral
// 1
{
"type": "ExpressionStatement",
"expression": {
"type": "NumericLiteral",
"extra": {
"rawValue": 1,
"raw": "1"
},
"value": 1
}
}
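The literal nodes are one of the places where babylon departs from ESTree: ESTree has a single Literal node type, while babylon splits it into StringLiteral, NumericLiteral, BooleanLiteral, NullLiteral and RegExpLiteral. As a hypothetical sketch (not part of babylon), converting the babylon shapes above back into ESTree Literal nodes could look like this; the fallback used when extra is missing is an assumption.

```javascript
// Prefer extra.raw (as in the examples above); otherwise rebuild a plausible raw string.
function rawOf(node) {
  if (node.extra && node.extra.raw !== undefined) return node.extra.raw;
  return typeof node.value === "string" ? JSON.stringify(node.value) : String(node.value);
}

// Hypothetical converter: babylon typed literal -> ESTree `Literal`.
function toEstreeLiteral(node) {
  switch (node.type) {
    case "StringLiteral":
    case "NumericLiteral":
    case "BooleanLiteral":
      return { type: "Literal", value: node.value, raw: rawOf(node) };
    case "NullLiteral":
      return { type: "Literal", value: null, raw: "null" };
    case "RegExpLiteral":
      // ESTree keeps pattern and flags in `regex`; `value` is a RegExp object
      return {
        type: "Literal",
        value: new RegExp(node.pattern, node.flags),
        raw: `/${node.pattern}/${node.flags}`,
        regex: { pattern: node.pattern, flags: node.flags },
      };
    default:
      return node; // not a literal: returned unchanged
  }
}

const str = { type: "StringLiteral", extra: { rawValue: "a", raw: "\"a\"" }, value: "a" };
console.log(toEstreeLiteral(str));
```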
Programs
{
"type": "Program",
"sourceType": "module",
"body": [],
"directives": []
}
Functions
// function a() {
// return a;
// }
{
"type": "FunctionDeclaration",
"id": {
"type": "Identifier",
"name": "a"
},
"generator": false,
"expression": false,
"async": false,
"params": [],
"body": {}
}
Statements
ExpressionStatement
// a;
{
"type": "ExpressionStatement",
"expression": {}
}
BlockStatement
// { }
{
"type": "BlockStatement",
"body": [],
"directives": []
}
EmptyStatement
//;
{
"type": "EmptyStatement"
}
DebuggerStatement
// debugger;
{
"type": "DebuggerStatement"
}
WithStatement
// with (o) {}
{
"type": "WithStatement",
"object": {},
"body": {}
}
Control flow
ReturnStatement
// function a() {return a}
{
"type": "ReturnStatement",
"argument": {}
}
LabeledStatement
// loop1:
// a = 1;
{
"type": "LabeledStatement",
"body": {},
"label": {
"type": "Identifier",
"name": "loop1"
}
}
BreakStatement
// while(1){
// break;
// }
{
"type": "BreakStatement",
"label": null
}
ContinueStatement
// while(1){
// continue;
// }
{
"type": "ContinueStatement",
"label": null
}
Choice
IfStatement
// if(1){}
{
"type": "IfStatement",
"test": {},
"consequent": {},
"alternate": null
}
SwitchStatement
// switch(1) {
// case 1: break;
// }
{
"type": "SwitchStatement",
"discriminant": {},
"cases": []
}
SwitchCase
// switch(1) {
// case 1: break;
// }
{
"type": "SwitchCase",
"consequent": [],
"test": {}
}
Exceptions
ThrowStatement
// throw "myException";
{
"type": "ThrowStatement",
"argument": {}
}
TryStatement
// try {} catch (e) {}
{
"type": "TryStatement",
"block": {},
"handler": {},
"guardedHandlers": [],
"finalizer": null
}
CatchClause
// try {} catch (e) {}
{
"type": "CatchClause",
"param": {},
"body": {}
}
Loops
WhileStatement
// while(1) {}
{
"type": "WhileStatement",
"test": {},
"body": {}
}
DoWhileStatement
// do{} while(1)
{
"type": "DoWhileStatement",
"body": {},
"test": {}
}
ForStatement
// for (;;) {}
{
"type": "ForStatement",
"init": null,
"test": null,
"update": null,
"body": {}
}
ForInStatement
// var obj = {a:1, b:2, c:3};
// for (var prop in obj) {
// }
{
"type": "ForInStatement",
"left": {},
"right": {},
"body": {}
}
ForOfStatement
// var arr = [1, 2, 3];
// for (var item of arr) {
// }
{
"type": "ForOfStatement",
"left": {},
"right": {},
"body": {}
}
ForAwaitStatement
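Grouping node types this way pays off when analyzing code. As an illustration, the hypothetical helper below (the loop-type set is taken from the headings above and assumed complete for this sketch) computes how deeply loops are nested:

```javascript
// Loop node types from the "Loops" list above
const LOOP_TYPES = new Set([
  "WhileStatement",
  "DoWhileStatement",
  "ForStatement",
  "ForInStatement",
  "ForOfStatement",
  "ForAwaitStatement",
]);

// Hypothetical helper: deepest loop nesting anywhere under `node`.
// Counts syntactic nesting only, including across function boundaries.
function maxLoopDepth(node, depth = 0) {
  if (Array.isArray(node)) {
    return node.reduce((max, child) => Math.max(max, maxLoopDepth(child, depth)), depth);
  }
  if (node === null || typeof node !== "object") return depth;
  const here = LOOP_TYPES.has(node.type) ? depth + 1 : depth;
  let max = here;
  for (const child of Object.values(node)) {
    max = Math.max(max, maxLoopDepth(child, here));
  }
  return max;
}

// while (1) { for (;;) {} }
const nested = {
  type: "WhileStatement",
  test: { type: "NumericLiteral", value: 1 },
  body: {
    type: "BlockStatement",
    body: [{
      type: "ForStatement", init: null, test: null, update: null,
      body: { type: "BlockStatement", body: [], directives: [] },
    }],
    directives: [],
  },
};
console.log(maxLoopDepth(nested)); // 2
```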
Declarations
FunctionDeclaration
// function a() {}
{
"type": "FunctionDeclaration",
"id": {},
"generator": false,
"expression": false,
"async": false,
"params": [],
"body": {}
}
VariableDeclaration
// var a;
{
"type": "VariableDeclaration",
"declarations": [],
"kind": "var"
}
VariableDeclarator
// var a;
{
"type": "VariableDeclarator",
"id": {},
"init": null
}
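A VariableDeclaration holds the kind (var, let or const) once, and each VariableDeclarator carries its own id and init. A hypothetical helper that lists the names a declaration binds, handling only plain identifiers (destructuring patterns are deliberately rejected in this sketch):

```javascript
// Hypothetical helper: names bound by a VariableDeclaration.
function declaredNames(node) {
  return node.declarations.map(declarator => {
    if (declarator.id.type !== "Identifier") {
      throw new Error(`declaredNames: ${declarator.id.type} is not handled in this sketch`);
    }
    // init is null when the declarator has no initializer, as in `var a;`
    return { kind: node.kind, name: declarator.id.name, initialized: declarator.init !== null };
  });
}

// var a = 1, b;
const declaration = {
  type: "VariableDeclaration",
  kind: "var",
  declarations: [
    { type: "VariableDeclarator", id: { type: "Identifier", name: "a" }, init: { type: "NumericLiteral", value: 1 } },
    { type: "VariableDeclarator", id: { type: "Identifier", name: "b" }, init: null },
  ],
};
console.log(declaredNames(declaration));
```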
Misc
Decorator
Directive
// "a";
{
"type": "Directive",
"value": {}
}
DirectiveLiteral
// "a";
{
"type": "DirectiveLiteral",
"value": "a",
"extra": {
"raw": "\"a\"",
"rawValue": "a"
}
}
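Babylon stores a directive prologue such as "use strict" in the directives array of a Program or BlockStatement rather than in body, with the string held by the DirectiveLiteral. A hypothetical strict-mode check can therefore look only at directives:

```javascript
// Hypothetical helper: does a Program or BlockStatement begin with "use strict"?
function hasUseStrict(node) {
  return (node.directives || []).some(d => d.value.value === "use strict");
}

// Roughly what babylon produces for: "use strict"; a;  (positions and extra omitted)
const program = {
  type: "Program",
  sourceType: "script",
  body: [{ type: "ExpressionStatement", expression: { type: "Identifier", name: "a" } }],
  directives: [{ type: "Directive", value: { type: "DirectiveLiteral", value: "use strict" } }],
};
console.log(hasUseStrict(program)); // true
```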
Expressions
Super
// class Base {
// constructor() {
// }
// }
// class Derivative extends Base {
// constructor() {
// super();
// }
// }
{
"type": "Super"
}
Import
ThisExpression
// class Base {
// constructor() {
// }
// }
// class Derivative extends Base {
// constructor() {
// super();
// }
// a() {
// this.a = 1;
// }
// }
{
"type": "ThisExpression"
}
ArrowFunctionExpression
// a => {return a;}
{
"type": "ArrowFunctionExpression",
"id": null,
"generator": false,
"expression": false,
"async": false,
"params": [],
"body": {}
}
YieldExpression
// function* foo(){
// var index = 0;
// while (index <= 2)
// yield index++;
// }
{
"type": "YieldExpression",
"delegate": false,
"argument": {}
}
AwaitExpression
// async function f2() {
// var y = await 20;
// console.log(y); // 20
// }
// f2();
{
"type": "AwaitExpression",
"argument": {}
}
ArrayExpression
// a = [];
{
"type": "ArrayExpression",
"elements": []
}
ObjectExpression
// a = {};
{
"type": "ObjectExpression",
"properties": []
}
ObjectMember
ObjectProperty
// a = {a:"1", b:"2"};
{
"type": "ObjectProperty",
"method": false,
"shorthand": false,
"computed": false,
"key": {},
"value": {}
}
ObjectMethod
// a = {
// a(){}
// };
{
"type": "ObjectMethod",
"method": true,
"shorthand": false,
"computed": false,
"key": {},
"kind": "method",
"id": null,
"generator": false,
"expression": false,
"async": false,
"params": [],
"body": {}
}
RestProperty
SpreadProperty
// a = {
// ...b
// }
{
"type": "SpreadProperty",
"argument": {}
}
FunctionExpression
// var a = function() {};
{
"type": "FunctionExpression",
"id": null,
"generator": false,
"expression": false,
"async": false,
"params": [],
"body": {}
}
Unary operations
UnaryExpression
// a = ~1
{
"type": "UnaryExpression",
"operator": "~",
"prefix": true,
"argument": {},
"extra": {}
}
UnaryOperator
UpdateExpression
// a ++
{
"type": "UpdateExpression",
"operator": "++",
"prefix": false,
"argument": {}
}
UpdateOperator
Binary operations
BinaryExpression
// var a = a + 1;
{
"type": "BinaryExpression",
"left": {},
"operator": "+",
"right": {}
}
BinaryOperator
AssignmentExpression
// a = 1;
{
"type": "AssignmentExpression",
"operator": "=",
"left": {},
"right": {}
}
AssignmentOperator
LogicalExpression
// a || 1;
{
"type": "LogicalExpression",
"left": {},
"operator": "||",
"right": {}
}
LogicalOperator
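Going the other way, from AST back to source, is code generation; Babel handles it in a separate package, babel-generator. The following is only a hypothetical mini generator covering the operator nodes above plus Identifier and NumericLiteral. As a simplification it wraps every compound operand in parentheses instead of implementing operator precedence.

```javascript
// Hypothetical mini code generator for a handful of expression nodes.
function generate(node) {
  switch (node.type) {
    case "Identifier":
      return node.name;
    case "NumericLiteral":
      return String(node.value);
    case "UnaryExpression": {
      // prefix is always true here; word operators (typeof, void, delete) need a space
      const sep = /^[a-z]/.test(node.operator) ? " " : "";
      return node.operator + sep + operand(node.argument);
    }
    case "UpdateExpression":
      // prefix distinguishes ++a from a++
      return node.prefix
        ? node.operator + operand(node.argument)
        : operand(node.argument) + node.operator;
    case "BinaryExpression":
    case "LogicalExpression":
    case "AssignmentExpression":
      return `${operand(node.left)} ${node.operator} ${operand(node.right)}`;
    default:
      throw new Error(`generate: unsupported node type ${node.type}`);
  }
}

// Wrap anything other than a bare identifier or number in parentheses.
function operand(node) {
  const simple = node.type === "Identifier" || node.type === "NumericLiteral";
  return simple ? generate(node) : `(${generate(node)})`;
}

// a = ~1
console.log(generate({
  type: "AssignmentExpression",
  operator: "=",
  left: { type: "Identifier", name: "a" },
  right: { type: "UnaryExpression", operator: "~", prefix: true, argument: { type: "NumericLiteral", value: 1 } },
})); // a = (~1)
```

The redundant parentheses are the price of skipping precedence; a real generator only adds them where the tree shape requires it.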
SpreadElement
// function myFunction(x, y, z) { }
// var args = [0, 1, 2];
// myFunction(...args);
{
"type": "SpreadElement",
"argument": {}
}
MemberExpression
// a.b
{
"type": "MemberExpression",
"object": {},
"property": {},
"computed": false
}
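For the kind of syntax conversion that motivated this post, MemberExpression chains like item.a come up constantly. A hypothetical helper that flattens a non-computed chain into its names; computed access such as item[key] is rejected because its property is an arbitrary expression rather than a name:

```javascript
// Hypothetical helper: item.a.b -> ["item", "a", "b"]
function memberPath(node) {
  if (node.type === "Identifier") return [node.name];
  if (node.type === "ThisExpression") return ["this"];
  if (node.type === "MemberExpression" && !node.computed) {
    // object holds everything left of the last ".", property the final name
    return [...memberPath(node.object), node.property.name];
  }
  throw new Error(`memberPath: cannot flatten ${node.type}${node.computed ? " (computed)" : ""}`);
}

// The expression from the {item.a} example, positions omitted
const itemA = {
  type: "MemberExpression",
  object: { type: "Identifier", name: "item" },
  property: { type: "Identifier", name: "a" },
  computed: false,
};
console.log(memberPath(itemA).join(".")); // item.a
```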
BindExpression
ConditionalExpression
// 1 ? 'a' : 'b';
{
"type": "ConditionalExpression",
"test": {},
"consequent": {},
"alternate": {}
}
CallExpression
// call()
{
"type": "CallExpression",
"callee": {},
"arguments": []
}
NewExpression
// new Object();
{
"type": "NewExpression",
"callee": {},
"arguments": []
}
SequenceExpression
Template Literals
TemplateLiteral
// `a`
{
"type": "TemplateLiteral",
"expressions": [],
"quasis": []
}
TaggedTemplateExpression
// tag `a`
{
"type": "TaggedTemplateExpression",
"tag": {},
"quasi": {}
}
TemplateElement
// `a`
{
"type": "TemplateElement",
"value": {},
"tail": true
}
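In a TemplateLiteral, quasis and expressions interleave, so there is always exactly one more quasi than expression, and the last quasi has tail set to true. Each TemplateElement's value holds raw and cooked strings. A hypothetical sketch that rebuilds the resulting text, given values already computed for each expression (evaluation itself is out of scope here):

```javascript
// Hypothetical helper: stitch quasis and pre-computed expression values together.
function renderTemplate(node, values) {
  if (values.length !== node.expressions.length) {
    throw new Error("renderTemplate: expected one value per expression");
  }
  let out = "";
  node.quasis.forEach((quasi, i) => {
    out += quasi.value.cooked; // cooked = text with escape sequences processed
    if (i < values.length) out += String(values[i]);
  });
  return out;
}

// `Hello ${name}!`
const tpl = {
  type: "TemplateLiteral",
  expressions: [{ type: "Identifier", name: "name" }],
  quasis: [
    { type: "TemplateElement", value: { raw: "Hello ", cooked: "Hello " }, tail: false },
    { type: "TemplateElement", value: { raw: "!", cooked: "!" }, tail: true },
  ],
};
console.log(renderTemplate(tpl, ["babylon"])); // Hello babylon!
```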
Patterns
ObjectPattern
ArrayPattern
RestElement
// function f(a, b, ...args) {}
{
"type": "RestElement",
"argument": {}
}
AssignmentPattern
Classes
ClassBody
// class a {}
{
"type": "ClassBody",
"body": []
}
ClassMethod
// class a {
// b(){}
// }
{
"type": "ClassMethod",
"computed": false,
"key": {},
"static": false,
"kind": "method",
"id": null,
"generator": false,
"expression": false,
"async": false,
"params": [],
"body": {}
}
ClassProperty
// class a {
// b: "a"
// }
{
"type": "ClassProperty",
"computed": false,
"key": {},
"variance": null,
"static": false,
"typeAnnotation": {},
"value": null
}
ClassDeclaration
// class a {
// b: "a"
// }
{
"type": "ClassDeclaration",
"id": {},
"superClass": null,
"body": {}
}
ClassExpression
// var b = class a {};
{
"type": "ClassExpression",
"id": {},
"superClass": null,
"body": {}
}
MetaProperty
Modules
ModuleDeclaration
ModuleSpecifier
Imports
ImportDeclaration
// import { cube, foo } from 'my-module';
{
"type": "ImportDeclaration",
"specifiers": [],
"importKind": "value",
"source": {}
}
ImportSpecifier
// import { cube, foo } from 'my-module';
{
"type": "ImportSpecifier",
"imported": {},
"local": {}
}
ImportDefaultSpecifier
// import myDefault from "my-module";
{
"type": "ImportDefaultSpecifier",
"local": {}
}
ImportNamespaceSpecifier
// import * as myModule from "my-module";
{
"type": "ImportNamespaceSpecifier",
"local": {}
}
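The three specifier types differ only in what they import: ImportSpecifier has both imported and local, while the default and namespace specifiers have only local. A hypothetical helper mapping each local binding of an ImportDeclaration to what it refers to (the "default" and "*" markers are conventions of this sketch):

```javascript
// Hypothetical helper: local binding -> { source, imported } for one ImportDeclaration.
function importBindings(decl) {
  const source = decl.source.value; // source is a StringLiteral
  return decl.specifiers.map(spec => {
    switch (spec.type) {
      case "ImportDefaultSpecifier":
        return { local: spec.local.name, source, imported: "default" };
      case "ImportNamespaceSpecifier":
        return { local: spec.local.name, source, imported: "*" };
      case "ImportSpecifier":
        return { local: spec.local.name, source, imported: spec.imported.name };
      default:
        throw new Error(`importBindings: unknown specifier ${spec.type}`);
    }
  });
}

// import myDefault, { cube as c } from "my-module";
const importDecl = {
  type: "ImportDeclaration",
  importKind: "value",
  specifiers: [
    { type: "ImportDefaultSpecifier", local: { type: "Identifier", name: "myDefault" } },
    { type: "ImportSpecifier", imported: { type: "Identifier", name: "cube" }, local: { type: "Identifier", name: "c" } },
  ],
  source: { type: "StringLiteral", value: "my-module" },
};
console.log(importBindings(importDecl));
```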
Exports
ExportNamedDeclaration
// export { myFunction }
{
"type": "ExportNamedDeclaration",
"declaration": null,
"specifiers": [],
"source": null,
"exportKind": "value"
}
ExportSpecifier
// export { myFunction }
{
"type": "ExportSpecifier",
"local": {},
"exported": {}
}
ExportDefaultDeclaration
// export default { myFunction };
{
"type": "ExportDefaultDeclaration",
"declaration": {}
}
ExportAllDeclaration
// export * from "my-module";
{
"type": "ExportAllDeclaration",
"source": {}
}