v8世界探险(2) - 词法和语法分析

本文涉及的产品
全局流量管理 GTM,标准版 1个月
云解析 DNS,旗舰版 1个月
公共DNS(含HTTPDNS解析),每月1000万次HTTP解析
简介: 上节我们学习了API的概况,这节开始我们就循着API来分析实现。 对于解释器或者编译器来说,我们第一个感兴趣的当然是编译的过程。

v8世界探险(2) - 词法和语法分析

上节我们学习了API的概况,这节开始我们就循着API来分析实现。
对于解释器或者编译器来说,我们第一个感兴趣的当然是编译的过程。
上节我们学习过了,编译调用的API是Script::Compile函数:

    // Compile the source code.
    Local<Script> script = Script::Compile(context, source).ToLocalChecked();

Script::Compile

API的实现,大部分都位于src/api.cc中,比如Script::Compile就是如此。

如果指定了ScriptOrigin对象,就用它构造ScriptCompiler::Source对象,否则就用String指定的。
不管哪一支,最后都调用ScriptCompiler::Compile函数去做编译。

MaybeLocal<Script> Script::Compile(Local<Context> context, Local<String> source,
                                   ScriptOrigin* origin) {
  if (origin) {
    ScriptCompiler::Source script_source(source, *origin);
    return ScriptCompiler::Compile(context, &script_source);
  }
  ScriptCompiler::Source script_source(source);
  return ScriptCompiler::Compile(context, &script_source);
}

ScriptCompiler::Compile

ScriptCompiler::Compile函数仍然在api.cc中。
我们之前讲过,没有绑定到Context的编译脚本叫做UnboundScript,ScriptCompiler::Compile首先调用CompileUnboundInternal来编译生成一个UnboundScript,最后再将其BindToCurrentContext()跟上下文绑定。

MaybeLocal<Script> ScriptCompiler::Compile(Local<Context> context,
                                           Source* source,
                                           CompileOptions options) {
  auto isolate = context->GetIsolate();
  auto maybe = CompileUnboundInternal(isolate, source, options, false);
  Local<UnboundScript> result;
  if (!maybe.ToLocal(&result)) return MaybeLocal<Script>();
  v8::Context::Scope scope(context);
  return result->BindToCurrentContext();
}

ScriptCompiler::CompileUnboundInternal

最主要的会调用Isolate的Compiler的CompileScript。

MaybeLocal<UnboundScript> ScriptCompiler::CompileUnboundInternal(
    Isolate* v8_isolate, Source* source, CompileOptions options,
    bool is_module) {
...
    result = i::Compiler::CompileScript(
        str, name_obj, line_offset, column_offset, source->resource_options,
        source_map_url, isolate->native_context(), NULL, &script_data, options,
        i::NOT_NATIVES_CODE, is_module);
    has_pending_exception = result.is_null();
    if (has_pending_exception && script_data != NULL) {
      // This case won't happen during normal operation; we have compiled
      // successfully and produced cached data, and but the second compilation
      // of the same source code fails.
      delete script_data;
      script_data = NULL;
    }
    RETURN_ON_FAILED_EXECUTION(UnboundScript);

    if ((options == kProduceParserCache || options == kProduceCodeCache) &&
        script_data != NULL) {
      // script_data now contains the data that was generated. source will
      // take the ownership.
      source->cached_data = new CachedData(
          script_data->data(), script_data->length(), CachedData::BufferOwned);
      script_data->ReleaseDataOwnership();
    } else if (options == kConsumeParserCache || options == kConsumeCodeCache) {
      source->cached_data->rejected = script_data->rejected();
    }
    delete script_data;
  }
  RETURN_ESCAPED(ToApiHandle<UnboundScript>(result));
}

Compiler::CompileScript

这个函数定义在src/compiler.cc中,我们先暂时略过细节,编译会调用到CompileTolevel函数。

 Handle<SharedFunctionInfo> Compiler::CompileScript(
     Handle<String> source, Handle<Object> script_name, int line_offset,
     int column_offset, ScriptOriginOptions resource_options,
     Handle<Object> source_map_url, Handle<Context> context,
     v8::Extension* extension, ScriptData** cached_data,
     ScriptCompiler::CompileOptions compile_options, NativesFlag natives,
     bool is_module) {
...
        static_cast<LanguageMode>(info.language_mode() | language_mode));
    result = CompileToplevel(&info);
    if (extension == NULL && !result.is_null()) {
      compilation_cache->PutScript(source, context, language_mode, result);
      if (FLAG_serialize_toplevel &&
          compile_options == ScriptCompiler::kProduceCodeCache) {
        HistogramTimerScope histogram_timer(
            isolate->counters()->compile_serialize());
        *cached_data = CodeSerializer::Serialize(isolate, result, source);
        if (FLAG_profile_deserialization) {
          PrintF("[Compiling and serializing took %0.3f ms]\n",
                 timer.Elapsed().InMillisecondsF());
        }
      }
    }
...

CompileToplevel

这是个static函数,定义在compiler.cc中。
这中间主要经过两个步骤:

  • Parser::ParseStatic - 词法分析,语法分析生成抽象语法树
  • CompileBaselineCode - 代码生成

我们先看前半部分:


static Handle<SharedFunctionInfo> CompileToplevel(CompilationInfo* info) {
  Isolate* isolate = info->isolate();
  PostponeInterruptsScope postpone(isolate);
  DCHECK(!isolate->native_context().is_null());
  ParseInfo* parse_info = info->parse_info();
  Handle<Script> script = parse_info->script();

...

  isolate->debug()->OnBeforeCompile(script);

...

  Handle<SharedFunctionInfo> result;

  { VMState<COMPILER> state(info->isolate());
    if (parse_info->literal() == NULL) {
      // Parse the script if needed (if it's already parsed, literal() is
      // non-NULL). If compiling for debugging, we may eagerly compile inner
      // functions, so do not parse lazily in that case.
      ScriptCompiler::CompileOptions options = parse_info->compile_options();
      bool parse_allow_lazy = (options == ScriptCompiler::kConsumeParserCache ||
                               String::cast(script->source())->length() >
                                   FLAG_min_preparse_length) &&
                              !info->is_debug();

      parse_info->set_allow_lazy_parsing(parse_allow_lazy);
      if (!parse_allow_lazy &&
          (options == ScriptCompiler::kProduceParserCache ||
           options == ScriptCompiler::kConsumeParserCache)) {
        // We are going to parse eagerly, but we either 1) have cached data
        // produced by lazy parsing or 2) are asked to generate cached data.
        // Eager parsing cannot benefit from cached data, and producing cached
        // data while parsing eagerly is not implemented.
        parse_info->set_cached_data(nullptr);
        parse_info->set_compile_options(ScriptCompiler::kNoCompileOptions);
      }
      if (!Parser::ParseStatic(parse_info)) {
        return Handle<SharedFunctionInfo>::null();
      }
    }

后半部分是编译的部分,调用CompileBaselineCode.


    DCHECK(!info->is_debug() || !parse_info->allow_lazy_parsing());

    info->MarkAsFirstCompile();

    FunctionLiteral* lit = parse_info->literal();
    LiveEditFunctionTracker live_edit_tracker(isolate, lit);

...

    // Compile the code.
    if (!CompileBaselineCode(info)) {
      return Handle<SharedFunctionInfo>::null();
    }

...

  return result;
}

Parser::ParseStatic

下面我们开始进入Parser的世界,入口在Parser::ParseStatic这个静态工厂函数。它定义在src/parsing/parser.cc中:

ParseStatic会构造一个Parser对象,然后调用parser的Parse函数去做解析。

bool Parser::ParseStatic(ParseInfo* info) {
  Parser parser(info);
  if (parser.Parse(info)) {
    info->set_language_mode(info->literal()->language_mode());
    return true;
  }
  return false;
}

Parser::Parse

Parse开始解析脚本源代码,有两种情况,分别是:

  • ParseLazy
  • ParseProgram
bool Parser::Parse(ParseInfo* info) {
  DCHECK(info->literal() == NULL);
  FunctionLiteral* result = NULL;
  // Ok to use Isolate here; this function is only called in the main thread.
  DCHECK(parsing_on_main_thread_);
  Isolate* isolate = info->isolate();
  pre_parse_timer_ = isolate->counters()->pre_parse();
  if (FLAG_trace_parse || allow_natives() || extension_ != NULL) {
    // If intrinsics are allowed, the Parser cannot operate independent of the
    // V8 heap because of Runtime. Tell the string table to internalize strings
    // and values right after they're created.
    ast_value_factory()->Internalize(isolate);
  }

  if (info->is_lazy()) {
    DCHECK(!info->is_eval());
    if (info->shared_info()->is_function()) {
      result = ParseLazy(isolate, info);
    } else {
      result = ParseProgram(isolate, info);
    }
  } else {
    SetCachedData(info);
    result = ParseProgram(isolate, info);
  }
  info->set_literal(result);

  Internalize(isolate, info->script(), result == NULL);
  DCHECK(ast_value_factory()->IsInternalized());
  return (result != NULL);
}

Parser::ParseProgram

我们先看Parser::ParseProgram,主要干活的会调用Parser::DoParseProgram.

FunctionLiteral* Parser::ParseProgram(Isolate* isolate, ParseInfo* info) {
  // TODO(bmeurer): We temporarily need to pass allow_nesting = true here,
  // see comment for HistogramTimerScope class.

  // It's OK to use the Isolate & counters here, since this function is only
  // called in the main thread.
  DCHECK(parsing_on_main_thread_);

  HistogramTimerScope timer_scope(isolate->counters()->parse(), true);
  Handle<String> source(String::cast(info->script()->source()));
  isolate->counters()->total_parse_size()->Increment(source->length());
  base::ElapsedTimer timer;
  if (FLAG_trace_parse) {
    timer.Start();
  }
  fni_ = new (zone()) FuncNameInferrer(ast_value_factory(), zone());

  // Initialize parser state.
  CompleteParserRecorder recorder;

  if (produce_cached_parse_data()) {
    log_ = &recorder;
  } else if (consume_cached_parse_data()) {
    cached_parse_data_->Initialize();
  }

  source = String::Flatten(source);
  FunctionLiteral* result;

  if (source->IsExternalTwoByteString()) {
    // Notice that the stream is destroyed at the end of the branch block.
    // The last line of the blocks can't be moved outside, even though they're
    // identical calls.
    ExternalTwoByteStringUtf16CharacterStream stream(
        Handle<ExternalTwoByteString>::cast(source), 0, source->length());
    scanner_.Initialize(&stream);
    result = DoParseProgram(info);
  } else {
    GenericStringUtf16CharacterStream stream(source, 0, source->length());
    scanner_.Initialize(&stream);
    result = DoParseProgram(info);
  }
  if (result != NULL) {
    DCHECK_EQ(scanner_.peek_location().beg_pos, source->length());
  }
  HandleSourceURLComments(isolate, info->script());

  if (FLAG_trace_parse && result != NULL) {
    double ms = timer.Elapsed().InMillisecondsF();
    if (info->is_eval()) {
      PrintF("[parsing eval");
    } else if (info->script()->name()->IsString()) {
      String* name = String::cast(info->script()->name());
      base::SmartArrayPointer<char> name_chars = name->ToCString();
      PrintF("[parsing script: %s", name_chars.get());
    } else {
      PrintF("[parsing script");
    }
    PrintF(" - took %0.3f ms]\n", ms);
  }
  if (produce_cached_parse_data()) {
    if (result != NULL) *info->cached_data() = recorder.GetScriptData();
    log_ = NULL;
  }
  return result;
}

Parser::DoParseProgram

下面的代码虽然多,但是我们现在只主要关注两个函数就好了:

    if (info->is_module()) {
      ParseModuleItemList(body, &ok);
    } else {
      ParseStatementList(body, Token::EOS, &ok);
    }
  • ParseModuleItemList: 解析ES6支持的module语句的列表
  • ParseStatementList: 解析普通的语句
FunctionLiteral* Parser::DoParseProgram(ParseInfo* info) {
...

  Mode parsing_mode = FLAG_lazy && allow_lazy() ? PARSE_LAZILY : PARSE_EAGERLY;
  if (allow_natives() || extension_ != NULL) parsing_mode = PARSE_EAGERLY;

  FunctionLiteral* result = NULL;
  {
    // TODO(wingo): Add an outer SCRIPT_SCOPE corresponding to the native
    // context, which will have the "this" binding for script scopes.
    Scope* scope = NewScope(scope_, SCRIPT_SCOPE);
    info->set_script_scope(scope);
    if (!info->context().is_null() && !info->context()->IsNativeContext()) {
      scope = Scope::DeserializeScopeChain(info->isolate(), zone(),
                                           *info->context(), scope);
      // The Scope is backed up by ScopeInfo (which is in the V8 heap); this
      // means the Parser cannot operate independent of the V8 heap. Tell the
      // string table to internalize strings and values right after they're
      // created. This kind of parsing can only be done in the main thread.
      DCHECK(parsing_on_main_thread_);
      ast_value_factory()->Internalize(info->isolate());
    }
    original_scope_ = scope;
    if (info->is_eval()) {
      if (!scope->is_script_scope() || is_strict(info->language_mode())) {
        parsing_mode = PARSE_EAGERLY;
      }
      scope = NewScope(scope, EVAL_SCOPE);
    } else if (info->is_module()) {
      scope = NewScope(scope, MODULE_SCOPE);
    }

    scope->set_start_position(0);

    // Enter 'scope' with the given parsing mode.
    ParsingModeScope parsing_mode_scope(this, parsing_mode);
    AstNodeFactory function_factory(ast_value_factory());
    FunctionState function_state(&function_state_, &scope_, scope,
                                 kNormalFunction, &function_factory);

    // Don't count the mode in the use counters--give the program a chance
    // to enable script/module-wide strict/strong mode below.
    scope_->SetLanguageMode(info->language_mode());
    ZoneList<Statement*>* body = new(zone()) ZoneList<Statement*>(16, zone());
    bool ok = true;
    int beg_pos = scanner()->location().beg_pos;
    if (info->is_module()) {
      ParseModuleItemList(body, &ok);
    } else {
      ParseStatementList(body, Token::EOS, &ok);
    }

    // The parser will peek but not consume EOS.  Our scope logically goes all
    // the way to the EOS, though.
    scope->set_end_position(scanner()->peek_location().beg_pos);

    if (ok && is_strict(language_mode())) {
      CheckStrictOctalLiteral(beg_pos, scanner()->location().end_pos, &ok);
    }
    if (ok && is_sloppy(language_mode()) && allow_harmony_sloppy_function()) {
      // TODO(littledan): Function bindings on the global object that modify
      // pre-existing bindings should be made writable, enumerable and
      // nonconfigurable if possible, whereas this code will leave attributes
      // unchanged if the property already exists.
      InsertSloppyBlockFunctionVarBindings(scope, &ok);
    }
    if (ok && (is_strict(language_mode()) || allow_harmony_sloppy() ||
               allow_harmony_destructuring_bind())) {
      CheckConflictingVarDeclarations(scope_, &ok);
    }

    if (ok && info->parse_restriction() == ONLY_SINGLE_FUNCTION_LITERAL) {
      if (body->length() != 1 ||
          !body->at(0)->IsExpressionStatement() ||
          !body->at(0)->AsExpressionStatement()->
              expression()->IsFunctionLiteral()) {
        ReportMessage(MessageTemplate::kSingleFunctionLiteral);
        ok = false;
      }
    }

    if (ok) {
      ParserTraits::RewriteDestructuringAssignments();
      result = factory()->NewFunctionLiteral(
          ast_value_factory()->empty_string(), scope_, body,
          function_state.materialized_literal_count(),
          function_state.expected_property_count(), 0,
          FunctionLiteral::kNoDuplicateParameters,
          FunctionLiteral::kGlobalOrEval, FunctionLiteral::kShouldLazyCompile,
          FunctionKind::kNormalFunction, 0);
    }
  }

...

  return result;
}

Parser::ParseModuleItemList

module语句是ES6中引入的新feature,针对每一条,调用ParseModuleItem语句去解析。

void* Parser::ParseModuleItemList(ZoneList<Statement*>* body, bool* ok) {
  // (Ecma 262 6th Edition, 15.2):
  // Module :
  //    ModuleBody?
  //
  // ModuleBody :
  //    ModuleItem*

  DCHECK(scope_->is_module_scope());
  RaiseLanguageMode(STRICT);

  while (peek() != Token::EOS) {
    Statement* stat = ParseModuleItem(CHECK_OK);
    if (stat && !stat->IsEmpty()) {
      body->Add(stat, zone());
    }
  }

  // Check that all exports are bound.
  ModuleDescriptor* descriptor = scope_->module();
  for (ModuleDescriptor::Iterator it = descriptor->iterator(); !it.done();
       it.Advance()) {
    if (scope_->LookupLocal(it.local_name()) == NULL) {
      // TODO(adamk): Pass both local_name and export_name once ParserTraits
      // supports multiple arg error messages.
      // Also try to report this at a better location.
      ParserTraits::ReportMessage(MessageTemplate::kModuleExportUndefined,
                                  it.local_name());
      *ok = false;
      return NULL;
    }
  }

  scope_->module()->Freeze();
  return NULL;
}

Parser::ParseModuleItem

根据token是import,export还是普通语句,分别调用ParseImportDeclaration,ParseExportDeclaration或ParseStatementListItem.

Statement* Parser::ParseModuleItem(bool* ok) {
  // (Ecma 262 6th Edition, 15.2):
  // ModuleItem :
  //    ImportDeclaration
  //    ExportDeclaration
  //    StatementListItem

  switch (peek()) {
    case Token::IMPORT:
      return ParseImportDeclaration(ok);
    case Token::EXPORT:
      return ParseExportDeclaration(ok);
    default:
      return ParseStatementListItem(ok);
  }
}

Parser::ParseImportDeclaration

Statement* Parser::ParseImportDeclaration(bool* ok) {
  // ImportDeclaration :
  //   'import' ImportClause 'from' ModuleSpecifier ';'
  //   'import' ModuleSpecifier ';'
  //
  // ImportClause :
  //   NameSpaceImport
  //   NamedImports
  //   ImportedDefaultBinding
  //   ImportedDefaultBinding ',' NameSpaceImport
  //   ImportedDefaultBinding ',' NamedImports
  //
  // NameSpaceImport :
  //   '*' 'as' ImportedBinding

  int pos = peek_position();
  Expect(Token::IMPORT, CHECK_OK);

  Token::Value tok = peek();

  // 'import' ModuleSpecifier ';'
  if (tok == Token::STRING) {
    const AstRawString* module_specifier = ParseModuleSpecifier(CHECK_OK);
    scope_->module()->AddModuleRequest(module_specifier, zone());
    ExpectSemicolon(CHECK_OK);
    return factory()->NewEmptyStatement(pos);
  }

  // Parse ImportedDefaultBinding if present.
  ImportDeclaration* import_default_declaration = NULL;
  if (tok != Token::MUL && tok != Token::LBRACE) {
    const AstRawString* local_name =
        ParseIdentifier(kDontAllowRestrictedIdentifiers, CHECK_OK);
    VariableProxy* proxy = NewUnresolved(local_name, IMPORT);
    import_default_declaration = factory()->NewImportDeclaration(
        proxy, ast_value_factory()->default_string(), NULL, scope_, pos);
    Declare(import_default_declaration, DeclarationDescriptor::NORMAL, true,
            CHECK_OK);
  }

  const AstRawString* module_instance_binding = NULL;
  ZoneList<ImportDeclaration*>* named_declarations = NULL;
  if (import_default_declaration == NULL || Check(Token::COMMA)) {
    switch (peek()) {
      case Token::MUL: {
        Consume(Token::MUL);
        ExpectContextualKeyword(CStrVector("as"), CHECK_OK);
        module_instance_binding =
            ParseIdentifier(kDontAllowRestrictedIdentifiers, CHECK_OK);
        // TODO(ES6): Add an appropriate declaration.
        break;
      }

      case Token::LBRACE:
        named_declarations = ParseNamedImports(pos, CHECK_OK);
        break;

      default:
        *ok = false;
        ReportUnexpectedToken(scanner()->current_token());
        return NULL;
    }
  }

  ExpectContextualKeyword(CStrVector("from"), CHECK_OK);
  const AstRawString* module_specifier = ParseModuleSpecifier(CHECK_OK);
  scope_->module()->AddModuleRequest(module_specifier, zone());

  if (module_instance_binding != NULL) {
    // TODO(ES6): Set the module specifier for the module namespace binding.
  }

  if (import_default_declaration != NULL) {
    import_default_declaration->set_module_specifier(module_specifier);
  }

  if (named_declarations != NULL) {
    for (int i = 0; i < named_declarations->length(); ++i) {
      named_declarations->at(i)->set_module_specifier(module_specifier);
    }
  }

  ExpectSemicolon(CHECK_OK);
  return factory()->NewEmptyStatement(pos);
}

Parser::ParseStatementList

终于开始做语句的词法和语法分析了,它将继续调用ParseStatementListItem去处理每条语句,后面有一些细节我们先略过:

void* Parser::ParseStatementList(ZoneList<Statement*>* body, int end_token,
                                 bool* ok) {
  // StatementList ::
  //   (StatementListItem)* <end_token>

  // Allocate a target stack to use for this set of source
  // elements. This way, all scripts and functions get their own
  // target stack thus avoiding illegal breaks and continues across
  // functions.
  TargetScope scope(&this->target_stack_);

  DCHECK(body != NULL);
  bool directive_prologue = true;     // Parsing directive prologue.

  while (peek() != end_token) {
    if (directive_prologue && peek() != Token::STRING) {
      directive_prologue = false;
    }

    Scanner::Location token_loc = scanner()->peek_location();
    Scanner::Location old_this_loc = function_state_->this_location();
    Scanner::Location old_super_loc = function_state_->super_location();
    Statement* stat = ParseStatementListItem(CHECK_OK);

    if (is_strong(language_mode()) && scope_->is_function_scope() &&
        IsClassConstructor(function_state_->kind())) {
      Scanner::Location this_loc = function_state_->this_location();
      Scanner::Location super_loc = function_state_->super_location();
      if (this_loc.beg_pos != old_this_loc.beg_pos &&
          this_loc.beg_pos != token_loc.beg_pos) {
        ReportMessageAt(this_loc, MessageTemplate::kStrongConstructorThis);
        *ok = false;
        return nullptr;
      }
      if (super_loc.beg_pos != old_super_loc.beg_pos &&
          super_loc.beg_pos != token_loc.beg_pos) {
        ReportMessageAt(super_loc, MessageTemplate::kStrongConstructorSuper);
        *ok = false;
        return nullptr;
      }
    }

    if (stat == NULL || stat->IsEmpty()) {
      directive_prologue = false;   // End of directive prologue.
      continue;
    }

...

    body->Add(stat, zone());
  }

  return 0;
}

Parser::ParseStatement

只管空语句,其余的交给ParseSubStatement去处理。

1720Statement* Parser::ParseStatement(ZoneList<const AstRawString*>* labels,
1721                                  bool* ok) {
1722  // Statement ::
1723  //   EmptyStatement
1724  //   ...
1725
1726  if (peek() == Token::SEMICOLON) {
1727    Next();
1728    return factory()->NewEmptyStatement(RelocInfo::kNoPosition);
1729  }
1730  return ParseSubStatement(labels, ok);
1731}

AstNodeFactory::NewEmptyStatement

语法分析的输出结果,会生成一棵Ast树。AstNodeFactory就是生成AstNode的Helper函数的工厂类。
我们先看下它的定义:

3086// ----------------------------------------------------------------------------
3087// AstNode factory
3088
3089class AstNodeFactory final BASE_EMBEDDED {
3090 public:
3091  explicit AstNodeFactory(AstValueFactory* ast_value_factory)
3092      : local_zone_(ast_value_factory->zone()),
3093        parser_zone_(ast_value_factory->zone()),
3094        ast_value_factory_(ast_value_factory) {}
3095
3096  AstValueFactory* ast_value_factory() const { return ast_value_factory_; }
3097
3098  VariableDeclaration* NewVariableDeclaration(
3099      VariableProxy* proxy, VariableMode mode, Scope* scope, int pos,
3100      bool is_class_declaration = false, int declaration_group_start = -1) {
3101    return new (parser_zone_)
3102        VariableDeclaration(parser_zone_, proxy, mode, scope, pos,
3103                            is_class_declaration, declaration_group_start);
3104  }

我们先看一个最简单的例子:NewEmptyStatement:

  EmptyStatement* NewEmptyStatement(int pos) {
    return new (local_zone_) EmptyStatement(local_zone_, pos);
  }

这些具体的AST类,定义于src/ast/ast.h:

class EmptyStatement final : public Statement {
 public:
  DECLARE_NODE_TYPE(EmptyStatement)

 protected:
  explicit EmptyStatement(Zone* zone, int pos): Statement(zone, pos) {}
};

Parser::ParseSubStatement

针对不同的语句,分别有不同的Parse函数来处理,我们选其中的三个例子继续看一下:

  • 代码块:ParseBlock
  • if语句:ParseIfStatement
  • do-while循环:ParseDoWhileStatement

其余的我们看一下解析子语句的完整实现,代码不长,很清晰,不言自明,就不多解释了:

Statement* Parser::ParseSubStatement(ZoneList<const AstRawString*>* labels,
                                     bool* ok) {
  // Statement ::
  //   Block
  //   VariableStatement
  //   EmptyStatement
  //   ExpressionStatement
  //   IfStatement
  //   IterationStatement
  //   ContinueStatement
  //   BreakStatement
  //   ReturnStatement
  //   WithStatement
  //   LabelledStatement
  //   SwitchStatement
  //   ThrowStatement
  //   TryStatement
  //   DebuggerStatement

  // Note: Since labels can only be used by 'break' and 'continue'
  // statements, which themselves are only valid within blocks,
  // iterations or 'switch' statements (i.e., BreakableStatements),
  // labels can be simply ignored in all other cases; except for
  // trivial labeled break statements 'label: break label' which is
  // parsed into an empty statement.
  switch (peek()) {
    case Token::LBRACE:
      return ParseBlock(labels, ok);

    case Token::SEMICOLON:
      if (is_strong(language_mode())) {
        ReportMessageAt(scanner()->peek_location(),
                        MessageTemplate::kStrongEmpty);
        *ok = false;
        return NULL;
      }
      Next();
      return factory()->NewEmptyStatement(RelocInfo::kNoPosition);

    case Token::IF:
      return ParseIfStatement(labels, ok);

    case Token::DO:
      return ParseDoWhileStatement(labels, ok);

    case Token::WHILE:
      return ParseWhileStatement(labels, ok);

    case Token::FOR:
      return ParseForStatement(labels, ok);

    case Token::CONTINUE:
    case Token::BREAK:
    case Token::RETURN:
    case Token::THROW:
    case Token::TRY: {
      // These statements must have their labels preserved in an enclosing
      // block
      if (labels == NULL) {
        return ParseStatementAsUnlabelled(labels, ok);
      } else {
        Block* result =
            factory()->NewBlock(labels, 1, false, RelocInfo::kNoPosition);
        Target target(&this->target_stack_, result);
        Statement* statement = ParseStatementAsUnlabelled(labels, CHECK_OK);
        if (result) result->statements()->Add(statement, zone());
        return result;
      }
    }

    case Token::WITH:
      return ParseWithStatement(labels, ok);

    case Token::SWITCH:
      return ParseSwitchStatement(labels, ok);

    case Token::FUNCTION: {
      // FunctionDeclaration is only allowed in the context of SourceElements
      // (Ecma 262 5th Edition, clause 14):
      // SourceElement:
      //    Statement
      //    FunctionDeclaration
      // Common language extension is to allow function declaration in place
      // of any statement. This language extension is disabled in strict mode.
      //
      // In Harmony mode, this case also handles the extension:
      // Statement:
      //    GeneratorDeclaration
      if (is_strict(language_mode())) {
        ReportMessageAt(scanner()->peek_location(),
                        MessageTemplate::kStrictFunction);
        *ok = false;
        return NULL;
      }
      return ParseFunctionDeclaration(NULL, ok);
    }

    case Token::DEBUGGER:
      return ParseDebuggerStatement(ok);

    case Token::VAR:
      return ParseVariableStatement(kStatement, NULL, ok);

    case Token::CONST:
      // In ES6 CONST is not allowed as a Statement, only as a
      // LexicalDeclaration, however we continue to allow it in sloppy mode for
      // backwards compatibility.
      if (is_sloppy(language_mode()) && allow_legacy_const()) {
        return ParseVariableStatement(kStatement, NULL, ok);
      }

    // Fall through.
    default:
      return ParseExpressionOrLabelledStatement(labels, ok);
  }
}

构造一个代码块 Parser::ParseBlock

Block* Parser::ParseBlock(ZoneList<const AstRawString*>* labels,
                          bool finalize_block_scope, bool* ok) {
  // The harmony mode uses block elements instead of statements.
  //
  // Block ::
  //   '{' StatementList '}'

下面是遇到左大括号时,调用AstNodeFactory的NewBlock函数生成一个Block类的AST节点。

  // Construct block expecting 16 statements.
  Block* body =
      factory()->NewBlock(labels, 16, false, RelocInfo::kNoPosition);
  Scope* block_scope = NewScope(scope_, BLOCK_SCOPE);

  // Parse the statements and collect escaping labels.
  Expect(Token::LBRACE, CHECK_OK);
  block_scope->set_start_position(scanner()->location().beg_pos);
  { BlockState block_state(&scope_, block_scope);
    Target target(&this->target_stack_, body);

下面如果没遇到右括号,就处理语句列表,递归:

    while (peek() != Token::RBRACE) {
      Statement* stat = ParseStatementListItem(CHECK_OK);
      if (stat && !stat->IsEmpty()) {
        body->statements()->Add(stat, zone());
      }
    }
  }
  Expect(Token::RBRACE, CHECK_OK);
  block_scope->set_end_position(scanner()->location().end_pos);
  if (finalize_block_scope) {
    block_scope = block_scope->FinalizeBlockScope();
  }
  body->set_scope(block_scope);
  return body;
}

下面是src/ast/ast.h中AstNodeFactory::NewBlock的实现:

  Block* NewBlock(ZoneList<const AstRawString*>* labels, int capacity,
                  bool ignore_completion_value, int pos) {
    return new (local_zone_)
        Block(local_zone_, labels, capacity, ignore_completion_value, pos);
  }

Block是一个BreakableStatement:

class Block final : public BreakableStatement {
 public:
  DECLARE_NODE_TYPE(Block)

  ZoneList<Statement*>* statements() { return &statements_; }
  bool ignore_completion_value() const { return ignore_completion_value_; }

  static int num_ids() { return parent_num_ids() + 1; }
  BailoutId DeclsId() const { return BailoutId(local_id(0)); }

  bool IsJump() const override {
    return !statements_.is_empty() && statements_.last()->IsJump()
        && labels() == NULL;  // Good enough as an approximation...
  }

  void MarkTail() override {
    if (!statements_.is_empty()) statements_.last()->MarkTail();
  }

  Scope* scope() const { return scope_; }
  void set_scope(Scope* scope) { scope_ = scope; }

 protected:
  Block(Zone* zone, ZoneList<const AstRawString*>* labels, int capacity,
        bool ignore_completion_value, int pos)
      : BreakableStatement(zone, labels, TARGET_FOR_NAMED_ONLY, pos),
        statements_(capacity, zone),
        ignore_completion_value_(ignore_completion_value),
        scope_(NULL) {}
  static int parent_num_ids() { return BreakableStatement::num_ids(); }

 private:
  int local_id(int n) const { return base_id() + parent_num_ids() + n; }

  ZoneList<Statement*> statements_;
  bool ignore_completion_value_;
  Scope* scope_;
};

if语句 - Parser::ParseIfStatement

if比前面的Block更简单,但是可能遇到表达式,遇到就调用ParseExpression,然后处理then块和else块。没什么技术含量哈。

IfStatement* Parser::ParseIfStatement(ZoneList<const AstRawString*>* labels,
                                      bool* ok) {
  // IfStatement ::
  //   'if' '(' Expression ')' Statement ('else' Statement)?

  int pos = peek_position();
  Expect(Token::IF, CHECK_OK);
  Expect(Token::LPAREN, CHECK_OK);
  Expression* condition = ParseExpression(true, CHECK_OK);
  Expect(Token::RPAREN, CHECK_OK);
  Statement* then_statement = ParseSubStatement(labels, CHECK_OK);
  Statement* else_statement = NULL;
  if (peek() == Token::ELSE) {
    Next();
    else_statement = ParseSubStatement(labels, CHECK_OK);
  } else {
    else_statement = factory()->NewEmptyStatement(RelocInfo::kNoPosition);
  }
  return factory()->NewIfStatement(
      condition, then_statement, else_statement, pos);
}

do-while循环 - Parser::ParseDoWhileStatement

调用AstNodeFactory的NewDoWhileStatement生成ASTNode对象。然后处理do和while中间的语句,最后解析while中的表达式。

DoWhileStatement* Parser::ParseDoWhileStatement(
    ZoneList<const AstRawString*>* labels, bool* ok) {
  // DoStatement ::
  //   'do' Statement 'while' '(' Expression ')' ';'

  DoWhileStatement* loop =
      factory()->NewDoWhileStatement(labels, peek_position());
  Target target(&this->target_stack_, loop);

  Expect(Token::DO, CHECK_OK);
  Statement* body = ParseSubStatement(NULL, CHECK_OK);
  Expect(Token::WHILE, CHECK_OK);
  Expect(Token::LPAREN, CHECK_OK);

  Expression* cond = ParseExpression(true, CHECK_OK);
  Expect(Token::RPAREN, CHECK_OK);

  // Allow do-statements to be terminated with and without
  // semi-colons. This allows code such as 'do;while(0)return' to
  // parse, which would not be the case if we had used the
  // ExpectSemicolon() functionality here.
  if (peek() == Token::SEMICOLON) Consume(Token::SEMICOLON);

  if (loop != NULL) loop->Initialize(cond, body);
  return loop;
}

我们来一张UML图来复习一下上面的过程:
v8_parse

目录
相关文章
|
7月前
|
存储 自然语言处理 前端开发
编译原理 - 语义分析
编译原理 - 语义分析
103 1
|
7月前
|
自然语言处理 算法 前端开发
编译原理 - 词法分析
编译原理 - 词法分析
81 0
|
自然语言处理 C语言
编译原理实验-词法分析
编译原理实验C语言实现
116 0
【编译原理】语法分析:从顶向下、最左推导
【编译原理】语法分析:从顶向下、最左推导
|
自然语言处理 前端开发 算法
编译原理 (二)词法分析、语法分析、语义分析以及中间代码生成器的基本概念
编译原理 (二)词法分析、语法分析、语义分析以及中间代码生成器的基本概念
830 0
|
2月前
|
自然语言处理 编译器 C语言
软考:区分词法分析、语法分析、语义分析
本文解释了编译过程中的词法分析、语法分析和语义分析三个阶段的区别,并提供了相关练习题,帮助读者理解各阶段在编译过程中的作用和重要性。
90 4
谓词逻辑之 语法规则
  谓词逻辑公式涉及两种事物: ⑴是我们谈及的对象,如a和p这样的个体,以及x和u这样的变量和函数符号。在谓词逻辑中,用来表示对象的表达式称为项(terms); ⑵是表示真值,即公式,例如Y(x,m(x))是公式。
1050 0
|
7月前
|
算法 编译器 C语言
编译原理 - 语法分析
编译原理 - 语法分析
102 0
|
7月前
|
存储 自然语言处理 编译器
【编译原理】词法分析:C/C++实现
【编译原理】词法分析:C/C++实现
344 1
|
7月前
|
自然语言处理
【编译原理】词法分析
【编译原理】词法分析
65 0