C++实现编译器(9): 函数与栈帧¶

参照《Writing An Interpreter/Compiler In Go》，改用C++实现。

项目源码: https://github.com/LeslieZhu/monkey-cpp

引言¶

本篇对应的源码位于目录: src/08/

函数部分是最重要也是最难的部分，随着函数的编译需要切换编译作用域，而函数的递归和互相调用则需要切换调用栈环境，引入了栈帧。

函数常量¶

这里从一个没有参数、没有局部变量、没有全局变量的最简单函数入手:

let test = fn(){ 3 + 4 }

可以将函数当做字面常量来表示，它被定义后本身的内容或值是不变的；因此，新增一种对象类型，即函数编译后的对象，该对象包含自身的字节码。

enum class ObjectType
{
    //...
    COMPILED_FUNCTION,
};

struct CompiledFunction: Object
{
    bytecode::Instructions Instructions;

    CompiledFunction(bytecode::Instructions& ins): Instructions(ins){}
    virtual ObjectType Type() { return ObjectType::COMPILED_FUNCTION; }
    virtual std::string Inspect()
    {
        std::stringstream oss;
        oss << this;
        return oss.str();
    }
};

这样一个函数编译后就和其它对象一样看待处理，函数体就是一个代码块，而之前已经可以支持代码块编译为字节码了。因此，对于这个简单函数的编译不是问题，如何调用才是问题。

编译作用域¶

编译函数生成的字节码不能和主程序的指令混合在一起，因此需要引入带作用域的栈。

将之前编译器里面的三个元素独立出来作为一个编译作用域结构:

struct CompilationScope{
    bytecode::Instructions instructions;
    EmittedInstruction lastInstruction;
    EmittedInstruction prevInstruction;
};

struct Compiler
{
    std::vector<std::shared_ptr<objects::Object>> constants;
    std::shared_ptr<compiler::SymbolTable> symbolTable;

    std::vector<std::shared_ptr<CompilationScope>> scopes;
    int scopeIndex;

    Compiler(){
        symbolTable = compiler::NewSymbolTable();

        auto mainScope = std::make_shared<CompilationScope>();
        scopes.push_back(mainScope);
        scopeIndex = 0;
    }

    //...

    bytecode::Instructions currentInstructions()
    {
        return scopes[scopeIndex]->instructions;
    }

    void enterScope()
    {
        auto scope = std::make_shared<CompilationScope>();
        scopes.push_back(scope);
        scopeIndex += 1;
    }

    bytecode::Instructions leaveScope()
    {
        auto ins = currentInstructions();
        scopes.pop_back();
        scopeIndex -= 1;

        return ins;
    }
};

和之前的区别是通过enterScope和leaveScope来创建和销毁编译作用域，从而防止在编译时把主程序字节码打乱。

函数编译与调用约定¶

调用一个函数，对于返回值有两种可能：

显示或隐式返回了内容
没有返回内容

比如:

let fn1 = fn(){ return 5 + 10; }
let fn2 = fn(){ 5 + 10; }
let fn3 = fn() { }

不管是显示还是隐式返回，只要有返回内容，都应该编译为相同的字节码，新增操作码OpReturnValue；对于没有返回值的函数，即函数编译后，发现结尾不是一个操作码OpReturnValue结尾，那么就是无返回值类型，自动给它补上一个null无效值对象，新增操作码OpReturn。

对于函数的调用，新增OpCall操作码，它的操作数暂时是0，因为不处理带有参数的函数，后续根据参数个数作为操作数。

enum class OpcodeType : Opcode
{
    //...
    OpCall,
    OpReturnValue,
    OpReturn,   // return null
};

else if(node->GetNodeType() == ast::NodeType::FunctionLiteral)
{
    std::shared_ptr<ast::FunctionLiteral> funcObj = std::dynamic_pointer_cast<ast::FunctionLiteral>(node);

    // 在新编译作用域编译函数
    enterScope();

    auto resultObj = Compile(funcObj->pBody);
    if (evaluator::isError(resultObj))
    {
        return resultObj;
    }

    // 隐式返回
    if (lastInstructionIs(bytecode::OpcodeType::OpPop))
    {
        removeLastPopWithReturn();
    }

    // 无返回则强制返回null
    if(!lastInstructionIs(bytecode::OpcodeType::OpReturnValue))
    {
        emit(bytecode::OpcodeType::OpReturn);
    }

    // 回到上一层作用域并获取编译字节码
    auto ins = leaveScope();

    auto compiledFn = std::make_shared<objects::CompiledFunction>(ins);
    auto pos = addConstant(compiledFn);

    emit(bytecode::OpcodeType::OpConstant, {pos});
}
else if(node->GetNodeType() == ast::NodeType::ReturnStatement)
{
    std::shared_ptr<ast::ReturnStatement> returnObj = std::dynamic_pointer_cast<ast::ReturnStatement>(node);

    auto resultObj = Compile(returnObj->pReturnValue);
    if (evaluator::isError(resultObj))
    {
        return resultObj;
    }

    emit(bytecode::OpcodeType::OpReturnValue);
}

因为函数编译后是作为一个常量对象处理，所以它要么是被OpConstant变量加入到constants里面，要么是被OpSetGlobal绑到到全局变量了；总之，当它被调用的时候，这个字面量已经在调用栈栈顶了，这个时候只需要发出OpCall指令就可以调用它。

函数被调用后，如果有返回值，则函数本身出栈，返回值入栈；如果没有返回值，则函数出栈，一个null值入栈。

因为目前考虑的函数都没有参数，所以虚拟机在函数执行后不需要发出OpPop指令来让函数出栈。

else if(node->GetNodeType() == ast::NodeType::CallExpression)
{
    std::shared_ptr<ast::CallExpression> callObj = std::dynamic_pointer_cast<ast::CallExpression>(node);

    auto resultObj = Compile(callObj->pFunction);
    if (evaluator::isError(resultObj))
    {
        return resultObj;
    }

    emit(bytecode::OpcodeType::OpCall);
}

栈帧¶

帧是调用帧或者栈帧的简称，指保存与执行信息相关的数据结构，在编译器或解释器术语中，有时也叫活动记录。

namespace vm
{
    struct Frame{
        std::shared_ptr<objects::CompiledFunction> fn;
        int ip;

        Frame(std::shared_ptr<objects::CompiledFunction> f, const int i): fn(f), ip(i){}
        bytecode::Instructions Instruction()
        {
            return fn->Instructions;
        }
    };

    std::shared_ptr<Frame> NewFrame(std::shared_ptr<objects::CompiledFunction> f)
    {
        return std::make_shared<Frame>(f, -1);
    }
}

发现这个帧的定义和之前整个虚拟机的调用栈类似，只不过调用的主体变成了函数编译后的字节码对象。

既然整个虚拟机处理的就是一个字节码，那么可以把之前的主程序字节码也当做是一个编译后的函数，即main函数，对应的帧当做主帧。

// 最多嵌套1024层调用
const int FrameSize = 1024;

struct VM{
    std::vector<std::shared_ptr<objects::Object>> constants;
    std::vector<std::shared_ptr<objects::Object>> globals;

    std::vector<std::shared_ptr<objects::Object>> stack;
    int sp; // 始终指向调用栈的下一个空闲位置，栈顶的值是stack[sp-1]

    std::vector<std::shared_ptr<Frame>> frames;
    int frameIndex;

    //...

    std::shared_ptr<Frame> currentFrame()
    {
        return frames[frameIndex - 1];
    }

    void pushFrame(std::shared_ptr<Frame> f)
    {
        frames[frameIndex] = f;
        frameIndex += 1;
    }

    std::shared_ptr<Frame> popFrame()
    {
        frameIndex -= 1;
        return frames[frameIndex];
    }
};

std::shared_ptr<VM> New(std::shared_ptr<compiler::ByteCode> bytecode)
{
    auto mainFn = std::make_shared<objects::CompiledFunction>(bytecode->Instructions);
    auto mainFrame = NewFrame(mainFn);

    std::vector<std::shared_ptr<Frame>> frames(FrameSize);
    frames[0] = mainFrame; // frameIndex: 1

    return std::make_shared<VM>(bytecode->Constants, frames);
}

为了追求性能，预先分配好了栈帧空间，而不是采用追加动态扩展内存方式。

当前只需要修改Run函数，将之前直接使用instructions的地方改成使用主帧即可。

std::shared_ptr<objects::Object> Run()
{
   // 主帧初始化
   auto frame = currentFrame();

   int ip;
   bytecode::OpcodeType op;

   // 获取主帧字节码序列
   bytecode::Instructions instructions = currentFrame()->Instruction();
   int ins_size = instructions.size();

   while(frame->ip < ins_size - 1) // frame->ip start with -1
   {
       frame->ip += 1;

       ip = frame->ip;
       op = static_cast<bytecode::OpcodeType>(instructions[ip]);

       switch(op)
       {
          //...

          case bytecode::OpcodeType::OpCall:
               {
                   auto fnObj = stack[sp - 1];
                   if(fnObj->Type() != objects::ObjectType::COMPILED_FUNCTION)
                   {
                       return evaluator::newError("calling non-function");
                   }

                   auto compiledFnObj = std::dynamic_pointer_cast<objects::CompiledFunction>(fnObj);
                   auto funcFrame = NewFrame(compiledFnObj);

                   pushFrame(funcFrame);

                   // 立即切换栈帧环境
                   frame = currentFrame();
                   instructions = frame->Instruction();
                   ins_size = instructions.size();
               }
               break;
           case bytecode::OpcodeType::OpReturnValue:
               {
                   auto returnValue = Pop();

                   popFrame();

                   // 立即切换栈帧环境
                   frame = currentFrame();
                   instructions = frame->Instruction();
                   ins_size = instructions.size();

                   Pop(); // 函数本体出栈

                   auto result = Push(returnValue);
                   if(evaluator::isError(result))
                   {
                      return result;
                   }
               }
               break;
           case bytecode::OpcodeType::OpReturn:
               {
                   popFrame();

                   // 立即切换栈帧环境
                   frame = currentFrame();
                   instructions = frame->Instruction();
                   ins_size = instructions.size();

                   Pop(); // 函数本体出栈

                   auto result = Push(objects::NULL_OBJ);
                   if(evaluator::isError(result))
                   {
                      return result;
                   }
               }
               break;
       }
   }

在栈帧变化的时候需要切换栈帧环境，为下一次循环做好初始化工作。

局部变量操作码¶

let globalSeed = 50;

let minusOne = fn() {
    let num = 1;
    globalSeed - num;
}

let minusTwo = fn() {
    let num = 2;
    globalSeed - num;
}

minusOne() + minusTwo();

这个例子中全局变量和局部变量都有，其中num局部变量只能在函数内部访问。

相对于全局变量操作数使用2个字节，局部变量操作数使用1个字节，256个局部变量应该是够了:

{OpcodeType::OpGetGlobal, std::make_shared<Definition>("OpGetGlobal", 2)},
{OpcodeType::OpSetGlobal, std::make_shared<Definition>("OpSetGlobal", 2)},

{OpcodeType::OpGetLocal, std::make_shared<Definition>("OpGetLocal", 1)},
{OpcodeType::OpSetLocal, std::make_shared<Definition>("OpSetLocal", 1)},

支持单字节的编码和解码:

switch(width)
{
    case 2:
        {
            uint16_t uint16Value = static_cast<uint16_t>(operands[i]);
            WriteUint16(instruction, offset, uint16Value);
        }
        break;
    case 1:
        {
            instruction[offset] = static_cast<Opcode>(operands[i]);
        }
        break;
}

switch(width)
{
    case 2:
        {
            uint16_t uint16Value;
            ReadUint16(ins, pos, uint16Value);
            operands[i] = static_cast<int>(uint16Value);
        }
        break;
    case 1:
        {
            uint8_t uint8Value;
            ReadUint8(ins, pos, uint8Value);
            operands[i] = static_cast<int>(uint8Value);
        }
        break;
}

编译时作用域与符号表的切换¶

之前编译作用域的时候只使用了全局作用域，但已经能够切换作用域了，那么伴随作用域的切换同时也切换符号表:

namespace SymbolScopeType
{
    const SymbolScope GlobalScope = "GLOBAL";
    const SymbolScope LocalScope = "LOCAL";
}

struct SymbolTable{
    std::shared_ptr<SymbolTable> Outer;
    std::map<std::string, std::shared_ptr<Symbol>> store;
    int numDefinitions;

    std::shared_ptr<Symbol> Define(std::string name)
    {
        auto symbol = std::make_shared<Symbol>(name, numDefinitions);

        // 决定是否外部变量或非当前局部变量
        if(Outer == nullptr)
        {
            symbol->Scope = SymbolScopeType::GlobalScope;
        } else {
            symbol->Scope = SymbolScopeType::LocalScope;
        }

        store[name] = symbol;
        numDefinitions++;
        return symbol;
    }

    std::shared_ptr<Symbol> Resolve(std::string name)
    {
        auto fit = store.find(name);
        if(fit != store.end())
        {
            return fit->second;
        }
        else if(Outer != nullptr)
        {
            return Outer->Resolve(name);
        }
        else
        {
            return nullptr;
        }
    }
};

std::shared_ptr<SymbolTable> NewEnclosedSymbolTable(std::shared_ptr<SymbolTable> outer)
{
    auto symbol = std::make_shared<SymbolTable>();
    symbol->Outer = outer;
    return symbol;
}

新的符号表会有一条嵌套的外部符号表，编译的时候根据这个外部符号表是否存在来决定它是不是外部变量或者非当前局部变量:

void enterScope()
{
    auto scope = std::make_shared<CompilationScope>();
    scopes.push_back(scope);
    scopeIndex += 1;
    symbolTable = NewEnclosedSymbolTable(symbolTable);
}

bytecode::Instructions leaveScope()
{
    auto ins = currentInstructions();
    scopes.pop_back();
    scopeIndex -= 1;
    symbolTable = symbolTable->Outer;

    return ins;
}

编译为字节码的时候区别就是是否本地局部变量操作码的选择:

if(symbol->Scope == compiler::SymbolScopeType::GlobalScope)
{
    emit(bytecode::OpcodeType::OpSetGlobal, {symbol->Index});
} else {
    emit(bytecode::OpcodeType::OpSetLocal, {symbol->Index});
}

if(symbol->Scope == compiler::SymbolScopeType::GlobalScope)
{
    emit(bytecode::OpcodeType::OpGetGlobal, {symbol->Index});
} else {
    emit(bytecode::OpcodeType::OpGetLocal, {symbol->Index});
}

虚拟机运行时局部绑定¶

之前已经可以处理全局变量了，虽然可以在调用函数的时候遇到SetLocal指令就临时建立一个数据结构来存储局部变量，但这种涉及内存动态分配会影响性能。

在虚拟机遇到OpCall指令需要调用函数时，此时栈顶是函数字节码本体，如果调用函数期间的局部变量可以借用虚拟机调用栈，就可以避免动态分配内存。

假设知道接下来调用函数有N个局部变量，那么在调用栈栈顶现在借用N个空间，执行的时候遇到局部变量定义就入栈一个元素，遇到使用局部变量就从调用栈读取。

最后函数调用完毕，归还这部分借用的N个元素空间，这里的归还其实就是调用栈的栈顶指针回归原位，这个原位又叫基指针或帧指针。

先获取函数的局部变量数量，这个信息有助于需要调用调用栈多少空间:

struct CompiledFunction: Object
{
    bytecode::Instructions Instructions;
    int NumLocals;
    //...
};

struct Frame
{
   std::shared_ptr<objects::CompiledFunction> fn;
   int ip;
   int basePointer;
   //..
};

遇到OpCall指令时:

case bytecode::OpcodeType::OpCall:
    {
        auto fnObj = stack[sp - 1];
        if(fnObj->Type() != objects::ObjectType::COMPILED_FUNCTION)
        {
            return evaluator::newError("calling non-function");
        }

        auto compiledFnObj = std::dynamic_pointer_cast<objects::CompiledFunction>(fnObj);
        auto funcFrame = NewFrame(compiledFnObj, sp);

        pushFrame(funcFrame);

        frame = currentFrame();
        instructions = frame->Instruction();
        ins_size = instructions.size();

        // 借用一段调用栈空间
        sp = frame->basePointer + compiledFnObj->NumLocals;
    }
    break;

如果只看一个简单函数，会觉得这个地方的sp并不需要改变值，但因为函数嵌套的缘故，必须改变sp值以达到借用的效果。

因为借用调用栈处理局部变量绑定，当遇到嵌套很深的递归调用时，可能调用栈空间很快就不足了。

进入函数调用内部后，处理SetLocal与GetLocal指令:

case bytecode::OpcodeType::OpSetLocal:
    {
        uint8_t localIndex;
        bytecode::ReadUint8(instructions, ip+1, localIndex);
        frame->ip += 1;

        stack[frame->basePointer + int(localIndex)] = Pop();
    }
    break;
case bytecode::OpcodeType::OpGetLocal:
    {
        uint8_t localIndex;
        bytecode::ReadUint8(instructions, ip+1, localIndex);
        frame->ip += 1;

        auto result = Push(stack[frame->basePointer + int(localIndex)]);
        if(evaluator::isError(result))
        {
           return result;
        }
    }
    break;

通过基指针来精确定位调用栈中的空间，在使用完毕函数退出的时候要把这段空间归还:

case bytecode::OpcodeType::OpReturnValue:
     {
         auto returnValue = Pop();

         auto callFrame = popFrame();
         sp = callFrame->basePointer - 1;

         frame = currentFrame();
         instructions = frame->Instruction();
         ins_size = instructions.size();

         //Pop(); // 函数本体出栈

         auto result = Push(returnValue);
         if(evaluator::isError(result))
         {
            return result;
         }
     }
     break;
case bytecode::OpcodeType::OpReturn:
     {
         auto callFrame = popFrame();
         sp = callFrame->basePointer - 1;

         frame = currentFrame();
         instructions = frame->Instruction();
         ins_size = instructions.size();

         //Pop(); // 函数本体出栈

         auto result = Push(objects::NULL_OBJ);
         if(evaluator::isError(result))
         {
            return result;
         }
     }
     break;

本来是直接恢复基指针，但直接通过基指针减1的方式，可以省去一个Pop操作来弹出函数本体；至此，函数局部变量可以处理了。

函数参数¶

函数调用的参数是一种特殊的局部变量绑定，只不是是隐式的绑定。

之前假设函数是不带参数的，因此函数本体入栈后，接着便是OpCall指令；但现在做出改变，函数本体入栈后，函数参数入栈，最后才是OpCall指令。为了让OpCall知道函数有多少个参数，它需要使用1个字节来作为操作数。那么它的工作方式是往前寻找N个位置到达函数本体，函数本体调用的时候后面N个元素便是它的参数了。

// {OpcodeType::OpCall, std::make_shared<Definition>("OpCall")},
{OpcodeType::OpCall, std::make_shared<Definition>("OpCall", 1)},

函数编译后要包含参数个数信息:

struct CompiledFunction: Object
{
    bytecode::Instructions Instructions;
    int NumLocals;
    int NumParameters;

    CompiledFunction(bytecode::Instructions &ins, const int &numLocals, const int &numParameters)
        : Instructions(ins),
          NumLocals(numLocals),
          NumParameters(numParameters)
    {
    }

    //...
};

编译函数的时候把参数当做局部变量绑定发出SetLocal指令:

else if(node->GetNodeType() == ast::NodeType::FunctionLiteral)
{
    std::shared_ptr<ast::FunctionLiteral> funcObj = std::dynamic_pointer_cast<ast::FunctionLiteral>(node);

    enterScope();

    // 把参数当做局部变量进行绑定
    for(auto &args: funcObj->v_pParameters)
    {
        symbolTable->Define(args->Value);
    }

    auto resultObj = Compile(funcObj->pBody);
    if (evaluator::isError(resultObj))
    {
        return resultObj;
    }

    if (lastInstructionIs(bytecode::OpcodeType::OpPop))
    {
        removeLastPopWithReturn();
    }

    if(!lastInstructionIs(bytecode::OpcodeType::OpReturnValue))
    {
        emit(bytecode::OpcodeType::OpReturn);
    }

    auto numLocals = symbolTable->numDefinitions;
    auto numParameters = funcObj->v_pParameters.size();
    auto ins = leaveScope();

    auto compiledFn = std::make_shared<objects::CompiledFunction>(ins, numLocals, numParameters);
    auto pos = addConstant(compiledFn);
    emit(bytecode::OpcodeType::OpConstant, {pos});
}

而编译生成OpCall指令时要带上被调用函数参数个数作为操作数:

else if(node->GetNodeType() == ast::NodeType::CallExpression)
{
    std::shared_ptr<ast::CallExpression> callObj = std::dynamic_pointer_cast<ast::CallExpression>(node);

    // 将函数本体编译，最先入栈
    auto resultObj = Compile(callObj->pFunction);
    if (evaluator::isError(resultObj))
    {
        return resultObj;
    }

    // 将每个实参编译进去，入栈时在OpCall前面
    for(auto &args: callObj->pArguments)
    {
        resultObj = Compile(args);
        if (evaluator::isError(resultObj))
        {
            return resultObj;
        }
    }

    int argsNum = callObj->pArguments.size();

    // 入栈OpCall指令，操作数是实参个数
    emit(bytecode::OpcodeType::OpCall, {argsNum});
}

虚拟机具体执行的时候只要找准函数本体的位置作为基指针，那么所有问题都水到渠成了，等同于实参已经自动绑定到形参:

case bytecode::OpcodeType::OpCall:
    {
        // 读取1字节为实参个数
        uint8_t numArgs;
        bytecode::ReadUint8(instructions, ip+1, numArgs);
        frame->ip += 1;

        auto result = callFunction((int)numArgs);
        if(evaluator::isError(result))
        {
           return result;
        }

        frame = currentFrame();
        instructions = frame->Instruction();
        ins_size = instructions.size();
    }
    break;

std::shared_ptr<objects::Object> callFunction(int numArgs)
{
    // 要跳过实参个数
    auto fnObj = stack[sp - 1 - numArgs];

    if (fnObj->Type() != objects::ObjectType::COMPILED_FUNCTION)
    {
        return evaluator::newError("calling non-function");
    }

    auto compiledFnObj = std::dynamic_pointer_cast<objects::CompiledFunction>(fnObj);

    // 判断实参个数与形参个数是否一致
    if(compiledFnObj->NumParameters != numArgs)
    {
        std::string str1 = std::to_string(compiledFnObj->NumParameters);
        std::string str2 = std::to_string(numArgs);
        return evaluator::newError("wrong number of arguments: want=" + str1 + ", got=" + str2);
    }

    // 基指针也要跳过实参个数
    auto funcFrame = NewFrame(compiledFnObj, sp - numArgs);

    pushFrame(funcFrame);

    sp = funcFrame->basePointer + compiledFnObj->NumLocals;

    return nullptr;
}

执行效果:

>> let one = fn(){ 1; }
CompiledFunction[0x600000e84058]

>> let two = fn(){ let result = one(); return result + result; }
CompiledFunction[0x600000e84098]

>> let three = fn(two){ two() + 1; }
CompiledFunction[0x600000e8c058]

>> three(two)
3

总结¶

之前只有主帧的时候，一直在想后面对函数会怎么设计，栈帧让人有种豁然开朗的感觉；将函数参数与局部变量都使用调用栈存储，而不是动态分配内存，提高了性能。

本文作者：「吴羽舒(LeslieZhu) 」创作于Thursday, December 15, 2022，共3693字，需要19分钟阅读时间
微信搜索： 「 MinYiLife 」, 关注公众号!
本文链接： https://www.lesliezhu.com/blog/2022/12/15/writing_an_interpreter_in_cpp_9/
版权声明： 原创文章，如需转载请注明文章作者和出处。谢谢！