The execution of the compiled code do not rely on the source code at all. They simply made it so that the serializer dumps the source code into the package along with the compiled bytecode. UScript uses UProperty meta objects to briefly describe member types, names and their offsets. This is what allows functions such as GetPropertyText and SetPropertyText to function using the name of a member. The compiler reserves 4 bytes in the byte code right after the 1 byte opcode number for one of the following member types ;
- UBoolProperty
- UIntProperty
- UStrProperty
- UNameProperty
- UFloatProperty
- UByteProperty
- UFixedArrayProperty
- UClassProperty
- UArrayProperty (Dynamic array)
- UMapProperty (Mapped array)
- UStructProperty
- UObjectProperty
The dynamic UScript linker then iterates over these reserved slots and fills them in with the address to the meta objects in runtime, similar to how WIndows and Linux loads dlls and so files. This gives the uscript instruction direct access to the meta object without having to look for it, in order to get the offset for the desired member variable in the given context (local, instance, default). It would look something like this:
Code: Select all
GProperty = (UProperty*)Stack.ReadObject();
GPropAddr = Stack.Locals + GProperty->Offset;
GPropAddr would now store the address to the variable. It's messy and slow, but this is how they made it.
Functions are even slower. Any virtual function is being searched for (requires iterations) when called. Only final functions are treated the same way as the above (address stored in the byte code itself).
Virtual functions are poorly hashed using the first 8 bits of the name hash and placed in hash bins that are searched everytime you call a virtual UScript function.
Therefore, it's possible to determine the name of the variables, functions and classes, but not what line the code corresponds to. That would take up a lot of extra, unnecessary space in the byte code, and would be impossible to do without rewriting the compiler.
It would be interesting to rewrite the compiler to generate an external debug database, similar to what native debuggers use to break at certain lines. That would then make it possible to determine the line the code corresponds to.
The opcode printed in the log isn't particularly useful unless you look at, and understand the native part of the Core. The opcode will correspond to an index into the GNatives table.
Epic initially made opcodes occupy 1 byte, this means a maximum of 256 opcodes could be used. This created severe limitations since it would limit the number of native functions + intrinsics to 256.
Therefore they extended it to 2 bytes, a so called High Native. The engine first reads the 1 byte (Stack.Step) like it's in an 8-bit mode, jumps to that index in the GNatives table, and if the index is >= 0x60, that index will point to a native function in the table that reads a second byte, adds them together (sort of like a dispatch that turns the engine into 16-bit mode).
It would then jump to the target native function using the new 16 bit opcode. Yeah the engine is pretty messy and hackish. So for native functions (declared in UScript as native), the printed opcode would point to the opcode number of that function. For opcodes < 0x60, it would correspond to either an internal script instruction, or the few low native functions declared in Object.uc. I put it in there to help me debug the debug library
The reason it isn't possible to simply open a package in a text editor, remove the source code and then save it again is because the package format contain headers that store offsets and sizes. If you simply erase the source code, without having all of those offsets and sizes changed accordingly, the package becomes corrupt and useless. In order to remove source code, the engine has to load the serialized objects, empty the text buffers, then save the objects again using recalculated offsets and sizes.