Compiling PHP's Eval() to .NET

 4 years ago
source link: https://www.peachpie.io/2020/02/evil-eval-2.html

We already mentioned support for PHP’s eval() function a long time ago, but there have been some interesting updates on how the code passed to eval() is processed!

eval is a well-known function of dynamic languages allowing you to execute code at runtime. The dynamic language is usually interpreted and the runtime contains the parser and interpreter already, so implementing such a feature is no rocket science. Still…

It’s very difficult to debug and analyze eval – that’s where compilation and .NET come into play.

First and foremost, whenever eval is encountered in the code, the compiler reports a warning diagnostic. This language construct allows execution of arbitrary code; therefore, PeachPie reports a warning as a precaution.

Features we’d like

So what makes eval() so evil? 😈

  • It can potentially be dangerous
  • You don’t see the code
  • You can’t debug the code
  • It won’t get analyzed by your development tools
  • It generally indicates a design flaw, since the developer couldn’t write what they intended the code to do in the first place

Working with eval’ed code is simply inconvenient. Your development environment has to do a lot of work to allow uninterrupted debugging and exception handling.

Invocation of eval() at Run Time

PeachPie takes advantage of .NET and Roslyn whenever possible. This feature is a nice demonstration of the power of .NET, which isn’t needed as much in other typical .NET languages.

The runtime instantiates the whole compiler (PhpCompilation, derived from Roslyn’s Compilation class) and caches the instance for any subsequent calls to eval(). Here goes the first trick:

PortablePDB and Embedded Source

Setting the compilation’s options to use the new PortablePDB allows us to take advantage of the Embedded Source feature. This means that we only need to remember the original code that’s being evaluated when it’s needed during debugging. And it’s all right there in the debug information in the standard .NET way! As a result, any CoreCLR debugger will be able to “step into” the code that only exists in memory.

dotPeek: browsing the PortablePDB metadata. index.php.eval2.php is a made-up file path.

In order to make this feature work, it is important to choose a virtual file path for our virtual embedded source code. The path must not contain any special characters (like $, ~, <, > or `) and must not exist on the file system. The Embedded Source itself must have a checksum using the SHA256 hashing algorithm.
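The path and checksum rules above can be sketched as a small helper. This is only an illustration, not PeachPie’s actual code: the class name VirtualEvalSource, both method names, and the "<script>.eval<N>.php" naming scheme (modeled on the made-up index.php.eval2.php path) are assumptions. SourceText.From and EmbeddedText.FromSource are the Roslyn calls that attach the SHA256 checksum.

```csharp
using System;
using System.IO;
using System.Text;
using Microsoft.CodeAnalysis;       // EmbeddedText
using Microsoft.CodeAnalysis.Text;  // SourceText, SourceHashAlgorithm

// Hypothetical helper; names and naming scheme are assumptions for illustration.
static class VirtualEvalSource
{
    // The characters the article lists as examples of what the path must avoid.
    static readonly char[] Forbidden = { '$', '~', '<', '>', '`' };

    // Builds a path such as "index.php.eval2.php" and checks the two rules:
    // no special characters, and no real file at that location.
    public static string MakeVirtualPath(string callerScript, int evalIndex)
    {
        string path = $"{callerScript}.eval{evalIndex}.php";

        if (path.IndexOfAny(Forbidden) >= 0)
            throw new ArgumentException("Path contains a forbidden character.", nameof(callerScript));
        if (File.Exists(path))
            throw new InvalidOperationException("Virtual path collides with a real file.");

        return path;
    }

    // Wraps the eval'ed code so it can be embedded into the PortablePDB.
    public static EmbeddedText Embed(string virtualPath, string code)
    {
        // An explicit encoding makes the text embeddable; SHA256 is its checksum.
        SourceText text = SourceText.From(code, Encoding.UTF8, SourceHashAlgorithm.Sha256);
        return EmbeddedText.FromSource(virtualPath, text);
    }
}
```

The returned EmbeddedText can then be passed to Emit alongside the other embedded texts.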

if (options.EmitDebugInformation) { // == Debugger.IsAttached
  compilation = compilation
    .WithOptions(compilation.Options
        .WithOptimizationLevel(OptimizationLevel.Debug)
        .WithDebugPlusMode(true));

  emitOptions = emitOptions
    .WithDebugInformationFormat(DebugInformationFormat.PortablePdb);
  embeddedTexts = new[] {
    EmbeddedText.FromSource(tree.FilePath, tree.GetText())
  };
}

The snippet above sets up the compiler. EmitDebugInformation is set according to Debugger.IsAttached – we want to emit debug information for eval() only while the program is actually being debugged. Otherwise it’s just unnecessary performance overhead.

Emit and Load!

result = compilation.Emit(peStream, pdbStream, emitOptions, embeddedTexts);

Roslyn’s compilation has a neat API method, Emit, which performs the compilation in memory according to our options. It writes the content of the assembly and the debug information into two System.IO.Stream objects, peStream and pdbStream. All the diagnostics are collected in the result object for further inspection; we might, for example, forbid the execution of code containing any warning or just a specific warning.
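As a rough sketch of that last idea, the check below refuses to run code whose diagnostics contain an error or one of a chosen set of warnings. The class name EvalEmitPolicy, the method name MayExecute, and the notion of a caller-supplied set of forbidden warning IDs are assumptions for illustration; EmitResult.Success, EmitResult.Diagnostics, Diagnostic.Severity, and Diagnostic.Id are the standard Roslyn members.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Emit;

// Hypothetical policy helper; not part of PeachPie or Roslyn.
static class EvalEmitPolicy
{
    // True when the emit succeeded and no forbidden diagnostic was reported.
    public static bool MayExecute(EmitResult result, ISet<string> forbiddenWarningIds)
        => result.Success && MayExecute(result.Diagnostics, forbiddenWarningIds);

    // Rejects any error, and any warning whose ID is in the forbidden set.
    public static bool MayExecute(IEnumerable<Diagnostic> diagnostics, ISet<string> forbiddenWarningIds)
        => !diagnostics.Any(d =>
               d.Severity == DiagnosticSeverity.Error ||
               (d.Severity == DiagnosticSeverity.Warning && forbiddenWarningIds.Contains(d.Id)));
}
```

Passing an empty set allows all warnings; forbidding every warning would instead mean testing the severity alone.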

assembly = System.Runtime.Loader.AssemblyLoadContext
  .Default
  .LoadFromStream(peStream, pdbStream);

We keep things in memory only and load the resulting assembly from the stream. Assuming you have a debugger attached, it will load the PDB and process it. Using reflection, we find the script entry point in the loaded assembly (in the case of PHP scripts it’s a static method <eval>`xxx.<Main>(Context, PhpArray, object, RuntimeTypeHandle)). It takes all the parameters it needs to execute properly within PHP code: the runtime Context, an array with the local variables, a reference to the $this object, and the current type context.
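A minimal sketch of the load-and-find step, under stated assumptions: the class name EvalLoader and its method names are hypothetical, and the entry point is located purely by method name, which may differ from how PeachPie actually resolves it. One detail worth showing is that Emit leaves the streams positioned at their end, so they need rewinding before LoadFromStream reads them.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Runtime.Loader;

// Hypothetical helper; names are assumptions for illustration.
static class EvalLoader
{
    // Loads the emitted assembly (and optional PDB) straight from memory.
    public static Assembly Load(MemoryStream peStream, MemoryStream pdbStream)
    {
        // Emit leaves both streams at their end; rewind before loading.
        peStream.Position = 0;
        if (pdbStream != null) pdbStream.Position = 0;

        return AssemblyLoadContext.Default.LoadFromStream(peStream, pdbStream);
    }

    // Finds the first static method with the given name, or null if none exists.
    public static MethodInfo FindEntryPoint(Assembly assembly, string methodName)
    {
        const BindingFlags flags = BindingFlags.Static | BindingFlags.Public |
                                   BindingFlags.NonPublic | BindingFlags.DeclaredOnly;

        return assembly.GetTypes()
            .SelectMany(t => t.GetMethods(flags))
            .FirstOrDefault(m => m.Name == methodName);
    }
}
```

With the method in hand, something like entry.Invoke(null, new object[] { ... }) would run it, passing the four arguments described above in order.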

The method is then invoked! The rest is handled by the CoreCLR debugger. If you press F11 in your favorite IDE, the debugger will jump over the compilation stuff and step right inside the invoked method. The debugger will start looking for the source code, which is annotated to be at our virtual file path. It finds it within the loaded debug information and displays it to you!

Visual Studio, for instance, will create a temporary file, copy the content of the Embedded Source document into it, and open it in the editor. It also remembers the mapping to the original virtual file path and allows you to put breakpoints in there.

Visual Studio Code will open a virtual document for this purpose allowing you to do the same.

Make things nicer

A small touch is to make the virtual Embedded Source text colorized when opened by the debugger in your IDE. Fortunately, this is not that hard. First, the virtual file path has to have the right file extension, in our case .php. (The other option would be to implement a custom language service for the IDE as an extension.)

Figure: debugging a virtual embedded source document.

Next, since the code passed to eval() in PHP omits the opening code tag (<?php), the editor’s syntax highlighter would treat the code as text or HTML. So before compiling the code, we prefix it with the commented opening tag: #<?php. The compiler treats the hash as the start of a line comment, while the editor views the hash as HTML text and <?php as the opening tag. The rest of the code is nicely colorized. (The other editor features then work as well: tooltips, structure guidelines, outlining, etc.)
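The prefixing step could look roughly like this. The class and method names are hypothetical, and the newline after the prefix is an assumption: a line comment runs to the end of its line, so the eval’ed code has to start on the next one. That also means line numbers in the embedded document are shifted by one relative to the original string.

```csharp
// Hypothetical helper; the exact prefixing used by PeachPie may differ.
static class EvalSourcePrefix
{
    // Commented opening tag: a line comment to the compiler, "<?php" to the editor.
    const string Prefix = "#<?php";

    // Assumption: a newline separates the prefix from the code,
    // so the comment does not swallow the first statement.
    public static string Apply(string code) => Prefix + "\n" + code;
}
```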

To Summarize

I don’t think Embedded Source and Roslyn’s compilation were designed for dynamic languages in the first place, but it seems like a perfect use case for those features. This greatly improves the debugging experience of compiled dynamic languages – out of the box.

Just a side note – this also works for a few other dynamic documents in PHP that cause other IDEs headaches: create_function(), eval(), and debugging inside PHAR archives, which are embedded into the PDB as well, already at compile time.

