Getting Deep Into EVM: How Ethereum Works Backstage

An Ultimate, In-depth Explanation of How EVM Works.

Receive curated Web 3.0 content like this with a summary every day via WhatsApp, Telegram, Discord, or Email.

SimpleAsWater Daily Web 3.0 Newsletter

SimpleAsWater Daily Web 3.0 Newsletter: Receive 1 curated Web 3.0 resource with summary every evening on WhatsApp…

This post is a continuation of my Getting Deep Into Series started in an effort to provide a deeper understanding of the internal workings and other cool stuff about Ethereum and blockchain in general which you will not find easily on the web. Here are the previous parts of the Series in case you missed them:

Getting Deep Into Geth: Why Syncing Ethereum Node Is Slow

Downloading the blocks is just a small part. There is a lot of stuff going on…

medium.com

Getting Deep Into Ethereum: How Data Is Stored In Ethereum?

In this post, we will see how states and transactions are stored in Ethereum and how it is different from Bitcoin.

medium.com

In this part, we are going to explore explain and describe in detail the core behavior of the EVM. We will see how contracts are created, how message calls work, and take a look at everything related to data management, such as storage, memory, calldata, and the stack.

To better understand this article, you should be familiar with the basics of the Ethereum. If you are not, I highly recommend reading these posts first.

8 Resources to Get Started With Ethereum

The ultimate guide for understanding & Starting with Ethereum.

medium.com

Throughout this post, we will illustrate some examples and demonstrations using sample contracts you can find in this repository. Please clone it, run npm install, and check it out before beginning.

Enjoy, and please do not hesitate to reach out with questions, suggestions or feedback.

EVM: 10,000 ft Perspective

Before diving into understanding how EVM works and seeing it working via code examples, let’s see where EVM fits in the Ethereum and what are its components. Don’t get scared by these diagrams because as soon as you are done reading this article you will be able to make a lot of sense out of these diagrams.

The below diagram shows where EVM fits into Ethereum.

getting-deep-into-evm-how-ethereum-works-backstage-ab6ad9c0d0bf

The below diagram shows the basic Architecture of EVM.

This below diagram shows how different parts of EVM interact with each other to make Ethereum do its magic.

We have seen what EVM looks like. Now it’s time to start understanding how these parts play a significant role in the way Ethereum works.

Ethereum Contracts

Basics

Smart contracts are just computer programs, and we can say that Ethereum contracts are smart contracts that run on the Ethereum Virtual Machine. The EVM is the sandboxed runtime and a completely isolated environment for smart contracts in Ethereum. This means that every smart contract running inside the EVM has no access to the network, file system, or other processes running on the computer hosting the VM.

As we already know, there are two kinds of accounts: contracts and external accounts. Every account is identified by an address, and all accounts share the same address space. The EVM handles addresses of 160-bit length.

Every account consists of a balance, a nonce, bytecode, and stored data (storage). However, there are some differences between these two kinds of accounts. For instance, the code and storage of external accounts are empty, while contract accounts store their bytecode and the Merkle root hash of the entire state tree. Moreover, while external addresses have a corresponding private key, contract accounts don’t. The actions of contract accounts are controlled by the code they host in addition to the regular cryptographic signing of every Ethereum transaction.

Creation

The creation of a contract is simply a transaction in which the receiver address is empty and its data field contains the compiled bytecode of the contract to be created (this makes sense — contracts can create contracts too). Let’s look at a quick example. Please open the directory of exercise 1; in it, you will find a contract called MyContract with the following code:

Let’s open a truffle console in develop mode running truffle develop. Once inside, follow the subsequent commands to deploy an instance of MyContract:

truffle(develop)> compile
truffle(develop)> sender = web3.eth.accounts[0]
truffle(develop)> opts = { from: sender, to: null, data: MyContract.bytecode, gas: 4600000 }
truffle(develop)> txHash = web3.eth.sendTransaction(opts)

We can check that our contract has been deployed successfully by running the following code:

truffle(develop)> receipt = web3.eth.getTransactionReceipt(txHash)
truffle(develop)> myContract = new MyContract(receipt.contractAddress)
truffle(develop)> myContract.add(10, 2){ [String: ‘12’] s: 1, e: 1, c: [ 12 ] }

Let’s go deeper to analyze what we just did. The first thing that happens when a new contract is deployed to the Ethereum blockchain is that its account is created. As you can see, we logged the address of the contract in the constructor in the example above. You can confirm this by checking that receipt.logs[0].data is the address of the contract-padded 32 bytes and that receipt.logs[0].topics is the keccak-256 hash of the string “Log(address)”.

As the next step, the data sent in with the transaction is executed as bytecode. This will initialize the state variables in storage, and determine the body of the contract being created. This process is executed only once during the lifecycle of a contract. The initialization code is not what is stored in the contract; it actually produces as its return value the bytecode to be stored. Bear in mind that after a contract account has been created, there is no way to change its code.²

Given the fact that the initialization process returns the code of the contract’s body to be stored, it makes sense that this code isn’t reachable from the constructor logic. For example, let’s take a look at the Impossible contract of exercise 1:

If you try to compile this contract, you will get a warning saying you’re referencing this within the constructor function, but it will compile. However, if you try to deploy a new instance, it will revert. This is because it makes no sense to attempt to run code that is not stored yet.³ On the other hand, we were able to access the address of the contract: the account exists, but it doesn’t have any code yet.

However, code execution can produce other events, such as altering the storage, creating further accounts, or making further message calls. For example, let’s take a look at the AnotherContract code:

Let’s see how it works running the following commands inside a truffle console:

truffle(develop)> compile
truffle(develop)> sender = web3.eth.accounts[0]
truffle(develop)> opts = { from: sender, to: null, data: AnotherContract.bytecode, gas: 4600000 }
truffle(develop)> txHash = web3.eth.sendTransaction(opts)
truffle(develop)> receipt = web3.eth.getTransactionReceipt(txHash)
truffle(develop)> anotherContract = AnotherContract.at(receipt.contractAddress)
truffle(develop)> anotherContract.myContract().then(a => myContractAddress = a)
truffle(develop)> myContract = MyContract.at(myContractAddress)
truffle(develop)> myContract.add(10, 2){ [String: ‘12’] s: 1, e: 1, c: [ 12 ] }

Additionally, contracts can be created using the CREATE opcode, which is what the Solidity new construct compiles down to. Both alternatives work the same way. Let’s continue exploring how message calls work.

Message Calls

Contracts can call other contracts through message calls. Every time a Solidity contract calls a function of another contract, it does so by producing a message call. Every call has a sender, a recipient, a payload, a value, and an amount of gas. The depth of the message call is limited to less than 1024 levels.

Solidity provides a native call method for the address type that works as follows:

address.call.gas(gas).value(value)(data)

gas is the amount of gas to be forwarded, the address is the address to be called, value is the amount of Ether to be transferred in Wei, and data is the payload to be sent. Bear in mind that value and gas are optional parameters here, but be careful because almost all the remaining gas of the sender will be sent by default in a low-level call.

As you can see, each contract can decide the amount of gas to be forwarded in a call. Given that every call can end in an out-of-gas (OOG) exception, to avoid security issues at least 1/64th of the sender’s remaining gas will be saved. This allows senders to handle inner calls’ out-of-gas errors so that they are able to finish its execution without themselves running out of gas, and thus bubbling the exception up.

Let’s take a look at the Caller contract of exercise 2:

The Caller contract has only a fallback function that redirects every received call to an instance of Implementation. This instance simply throws through an assert(false) on every received call, which will consume all the gas given. Then, the idea here is to log the amount of gas in Caller before and right after forwarding a call to Implementation. Let’s open a truffle console and see what happens:

truffle(develop)> compile
truffle(develop)> Caller.new().then(i => caller = i)
truffle(develop)> opts = { gas: 4600000 }
truffle(develop)> caller.sendTransaction(opts).then(r => result = r)
truffle(develop)> logs = result.receipt.logs
truffle(develop)> parseInt(logs[0].data) //4578955
truffle(develop)> parseInt(logs[1].data) //71495

As you can see, 71495 is approximately the 64th part of 4578955. This example clearly demonstrates that we can handle an OOG exception from an inner call.

Solidity also provides the following opcode, allowing us to manage calls with inline assembly:

call(g, a, v, in, insize, out, outsize)

Where g is the amount of gas to be forwarded, a is the address to be called, v is the amount of Ether to be transferred in wei, in states the memory position of insize bytes where the call data is held, and out and outsize state where the return data will be stored in memory. The only difference is that an assembly call allows us to handle return data, while the function will only return 1 or 0 whether it failed or not.

The EVM supports a special variant of a message call called delegatecall. Once again, Solidity provides a built-in address method in addition to an inline assembly version of it. The difference with a low-level call is that the target code is executed within the context of the calling contract, and msg.sender and msg.value do not change.

Let’s analyze the following example to understand better how a delegatecall works. Let’s start with the Greeter contract:

As you can see, the Greeter contract simply declares a thanks function that emits an event carrying the msg.value and msg.sender data. We can try this method by running the following lines in a truffle console:

truffle(develop)> compile
truffle(develop)> someone = web3.eth.accounts[0]
truffle(develop)> ETH_2 = new web3.BigNumber(‘2e18’)
truffle(develop)> Greeter.new().then(i => greeter = i)
truffle(develop)> opts = { from: someone, value: ETH_2 }
truffle(develop)> greeter.thanks(opts).then(tx => log = tx.logs[0])
truffle(develop)> log.event                     //Thanks
truffle(develop)> log.args.sender === someone   //true
truffle(develop)> log.args.value.eq(ETH_2)      //true

Now that we have confirmed its functionality, let’s pay attention to the Wallet contract:

This contract only defines a fallback function that executes the Greeter contract’s thanks method through a delegatecall. Let’s see what happens when we call Greeter contract’s thanks through the Wallet contract:

truffle(develop)> Wallet.new().then(i => wallet = i)
truffle(develop)> wallet.sendTransaction(opts).then(r => tx = r)
truffle(develop)> logs = tx.receipt.logs
truffle(develop)> SolidityEvent = require(‘web3/lib/web3/event.js’)
truffle(develop)> Thanks = Object.values(Greeter.events)[0]
truffle(develop)> event = new SolidityEvent(null, Thanks, 0)
truffle(develop)> log = event.decode(logs[0])
truffle(develop)> log.event                    // Thanks
truffle(develop)> log.args.sender === someone  // true
truffle(develop)> log.args.value.eq(ETH_2)     // true

As you may have noticed, we have just confirmed that the delegatecall function preserves the msg.value and msg.sender.

This means that a contract can dynamically load code from a different address at runtime. Storage, current address and balance still refer to the calling contract, only the code is taken from the called address. This makes it possible to implement the ‘library’ feature in Solidity.”

There is one more thing we should explore about delegate calls. As mentioned above, the storage of the calling contract is the one being accessed by the executed code. Let’s see the Calculator contract:

The Calculator contract has just two functions: add and product. The Calculator contract doesn’t know how to add or multiply; it delegates those calls to the Addition and Product contracts respectively instead. However, all these contracts share the same state variable result to store the result of each calculation. Let’s see how this works:

truffle(develop)> Calculator.new().then(i => calculator = i)
truffle(develop)> calculator.addition().then(a => additionAddress=a)
truffle(develop)> addition = Addition.at(additionAddress)
truffle(develop)> calculator.product().then(a => productAddress = a)
truffle(develop)> product = Product.at(productAddress)
truffle(develop)> calculator.add(5)
truffle(develop)> calculator.result().then(r => r.toString()) // 5
truffle(develop)> addition.result().then(r => r.toString())   // 0
truffle(develop)> product.result().then(r => r.toString())    // 0
truffle(develop)> calculator.mul(2)
truffle(develop)> calculator.result().then(r => r.toString()) // 10
truffle(develop)> addition.result().then(r => r.toString())   // 0
truffle(develop)> product.result().then(r => r.toString())    // 0

We have just confirmed that we are using the storage of the Calculator contract. Besides that, the code being executed is stored in the Addition and in the Product contracts.

Additionally, as for the call function, there is a Solidity assembly opcode version for delegatecall. Let’s take a look at the Delegator contract to see how we can use it:

This time we are using inline assembly to execute the delegatecall. As you may have noticed, there is no value argument here, since msg.value will not change. You may be wondering why we are loading the 0x40 address, or what calldatacopy and calldatasize are. Don’t panic — we will describe them in the next post of the series. In the meantime, feel free to run the same commands over a truffle console to validate its behavior.

Once again, it’s important to understand clearly how a delegatecall works. Every triggered call will be sent from the current contract and not the delegate-called contract. Additionally, the executed code can read and write to the storage of the caller contract. If not implemented properly, even the smallest of your mistakes can lead to losses in millions. Here is a list of the most expensive mistakes in the history of Ethereum.

https://medium.com/hackernoon/hackpedia-16-solidity-hacks-vulnerabilities-their-fixes-and-real-world-examples-f3210eba5148

Data Management

The EVM manages different kinds of data depending on their context, and it does that in different ways. We can distinguish at least four main types of data: stack, calldata, memory, and storage, besides the contract code. Let’s analyze each of these:

Stack

The EVM is a stack machine, meaning that it doesn’t operate on registers but on a virtual stack. The stack has a maximum size of 1024. Stack items have a size of 256 bits; in fact, the EVM is a 256-bit word machine (this facilitates Keccak256 hash scheme and elliptic-curve computations). Here is where most opcodes consume their parameters from.

The EVM provides many opcodes to modify the stack directly. Some of these include:

POP removes item from the stack.
PUSH n places the following n bytes item in the stack, with n from 1 to 32.
DUP n duplicates the nth stack item, with n from 1 to 32.
SWAP n exchanges the 1st and nth stack item, with n from 1 to 32.

Call data

The calldata is a read-only byte-addressable space where the data parameter of a transaction or call is held. Unlike the stack, to use this data you have to specify an exact byte offset and number of bytes you want to read.

The opcodes provided by the EVM to operate with the calldata include:

CALLDATASIZE tells the size of the transaction data.
CALLDATALOADloads 32 bytes of the transaction data onto the stack.
CALLDATACOPYcopies a number of bytes of the transaction data to memory.

Solidity also provides an inline assembly version of these opcodes. These are calldatasize, calldataload and calldatacopy respectively. The last one expects three arguments (t, f, s): it will copy s bytes of calldata at position f into memory at position t. In addition, Solidity lets you access to the calldata through msg.data.

As you may have noticed, we used some of these opcodes in some examples of the previous post. Let’s take a look at the inline assembly code block of a delegatecall again:

assembly {
  let ptr := mload(0x40)
  calldatacopy(ptr, 0, calldatasize)
  let result := delegatecall(gas, _impl, ptr, calldatasize, 0, 0)
}

To delegate the call to the _impl address, we must forward msg.data. Given that the delegatecall opcode operates with data in memory, we need to copy the calldata into memory first. That’s why we use calldatacopy to copy all the calldata to a memory pointer (note that we are using calldatasize).

Let’s analyze another example using calldata. You will find a Calldata contract in the exercise 3 folder with the following code:

The idea here is to return the addition of two numbers passed by arguments. As you can see, once again we are loading a memory pointer reading from 0x40, but please ignore that for now; we will explain it right after this example. We are storing that memory pointer in the variable a and storing in b the following position which is 32-bytes right after a. Then we use calldatacopy to store the first parameter in a. You may have noticed we are copying it from the 4th position of the calldata instead of its beginning. This is because the first 4 bytes of the calldata hold the signature of the function being called, in this case, bytes4(keccak256(“add(uint256,uint256)”)); this is what the EVM uses to identify which function has to be executed on a call. Then, we store the second parameter in b copying the following 32 bytes of the calldata. Finally, we just need to calculate the addition of both values loading them from memory.

You can test this yourself with a truffle console running the following commands:

truffle(develop)> compile
truffle(develop)> Calldata.new().then(i => calldata = i)
truffle(develop)> calldata.add(1, 6).then(r => r.toString())    // 7

Memory

Memory is a volatile read-write byte-addressable space. It is mainly used to store data during execution, mostly for passing arguments to internal functions. Given this is a volatile area, every message call starts with a cleared memory. All locations are initially defined as zero. As calldata, memory can be addressed at the byte level, but can only read 32-byte words at a time.

Memory is said to “expand” when we write to a word in it that was not previously used. Additionally to the cost of the write itself, there is a cost to this expansion, which increases linearly for the first 724 bytes and quadratically after that.

The EVM provides three opcodes to interact with the memory area:

MLOAD loads a word from memory into the stack.
MSTOREsaves a word to memory.
MSTORE8 saves a byte to memory.

Solidity also provides an inline assembly version of these opcodes.

There is another key thing we need to know about memory. Solidity always stores a free memory pointer at position 0x40, i.e. a reference to the first unused word in memory. That’s why we load this word to operate with inline assembly. Since the initial 64 bytes of memory is reserved for the EVM, this is how we can ensure that we are not overwriting memory that is used internally by Solidity. For instance, in the delegatecall example presented above, we were loading this pointer to store the given calldata to forward it. This is because the inline-assembly opcode delegatecall needs to fetch its payload from memory.

Additionally, if you pay attention to the bytecode output by the Solidity compiler, you will notice that all of them start with 0x6060604052…, which means:

PUSH1   :  EVM opcode is 0x60
0x60    :  The free memory pointer
PUSH1   :  EVM opcode is 0x60
0x40    :  Memory position for the free memory pointer
MSTORE  :  EVM opcode is 0x52

You must be very careful when operating with memory at assembly level. Otherwise, you could overwrite a reserved space.

Storage

Storage is a persistent read-write word-addressable space. This is where each contract stores its persistent information. Unlike memory, storage is a persistent area and can only be addressed by words. It is a key-value mapping of 2²⁵⁶ slots of 32 bytes each. A contract can neither read nor write to any storage apart from its own. All locations are initially defined as zero.

The amount of gas required to save data into storage is one of the highest among operations of the EVM. This cost is not always the same. Modifying a storage slot from a zero value to a non-zero one costs 20,000. While storing the same non-zero value or setting a non-zero value to zero costs 5,000. However, in the last scenario, when a non-zero value is set to zero, a refund of 15,000 will be given.

The EVM provides two opcodes to operate the storage:

SLOAD loads a word from storage into the stack.
SSTOREsaves a word to storage.

These opcodes are also supported by the inline assembly of Solidity.

Solidity will automatically map every defined state variable of your contract to a slot in storage. The strategy is fairly simple — statically sized variables (everything except mappings and dynamic arrays) are laid out contiguously in storage starting from position 0.

For dynamic arrays, this slot (p) stores the length of the array and its data will be located at the slot number that results from hashing p(keccak256(p)). For mappings, this slot is unused and the value corresponding to a key k will be located at keccak256(k,p). Bear in mind that the parameters of keccak256 (k and p) are always padded to 32 bytes.

Let’s take a look at a code example to understand how this works. Inside the exercise 3 contracts folder you will find a Storage contract with the following code:

Now, let’s open a truffle console to test its storage structure. First, we will compile and create a new contract instance:

truffle(develop)> compile
truffle(develop)> Storage.new().then(i => storage = i)

Then we can ensure that the address 0 holds a number 2 and the address 1 holds the address of the contract:

truffle(develop)> web3.eth.getStorageAt(storage.address, 0)  // 0x02
truffle(develop)> web3.eth.getStorageAt(storage.address, 1)  // 0x..

We can check that the storage position 2 holds the length of the array as follows:

truffle(develop)> web3.eth.getStorageAt(storage.address, 2)  // 0x02

Finally, we can check that the storage position 3 is unused and the mapping values are stored as we described above:

truffle(develop)> web3.eth.getStorageAt(storage.address, 3) 
// 0x00truffle(develop)> mapIndex = ‘0000000000000000000000000000000000000000000000000000000000000003’truffle(develop)> firstKey = ‘0000000000000000000000000000000000000000000000000000000000000001’truffle(develop)> firstPosition = web3.sha3(firstKey + mapIndex, { encoding: ‘hex’ })truffle(develop)> web3.eth.getStorageAt(storage.address, firstPosition)
// 0x09truffle(develop)> secondKey = ‘0000000000000000000000000000000000000000000000000000000000000002’truffle(develop)> secondPosition = web3.sha3(secondKey + mapIndex, { encoding: ‘hex’ })truffle(develop)> web3.eth.getStorageAt(storage.address, secondPosition)
// 0x0A

Great! We have demonstrated that the Solidity storage strategy works as we understand it! To learn more about how Solidity maps state variables into the storage, read the official documentation.

That’s it. I hope that now you have a much better understanding of what role EVM plays in Ethereum’s infrastructure.

“The address of the new account is defined as being the rightmost 160 bits of the Keccak hash of the RLP encoding of the structure containing only the sender and the account nonce.” (Ethereum Yellow Paper)

² One of the pillars of zeppelin_os is contract upgradeability. At Zeppelin, we’ve been exploring different strategies to implement this. You can read more about this here.

³ Solidity will check before calling an external function that the address has bytecode in it, and otherwise revert.

Getting Deep Into EVM: How Ethereum Works Backstage

Getting Deep Into EVM: How Ethereum Works Backstage

An Ultimate, In-depth Explanation of How EVM Works.

SimpleAsWater Daily Web 3.0 Newsletter

SimpleAsWater Daily Web 3.0 Newsletter: Receive 1 curated Web 3.0 resource with summary every evening on WhatsApp…

Getting Deep Into Geth: Why Syncing Ethereum Node Is Slow

Downloading the blocks is just a small part. There is a lot of stuff going on…

Getting Deep Into Ethereum: How Data Is Stored In Ethereum?

In this post, we will see how states and transactions are stored in Ethereum and how it is different from Bitcoin.

8 Resources to Get Started With Ethereum

The ultimate guide for understanding & Starting with Ethereum.

EVM: 10,000 ft Perspective

Ethereum Contracts

Basics

Creation

Message Calls

Data Management

Stack

Call data

Memory

Storage

Recommend

How to Write Better Posts as a Developer.

Inferences from logistic regression models in the presence of small samples, rar...

Adediwura Boluro-Ajayi

GitHub - MoserMichael/pyasmtool: Explores the python bytecode, provides some too...

Muhammad Attique Anwar

We Made A Dashboard for IPFS Clusters and Now We Want You to Check it Out

What We Learned After Winning $2000 at EthIndia Hackathon?

Building a Chat Application Using Libp2p

Youtube On IPFS in 5 mins

GitHub - giampaolo/psutil: Cross-platform lib for process and system monitoring...

About Joyk