Getting Deep Into EVM: How Ethereum Works Backstage
source link: https://medium.com/swlh/getting-deep-into-evm-how-ethereum-works-backstage-ab6ad9c0d0bf
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Getting Deep Into EVM: How Ethereum Works Backstage
An Ultimate, In-depth Explanation of How EVM Works.
Receive curated Web 3.0 content like this with a summary every day via WhatsApp, Telegram, Discord, or Email.
This post is a continuation of my Getting Deep Into Series started in an effort to provide a deeper understanding of the internal workings and other cool stuff about Ethereum and blockchain in general which you will not find easily on the web. Here are the previous parts of the Series in case you missed them:
In this part, we are going to explore explain and describe in detail the core behavior of the EVM. We will see how contracts are created, how message calls work, and take a look at everything related to data management, such as storage
, memory
, calldata
, and the stack.
To better understand this article, you should be familiar with the basics of the Ethereum. If you are not, I highly recommend reading these posts first.
Throughout this post, we will illustrate some examples and demonstrations using sample contracts you can find in this repository. Please clone it, run npm install
, and check it out before beginning.
Enjoy, and please do not hesitate to reach out with questions, suggestions or feedback.
EVM: 10,000 ft Perspective
Before diving into understanding how EVM works and seeing it working via code examples, let’s see where EVM fits in the Ethereum and what are its components. Don’t get scared by these diagrams because as soon as you are done reading this article you will be able to make a lot of sense out of these diagrams.
The below diagram shows where EVM fits into Ethereum.
The below diagram shows the basic Architecture of EVM.
This below diagram shows how different parts of EVM interact with each other to make Ethereum do its magic.
We have seen what EVM looks like. Now it’s time to start understanding how these parts play a significant role in the way Ethereum works.
Ethereum Contracts
Basics
Smart contracts are just computer programs, and we can say that Ethereum contracts are smart contracts that run on the Ethereum Virtual Machine. The EVM is the sandboxed runtime and a completely isolated environment for smart contracts in Ethereum. This means that every smart contract running inside the EVM has no access to the network, file system, or other processes running on the computer hosting the VM.
As we already know, there are two kinds of accounts: contracts and external accounts. Every account is identified by an address, and all accounts share the same address space. The EVM handles addresses of 160-bit length.
Every account consists of a balance, a nonce, bytecode, and stored data (storage). However, there are some differences between these two kinds of accounts. For instance, the code and storage of external accounts are empty, while contract accounts store their bytecode and the Merkle root hash of the entire state tree. Moreover, while external addresses have a corresponding private key, contract accounts don’t. The actions of contract accounts are controlled by the code they host in addition to the regular cryptographic signing of every Ethereum transaction.
Creation
The creation of a contract is simply a transaction in which the receiver address is empty and its data field contains the compiled bytecode of the contract to be created (this makes sense — contracts can create contracts too). Let’s look at a quick example. Please open the directory of exercise 1; in it, you will find a contract called MyContract
with the following code:
Let’s open a truffle console in develop mode running truffle develop. Once inside, follow the subsequent commands to deploy an instance of MyContract
:
truffle(develop)> compile
truffle(develop)> sender = web3.eth.accounts[0]
truffle(develop)> opts = { from: sender, to: null, data: MyContract.bytecode, gas: 4600000 }
truffle(develop)> txHash = web3.eth.sendTransaction(opts)
We can check that our contract has been deployed successfully by running the following code:
truffle(develop)> receipt = web3.eth.getTransactionReceipt(txHash)
truffle(develop)> myContract = new MyContract(receipt.contractAddress)
truffle(develop)> myContract.add(10, 2){ [String: ‘12’] s: 1, e: 1, c: [ 12 ] }
Let’s go deeper to analyze what we just did. The first thing that happens when a new contract is deployed to the Ethereum blockchain is that its account is created. As you can see, we logged the address of the contract in the constructor in the example above. You can confirm this by checking that receipt.logs[0].data
is the address of the contract-padded 32 bytes
and that receipt.logs[0].topics
is the keccak-256
hash of the string “Log(address)”
.
As the next step, the data sent in with the transaction is executed as bytecode. This will initialize the state variables in storage, and determine the body of the contract being created. This process is executed only once during the lifecycle of a contract. The initialization code is not what is stored in the contract; it actually produces as its return value the bytecode to be stored. Bear in mind that after a contract account has been created, there is no way to change its code.²
Given the fact that the initialization process returns the code of the contract’s body to be stored, it makes sense that this code isn’t reachable from the constructor logic. For example, let’s take a look at the Impossible
contract of exercise 1:
If you try to compile this contract, you will get a warning saying you’re referencing this within the constructor function, but it will compile. However, if you try to deploy a new instance, it will revert. This is because it makes no sense to attempt to run code that is not stored yet.³ On the other hand, we were able to access the address of the contract: the account exists, but it doesn’t have any code yet.
However, code execution can produce other events, such as altering the storage, creating further accounts, or making further message calls. For example, let’s take a look at the AnotherContract
code:
Let’s see how it works running the following commands inside a truffle console:
truffle(develop)> compile
truffle(develop)> sender = web3.eth.accounts[0]
truffle(develop)> opts = { from: sender, to: null, data: AnotherContract.bytecode, gas: 4600000 }
truffle(develop)> txHash = web3.eth.sendTransaction(opts)
truffle(develop)> receipt = web3.eth.getTransactionReceipt(txHash)
truffle(develop)> anotherContract = AnotherContract.at(receipt.contractAddress)
truffle(develop)> anotherContract.myContract().then(a => myContractAddress = a)
truffle(develop)> myContract = MyContract.at(myContractAddress)
truffle(develop)> myContract.add(10, 2){ [String: ‘12’] s: 1, e: 1, c: [ 12 ] }
Additionally, contracts can be created using the CREATE
opcode, which is what the Solidity new construct compiles down to. Both alternatives work the same way. Let’s continue exploring how message calls work.
Message Calls
Contracts can call other contracts through message calls. Every time a Solidity contract calls a function of another contract, it does so by producing a message call. Every call has a sender, a recipient, a payload, a value, and an amount of gas. The depth of the message call is limited to less than 1024 levels.
Solidity provides a native call method for the address type that works as follows:
address.call.gas(gas).value(value)(data)
gas is the amount of gas to be forwarded, the address is the address to be called, value
is the amount of Ether to be transferred in Wei, and data
is the payload to be sent. Bear in mind that value
and gas
are optional parameters here, but be careful because almost all the remaining gas
of the sender will be sent by default in a low-level call.
As you can see, each contract can decide the amount of gas to be forwarded in a call. Given that every call can end in an out-of-gas (OOG) exception, to avoid security issues at least 1/64th of the sender’s remaining gas will be saved. This allows senders to handle inner calls’ out-of-gas errors so that they are able to finish its execution without themselves running out of gas, and thus bubbling the exception up.
Let’s take a look at the Caller
contract of exercise 2:
The Caller
contract has only a fallback function that redirects every received call to an instance of Implementation. This instance simply throws through an assert(false)
on every received call, which will consume all the gas given. Then, the idea here is to log the amount of gas in Caller
before and right after forwarding a call to Implementation
. Let’s open a truffle console and see what happens:
truffle(develop)> compile
truffle(develop)> Caller.new().then(i => caller = i)
truffle(develop)> opts = { gas: 4600000 }
truffle(develop)> caller.sendTransaction(opts).then(r => result = r)
truffle(develop)> logs = result.receipt.logs
truffle(develop)> parseInt(logs[0].data) //4578955
truffle(develop)> parseInt(logs[1].data) //71495
As you can see, 71495 is approximately the 64th part of 4578955. This example clearly demonstrates that we can handle an OOG exception from an inner call.
Solidity also provides the following opcode, allowing us to manage calls with inline assembly:
call(g, a, v, in, insize, out, outsize)
Where g
is the amount of gas to be forwarded, a
is the address to be called, v
is the amount of Ether to be transferred in wei
, in
states the memory position of insize
bytes where the call data is held, and out
and outsize
state where the return data will be stored in memory. The only difference is that an assembly call allows us to handle return data, while the function will only return 1 or 0 whether it failed or not.
The EVM supports a special variant of a message call called delegatecall
. Once again, Solidity provides a built-in address method in addition to an inline assembly version of it. The difference with a low-level call is that the target code is executed within the context of the calling contract, and msg.sender
and msg.value
do not change.
Let’s analyze the following example to understand better how a delegatecall
works. Let’s start with the Greeter
contract:
As you can see, the Greeter
contract simply declares a thanks
function that emits an event
carrying the msg.value
and msg.sender
data. We can try this method by running the following lines in a truffle console:
truffle(develop)> compile
truffle(develop)> someone = web3.eth.accounts[0]
truffle(develop)> ETH_2 = new web3.BigNumber(‘2e18’)
truffle(develop)> Greeter.new().then(i => greeter = i)
truffle(develop)> opts = { from: someone, value: ETH_2 }
truffle(develop)> greeter.thanks(opts).then(tx => log = tx.logs[0])
truffle(develop)> log.event //Thanks
truffle(develop)> log.args.sender === someone //true
truffle(develop)> log.args.value.eq(ETH_2) //true
Now that we have confirmed its functionality, let’s pay attention to the Wallet
contract:
This contract only defines a fallback
function that executes the Greeter
contract’s thanks
method through a delegatecall
. Let’s see what happens when we call Greeter
contract’s thanks
through the Wallet
contract:
truffle(develop)> Wallet.new().then(i => wallet = i)
truffle(develop)> wallet.sendTransaction(opts).then(r => tx = r)
truffle(develop)> logs = tx.receipt.logs
truffle(develop)> SolidityEvent = require(‘web3/lib/web3/event.js’)
truffle(develop)> Thanks = Object.values(Greeter.events)[0]
truffle(develop)> event = new SolidityEvent(null, Thanks, 0)
truffle(develop)> log = event.decode(logs[0])
truffle(develop)> log.event // Thanks
truffle(develop)> log.args.sender === someone // true
truffle(develop)> log.args.value.eq(ETH_2) // true
As you may have noticed, we have just confirmed that the delegatecall
function preserves the msg.value
and msg.sender
.
This means that a contract can dynamically load code from a different address at runtime. Storage, current address and balance still refer to the calling contract, only the code is taken from the called address. This makes it possible to implement the ‘library’ feature in Solidity.”
There is one more thing we should explore about delegate
calls. As mentioned above, the storage of the calling contract is the one being accessed by the executed code. Let’s see the Calculator
contract:
The Calculator
contract has just two functions: add
and product
. The Calculator contract doesn’t know how to add or multiply; it delegates those calls to the Addition
and Product
contracts respectively instead. However, all these contracts share the same state variable result to store the result of each calculation. Let’s see how this works:
truffle(develop)> Calculator.new().then(i => calculator = i)
truffle(develop)> calculator.addition().then(a => additionAddress=a)
truffle(develop)> addition = Addition.at(additionAddress)
truffle(develop)> calculator.product().then(a => productAddress = a)
truffle(develop)> product = Product.at(productAddress)
truffle(develop)> calculator.add(5)
truffle(develop)> calculator.result().then(r => r.toString()) // 5
truffle(develop)> addition.result().then(r => r.toString()) // 0
truffle(develop)> product.result().then(r => r.toString()) // 0
truffle(develop)> calculator.mul(2)
truffle(develop)> calculator.result().then(r => r.toString()) // 10
truffle(develop)> addition.result().then(r => r.toString()) // 0
truffle(develop)> product.result().then(r => r.toString()) // 0
We have just confirmed that we are using the storage of the Calculator
contract. Besides that, the code being executed is stored in the Addition
and in the Product
contracts.
Additionally, as for the call function, there is a Solidity assembly opcode version for delegatecall
. Let’s take a look at the Delegator
contract to see how we can use it:
This time we are using inline assembly to execute the delegatecall
. As you may have noticed, there is no value argument here, since msg.value
will not change. You may be wondering why we are loading the 0x40
address, or what calldatacopy
and calldatasize
are. Don’t panic — we will describe them in the next post of the series. In the meantime, feel free to run the same commands over a truffle console to validate its behavior.
Once again, it’s important to understand clearly how a delegatecall
works. Every triggered call will be sent from the current contract and not the delegate-called contract. Additionally, the executed code can read and write to the storage of the caller contract. If not implemented properly, even the smallest of your mistakes can lead to losses in millions. Here is a list of the most expensive mistakes in the history of Ethereum.
Data Management
The EVM manages different kinds of data depending on their context, and it does that in different ways. We can distinguish at least four main types of data: stack
, calldata
, memory
, and storage
, besides the contract code. Let’s analyze each of these:
Stack
The EVM is a stack machine, meaning that it doesn’t operate on registers but on a virtual stack. The stack has a maximum size of 1024. Stack items have a size of 256 bits; in fact, the EVM is a 256-bit word machine (this facilitates Keccak256
hash scheme and elliptic-curve computations). Here is where most opcodes consume their parameters from.
The EVM provides many opcodes to modify the stack directly. Some of these include:
POP
removes item from the stack.PUSH n
places the following n bytes item in the stack, with n from 1 to 32.DUP n
duplicates the nth stack item, with n from 1 to 32.SWAP n
exchanges the 1st and nth stack item, with n from 1 to 32.
Call data
The calldata
is a read-only byte-addressable space where the data parameter of a transaction or call is held. Unlike the stack, to use this data you have to specify an exact byte offset and number of bytes you want to read.
The opcodes provided by the EVM to operate with the calldata
include:
CALLDATASIZE
tells the size of the transaction data.CALLDATALOAD
loads 32 bytes of the transaction data onto the stack.CALLDATACOPY
copies a number of bytes of the transaction data to memory.
Solidity also provides an inline assembly version of these opcodes. These are calldatasize
, calldataload
and calldatacopy
respectively. The last one expects three arguments (t
, f
, s
): it will copy s
bytes of calldata
at position f
into memory at position t
. In addition, Solidity lets you access to the calldata
through msg.data
.
As you may have noticed, we used some of these opcodes in some examples of the previous post. Let’s take a look at the inline assembly code block of a delegatecall
again:
assembly {
let ptr := mload(0x40)
calldatacopy(ptr, 0, calldatasize)
let result := delegatecall(gas, _impl, ptr, calldatasize, 0, 0)
}
To delegate the call to the _impl
address, we must forward msg.data
. Given that the delegatecall
opcode operates with data in memory, we need to copy the calldata
into memory first. That’s why we use calldatacopy
to copy all the calldata
to a memory pointer (note that we are using calldatasize
).
Let’s analyze another example using calldata
. You will find a Calldata
contract in the exercise 3 folder with the following code:
The idea here is to return the addition of two numbers passed by arguments. As you can see, once again we are loading a memory pointer reading from 0x40
, but please ignore that for now; we will explain it right after this example. We are storing that memory pointer in the variable a
and storing in b
the following position which is 32-bytes right after a
. Then we use calldatacopy
to store the first parameter in a
. You may have noticed we are copying it from the 4th position of the calldata
instead of its beginning. This is because the first 4 bytes of the calldata
hold the signature of the function being called, in this case, bytes4(keccak256(“add(uint256,uint256)”));
this is what the EVM uses to identify which function has to be executed on a call. Then, we store the second parameter in b copying the following 32 bytes of the calldata
. Finally, we just need to calculate the addition of both values loading them from memory.
You can test this yourself with a truffle console running the following commands:
truffle(develop)> compile
truffle(develop)> Calldata.new().then(i => calldata = i)
truffle(develop)> calldata.add(1, 6).then(r => r.toString()) // 7
Memory
Memory is a volatile read-write byte-addressable space. It is mainly used to store data during execution, mostly for passing arguments to internal functions. Given this is a volatile area, every message call starts with a cleared memory. All locations are initially defined as zero. As calldata, memory can be addressed at the byte level, but can only read 32-byte words at a time.
Memory is said to “expand” when we write to a word in it that was not previously used. Additionally to the cost of the write itself, there is a cost to this expansion, which increases linearly for the first 724 bytes and quadratically after that.
The EVM provides three opcodes to interact with the memory area:
MLOAD
loads a word from memory into the stack.MSTORE
saves a word to memory.MSTORE8
saves a byte to memory.
Solidity also provides an inline assembly version of these opcodes.
There is another key thing we need to know about memory. Solidity always stores a free memory pointer at position 0x40
, i.e. a reference to the first unused word in memory. That’s why we load this word to operate with inline assembly. Since the initial 64 bytes of memory is reserved for the EVM, this is how we can ensure that we are not overwriting memory that is used internally by Solidity. For instance, in the delegatecall
example presented above, we were loading this pointer to store the given calldata
to forward it. This is because the inline-assembly opcode delegatecall
needs to fetch its payload from memory.
Additionally, if you pay attention to the bytecode output by the Solidity compiler, you will notice that all of them start with 0x6060604052…, which means:
PUSH1 : EVM opcode is 0x60
0x60 : The free memory pointer
PUSH1 : EVM opcode is 0x60
0x40 : Memory position for the free memory pointer
MSTORE : EVM opcode is 0x52
You must be very careful when operating with memory at assembly level. Otherwise, you could overwrite a reserved space.
Storage
Storage is a persistent read-write word-addressable space. This is where each contract stores its persistent information. Unlike memory, storage is a persistent area and can only be addressed by words. It is a key-value mapping of 2²⁵⁶ slots of 32 bytes each. A contract can neither read nor write to any storage apart from its own. All locations are initially defined as zero.
The amount of gas required to save data into storage is one of the highest among operations of the EVM. This cost is not always the same. Modifying a storage slot from a zero value to a non-zero one costs 20,000. While storing the same non-zero value or setting a non-zero value to zero costs 5,000. However, in the last scenario, when a non-zero value is set to zero, a refund of 15,000 will be given.
The EVM provides two opcodes to operate the storage:
SLOAD
loads a word from storage into the stack.SSTORE
saves a word to storage.
These opcodes are also supported by the inline assembly of Solidity.
Solidity will automatically map every defined state variable of your contract to a slot in storage. The strategy is fairly simple — statically sized variables (everything except mappings and dynamic arrays) are laid out contiguously in storage starting from position 0.
For dynamic arrays, this slot (p
) stores the length of the array and its data will be located at the slot number that results from hashing p(keccak256(p)
). For mappings, this slot is unused and the value corresponding to a key k
will be located at keccak256(k,p)
. Bear in mind that the parameters of keccak256 (k
and p
) are always padded to 32 bytes.
Let’s take a look at a code example to understand how this works. Inside the exercise 3 contracts folder you will find a Storage contract with the following code:
Now, let’s open a truffle console to test its storage structure. First, we will compile and create a new contract instance:
truffle(develop)> compile
truffle(develop)> Storage.new().then(i => storage = i)
Then we can ensure that the address 0 holds a number 2 and the address 1 holds the address of the contract:
truffle(develop)> web3.eth.getStorageAt(storage.address, 0) // 0x02
truffle(develop)> web3.eth.getStorageAt(storage.address, 1) // 0x..
We can check that the storage position 2 holds the length of the array as follows:
truffle(develop)> web3.eth.getStorageAt(storage.address, 2) // 0x02
Finally, we can check that the storage position 3 is unused and the mapping values are stored as we described above:
truffle(develop)> web3.eth.getStorageAt(storage.address, 3)
// 0x00truffle(develop)> mapIndex = ‘0000000000000000000000000000000000000000000000000000000000000003’truffle(develop)> firstKey = ‘0000000000000000000000000000000000000000000000000000000000000001’truffle(develop)> firstPosition = web3.sha3(firstKey + mapIndex, { encoding: ‘hex’ })truffle(develop)> web3.eth.getStorageAt(storage.address, firstPosition)
// 0x09truffle(develop)> secondKey = ‘0000000000000000000000000000000000000000000000000000000000000002’truffle(develop)> secondPosition = web3.sha3(secondKey + mapIndex, { encoding: ‘hex’ })truffle(develop)> web3.eth.getStorageAt(storage.address, secondPosition)
// 0x0A
Great! We have demonstrated that the Solidity storage strategy works as we understand it! To learn more about how Solidity maps state variables into the storage, read the official documentation.
That’s it. I hope that now you have a much better understanding of what role EVM plays in Ethereum’s infrastructure.
“The address of the new account is defined as being the rightmost 160 bits of the Keccak hash of the RLP encoding of the structure containing only the sender and the account nonce.” (Ethereum Yellow Paper)
² One of the pillars of zeppelin_os is contract upgradeability. At Zeppelin, we’ve been exploring different strategies to implement this. You can read more about this here.
³ Solidity will check before calling an external function that the address has bytecode in it, and otherwise revert.
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK