- The new machine learning tool predicts how long a given computer chip takes to execute specific code.
- It is more accurate than Intel’s own prediction model.
Determining the number of clock cycles a processor takes to execute a block of assembly instructions in steady-state is crucial for both performance engineers and compiler designers.
Developing an analytical model to do so is an extremely complicated task, especially in modern processor architectures where the task becomes more error-prone and must be performed from scratch for each processor generation.
Now, MIT researchers have built a machine learning tool that automates this process, making it faster, easier and more accurate than state-of-the-art handwritten tools currently used in static machine code analyzers and compiler backends.
They described this novel machine-learning pipeline in three conference papers:
1. Ithemal: A neural network model is trained on labeled basic blocks (short sequences of computing instructions). It then predicts how long a given microprocessor will take to run previously unseen basic blocks.
2. BHive: To validate Ithemal, researchers created a benchmark suite of basic blocks from different fields, such as cryptography, compilers, machine learning, and graphics. They gathered over 300,000 blocks and put them into BHive, an open-source dataset.
The testing showed that Ithemal was able to predict how fast Intel processors would run code more accurately than the performance model developed by Intel itself.
3. Vemal: Researchers built a new method to automatically generate an algorithm, named Vemal, that converts specific code into vector form so that it can be used for parallel computing.
Vemal performs better than hand-crafted vectorization algorithms used in industrial compilers, including the LLVM compiler.
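Vectorization means rewriting a scalar loop so that each instruction operates on several data elements at once. Vemal itself emits real machine vector instructions inside a compiler; the following is only a minimal Python sketch of the idea, with fixed-width slices standing in for SIMD lanes:

```python
# Scalar form: one addition per loop iteration.
def add_scalar(a, b):
    out = []
    for i in range(len(a)):
        out.append(a[i] + b[i])
    return out

# "Vectorized" form: process fixed-width chunks per step, the way an
# auto-vectorizer maps loop iterations onto SIMD lanes. Python slices
# stand in for vector registers here; real vectorization emits
# instructions such as AVX adds.
def add_vectorized(a, b, width=4):
    out = []
    for i in range(0, len(a), width):
        out.extend(x + y for x, y in zip(a[i:i + width], b[i:i + width]))
    return out
```

Both functions compute the same result; the payoff of the real transformation is that each chunk becomes a single hardware instruction.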
Using Data Instead of the Chip's Documentation
Intel does provide detailed documentation describing its chips' architectures. But only a handful of expert developers build performance models that simulate code execution on those architectures. And since the chips are proprietary, Intel omits certain information from its documentation.
Instead, the researchers used a neural network to clock the average number of cycles a chip takes to execute basic block instructions (the sequence of booting up, executing, and shutting down).
The neural network automatically profiles millions of blocks and gradually learns how different processor architectures run code. In simple terms, the researchers used a machine learning model to learn from measured data rather than from the chip's documentation.
Ithemal takes previously unseen basic blocks as input and outputs a single number: an estimate of how long a given processor will take to run that code.
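The interface is simple even though the model behind it is not: assembly text in, one throughput number out. Below is a toy Python stand-in for that contract; the per-opcode latency table and the opcodes themselves are invented for illustration, whereas the real Ithemal learns its estimates from profiled data rather than from any fixed table.

```python
# Hypothetical per-opcode cycle costs, for illustration only.
ASSUMED_CYCLES = {"mov": 1.0, "add": 1.0, "mul": 3.0, "div": 20.0}

def predict_cycles(basic_block: str) -> float:
    """Return an estimated cycle count for one basic block of assembly.

    Mimics Ithemal's input/output contract: a block of instructions in,
    a single throughput estimate out. The real model is a trained neural
    network, not a lookup table.
    """
    total = 0.0
    for line in basic_block.strip().splitlines():
        opcode = line.split()[0].lower()
        total += ASSUMED_CYCLES.get(opcode, 1.0)
    return total

block = """
mov rax, rbx
add rax, 4
mul rcx
"""
print(predict_cycles(block))  # 5.0 with the assumed table above
```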
In the second paper, researchers demonstrated that Ithemal performs better than conventional hand-crafted models. While the error rate of Intel’s prediction model was 20%, Ithemal’s error rate was 10% on various basic blocks across different domains.
The model can easily be trained on a new architecture: just gather more data from that chip, run it through the profiler, and use that data to train Ithemal. That's it; the model is then ready to estimate performance. It can learn performance for any processor architecture, including Google's new Tensor Processing Unit.
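That retraining loop can be sketched in three steps: profile blocks on the target chip, fit a model to the measurements, then predict. In this toy Python version, `profile_block` is a hypothetical stand-in for timing a block on real hardware, and "training" fits a single cycles-per-instruction parameter rather than Ithemal's neural network:

```python
def profile_block(block: str) -> float:
    # Hypothetical measurement step: in reality this runs the block on the
    # new chip and records its average cycle count. Stubbed as 1.5
    # cycles per instruction so the example is deterministic.
    return len(block.strip().splitlines()) * 1.5

def train(blocks):
    """Fit one cycles-per-instruction parameter from profiled blocks."""
    dataset = [(len(b.strip().splitlines()), profile_block(b)) for b in blocks]
    total_instrs = sum(n for n, _ in dataset)
    total_cycles = sum(c for _, c in dataset)
    return total_cycles / total_instrs

def predict(cycles_per_instr: float, block: str) -> float:
    """Estimate the cycle count of an unseen block with the fitted model."""
    return cycles_per_instr * len(block.strip().splitlines())

blocks = [
    "mov rax, rbx\nadd rax, 4",
    "mul rcx\nadd rbx, 8\nmov rdx, rax\nsub rcx, 1",
]
model = train(blocks)
print(predict(model, "mov rax, 1\nadd rax, 2\nmul rbx"))  # 4.5 with this stub
```

Swapping in a different chip only changes what `profile_block` measures; the training and prediction steps stay the same, which is the point of the data-driven approach.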
However, researchers still don’t know how this model makes predictions, as much of machine learning is a black box. In the next study, they will try to explore techniques that could interpret these models.