-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathindex.html
More file actions
122 lines (116 loc) · 8.33 KB
/
index.html
File metadata and controls
122 lines (116 loc) · 8.33 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<link rel="icon" sizes="16x16 32x32 48x48 64x64 256x256" href="/favicon.ico">
<link rel="canonical" href="https://liblisa.nl/">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="libLISA derives x86-64 instruction semantics bottom-up from CPU behavior. The semantics are machine-readable and CPU-specific.">
<meta property="og:type" content="website">
<meta property="og:title" content="libLISA - Instruction Discovery and Analysis on x86-64">
<meta property="og:url" content="https://liblisa.nl">
<meta property="og:image" content="https://liblisa.nl/logo.png">
<meta property="og:description" content="libLISA derives x86-64 instruction semantics bottom-up from CPU behavior. The semantics are machine-readable and CPU-specific.">
<meta name="twitter:card" content="summary">
<meta property="twitter:domain" content="liblisa.nl">
<meta property="twitter:url" content="https://liblisa.nl">
<meta name="twitter:title" content="libLISA - Instruction Discovery and Analysis on x86-64">
<meta name="twitter:description" content="libLISA derives x86-64 instruction semantics bottom-up from CPU behavior. The semantics are machine-readable and CPU-specific.">
<meta name="twitter:image" content="https://liblisa.nl/logo.png">
<title>libLISA - Instruction Discovery and Analysis on x86-64</title>
<link rel="stylesheet" href="style.css">
</head>
<body>
<div class="header">
<h1>libLISA - Instruction Discovery and Analysis on x86-64</h1>
<div class="authors">Jos Craaijo, Freek Verbeek, Binoy Ravindran</div>
</div>
<article class="page">
<p>
libLISA is a tool that can <i>fully automatically</i> scan instruction space, discover instructions and synthesize their semantics.
It produces machine-readable, CPU-specific x86-64 instruction semantics.
It relies on as little human specification as possible:
specifically, it does not rely on a handwritten (dis)assembler to dictate which instructions are executable on a given CPU, or what their operands are.
</p>
<p>
We presented libLISA at <a href="https://2024.splashcon.org/track/splash-2024-oopsla#event-overview" target="_blank">OOPSLA'24</a>.
</p>
<div class="buttonrow">
<a class="button green" href="liblisa.pdf">
<img src="file-pdf-regular.svg" class="glyph" alt="PDF file" />
Read the paper
</a>
<a class="button yellow" target="_blank" href="https://explore.liblisa.nl/">
<img src="map-regular.svg" class="glyph" alt="map icon" />
Explore the data
</a>
<a class="button purple" target="_blank" href="https://github.com/liblisa">
<img src="github-brands-solid.svg" class="glyph" alt="GitHub" />
View source code
</a>
</div>
<h2>Motivation</h2>
<p>
Even though heavily researched, a full formal model of the x86-64 instruction set is still not available.
This is caused by the sheer complexity of the x86-64 architecture:
the informal specification found in <a href="https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html" target="_blank">Intel manuals is roughly 4700 pages</a>, and even these are known to be <a href="https://stefanheule.com/publications/pldi16-strata/" target="_blank">not</a> <a href="https://comsec.ethz.ch/research/hardware-design-security/rememberr/" target="_blank">trustworthy</a>.
</p>
<p>
Specifications of CPU architectures often rely heavily on manual work, which is error-prone and labor-intensive.
The current <a href="https://github.com/kframework/x86-64-semantics" target="_blank">state-of-the-art formal semantics</a> for x86-64 took 8 man-months to write, and even that specification still contains 34 errors (see Section 5.2 of our paper).
</p>
<p>
This situation becomes even more dire when taking into account that different x86-64 machines will behave differently: not only can they have different instruction sets, but behavior is also allowed to be undefined, in which case the same instruction has different behavior on different machines.
</p>
<p>
libLISA aims to solve this problem by using a CPU as the ground truth, and deriving semantics by observing instruction execution.
It uses fuzzing and program synthesis to derive semantics automatically.
</p>
<p>
One of the use-cases for libLISA's semantics is verification of other binary analysis tools.
Errors in instruction semantics often propagate to the final output of a binary analyzer or emulator, making them behave incorrectly.
Differences between actual CPU behavior and these tools pose a security risk: malware can abuse such differences to obfuscate its behavior, or evade analysis in controlled environments like emulators.
</p>
<p>
Our current focus is on using libLISA's automatically inferred semantics to verify the correctness of disassemblers, emulators and other semantics.
</p>
<h2>x86-64 Instruction Semantics</h2>
<p>
We analyzed five different architectures: AMD 3900X, AMD 7700X, Intel i9-13900 (p), Intel i9-13900 (e) and Intel Xeon Silver 4110.
For each architecture, we generated around 120k <i>encodings</i>.
</p>
<p>
The semantics can be accessed in the following two ways:
<ul>
<li>Use <a target="_blank" href="https://explore.liblisa.nl/">explore.liblisa.nl</a> to browse the semantics manually.</li>
<li>Use the <a target="_blank" href="https://github.com/libLISA/liblisa/tree/main/cli/liblisa-semantics-tool"><tt>liblisa-semantics-tool</tt></a> or <a target="_blank" href="https://crates.io/crates/liblisa">the <tt>liblisa</tt> Rust crate</a> to use the semantics programmatically.</li>
</ul>
</p>
<p>
The generated x86-64 semantics are CPU-specific.
Around 90% of all encodings is identical on all 5 CPU architectures that we analyzed.
The remaining 10% differs between architectures.
We can broadly classify these differences into two categories: instruction set extensions and undefined behavior.
Some x86-64 instruction set extensions, such as the <tt>SHA1</tt> instructions, are not implemented on all CPUs.
There are also many differences in how undefined behavior is implemented across different x86-64 CPU architectures.
</p>
<p>
An example of undefined behavior is the <a target="_blank" href="https://explore.liblisa.nl/instruction/0FAFC3"><tt>IMUL</tt> instruction</a>.
The <a target="_blank" href="https://www.felixcloutier.com/x86/imul#flags-affected">reference manual</a> states: <i>"The SF, ZF, AF, and PF flags are undefined."</i>
This means that the values of these flags can differ between CPU architectures, even if the instruction is provided with the same inputs.
As can be seen <a target="_blank" href="https://explore.liblisa.nl/instruction/0FAFC3">in libLISA's semantics explorer</a>, The AMD 3900X and AMD 7700X do not modify these flags.
The Intel i9-13900 (p) and Intel Xeon Silver 4110 compute the SF and PF correctly, and set the AF and ZF to 0.
The Intel i9-13900 (e) additionally also computes the ZF flag correctly, and sets only the AF to 0.
</p>
<p>
The AMD 3900X and AMD 7700X often share the same instruction semantics.
This is likely because the AMD 7700X is a successor of the AMD 3900X.
Similarly, the Intel i9-13900 (p) and Intel Xeon Silver 4110 often also share semantics, likely because the Intel i9-13900 (p) is also a successor of the Intel Xeon Silver 4110.
</p>
<h2>Acknowledgements</h2>
<p>
This work is supported by the Defense Advanced Research Projects Agency (DARPA) and Naval Information Warfare Center Pacific (NIWC Pacific) under Contract No. N66001-21-C-4028.
</p>
</article>
</body>
</html>