Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
121 changes: 121 additions & 0 deletions docs/cs-dsa-memory-plan.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,121 @@
# Context-Sensitive DSA Memory Regions in SMACK

## Overview

SMACK uses **context-sensitive** sea-dsa analysis (`-sea-dsa=cs`) with **per-function memory regions**. Each function gets its own region vector computed from its CS graph, and memory maps are threaded through procedure signatures as parameters (reads) and returns (writes).

```boogie
// Memory maps are local in/out parameters, not globals
procedure foo(x: ref, $M.0.in: [ref]i8) returns ($M.0.out: [ref]i8)
{ var $M.0: [ref]i8;
$M.0 := $M.0.in;
$M.0 := $store.i8($M.0, x, 42);
$M.0.out := $M.0; }

procedure main()
modifies $M.0;
{ call $M.0 := foo(x, $M.0); }
```

Entry points (`main`) keep globals with `modifies` clauses. Non-entry procedures use local in/out parameters.

---

## Architecture

### Phase 1: Per-Function Region Construction

**Files:** `Regions.cpp` (runOnModule)

Each function's region vector is built from:
1. **Formal pointer parameters** -- `idx(&formalArg, &F)`
2. **Instructions** -- load/store/atomic/memcpy pointer operands via `visit(F)`
3. **Call-site actual pointer arguments** -- `idx(actualArg, &F)` for each call in the function
4. **Globals** -- `idx(&GV, &F)` for globals present in the function's DSA graph
5. **Pointer-returning calls** -- `idx(&callInst, &F)` for calls that return pointers (both declarations and definitions)
6. **Link-following** -- for each existing region, follow DSA pointer links to discover reachable nodes and create regions for them using `Region(Node*, ctx)`. This ensures callers have regions for data accessible through pointer indirection (e.g., `**arg`).

The `Region(Node*, LLVMContext&)` constructor creates a region directly from a DSA node without needing a Value*. This is needed because callers may not have LLVM values for data they never directly access but their callees do.

### Phase 2: Read/Write Sets

Direct memory accesses in each function are recorded:
- `LoadInst` -> readRegions
- `StoreInst` -> modifiedRegions
- `AtomicCmpXchgInst`, `AtomicRMWInst` -> both
- `MemSetInst` -> modifiedRegions
- `MemTransferInst` -> readRegions (source) + modifiedRegions (dest)

### Phase 2.5: Global Memory Mappings

For non-entry `usesGlobalMemory` functions (e.g., `__SMACK_static_init`), compute mappings from their region indices to the entry function's indices via shared globals.

### Phase 3: Call-Site Mappings

**`computeOneCallSiteMapping(CI, caller, callee)`** builds a map from callee region indices to caller region indices through:

1. **Parameter pairs** -- formal pointer params mapped to actual args
2. **Return value** -- callee's return value region mapped to caller's call result region
3. **Global pairs** -- shared globals (parameter mappings take priority over globals to preserve call-site specificity)
4. **Rep-matching extension** -- unmapped callee regions whose DSA node matches a mapped callee region's node
5. **DSA link-following** -- traverse pointer edges in both callee and caller DSA graphs to discover node correspondences for heap structures reachable through globals/parameters. Creates missing caller regions via `Region(Node*, ctx)` when needed.

**Conflict detection:** When a callee DSA node maps to multiple different caller nodes (e.g., two globals unified in the callee), it's marked as conflicting and excluded from link-following to avoid incorrect merging.

**Iteration:** `computeCallSiteMappings` runs iteratively (up to 10 passes) because link-following may create new regions in callers, which then need mappings computed for their own callers.

### Phase 3.5: Region Merge Propagation

**`propagateRegionMerges(M)`** enforces the soundness invariant: **regions must not alias**. Uses SCCs for proper ordering.

**Top-down pass:** When a caller maps two callee regions to the same caller region, the callee regions are merged (they alias from the caller's perspective). `mergeCalleeRegion` handles the merge and updates all affected call-site mappings.

**Bottom-up pass:** When a callee has collapsed regions that the caller keeps separate, the caller's regions are merged to match.

**Key invariant in `mergeCalleeRegion`:** When shifting callee-side keys in call-site mappings, existing entries (typically from parameter mappings) take priority over entries from merged-away regions. This prevents global mapping collisions from overwriting call-site-specific parameter mappings.

### Phase 4: Transitive Closure

**`computeFunctionRegions(M)`** propagates callee region accesses to callers through call-site mappings until convergence. Only mapped regions are propagated.

---

## Key Design Decisions

### Parameter Mapping Priority
Global pairs must not overwrite parameter pairs in the mapping (`!mapping.count(calleeR)` guard). Parameter mappings are call-site-specific and more precise. Without this, functions called with different global arguments (e.g., `acquire_lock(&main_lock)` vs `acquire_lock(&global_lock)`) would get incorrect mappings.

### DSA Link-Following
DSA graphs encode pointer relationships (e.g., `head` global links to the list struct node). The link-following extension traverses these edges to discover that nodes reachable through globals/parameters in the callee correspond to specific nodes in the caller. Without this, heap structures like linked lists would have incomplete mappings.

### Region Creation from DSA Nodes
The `Region(const seadsa::Node*, LLVMContext&)` constructor enables creating regions for DSA nodes that have no corresponding LLVM Value in the function. This is needed when:
- A caller passes a pointer and the callee accesses through multiple levels of indirection
- Phase 1 link-following discovers reachable nodes
- Phase 3 link-following creates regions during mapping computation

### Return Value Mapping
Call-site mappings include the callee's return value (matched via the callee's `ReturnInst` and the caller's `CallInst`). This is critical for function pointer dispatch patterns like `devirtbounce`, where data flows through return values rather than parameters.

---

## Test Results

**197 total tests, 195 passed, 0 failed, 2 unknown.**

The 2 unknowns (`smack_code_call`, `smack_code_call_fail`) use `__SMACK_code` to emit inline BPL calls that bypass the memory map threading. This is a pre-existing limitation of inline BPL with per-function memory maps.

---

## File Summary

| File | Change |
|------|--------|
| `DSAWrapper.h/cpp` | Function-aware `getNode`/`getOffset`/`isTypeSafe` with per-function graph lookup |
| `Regions.h/cpp` | Per-function region vectors, call-site mappings, link-following, merge propagation, `Region(Node*)` constructor |
| `SmackRep.h/cpp` | Memory map params/returns in procedure signatures, call-site mapping for threading |
| `SmackInstGenerator.cpp` | Prologue/epilogue for local memory shadows, entry block initialization |
| `SmackModuleGenerator.cpp` | Local var declarations for non-entry functions, global-scope region handling |
| `Prelude.cpp` | Per-function region types in prelude generation |
| `SmackOptions.h/cpp` | `usesGlobalMemory`, `isEntryPoint` helpers |
| `top.py` | Switch to `-sea-dsa=cs`, fix `VProperty.__members__` for `--check` flag |
18 changes: 14 additions & 4 deletions include/smack/DSAWrapper.h
Original file line number Diff line number Diff line change
Expand Up @@ -20,11 +20,10 @@ class DSAWrapper : public llvm::ModulePass {
private:
llvm::Module *module;
seadsa::GlobalAnalysis *SD;
// The ds graph since we're using the context-insensitive version which
// results in one graph for the whole module.
// Fallback graph for globals/constants (entry function's graph).
seadsa::Graph *DG;
std::unordered_set<const seadsa::Node *> staticInits;
std::unordered_set<const seadsa::Node *> memOpds;
std::unordered_set<const llvm::Value *> memOpds;
// Mapping from the DSNodes associated with globals to the numbers of
// globals associated with them.
std::unordered_map<const seadsa::Node *, unsigned> globalRefCount;
Expand All @@ -34,6 +33,10 @@ class DSAWrapper : public llvm::ModulePass {
void collectMemOpds(llvm::Module &M);
void countGlobalRefs();

// Resolve the appropriate DSA graph for a given value based on its
// enclosing function. Falls back to DG for globals/constants.
seadsa::Graph &getGraphForValue(const llvm::Value *v);

public:
static char ID;
DSAWrapper() : ModulePass(ID) {}
Expand All @@ -42,14 +45,21 @@ class DSAWrapper : public llvm::ModulePass {
virtual bool runOnModule(llvm::Module &M) override;

bool isStaticInitd(const seadsa::Node *n);
bool isMemOpd(const seadsa::Node *n);
bool isMemOpd(const llvm::Value *v);
bool isRead(const llvm::Value *V);
bool isSingletonGlobal(const llvm::Value *V);
unsigned getPointedTypeSize(const llvm::Value *v);
unsigned getOffset(const llvm::Value *v);
unsigned getOffset(const llvm::Value *v, const llvm::Function &F);
const seadsa::Node *getNode(const llvm::Value *v);
const seadsa::Node *getNode(const llvm::Value *v, const llvm::Function &F);
bool isTypeSafe(const llvm::Value *v);
bool isTypeSafe(const llvm::Value *v, const llvm::Function &F);
unsigned getNumGlobals(const seadsa::Node *n);

// Per-function graph access for context-sensitive analysis.
seadsa::Graph &getGraph(const llvm::Function &F);
bool hasGraph(const llvm::Function &F) const;
};
} // namespace smack

Expand Down
84 changes: 65 additions & 19 deletions include/smack/Regions.h
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,9 @@
#include "llvm/IR/InstVisitor.h"
#include "llvm/Pass.h"

#include <map>
#include <set>

using namespace llvm;

namespace llvm {
Expand All @@ -32,20 +35,23 @@ class Region {
bool incomplete;
bool complicated;
bool collapsed;
bool globalScope;

static const DataLayout *DL;
static DSAWrapper *DSA;

static bool isSingleton(const llvm::Value *v, unsigned length);
static bool isSingleton(const llvm::Value *v, unsigned length,
const llvm::Function *F);
static bool isAllocated(const seadsa::Node *N);
static bool isComplicated(const seadsa::Node *N);

void init(const Value *V, unsigned length);
void init(const Value *V, unsigned length, const llvm::Function *F);
bool isDisjoint(unsigned offset, unsigned length);

public:
Region(const Value *V);
Region(const Value *V, unsigned length);
Region(const Value *V, const llvm::Function *F);
Region(const Value *V, const llvm::Function *F, unsigned length);
Region(const seadsa::Node *node, LLVMContext &ctx);

static void init(Module &M, Pass &P);

Expand All @@ -55,35 +61,75 @@ class Region {
bool isSingleton() const { return singleton; };
bool isAllocated() const { return allocated; };
bool bytewiseAccess() const { return bytewise; }
bool isGlobalScope() const { return globalScope; }
const Type *getType() const { return type; }
const seadsa::Node *getRepresentative() const { return representative; }

void print(raw_ostream &);
};

struct FunctionRegionInfo {
std::set<unsigned> readRegions;
std::set<unsigned> modifiedRegions;
};

class Regions : public ModulePass, public InstVisitor<Regions> {
private:
std::vector<Region> regions;
unsigned idx(Region &R);
// Per-function region vectors (each function has its own local numbering).
std::map<const llvm::Function *, std::vector<Region>> funcRegionVecs;

// Per-function read/write sets (using function-local region indices).
std::map<const llvm::Function *, FunctionRegionInfo> funcRegions;

// Call-site mapping: callee region index -> caller region index.
std::map<const llvm::CallInst *, std::map<unsigned, unsigned>>
callSiteMappings;

// For non-entry usesGlobalMemory functions: mapping from their region
// indices to the entry function's region indices (matched via globals).
std::map<const llvm::Function *, std::map<unsigned, unsigned>>
globalMemoryMappings;

static FunctionRegionInfo emptyRegionInfo;

// The function currently being visited (set during visit phase).
const llvm::Function *currentFunction = nullptr;

// DSAWrapper pointer cached during runOnModule.
DSAWrapper *DSA = nullptr;

// Per-function idx: find or create a region in F's vector.
unsigned idx(Region &R, const llvm::Function *F);

void computeCallSiteMappings(llvm::Module &M);
void computeOneCallSiteMapping(llvm::CallInst *CI,
const llvm::Function *caller,
llvm::Function *callee);
void propagateRegionMerges(llvm::Module &M);
bool mergeCalleeRegion(const llvm::Function *F, unsigned keep,
unsigned remove);
void computeGlobalMemoryMappings(llvm::Module &M);
void computeFunctionRegions(llvm::Module &M);

public:
static char ID;
Regions() : ModulePass(ID) {}
virtual void getAnalysisUsage(llvm::AnalysisUsage &AU) const override;
virtual bool runOnModule(llvm::Module &M) override;

unsigned size() const;
unsigned idx(const llvm::Value *v);
unsigned idx(const llvm::Value *v, unsigned length);
Region &get(unsigned R);

// void visitModule(Module& M) {
// for (const GlobalValue& G : M.globals())
// collect(&G);
// }

// void visitAllocaInst(AllocaInst& I) {
// getRegion(&I);
// }
// Per-function region access.
unsigned size(const llvm::Function *F) const;
unsigned idx(const llvm::Value *v, const llvm::Function *F);
unsigned idx(const llvm::Value *v, const llvm::Function *F, unsigned length);
Region &get(const llvm::Function *F, unsigned R);

const FunctionRegionInfo &
getFunctionRegionInfo(const llvm::Function *F) const;
std::set<unsigned> getAccessedRegions(const llvm::Function *F) const;
const std::map<unsigned, unsigned> &
getCallSiteMapping(const llvm::CallInst *CI) const;
const std::map<unsigned, unsigned> &
getGlobalMemoryMapping(const llvm::Function *F) const;

void visitLoadInst(LoadInst &);
void visitStoreInst(StoreInst &);
Expand Down
3 changes: 3 additions & 0 deletions include/smack/SmackOptions.h
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,9 @@ class SmackOptions {
static const llvm::cl::opt<bool> WrappedIntegerEncoding;

static bool isEntryPoint(llvm::StringRef);
// Returns true for functions that should use global memory (not parameterized
// regions) — entry points and init functions.
static bool usesGlobalMemory(llvm::StringRef);
static bool shouldCheckFunction(llvm::StringRef);
};
} // namespace smack
Expand Down
13 changes: 10 additions & 3 deletions include/smack/SmackRep.h
Original file line number Diff line number Diff line change
Expand Up @@ -59,8 +59,14 @@ class SmackRep {
std::map<std::string, Decl *> auxDecls;

public:
// Current function being processed (set by SmackModuleGenerator).
const llvm::Function *currentFunction = nullptr;
// Entry-point function (set by SmackModuleGenerator, used for global decls).
const llvm::Function *entryFunction = nullptr;

SmackRep(const llvm::DataLayout *L, Naming *N, Program *P, Regions *R);
Program *getProgram() { return program; }
Regions *getRegions() { return regions; }

private:
unsigned storageSize(llvm::Type *T);
Expand Down Expand Up @@ -185,11 +191,12 @@ class SmackRep {
unsigned getElementSize(const llvm::Value *v);

std::string memReg(unsigned i);
std::string memType(unsigned region);
std::string memType(const llvm::Function *F, unsigned region);
std::string memPath(unsigned region);
std::string memPath(const llvm::Value *v);
std::string memPath(const llvm::Value *v, const llvm::Function *F);

std::list<std::pair<std::string, std::string>> memoryMaps();
std::list<std::pair<std::string, std::string>>
memoryMaps(const llvm::Function *F);

// used in SmackInstGenerator
std::string getString(const llvm::Value *v);
Expand Down
Loading
Loading