This repository contains the LIMA Static Analyzer, the static analysis component of LIMA (Lock Interference Mapping Analyzer), a tool for measuring kernel-level data isolation across isolation platforms such as LXC, gVisor, and Firecracker.
LIMA measures isolation by observing which kernel locks are shared across co-running workloads, then maps those locks to the underlying kernel data structures they protect. It is built around the insight that the OS synchronizes all access to shared kernel objects, so lock acquisitions serve as a proxy for object-level sharing.
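To make the "locks as a proxy for sharing" idea concrete, here is a minimal sketch that finds locks acquired by more than one workload. The input shape (a dict from workload name to a set of lock names) and the workload names are assumptions for illustration, not the tracer's actual output format.

```python
from collections import defaultdict


def shared_locks(traces):
    """Return locks acquired by more than one workload.

    `traces` maps a workload name to the set of lock names it acquired.
    The result maps each shared lock to the sorted list of its workloads.
    """
    holders = defaultdict(set)
    for workload, locks in traces.items():
        for lock in locks:
            holders[lock].add(workload)
    return {lock: sorted(ws) for lock, ws in holders.items() if len(ws) > 1}


# Hypothetical trace data, for illustration only.
example = {
    "container_a": {"tasklist_lock", "i_data_sem"},
    "container_b": {"tasklist_lock", "list_lock"},
}
print(shared_locks(example))
```

A lock that appears under two workloads suggests that both touch the kernel object the lock protects, which is the mapping the static analyzer then resolves.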
LIMA has two complementary components:
- Dynamic Tracer (separate repository): traces kernel lock acquisitions at runtime using eBPF, identifying which locks are shared across co-running workloads.
- Static Analyzer (this repository): uses the clangd language server to resolve lock names from dynamic traces to the kernel structs they protect, completing the lock-to-object mapping.
The static analyzer was used to analyze Linux 6.1 running on Ubuntu 22.04. It produced databases covering 2,488 global locks and 335 lock wrapper functions, and it resolved approximately 85% of the 984 unique locks observed in traces to their containing kernel objects.
This repository is designed to be version-agnostic. While the paper used linux-6.1, the toolchain can run on any kernel version by modifying config.yaml. All generated data is stored in a version-specific directory (e.g., data/linux-6.2/).
- clangd.py: Main LSP client. All paths are resolved through config_loader.py, with no hardcoded directories. A single main() parses --kernel-version, builds a cfg object, then dispatches to analysis functions via a closure-based interact callback.
- ls_intereact.py: LSP message types and request/response handling.
- common.py: JSON-RPC protocol implementation with logging and debugging.
- config_loader.py: Singleton config loader. Reads config.yaml and exposes get_source_dir(), get_output_dir(key), get_clangd_command(), etc., so every script resolves paths through one place.
- utils.py: Lock analysis engine and symbol processing utilities.
- run_get_def.py: Batch runner that iterates a CSV of lock sites and calls clangd.py --choice getDefinition per row to resolve lock variable definitions.
Follow these steps to prepare the analysis environment.
Open config.yaml and edit the paths under the setup section to match your system:
- build_dir: The absolute path where dependencies like the LLVM project will be downloaded and built (e.g., /home/user/lima_build).
- kernel_source_dir: The absolute path where the Linux kernel source code will be downloaded and stored (e.g., /home/user/lima_kernels).
- kernel_build_dir: The absolute path where the kernel build outputs will be placed (e.g., /home/user/lima_builds).
You can also change the kernel_source_url if you wish to analyze a different kernel version.
Make the script executable and run it:
chmod +x setup.sh
./setup.sh
This script will prepare the entire environment for you.
The setup.sh script performs the following steps:
- Installs Dependencies: Installs all required apt packages (like g++, bear, cmake) and relevant Python packages.
- Builds Custom Clangd: Downloads the LLVM/Clangd source code, applies a necessary patch, and builds it from source.
- Prepares Linux Kernel: Downloads the kernel version specified in config.yaml, extracts it, and copies the appropriate kernel .config file.
- Generates Compilation Database: Runs bear make within the kernel source directory to create the compile_commands.json file. This database is essential for clangd to provide semantic analysis. For modern versions of Bear (3.0+), the equivalent command is bear -- make.
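Because clangd depends on compile_commands.json, it can help to sanity-check the file before starting a long analysis. The sketch below is not part of setup.sh; summarize_compile_db is a hypothetical helper. It relies only on the JSON Compilation Database format, where each entry carries directory, file, and either command or arguments.

```python
import json
from pathlib import Path


def summarize_compile_db(path):
    """Count entries and list files whose entries lack required fields."""
    entries = json.loads(Path(path).read_text())
    incomplete = [
        entry.get("file", "<unknown>")
        for entry in entries
        if "directory" not in entry
        or "file" not in entry
        or not ("command" in entry or "arguments" in entry)
    ]
    return {"entries": len(entries), "incomplete": incomplete}
```

Calling summarize_compile_db on the compile_commands.json in your kernel source directory should report a large entry count and an empty incomplete list; an empty database usually means bear did not observe any compiler invocations.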
Once the script finishes, your environment should be ready for analysis.
All analysis is driven by config.yaml. The general command pattern is:
python3 clangd.py --kernel-version <version_name> --log --choice <ANALYSIS_TYPE>
- --kernel-version: Kernel to analyze (e.g. linux-6.1). Must match a key in config.yaml. Defaults to the first listed version if omitted.
- --log: Enables JSON-RPC communication logging to the terminal.
- --choice: The analysis to run (see modes below).
Run the modes in this order, since each stage feeds the next:
hello → documentSymbol → outGoingCalls → inComingCalls → getAST → getDefinition
↓
utils.py post-processing
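The stage order above can be sketched as a small command builder. This is an illustrative helper, not something shipped in the repository: build_stage_commands is a hypothetical name, and getDefinition is left out because it needs per-site arguments.

```python
import shlex

# Stages that take no per-site arguments, in the documented order.
STAGES = ["hello", "documentSymbol", "outGoingCalls", "inComingCalls", "getAST"]


def build_stage_commands(kernel_version, log=True):
    """Return one argv list per stage, in the order the stages should run."""
    commands = []
    for stage in STAGES:
        argv = ["python3", "clangd.py", "--kernel-version", kernel_version]
        if log:
            argv.append("--log")
        argv += ["--choice", stage]
        commands.append(argv)
    return commands


for argv in build_stage_commands("linux-6.1"):
    print(shlex.join(argv))
```

Each argv list could be passed to subprocess.run(argv, check=True) so that a failing stage stops the pipeline before later stages read incomplete data.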
The hello mode verifies that clangd starts correctly and can index a file. Run this first to confirm the environment is working before launching a full analysis.
python3 clangd.py --kernel-version linux-6.1 --log --choice hello
This opens include/linux/fs.h, waits for clangd to index it, and prints the document symbols. No output is saved.
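For readers unfamiliar with what travels between the client and clangd, the sketch below frames a textDocument/documentSymbol request the way the LSP base protocol requires: a Content-Length header, a blank line, then the JSON-RPC body. The function names are hypothetical and this is not the code in common.py.

```python
import json


def encode_lsp_message(payload):
    """Frame a JSON-RPC payload with the LSP Content-Length header."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def document_symbol_request(request_id, file_uri):
    """Build a textDocument/documentSymbol request for one file URI."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/documentSymbol",
        "params": {"textDocument": {"uri": file_uri}},
    }


# The URI below is a placeholder path, for illustration only.
message = encode_lsp_message(
    document_symbol_request(1, "file:///home/user/linux/include/linux/fs.h")
)
print(message.decode("utf-8"))
```

The header counts bytes rather than characters, which is why the body is encoded before its length is measured.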
The documentSymbol mode extracts all symbols (functions, structs, variables) from every .c and .h file under the configured directories. The --subchoice flag selects what to extract.
Prerequisite: None. This is the first real step.
# subchoice 0 (default): dump all symbols as JSON (required for all later stages)
python3 clangd.py --kernel-version linux-6.1 --log --choice documentSymbol --subchoice 0
# subchoice 5: extract struct-to-lock mappings
python3 clangd.py --kernel-version linux-6.1 --log --choice documentSymbol --subchoice 5
# subchoice 12: extract function symbols only
python3 clangd.py --kernel-version linux-6.1 --log --choice documentSymbol --subchoice 12

| --subchoice | What it does | Output |
|---|---|---|
| 0 | Full JSON dump of all symbols per file | data/linux-6.1/document_symbols/ |
| 5 | Struct definitions that contain lock fields | data/linux-6.1/lock_defs/<dir>_locks.csv |
| 12 | Function symbols only | data/linux-6.1/functions_document_symbol_db/document_symbol_<dir>.csv |
Note: Subchoices 5 and 12 re-query clangd for the same data that 0 already saves. If subchoice 0 has already been run, the same outputs can be generated without running clangd again: use create_lock_struct_maps(cfg) for subchoice 5 and create_function_symbol_db(cfg) for subchoice 12 (see Post-Processing section).
The outGoingCalls mode builds a call graph by querying clangd for every function's outgoing calls.
Prerequisite: documentSymbol must have been run. Reads from data/linux-6.1/functions_document_symbol_db/.
python3 clangd.py --kernel-version linux-6.1 --log --choice outGoingCalls
Output: data/linux-6.1/outgoing_calls_db/, with one CSV per subsystem containing caller-callee mappings and source locations.
The inComingCalls mode finds all callers of known lock acquisition primitives. The results are used to build the lock wrapper function database.
Prerequisite: documentSymbol must have been run. Requires lock_acquire_functions.csv in the project root.
python3 clangd.py --kernel-version linux-6.1 --log --choice inComingCalls
Output: data/linux-6.1/incoming_calls_lock_acquire/, with one CSV per lock type listing every wrapper function found.
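The idea behind wrapper discovery can be sketched without clangd: given caller-callee pairs, collect every function that directly calls a known lock acquisition primitive. The edge format and the caller names below are assumptions for illustration; the tool itself obtains callers through clangd's incoming-calls query.

```python
def find_lock_wrappers(call_edges, lock_primitives):
    """Group direct callers by the lock acquisition primitive they call.

    `call_edges` is an iterable of (caller, callee) pairs.
    """
    wrappers = {}
    for caller, callee in call_edges:
        if callee in lock_primitives:
            wrappers.setdefault(callee, set()).add(caller)
    return wrappers


# Hypothetical call edges, for illustration only.
edges = [
    ("example_dev_lock", "mutex_lock"),
    ("example_queue_lock", "_raw_spin_lock_irqsave"),
    ("example_dev_open", "example_dev_lock"),
]
print(find_lock_wrappers(edges, {"mutex_lock", "_raw_spin_lock_irqsave"}))
```

Note that example_dev_open is not reported: it calls a wrapper rather than a primitive, so finding it would need a second pass with the wrappers added to lock_primitives.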
The getAST mode generates Abstract Syntax Trees for source files. These are used to disambiguate generic lock variables named lock.
Prerequisite: documentSymbol must have been run.
python3 clangd.py --kernel-version linux-6.1 --log --choice getAST
Output: data/linux-6.1/ast/, with one JSON AST file per source file, mirroring the kernel directory structure (e.g., data/linux-6.1/ast/kernel/sched/core.c.json).
The getDefinition mode resolves where a single symbol is defined. It is useful for one-off lookups; for batch processing use run_get_def.py (see below).
Prerequisite: getAST must have been run.
python3 clangd.py --kernel-version linux-6.1 --log --choice getDefinition \
-p <absolute_path_to_file> -l <line> -ch <character> -in <row_index>
Parameters:
- -p: Absolute path to the source file
- -l: Line number (1-based)
- -ch: Character position (1-based)
- -in: Row index used as the output filename (data/linux-6.1/lock_defs/<row_index>.json)
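One detail worth illustrating is the coordinate system. The -l and -ch parameters are 1-based, while LSP positions are zero-based, so a client has to subtract one somewhere before sending textDocument/definition. The sketch below shows that conversion; definition_params is a hypothetical helper, and whether clangd.py performs the conversion in exactly this place is an assumption.

```python
from pathlib import PurePosixPath


def definition_params(abs_path, line, character):
    """Build textDocument/definition params from 1-based line and character."""
    if line < 1 or character < 1:
        raise ValueError("line and character are expected to be 1-based")
    return {
        "textDocument": {"uri": PurePosixPath(abs_path).as_uri()},
        # LSP positions are zero-based, so shift both coordinates by one.
        "position": {"line": line - 1, "character": character - 1},
    }


# Placeholder path and position, for illustration only.
print(definition_params("/home/user/linux/virt/kvm/vfio.c", 162, 13))
```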
run_get_def.py resolves lock variable definitions for every row in a CSV file. This is the standard way to run getDefinition at scale: it drives clangd.py once per row and writes the results into a single output CSV.
Prerequisite: getAST and outGoingCalls must have been run. Input CSV comes from acquire_generic_lock_funcs_details/ or acquire_non_generic_lock_funcs_details/.
python3 run_get_def.py <input_csv> <output_csv> getDefinition --kernel-version linux-6.1
Both <input_csv> and <output_csv> are paths relative to the versioned data directory (data/linux-6.1/).
Example (resolve definitions for all generic lock sites):
python3 run_get_def.py \
acquire_generic_lock_funcs_details/lock_acquire_function_details_combined_filtered.csv \
generic_locks_defs.csv \
getDefinition \
--kernel-version linux-6.1
Re-running failed rows: if some rows came back with lock_def_line == -1, retry only those:
python3 run_get_def.py \
acquire_generic_lock_funcs_details/lock_acquire_function_details_combined_filtered.csv \
generic_locks_defs.csv \
handleEmpty \
--kernel-version linux-6.1
Output columns added to the CSV:

| Column | Description |
|---|---|
| lock_def_line | Line number where the lock variable is defined (-1 if not found) |
| lock_def_path | Kernel-relative path to the file containing the definition (None if not found) |
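Before re-running with handleEmpty, it can be useful to see which rows are still unresolved. The sketch below selects rows whose lock_def_line is -1; it is a hypothetical inspection helper, not the logic inside run_get_def.py, and treating blank or malformed values as unresolved is an assumption.

```python
import csv
import io


def is_unresolved(value):
    """True when a lock_def_line value marks a failed lookup."""
    try:
        return float(value) == -1
    except (TypeError, ValueError):
        return True  # assumption: blank or malformed values count as unresolved


def rows_needing_retry(csv_text):
    """Return the rows of a results CSV whose definition was not found."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if is_unresolved(row.get("lock_def_line"))]


# Hypothetical results, for illustration only.
sample = (
    "func,lock_name,lock_def_line,lock_def_path\n"
    "example_a,&a->lock,33.0,/virt/kvm/vfio.c\n"
    "example_b,&b->lock,-1,None\n"
)
print([row["func"] for row in rows_needing_retry(sample)])
```

Parsing through float accepts both -1 and -1.0, which matters because the sample data in this repository stores line numbers as floats.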
utils.py is a standalone post-processing module that operates on data already collected by clangd.py. It reads from the document symbol JSON files, AST files, and outgoing call CSVs to produce the lock databases used for downstream analysis. All functions accept a cfg object from config_loader.py.
from utils import (
    create_function_symbol_db,
    create_lock_struct_maps,
    get_global_locks,
    get_generic_lock_details,
)
from config_loader import get_config
cfg = get_config('linux-6.1')
create_function_symbol_db(cfg) parses the document symbol JSON files already saved by --choice documentSymbol and extracts function symbols (LSP kind 12) into per-subsystem CSVs. It produces the same output as --subchoice 12 without re-querying clangd.
Prerequisite: --choice documentSymbol --subchoice 0 must have been run first.
create_function_symbol_db(cfg)
Output: One CSV per top-level kernel directory in data/linux-6.1/functions_document_symbol_db/

| Column | Description |
|---|---|
| name | Function name |
| detail | Function signature |
| kind | Always 12 (LSP function kind) |
| path | Absolute path to the source file |
| range_* | Line/character range of the full function body |
| selectionRange_* | Line/character range of the function name |
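The extraction step can be sketched as a recursive walk over saved DocumentSymbol objects that keeps only entries of kind 12 (Function in the LSP SymbolKind enumeration). This is an illustration, not the code in utils.py; the flattened column names range_start_line and range_end_line are assumptions standing in for the range_* columns.

```python
FUNCTION_KIND = 12  # LSP SymbolKind.Function


def iter_function_symbols(symbols):
    """Yield every function symbol, descending into nested children."""
    for symbol in symbols:
        if symbol.get("kind") == FUNCTION_KIND:
            yield symbol
        yield from iter_function_symbols(symbol.get("children", []))


def to_row(symbol, path):
    """Flatten one DocumentSymbol into a CSV-style row."""
    return {
        "name": symbol["name"],
        "detail": symbol.get("detail", ""),
        "kind": symbol["kind"],
        "path": path,
        "range_start_line": symbol["range"]["start"]["line"],
        "range_end_line": symbol["range"]["end"]["line"],
    }


# Hypothetical symbols, for illustration only.
symbols = [
    {"name": "example_fn", "detail": "int (void)", "kind": 12,
     "range": {"start": {"line": 10, "character": 0},
               "end": {"line": 20, "character": 1}}},
    {"name": "example_var", "kind": 13,
     "range": {"start": {"line": 3, "character": 0},
               "end": {"line": 3, "character": 12}}},
]
print([to_row(s, "/tmp/example.c") for s in iter_function_symbols(symbols)])
```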
create_lock_struct_maps(cfg) walks the document symbol JSON files and builds a CSV mapping every struct to the lock fields it contains. The drivers/ directory is skipped by default.
Prerequisite: Document symbol analysis (--choice documentSymbol) must have been run first.
create_lock_struct_maps(cfg)
Output: data/linux-6.1/lock_struct_map_db/lock_struct_map.csv

| Column | Description |
|---|---|
| file | Kernel-relative source file path |
| type | Struct type keyword (e.g. struct) |
| name | Struct name |
| lock_type | Lock primitive type (e.g. spinlock_t, struct mutex) |
| lock_name | Name of the lock field within the struct |
| start_line | First line of the struct definition |
| end_line | Last line of the struct definition |
An optional output_csv path can be passed to write to a custom location instead of the default.
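A minimal sketch of the struct-to-lock mapping follows. It assumes that each saved symbol is an LSP DocumentSymbol, that struct members appear as children of kind 8 (Field) under a parent of kind 23 (Struct), and that the field's detail string holds its type. Those are assumptions about the saved JSON, and struct_lock_fields is a hypothetical name rather than the function in utils.py.

```python
STRUCT_KIND = 23  # LSP SymbolKind.Struct
FIELD_KIND = 8    # LSP SymbolKind.Field
LOCK_PRIMITIVES = (
    "struct mutex", "spinlock_t", "raw_spinlock_t",
    "struct rw_semaphore", "rwlock_t", "seqlock_t",
)


def struct_lock_fields(symbols, file):
    """Return one row per lock-typed field found inside a struct."""
    rows = []
    for symbol in symbols:
        children = symbol.get("children", [])
        if symbol.get("kind") == STRUCT_KIND:
            for child in children:
                if (child.get("kind") == FIELD_KIND
                        and child.get("detail") in LOCK_PRIMITIVES):
                    rows.append({
                        "file": file,
                        "name": symbol["name"],
                        "lock_type": child["detail"],
                        "lock_name": child["name"],
                        # LSP lines are zero-based; report 1-based lines.
                        "start_line": symbol["range"]["start"]["line"] + 1,
                        "end_line": symbol["range"]["end"]["line"] + 1,
                    })
        rows.extend(struct_lock_fields(children, file))  # nested structs
    return rows


# Shaped after the kmem_cache_node sample row in this README.
symbols = [{
    "name": "kmem_cache_node", "kind": 23,
    "range": {"start": {"line": 750, "character": 0},
              "end": {"line": 778, "character": 1}},
    "children": [
        {"name": "list_lock", "kind": 8, "detail": "spinlock_t"},
        {"name": "nr_partial", "kind": 8, "detail": "unsigned long"},
    ],
}]
print(struct_lock_fields(symbols, "mm/slab.h"))
```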
get_global_locks(cfg) scans document symbol JSON files for globally defined locks: variables declared with macros like DEFINE_MUTEX, DEFINE_SPINLOCK, etc., or typed directly with a lock primitive.
Prerequisite: Document symbol analysis must have been run first.
get_global_locks(cfg)
Output: data/linux-6.1/global_locks/global_locks.csv

| Column | Description |
|---|---|
| lock_name | Name of the global lock variable |
| lock_type | Declaration macro or primitive type |
| file | Kernel-relative source file path |
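To show what a macro-declared global lock looks like to a scanner, here is a sketch that matches DEFINE_* declarations with a regular expression over raw source text. This is a swapped-in technique for illustration: the analyzer itself works from the document symbol JSON files, not from a regex over source, and find_global_locks is a hypothetical name.

```python
import re

# A few kernel lock-definition macros; extend the tuple as needed.
GLOBAL_LOCK_MACROS = ("DEFINE_MUTEX", "DEFINE_SPINLOCK", "DEFINE_RWLOCK")
PATTERN = re.compile(
    r"\b(" + "|".join(GLOBAL_LOCK_MACROS) + r")\s*\(\s*([A-Za-z_]\w*)\s*\)"
)


def find_global_locks(source_text, file):
    """Return one row per macro-declared lock found in the source text."""
    return [
        {"lock_name": m.group(2), "lock_type": m.group(1), "file": file}
        for m in PATTERN.finditer(source_text)
    ]


# Hypothetical source snippet, for illustration only.
snippet = """
static DEFINE_MUTEX(example_mutex);
DEFINE_SPINLOCK(example_lock);
int unrelated;
"""
print(find_global_locks(snippet, "kernel/example.c"))
```

A regex like this misses locks typed directly with a primitive (for example a bare rwlock_t variable), which is one reason symbol-based scanning is the more complete approach.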
get_generic_lock_details(cfg) finds every lock acquisition call where the lock argument follows the ->lock pattern (generic locks), then uses AST analysis to pinpoint the exact character position of each lock variable in source. Drivers are excluded by default.
Prerequisites: Outgoing call analysis (--choice outGoingCalls) and AST generation (--choice getAST) must have been run first, and lock_acquire_functions.csv must exist in the project root.
get_generic_lock_details(cfg)
# to include the drivers directory:
get_generic_lock_details(cfg, include_drivers=True)
Output: One CSV per subsystem in data/linux-6.1/acquire_generic_lock_funcs_details/

| Column | Description |
|---|---|
| caller_name | Function that acquires the lock |
| caller_path | Kernel-relative path to the caller |
| callee_name | Lock primitive called (e.g. mutex_lock) |
| lock_name | Lock variable expression (e.g. &dev->lock) |
| line | Line number of the lock argument (1-based) |
| start_char | Start character position of the lock argument |
| end_char | End character position of the lock argument |
Internal helpers: traverse_ast() and get_lock_ast() are used internally by get_generic_lock_details() and are not intended to be called directly.
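The kind of traversal these helpers perform can be sketched as a generic pre-order search over a JSON AST. The node shape used here (kind, detail, range, children) follows clangd's AST extension as I understand it, and the specific kind and detail values in the example are assumptions. find_nodes is a hypothetical function and not a stand-in for traverse_ast().

```python
def find_nodes(node, predicate):
    """Return all nodes matching `predicate`, in pre-order."""
    matches = []
    stack = [node]
    while stack:
        current = stack.pop()
        if predicate(current):
            matches.append(current)
        # Reverse so the leftmost child is visited first.
        stack.extend(reversed(current.get("children", [])))
    return matches


# Hypothetical AST fragment, for illustration only.
tree = {
    "kind": "Compound",
    "children": [
        {"kind": "Call", "children": [
            {"kind": "Member", "detail": "lock",
             "range": {"start": {"line": 161, "character": 12},
                       "end": {"line": 161, "character": 20}}},
        ]},
        {"kind": "Member", "detail": "count"},
    ],
}
hits = find_nodes(
    tree, lambda n: n.get("kind") == "Member" and n.get("detail") == "lock"
)
print([hit["range"] for hit in hits])
```

An explicit stack avoids Python's recursion limit, which can matter for deeply nested ASTs from large kernel source files.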
Supported lock types:
- Mutex locks: struct mutex
- Spinlocks: spinlock_t, raw_spinlock_t
- Reader-Writer locks: struct rw_semaphore, rwlock_t
- Sequence locks: seqlock_t
- Wait queues: wait_queue_head_t
- RT mutex: struct rt_mutex
- Lock references: struct lockref
- Wound/Wait mutex: struct ww_mutex
All analysis data is stored in a version-specific directory, making it easy to manage results for multiple kernel versions.
📁 data/
└── 📁 <kernel_version>/
    ├── 📁 acquire_generic_lock_funcs_details/      # Detailed analysis of generic lock acquisitions
    ├── 📁 acquire_non_generic_lock_funcs_details/  # Detailed analysis of non-generic lock acquisitions
    ├── 📁 ast/                                     # Abstract Syntax Trees for source files
    ├── 📁 document_symbols/                        # Symbol information extracted from kernel files
    ├── 📁 functions_document_symbol_db/            # Function-specific symbol data
    ├── 📁 global_locks/                            # Database of globally defined locks
    ├── 📁 incoming_calls_lock_acquire/             # Caller analysis for lock acquisition functions
    ├── 📁 lock_defs/                               # Definitions for generic vs. non-generic locks
    ├── 📁 lock_struct_map_db/                      # Mappings of locks to their containing structs
    ├── 📁 locks_db/                                # Main database of identified locks
    ├── 📁 locks_struct_map_headers_db/             # Lock-to-struct mappings found in header files
    ├── 📁 outgoing_calls_db/                       # Database of function call relationships
    └── 📁 struct_db/                               # Struct boundary and definition information
The matching process uses the following fallback chain:
- Direct Global Match: Check if the lock is globally defined (global_locks/).
- Struct Member Match: Look up the lock in the struct-lock mapping database (lock_struct_map_db/).
- Generic Lock Analysis: For variables named simply lock, use AST + definition lookup to resolve the containing struct.
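The fallback chain above can be sketched as a single resolver function. The data shapes are assumptions: global_locks maps a lock name to its file, struct_lock_map maps a lock name to a list of candidate structs, and resolve_generic is a hypothetical callback standing in for the AST + definition lookup. Falling through to the generic step when the struct match is ambiguous is also my assumption, not documented behavior.

```python
def resolve_lock(lock_name, global_locks, struct_lock_map, resolve_generic):
    """Resolve a lock name through the three-step fallback chain."""
    # Step 1: direct global match.
    if lock_name in global_locks:
        return ("global", global_locks[lock_name])
    # Step 2: struct member match, accepted only when unambiguous.
    candidates = struct_lock_map.get(lock_name, [])
    if len(candidates) == 1:
        return ("struct_member", candidates[0])
    # Step 3: generic lock analysis via the supplied callback.
    resolved = resolve_generic(lock_name)
    if resolved is not None:
        return ("generic", resolved)
    return ("unresolved", None)


# Hypothetical databases, for illustration only.
globals_db = {"tasklist_lock": "include/linux/sched/task.h"}
structs_db = {
    "i_data_sem": ["ext4_inode_info"],
    "lock": ["example_a", "example_b"],
}
print(resolve_lock("tasklist_lock", globals_db, structs_db, lambda name: None))
print(resolve_lock("i_data_sem", globals_db, structs_db, lambda name: None))
print(resolve_lock("lock", globals_db, structs_db, lambda name: "example_a"))
```

Returning the matching step alongside the result makes it easy to count how many locks each stage of the chain resolves.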
The data generated by running the static analyzer on Linux 6.1 (as used in the paper) is included in this repository under data/linux-6.1/. This includes the complete lock databases, document symbol dumps, AST files, outgoing/incoming call graphs, and the final lock-to-object mappings. You can use this data directly without re-running the analysis.
| Metric | Value |
|---|---|
| Global locks indexed (global_locks/) | 2,488 |
| Lock wrapper functions found (incoming_calls_lock_acquire/) | 335 |
| Unique locks analyzed from dynamic traces | 984 |
| Locks successfully resolved to kernel objects | ~85% |
The end-to-end lock-to-object mapping that feeds into the dynamic analysis:
file object_name lock_type lock_name
fs/ext4/ext4.h ext4_inode_info rw_semaphore i_data_sem
global_locks/global_locks.csv (globally defined locks):
lock_name lock_type file
tasklist_lock rwlock_t include/linux/sched/task.h
lock_struct_map_db/lock_struct_map.csv (locks embedded in structs):
file struct_name lock_type lock_name start_line end_line
mm/slab.h kmem_cache_node spinlock_t list_lock 751 779
acquire_generic_lock_funcs_details/ (generic lock acquisition sites, from run_get_def.py):
func,path,acquire_function,from_line,lock_name,lock_line,start_char,end_char,lock_def_line,lock_def_path
kvm_vfio_group_add,virt/kvm/vfio.c,mutex_lock_nested,163.0,&kv->lock,162.0,13.0,21.0,33.0,/virt/kvm/vfio.c
cpu_stop_queue_work,kernel/stop_machine.c,_raw_spin_lock_irqsave,101.0,"&stopper->lock, flags",100.0,24.0,37.0,39.0,/kernel/stop_machine.c

| Column | Description |
|---|---|
| func | Function where the lock is acquired |
| path | Kernel-relative source file |
| acquire_function | Lock primitive called |
| lock_name | Lock variable expression |
| lock_line / start_char / end_char | Precise source location of the lock argument |
| lock_def_line / lock_def_path | Where the lock is defined in the kernel struct |
To analyze additional kernel subsystems:
- Modify the directory iteration in handleDocumentSymbol()
- Add new lock primitive patterns in utils.py
- Update lock type detection regex patterns
Add new lock types by updating:
lock_primitives = ["struct mutex", "your_lock_type", ...]
global_locks_primitives = ["DEFINE_YOUR_LOCK", ...]
To tune performance:
- Adjust sleep timeouts based on system performance
- Filter analysis scope to specific kernel subsystems
Note: This tool is specifically designed for Linux kernel analysis and requires substantial system resources for complete kernel analysis. Consider analyzing kernel subsystems individually for optimal performance.