multifacet/lima-static-analyzer

LIMA Static Analyzer

This repository contains the LIMA Static Analyzer, the static analysis component of LIMA (Lock Interference Mapping Analyzer), a tool for measuring kernel-level data isolation across isolation platforms such as LXC, gVisor, and Firecracker.

LIMA measures isolation by observing which kernel locks are shared across co-running workloads, then maps those locks to the underlying kernel data structures they protect. It is built around the insight that the OS synchronizes all access to shared kernel objects, so lock acquisitions serve as a proxy for object-level sharing.
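As a rough illustration of the proxy: if each workload's trace is reduced to the set of lock names it acquired, the shared locks are the names that appear in more than one set. The sketch below assumes that simplified input (a hypothetical mapping from workload name to lock names); the actual tracer lives in a separate repository and its trace format may differ.

```python
from collections import defaultdict


def shared_locks(traces: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return {lock_name: workloads} for locks acquired by more than one workload.

    `traces` is a hypothetical simplified trace: workload name -> lock names seen.
    """
    holders: dict[str, set[str]] = defaultdict(set)
    for workload, locks in traces.items():
        for lock in locks:
            holders[lock].add(workload)
    # Only locks touched by two or more workloads indicate shared kernel state.
    return {lock: ws for lock, ws in holders.items() if len(ws) > 1}
```

Each shared name is then what the static analyzer has to map back to the kernel object it protects.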

LIMA has two complementary components:

  1. Dynamic Tracer (separate repository): traces kernel lock acquisitions at runtime using eBPF, identifying which locks are shared across co-running workloads.
  2. Static Analyzer (this repository): uses the clangd language server to resolve lock names from dynamic traces to the kernel structs they protect, completing the lock-to-object mapping.

The static analyzer was used to analyze Linux 6.1 running on Ubuntu 22.04. It produced databases covering 2,488 global locks, 335 lock wrapper functions, and resolved approximately 85% of the 984 unique locks observed in traces to their containing kernel objects.

A Note on Kernel Versions

This repository is designed to be version-agnostic. While the paper used linux-6.1, the toolchain can run on any kernel version by modifying config.yaml. All generated data is stored in a version-specific directory (e.g., data/linux-6.2/).

Architecture & Components

Core Components

  • clangd.py: Main LSP client. All paths are resolved through config_loader.py; there are no hardcoded directories. A single main() parses --kernel-version, builds a cfg object, then dispatches to analysis functions via a closure-based interact callback.
  • ls_intereact.py: LSP message types and request/response handling
  • common.py: JSON-RPC protocol implementation with logging and debugging
  • config_loader.py: Singleton config loader. Reads config.yaml and exposes get_source_dir(), get_output_dir(key), get_clangd_command(), etc. so every script resolves paths through one place.
  • utils.py: Lock analysis engine and symbol processing utilities
  • run_get_def.py: Batch runner that iterates a CSV of lock sites and calls clangd.py --choice getDefinition per row to resolve lock variable definitions.
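The single-point path resolution described for config_loader.py can be sketched as a cached, per-version config object. This is not the repository's code: the class name ConfigSketch, the load_config() signature, and the config layout (a setup section plus a hypothetical kernels mapping keyed by version) are assumptions. In practice the dict would come from parsing config.yaml, for example with PyYAML's yaml.safe_load.

```python
from pathlib import Path
from typing import Optional


class ConfigSketch:
    """Hypothetical stand-in for the cfg object: every path is resolved here."""

    def __init__(self, raw: dict, version: str):
        if version not in raw["kernels"]:
            raise KeyError(f"unknown kernel version: {version}")
        self._raw = raw
        self.version = version

    def get_source_dir(self) -> Path:
        # Assumed layout: <kernel_source_dir>/<version>
        return Path(self._raw["setup"]["kernel_source_dir"]) / self.version

    def get_output_dir(self, key: str) -> Path:
        # Versioned output layout used throughout this README: data/<version>/<key>/
        return Path("data") / self.version / key


_INSTANCES: dict[str, ConfigSketch] = {}


def load_config(raw: dict, version: Optional[str] = None) -> ConfigSketch:
    """Return one shared ConfigSketch per kernel version (singleton-style)."""
    # Like --kernel-version, default to the first listed version.
    resolved = version or next(iter(raw["kernels"]))
    if resolved not in _INSTANCES:
        _INSTANCES[resolved] = ConfigSketch(raw, resolved)
    return _INSTANCES[resolved]
```

Caching per version means every module asking for linux-6.1 receives the same object, so a path defined once is seen everywhere.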

Setup 🚀

Follow these steps to prepare the analysis environment.

Step 1: Configure Your Environment

Open config.yaml and edit the paths under the setup section to match your system:

  • build_dir: The absolute path where dependencies like the LLVM project will be downloaded and built (e.g., /home/user/lima_build).
  • kernel_source_dir: The absolute path where the Linux kernel source code will be downloaded and stored (e.g., /home/user/lima_kernels).
  • kernel_build_dir: The absolute path where the kernel build outputs will be placed (e.g., /home/user/lima_builds).

You can also change the kernel_source_url if you wish to analyze a different kernel version.

Step 2: Run the Setup Script

Make the script executable and run it:

chmod +x setup.sh
./setup.sh

This script will prepare the entire environment for you.

What the Setup Script Does

The setup.sh script performs the following steps:

  1. Installs Dependencies: Installs all required apt packages (like g++, bear, cmake) and relevant Python packages.
  2. Builds Custom Clangd: Downloads the LLVM/Clangd source code, applies a necessary patch, and builds it from source.
  3. Prepares Linux Kernel: Downloads the kernel version specified in config.yaml, extracts it, and copies the appropriate kernel .config file.
  4. Generates Compilation Database: Runs bear make within the kernel source directory to create the compile_commands.json file. This database is essential for clangd to provide semantic analysis. For modern versions of Bear (3.0+), the equivalent command is bear -- make.

Once the script finishes, your environment should be ready for analysis.
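Before starting a long analysis, it can help to confirm that the compilation database exists and covers a file you expect clangd to index. A minimal check, assuming the database sits at the root of the kernel source tree (where bear writes it when run there) and relying only on the directory and file fields that every compile_commands.json entry carries; the helper name is hypothetical:

```python
import json
from pathlib import Path


def check_compile_db(kernel_src: str, sample_file: str = "kernel/sched/core.c") -> int:
    """Return the number of entries in compile_commands.json.

    Raises FileNotFoundError if the database is missing, and ValueError if it
    is empty or does not cover sample_file.
    """
    db_path = Path(kernel_src) / "compile_commands.json"
    entries = json.loads(db_path.read_text())
    if not entries:
        raise ValueError(f"{db_path} is empty; was the build run under bear?")
    # "file" may be relative to "directory"; joining handles both cases.
    covered = {(Path(e["directory"]) / e["file"]).resolve() for e in entries}
    if (Path(kernel_src) / sample_file).resolve() not in covered:
        raise ValueError(f"{sample_file} is not in {db_path}")
    return len(entries)
```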

Usage Guide 🛠️

All analysis is driven by config.yaml. The general command pattern is:

python3 clangd.py --kernel-version <version_name> --log --choice <ANALYSIS_TYPE>
  • --kernel-version: Kernel to analyze (e.g. linux-6.1). Must match a key in config.yaml. Defaults to the first listed version if omitted.
  • --log: Enables JSON-RPC communication logging to the terminal.
  • --choice: The analysis to run (see modes below).

Recommended Execution Order

Run the modes in this order; each stage feeds the next:

hello → documentSymbol → outGoingCalls → inComingCalls → getAST → getDefinition
                                                              ↓
                                                    utils.py post-processing
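If you prefer to launch the clangd.py stages from one script, the sketch below builds the documented commands in the order shown and stops at the first failure. It is a hypothetical convenience wrapper, not part of the repository. It assumes it runs from the repository root, uses --subchoice 0 for documentSymbol so later stages get the JSON dump, and leaves out getDefinition, which needs per-site arguments and is normally run in batch through run_get_def.py.

```python
import subprocess
import sys

# (choice, subchoice) pairs in the recommended order; getDefinition is left
# out because it needs -p/-l/-ch/-in for each lock site.
STAGES = [
    ("hello", None),
    ("documentSymbol", "0"),
    ("outGoingCalls", None),
    ("inComingCalls", None),
    ("getAST", None),
]


def build_commands(kernel_version: str, log: bool = True) -> list[list[str]]:
    """Return one clangd.py command line per stage."""
    cmds = []
    for choice, subchoice in STAGES:
        cmd = [sys.executable, "clangd.py", "--kernel-version", kernel_version]
        if log:
            cmd.append("--log")
        cmd += ["--choice", choice]
        if subchoice is not None:
            cmd += ["--subchoice", subchoice]
        cmds.append(cmd)
    return cmds


def run_pipeline(kernel_version: str) -> None:
    for cmd in build_commands(kernel_version):
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Usage: run_pipeline("linux-6.1"), then run the utils.py post-processing and run_get_def.py steps described below.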

Analysis Modes

1. Hello (connection check) ✅

Verifies that clangd starts correctly and can index a file. Run this first to confirm the environment is working before launching a full analysis.

python3 clangd.py --kernel-version linux-6.1 --log --choice hello

Opens include/linux/fs.h, waits for clangd to index it, and prints the document symbols. No output is saved.
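In LSP terms this check is a short exchange: initialize, open the file with textDocument/didOpen, then request textDocument/documentSymbol. Every message on the wire is JSON-RPC framed by a Content-Length header, which is what the protocol layer in common.py has to produce and parse. A minimal sketch of that framing (the function names are hypothetical, not the repository's API):

```python
import json


def encode_message(payload: dict) -> bytes:
    """Frame a JSON-RPC payload with the LSP base-protocol header."""
    body = json.dumps(payload).encode("utf-8")
    return f"Content-Length: {len(body)}\r\n\r\n".encode("ascii") + body


def decode_message(data: bytes) -> tuple[dict, bytes]:
    """Parse one framed message from `data`; return it and the leftover bytes."""
    header, sep, rest = data.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete header")
    length = None
    for line in header.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value)
    if length is None or len(rest) < length:
        raise ValueError("incomplete message body")
    return json.loads(rest[:length]), rest[length:]


def document_symbol_request(request_id: int, uri: str) -> dict:
    """textDocument/documentSymbol request for a file already opened via didOpen."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/documentSymbol",
        "params": {"textDocument": {"uri": uri}},
    }
```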


2. Document Symbol Analysis 🔍

Extracts all symbols (functions, structs, variables) from every .c and .h file under the configured directories. The --subchoice flag selects what to extract.

Prerequisite: None; this is the first real step.

# subchoice 0 (default): dump all symbols as JSON; required for all later stages
python3 clangd.py --kernel-version linux-6.1 --log --choice documentSymbol --subchoice 0

# subchoice 5: extract struct-to-lock mappings
python3 clangd.py --kernel-version linux-6.1 --log --choice documentSymbol --subchoice 5

# subchoice 12: extract function symbols only
python3 clangd.py --kernel-version linux-6.1 --log --choice documentSymbol --subchoice 12

--subchoice  What it does                                 Output
0            Full JSON dump of all symbols per file       data/linux-6.1/document_symbols/
5            Struct definitions that contain lock fields  data/linux-6.1/lock_defs/<dir>_locks.csv
12           Function symbols only                        data/linux-6.1/functions_document_symbol_db/document_symbol_<dir>.csv

Note: Subchoices 5 and 12 re-query clangd for the same data that subchoice 0 already saves. If subchoice 0 has already been run, the same outputs can be generated without running clangd again: use create_lock_struct_maps(cfg) for subchoice 5 and create_function_symbol_db(cfg) for subchoice 12 (see the Post-Processing section).


3. Outgoing Call Analysis 📞

Builds a call graph by querying clangd for every function's outgoing calls.

Prerequisite: documentSymbol must have been run. Reads from data/linux-6.1/functions_document_symbol_db/.

python3 clangd.py --kernel-version linux-6.1 --log --choice outGoingCalls

Output: data/linux-6.1/outgoing_calls_db/ β€” one CSV per subsystem with caller-callee mappings and source locations.
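Once the per-subsystem CSVs exist, a question such as "does this function, directly or transitively, reach a lock primitive?" becomes a plain graph search. The sketch assumes hypothetical column names caller and callee; check the headers of your outgoing_calls_db/ files and adjust.

```python
import csv
import io
from collections import defaultdict


def load_call_graph(csv_text: str) -> dict[str, set[str]]:
    """Build caller -> {callees} from CSV text (column names are assumptions)."""
    graph: dict[str, set[str]] = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        graph[row["caller"]].add(row["callee"])
    return dict(graph)


def reaches(graph: dict[str, set[str]], start: str, targets: set[str]) -> bool:
    """Iterative DFS: True if `start` is, or can reach, a function in `targets`."""
    stack, seen = [start], set()
    while stack:
        fn = stack.pop()
        if fn in targets:
            return True
        if fn in seen:
            continue
        seen.add(fn)
        stack.extend(graph.get(fn, ()))
    return False
```

To load a real file: with open(path, newline="") as f: graph = load_call_graph(f.read()).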


4. Incoming Call Analysis 📥

Finds all callers of known lock acquisition primitives, used to build the lock wrapper function database.

Prerequisite: documentSymbol must have been run. Requires lock_acquire_functions.csv in the project root.

python3 clangd.py --kernel-version linux-6.1 --log --choice inComingCalls

Output: data/linux-6.1/incoming_calls_lock_acquire/ β€” one CSV per lock type listing every wrapper function found.
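Over LSP, this query corresponds to the callHierarchy/incomingCalls request. The same relation can also be derived offline by inverting an outgoing-call graph, which is handy for spot checks. A sketch, where the graph is a hypothetical caller -> callees mapping and every direct caller is only a wrapper candidate, not a confirmed wrapper:

```python
from collections import defaultdict


def incoming(graph: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert caller -> callees into callee -> callers."""
    callers: dict[str, set[str]] = defaultdict(set)
    for caller, callees in graph.items():
        for callee in callees:
            callers[callee].add(caller)
    return dict(callers)


def wrapper_candidates(graph: dict[str, set[str]], primitives: list[str]) -> dict[str, set[str]]:
    """Direct callers of each lock primitive; an empty set if nothing calls it."""
    inv = incoming(graph)
    return {p: inv.get(p, set()) for p in primitives}
```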


5. AST Generation 🌳

Generates Abstract Syntax Trees for source files, used to disambiguate generic lock variables named lock.

Prerequisite: documentSymbol must have been run.

python3 clangd.py --kernel-version linux-6.1 --log --choice getAST

Output: data/linux-6.1/ast/ β€” one JSON AST file per source file, mirroring the kernel directory structure (e.g., data/linux-6.1/ast/kernel/sched/core.c.json).


6. Definition Lookup 🎯

Resolves where a single symbol is defined. Useful for one-off lookups; for batch processing use run_get_def.py (see below).

Prerequisite: getAST must have been run.

python3 clangd.py --kernel-version linux-6.1 --log --choice getDefinition \
  -p <absolute_path_to_file> -l <line> -ch <character> -in <row_index>

Parameters:

  • -p: Absolute path to the source file
  • -l: Line number (1-based)
  • -ch: Character position (1-based)
  • -in: Row index used as the output filename (data/linux-6.1/lock_defs/<row_index>.json)
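LSP positions are zero-based in both line and character, so 1-based -l/-ch values have to be shifted by one before they go into a textDocument/definition request. A sketch of that conversion, assuming clangd.py maps its arguments this way (the helper is hypothetical). LSP counts characters in UTF-16 code units by default, which matches byte columns only on ASCII-only lines.

```python
from pathlib import Path


def definition_request(request_id: int, path: str, line: int, character: int) -> dict:
    """Build a textDocument/definition request from 1-based line/character values."""
    if line < 1 or character < 1:
        raise ValueError("line and character are expected to be 1-based")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/definition",
        "params": {
            # as_uri() requires an absolute path, matching the -p requirement.
            "textDocument": {"uri": Path(path).as_uri()},
            # LSP Position is zero-based.
            "position": {"line": line - 1, "character": character - 1},
        },
    }
```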

7. Batch Definition Lookup 📋

Resolves lock variable definitions for every row in a CSV file. This is the standard way to run getDefinition at scale, as it drives clangd.py once per row and writes results back into a single output CSV.

Prerequisite: getAST and outGoingCalls must have been run. Input CSV comes from acquire_generic_lock_funcs_details/ or acquire_non_generic_lock_funcs_details/.

python3 run_get_def.py <input_csv> <output_csv> getDefinition --kernel-version linux-6.1

Both <input_csv> and <output_csv> are paths relative to the versioned data directory (data/linux-6.1/).

Example: resolve definitions for all generic lock sites:

python3 run_get_def.py \
  acquire_generic_lock_funcs_details/lock_acquire_function_details_combined_filtered.csv \
  generic_locks_defs.csv \
  getDefinition \
  --kernel-version linux-6.1

Re-running failed rows: if some rows came back with lock_def_line == -1, retry only those:

python3 run_get_def.py \
  acquire_generic_lock_funcs_details/lock_acquire_function_details_combined_filtered.csv \
  generic_locks_defs.csv \
  handleEmpty \
  --kernel-version linux-6.1

Output columns added to the CSV:

Column         Description
lock_def_line  Line number where the lock variable is defined (-1 if not found)
lock_def_path  Kernel-relative path to the file containing the definition (None if not found)
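To see how many rows a handleEmpty pass would revisit, the failed rows can be listed directly. A small reader (hypothetical helper) that treats -1, -1.0, an empty cell, or a missing lock_def_line column as a failed lookup; float-formatted values are accepted because the sample output under Intermediate Database Formats writes line numbers as floats.

```python
import csv


def failed_rows(path: str) -> list[dict[str, str]]:
    """Rows whose definition lookup failed or never ran (lock_def_line == -1)."""
    with open(path, newline="") as f:
        return [
            row
            for row in csv.DictReader(f)
            # Empty or missing values count as failures; "-1.0" parses like "-1".
            if float(row.get("lock_def_line") or -1) == -1
        ]
```

For example, len(failed_rows("data/linux-6.1/generic_locks_defs.csv")) after the batch run above.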

Post-Processing: utils.py 🔬

utils.py is a standalone post-processing module that operates on data already collected by clangd.py. It reads from the document symbol JSON files, AST files, and outgoing call CSVs to produce the lock databases used for downstream analysis. All functions accept a cfg object from config_loader.py.

from utils import create_function_symbol_db, create_lock_struct_maps, get_global_locks, get_generic_lock_details
from config_loader import get_config

cfg = get_config('linux-6.1')

create_function_symbol_db(cfg)

Parses the document symbol JSON files already saved by --choice documentSymbol and extracts function symbols (LSP kind 12) into per-subsystem CSVs. Produces the same output as --subchoice 12 without re-querying clangd.

Prerequisite: --choice documentSymbol --subchoice 0 must have been run first.

create_function_symbol_db(cfg)

Output: One CSV per top-level kernel directory in data/linux-6.1/functions_document_symbol_db/

Column            Description
name              Function name
detail            Function signature
kind              Always 12 (LSP function kind)
path              Absolute path to the source file
range_*           Line/character range of the full function body
selectionRange_*  Line/character range of the function name
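The extraction amounts to a recursive walk over LSP DocumentSymbol objects, keeping those whose kind is 12 (Function). A sketch, assuming each saved JSON file holds the raw DocumentSymbol array clangd returned; the expanded range_*/selectionRange_* column names below are illustrative, not necessarily the ones the CSVs use.

```python
FUNCTION = 12  # LSP SymbolKind.Function


def iter_functions(symbols: list[dict], path: str):
    """Yield a flat record for every function symbol, descending into children."""
    for sym in symbols:
        if sym.get("kind") == FUNCTION:
            rng, sel = sym["range"], sym["selectionRange"]
            yield {
                "name": sym["name"],
                "detail": sym.get("detail", ""),
                "kind": FUNCTION,
                "path": path,
                "range_start_line": rng["start"]["line"],
                "range_end_line": rng["end"]["line"],
                "selectionRange_start_line": sel["start"]["line"],
                "selectionRange_start_char": sel["start"]["character"],
            }
        yield from iter_functions(sym.get("children", []), path)
```

Load a file with json.load, pass the list in, and write the records with csv.DictWriter.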

create_lock_struct_maps(cfg)

Walks the document symbol JSON files and builds a CSV mapping every struct to the lock fields it contains. The drivers/ directory is skipped by default.

Prerequisite: Document symbol analysis (--choice documentSymbol) must have been run first.

create_lock_struct_maps(cfg)

Output: data/linux-6.1/lock_struct_map_db/lock_struct_map.csv

Column      Description
file        Kernel-relative source file path
type        Struct type keyword (e.g. struct)
name        Struct name
lock_type   Lock primitive type (e.g. spinlock_t, struct mutex)
lock_name   Name of the lock field within the struct
start_line  First line of the struct definition
end_line    Last line of the struct definition

An optional output_csv path can be passed to write to a custom location instead of the default.
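At its core, this is a filter over struct symbols (LSP kind 23) and their field children (kind 8). The sketch below assumes clangd reports each field's type in its detail string and matches it against the types listed under Supported Lock Types. It also converts clangd's zero-based lines to 1-based, which is an assumption about the CSV convention. It illustrates the idea and is not create_lock_struct_maps() itself.

```python
STRUCT, FIELD = 23, 8  # LSP SymbolKind values
LOCK_TYPES = {
    "spinlock_t", "raw_spinlock_t", "rwlock_t", "seqlock_t", "wait_queue_head_t",
    "struct mutex", "struct rw_semaphore", "struct rt_mutex", "struct lockref",
    "struct ww_mutex",
}


def struct_lock_rows(symbols: list[dict], file: str) -> list[dict]:
    """One row per lock-typed field of every struct found in `symbols`."""
    rows = []
    for sym in symbols:
        if sym.get("kind") == STRUCT:
            for child in sym.get("children", []):
                lock_type = (child.get("detail") or "").strip()
                if child.get("kind") == FIELD and lock_type in LOCK_TYPES:
                    rows.append({
                        "file": file,
                        "type": "struct",
                        "name": sym["name"],
                        "lock_type": lock_type,
                        "lock_name": child["name"],
                        "start_line": sym["range"]["start"]["line"] + 1,
                        "end_line": sym["range"]["end"]["line"] + 1,
                    })
        # Recurse so that structs nested inside other symbols are found too.
        rows.extend(struct_lock_rows(sym.get("children", []), file))
    return rows
```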


get_global_locks(cfg)

Scans document symbol JSON files for globally defined locks: variables declared with macros such as DEFINE_MUTEX and DEFINE_SPINLOCK, or typed directly with a lock primitive.

Prerequisite: Document symbol analysis must have been run first.

get_global_locks(cfg)

Output: data/linux-6.1/global_locks/global_locks.csv

Column     Description
lock_name  Name of the global lock variable
lock_type  Declaration macro or primitive type
file       Kernel-relative source file path
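For a quick cross-check outside clangd, macro-declared locks can also be found with a plain text scan. This swaps the tool's symbol-based approach for a regular expression, covers only declaration macros (not locks declared directly with a primitive type), and uses an illustrative macro list rather than the repository's global_locks_primitives.

```python
import re

# Illustrative list of kernel lock declaration macros; extend as needed.
GLOBAL_LOCK_MACROS = (
    "DEFINE_MUTEX", "DEFINE_SPINLOCK", "DEFINE_RAW_SPINLOCK", "DEFINE_RWLOCK",
    "DEFINE_SEQLOCK", "DECLARE_RWSEM", "DECLARE_WAIT_QUEUE_HEAD",
)
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, GLOBAL_LOCK_MACROS)) + r")\(\s*(\w+)\s*\)"
)


def find_global_locks(source: str, file: str) -> list[tuple[str, str, str]]:
    """Return (lock_name, lock_type, file) tuples for macro-declared locks."""
    hits = []
    for line in source.splitlines():
        if line.lstrip().startswith("#"):  # skip the macro definitions themselves
            continue
        hits.extend((m.group(2), m.group(1), file) for m in _PATTERN.finditer(line))
    return hits
```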

get_generic_lock_details(cfg, include_drivers=False)

Finds every lock acquisition call where the lock argument follows the ->lock pattern (generic locks), then uses AST analysis to pinpoint the exact character position of each lock variable in source. Drivers are excluded by default.

Prerequisites: Outgoing call analysis (--choice outGoingCalls) and AST generation (--choice getAST) must have been run first, and lock_acquire_functions.csv must exist in the project root.

get_generic_lock_details(cfg)

# to include the drivers directory:
get_generic_lock_details(cfg, include_drivers=True)

Output: One CSV per subsystem in data/linux-6.1/acquire_generic_lock_funcs_details/

Column       Description
caller_name  Function that acquires the lock
caller_path  Kernel-relative path to the caller
callee_name  Lock primitive called (e.g. mutex_lock)
lock_name    Lock variable expression (e.g. &dev->lock)
line         Line number of the lock argument (1-based)
start_char   Start character position of the lock argument
end_char     End character position of the lock argument

Internal helpers: traverse_ast() and get_lock_ast() are used internally by get_generic_lock_details() and are not intended to be called directly.
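get_generic_lock_details() finds these positions through the AST; for eyeballing a single source line, a text-only approximation is often enough. The sketch below replaces the AST walk with regular expressions (it is not traverse_ast() or get_lock_ast()), so macros and multi-line calls will defeat it. The offset convention it returns (0-based, end-exclusive, without the leading &) is an assumption to check against your own CSVs.

```python
import re
from typing import Optional

# A "->lock" member expression, optionally preceded by "&"; the member chain
# before the final "->lock" may use "->" or ".".
_ARROW_LOCK = re.compile(r"&?\s*([A-Za-z_]\w*(?:(?:->|\.)\w+)*->lock)\b")


def locate_generic_lock(line_text: str, callee: str) -> Optional[tuple[str, int, int]]:
    """Return (expr, start, end) for the first '->lock' expression after `callee(`.

    Returns None if the call or a matching argument is not on this line.
    """
    # \b keeps "spin_lock" from matching inside "raw_spin_lock".
    call = re.search(r"\b" + re.escape(callee) + r"\s*\(", line_text)
    if call is None:
        return None
    m = _ARROW_LOCK.search(line_text, call.end())
    if m is None:
        return None
    return m.group(1), m.start(1), m.end(1)
```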


Lock Types & Data Layout

Supported Lock Types

  • Mutex locks: struct mutex
  • Spinlocks: spinlock_t, raw_spinlock_t
  • Reader-Writer locks: struct rw_semaphore, rwlock_t
  • Sequence locks: seqlock_t
  • Wait queues: wait_queue_head_t
  • RT mutex: struct rt_mutex
  • Lock references: struct lockref
  • Wound/wait mutex: struct ww_mutex

Data Organization

All analysis data is stored in a version-specific directory, making it easy to manage results for multiple kernel versions.

📁 data/
└── 📁 <kernel_version>/
    ├── 📂 acquire_generic_lock_funcs_details/  # Detailed analysis of generic lock acquisitions
    ├── 📂 acquire_non_generic_lock_funcs_details/ # Detailed analysis of non-generic lock acquisitions
    ├── 📂 ast/                                 # Abstract Syntax Trees for source files
    ├── 📂 document_symbols/                    # Symbol information extracted from kernel files
    ├── 📂 functions_document_symbol_db/        # Function-specific symbol data
    ├── 📂 global_locks/                        # Database of globally defined locks
    ├── 📂 incoming_calls_lock_acquire/         # Caller analysis for lock acquisition functions
    ├── 📂 lock_defs/                           # Definitions for generic vs. non-generic locks
    ├── 📂 lock_struct_map_db/                  # Mappings of locks to their containing structs
    ├── 📂 locks_db/                            # Main database of identified locks
    ├── 📂 locks_struct_map_headers_db/         # Lock-to-struct mappings found in header files
    ├── 📂 outgoing_calls_db/                   # Database of function call relationships
    └── 📂 struct_db/                           # Struct boundary and definition information

Lock Matching Algorithm

The matching process uses the following fallback chain:

  1. Direct Global Match: Check if the lock is globally defined (global_locks/)
  2. Struct Member Match: Look up the lock in the struct-lock mapping database (lock_struct_map_db/)
  3. Generic Lock Analysis: For variables named simply lock, use AST + definition lookup to resolve the containing struct
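The chain can be pictured as three lookups tried in order. In this sketch the three inputs are hypothetical in-memory forms of global_locks/, lock_struct_map_db/, and the per-call-site definitions produced by run_get_def.py. Treating a field name that appears in several structs as unresolved, and keying generic locks by (path, line), are assumptions made for illustration.

```python
from typing import Optional


def resolve_lock(
    name: str,
    site: Optional[tuple[str, int]],
    global_locks: dict[str, str],
    struct_locks: dict[str, set[str]],
    generic_defs: dict[tuple[str, int], str],
) -> Optional[tuple[str, str]]:
    """Return (match_kind, result) for a traced lock, or None if unresolved.

    global_locks: global lock name -> defining file
    struct_locks: lock field name -> structs that contain it
    generic_defs: (path, line) of an acquisition site -> containing struct
    """
    # 1. Direct global match.
    if name in global_locks:
        return ("global", global_locks[name])
    # 2. Struct member match, accepted only when the field name is unambiguous.
    owners = struct_locks.get(name, set())
    if name != "lock" and len(owners) == 1:
        return ("struct", next(iter(owners)))
    # 3. Generic "lock" fields need the per-site definition lookup.
    if name == "lock" and site in generic_defs:
        return ("generic", generic_defs[site])
    return None
```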

Output Data & Analysis Results 📊

The data generated by running the static analyzer on Linux 6.1 (as used in the paper) is included in this repository under data/linux-6.1/. This includes the complete lock databases, document symbol dumps, AST files, outgoing/incoming call graphs, and the final lock-to-object mappings. You can use this data directly without re-running the analysis.

Key Statistics (Linux 6.1)

Metric                                                       Value
Global locks indexed (global_locks/)                         2,488
Lock wrapper functions found (incoming_calls_lock_acquire/)  335
Unique locks analyzed from dynamic traces                    984
Locks successfully resolved to kernel objects                ~85%

Final Output Format

The end-to-end lock-to-object mapping that feeds into the dynamic analysis:

file             object_name      lock_type     lock_name
fs/ext4/ext4.h   ext4_inode_info  rw_semaphore  i_data_sem

Intermediate Database Formats

global_locks/global_locks.csv (globally defined locks):

lock_name      lock_type  file
tasklist_lock  rwlock_t   include/linux/sched/task.h

lock_struct_map_db/lock_struct_map.csv (locks embedded in structs):

file       struct_name      lock_type   lock_name  start_line  end_line
mm/slab.h  kmem_cache_node  spinlock_t  list_lock  751         779

acquire_generic_lock_funcs_details/ (generic lock acquisition sites, from run_get_def.py):

func,path,acquire_function,from_line,lock_name,lock_line,start_char,end_char,lock_def_line,lock_def_path
kvm_vfio_group_add,virt/kvm/vfio.c,mutex_lock_nested,163.0,&kv->lock,162.0,13.0,21.0,33.0,/virt/kvm/vfio.c
cpu_stop_queue_work,kernel/stop_machine.c,_raw_spin_lock_irqsave,101.0,"&stopper->lock, flags",100.0,24.0,37.0,39.0,/kernel/stop_machine.c

Column                             Description
func                               Function where the lock is acquired
path                               Kernel-relative source file
acquire_function                   Lock primitive called
lock_name                          Lock variable expression
lock_line / start_char / end_char  Precise source location of the lock argument
lock_def_line / lock_def_path      Where the lock is defined in the kernel struct
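Two properties of this format matter when reading it programmatically: numeric columns are written as floats (163.0), and lock_name is quoted when the call passes extra arguments ("&stopper->lock, flags"). A minimal reader built on Python's csv module; the integer conversion and the extra lock_expr field (the first argument only) are additions for illustration, not columns the tool writes.

```python
import csv
import io

INT_COLUMNS = ("from_line", "lock_line", "start_char", "end_char", "lock_def_line")


def read_sites(csv_text: str) -> list[dict]:
    """Parse acquisition-site rows, normalising float-formatted numbers to ints."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for col in INT_COLUMNS:
            row[col] = int(float(row[col]))  # "163.0" -> 163, "-1" -> -1
        # Keep only the lock argument, dropping extras such as ", flags".
        row["lock_expr"] = row["lock_name"].split(",")[0].strip()
        rows.append(row)
    return rows
```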

Advanced Usage & Customization 🔧

Extending Analysis Scope

To analyze additional kernel subsystems:

  1. Modify the directory iteration in handleDocumentSymbol()
  2. Add new lock primitive patterns in utils.py
  3. Update lock type detection regex patterns

Custom Lock Patterns

Add new lock types by updating:

lock_primitives = ["struct mutex", "your_lock_type", ...]
global_locks_primitives = ["DEFINE_YOUR_LOCK", ...]

Performance Optimization

  • Adjust sleep timeouts based on system performance
  • Filter analysis scope to specific kernel subsystems


Note: This tool is designed specifically for the Linux kernel, and analyzing a complete kernel requires substantial system resources. Consider analyzing kernel subsystems individually for better performance.
