diff --git a/.gitignore b/.gitignore index b69b1ff..62d1011 100644 --- a/.gitignore +++ b/.gitignore @@ -3,7 +3,9 @@ *.jl.mem .DS_Store Manifest.toml +Manifest-v*.toml /dev/ /docs/build/ /docs/site/ .vscode/ +.gitignore~ diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..fdda920 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,29 @@ +# Understanding tasks + +Feel free to ask questions. + +Do not assume that requested tasks are possible: feel free to inform me that a requested task is not possible in the given form or that there are only tedious workarounds. + +# Julia development + +For Julia code, use the YAS codestyle (https://github.com/jrevels/YASGuide). +Explicit `return` statements are required and the use of `import` is forbidden. +Place all `using` statements in the top-level module file or in `set_up_tests.jl`. + +Always invoke Julia with `--startup-file=no` unless explicitly instructed otherwise. + +You may invoke a specific Julia version with `+VERSION`, e.g. `+1.10` or `+1.12`. This argument must come immediately after `julia` and before any other flags. + +Don't forget to specify the local project with the `--project=.`. + +## Writing tests + +When checking test coverage, you can use LocalCoverage.jl, which writes coverage to `coverage/lcov.info`. + +When running individual tests in Julia, you need to load the test environment. This can be done with `using TestEnv; TestEnv.activate()`. The entire testsuite can be invoked with `using Pkg; Pkg.test()` without activating the test environment beforehand, but you will still need to activate the local project (e.g. with the `--project=.` argument to Julia). + +Do not use `@test_warn`. When testing warnings, use `@test_logs` with the appropriate logging level, e.g. `:warn`. For tests where a warning is issued, use the `@suppressor` macro from Suppressor.jl to hide the warning during testing. + +All shared test state — `using` statements, shared constants, and shared helper functions — belongs in `set_up_tests.jl`. Individual test files should not define their own constants or helpers if they are used across files. + +When creating temporary directories in tests, use the `do`-block form: `mktempdir() do tempdir ... end`. This ensures cleanup even if the test errors. diff --git a/docs/Project.toml b/docs/Project.toml index e4f9ea1..808622a 100644 --- a/docs/Project.toml +++ b/docs/Project.toml @@ -15,7 +15,7 @@ Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2" Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c" [compat] -CairoMakie = "0.12" +CairoMakie = "0.15" DataFrameMacros = "0.1" DataFrames = "1" Documenter = "1.3" diff --git a/docs/make.jl b/docs/make.jl index 7f8cfee..69dcf52 100644 --- a/docs/make.jl +++ b/docs/make.jl @@ -6,6 +6,7 @@ makedocs(; sitename="MixedModelsSim.jl", authors="Phillip Alday, Douglas Bates, Lisa DeBruine, Reinhold Kliegl", pages=["index.md", "Rapid Start" => "simulation.md", - "Beginner Friendly Tutorial" => "simulation_tutorial.md"]) + "Beginner Friendly Tutorial" => "simulation_tutorial.md", + "API Reference" => "api.md"]) deploydocs(; repo="github.com/RePsychLing/MixedModelsSim.jl", push_preview=true) diff --git a/docs/src/api.md b/docs/src/api.md new file mode 100644 index 0000000..b7a3b31 --- /dev/null +++ b/docs/src/api.md @@ -0,0 +1,18 @@ +# API + +## Public + +```@autodocs +Modules = [MixedModelsSim] +Private = false +``` + +## Internal + +*Internal API is subject to change at any time and without being considered breaking!* + +```@autodocs +Modules = [MixedModelsSim] +Private = true +Public = false +``` diff --git a/docs/src/index.md b/docs/src/index.md index 86f7713..bd1550d 100644 --- a/docs/src/index.md +++ b/docs/src/index.md @@ -1,8 +1,28 @@ # MixedModelsSim.jl -```@index -``` +MixedModelsSim.jl provides utilities for designing and power-analysing experiments with mixed-effects models. +It is a companion to [MixedModels.jl](https://juliastats.org/MixedModels.jl/stable/). + +## What this package does + +- **Generate crossed experimental designs** — subjects × items × conditions with arbitrary combinations of between-subject, between-item, and within-both factors ([`simdat_crossed`](@ref)) +- **Specify random-effects covariance structure** — create the lower Cholesky factor of a covariance matrix from standard deviations and optional correlations ([`create_re`](@ref)), and convert it to the θ parameter vector MixedModels.jl expects ([`createθ`](@ref)) +- **Run power analyses** — simulate data via `parametricbootstrap` from MixedModels.jl and summarise the results as a power table ([`power_table`](@ref)) + +## Installation -```@autodocs -Modules = [MixedModelsSim] +MixedModelsSim.jl is registered in the Julia package registry: + +```julia +using Pkg +Pkg.add("MixedModelsSim") ``` + +## Getting started + +| Guide | Description | +|:------|:------------| +| [Rapid Start](@ref "Simulating an Experiment from Scratch") | A concise worked example: build a design from scratch, specify parameters, simulate, and compute power. | +| [Tutorial](@ref "Power Analysis and Simulation Tutorial") | A step-by-step tutorial covering four scenarios: bootstrapping from existing data, adapting effect sizes, creating a design from scratch, and power curves over varying sample sizes. | +| [API Reference](@ref "API") | Complete documentation for all exported functions. | + diff --git a/docs/src/simulation.md b/docs/src/simulation.md index dfc4781..a020b01 100644 --- a/docs/src/simulation.md +++ b/docs/src/simulation.md @@ -8,7 +8,7 @@ First, some setup: ```@example Main using DataFrames using MixedModels, MixedModelsSim -using Random +using StableRNGs ``` ## Assemble the Design @@ -34,7 +34,7 @@ both_win = Dict(:context => ["matched", "unmatched"]) and then generate it: ```@example Main -rng = MersenneTwister(42) # specify our random number generator for reproducibility +rng = StableRNG(42) # specify our random number generator for reproducibility design = simdat_crossed(rng, n_subj, n_item; subj_btwn = subj_btwn, item_btwn = item_btwn, @@ -169,12 +169,12 @@ In MixedModels.jl, you can specify different parameter values, such as the ones ```@example Main # typically we would use a lot more simulations # but we want to be quick in this example -sim = parametricbootstrap(MersenneTwister(12321), 20, m0; β=β, σ=σ, θ=θ) +sim = parametricbootstrap(StableRNG(12321), 20, m0; β=β, σ=σ, θ=θ) ``` As mentioned above, the ordering within θ is dependent on the entire design, so if you embed the simulation code in a loop iterating over different numbers of subjects and items, it's probably better to write it as: ```julia -sim = parametricbootstrap(MersenneTwister(12321), 20, m0; +sim = parametricbootstrap(StableRNG(12321), 20, m0; β=β, σ=σ, θ=createθ(m0; subj=re_subj, item=re_item)) ``` @@ -195,3 +195,5 @@ We can of course convert the row table into a DataFrame: ```@example Main DataFrame(ptbl) ``` + +For a more thorough walkthrough — including how to adapt effect sizes from pilot data, create designs from scratch, and loop over subject and item counts for power curves — see the [Beginner Friendly Tutorial](@ref "Power Analysis and Simulation Tutorial"). diff --git a/docs/src/simulation_tutorial.md b/docs/src/simulation_tutorial.md index 92ce9c3..bb3b44a 100644 --- a/docs/src/simulation_tutorial.md +++ b/docs/src/simulation_tutorial.md @@ -40,8 +40,8 @@ Here we define how many model simulations we want to do. A large number will giv It is useful to set it to a low number for testing, and increase it for your final analysis. ```@example Main -# for real power analysis, set this much higher -nsims = 100 +# for real power analysis, set this much higher (e.g. 1000) +nsims = 10 ``` ## Use existing data to simulate new data @@ -105,6 +105,10 @@ You can use the `parametricbootstrap()` function to run `nsims` iterations of da Set up a random seed to make the simulation reproducible. You can use your favourite number. +!!! note "Why `StableRNG`?" + We use `StableRNG` from the [StableRNGs.jl](https://github.com/JuliaRandom/StableRNGs.jl) package rather than `Random.MersenneTwister`. + `StableRNG` guarantees the same sequence of random numbers across Julia versions and platforms, which is important for reproducible simulation results in collaborative or long-running projects. + To use multithreading, you need to set the number of worker threads you want to use. In VS Code, open the settings (gear icon in the lower left corner) and search for "thread". Set `julia.NumThreads` to the number of threads you want to use (at least 1 less than the total number of processor cores available, so that you can continue watching YouTube while the simulation runs). @@ -127,7 +131,7 @@ df = DataFrame(kb07_sim.allpars); first(df, 12) ``` -The dataframe `df` has 4500 rows: 9 parameters, each from 500 iterations. +The dataframe `df` has `nsims * npar` rows, where `npar` is the total number of parameters. ```@example Main nrow(df) ``` @@ -379,8 +383,8 @@ Because of this trick, we need to specify the model along with the random effect We can install these parameter in the `parametricbootstrap()`-function or in the model like this: ```@example Main -# need the fully qualified name here because Makie also defines update! -MixedModelsSim.update!(kb07_m, item=re_item, subj=re_subj) +# MixedModels and Makie both export `update!`, so we use the full module path to disambiguate +MixedModelsSim.update!(kb07_m; item=re_item, subj=re_subj) DisplayAs.Text(ans) # hide ``` @@ -475,7 +479,11 @@ Specify `β`, `σ`, and `θ`, we just made up this parameter: ```@example Main new_beta = [0., 0.25, 0.25, 0.] new_sigma = 2.0 -new_theta = [1.0, 1.0] +# create_re with a single argument specifies the random-effect SD relative to the residual SD +re_subj_simple = create_re(1.0) +re_item_simple = create_re(1.0) +# createθ sorts the random effects into the order MixedModels.jl expects for this model +new_theta = createθ(m1; subj=re_subj_simple, item=re_item_simple) ``` Run nsims iterations: @@ -693,7 +701,7 @@ Define formula: ```@example Main kb07_f = @formula(rt_trunc ~ 1 + spkr + prec + load + (1 | subj) + (1 + prec | item)); ``` -a + Specify `β`, `σ`, and `θ`: ```@example Main @@ -734,7 +742,7 @@ nothing # hide ### Run the loop: ```@example Main -@showprogress for subj_n in sub_ns, item_n in item_ns +@showprogress enabled=isinteractive() for subj_n in sub_ns, item_n in item_ns # Make balanced fully crossed data: fake = simdat_crossed(subj_n, item_n; subj_btwn = subj_btwn, @@ -753,8 +761,8 @@ nothing # hide fake_df = fake_df[idx, :] rename!(fake_df, :dv => :rt_trunc) - # create the model: - fake_m = LinearMixedModel(kb07_f, fake_df, contrasts=contrasts); + # LinearMixedModel creates an unfitted model; parametricbootstrap uses the β, σ, θ we supply + fake_m = LinearMixedModel(kb07_f, fake_df; contrasts=contrasts); local new_theta = createθ(fake_m; item=re_item, subj=re_subj) # Run nsims iterations: fake_sim = parametricbootstrap(rng, nsims, fake_m, @@ -792,7 +800,7 @@ function subplot_power(f::Figure, dat, coefname) return ax end -fig = Figure(; resolution=(1300,300)) +fig = Figure(; size=(1300,300)) for (idx, cn) in enumerate(sort(unique(d.coefname))) fig[1, idx] = subplot_power(fig, d, cn) end @@ -800,6 +808,33 @@ Colorbar(fig[1, end+1]; label="power", vertical=true, colorrange=[0,1], colormap fig ``` +## Troubleshooting + +### `PosDefException: matrix is not Hermitian; Cholesky factorization failed` + +This error from `create_re()` means the values you provided do not form a valid positive-definite covariance matrix. +The most common cause is supplying values that are too precise relative to what the data actually support — remember that these parameters are inherently uncertain estimates. +Round your values to 2–3 significant figures and try again. +If the error persists, check that your correlation matrix is valid (diagonal entries must be 1.0, off-diagonal entries must be in the range (−1, 1), and the matrix must be positive-definite). + +### θ ordering errors in loops + +The internal ordering of the θ vector depends on the full model structure, including the number of subjects and items. +**Always use `createθ(model; name=re, ...)` inside loops** that vary the design — never construct θ by hand or compute it outside the loop and reuse it, as the ordering may differ across iterations. + +### `update!` name conflict with Makie + +Both `MixedModelsSim` and `Makie` export a function called `update!`. +If you have `using CairoMakie` (or any Makie backend) in scope alongside `using MixedModelsSim`, calling `update!` will be ambiguous. +Use the fully-qualified form `MixedModelsSim.update!(model; ...)` to avoid the conflict. + +### Model response must be non-constant before calling `update!` + +`update!` refits the model after installing new random-effects parameters. +If the dependent variable column contains only zeros or a constant, the refit will fail with a `PosDefException`. +This can happen when `simdat_crossed` has just been called and the `dv` column has not yet been populated with meaningful values. +In that case, either fit the model first with `fit(MixedModel, formula, data)` or pass the parameters directly to `parametricbootstrap()` without calling `update!` beforehand. + ## Acknowledgements The text here is based on a tutorial presented at a ZiF workshop by Lisa DeBruine (Feb. 2020) and presented again by Phillip Alday during the SMLP Summer School (Sep. 2020). diff --git a/docs/textchunks/re.md b/docs/textchunks/re.md deleted file mode 100644 index 64def27..0000000 --- a/docs/textchunks/re.md +++ /dev/null @@ -1,67 +0,0 @@ -## Assemble the Random Effects - -The hard part in simulating right now is specifying the random effects. -We're working on making this bit easier, but you need to specify the variance-covariance matrix of the random effects. You can see what this -looks like: - -```@example Main -vc = VarCorr(m0) -``` - -```@example Main -vc.σρ -``` - -In matrix form: - -```@example Main -m0.λ -``` - -```@example Main -m0.λ[1] -``` - -```@example Main -m0.λ[2] -``` - -Note that the entries in `m0.λ` are stored in the same order as the random effects are listed in `VarCorr(m0)`. The off-diagonal elements are covariances and correspond to the correlations (the ρ's). -The on-diagonal elements are just the standard deviations (the σ's). -For this example, we'll just assume all the correlations and thus the covariances are 0 in order to make things simple. -Then we only have to worry about the diagonal elements. - -Let's assume that the variability -- between items - - in the intercept is 1.3 times the residual variability - - in age is 0.35 times the residual variability - - in context is 0.75 times the residual variability -- between subjects - - in the intercept is 1.5 times the residual variability - - in frequency is 0.5 times the residual variability - - in context is 0.75 times the residual variability - -Then we'll have the following λ matrices: - -```@example Main -λitem = LowerTriangular(diagm([1.3, 0.35, 0.75])) -``` - -```@example Main -λsubj = LowerTriangular(diagm([1.5, 0.5, 0.75])) -``` - -For the actual optimization process, MixedModels.jl actually uses a flattened version of these stored in the θ vector. (More precisely, these λ matrices are derived from the θ vector.) - -```@example Main -isapprox(m0.θ, [flatlowertri(m0.λ[1]); flatlowertri(m0.λ[2])]) -``` - -With that in mind, we can assemble our θ vector for simulation: - -```@example Main -# make sure these are in the same order as in the model summary! -θ = [flatlowertri(λitem); flatlowertri(λsubj)] -``` - -We can check that we got these right by installing these parameter values into the model: diff --git a/src/utilities.jl b/src/utilities.jl index 4a8f4eb..2eadae1 100644 --- a/src/utilities.jl +++ b/src/utilities.jl @@ -57,7 +57,7 @@ cyclicshift('a':'d', 8) ``` """ function cyclicshift(v::AbstractVector, nrow) - # The return value i used to counterbalance levels of conditions in [`withinitem`](@ref). + # The return value is used to counterbalance levels of conditions across rows. vlen = length(v) return [v[1 + (i + j) % vlen] for i in 0:(nrow - 1), j in 0:(vlen - 1)] end