ENH: add parameter finder for degrees or freedom for students_t distribution#1385
ENH: add parameter finder for degrees or freedom for students_t distribution#1385dschmitz89 wants to merge 3 commits intoboostorg:developfrom
Conversation
|
CC @JacobHass8 would you have time to help out with a cursory initial review? |
| BOOST_CHECK_CLOSE( | ||
| students_t_distribution<RealType>::invert_probability_with_respect_to_degrees_of_freedom( | ||
| static_cast<RealType>(-6.96455673428326), | ||
| static_cast<RealType>(0.01)), | ||
| static_cast<RealType>(2), | ||
| tol_inv); |
There was a problem hiding this comment.
I see these test values are used in quite a few places. It would be nice to combine all these into a single function which checks every method. This ensures we only have to define the values for the spot check in one place. It also cuts down on the amount of code. Maybe this would be for a separate PR though?
| // | ||
| // Step 1: Edgeworth warm start. | ||
| // F(x; nu) ~ Phi(x) - phi(x)(x + x^3)/(4*nu) + phi(x)(3x + 5x^3 + 7x^5 - 3x^7)/(96*nu^2) | ||
| // Substituting u = 1/nu and setting F(x; nu) = p gives: | ||
| // B*u^2 - A*u + (Phi(x) - p) = 0 | ||
| // where: | ||
| // A = phi(x) * (x + x^3) / 4 | ||
| // B = phi(x) * (3x + 5x^3 + 7x^5 - 3x^7) / 96 | ||
| // | ||
| normal_distribution<RealType, Policy> std_normal(0, 1); | ||
| RealType phi = pdf(std_normal, x_abs); | ||
| RealType Phi = cdf(std_normal, x_abs); | ||
|
|
||
| RealType x2 = x_abs * x_abs; | ||
| RealType x3 = x2 * x_abs; | ||
| RealType x5 = x3 * x2; | ||
| RealType x7 = x5 * x2; | ||
|
|
||
| RealType A = phi * (x_abs + x3) / 4; | ||
| RealType B = phi * (3 * x_abs + 5 * x3 + 7 * x5 - 3 * x7) / 96; | ||
| RealType C = Phi - p_adj; | ||
|
|
||
| RealType hint = static_cast<RealType>(0.01); | ||
| RealType discriminant = A * A - 4 * B * C; | ||
| if (discriminant >= 0 && B != 0) | ||
| { | ||
| RealType sqrt_disc = sqrt(discriminant); | ||
| // Two roots of B*u^2 - A*u + C = 0; pick the smallest positive u (largest nu = 1/u). | ||
| RealType u1 = (A - sqrt_disc) / (2 * B); | ||
| RealType u2 = (A + sqrt_disc) / (2 * B); | ||
| RealType u = -1; | ||
| if (u1 > 0 && u2 > 0) | ||
| u = (u1 < u2) ? u1 : u2; | ||
| else if (u1 > 0) | ||
| u = u1; | ||
| else if (u2 > 0) | ||
| u = u2; | ||
|
|
||
| if (u > 0) | ||
| { | ||
| RealType nu_hat = 1 / u; | ||
| // Step 2: validate by checking relative residual of the exact CDF. | ||
| students_t_distribution<RealType, Policy> t_hat(nu_hat); | ||
| RealType exact_cdf = cdf(t_hat, x_abs); | ||
| RealType residual = (exact_cdf > p_adj) ? (exact_cdf - p_adj) : (p_adj - exact_cdf); | ||
| RealType relative_residual = (p_adj != 0) ? residual / p_adj : residual; | ||
| if (relative_residual <= tools::epsilon<RealType>() * 4) | ||
| return nu_hat; // Edgeworth estimate is already exact to machine precision. | ||
| if (relative_residual <= static_cast<RealType>(0.1)) | ||
| hint = nu_hat; | ||
| } | ||
| } |
There was a problem hiding this comment.
With all this in the function, it was initially hard for me to parse what was going on. Could this be put in a separate function called something like boost::math::detail::edgeworth_approximation? Then you could replace all this with hint = boost::math::detail::edgeworth_approximation(x_abs, p).
Towards #1305
This adds a parameter finder for the student t distribution with respect to the degrees or freedom.
Disclaimer: this PR is heavily LLM supported as C++ is not (yet?) my forte.
I verified the math with a simple Python program before. I also ran a simple sanity script to see that the function actually executes. I am not so confident about the tests though as I cannot really decipher the output of
b2.Approach: We use$df > 1$ where the t distribution behaves similar to the normal distribution but useless for low degrees of freedom (see the plot below). In this case or if it gives a relative error of the CDF worse than 10%, we fall back to an initial guess of 0.01.
bracket_and_solve_rootwith an initial guess from a second order Edgeworth expansion of the CDF. The initial guess is very good forDetailed approach for finding the initial guess
Given a quantile$x$ and a CDF value $p = P(T \leq x)$ , we want to recover the degrees of freedom $\nu$ .
Step 1 — Edgeworth warm start.
We use the 2nd-order Edgeworth expansion of the$t$ CDF in powers of $1/\nu$ :
where
Setting
where
The physically meaningful root (smallest positive$u$ , i.e. largest $\nu$ ) gives a closed-form starting estimate $\hat\nu = 1/u$ .
Step 2 — Validation.
We plug$\hat\nu$ into the exact $t$ CDF. If the relative residual $|F(x;\hat\nu) - p|/|p|$ exceeds 10%, we fall back to a safe low starting value $\nu_0 = 10^{-2}$ .