Mrinaal Arora aroramrinaal

🧨

Learning

25 followers · 6 following

Pinned

nanbeige4-3b-cold-start-sft Public

LoRA cold-start thinking SFT experiments on Nanbeige4-3B-Base using reasoning traces distilled from frontier models.

Python

qwen2.5-1.5b-gsm8k-grpo-rlvr Public

RLVR experiments on GSM8K using Hugging Face TRL GRPO, Qwen2.5-1.5B-Instruct, and Modal to study reward design in practice.

Python

gsm8k-grpo-prime-lab Public

Prime Intellect Lab scaffold repo for my second RL experiment, mainly centered on the GSM8K GRPO config used in the final run.