aroramrinaal
🧨
Learning

Highlights

  • Pro

Pinned

  1. nanbeige4-3b-cold-start-sft Public

    LoRA cold-start thinking SFT experiments on Nanbeige4-3B-Base using reasoning traces distilled from frontier models.

    Python

  2. qwen2.5-1.5b-gsm8k-grpo-rlvr Public

    RLVR experiments on GSM8K using Hugging Face TRL GRPO, Qwen2.5-1.5B-Instruct, and Modal to study reward design in practice.

    Python

  3. gsm8k-grpo-prime-lab Public

    Prime Intellect Lab scaffold repo for my second RL experiment, mainly centered on the GSM8K GRPO config used in the final run.