Language Modeling from Scratch

Reference: https://stanford-cs336.github.io/spring2025/

Topics: Tokenization; Transformer; Memory and Compute; Mixtures of Experts; CPU, GPU, CUDA; GPU & ML; Kernels, Triton