Marlowe Computing Spotlight: Elizabeth Schumann Group
Elizabeth Schumann is an assistant professor of music and the Billie Bennett Achilles Director of Keyboard Programs.
Chengyi Xing is a Stanford alumnus and research assistant in Elizabeth Schumann’s research group.
Leslie Hart is a music specialist at Bing Nursery School at Stanford.
In simple terms, how would you describe the research problem(s) you are trying to solve?
Many high-achieving students arrive at Stanford with immaculate transcripts after years of working in environments where outcomes matter and mistakes can feel costly. They are astonishingly competent. However, the pressure to be perfect can narrow the space for creative risk-taking and self-directed problem solving, even when the underlying talent is extraordinary. The tension between excellence and experimentation is especially visible in performance training.
In the music department, we are fortunate to work with brilliant young musicians who often come to Stanford already playing like professionals. However, some arrive less practiced in the exploratory work and problem solving needed to carry them to the next level of growth in both playing and research.
These habits are shaped long before college. In early childhood, we see that musical learning often flourishes through play, listening, imitation, and small variations supported by caregivers and teachers. At the same time, many common educational models place increasing weight on correctness and product. In some settings, early instruction can lean heavily on notation and error-avoidance in ways that leave less room for playful, ear-led exploration.
How are you addressing these challenges?
Through a grant from the Stanford Accelerator for Learning and the Stanford Institute for Human-Centered Artificial Intelligence (HAI), our goal is to build a tool that supports caregivers and educators and encourages child-led musical exploration through age-appropriate prompts.
These prompts are delivered through a screen-free companion in the form of a cute orange cat named Cheddar. The system is designed to help children revisit musical ideas and to help adults support this process without taking over. It also creates a consistent record of child-adult musical interaction that can enable careful analysis now and, if appropriate, longitudinal study of musical development and related learning processes in naturalistic settings.

Cheddar, a screen-free companion in the form of a cute orange cat.
To make musical responses coherent and flexible, we are training a unified text-music model that can generate and transform musical material in practical ways, including accompaniment generation, source separation, instrument infill, key change, and filtering. The model is fine-tuned on interleaved text and music data, and training it requires multi-machine, multi-GPU compute, which is why we rely on Marlowe for scalable training runs.
How has Marlowe benefitted this work?
Marlowe makes it practical to move from single-GPU prototypes to multi-machine, multi-GPU training for large multimodal models. That scale is essential for iterating on model architecture, training recipes, and data pipelines without slowing progress.
Our north star is an interaction that feels natural, musical, and supportive. Marlowe supports the experimentation needed to reach that level of responsiveness while keeping training runs organized and reproducible. The flexible storage environment and integration with tools such as Globus also reduce friction in our day-to-day workflow.

What advice would you give other Stanford researchers who might be interested in using Marlowe or HPC?
Benchmark and profile before scaling up. For many multimodal workloads, data loading and preprocessing become the bottleneck before compute. Time spent measuring and fixing throughput early saves substantial GPU time later.
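As an illustration of the "measure first" advice, here is a minimal Python sketch that splits training-loop wall-clock time into time spent waiting for the next batch versus time spent in the step itself. The names `profile_pipeline`, `slow_loader`, and `fake_step` are hypothetical, not part of Marlowe or any library; `batches` could be any iterable, such as a PyTorch `DataLoader`. One assumption to flag: if the real step runs asynchronously on a GPU, the timer should be read only after the device finishes (in PyTorch, after `torch.cuda.synchronize()`), otherwise the split between data and compute time is unreliable.

```python
import time
from typing import Callable, Iterable, Iterator


def profile_pipeline(batches: Iterable, train_step: Callable, max_steps: int = 50) -> dict:
    """Split wall-clock time into 'waiting for data' vs 'running the step'.

    `batches` is any iterable of batches; `train_step` is a hypothetical
    callable that consumes one batch (for GPU work, it should synchronize
    before returning so its time is measured correctly).
    """
    it: Iterator = iter(batches)
    data_time = 0.0
    step_time = 0.0
    steps = 0
    for _ in range(max_steps):
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent loading/preprocessing the next batch
        except StopIteration:
            break
        t1 = time.perf_counter()
        train_step(batch)  # time spent in the training step itself
        t2 = time.perf_counter()
        data_time += t1 - t0
        step_time += t2 - t1
        steps += 1
    total = data_time + step_time
    return {
        "steps": steps,
        "data_time_s": data_time,
        "step_time_s": step_time,
        # Share of time the loop was waiting on data rather than computing.
        "data_fraction": data_time / total if total > 0 else 0.0,
    }


def slow_loader(n: int):
    """Hypothetical loader whose per-batch preprocessing dominates."""
    for i in range(n):
        time.sleep(0.01)  # stand-in for decoding or resampling audio
        yield [i] * 8


def fake_step(batch) -> None:
    """Hypothetical stand-in for a forward/backward pass."""
    time.sleep(0.002)


if __name__ == "__main__":
    stats = profile_pipeline(slow_loader(20), fake_step)
    print(stats["steps"], round(stats["data_fraction"], 2))
```

A high `data_fraction` on a single GPU suggests the input pipeline (more loader workers, better data layout, prefetching) deserves attention before requesting more GPUs.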
Is there anything else you discovered or wanted to share about your experience using Marlowe and working with its staff?
We would love to see the Marlowe community develop and share more documentation and worked examples focused on distributed training, performance engineering, and best practices for complex multimodal data. Practical guidance on data layout, efficient streaming from storage, and multi-node debugging patterns would be especially valuable.
–
Marlowe is managed and supported by Stanford Data Science, the Vice Provost and Dean of Research, and Stanford Research Computing.
Learn more about the systems managed and supported by Stanford Research Computing.
DISCLAIMER: UIT News is accurate on the publication date. We do not update information in past news items. We do make every effort to keep our service information pages up-to-date. Please search our service pages at uit.stanford.edu/search.
