
Ray Bai

@raybai07

Assistant Professor of Statistics at George Mason University, French bulldog owner, and pop culture enthusiast. Any views expressed on here are my own, not my employer's. πŸ‘¨β€πŸ«πŸ“Š πŸ“šπŸ‘¨β€πŸ³πŸ³οΈβ€πŸŒˆ raybai.net

209
Followers
210
Following
87
Posts
19.12.2023
Joined

Latest posts by Ray Bai @raybai07

I am pleased to share that my paper "VCBART: Bayesian Trees for Varying Coefficients" (with Sameer Deshpande, Cecilia Balocchi, Jennifer Sterling, and Jordan Weiss) has been published in the latest issue of Bayesian Analysis!

Read it here: doi.org/10.1214/24-B...

10.03.2026 03:23 πŸ‘ 4 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Seven Major Directions and Trends in Modern Statistics – Ray Bai

New blog post: "Seven Major Directions and Trends in Modern Statistics"! In this post, I summarize a few of the latest trends and prominent areas in the field of statistics.

raybai.net/seven-major-...

03.03.2026 18:44 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

I often explain deep learning and deep generative models (DGMs) to students and non-experts who are new to the area but interested in exploring it. I find it very helpful to start by framing linear regression and logistic regression as special cases of neural networks with a single output layer.

25.02.2026 17:56 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
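That framing can be made concrete in a few lines: both models are a single dense layer (a weighted sum plus an intercept), differing only in the output activation. A minimal sketch in Python with NumPy; the weights and data below are made up purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def single_layer(X, w, b, activation=None):
    """One dense layer: z = Xw + b, optionally passed through an activation."""
    z = X @ w + b
    return activation(z) if activation is not None else z

# Made-up data: 3 observations, 2 features
X = np.array([[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]])
w = np.array([0.4, -0.3])  # made-up weights
b = 0.1                    # made-up intercept

# Linear regression = single output layer with the identity activation
y_hat_linear = single_layer(X, w, b)

# Logistic regression = the same layer with a sigmoid activation
p_hat_logistic = single_layer(X, w, b, activation=sigmoid)

print(y_hat_linear)    # predicted means
print(p_hat_logistic)  # predicted probabilities, each in (0, 1)
```

From here, "deep" learning is just stacking more such layers with nonlinear activations between them.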

Happening tomorrow at UMBC! Excited for my visit.

19.02.2026 18:33 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

(2/2) Never thought of myself as much of a probabilist either, but my recent work on DGMs delved into functional inequalities in probability theory to characterize transport maps. You just never know when these things will pop up or when you'll use them!

15.02.2026 19:22 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

(1/2) It's always a bit wild to me when something I learned many years ago comes up again. I wasn't sure I'd ever use differential equations again, but now with flow matching and diffusion models being the current state-of-the-art generative models, I'm reviewing a bit of ODEs.

15.02.2026 19:21 πŸ‘ 3 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
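For readers wondering why ODEs show up here: flow-matching and diffusion samplers generate data by numerically integrating an ODE dx/dt = v(x, t) that transports noise toward the data distribution. A toy sketch of the Euler scheme in Python; the velocity field below is a made-up linear one standing in for the learned neural network:

```python
import numpy as np

def euler_integrate(x0, velocity, t0=0.0, t1=1.0, n_steps=100):
    """Integrate dx/dt = velocity(x, t) from t0 to t1 with Euler steps."""
    x, t = np.asarray(x0, dtype=float).copy(), t0
    dt = (t1 - t0) / n_steps
    for _ in range(n_steps):
        x = x + dt * velocity(x, t)
        t += dt
    return x

# Made-up velocity field that pulls every sample toward the point 3.0.
# In flow matching, this role is played by a learned network v_theta(x, t).
velocity = lambda x, t: 3.0 - x

rng = np.random.default_rng(0)
z = rng.standard_normal(10_000)       # start from Gaussian noise
samples = euler_integrate(z, velocity)
print(samples.mean())  # the cloud of samples has moved from 0 toward 3
```

Swapping the Euler loop for a higher-order solver is exactly the kind of classical numerical-ODE material that resurfaces in this literature.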

A bit late, but group pic from the Maryland Statistics Symposium at the Brin Mathematics Research Center this past Dec! Left to right: Jianhui Zhou, Gemma Moran, Lizhen Lin, Alden Green, Ray Bai, Anindya Roy, Cindy Rush, Yubai Yuan, Yun Yang, Yang Feng, Anderson Ye Zhang, Yanyuan Ma

13.02.2026 15:55 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

I hope Sinners wins the Academy Award for Best Picture this year. Not just because it was an incredible movie, but because as a longtime horror aficionado, this would signal a broader appetite for horror & other genre-bending films in the Academy (justice for Get Out!).

12.02.2026 14:04 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

I'm giving a talk "Deep Generative Models for Statistical Problems: Methods, Computation, and Theory" at the UMBC Mathematics and Statistics Dept next Friday, Feb. 20 from 11:00 am-12:00 pm! Come join if you're in the area. mathstat.umbc.edu/events/event...

10.02.2026 16:31 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 1

😍

09.02.2026 14:40 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

This Super Bowl game is fairly boring, but absolutely loved the Halftime Show and the other musical performances! Green Day, Lady Gaga, Bad Bunny ❀️❀️

09.02.2026 02:21 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Congrats to my collaborator and former student Qingyang Liu (I taught him in 2 classes, served on his dissertation committee, and have co-authored several papers with him)! He will be joining @wakeforeststats.bsky.social as an Assistant Professor in July. πŸ₯³Great department!

03.02.2026 23:44 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Open-Rank, Tenured/Tenure-Track Statistics Faculty - Fairfax, VA, Virginia, United States Department: Col of Engineering and Computing Classification: 9-month Instructional Faculty Job Category: Instructional Faculty Job Type: Full-Time Work Schedule: Full-time (1.0 FTE, 40 hrs/wk) Locatio...

To anyone who is on the job market in Statistics this academic year: the George Mason University (GMU) Department of Statistics is hiring for open-rank, tenure-track or tenured positions!

For full consideration, apply by January 14 at this link: tinyurl.com/6mjs8fye

08.01.2026 03:14 πŸ‘ 1 πŸ” 3 πŸ’¬ 0 πŸ“Œ 0

So maddening what happened at the University of Nebraska-Lincoln

magazine.amstat.org/blog/2026/01...

06.01.2026 17:49 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Our paper "Quantifying predictive uncertainty of aphasia severity in stroke patients with sparse heteroscedastic Bayesian high-dimensional regression" was published in the most recent issue of Computational Statistics. Read the paper here: doi.org/10.1007/s001...

02.01.2026 13:57 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Will you incorporate LLMs and AI prompting into the course in the future?
No.

Why won’t you incorporate LLMs and AI prompting into the course?
These tools are useful for coding (see this for my personal take).

However, they’re only useful if you know what you’re doing first. If you skip the learning-the-process-of-writing-code step and just copy/paste output from ChatGPT, you will not learn. You cannot learn. You cannot improve. You will not understand the code.


That post warns that you cannot use Databot as a beginner:

…to use Databot effectively and safely, you still need the skills of a data scientist: background and domain knowledge, data analysis expertise, and coding ability.

There is no LLM-based shortcut to those skills. You cannot LLM your way into domain knowledge, data analysis expertise, or coding ability.

The only way to gain domain knowledge, data analysis expertise, and coding ability is to struggle. To get errors. To google those errors. To look over the documentation. To copy/paste your own code and adapt it for different purposes. To explore messy datasets. To struggle to clean those datasets. To spend an hour looking for a missing comma.

This isn’t a form of programming hazing, like β€œI had to walk to school uphill both ways in the snow and now you must too.” It’s the actual process of learning and growing and developing and improving. You’ve gotta struggle.


This Tumblr post puts it well (it’s about art specifically, but it applies to coding and data analysis too):

Contrary to popular belief the biggest beginner’s roadblock to art isn’t even technical skill it’s frustration tolerance, especially in the age of social media. It hurts and the frustration is endless but you must build the frustration tolerance equivalent to a roach’s capacity to survive a nuclear explosion. That’s how you build on the technical skill. Throw that β€œwon’t even start because I’m afraid it won’t be perfect” shit out the window. Just do it. Just start. Good luck. (The original post has disappeared, but here’s a reblog.)

It’s hard, but struggling is the only way to learn anything.


You might not enjoy code as much as Williams does (or I do), but there's still value in maintaining coding skills as you improve and learn more. You don't want your skills to atrophy.

As I discuss here, when I do use LLMs for coding-related tasks, I purposely throw as much friction into the process as possible:

To avoid falling into over-reliance on LLM-assisted code help, I add as much friction into my workflow as possible. I only use GitHub Copilot and Claude in the browser, not through the chat sidebar in Positron or Visual Studio Code. I treat the code it generates like random answers from StackOverflow or blog posts and generally rewrite it completely. I disable the inline LLM-based auto complete in text editors. For routine tasks like generating {roxygen2} documentation scaffolding for functions, I use the {chores} package, which requires a bunch of pointing and clicking to use.

Even though I use Positron, I purposely do not use either Positron Assistant or Databot. I have them disabled.

So in the end, I don't foresee myself incorporating LLMs into this class. I'm pedagogically opposed to it. I'm facing all sorts of external pressure to do it, but I'm resisting.

You’ve got to learn first.


Some closing thoughts for my students this semester on LLMs and learning #rstats datavizf25.classes.andrewheiss.com/news/2025-12...

09.12.2025 20:17 πŸ‘ 331 πŸ” 99 πŸ’¬ 14 πŸ“Œ 31
Exam question: After you have explained 97% confidence to Bob, he responds, "I see. 97% is pretty good, but it could be great if we can make a 100% confidence interval." What is your response to this?

Student's answer: "Bob, you are a fool amongst fools. Truly, I pity you. A 100% confidence interval would be useful as it would give us a result of all real numbers. Taht's the only way to be 100% sure our true mean is in the interval; if every number could be included."


Grading my final exams for undergrad probability & statistics, and this response to one of my questions seriously made me laugh out loud for minutes. Should I give Extra Credit for the student's response? "Bob, you are a fool amongst fools." πŸ˜‚πŸ˜‚πŸ˜‚

12.12.2025 18:16 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
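For what it's worth, the student's point checks out numerically: for a normal mean with known sigma, a two-sided interval has half-width z * sigma / sqrt(n), where z is the (1+c)/2 quantile of the standard normal, and z diverges as the confidence level c approaches 1. A quick sketch in Python with SciPy; the sigma = 1, n = 25 values are arbitrary:

```python
from math import sqrt
from scipy.stats import norm

sigma, n = 1.0, 25  # arbitrary example values
for c in [0.90, 0.97, 0.999, 0.9999999]:
    z = norm.ppf((1 + c) / 2)      # critical value grows without bound as c -> 1
    half_width = z * sigma / sqrt(n)
    print(f"{c:>10}: half-width = {half_width:.3f}")

# At c = 1 exactly, the critical value is infinite:
print(norm.ppf(1.0))  # the only "100% confidence interval" is all of the reals
```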

Our R package VCBART, for fitting BART-based varying coefficient models, is now available on CRAN! Useful for flexible regression modeling + can be used to estimate heterogeneous treatment effects in causal inference by specifying X and Z appropriately. Check it out: cran.r-project.org/web/packages...

10.12.2025 04:55 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Colleges Are Preparing to Self-Lobotomize The skills that students will need in an age of automation are precisely those that are eroded by inserting AI into the educational process.

Yes. "... the skills that future graduates will most need in the AI eraβ€”creative thinking, the capacity to learn new things, flexible modes of analysisβ€”are precisely those that are likely to be eroded by inserting AI into the educational process."

08.12.2025 06:58 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Yesterday was a very sad day for all of Statistics: The University of Nebraska Board of Regents voted 9-1 (with 2 abstentions) to eliminate its Department of Statistics (https://lnkd.in/gJzJ_yki)…

A sad day for the statistics community. U. of Nebraska Board of Regents voted to eliminate UNL's Department of Statistics.

06.12.2025 16:23 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

I'm at the Brin Mathematics Research Center today for the Maryland Statistics Symposium! Presenting my work on generative quantile regression w/ former PhD student Dr. Shijie Wang (U. South Carolina '24) and Dr. Minsuk Shin of Yonsei U. (published in JCGS last year).

05.12.2025 14:20 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Maryland Statistical Symposium | Brin Mathematics Research Center

The Maryland Statistical Symposium looks awesome! brinmrc.umd.edu/fall25-mss/

So honored to be invited to speak at this event alongside many outstanding researchers, some of whose work I have followed and admired for years!

24.11.2025 17:12 πŸ‘ 1 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0
University of Nebraska-Lincoln Department of Statistics seminar "The Metrics" on November 6, 2025 (YouTube video by Chris Bilder)

If you're following the #UNL #statistics saga (proposed for elimination based on bad stats), you might find the seminar we gave yesterday interesting... youtu.be/fUk2R0UYWpA

It was weird to rail against someone for an hour, but strangely cathartic, and the #datavis seems to have been effective?

07.11.2025 14:33 πŸ‘ 9 πŸ” 8 πŸ’¬ 1 πŸ“Œ 1

Congrats to my student Leah Wood for successfully defending her senior honors thesis "Spatiotemporal Modeling of Maternal Mortality in South Carolina 2018-2023"! Leah will pursue a Master's in Biostatistics next.

This was on par with an excellent Master's thesis, tbh. Great job!

05.11.2025 20:06 πŸ‘ 6 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Having a great time visiting Columbia, SC and catching up with old friends and coworkers! I will always be grateful to @uofscstatistics.bsky.social for helping me to launch my career!

05.11.2025 02:54 πŸ‘ 3 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

nailed it!

04.11.2025 06:36 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

One week till my trip to Columbia, SC to see my Honors student Leah defend her senior thesis! She did an excellent job on Bayesian spatiotemporal modeling of maternal mortality in South Carolina from 2018-2023. She coded up the model in Stan & R and produced some very nice maps!

29.10.2025 00:40 πŸ‘ 3 πŸ” 1 πŸ’¬ 0 πŸ“Œ 0

Excited to give a talk at the Maryland Statistical Symposium at the Brin Mathematics Research Center this December! Looking forward to connecting with many outstanding statistics researchers in the DMV area and the mid-Atlantic region.

21.10.2025 15:15 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0
Bayesian group regularization in generalized linear models with a continuous spike-and-slab prior - Annals of the Institute of Statistical Mathematics We study Bayesian group-regularized estimation in high-dimensional generalized linear models (GLMs) under a continuous spike-and-slab prior. Our framework covers both canonical and non-canonical link ...

My AISM paper "Bayesian group regularization in generalized linear models with a continuous spike-and-slab prior" is now online! I really appreciated the feedback from reviewers who wrote very thorough, high-quality reviews. A+ experience submitting here. tinyurl.com/yc6phfwv

18.10.2025 22:13 πŸ‘ 2 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Today is my 40th birthday, and I had a very special treat for it -- getting to meet one of my idols, Dr. Jianqing Fan! Dr. Fan's papers on the SCAD penalty and sure independence screening for high-dimensional data were among the first papers I read as a PhD student. So inspiring!

26.09.2025 20:15 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0