Nic Crane

@niccrane

Independent R consultant. Apache Arrow PMC Member & #rstats 📦 maintainer. Arrow course launching early 2026: https://big-data-r.thinkific.com/ More of my stuff at https://niccrane.com/

2,508
Followers
157
Following
103
Posts
27.09.2023
Joined
Latest posts by Nic Crane @niccrane

LLM-Assisted Issue Triage for Open Source Maintainers Nic Crane

I built a GitHub issue classifier that labels Apache Arrow issues by language using {ellmer} - super simple, with almost 100% accuracy. Blog post: niccrane.com/posts/llm-issue-triage/

#rstats #ai #llms

10.03.2026 14:00 👍 8 🔁 3 💬 0 📌 0

In the shower thinking "wouldn't it be cool to combine LLM tool calls and have them run code but in a constrained way" & then "it needs some kind of intermediate representation; how would we validate whatever it produces?" & then realised my idea wasn't novel & just the motivation for text-to-sql 😅

09.03.2026 06:43 👍 3 🔁 0 💬 0 📌 0

I remember at posit::conf last year there was mention of posit::conf Europe 2026 - anyone know if this is still a thing? #rstats #positconf #posit

06.03.2026 11:03 👍 2 🔁 0 💬 1 📌 0

Huge thanks to the organisational team for putting on such an excellent event! 💜🌈

25.02.2026 21:32 👍 5 🔁 0 💬 0 📌 0
Schedule – rainbowR conference

Excited for all of the talks tomorrow - check out the schedule here if you haven't seen it! conference.rainbowr.org/schedule.html

25.02.2026 21:32 👍 3 🔁 0 💬 1 📌 0

Whew, and it's done! Thanks to everyone who came to my RainbowR workshop on LLMs for Data Analysis in #rstats! First time with that content in front of an audience, so I appreciate the excellent questions folks asked (and double thanks to everyone who filled in the feedback forms!)

25.02.2026 21:32 👍 18 🔁 1 💬 1 📌 0

"Working with agents is a lot more productive, but a lot less fun." Charlie Marsh on the weird world of building software right now. Full conversation on The Test Set.

24.02.2026 16:10 👍 19 🔁 5 💬 0 📌 2

Sounds interesting, how well does it work for R code?

24.02.2026 08:04 👍 1 🔁 0 💬 1 📌 0

It's still experimental, so potentially some rough edges, but I think it's a great example of making sure the LLM benefits are tempered with what actually makes sense for *people*.

23.02.2026 15:15 👍 1 🔁 0 💬 1 📌 0

Instead of generating a load of comments, you get suggestions one at a time, which you can then choose to accept or reject before it moves on to the next one. It generates suggestions as it goes, so if you accept some changes but reject others, its later suggestions adapt to the code as it now stands.

23.02.2026 15:15 👍 1 🔁 0 💬 1 📌 0

There's promise in using LLMs for code review, but it's tricky to make sure it's not overwhelming.

I was looking at this new experimental package by Simon Couch and I really love how it allows you to review code iteratively. #rstats #ai #llms

github.com/simonpcouch/...

23.02.2026 15:15 👍 26 🔁 6 💬 3 📌 0
How Generative and Agentic AI Shift Concern from Technical Debt to Cognitive Debt
This piece by Margaret-Anne Storey is the best explanation of the term cognitive debt I've seen so far. Cognitive debt, a term gaining traction recently, instead communicates the notion that …

Short musings on "cognitive debt" - I'm seeing this in my own work, where excessive unreviewed AI-generated code leads me to lose a firm mental model of what I've built, which then makes it harder to confidently make future decisions simonwillison.net/2026/Feb/15/...

15.02.2026 05:22 👍 465 🔁 88 💬 42 📌 20

Should be there shortly!

29.01.2026 04:05 👍 1 🔁 0 💬 0 📌 0

Let's talk contributors! This release saw 44 contributors to the codebase! 38 worked on the C++ library, 3 on the R 📦, & 3 on both. 23 people made their first contribution! 🎉

Thanks to everyone who was involved!

28.01.2026 16:55 👍 2 🔁 0 💬 0 📌 0

Writing partitioned datasets on S3 no longer requires ListBucket permissions; useful if you have write-only access to a bucket.

28.01.2026 16:55 👍 1 🔁 0 💬 1 📌 0
The following reproducible example:
library(arrow)                          
library(dplyr)
library(stringr)

df <- arrow_table(x = c("Apache", "Arrow", "23.0.0"))                                                                       
df |> 
  filter(str_ilike(x, "ARROW")) |> collect() 
#> # A tibble: 1 × 1
#>   x    
#>   <chr>
#> 1 Arrow


We've added support for stringr::str_ilike() for case-insensitive pattern matching.

28.01.2026 16:55 👍 0 🔁 0 💬 1 📌 0
Changelog

We're excited to announce the release of {arrow} 23.0.0 🏹📦

Here's a roundup of the new features and changes in a 🧵

Full details can be found at arrow.apache.org/docs/r/news/

#rstats #apachearrow

28.01.2026 16:55 👍 26 🔁 3 💬 2 📌 0

I mean, you could say the same thing about any R function; just a toy example - feel free to replace it with something more useful! 😉

28.01.2026 02:46 👍 1 🔁 0 💬 0 📌 0

Yeah, there's some irony in the fact that I randomly chose that specific example, and then the results even showed the new features including the web fetch thing making my example redundant! 😆 I shall have to think up a new example for when I'm teaching, but YAY, awesome new feature! 🎉

27.01.2026 23:27 👍 2 🔁 0 💬 0 📌 0
First part of code - full code can be found at https://gist.github.com/thisisnic/eae09dbd4594e2cff75d156a8bab3f59


Tool calling lets LLMs run R functions. In this example, I let an LLM ask my R session to check the latest {ellmer} updates by scraping the news page; when I ask the LLM "what's new in ellmer?", it works with what comes back.

{ellmer} website: ellmer.tidyverse.org

#rstats #llms #ai #datascience

27.01.2026 18:04 👍 4 🔁 0 💬 2 📌 0
Line chart showing percent correct on the y-axis and three conditions on the x-axis: Baseline, Intuitive, and Mocked. Three lines represent GPT-5.2, Claude Opus 4.5, and Gemini 2.5 Pro. All three models score between 93-98% on baseline, then drop on intuitive and mocked conditions. All three perform the worst on the mocked condition.


More on LLMs and plot interpretation: they do fine in normal conditions, but struggle when the plot conflicts strongly with their priors.

@simonpcouch.com and I investigated why and what might help: posit.co/blog/llm-plo...

22.01.2026 21:31 👍 24 🔁 5 💬 1 📌 0
Code in which text from a Wikipedia article is passed into the chat_structured method to extract dates and events

I love "structured output" as a way of extracting data from text as a data frame. 🎯

Image shows the {ellmer} package in use, and how type_array(type_object(...)) automatically returns a data frame in R 🔧

{ellmer} website: ellmer.tidyverse.org

#rstats #llms #ai #datascience

22.01.2026 15:10 👍 15 🔁 2 💬 0 📌 0
Posit::conf(2026) Call for Talks - Posit
posit::conf(2026) is coming September 14-16 to Houston, TX, and we're looking for talks!

posit::conf(2026) call for talks is now open! If you're an #RStats or #Python user, have a great DS workflow to share, or have some lessons learned, we'd love to hear from you.

🔗 posit.co/blog/posit-c...

22.01.2026 14:58 👍 10 🔁 3 💬 1 📌 0

Oh, fascinating. I'm imagining that it stops it from being over-reliant on the retrieved information or interpreting it too literally maybe? Would love to hear more about how that ends up working out!

20.01.2026 18:27 👍 0 🔁 0 💬 0 📌 0
vitals logo (a teddy bear with a stethoscope), above output from running evaluation on a dataset, with 2 correct responses from the LLM and 1 incorrect response


RAG doesn't guarantee reliability 🤔

Built a RAG chatbot for a course & tested the same question+model three times with {vitals} - one run got it wrong. ❌

This is why evals matter: catch inconsistencies you'd miss manually. 🔍

vitals.tidyverse.org

#rstats #llms #ai #datascience

20.01.2026 14:54 👍 10 🔁 0 💬 1 📌 0
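
The repeated-run check described in the post above can be sketched in base R. This is not the {vitals} API, just a rough illustration of the idea: `ask_model()`, the canned answers, the question, and the target are all hypothetical stand-ins so the sketch runs offline, where a real setup would call a chat model (and probably a proper eval framework) instead.

```r
# Hypothetical stand-in for an LLM call: cycles through canned answers so the
# sketch runs offline. A real setup would query a chat model here.
canned <- c("Paris", "Paris", "Lyon")
ask_model <- local({
  i <- 0
  function(question) {
    i <<- i + 1
    canned[[(i - 1) %% length(canned) + 1]]
  }
})

# Hypothetical question and expected answer
question <- "What is the capital of France?"
target <- "Paris"

# Ask the same question three times and score each run against the target
answers <- vapply(1:3, function(run) ask_model(question), character(1))
results <- data.frame(run = 1:3, answer = answers, correct = answers == target)

# Share of runs that were correct, and whether all runs agreed with each other
accuracy <- mean(results$correct)
consistent <- length(unique(results$answer)) == 1

print(results)
```

A single manual spot-check would most likely have hit one of the correct runs here; scoring every run is what surfaces the one that disagrees.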

Excellent post about Claude Code's actual energy usage 💡

Most AI energy posts only look at single queries. Simon breaks down full coding sessions - much higher, but still about the same as running a dishwasher once a day.

simonpcouch.com/blog/2026-01-20-cc-impact/

#rstats #ai #llms

20.01.2026 14:48 👍 13 🔁 0 💬 1 📌 1
main image for Scaling Up Data Analysis in R with Arrow R Consortium webinar


R Consortium webinar: Scaling up data analysis in R with Arrow. Learn larger-than-memory workflows, why Parquet matters, and where DuckDB fits - w/ Dr Nic Crane (Arrow R maintainer; Apache Arrow PMC). Register: r-consortium.org/webinars/sca... #rstats #arrow @niccrane.bsky.social

20.01.2026 00:54 👍 19 🔁 8 💬 0 📌 0

Speak at posit::conf(2026) and share your R & Python stories!

Accepted speakers get:
✨ Travel & lodging help
✨ Free conference pass
✨ Professional coaching

Apply by Feb 6 to join us Sept 14-16 in Houston, TX!

Submit here: pos.it/conf-talk-2026

#positconf2026 #rstats #pydata

15.01.2026 15:40 👍 30 🔁 12 💬 0 📌 2
Survey on Learning Goals for LLMs in Data

Building LLM training materials for R users (inc a RainbowR conference workshop) and want to know what topics people care about. ~10 min survey, results shared openly!

forms.gle/uAtVwpRhKFVU...

Boosts appreciated 🤖 #rstats #llms #ai

15.01.2026 13:58 👍 1 🔁 0 💬 0 📌 0