How'd it do?
Qwen is probably the best out there right now: ollama.com/library/qwen...
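If you'd rather poke at it from code than the CLI, the official `ollama` Python package is enough. A minimal sketch, assuming the local Ollama server is running and you've already pulled a Qwen tag (the tag below is a placeholder):

```python
# Minimal sketch using the official `ollama` package (pip install ollama).
# Assumes the local Ollama server is running and you've pulled a Qwen model,
# e.g. `ollama pull qwen2.5` -- the exact tag here is a placeholder.
import ollama

response = ollama.chat(
    model="qwen2.5",  # placeholder; use whichever Qwen build you pulled
    messages=[{"role": "user", "content": "Explain attention in one sentence."}],
)
print(response["message"]["content"])
```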
If you want a nicer UI, check out OpenWebUI. It presents a nice ChatGPT-esque web UI with history, etc.
Super excited about PydanticAI. Looking forward to taking it out for a spin.
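For anyone else curious, the hello world from the launch announcement looked roughly like this. It's a snapshot of a brand-new API, so treat names like `result_type` and `.data` as a moment in time, not gospel:

```python
# Rough sketch of PydanticAI's launch-era hello world. The library was brand
# new and moving fast, so these names (`result_type`, `.data`) are a snapshot.
from pydantic import BaseModel
from pydantic_ai import Agent

class CityInfo(BaseModel):
    city: str
    country: str

# Typed output: the agent validates the LLM's response against the model.
agent = Agent("openai:gpt-4o", result_type=CityInfo)

result = agent.run_sync("Where were the 2012 Summer Olympics held?")
print(result.data)  # e.g. CityInfo(city='London', country='United Kingdom')
```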
My hunch is that they can already write machine code well enough. I've never seen any evals on it, though.
One thing to consider is portability. Machine code is denser than source code, but I'd bet cross-compiling source code to 50 distros is far cheaper from a compute perspective.
But yeah, I guess bottom line, RAG can get you far. You won't know where it breaks until it does, unfortunately. I look forward to a world where RAG systems can monitor themselves and signal to a user: "hey, it might be time to do some fine-tuning!"
Depends on the use case. If the query is "what is my most controversial opinion across all my notes?" then RAG can easily fall over unless you anticipated it ahead of time in the indexing pipeline. That's admittedly an extreme example, but the spectrum between that & simple fact retrieval is blurry.
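To make that concrete, here's a toy sketch (plain Python, stand-in scoring functions, nothing real) of why corpus-wide questions break top-k retrieval: the retriever only ever hands the LLM k chunks, while "most controversial opinion" needs a score computed over every note, which is exactly what you'd have to precompute at indexing time:

```python
# Toy illustration, not a real RAG stack. The scoring functions are stand-ins
# for embeddings and classifiers.
notes = [
    "I think tabs are better than spaces.",
    "Meeting notes: Q3 planning, headcount, roadmap.",
    "Hot take: most unit tests are a waste of time.",
    # ...imagine thousands more...
]

def relevance(query: str, note: str) -> float:
    """Stand-in for cosine similarity over embeddings."""
    return float(sum(w in note.lower() for w in query.lower().split()))

def retrieve_top_k(query: str, k: int = 2) -> list[str]:
    return sorted(notes, key=lambda n: relevance(query, n), reverse=True)[:k]

# Simple fact retrieval: the answer lives in one chunk, so top-k works fine.
print(retrieve_top_k("Q3 planning"))

# Aggregate query: needs a score over EVERY note, not the k that happen to
# share words with the query string. One fix is to compute it while indexing.
def controversy(note: str) -> float:
    """Stand-in for a classifier you'd run in the indexing pipeline."""
    return 1.0 if any(t in note.lower() for t in ("hot take", "better than")) else 0.0

most_controversial = max(notes, key=controversy)
print(most_controversial)  # only answerable because we anticipated the query
```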
Yeah, I get what you're saying. But I'd caution against dismissing people because they don't speak for _everyone_.
I am an expert, and while I trust LLMs for many things, most of my friends and I very much would not trust an LLM's machine code output.
What is the total dataset size in bytes? If complex reasoning across the whole set of notes is required for your use case (and it could be!), RAG will fall over on you.
As an aside to the broader goal of the thread below, this is a question so many people have right now: "when do I start fine-tuning?"
I've yet to see good answers.
We are still so early!!
Have you done any experiments with your benchmarks going from 1 to 100 examples to see if accuracy regresses?
I think it can[1], but we don't do it because:
1) we don't trust the LLM enough. We want to review the code.
2) high-level languages give you a higher density of expression per token, i.e. it takes fewer tokens, so you get faster answers (see the token-count sketch after the footnote)
[1] chatgpt.com/share/674db3...
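A cheap way to sanity-check point 2 yourself: tokenize the same function written in Python and in assembly and compare counts. A sketch using tiktoken (the encoding choice is arbitrary for the comparison):

```python
# Same logic, two languages: the high-level version costs far fewer tokens.
# Uses OpenAI's tiktoken tokenizer; any tokenizer shows the same trend.
import tiktoken

python_src = "def add(a, b):\n    return a + b\n"
asm_src = (
    "add:\n"
    "    push rbp\n"
    "    mov rbp, rsp\n"
    "    mov eax, edi\n"
    "    add eax, esi\n"
    "    pop rbp\n"
    "    ret\n"
)

enc = tiktoken.get_encoding("cl100k_base")
print("python tokens:", len(enc.encode(python_src)))
print("asm tokens:   ", len(enc.encode(asm_src)))
```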
In Context Learning is underrated.
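Concretely: you can often steer a model to a new task just by putting labeled examples in the prompt, with no fine-tuning at all. A minimal few-shot sketch (the prompt format is illustrative, not tied to any particular API):

```python
# Minimal in-context learning sketch: the "training data" lives in the prompt,
# so there are no gradient updates. Format is illustrative; adapt to your API.
examples = [
    ("I loved this movie!", "positive"),
    ("Total waste of two hours.", "negative"),
    ("It was fine, nothing special.", "neutral"),
]

def build_prompt(query: str) -> str:
    shots = "\n".join(f"Review: {text}\nLabel: {label}" for text, label in examples)
    return f"{shots}\nReview: {query}\nLabel:"

print(build_prompt("The plot dragged but the acting was superb."))
# Feed this string to any completion endpoint; the model infers the task
# from the examples alone.
```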
Transcript of Hard Fork ep 111: Yeah. And I could talk for an hour about transformers and why they are so important. But I think it's important to say that they were inspired by the alien language in the film Arrival, which had just recently come out. And a group of researchers at Google, one researcher in particular, who was part of that original team, was inspired by watching Arrival and seeing that the aliens in the movie had this language which represented entire sentences with a single symbol. And they thought, hey, what if we did that inside of a neural network? So rather than processing all of the inputs that you would give to one of these systems one word at a time, you could have this thing called an attention mechanism, which paid attention to all of it simultaneously. That would allow you to process much more information much faster. And that insight sparked the creation of the transformer, which led to all the stuff we see in AI today.
Did you know that attention across the whole input span was inspired by the time-negating alien language in Arrival? Crazy anecdote from the latest Hard Fork podcast (by @kevinroose.com and @caseynewton.bsky.social). HT nwbrownboi on Threads for the lead.
Playing around with @anthropic.com MCP stuff, and found out the hard way that www.claudedesktop.com is not claude.ai/download
Anyway it's working now!
#llm #genai
I work in it so I'm in a bit of a bubble. What are some of the most egregious lies you see?
Ever wondered how AI autocomplete works? In this thread I'll walk you through how we work with LLMs at @continuedev.bsky.social to decipher user intent and provide useful completions.
Continue is open source, so I'll post links to relevant code on GitHub at the end of this thread.
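A teaser while the links land: most code-completion models are trained for fill-in-the-middle (FIM), where the prompt sandwiches the cursor between the text before and after it. A generic sketch of the idea; the sentinel tokens vary by model family, and this is not Continue's actual code:

```python
# Generic fill-in-the-middle (FIM) prompt construction. The sentinel tokens
# below are the style used by some open code models, but they differ across
# model families -- check your model's docs. Illustrative only, not Continue's
# actual implementation.
def build_fim_prompt(prefix: str, suffix: str) -> str:
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

file_before_cursor = "def greet(name):\n    return "
file_after_cursor = "\n\nprint(greet('world'))\n"

prompt = build_fim_prompt(file_before_cursor, file_after_cursor)
# The model generates the "middle": the code that belongs at the cursor,
# conditioned on both what comes before and what comes after it.
print(prompt)
```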
Ok very cool.
Do you run any benchmarks against your default prompt templates, and have you published them so others can compare different models or prompt/template tweaks?
Do you fine-tune any of your models much, or do you just work with prompt templating?
Long story short, I think the change is on a 10-year horizon, not a 2-year one.
Only just recently have models had long enough context length, and good enough recall across that context, to make retrieval work.
It's completely transformed how I work: writing code, tests, design docs; less time scouring Stack Overflow or fighting with PlantUML/Mermaid to make diagrams. I'm far more productive.
But I'm a special case.
I think the real unlock is going to be agents. This promise still hasn't been realized.
How this? Can't tell if it's underdone.
Flowers enjoyed. Very nice flowers.
@dannynewman.bsky.social dude what do we do here?
I should probably post something so I don't look like a noob.
Ah. Feeds. Got it.
Joined up and said I was interested in tech, science, and programming, but the Discover feed is none of that!
How do I find my genAI nerds?!