Just enrolled in the 2025 Data Engineering Zoomcamp by DataTalksClub!
Can't wait to explore data engineering and grow with an amazing cohort. Big shoutout to DataTalksClub for this awesome opportunity!
#DataEngineering #LearningInPublic
14.01.2025 03:12
π 1
π 0
π¬ 0
π 0
For those who don't feel like they fit into my Grumpy Machine Learners list (which I still need to update based on 100+ requests), I've created another starter pack:
go.bsky.app/Js7ka12
(Self) nominations welcome.
22.11.2024 18:40
π 79
π 13
π¬ 37
π 1
The Ultimate Guide to PyTorch Contributions
Tensors and Dynamic neural networks in Python with strong GPU acceleration - pytorch/pytorch
For those wondering about the best way to start contributing to PyTorch or other open-source projects, here are the top three pointers I'd share:
1. The Ultimate Guide to PyTorch Contributions github.com/pytorch/pyto...
For PyTorch core, that should be the number-one item on your list.
23.11.2024 14:13
π 18
π 3
π¬ 2
π 0
Training variance is a thing and no one measures it because research models get trained once to beat the benchmark by 0.2 AP or whatever and then never trained again.
In prod, one of the first things we do is train (the same model) a ton over different shuffled splits of the data in order to… 1/3
22.11.2024 22:00
π 46
π 6
π¬ 2
π 3
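The "train the same model over different shuffled splits" idea from the post above can be sketched in a few lines. Everything here is illustrative: the synthetic data, the nearest-centroid "model", and the 80/20 split are made up for the demo; in practice you would swap in your real model and metric and report the spread, not just the best run.

```python
import random
import statistics

def make_data(n=200, seed=0):
    # Synthetic 1-D two-class data: class 0 centered at -1, class 1 at +1.
    rng = random.Random(seed)
    return [(rng.gauss(-1.0 if y == 0 else 1.0, 1.0), y)
            for y in (rng.randint(0, 1) for _ in range(n))]

def train_and_eval(data, rng):
    # Shuffle, split 80/20, fit a nearest-centroid "model", score accuracy.
    data = data[:]
    rng.shuffle(data)
    cut = int(0.8 * len(data))
    train, test = data[:cut], data[cut:]
    c0 = statistics.mean(x for x, y in train if y == 0)
    c1 = statistics.mean(x for x, y in train if y == 1)
    preds = [0 if abs(x - c0) < abs(x - c1) else 1 for x, _ in test]
    return statistics.mean(p == y for p, (_, y) in zip(preds, test))

# Repeat training over differently shuffled splits and measure the variance.
data = make_data()
accs = [train_and_eval(data, random.Random(seed)) for seed in range(20)]
mean_acc = statistics.mean(accs)
std_acc = statistics.stdev(accs)
```

A single benchmark number hides `std_acc` entirely; reporting mean plus or minus the standard deviation across splits is what makes a 0.2-point improvement interpretable.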
You know the "AI Overview" you get on Google Search?
I discovered today that it's repeating as fact something I made up 7 years ago as a joke.
"Kyloren syndrome" is a fictional disease I invented as part of a sting operation to prove that you can publish any nonsense in predatory journals...
22.11.2024 16:06
π 4607
π 1743
π¬ 123
π 109
Using Excel for optimization problems
YouTube video by Jeremy Howard
Here's a walk-through of a general-purpose approach to solving many types of optimization problems. It's often not the most efficient method, but it's usually fast enough, and it doesn't require using different methods for different problems.
youtu.be/U2b5Cacertc
19.11.2024 23:13
π 128
π 5
π¬ 2
π 1
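In the same spirit as the video's general-purpose solver (but in Python rather than Excel), here is a minimal sketch of gradient descent with finite-difference gradients: one routine that works on many objectives without problem-specific math. The objective, learning rate, and step count below are made-up examples, not anything from the video.

```python
def minimize(f, x0, lr=0.1, steps=500, eps=1e-6):
    """Minimize f by gradient descent, estimating the gradient numerically."""
    x = list(x0)
    for _ in range(steps):
        fx = f(x)
        grad = []
        for i in range(len(x)):
            bumped = x[:]
            bumped[i] += eps
            grad.append((f(bumped) - fx) / eps)  # forward-difference slope
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
    return x

# Toy objective with its minimum at x = 3, y = -1.
best = minimize(lambda p: (p[0] - 3) ** 2 + (p[1] + 1) ** 2, [0.0, 0.0])
```

Because the gradient is estimated numerically, the same `minimize` call works for any smooth objective you can evaluate, which is exactly the "one method, many problems" trade-off the video describes.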
Book outline
Over the past decade, embeddings (numerical representations of machine learning features used as input to deep learning models) have become a foundational data structure in industrial machine learning systems. TF-IDF, PCA, and one-hot encoding have always been key tools in machine learning systems as ways to compress and make sense of large amounts of textual data. However, traditional approaches were limited in the amount of context they could reason about with increasing amounts of data. As the volume, velocity, and variety of data captured by modern applications has exploded, creating approaches specifically tailored to scale has become increasingly important.

Google's Word2Vec paper made an important step in moving from simple statistical representations to semantic meaning of words. The subsequent rise of the Transformer architecture and transfer learning, as well as the latest surge in generative methods, has enabled the growth of embeddings as a foundational machine learning data structure. This survey paper aims to provide a deep dive into what embeddings are, their history, and usage patterns in industry.
Just realized Bluesky allows sharing valuable stuff because it doesn't punish links.
Let's start with "What are embeddings" by @vickiboykis.com
The book is a great summary of embeddings, from history to modern approaches.
The best part: it's free.
Link: vickiboykis.com/what_are_emb...
22.11.2024 11:13
π 652
π 101
π¬ 22
π 6
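The contrast the book's outline draws between sparse one-hot encodings and dense learned embeddings can be shown in a tiny sketch. The three-word vocabulary and the 2-D embedding values below are invented for illustration; real embeddings are learned from data (e.g. by Word2Vec or a Transformer), not written by hand.

```python
vocab = ["cat", "dog", "car"]
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    # Sparse encoding: one dimension per vocabulary word, no notion of similarity.
    v = [0] * len(vocab)
    v[index[word]] = 1
    return v

# Toy "learned" embedding table (values made up for the demo).
embeddings = {
    "cat": [0.9, 0.1],
    "dog": [0.8, 0.2],
    "car": [0.1, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

sim_dog = cosine(embeddings["cat"], embeddings["dog"])
sim_car = cosine(embeddings["cat"], embeddings["car"])
```

Every pair of one-hot vectors is orthogonal (cosine similarity 0), so "cat" is exactly as far from "dog" as from "car"; the dense vectors instead place "cat" much closer to "dog", which is the semantic structure the book traces from Word2Vec onward.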