Prompting is temporary.
Structure is permanent.
When your repo is organized this way, Claude stops behaving like a chatbot…
…and starts acting like a project-native engineer.
5️⃣ Local CLAUDE.md for risky modules
Put small files near sharp edges:
src/auth/CLAUDE.md
src/persistence/CLAUDE.md
infra/CLAUDE.md
Now Claude sees the gotchas exactly when it works there.
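A local file like this can be tiny. A hypothetical src/auth/CLAUDE.md (the rules and test command are illustrative, not from a real repo):

```markdown
# src/auth module notes

- Tokens are validated in middleware; never log them.
- Password hashing config is defined once here; do not duplicate it.
- Any change in this module must pass the auth suite: `npm run test:auth`
```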
4️⃣ docs/ = Progressive Context
Don’t bloat prompts.
Claude just needs to know where truth lives:
• architecture overview
• ADRs (engineering decisions)
• operational runbooks
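One possible layout (all file names are illustrative):

```
docs/
  architecture.md        # high-level system overview
  adr/                   # one file per engineering decision
    0001-choose-postgres.md
  runbooks/
    deploy.md
    incident-response.md
```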
3️⃣ .claude/hooks/ = Guardrails
Models forget.
Hooks don’t.
Use them for things that must be deterministic:
• run formatter after edits
• run tests on core changes
• block unsafe directories (auth, billing, migrations)
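A formatter hook, for example, can be expressed in .claude/settings.json. A sketch assuming the documented hooks schema (the matcher and formatter command are illustrative):

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npx prettier --write ." }
        ]
      }
    ]
  }
}
```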
2️⃣ .claude/skills/ = Reusable Expert Modes
Stop rewriting instructions.
Turn common workflows into skills:
• code review checklist
• refactor playbook
• release procedure
• debugging flow
Result:
Consistency across sessions and teammates.
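A skill is just a reusable instruction file. A hypothetical .claude/skills/code-review/SKILL.md (frontmatter fields assumed from the documented skill format; checklist items are illustrative):

```markdown
---
name: code-review
description: Apply the team's review checklist to a diff
---

1. Flag changed logic that has no test.
2. Question every new dependency.
3. Check error handling on all external calls.
4. Confirm naming matches the module's conventions.
```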
1️⃣ CLAUDE.md = Repo Memory (keep it short)
This is the north star file.
Not a knowledge dump. Just:
• Purpose (WHY)
• Repo map (WHAT)
• Rules + commands (HOW)
If it gets too long, the model starts missing important context.
Claude needs 4 things at all times:
• the why → what the system does
• the map → where things live
• the rules → what’s allowed / not allowed
• the workflows → how work gets done
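Put together, a short CLAUDE.md covering those four things might look like this (the project, paths, and commands are hypothetical):

```markdown
# payments-service

## Why
Handles invoicing and refunds. Correctness beats speed here.

## Map
- src/api/       HTTP handlers
- src/billing/   core domain logic
- infra/         deploy scripts

## Rules
- Never edit migrations by hand.
- Run `make test` before proposing any change.

## Workflows
- New feature: branch -> tests -> `make check` -> PR
```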
The Anatomy of a Claude Code Project 👇
Most people treat CLAUDE.md like a prompt file.
That’s the mistake.
If you want Claude Code to feel like a senior engineer living inside your repo, your project needs structure.
#Agentic #AI #Claude
Data Engineer vs. Data Scientist: What’s the Difference?
One builds the data foundation.
The other turns data into intelligence.
A Data Engineer designs pipelines, manages large-scale systems, ensures data reliability, and works heavily with cloud and distributed frameworks. They focus on performance, scalability, and architecture.
A Data Scientist analyzes data, builds models, applies statistics, and translates patterns into actionable insights. They focus on prediction, experimentation, and business impact.
If you enjoy system design, infrastructure, and data flow — engineering may suit you.
If you enjoy analysis, modeling, and problem-solving with algorithms — science may be your path.
Data engineering is one of the few "safe" roles in the coming decade!
Data engineers in 2030 are:
- Able to handle all types of data: structured, semi-structured, and unstructured
- Using coding agents to increase the speed at which they build pipelines
- Crushing data silos with data lakehouse architectures like Iceberg and Delta
- Getting the entire company to agree upon business definitions
- Integrating private data into AI in a privacy-compliant and efficient way using multi-tenant architectures
Things like Claude Code will make "building pipelines" easier, but data engineering is so much more than building pipelines!
Data engineering is projected to grow faster than AI engineering over the next decade, according to the World Economic Forum!
AI is not going to replace data engineering; it will make it increasingly more valuable!
- Typically 30–60% fewer tokens than JSON
- Explicit lengths and fields enable validation
- Removes redundant punctuation (braces, brackets, most quotes)
- Indentation-based structure: whitespace instead of braces, like YAML
- Tabular arrays: declare keys once, stream data as rows
JSON:
{
"users": [
{ "id": 1, "name": "Alice", "role": "admin" },
{ "id": 2, "name": "Bob", "role": "user" }
]
}
TOON:
users[2]{id,name,role}:
1,Alice,admin
2,Bob,user
Token-Oriented Object Notation (TOON) is a compact, human-readable serialization format designed for passing structured data to Large Language Models with significantly reduced token usage. It's intended for LLM input as a lossless, drop-in representation of JSON data.
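The tabular-array form shown above is simple to emit yourself. Here is a toy Python encoder for that one shape — this is a sketch, not the official TOON implementation, which also handles nesting, quoting, and non-uniform data:

```python
def to_toon_table(key: str, rows: list[dict]) -> str:
    """Encode a list of uniform dicts in TOON's tabular-array style:
    declare the keys once, then stream one comma-joined row per record."""
    fields = list(rows[0])  # assumes every row has the same keys
    lines = [f"{key}[{len(rows)}]{{{','.join(fields)}}}:"]
    for row in rows:
        lines.append(",".join(str(row[f]) for f in fields))
    return "\n".join(lines)

users = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
]
print(to_toon_table("users", users))
```

Running this prints the same `users[2]{id,name,role}:` block as the TOON example above: the key names appear once instead of being repeated per record, which is where the token savings come from.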
#dataengineering #llm
RAG is not just an integration problem. It’s a design problem. Each layer of this stack requires deliberate choices that impact latency, quality, explainability, and cost.
If you're serious about GenAI, it's time to think in terms of stacks—not just models.
Evaluation
Tools like Ragas, Trulens, and Giskard bring much-needed observability—measuring hallucinations, relevance, grounding, and model behavior under pressure.
Text Embeddings
The quality of retrieval starts here. Open-source models (Nomic, SBERT, BGE) are gaining ground, but proprietary offerings (OpenAI, Google, Cohere) still dominate enterprise use.
Open LLM Access
Platforms like Hugging Face, Ollama, Groq, and Together AI abstract away infra complexity and speed up experimentation across models.
Data Extraction (Web + Docs)
Whether you're crawling the web (Crawl4AI, FireCrawl) or parsing PDFs (LlamaParse, Docling), raw data access is non-negotiable. No context means no quality answers.
Vector Database
Chroma, Qdrant, Weaviate, Milvus, and others power the retrieval engine behind every RAG system. Low-latency search, hybrid scoring, and scalable indexing are key to relevance.
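Under the hood, all of these are doing nearest-neighbor search over embedding vectors. A minimal in-memory sketch with toy 3-dimensional vectors — a real system would use model-generated embeddings and an approximate index, not a brute-force scan:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "embeddings": in a real system these come from an embedding model.
store = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.8, 0.2],
    "api rate limits": [0.0, 0.2, 0.9],
}

def search(query_vec: list[float], k: int = 2) -> list[str]:
    """Return the k documents whose vectors are most similar to the query."""
    ranked = sorted(store.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

print(search([0.85, 0.15, 0.05]))  # "refund policy" ranks first
```

Production vector databases replace the brute-force `sorted` scan with approximate indexes (HNSW, IVF) and add the hybrid scoring and filtering mentioned above.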
Frameworks
LangChain, LlamaIndex, Haystack, and txtai are now essential for building orchestrated, multi-step AI workflows. These tools handle chaining, memory, routing, and tool-use logic behind the scenes.
LLMs (Open vs Closed)
Open models like LLaMA 3, Phi-4, and Mistral offer control and customization. Closed models (OpenAI, Claude, Gemini) bring powerful performance with less overhead. Your tradeoff: flexibility vs convenience.
RAG Stack
Building with Retrieval-Augmented Generation (RAG) isn't just about choosing the right LLM. It's about assembling an entire stack—one that's modular, scalable, and future-proof.
#ai #rag #dataengineering
EtLT (Extract, transform, Load, Transform) (1/2)
Attempts to balance the data governance of ETL with the speed and flexibility of ELT. A minimal transformation step is performed before loading: essential tasks like data cleaning, basic formatting, and masking sensitive data for immediate compliance.
EtLT (Extract, transform, Load, Transform) (2/2)
Best for scenarios requiring strict data security/compliance (pre-load masking) while still benefiting from the speed and flexibility of cloud data warehouse transformations.
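The little-t/big-T split can be sketched end to end in a few lines. Field names and the masking rule are illustrative, and a real pipeline would load into a warehouse rather than a Python list:

```python
import hashlib

def extract() -> list[dict]:
    # Extract: pull raw records from a source system (hard-coded here).
    return [
        {"user": "alice@example.com", "amount": "42.50 "},
        {"user": "bob@example.com", "amount": " 7.00"},
    ]

def light_transform(rows: list[dict]) -> list[dict]:
    # little t: mask PII and do basic cleanup BEFORE loading,
    # so raw emails never land in the warehouse.
    return [
        {
            "user_hash": hashlib.sha256(r["user"].encode()).hexdigest()[:12],
            "amount": float(r["amount"].strip()),
        }
        for r in rows
    ]

def load(rows: list[dict], warehouse: list[dict]) -> None:
    # Load: append cleaned, masked rows to the "warehouse".
    warehouse.extend(rows)

def heavy_transform(warehouse: list[dict]) -> float:
    # big T: modeling/aggregation inside the warehouse (a plain sum here).
    return sum(r["amount"] for r in warehouse)

warehouse: list[dict] = []
load(light_transform(extract()), warehouse)
print(heavy_transform(warehouse))  # 49.5
```

The key property: by the time data reaches `load`, the sensitive field is already hashed, while the expensive modeling still happens post-load where warehouse compute can be used.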