Gaming an Agentic Benchmark - DABStep Leaderboard | Hacker News
Gaming an Agentic AI Benchmark
We gamed an agentic benchmark and hit #1 on the leaderboard
Scientific benchmarks optimize for transparency and reproducibility. But this creates a fundamental vulnerability...
news.ycombinator.com/item?id=4627...
15.12.2025 18:11
Members violating the group protocols are sent a warning message and their violating messages are deleted.
27.10.2025 06:58
You create a WhatsApp group and it even gains a lot of engagement. But then you struggle with moderation: users simply don't want to follow the WhatsApp group rules.
We vibe-coded a WhatsApp agent that moderates messages across groups.
27.10.2025 06:57
Voice Recorder
You can test the voice cloning capability at:
mimiclabs.thinkevolvelabs.com
21.10.2025 11:00
I have created a microsite where you can do the same. The best part is that it's completely local and the open-source package can run on a CPU instance. So there's no need to run complex workloads on a GPU cloud; a voice can be cloned and replicated on your laptop.
youtu.be/XTSp0Q-90bA
21.10.2025 10:59
Gemini 270M Fine-Tuning with the MIRIAD Dataset
YouTube video by Think Evolve Consultancy
New video: Fine-tuning a <150 MB LLM on 5.8M+ medical Q&A samples.
Runs on mobile or laptop, no GPU required!
Watch here:
youtu.be/GOQRKzrM3gA
29.08.2025 11:42
LLMs are language models. The latest version, ChatGPT-5, appears to hallucinate when asked a simple query.
The solution is to encourage it to think more critically and to lay out the logical steps behind its response.
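One concrete way to apply that advice is a chain-of-thought style prompt wrapper. A hypothetical sketch; the prompt wording and the example question are illustrative, not a benchmarked prompt:

```python
# Hypothetical illustration: wrap the query in a prompt that asks the model
# to state its reasoning steps before answering. The wording is an
# assumption, not a tested prompt.

def make_prompt(question, chain_of_thought=True):
    if chain_of_thought:
        return ("Think through this step by step, stating each logical step "
                "before giving a final answer.\n\nQuestion: " + question)
    return "Question: " + question

plain = make_prompt("Which is larger, 9.11 or 9.9?", chain_of_thought=False)
guided = make_prompt("Which is larger, 9.11 or 9.9?")
```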
09.08.2025 04:49
A query on Sam Altman's investment stalls on Gemini. What could be the reason?
30.05.2025 02:44
I often feel socially anxious while speaking, so I keep notes, but glancing down can feel awkward. Inspired by pro teleprompters, I built one for video calls. "Smooth Teleprompter" is a free Chrome extension we made with Replit, using our playful "vibe coding" approach to dev.
19.05.2025 10:26
Rendering Blade Runner 2049 final scene
And in that final breath,
A machine finds something almost human.
Snow drifts through the poisoned sky, silent and slow,
a quiet witness to grace where none was expected.
It falls on steel and sorrow, soft as forgiveness...
24.04.2025 10:15
Interestingly, as I write this, the landscape is rapidly evolving: a startup focused on MCP technology just received its first seed funding from HubSpot's CEO. The messaging middleware space continues to develop quickly in today's fast-paced tech environment. (3/3)
22.04.2025 10:56
Model Context Protocol (MCP) is like a USB-C port for applications. It allows LLMs to connect with different apps on your local machine, which can be superbly useful for automating intelligent tasks. MCP was designed by Anthropic. (2/3)
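To make the "USB-C port" idea concrete, here is a toy sketch of the pattern MCP standardizes: a server registers local tools, a host discovers them, and calls arrive as structured JSON. The names here (ToyToolServer, list_tools, call_tool) are invented for illustration and are not the real MCP SDK API:

```python
import json

# Toy sketch only: a tool registry that mimics the idea behind MCP --
# a uniform way for an LLM host to discover and invoke local tools.

class ToyToolServer:
    def __init__(self):
        self._tools = {}

    def tool(self, name, description):
        """Decorator that registers a function as a callable tool."""
        def decorator(fn):
            self._tools[name] = {"description": description, "fn": fn}
            return fn
        return decorator

    def list_tools(self):
        """What an LLM host would query to discover capabilities."""
        return [{"name": n, "description": t["description"]}
                for n, t in self._tools.items()]

    def call_tool(self, request_json):
        """Dispatch a request like {"tool": "add", "args": {"a": 1, "b": 2}}."""
        req = json.loads(request_json)
        return self._tools[req["tool"]]["fn"](**req.get("args", {}))

server = ToyToolServer()

@server.tool("add", "Add two numbers")
def add(a, b):
    return a + b

result = server.call_tool('{"tool": "add", "args": {"a": 2, "b": 3}}')  # 5
```

The real protocol runs this exchange over JSON-RPC between a host process and tool servers, but the discover-then-call shape is the same.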
22.04.2025 10:56
WhatsApp MCP Windows Installation
YouTube video by Think Evolve Consultancy
I decided to experiment with setting up WhatsApp Model Context Protocol (MCP) on a Windows system. Though Windows isn't ideally supported by the Model Context Protocol, I wanted to create a comprehensive guide to help others navigate the process. (1/3)
www.youtube.com/watch?v=-B5x...
22.04.2025 10:56
DeepSeek R1 Prevent Output in Chinese
YouTube video by Think Evolve Consultancy
Agentic Systems exhibit autonomy, decision-making, and adaptability in achieving goals. They can analyze data, take actions, and refine their approach based on feedback, often functioning with minimal human intervention.
#DeepSeek #PersonalAssistant #AIforall
youtu.be/JaZvkpgnXck
27.02.2025 08:39
Seemingly "simple" problem statements, their clarity masking decades of complexity, unravel layer by layer with each genuine interaction.
26.02.2025 16:30
In summary, fine-tuning bridges the gap between general-purpose capabilities and task-specific excellence, enabling LLMs to deliver tailored, efficient, and high-performance solutions across diverse industries.
(10/10)
24.02.2025 03:56
7️⃣ Faster Convergence
Starting with pre-trained language patterns accelerates training and ensures quicker convergence to optimal performance.
- Example: Fine-tuning a base model for academic paper summarization to assist researchers.
(9/n)
24.02.2025 03:56
6️⃣ Domain-Specific Performance
Customizes the model to the unique characteristics and language of a specific domain, ensuring accuracy and relevance.
- Example: Fine-tuning for financial risk analysis using historical market data and reports.
(8/n)
24.02.2025 03:56
5️⃣ Task Adaptability
Enables broad versatility by adapting a single model to a range of tasks without requiring additional architectures.
- Example: Using the same base model for text summarization and sentiment analysis by fine-tuning it separately for each task.
(7/n)
24.02.2025 03:56
4️⃣ Efficient Deployment
Fine-tuned models are optimized for specific applications, ensuring faster and more accurate results in production environments.
- Example: Deploying a fine-tuned model for e-commerce product recommendations, tailored to user behavior.
(6/n)
24.02.2025 03:56
3️⃣ Improved Generalization
Enhances the model's ability to perform well on specialized tasks by refining its understanding of nuanced requirements.
- Example: Training a model to excel in legal document review for compliance purposes.
(5/n)
24.02.2025 03:56
2️⃣ Reduced Data Needs
Instead of requiring massive datasets, fine-tuning focuses on smaller, task-specific datasets, making it practical even for resource-constrained scenarios.
- Example: Fine-tuning a model for medical diagnosis using a curated set of clinical notes and case studies.
(4/n)
24.02.2025 03:56
Example: Adapting GPT models for customer-support chatbots leverages their broad language understanding to quickly specialize in handling product-specific inquiries.
(3/n)
24.02.2025 03:56
1️⃣ Transfer Learning Efficiency
Fine-tuning builds upon the foundational knowledge of pre-trained models, significantly reducing computational and time requirements compared to training from scratch.
(2/n)
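The efficiency claim can be illustrated with a deliberately tiny model (a single weight, not an LLM; all numbers are made up): "pretrain" on plentiful generic data, then fine-tune briefly on a small task-specific set. The fine-tuned run starts near a good solution, so the same small step budget gets much closer to the task optimum than training from scratch:

```python
# Toy illustration of transfer-learning efficiency with a one-parameter
# linear model y = w * x, trained by plain gradient descent.

def train(w, data, lr=0.01, steps=100):
    """Gradient descent on mean squared error for y = w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

generic = [(x, 2.0 * x) for x in range(1, 11)]  # plentiful generic data: y = 2.0x
task    = [(x, 2.1 * x) for x in range(1, 4)]   # tiny task-specific set: y = 2.1x

pretrained = train(0.0, generic, steps=200)     # long generic pretraining
finetuned  = train(pretrained, task, steps=20)  # short adaptation on task data
scratch    = train(0.0, task, steps=20)         # same small budget, no head start
```

With the same 20-step budget, the fine-tuned weight lands near the task optimum of 2.1 while the from-scratch run is still far away: exactly the compute saving the point above describes, in miniature.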
24.02.2025 03:56
The Importance of Fine-Tuning Large Language Models (LLMs)
Fine-tuning is a crucial step that unlocks the full potential of pre-trained models by adapting them to specific tasks and domains. Here's why it matters, along with some practical examples:
(1/n)
24.02.2025 03:56
Iteration 0
24.02.2025 00:15
Iteration 2
24.02.2025 00:15
Built a thread simulation in Replit in 30 minutes, despite no physics or coding background. The results are basic but eerily impressive. From the last iteration back to the first:
Iteration 3
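A minimal sketch of one common way such a thread simulation works, assuming the usual Verlet-plus-constraints approach (the actual Replit build may differ, and every parameter value below is an illustrative guess):

```python
import math

# A chain of point masses stepped with Verlet integration, plus distance
# constraints that keep neighbouring points one segment-length apart.
# Point 0 is pinned; the rest of the chain swings and settles under gravity.

N, REST, GRAVITY, DAMP, ITERS = 10, 1.0, 0.05, 0.99, 20
pos  = [[i * REST, 0.0] for i in range(N)]   # chain starts horizontal
prev = [p[:] for p in pos]                   # previous positions (Verlet state)

def step():
    # Verlet integration: velocity is implied by (current - previous).
    for i in range(1, N):                    # point 0 stays pinned at (0, 0)
        x, y = pos[i]
        vx = (x - prev[i][0]) * DAMP
        vy = (y - prev[i][1]) * DAMP
        prev[i] = [x, y]
        pos[i] = [x + vx, y + vy - GRAVITY]
    # Constraint relaxation: pull neighbours back to distance REST.
    for _ in range(ITERS):
        for i in range(N - 1):
            ax, ay = pos[i]
            bx, by = pos[i + 1]
            dx, dy = bx - ax, by - ay
            d = math.hypot(dx, dy) or 1e-9
            corr = (d - REST) / d / 2
            if i > 0:
                pos[i]     = [ax + dx * corr, ay + dy * corr]
                pos[i + 1] = [bx - dx * corr, by - dy * corr]
            else:                            # pinned end absorbs no correction
                pos[i + 1] = [bx - dx * corr * 2, by - dy * corr * 2]

for _ in range(1000):
    step()
# With damping, the chain settles into hanging straight down from the pin.
```

Each frame is just "move every point by its implied velocity plus gravity, then repeatedly snap segment lengths back", which is why tools like Replit can get something watchable in minutes.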
24.02.2025 00:15
These two stages work hand-in-hand to create AI models that are both broad in understanding and precise in application. (7/7)
22.02.2025 20:42