
Mark Pors πŸ¦–

@pors

AI engineer. Previously co-founder and CTO at WatchMouse. Building https://paperzilla.ai

57
Followers
282
Following
100
Posts
07.11.2023
Joined

Latest posts by Mark Pors πŸ¦– @pors

arXiv: arxiv.org/abs/2603.107...

12.03.2026 10:11 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

This paper showed up in my Paperzilla "Fast diffusion via flows/consistency" feed: paperzilla.ai/digest/ddce3...

12.03.2026 10:11 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Overall architecture of AlphaFlowTSE. Given a mixture waveform y and an enrollment utterance e, we compute complex STFT features and form the mixture feature Y and enrollment feature E (real/imaginary concatenation). During training, the backbone takes the current state feature z_t; during inference we initialize z_0 = Y. The enrollment feature is concatenated as a temporal prefix, yielding [Eβˆ₯z_t] (or [Eβˆ₯z_0] at inference), which is fed to the UDiT backbone. The backbone is conditioned via AdaLN on the absolute time t and the interval length Ξ” = r βˆ’ t (with r = 1 at inference), and predicts the mean velocity for finite-interval transport, denoted u_ΞΈ(t, r, [Eβˆ₯z_t]). One-step inference (NFE = 1) produces an estimated complex STFT Ŝ = (Ŝ_Re, Ŝ_Im), which is converted to the target waveform ŝ by iSTFT. The dashed module is an optional mixing-ratio predictor used only in the background-to-target ablation to predict the start coordinate.
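The one-step update the caption describes can be sketched in a few lines. This is a minimal illustration of finite-interval transport with a predicted mean velocity; the function names, shapes, and the dummy model below are hypothetical, not the paper's code:

```python
import numpy as np

def one_step_inference(Y, velocity_model, t=0.0, r=1.0):
    """One-step (NFE = 1) finite-interval transport:
    start from the mixture feature z_0 = Y and move along the
    predicted mean velocity over the whole interval [t, r]."""
    z0 = Y                            # inference initializes at the mixture STFT feature
    u = velocity_model(t, r, z0)      # stands in for u_theta(t, r, [E || z_0])
    z1 = z0 + (r - t) * u             # a single Euler step covers the full interval
    return z1                         # estimated complex STFT of the target speaker

# Toy check: a dummy velocity model that predicts the exact residual
Y = np.zeros((2, 4))
target = np.ones((2, 4))
dummy_model = lambda t, r, z: (target - z) / (1.0 - 0.0)
S_hat = one_step_inference(Y, dummy_model)
```

With r = 1 and t = 0 the step length is the whole unit interval, which is why a well-trained mean-velocity model can land on the target in one function evaluation.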

Imagine a noisy group call where 3 people talk at once.

This paper builds a model that can focus on a single speaker (using a short voice sample) and extract that voice.

This cleanup results in better, faster audio transcription.

Summary and full paper πŸ‘‡

#AudioML #SpeechToText

12.03.2026 10:10 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Preview
AgentOS: From Application Silos to a Natural Language-Driven Data Ecosystem The rapid emergence of open-source, locally hosted intelligent agents marks a critical inflection point in human-computer interaction. Systems such as OpenClaw demonstrate that Large Language Model (L...

arXiv: arxiv.org/abs/2603.089...

11.03.2026 11:53 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

This paper showed up in my Paperzilla #openclaw feed: paperzilla.ai/digest/95413...

11.03.2026 11:52 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Paradigm Shift from GUI-Based Operating Systems to AgentOS with Multi-Agent Orchestration and Natural Language Interface.

Another fun paper showed up in my feed. A theoretical proposal for an OS for AI agents.

No more GUI, enter the NLDUI: the natural language-driven interface.

Details below πŸ‘‡

11.03.2026 11:52 πŸ‘ 2 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
In-app feed - Paperzilla Docs Portal Use the in-app project feed to triage papers, filter the list, and teach Paperzilla what belongs in future recommendations.

And the docs are here: docs.paperzilla.ai/guides/feed-...

11.03.2026 10:28 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

News article with all the details: paperzilla.ai/news/better-...

11.03.2026 10:28 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Screenshot of the Paperzilla paper feed workflow

Better research paper recommendations start with explicit user feedback.

So that is what we are building with Paperzilla.

Today, we launch the first phase of that. Read more in the article below.

11.03.2026 10:27 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Preview
A Minimal Agent for Automated Theorem Proving We propose a minimal agentic baseline that enables systematic comparison across different AI-based theorem prover architectures. This design implements the core features shared among state-of-the-art ...

arXiv: arxiv.org/abs/2602.242...
Source code: github.com/Axiomatic-AI...

10.03.2026 10:04 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

This paper showed up in my Paperzilla #AI4Math feed:

paperzilla.ai/digest/0fbbf...

10.03.2026 10:03 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Minimal agent for theorem proving. The design focuses on three main aspects: iterative proof refinement, memory system, and access to tools. A proposer agent writes Lean code to prove a given theorem. Then, a compiler verifies if the proposed proof works. If so, it is double-checked by a reviewer agent to prevent any form of cheating. If the code does not compile, or the reviewer objects, the feedback is sent to the memory module, and the cycle starts over with the proposer refining its previous proof. In addition, the proposer can be given access to tools such as library search or web search, and call them a fixed number of times before giving its proposal.
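The refinement loop in that caption can be sketched roughly as below. The proposer, compiler, and reviewer here are stand-in stubs, not the ax-prover API; the point is the feedback-to-memory cycle:

```python
def prove(theorem, proposer, compiler, reviewer, max_rounds=8):
    """Iterative proof refinement: propose Lean code, compile it,
    have a reviewer double-check it, and feed failures back as memory."""
    memory = []  # accumulated compiler errors and reviewer objections
    for _ in range(max_rounds):
        proof = proposer(theorem, memory)
        ok, error = compiler(proof)
        if not ok:
            memory.append(("compiler", error))   # restart the cycle with feedback
            continue
        if reviewer(theorem, proof):             # guards against cheating, e.g. `sorry`
            return proof
        memory.append(("reviewer", "proof rejected"))
    return None

# Toy run: the proposer "fixes" its proof once it has seen any feedback
proposer = lambda thm, mem: "by simp" if mem else "sorry"
compiler = lambda proof: (False, "sorry found") if proof == "sorry" else (True, "")
reviewer = lambda thm, proof: "sorry" not in proof
result = prove("a + 0 = a", proposer, compiler, reviewer)
```

The verifier (compiler plus reviewer) is what makes this loop trustworthy: the LLM only proposes, it never gets to declare its own proof correct.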

As Karpathy's autoresearch takes the spotlight, here's a comparable lightweight agent setup that punches above its weight via feedback loops and readily available LLMs:

ax-prover, a minimal agent for automated theorem proving.

Source code and paper πŸ‘‡

10.03.2026 10:01 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Preview
Query Disambiguation via Answer-Free Context: Doubling Performance on Humanity's Last Exam How carefully and unambiguously a question is phrased has a profound impact on the quality of the response, for Language Models (LMs) as well as people. While model capabilities continue to advance, t...

Resources:

arXiv: arxiv.org/abs/2603.04454

Github repo: github.com/mmajurski/lm...

08.03.2026 18:06 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

This paper showed up in my Paperzilla "RAG, Retrieval, and Semantic search" feed: paperzilla.ai/digest/d530c...

08.03.2026 18:05 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
When RAG systems surface relevant information, LM performance can be enhanced by rewriting the initial query using contextβ€”added information that, without providing the answer, gives relevant background knowledge and direction.

New paper shows that rewriting a question into a clearer one, using retrieved answer-free context, before the final LLM call improves accuracy a lot.

In practice, that means you can offload the rewrite to a smaller, cheaper model and get better results at a lower cost.

Paper + resources πŸ‘‡

08.03.2026 18:03 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

PS: let me know which preprint/open-access sources you are interested in, and I can add them.

08.03.2026 10:06 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Please have a look at the alternative to Google Scholar alerts I've built. It significantly reduces the number of papers you need to evaluate. It can be consumed as email digests, an RSS feed, or via API/MCP. paperzilla.ai
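If you go the RSS route, consuming the feed is stdlib-only. A minimal sketch against a sample payload; the actual Paperzilla feed URL and schema aren't shown here, so the XML below is illustrative RSS 2.0, not the real output:

```python
import xml.etree.ElementTree as ET

SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Paperzilla digest (sample)</title>
  <item><title>Paper A</title><link>https://example.org/a</link></item>
  <item><title>Paper B</title><link>https://example.org/b</link></item>
</channel></rss>"""

def feed_items(rss_xml):
    """Return (title, link) pairs for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]

items = feed_items(SAMPLE_RSS)
```

Swap SAMPLE_RSS for the body of an HTTP GET against your feed URL and the same parsing applies.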

08.03.2026 10:05 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Preview
SciDER: Scientific Data-centric End-to-end Researcher Automated scientific discovery with large language models is transforming the research lifecycle from ideation to experimentation, yet existing agents struggle to autonomously process raw data collect...

Paper: arxiv.org/abs/2603.014...
Repo: github.com/leonardodali...
Demo: huggingface.co/spaces/AI4Re...

05.03.2026 12:59 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Paperzilla found this paper for me, here's the summary: paperzilla.ai/digest/9cee0...

05.03.2026 12:57 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

13 domain experts (PhDs, professors, industry researchers) rated SciDER 4.85/5 for "helpfulness" in reducing workload and achieving data-grounded accuracy.

Also: An astrophysicist used SciDER to analyze the Kepler Exoplanet Dataset and achieved 98% F1 score on exoplanet detection.

Read on πŸ‘‡

05.03.2026 12:55 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
SciDER: Scientific Data-centric End-to-end Researcher. Flowchart of expert-based and LLM-based research lifecycle.

This project (paper, demo, and open-source repo) seems like a promising step toward having an AI scientist on your team.

The framework, SciDER, actually does science. Data β†’ experiments β†’ results. End-to-end.

Read on πŸ‘‡

#ai4science

05.03.2026 12:53 πŸ‘ 0 πŸ” 0 πŸ’¬ 3 πŸ“Œ 0

Just give it a try: paperzilla.ai and let me know if it is useful for you. If not, let me know as well :)

04.03.2026 19:46 πŸ‘ 1 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

In Paperzilla, it starts with the user's expression of their research topic. That results in a project covering categories from multiple sources (e.g. both ChemRxiv and arXiv). The output is a feed consumed by OpenClaw (or another agent), which then works with a narrow, relevant context.

04.03.2026 19:46 πŸ‘ 1 πŸ” 0 πŸ’¬ 4 πŸ“Œ 0

paperzilla.ai/news/chemrxi... cc @openclaw-x.bsky.social

04.03.2026 19:14 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
ChemRxiv coverage and a Paperzilla skill for OpenClaw

The weekly Paperzilla improvements are here: ChemRxiv coverage in beta and a Paperzilla skill for OpenClaw.

Worlds are colliding!

Link to full news item in comment πŸ‘‡

04.03.2026 19:13 πŸ‘ 1 πŸ” 0 πŸ’¬ 3 πŸ“Œ 0

arXiv: arxiv.org/abs/2602.233...

03.03.2026 09:23 πŸ‘ 0 πŸ” 0 πŸ’¬ 0 πŸ“Œ 0

Paperzilla found this paper for me and created this summary: paperzilla.ai/digest/df041...

03.03.2026 09:23 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0

They found that AI agents using simple keyword search tools can answer questions nearly as well as complex, expensive vector databases. So, cheaper, easier to maintain and nearly as good.
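For a sense of what "simple keyword search tools" can mean in practice, here is a bare-bones BM25 scorer over an in-memory corpus. This is my own stdlib sketch, not the paper's tooling; parameters k1 and b are the usual BM25 defaults:

```python
import math
from collections import Counter

def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Score documents against a query with BM25 and return them best-first."""
    tokenized = [d.lower().split() for d in docs]
    N = len(docs)
    avgdl = sum(len(d) for d in tokenized) / N          # average document length
    df = Counter(t for d in tokenized for t in set(d))  # document frequency per term
    scores = []
    for i, d in enumerate(tokenized):
        tf = Counter(d)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append((s, i))
    return [docs[i] for s, i in sorted(scores, reverse=True)]

docs = [
    "the cat sat on the mat",
    "vector databases store embeddings",
    "keyword search with bm25 is simple and cheap",
]
ranked = bm25_rank("bm25 keyword search", docs)
```

No embedding model, no index service to operate: an agent can call something like this (or a grep-style tool) in a loop and reformulate its query when the top hits look wrong.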

Paper & summary in the comments πŸ‘‡

03.03.2026 09:22 πŸ‘ 1 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Comparison between RAG (red) and agent-based (blue) pipelines for document QnA

A while ago I found a paper showing that BM25 search sometimes beats RAG. Here is another paper, by @awscloud.bsky.social, showing that agentic keyword search is also often the better choice.

Read on πŸ‘‡

03.03.2026 09:22 πŸ‘ 0 πŸ” 0 πŸ’¬ 1 πŸ“Œ 0
Scholar-Skill: 21 Skills Organized by Research Stage

Interesting new paper: a scientist at @stonybrooku.bsky.social created scholar-skill, a 21-skill plugin for Claude Code, and used it to automate the social science pipeline. Pretty cool! Read the paper below πŸ‘‡

01.03.2026 08:52 πŸ‘ 2 πŸ” 1 πŸ’¬ 1 πŸ“Œ 0