Spandana Gella

@spandanagella

Sr Mgr & Research Scientist @ServiceNowRSRCH, Montreal

134 Followers
208 Following
7 Posts
Joined 12.11.2024

Latest posts by Spandana Gella @spandanagella

🚨New Paper!🚨 How do reasoning LLMs handle inferences that have no deterministic answer? We find that they diverge from humans in some significant ways, and fail to reflect human uncertainty… 🧵(1/10)

04.03.2026 16:13 👍 56 🔁 20 💬 3 📌 1

Our team is hiring an intern to work on discrete diffusion of text and/or code. Please apply!

17.06.2025 14:32 👍 2 🔁 0 💬 0 📌 0

🚀 New paper from our team at @servicenowresearch.bsky.social!

💫 StarFlow: Generating Structured Workflow Outputs From Sketch Images
We use VLMs to turn hand-drawn sketches and diagrams into executable workflows 🖍️→⚙️

🔗 arxiv.org/abs/2503.218...
📝 tinyurl.com/3utdbn97%E2%...
#Sketch2Flow #AI #VLM

29.05.2025 03:34 👍 0 🔁 1 💬 1 📌 0

🚀 Excited to share that UI-Vision has been accepted at ICML 2025! 🎉

We have also released the UI-Vision grounding datasets. Test your agents on them now! 🚀

🤗 Dataset: huggingface.co/datasets/Ser...

#ICML2025 #AI #DatasetRelease #Agents

15.05.2025 14:14 👍 0 🔁 1 💬 0 📌 0

Very excited to announce our GUI benchmarking dataset UI-Vision: uivision.github.io

Our evals reveal that current GUI models struggle with grounding small elements and dense UIs, and have limited domain/spatial/motion understanding.

Watch this space for more exciting stuff from us!

24.03.2025 17:17 👍 3 🔁 0 💬 0 📌 0

Web agents powered by LLMs can solve complex tasks, but our analysis shows that they can also be easily misused to automate harmful tasks.

See the thread below for more details on our new web agent safety benchmark, SafeArena, and the Agent Risk Assessment (ARIA) framework.

10.03.2025 20:11 👍 5 🔁 2 💬 0 📌 0

📢New Paper Alert!🚀

Human alignment balances social expectations, economic incentives, and legal frameworks. What if LLM alignment worked the same way?🤔

Our latest work explores how social, economic, and contractual alignment can address incomplete contracts in LLM alignment🧵

04.03.2025 16:08 👍 28 🔁 13 💬 2 📌 3

🚨 Excited to introduce PairBench! 🚨

💡 TL;DR: VLM-judges can fail at data comparison!

✅ PairBench helps you pick the right one by testing alignment, symmetry, smoothness & controllability, ensuring reliable auto-evaluation.

📄 Paper: arxiv.org/abs/2502.15210

🧵 Thread: 👇

27.02.2025 19:50 👍 1 🔁 2 💬 1 📌 0

We're really excited to release this large collaborative work on unifying web agent benchmarks under the same roof.

In this TMLR paper, we take an in-depth look at #BrowserGym and #AgentLab. We also present some unexpected results from Claude 3.5-Sonnet.

12.12.2024 17:55 👍 20 🔁 11 💬 1 📌 2

If you want to know all about the exciting stuff we do with web agents @servicenowresearch.bsky.social, register here and interact with our team, including the amazing @alex-lacoste.bsky.social and @adrouinenv.bsky.social.

12.12.2024 17:28 👍 2 🔁 0 💬 0 📌 0

We would be delighted to come and see you ;)

12.12.2024 12:52 👍 1 🔁 0 💬 0 📌 0

Me :)

12.12.2024 08:38 👍 0 🔁 0 💬 1 📌 0

Thrilled to launch BigDocs, an open multimodal dataset set to transform document understanding! Our contribution to the VLM community, supporting transparency in multimodal document reasoning. Proud to work with the most passionate and amazing team @servicenowresearch.bsky.social!

10.12.2024 20:08 👍 4 🔁 0 💬 1 📌 0
AgentLab diagram.

The image describes AgentLab, a framework for efficient parallel experiments with agents. It highlights:

Core Agent Features:
Dynamic Prompting and a Unified LLM API for interacting with large language models.

BrowserGym Platform:
A tool for testing agents on benchmarks like WebArena, WorkArena, MiniWoB, and others.

Key Features:
Reproducibility, a Unified Leaderboard, an analysis tool called Xray, and a Dataset for sharing agent traces.

Blue elements represent AgentLab components.

🧵-1
We are thrilled to release #AgentLab, a new open-source package for developing and evaluating web agents. This builds on the new #BrowserGym package which supports 10 different benchmarks, including #WebArena.

03.12.2024 21:02 👍 18 🔁 15 💬 2 📌 0