
Junhong Shen

@junhongshen1

PhD Student in Machine Learning @CMU | BS @UCLA | Interning @Meta | Interned @MSFTResearch @DeterminedAI

76
Followers
32
Following
10
Posts
25.11.2024
Joined

Latest posts by Junhong Shen @junhongshen1

🙋

03.12.2024 19:50 👍 1 🔁 0 💬 1 📌 0
GitHub - colonylabs/ScribeAgent Contribute to colonylabs/ScribeAgent development by creating an account on GitHub.

9/ Stay tuned for more updates!
🔗 Paper: arxiv.org/abs/2411.150...
🌐 Blog: http://scribehow.com/library/scribe-agent
💻 Code: github.com/colonylabs/S...
👥 Team: @junhongshen1.bsky.social, Atishay Jain, Zedian Xiao, Ishan Amlekar, Mouad Hadji, Aaron Podolny, @atalwalkar.bsky.social

03.12.2024 17:21 👍 0 🔁 0 💬 0 📌 0

8/ What's next? The possibilities are vast, from integrating advanced reasoning and planning modules to exploring multi-modal systems. ScribeAgent highlights the potential of production-scale training data, paving the way for future web agents that are both powerful and cost-effective.

03.12.2024 17:21 👍 0 🔁 0 💬 1 📌 0

7/ Beyond performance, ScribeAgent models also provide efficiency gains relative to most proprietary baselines, which are typically larger in size and slower at inference time. This makes ScribeAgent an attractive option in terms of accuracy, latency, and cost.

03.12.2024 17:21 👍 2 🔁 0 💬 1 📌 0

6/ Our results? ScribeAgent outperforms GPT-4o on our internal dataset and achieves state-of-the-art direct generation performance on the public benchmark Mind2Web. Our multi-agent system integrating GPT-4o also improves the best task success rate for text-only agents by 14.1% on WebArena.

03.12.2024 17:21 👍 0 🔁 0 💬 1 📌 0

5/ Combining next-step prediction with effective HTML preprocessing, we fine-tune two versions of ScribeAgent. The cost-efficient ScribeAgent-Small is based on 7B Qwen2, while the better-performing ScribeAgent-Large is based on 32B Qwen2.5.

03.12.2024 17:21 👍 0 🔁 0 💬 1 📌 0
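
To make 5/ concrete, here is a minimal sketch of how a pruned HTML observation might be packed into one next-step prediction training example. The thread does not say what the preprocessing or the prompt format actually is, so everything here is an assumption for illustration: the `KEEP_ATTRS` set, the `prune_html` and `build_example` helpers, and the prompt layout are all hypothetical, not ScribeAgent's pipeline.

```python
from html.parser import HTMLParser

# Assumption: attributes worth keeping for locating elements; all others are dropped.
KEEP_ATTRS = {"id", "aria-label"}


class HtmlPruner(HTMLParser):
    """Drops <script>/<style> content and comments, trims attributes and whitespace."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
            return
        if self._skip_depth:
            return
        kept = "".join(
            f' {k}="{v}"' for k, v in attrs if k in KEEP_ATTRS and v is not None
        )
        self.parts.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self._skip_depth = max(0, self._skip_depth - 1)
            return
        if not self._skip_depth:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            text = " ".join(data.split())  # collapse runs of whitespace
            if text:
                self.parts.append(text)


def prune_html(html):
    """Return a shorter HTML string that keeps structure, text, and a few attributes."""
    pruner = HtmlPruner()
    pruner.feed(html)
    pruner.close()
    return "".join(pruner.parts)


def build_example(objective, url, html, history, next_action):
    """Format one (prompt, completion) pair: the target is the next action only."""
    prompt = (
        f"Objective: {objective}\n"
        f"URL: {url}\n"
        f"Observation: {prune_html(html)}\n"
        f"Previous actions: {'; '.join(history) or 'none'}\n"
        "Next action:"
    )
    return {"prompt": prompt, "completion": " " + next_action}
```

Each recorded workflow of N steps would yield N such pairs under this framing, one per step, with the history growing as the workflow proceeds.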

4/ Data is the key! We leverage Scribe (scribehow.com/), an AI documentation tool that streamlines the creation of step-by-step guides for web tasks, to collect large-scale action data executed by real users on over 250 web domains. See scribehow.com/shared for example workflows.

03.12.2024 17:20 👍 1 🔁 0 💬 1 📌 0

3/ Most existing web agents rely heavily on prompting general-purpose proprietary models like GPT-4. However, LLMs like GPT-4 are not specifically trained to parse languages like HTML, limiting the agent's ability to plan and reason. In contrast, ScribeAgent adapts the LLM itself for web navigation.

03.12.2024 17:20 👍 0 🔁 0 💬 1 📌 0

2/ Web agents navigate through websites to solve real-world tasks. After the user defines a high-level objective, the agent outputs step-by-step actions based on the objective, observation, and interaction history. For text-based agents, the observation typically includes the website's URL and HTML.

03.12.2024 17:20 👍 0 🔁 0 💬 1 📌 0
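
The loop described in 2/ can be sketched roughly as follows. This is an illustrative skeleton under stated assumptions, not ScribeAgent's code: `Observation`, `run_agent`, `scripted_policy`, the `"STOP"` sentinel, and the action string format are hypothetical names chosen for the example, and the policy is a toy stand-in for where an LLM call would go.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Observation:
    """What a text-based agent sees at each step: the page URL and its HTML."""
    url: str
    html: str


# Hypothetical policy signature: (objective, observation, history) -> next action.
Policy = Callable[[str, Observation, List[str]], str]


def run_agent(
    objective: str,
    policy: Policy,
    get_observation: Callable[[], Observation],
    execute: Callable[[str], None],
    max_steps: int = 10,
) -> List[str]:
    """Repeatedly observe, pick the next action, and execute it until STOP."""
    history: List[str] = []
    for _ in range(max_steps):
        obs = get_observation()
        action = policy(objective, obs, history)
        history.append(action)
        if action == "STOP":  # assumed sentinel meaning "task finished"
            break
        execute(action)
    return history


def scripted_policy(objective: str, obs: Observation, history: List[str]) -> str:
    """Toy stand-in for an LLM: click once, then stop."""
    return "click [search-button]" if not history else "STOP"


if __name__ == "__main__":
    executed: List[str] = []
    page = Observation("https://example.com", "<button id='search-button'>Go</button>")
    print(run_agent("find the docs", scripted_policy, lambda: page, executed.append))
```

The point of the structure is that the policy sees all three inputs named in the post (objective, current observation, and interaction history) at every step, and emits exactly one action per step.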

1/ Introducing ScribeAgent 🤖! Using real-world web workflow data, we at @scsatcmu.bsky.social and Scribe (scribehow.com/) have adapted general-purpose open-source LLMs into specialized web agents, outperforming agents that rely on proprietary models like GPT-4 and o1-preview. More in 🧵.

03.12.2024 17:20 👍 18 🔁 5 💬 1 📌 4