single aggregation service
- AWS Bedrock
- Google Vertex AI
- Microsoft Foundry
Topics
Overview of LLMs and AI agent benchmark datasets
Running your own small LLMs on your laptop and in Google Colab
Agent control flows (HF Pipelines, HF smolagents, LangChain/LangGraph, Toolformer, Model Context Protocol)
Few-shot learning AKA in-context learning
Chain of thought reasoning (CoT) and self-refinement (Self-Refine)
Search-augmented generation AKA live RAG AKA internet-augmented dialog generation
Vector-database Retrieval-augmented generation (RAG)
Action Models (ReAct)
Vision-language models (VLMs)
Multi-agent systems (Generative Agents, ToolOrchestra, Magentic-One)
Weight fine-tuning (QLoRA)
Context Management (AgentFold)
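Several of these topics are easiest to see in code. As a minimal sketch of few-shot learning (in-context learning), the function below packs labeled demonstrations ahead of a new query so the model can infer the task from context alone; the sentiment examples and prompt template are illustrative, not taken from any of the papers below.

```python
# Minimal sketch of few-shot (in-context) prompt assembly.
# EXAMPLES and the template are hypothetical, for illustration only.

EXAMPLES = [
    ("great movie, loved it", "positive"),
    ("waste of two hours", "negative"),
]

def build_few_shot_prompt(query: str) -> str:
    """Concatenate labeled demonstrations before the new query so a
    model can infer the task from context, with no weight updates."""
    parts = ["Classify the sentiment of each review."]
    for text, label in EXAMPLES:
        parts.append(f"Review: {text}\nSentiment: {label}")
    parts.append(f"Review: {query}\nSentiment:")  # model completes the label
    return "\n\n".join(parts)

print(build_few_shot_prompt("an instant classic"))
```

The resulting string would be sent as-is to any completion-style LLM endpoint; the trailing "Sentiment:" cues the model to emit only the label.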
Papers
- Masterman, Tula, Sandi Besen, Mason Sawtell, and Alex Chao. “The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey.” arXiv preprint arXiv:2404.11584 (2024). https://arxiv.org/abs/2404.11584.
- Training AI Co-Scientists Using Rubric Rewards. Shashwat Goel, Rishi Hazra, Dulhan Jayalath, Timon Willi, Parag Jain, William F. Shen, Ilias Leontiadis, Francesco Barbieri, Yoram Bachrach, Jonas Geiping, Chenxi Whitehouse. https://arxiv.org/abs/2512.23707
- QLoRA: Efficient Finetuning of Quantized LLMs, Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer, 2023. https://arxiv.org/abs/2305.14314
- A generative model of memory construction and consolidation, Eleanor Spens & Neil Burgess, 2023. https://www.nature.com/articles/s41562-023-01799-z.pdf
- Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks. Microsoft Research AI Frontiers. 2024. https://arxiv.org/html/2411.04468v1
- Generative Agents: Interactive Simulacra of Human Behavior. Joon Sung Park, Joseph C. O’Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein, 2023. https://arxiv.org/abs/2304.03442
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, Wei, J., et al. (2022). https://arxiv.org/abs/2201.11903
- Internet-Augmented Dialogue Generation, Komeili et al., 2021, from Meta AI Research. One of the first papers to systematically explore augmenting conversational AI with real-time web search. https://arxiv.org/abs/2107.07566
- Toolformer: Language Models Can Teach Themselves to Use Tools. Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom, 2023. https://arxiv.org/abs/2302.04761
- Language Models are Few-Shot Learners, Brown et al., 2020. https://arxiv.org/abs/2005.14165
- A Survey on In-Context Learning, Dong, Q. et al. (2024). Formalizes ICL, relates it to meta-learning and prompting, and surveys techniques, analyses, and applications specifically for LLMs. https://arxiv.org/abs/2301.00234
- Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning, 2024, Hao Zhao, Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion. https://arxiv.org/abs/2402.04833
- Mathematical exploration and discovery at scale, Bogdan Georgiev, Javier Gómez-Serrano, Terence Tao, Adam Zsolt Wagner, 2025. https://arxiv.org/abs/2511.02864
- ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration, 2025. https://arxiv.org/abs/2511.21689
- From Code Foundation Models to Agents and Applications: A Comprehensive Survey and Practical Guide to Code Intelligence, 2025. https://arxiv.org/abs/2511.18538
- Small Language Models are the Future of Agentic AI. Peter Belcak, Greg Heinrich, Shizhe Diao, Yonggan Fu, Xin Dong, Saurav Muralidharan, Yingyan Celine Lin, Pavlo Molchanov, 2025. https://arxiv.org/pdf/2506.02153
- AgentFold: Long-Horizon Web Agents with Proactive Context Management. Rui Ye, Zhongwang Zhang, Kuan Li, Huifeng Yin, Zhengwei Tao, Yida Zhao, Liangcai Su, Liwen Zhang, Zile Qiao, Xinyu Wang, Pengjun Xie, Fei Huang, Siheng Chen, Jingren Zhou, Yong Jiang, 2025. https://arxiv.org/abs/2510.24699
- ReAct: Synergizing Reasoning and Acting in Language Models. Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao, 2023. https://arxiv.org/abs/2210.03629
- Lean Copilot: Large Language Models as Copilots for Theorem Proving in Lean. Peiyang Song, Kaiyu Yang, Anima Anandkumar, 2024. https://arxiv.org/abs/2404.12534
- Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, and Karthik Narasimhan. 2023. Tree of Thoughts: Deliberate Problem Solving with Large Language Models. arXiv preprint arXiv:2305.10601. https://arxiv.org/abs/2305.10601
- Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Shashank Gupta, Bodhisattwa Prasad Majumder, Katherine Hermann, Sean Welleck, Amir Yazdanbakhsh, and Peter Clark. 2023. Self-Refine: Iterative Refinement with Self-Feedback. arXiv preprint arXiv:2303.17651. https://arxiv.org/abs/2303.17651
- Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, and Enhong Chen. 2024. A survey on multimodal large language models. Natl. Sci. Rev. 11, 12 (November 2024), nwae403. https://doi.org/10.1093/nsr/nwae403
- Hanjia Lyu, Jinfa Huang, Daoan Zhang, Yongsheng Yu, Xinyi Mou, Jinsheng Pan, Zhengyuan Yang, Zhongyu Wei, Jiebo Luo. GPT-4V(ision) as A Social Media Analysis Engine, 2023. https://arxiv.org/abs/2311.07547
- Jiacheng Miao, Joe R. Davis, Jonathan K. Pritchard, and James Zou. 2025. Paper2Agent: Reimagining Research Papers As Interactive and Reliable AI Agents. arXiv preprint arXiv:2509.06917. https://arxiv.org/abs/2509.06917
- Model-First Reasoning LLM Agents: Reducing Hallucinations through Explicit Problem Modeling, Annu Rana, Gaurav Kumar, 2025. https://arxiv.org/abs/2512.14474
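Several of the papers above (Toolformer, ReAct, ToolOrchestra) center on interleaving model reasoning with tool calls. A minimal ReAct-style thought/action/observation loop can be sketched with a scripted stand-in for the model; the `fake_model` policy and `TOOLS` registry here are hypothetical illustrations, not code from any paper.

```python
# Minimal sketch of a ReAct-style agent loop: the policy proposes a
# thought and an action, the environment returns an observation, and
# the loop repeats until the policy emits "finish".
# fake_model is a scripted stand-in for an LLM call.

TOOLS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}

def fake_model(question, history):
    """Scripted stand-in for the LLM: returns (thought, action, args)."""
    if not history:
        return ("Compute 3*4 first.", "mul", (3, 4))
    if len(history) == 1:
        return ("Now add 5 to the product.", "add", (history[-1][1], 5))
    return ("Done.", "finish", (history[-1][1],))

def react(question):
    history = []  # list of (action, observation) pairs
    while True:
        thought, action, args = fake_model(question, history)
        if action == "finish":
            return args[0]
        observation = TOOLS[action](*args)  # act, then observe the result
        history.append((action, observation))

print(react("What is 3*4+5?"))  # scripted trace: mul -> add -> finish, 17
```

In a real agent, `fake_model` would be an LLM prompted with the question plus the accumulated action/observation trace, and `TOOLS` would wrap search, code execution, or other APIs.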