By Josh Proto
Aug 12, 2025

Scaling Personalized AI at Backrs: How We Build A Serverless LLM App With AWS Lambda

Adapted from the Code & Cognition Podcast by Olio Apps. At Olio Apps, we’re always exploring how real-world AI applications are built, deployed, and scaled. In a recent episode of the Code & Cognition podcast, we sat down with Zach, a software engineer at Backrs, to unpack how his team built and scaled an LLM-powered platform that delivers hyper-personalized career guidance to students, and how AWS Lambda made it possible. From the architectural choices to the streaming gotchas and database bottlenecks, this post explores what it really takes to build a serverless GenAI app at scale.

The Problem: Usage Spikes alongside Personalized Output

Backrs’ mission is to help students and young professionals figure out what they want to do and how to get there. In support of that mission, and to democratize access to opportunities, Backrs created an AI-driven success coach. Think career coaching, resume feedback, and college recommendations, all delivered in a personalized and context-aware interface. However, this presented two core challenges:
  1. Usage Spikes: Whenever large user cohorts onboarded, the application's performance would degrade and costs would increase unexpectedly.
  2. Consistent Personalized Experience: Each student needed to receive individualized, LLM-powered recommendations and content generation.
To deliver this at scale without incurring cloud costs from idle compute, the Backrs team turned to AWS Lambda.

Why AWS Lambda Was the Right Fit

Initially, the prototype ran on a single EC2 instance. This worked fine until more users began to onboard; performance tanked under load, and the team knew they needed a scalable solution. Lambda was the answer. AWS Lambda gave Backrs instant horizontal scalability and cost-efficiency. Most importantly, they didn't have to pre-provision a fleet of EC2 instances just to sit idle outside school hours. Lambda’s ability to spin up hundreds of instances in parallel, then scale back down to zero, was ideal for Backrs' high-burst, low-persistence usage patterns.

AI Personalization with LLMs

LLMs are at the core of Backrs' personalization engine. When a student logs in, they provide some personal information by
  • Uploading their resume
  • Answering questions about their interests and location
  • Choosing desired job fields, majors, and goals
The platform then uses OpenAI's GPT API to generate personalized content like:
  • A college short list with reach, target, and safety schools
  • Custom essay prompts
  • Tailored advice for learning paths and internships
These are all presented within a polished, notebook-style UI that feels educational and trustworthy, perfectly aligned with their users.
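The steps above amount to folding the student's profile into a context-rich prompt before calling the model. A minimal sketch of that assembly step is shown below; the field names (`resume_text`, `interests`, and so on) are illustrative assumptions, not Backrs' actual schema.

```python
# Hypothetical sketch: assembling a student's profile into an LLM prompt.
# Field names are illustrative, not Backrs' actual data model.
from dataclasses import dataclass, field

@dataclass
class StudentProfile:
    resume_text: str
    interests: list[str]
    location: str
    target_majors: list[str] = field(default_factory=list)

def build_college_list_prompt(profile: StudentProfile) -> str:
    """Build a context-rich prompt for a reach/target/safety short list."""
    return (
        "You are a career and college coach.\n"
        f"Student location: {profile.location}\n"
        f"Interests: {', '.join(profile.interests)}\n"
        f"Target majors: {', '.join(profile.target_majors) or 'undecided'}\n"
        f"Resume:\n{profile.resume_text}\n\n"
        "Suggest a college short list grouped into reach, target, and "
        "safety schools, returned as JSON."
    )
```

The prompt string would then be sent to the GPT API, with the structured JSON reply parsed into the notebook-style UI.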

The Challenge of Streaming Responses from Lambda

LLMs like GPT-4 take time to generate structured outputs, especially when multiple steps (agents, context fetching, summarization) are involved. For responsiveness, streaming is critical. But streaming from AWS Lambda is not straightforward, for several reasons. For starters, API Gateway doesn't support long-lived connections well, as it imposes a 30-second timeout. AWS Lambda's default behavior is also to buffer output, holding the response until the function finishes before sending anything back. Finally, structured outputs (like JSON with chatbot replies and suggested follow-ups) arrive in chunks that must be reassembled on the frontend. To get around this, the team used Lambda response streaming, progressively streaming outputs through Lambda function URLs. They also replaced Mangum, the typical Python Lambda ASGI adapter, with the AWS Lambda Web Adapter, a little-known but powerful AWS-maintained tool, and dockerized their Python Lambda using FastAPI plus the Web Adapter, enabling real-time streaming of LLM outputs.

OpenAI API Rate Limits: Why Backrs Spent $200 on Fake Stories

Even once the infrastructure was sound, there were API rate limits to contend with. OpenAI doesn’t let you pay to unlock higher throughput. Instead, you have to use the API enough to earn higher tiers. So Zach ran a script that generated thousands of silly stories about his coworkers overnight just to bump their usage volume. “We had to spend $200 generating completely useless content just to hit the threshold.” It worked, it was fun, and Backrs finally unlocked the next usage tier and scaled up without rate limit failures.
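Until the higher tier kicked in, 429 responses still had to be handled gracefully. A generic exponential-backoff wrapper is the standard pattern; the sketch below uses a stand-in exception (in production you would catch the OpenAI client's rate-limit error), and the retry parameters are illustrative.

```python
# Generic retry-with-exponential-backoff for rate-limited API calls.
import time
from collections.abc import Callable

class RateLimitError(Exception):
    """Stand-in for the OpenAI client's rate-limit exception."""

def with_backoff(call: Callable[[], str], max_retries: int = 5,
                 base_delay: float = 1.0) -> str:
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            # Sleep 1s, 2s, 4s, ... before retrying.
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("rate limit: retries exhausted")
```

This keeps individual requests resilient while the overall usage volume climbs toward the next tier.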

Database Bottlenecks and the RDS Proxy Fix

Once concurrency increased to 100+ users, a new problem emerged: the database couldn’t handle the load. Each Lambda function was initiating a fresh DB connection and opening a pool of 20. Multiply that by 100 concurrent Lambdas and... the RDS instance fell over. The fix? AWS RDS Proxy. The proxy sits in front of the database and manages connection pooling across Lambdas. Now every Lambda borrows from a shared pool, and the DB remains stable, even under 100 concurrent sessions.
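The arithmetic behind the failure is simple: without a proxy, connections multiply with concurrency, while RDS Proxy multiplexes everything over one shared, capped pool. The numbers below are illustrative (small RDS instances often allow only around 100 connections).

```python
# Back-of-the-envelope connection math: direct vs. proxied access.

def direct_connections(concurrent_lambdas: int, pool_per_lambda: int) -> int:
    # Without a proxy, every Lambda container opens its own pool.
    return concurrent_lambdas * pool_per_lambda

def proxied_connections(concurrent_lambdas: int, proxy_pool_cap: int) -> int:
    # RDS Proxy multiplexes Lambda connections over a shared, capped pool.
    return min(concurrent_lambdas, proxy_pool_cap)

# 100 concurrent Lambdas x 20 connections each = 2,000 connections,
# far beyond what a small RDS instance allows. Behind the proxy, the
# database never sees more than the configured cap.
print(direct_connections(100, 20))   # 2000
print(proxied_connections(100, 90))  # 90
```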

Configuring Lambda for Long-Running AI Tasks

By default, Lambda is meant for short-lived jobs, yet LLM-powered agents often need 20+ seconds to complete their tasks. To support this:
  • The Lambda timeout was extended to the 15-minute max (via function settings).
  • A heartbeat job was created to ping Lambdas periodically to keep them warm before anticipated traffic (e.g., classroom sessions).
  • Reserved concurrency was configured for critical paths (like logging in), ensuring they’re always responsive.
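The heartbeat is commonly implemented with a scheduled EventBridge rule that invokes the function with a marker payload, which the handler short-circuits. The `warmup` event key below is an assumption for illustration, not Backrs' actual payload.

```python
# Lambda handler that returns immediately on scheduled warm-up pings,
# keeping the container warm without running the expensive LLM path.
import json
from typing import Any

def handler(event: dict[str, Any], context: Any) -> dict[str, Any]:
    # A scheduled rule sends {"warmup": true} periodically before
    # anticipated traffic; real requests never set this key.
    if event.get("warmup"):
        return {"statusCode": 200, "body": json.dumps({"warmed": True})}

    # ... normal request path: fetch context, call the LLM, respond ...
    return {"statusCode": 200, "body": json.dumps({"handled": True})}
```

Because the warm-up path returns in milliseconds, the pings cost almost nothing while keeping containers initialized.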

Streaming LLM Responses in the Frontend

The Backrs app parses streamed JSON chunks into structured UI elements:
  • Chatbot responses
  • Suggested next questions
  • Embedded videos and articles
This makes the product feel fast and intelligent, even when the backend is doing heavy lifting. They also run multiple AI agents concurrently, fetching context and resources from internal knowledge bases and user history. The result is a seamless user experience, with LLM responses tailored to each student’s profile and grounded in their real data as context.
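The reassembly step can be sketched independently of the React layer. Assuming the stream arrives as newline-delimited JSON (an assumption; the post doesn't specify Backrs' wire format), a parser buffers partial lines across chunk boundaries and emits each complete object:

```python
# Incremental parser for newline-delimited JSON arriving in arbitrary
# network chunks: buffers partial lines, yields each complete object.
import json
from collections.abc import Iterable, Iterator

def parse_ndjson_stream(chunks: Iterable[str]) -> Iterator[dict]:
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Chunk boundaries rarely align with JSON boundaries, so parse
        # only up to the last newline and keep the remainder buffered.
        *complete, buffer = buffer.split("\n")
        for line in complete:
            if line.strip():
                yield json.loads(line)
    if buffer.strip():
        yield json.loads(buffer)
```

Each yielded object (a chatbot reply, a suggested question, an embedded resource) can then be routed to the matching UI component as soon as it arrives.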

Real-World GenAI Architecture: What We Used

In summary, this was the backend and frontend stack that made Backrs' personalization engine possible.
Backend Stack
  • AWS Lambda with function URLs
  • FastAPI + AWS Lambda Web Adapter (Python)
  • Dockerized Lambdas for dependency-heavy apps
  • OpenAI GPT-4 API for LLM inference
  • AWS RDS + RDS Proxy for database operations
Frontend
  • Custom-built UI using React
  • JSON chunk parser for streaming responses
  • Component architecture for chatbot, resource cards, and context-aware updates
Now Backrs' personalized coach is able to more directly guide, mentor, and problem-solve with students who are looking to make the most of their education and career journeys.

Final Thoughts: Lessons from Scaling GenAI with AWS at Backrs

  1. Don't underestimate cold starts and concurrency limits: These are show-stoppers, and you need to plan ahead to make sure your application keeps working smoothly.
  2. LLM usage quotas are real: OpenAI requires you to "earn" throughput with real use; you can't simply request a rate increase the way you can with a model on Amazon Bedrock. Factor this in if you plan to use OpenAI's models.
  3. Structured streaming from Lambda is possible: Though it requires its own set of architectural considerations and the adapter pattern built on function URLs.
  4. An RDS Proxy is essential: It lets you connect AWS Lambda to databases at scale.
  5. GenAI is more software engineering than data science: The 'magic' in an LLM-powered application lies in its orchestration. These applications need to be properly operationalized.
These have been our major takeaways from hearing how Backrs created a personalized LLM mentor on AWS. If you're trying to build something similar, we hope this can serve as a blueprint and save you some time. If you have any specific questions or would like any help with your LLM application, please don't hesitate to reach out.
Josh Proto
Cloud Strategist

Josh is a Cloud Strategist passionate about helping engineers and business leaders navigate how emerging technologies like AI can be skillfully used in their organizations. In his free time you'll find him rescuing pigeons with his non-profit or singing Hindustani & Nepali Classical Music.
