Part 3 - The Art of the Possible: Generating Strategic Test Candidates with Dwij's RCP Engine

CodeClowns Editorial Team · July 8, 2025 · 11 min read

A system design deep-dive into Dwij's Recommendation Candidate Pool (RCP) Generator, the crucial first stage of our recommender that filters millions of possibilities into a small, strategic, and context-aware set of test options.

Imagine a master film director trying to cast a lead role. Do they personally watch audition tapes from every actor in the world? No. That would be an impossible and inefficient task. Instead, they rely on a casting director to first create a high-quality shortlist—a curated pool of actors who are relevant, available, and fit the role's core requirements. Only then does the director apply their deep analysis to make the final choice.

In the world of high-performance recommendation systems, the same principle applies. It is computationally infeasible and strategically foolish to score and rank every single piece of content for every user on every interaction. The key to building a fast, intelligent, and scalable system is a two-stage approach. This article, the third in our engineering series, dissects the crucial first stage of our system: the **Recommendation Candidate Pool (RCP) Generator**. This is our "casting director"—an intelligent engine that filters the universe of possible tests down to a small, strategic set of options worthy of our final, rigorous analysis.

Why You Can't Score Everything: The Two-Stage Recommender Paradigm

To appreciate the role of the RCP, it's essential to understand a core challenge in recommendation system design: the trade-off between accuracy and latency. A single-stage system that tries to apply a complex, computationally expensive scoring model to millions of items for thousands of concurrent users would be incredibly slow and costly.

The Candidate Generation → Ranking Funnel

The industry-standard solution is a two-stage (or multi-stage) funnel. This is the exact architecture Dwij employs, sitting between the User Context Layer and our final decision-making optimizers.

  • Stage 1: Candidate Generation (The RCP): This stage is designed for speed and recall. Its job is to use fast, efficient heuristics and simple models to quickly narrow down millions of potential items to a few hundred relevant candidates. This is what our RCP engine does.
  • Stage 2: Scoring & Ranking (The Optimizer): This stage is designed for precision. It takes the small pool of high-quality candidates from Stage 1 and applies complex, computationally expensive machine learning models to score and rank them, producing the final, hyper-personalized recommendation.

The RCP is our system's first line of defense against noise. It ensures that our powerful scoring engine only spends its compute resources on options that are already strategically viable. Without a smart RCP, even the best scoring model is wasting its time.
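The funnel described above can be sketched in a few lines. This is a minimal illustration of the two-stage shape, not Dwij's actual code: the function names (`generate_candidates`, `rank`, `score`), the field names, and the cutoff of 200 candidates are all illustrative assumptions.

```python
def generate_candidates(all_tests, context, limit=200):
    """Stage 1: cheap heuristics trim a huge catalogue to a few hundred items."""
    pool = [t for t in all_tests if t["topic"] in context["weak_topics"]]
    return pool[:limit]

def score(test, context):
    # Placeholder for the heavy scoring model; here, just difficulty proximity.
    return 1.0 / (1 + abs(test["difficulty"] - context["target_difficulty"]))

def rank(candidates, context):
    """Stage 2: an expensive model scores only the small candidate pool."""
    return sorted(candidates, key=lambda t: score(t, context), reverse=True)

all_tests = [
    {"id": "t1", "topic": "Polity", "difficulty": 3},
    {"id": "t2", "topic": "Modern History", "difficulty": 2},
    {"id": "t3", "topic": "Polity", "difficulty": 1},
]
context = {"weak_topics": {"Polity"}, "target_difficulty": 3}
best = rank(generate_candidates(all_tests, context), context)[0]
print(best["id"])  # → t1
```

The key property is visible even at this scale: `score` is only ever called on the output of `generate_candidates`, never on the full catalogue.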

[The RCP is a primary consumer of the data from our first layer. Read about it here: "The Brain of the Coach: A Deep Dive into Dwij's User Context Layer"]

The Anatomy of a Test Candidate

The RCP doesn't just output a list of test IDs. It generates a rich set of test objects, each annotated with crucial metadata that the downstream Scoring Engine needs to function. This "pre-computation" of features is a key optimization. A typical test object emitted by the RCP looks like this:

{
  "testId": "chem_2025_qz13",
  "topics": ["Thermodynamics", "Organic Reactions"],
  "type": "revision",
  "target": "weakness_targeting",
  "difficulty": "medium",
  "fatigueAdjusted": true,
  "expectedTime": 15,
  "recentnessScore": 8
}

Each piece of metadata serves a purpose. The `target` tells the scoring engine the strategic intent of this test. The `fatigueAdjusted` flag signals its suitability for a tired user. The `expectedTime` informs pacing calculations. This structured output ensures that the context from the RCP is passed cleanly to the next stage.

The Heuristic Pipeline: How the RCP Intelligently Filters

The RCP is a pipeline of sequential filters and generators, each applying a specific set of rules based on the real-time User Context.

Heuristic 1: Dynamic Syllabus Filter

This is the first and broadest filter. It queries the user's `performanceMap` and `retentionModel` to identify academically relevant topics. It generates candidates primarily from syllabus areas where the student has low accuracy, few attempts, or a high "forgetting score." If a student has 95% accuracy in 'Modern History', this filter will deprioritize generating more tests on that topic, instead focusing on their 44% accuracy in 'Polity'. This immediately attunes the pool to the user's learning needs.
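A sketch of what this filter might look like. The field names (`accuracy`, `attempts`, `forgetting_score`) and the thresholds are illustrative assumptions, not the production schema; only the 95%/44% example figures come from the text above.

```python
def syllabus_priority(performance_map, accuracy_cutoff=0.6, min_attempts=5):
    """Return topics worth generating candidates for, weakest first."""
    relevant = []
    for topic, stats in performance_map.items():
        weak = stats["accuracy"] < accuracy_cutoff          # low accuracy
        unpracticed = stats["attempts"] < min_attempts      # few attempts
        fading = stats["forgetting_score"] > 0.7            # high forgetting score
        if weak or unpracticed or fading:
            relevant.append((stats["accuracy"], topic))
    return [topic for _, topic in sorted(relevant)]

performance_map = {
    "Modern History": {"accuracy": 0.95, "attempts": 40, "forgetting_score": 0.2},
    "Polity":         {"accuracy": 0.44, "attempts": 30, "forgetting_score": 0.5},
    "Geography":      {"accuracy": 0.70, "attempts": 2,  "forgetting_score": 0.3},
}
print(syllabus_priority(performance_map))  # → ['Polity', 'Geography']
```

Note how 'Modern History' at 95% accuracy is deprioritized entirely, while 'Polity' leads the list.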

Heuristic 2: The Fatigue-Aware Test Sampler

This layer acts as the system's empathy. It reads the `fatigueScore` from the User Context. If the score is above a certain threshold (e.g., 0.65), this heuristic applies strict rules: it will reject all high-effort tests, such as full-length mocks or tests longer than 20 minutes. It may still allow short quizzes or revision sets to pass through, but tags them as `fatigueAdjusted: true`. This prevents the system from pushing a tired student into burnout and is critical for maintaining long-term engagement.
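The rules above translate directly into a filter. This sketch uses the thresholds stated in the text (0.65 fatigue cutoff, 20-minute limit); the function and field names are hypothetical.

```python
FATIGUE_THRESHOLD = 0.65
MAX_MINUTES_WHEN_TIRED = 20

def fatigue_filter(candidates, fatigue_score):
    """Reject high-effort tests when the user is tired; tag the survivors."""
    if fatigue_score <= FATIGUE_THRESHOLD:
        return candidates
    survivors = []
    for test in candidates:
        if test["type"] == "full_mock" or test["expectedTime"] > MAX_MINUTES_WHEN_TIRED:
            continue  # high-effort: rejected outright
        survivors.append({**test, "fatigueAdjusted": True})
    return survivors

pool = [
    {"testId": "mock1", "type": "full_mock", "expectedTime": 180},
    {"testId": "quiz1", "type": "revision",  "expectedTime": 15},
]
out = fatigue_filter(pool, fatigue_score=0.78)
print([t["testId"] for t in out])  # → ['quiz1']
```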

Heuristic 3: Persona-Driven Biasing

Here, the RCP adapts its generation strategy based on the user's learning `persona`. For a 'Grinder' who thrives on challenges, it will bias the pool towards weakness-targeting retry tests. For an 'Explorer' who prefers variety, it will ensure a wider range of topics, even those the user is good at. For an 'Avoider' who consistently skips a subject, this layer won't generate a difficult mock on that topic. Instead, it will strategically generate a very low-difficulty, confidence-boosting quiz to gently reintroduce the subject.
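One way to express this biasing is a per-persona transform over the pool. The persona labels ('Grinder', 'Avoider', 'Explorer') come from the text; the reordering and the injected confidence-boost quiz are illustrative assumptions.

```python
def persona_bias(candidates, persona):
    """Re-order, prune, or extend the pool based on the learning persona."""
    if persona == "Grinder":
        # Bias towards weakness-targeting retries by putting them first.
        retries = [t for t in candidates if t["target"] == "weakness_targeting"]
        rest = [t for t in candidates if t["target"] != "weakness_targeting"]
        return retries + rest
    if persona == "Avoider":
        # Drop hard tests on the avoided subject; add a gentle re-entry quiz.
        kept = [t for t in candidates if t["difficulty"] != "hard"]
        kept.append({"testId": "chem_intro_easy", "difficulty": "easy",
                     "target": "confidence_boost"})
        return kept
    return candidates  # 'Explorer' and others: preserve the variety

pool = [{"testId": "chem_hard1", "difficulty": "hard",
         "target": "weakness_targeting"}]
out = persona_bias(pool, "Avoider")
print([t["testId"] for t in out])  # → ['chem_intro_easy']
```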

Heuristic 4: The Freshness & Deduplication Layer

Finally, this layer ensures the candidate pool feels fresh and intelligent. It applies a simple rule to filter out any test shown within a recent window (e.g., 4 days) to avoid repetition. It also generates candidates from special pools, such as a "retry pool" constructed from questions the user previously answered incorrectly in a mock test. This transforms past failures into concrete, actionable learning opportunities.
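This layer reduces to a time-window check plus a merge with the retry pool. The 4-day window is from the text; the `last_shown` map and function names are hypothetical.

```python
from datetime import datetime, timedelta

FRESHNESS_WINDOW = timedelta(days=4)

def freshness_filter(candidates, last_shown, retry_pool, now):
    """Drop recently shown tests, then add retry candidates built from
    questions the user previously answered incorrectly."""
    fresh = [t for t in candidates
             if now - last_shown.get(t["testId"], datetime.min) > FRESHNESS_WINDOW]
    return fresh + retry_pool

now = datetime(2025, 7, 8)
last_shown = {"hist_quiz_9": now - timedelta(days=2)}  # shown 2 days ago
pool = [{"testId": "hist_quiz_9"}, {"testId": "polity_quiz_3"}]
retry = [{"testId": "retry_mock4_wrong"}]
out = freshness_filter(pool, last_shown, retry, now=now)
print([t["testId"] for t in out])  # → ['polity_quiz_3', 'retry_mock4_wrong']
```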

Example: The RCP in Action for "Riya"

Let’s trace how the RCP generates a candidate pool for Riya, a student whose User Context snapshot shows high fatigue and a tendency to avoid Chemistry.

  • Input Context: `fatigueScore: 0.78`, `persona: 'Avoider'` for Chemistry, strong in History, goal is `Syllabus Coverage`.
  • Step 1 (Syllabus Filter): The filter sees her goal is coverage and her weakness is Chemistry. It generates a large set of potential tests, heavily weighted towards Chemistry and other unattempted topics.
  • Step 2 (Fatigue Sampler): This layer immediately acts. Seeing the high fatigue score, it rejects all 5 full-length mocks from the initial set. It also rejects any standard test longer than 20 minutes, drastically reducing the pool.
  • Step 3 (Persona Biasing): The 'Avoider' logic kicks in. It sees the remaining hard Chemistry tests and rejects them, knowing they would likely be skipped. Instead, it specifically generates and adds a new, 5-question easy quiz on a fundamental Chemistry topic, tagging it as a `confidence_boost`.
  • Step 4 (Freshness Layer): The layer scans the remaining candidates and removes a History test Riya took two days ago, replacing it with a fresh one.

Final RCP Output: The result is a small, smart pool of ~25 candidates. It contains no overwhelming mocks, several short revision quizzes for her weak areas, some tests for her strong areas to maintain confidence, and one very low-stress Chemistry quiz. This pool is now ready for the computationally expensive Scoring Engine, with every option already strategically sound.
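The four steps traced above compose naturally into a single pipeline: each stage is a function that takes a pool and a context and returns a smaller (or re-weighted) pool. This sketch uses simplified stand-in stages, not the real heuristics:

```python
def run_rcp(all_tests, context, stages):
    """Thread the candidate pool through each heuristic stage in order."""
    pool = all_tests
    for stage in stages:
        pool = stage(pool, context)
    return pool

# Stand-in stages: a syllabus filter and a fatigue filter, as (pool, ctx) -> pool.
stages = [
    lambda pool, ctx: [t for t in pool if t["topic"] in ctx["priority_topics"]],
    lambda pool, ctx: ([t for t in pool if t["minutes"] <= 20]
                       if ctx["fatigueScore"] > 0.65 else pool),
]
tests = [
    {"testId": "chem_mock", "topic": "Chemistry", "minutes": 180},
    {"testId": "chem_quiz", "topic": "Chemistry", "minutes": 10},
    {"testId": "geo_quiz",  "topic": "Geography", "minutes": 10},
]
ctx = {"priority_topics": {"Chemistry"}, "fatigueScore": 0.78}
print([t["testId"] for t in run_rcp(tests, ctx, stages)])  # → ['chem_quiz']
```

Because each stage shares the same signature, heuristics can be reordered, A/B-tested, or swapped out without touching the rest of the pipeline.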


Up Next: Scoring What Matters

Generating a high-quality candidate pool is only half the battle. Now that we have a curated set of strategic options, how do we choose the single best one? In our next article, we will finally enter the second stage of our recommender system and explore the **Multi-Layered Scoring Engine**. We’ll break down how we assign a precise numerical score to each candidate test across multiple dimensions to find the perfect recommendation.

[Check out the next blog in this series: our deep dive into the Multi-Layered Scoring Engine.]

Preparing for CAT, SSC, CUET or IELTS? Dwij gives you mock tests, AI-generated questions, and personalized planners — everything you need to practice smarter and get exam-ready.

Tags: engineering blog, dwij, ai, system design, recommendation systems, edtech, data engineering, two-stage recommender