NotebookLM Case Study: How a Medical School Study Group Improved Practice Exam Scores by 23% Using AI-Powered Study Notebooks

Background: The Medical Education Information Challenge

Medical education generates an extraordinary volume of material that students must absorb, integrate, and apply. A single preclinical year can involve thousands of pages of textbook content, hundreds of lecture slide decks, dozens of clinical practice guidelines, and a growing corpus of primary research articles that inform evidence-based medicine. The challenge is not merely reading this material but synthesizing it across disciplines. Understanding a disease requires simultaneously integrating knowledge from anatomy, physiology, pathology, pharmacology, microbiology, and clinical medicine.

Study groups have long been a cornerstone of medical education precisely because they distribute this cognitive load. Each member can specialize in certain topics and teach others, but the fundamental problem of cross-referencing and integration remains. A student studying the pharmacology of antihypertensives needs to simultaneously understand the pathophysiology of hypertension, the anatomy of the renal system, and the clinical guidelines for diagnosis and management. Traditional study methods, including handwritten notes, flashcard decks, and group discussion, struggle to bridge these disciplinary boundaries efficiently.

This case study documents how a study group of six second-year medical students at a mid-sized US medical school adopted Google’s NotebookLM as their primary study integration tool over one academic semester. The results include a 23% relative improvement in average practice exam scores (from 71.3% to 87.7%) and a reduction of roughly 40% in the time spent searching for specific information across study materials.

The Study Group and Their Problem

Group Composition

The study group consisted of six M2 students preparing for their preclinical comprehensive examinations and beginning their preparation for USMLE Step 1. The group met three times per week, with each session lasting approximately two hours. Members had varying learning preferences: two were primarily visual learners who relied heavily on diagrams and flowcharts, two preferred reading-intensive approaches, and two were auditory learners who benefited most from discussion and verbal explanation.

The Problem They Faced

By the midpoint of their M2 year, the group had accumulated study materials across multiple platforms. Lecture slides were stored in their learning management system. Textbook content came from digital editions of First Aid for the USMLE Step 1, Pathoma, Costanzo Physiology, and Robbins Pathologic Basis of Disease. Clinical guidelines were scattered across UpToDate, AHA, and various specialty society websites. Pharmacology references were split between pharmacology textbooks and drug reference databases.

The core problem was fragmentation. When studying a specific disease, such as heart failure, the group needed to pull information from six or more separate sources, manually cross-reference drug mechanisms with pathophysiology, and verify that their understanding aligned with current clinical guidelines. A single study session on heart failure might require toggling between four different applications and three different textbook chapters.

The group estimated they spent approximately 35% of their study session time locating and cross-referencing information rather than actually learning and discussing it.

Implementation: Building the NotebookLM System

Phase 1: Notebook Architecture (Weeks 1-2)

The group spent two weeks establishing their notebook structure before uploading significant content. They debated between organ-system-based notebooks and disease-based notebooks, ultimately choosing a hybrid approach.

They created primary notebooks organized by organ system, matching the structure of their school’s curriculum: Cardiovascular, Respiratory, Renal, Gastrointestinal, Endocrine, Neurology, Musculoskeletal, Hematology/Oncology, Infectious Disease, and Immunology. Each organ system notebook contained the relevant textbook chapters, lecture slides, and clinical guidelines for all diseases within that system.

They also created cross-cutting notebooks for topics that span multiple organ systems: Pharmacology Principles, Biostatistics and Epidemiology, and High-Yield Integrative Cases. The Pharmacology Principles notebook contained drug class summaries that could be referenced alongside any organ system notebook.

Phase 2: Source Upload and Organization (Weeks 2-4)

Each group member was assigned responsibility for populating specific notebooks. The upload process followed strict guidelines the group established.

For textbook content, they uploaded chapter PDFs with clear file naming: Robbins_Ch12_Heart_Failure_and_Cardiomyopathy.pdf or Costanzo_Ch4_Cardiovascular_Physiology.pdf. For lecture slides, they converted PowerPoint files to PDF and named them by date and topic: Lecture_20260115_HF_Pathophysiology.pdf. Clinical guidelines were saved as PDFs from their source websites with attribution: AHA_2024_Heart_Failure_Guidelines.pdf.
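The naming convention above is regular enough to script. The following is a minimal sketch, assuming the group wanted to generate names rather than type them by hand; the helper names (slug, textbook_name, lecture_name, guideline_name) are hypothetical and are not part of NotebookLM or of the group's documented workflow.

```python
import re
from datetime import date


def slug(text: str) -> str:
    """Collapse every run of non-alphanumeric characters into one underscore."""
    return re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_")


def textbook_name(book: str, chapter: int, title: str) -> str:
    # Pattern: <Book>_Ch<chapter>_<Title>.pdf
    return f"{slug(book)}_Ch{chapter}_{slug(title)}.pdf"


def lecture_name(day: date, topic: str) -> str:
    # Pattern: Lecture_<YYYYMMDD>_<Topic>.pdf
    return f"Lecture_{day:%Y%m%d}_{slug(topic)}.pdf"


def guideline_name(org: str, year: int, title: str) -> str:
    # Pattern: <Org>_<year>_<Title>.pdf
    return f"{slug(org)}_{year}_{slug(title)}.pdf"


if __name__ == "__main__":
    print(textbook_name("Robbins", 12, "Heart Failure and Cardiomyopathy"))
    print(lecture_name(date(2026, 1, 15), "HF Pathophysiology"))
    print(guideline_name("AHA", 2024, "Heart Failure Guidelines"))
```

Run as a script, this prints the three example file names quoted in the paragraph above. Date-first lecture names also sort chronologically in a file browser, which is one plausible reason for that part of the convention.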

Each notebook received a “Study Guide” note that established context for the AI:

STUDY CONTEXT
Subject: Cardiovascular System
Exam focus: Preclinical comprehensive exam + USMLE Step 1 preparation
Key topics: Heart failure (systolic vs diastolic), valvular disease, arrhythmias, coronary artery disease, congenital heart disease, vascular disease, cardiac pharmacology
Learning objectives: Understand pathophysiology, recognize clinical presentations, know first-line treatments, identify drug mechanisms and side effects
Cross-reference priority: Link pharmacology mechanisms to disease pathophysiology. Connect histology findings to clinical manifestations.

Phase 3: Active Study Integration (Week 4 onward)

Once notebooks were populated, the group integrated NotebookLM into their regular study sessions. Each session followed a structured format.

The first 15 minutes involved individual querying, where each member used NotebookLM to review the topic they had been assigned to present. The next 60 to 75 minutes were dedicated to group discussion, during which one member’s laptop running NotebookLM served as the group’s reference tool. When questions arose during discussion, they queried the relevant notebook immediately rather than deferring the question or spending time searching manually. The final 30 minutes were spent on practice questions, where the group used NotebookLM to generate practice questions and then discussed each answer.

Key Use Cases and Techniques

Cross-Referencing Pharmacology with Pathology

The most valuable application of NotebookLM for the group was bridging pharmacology and pathophysiology. Medical education traditionally teaches these subjects in separate courses, but clinical reasoning requires integrating them.

The group developed a standard set of cross-referencing prompts. For disease-to-drug queries: “Based on the uploaded sources, what is the pathophysiological mechanism of systolic heart failure, and which drug classes target each step in that mechanism? For each drug class, cite the specific mechanism from the pharmacology sources and explain how it addresses the pathological process described in the pathology sources.”

For drug-to-disease queries: “For ACE inhibitors, list every condition mentioned in the uploaded sources where this drug class is indicated. For each indication, explain the pathophysiological rationale from the uploaded textbook chapters.”

For side-effect cross-referencing: “Based on the uploaded pharmacology and pathology sources, explain why ACE inhibitors cause hyperkalemia. Connect the drug mechanism to the renal physiology of potassium handling described in the renal system sources.”

These queries produced integrated explanations that would have required reading three or four separate chapters to assemble manually.
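The three prompts above differ only in the disease, drug class, or side effect they name, so they can be kept as reusable templates. The sketch below is one way to do that, assuming prompts are still pasted into the NotebookLM web interface by hand; it calls no NotebookLM API, and build_prompt and the placeholder names are illustrative choices rather than anything the group described.

```python
# Templates copied from the group's prompts, with the topic-specific
# wording replaced by named placeholders (the placeholder names are assumptions).
DISEASE_TO_DRUG = (
    "Based on the uploaded sources, what is the pathophysiological mechanism "
    "of {disease}, and which drug classes target each step in that mechanism? "
    "For each drug class, cite the specific mechanism from the pharmacology "
    "sources and explain how it addresses the pathological process described "
    "in the pathology sources."
)

DRUG_TO_DISEASE = (
    "For {drug_class}, list every condition mentioned in the uploaded sources "
    "where this drug class is indicated. For each indication, explain the "
    "pathophysiological rationale from the uploaded textbook chapters."
)

SIDE_EFFECT = (
    "Based on the uploaded pharmacology and pathology sources, explain why "
    "{drug_class} cause {effect}. Connect the drug mechanism to {physiology} "
    "described in the {notebook} sources."
)


def build_prompt(template: str, **fields: str) -> str:
    """Fill a template; a missing placeholder raises KeyError rather than passing silently."""
    return template.format(**fields)


if __name__ == "__main__":
    print(build_prompt(DISEASE_TO_DRUG, disease="systolic heart failure"))
    print(build_prompt(DRUG_TO_DISEASE, drug_class="ACE inhibitors"))
    print(build_prompt(
        SIDE_EFFECT,
        drug_class="ACE inhibitors",
        effect="hyperkalemia",
        physiology="the renal physiology of potassium handling",
        notebook="renal system",
    ))
```

Keeping the wording fixed and varying only the topic makes answers easier to compare across organ systems, since every query asks for the same structure and the same source citations.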

Building Disease-Specific Study Notes

For each major disease topic, the group used NotebookLM to generate comprehensive study notes that synthesized across all uploaded sources.

Their standard prompt: “Create a comprehensive study note on heart failure with reduced ejection fraction using all uploaded sources. Include: (1) Definition and diagnostic criteria from the clinical guidelines, (2) Pathophysiology from the pathology and physiology textbooks, (3) Clinical presentation including symptoms, signs, and staging, (4) Key laboratory and imaging findings, (5) First-line and second-line pharmacotherapy with mechanisms, (6) Important complications, (7) High-yield facts for board examination preparation. Cite the specific source for each fact.”

The resulting notes served as a single-document study resource that integrated information from five or six separate sources. Group members annotated these generated notes with their own additions and corrections, creating personalized study documents.

Practice Question Generation

The group discovered that NotebookLM could generate clinically oriented practice questions from uploaded materials. They used this extensively in the final 30 minutes of each study session.

Their question generation prompt: “Generate five USMLE-style multiple choice questions based on the uploaded sources for the cardiovascular notebook. Each question should present a clinical vignette, include five answer choices, and have a detailed explanation citing the relevant uploaded source. Mix difficulty levels and cover pathophysiology, pharmacology, and clinical management.”

The AI-generated questions were not a substitute for validated question banks like UWorld, but they provided a useful supplement for testing comprehension of the specific materials the group was studying. The group found that approximately 70% of the generated questions were clinically accurate and educationally useful. The remaining 30% contained errors that the group could identify and discuss, which itself became a learning exercise in critical evaluation.

Audio Overview for Commute Study

Two members of the group had 45-minute commutes each way. They used NotebookLM’s Audio Overview feature to generate podcast-style reviews of upcoming study topics.

Before each study session, one member would generate an audio overview from the relevant notebook with a customization prompt: “Create a review focused on the key pathophysiological mechanisms and drug treatments for heart failure. Emphasize the connections between the pathology and the pharmacology. Target medical students preparing for board exams.”

The commuting students reported that listening to these audio overviews before study sessions significantly improved their preparation. They arrived at sessions with a stronger conceptual framework, allowing the group to spend more time on challenging questions and less time on basic review.

Over the semester, the group generated approximately 40 audio overviews covering the major topics in their curriculum. They maintained a shared playlist organized by organ system.

Results and Measurement

Practice Exam Score Improvement

The group tracked their performance on practice examinations throughout the semester. They used three measures: scores on their school’s practice exams administered at regular intervals, scores on commercially available USMLE practice question sets completed during the same period, and performance on their school’s end-of-block examinations.

The baseline was established from their M2 Block 1 examinations, completed before adopting NotebookLM. Performance was then tracked through Blocks 2, 3, and 4.

Average practice exam scores improved from 71.3% at baseline to 87.7% by Block 4, a gain of 16.4 percentage points and a 23% relative improvement. The improvement was not uniform across all group members. The two auditory learners showed the largest gains, likely due to heavy use of the Audio Overview feature. The visual learners showed moderate gains and reported that while NotebookLM’s text-based output was helpful, they still needed to create their own diagrams and flowcharts.
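Because a relative gain is easy to confuse with a gain in percentage points, the arithmetic behind the headline figure is worth spelling out. The short sketch below uses only the two averages reported above; the function name relative_improvement is an illustrative choice.

```python
def relative_improvement(baseline: float, final: float) -> float:
    """Percent change of final relative to baseline."""
    return (final - baseline) / baseline * 100


baseline_avg = 71.3  # Block 1 average, before NotebookLM
block4_avg = 87.7    # Block 4 average

print(f"absolute gain: {block4_avg - baseline_avg:.1f} percentage points")          # 16.4
print(f"relative gain: {relative_improvement(baseline_avg, block4_avg):.1f}%")      # 23.0
```

The same function can be applied to each member's individual scores to see how unevenly the gain was distributed, although the case study does not report per-member figures.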

Study Material Search Time Reduction

The group measured search time by logging how long it took to locate specific information during study sessions. Before NotebookLM, when a question arose about a specific drug interaction or pathophysiological mechanism, the group would spend an average of 4.2 minutes locating the relevant information across their various sources.

After integrating NotebookLM, the same type of query was answered in an average of 1.1 minutes, including the time to type the query and read the response. Follow-up verification against the original source added approximately 1.5 minutes, bringing the total to 2.6 minutes. This represented a 38% reduction in search time relative to the 4.2-minute baseline. The group rounded the figure up to 40% to account for queries that did not require follow-up verification.

Over a two-hour study session, this time savings translated to approximately 20 additional minutes of actual learning and discussion time. Across three sessions per week over a 14-week semester, this amounted to roughly 14 additional hours of productive study time.
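The search-time figures can be checked with a few lines of arithmetic. This sketch uses only numbers stated in this section (4.2, 1.1, and 1.5 minutes per lookup, 20 minutes saved per session, three sessions per week, 14 weeks); the function names are illustrative.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percent decrease from before to after."""
    return (before - after) / before * 100


def extra_hours(minutes_per_session: float, sessions_per_week: int, weeks: int) -> float:
    """Total minutes saved over the semester, expressed in hours."""
    return minutes_per_session * sessions_per_week * weeks / 60


manual_lookup = 4.2   # minutes per lookup before NotebookLM
query_only = 1.1      # minutes to type the query and read the response
verification = 1.5    # minutes to check the answer against the original source

with_verification = query_only + verification
print(f"with verification:    {percent_reduction(manual_lookup, with_verification):.1f}%")  # 38.1%
print(f"without verification: {percent_reduction(manual_lookup, query_only):.1f}%")         # 73.8%
print(f"extra study time:     {extra_hours(20, 3, 14):.1f} hours")                          # 14.0 hours
```

The gap between the two percentages shows how much of the remaining lookup time goes to verification, and why the share of queries that skip verification moves the overall figure.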

Qualitative Observations

Beyond the quantitative metrics, the group reported several qualitative benefits. Integration across subjects improved most noticeably: querying pharmacology and pathology sources together removed the mental overhead of switching between them. Study session focus also improved, because questions that previously derailed sessions for five to ten minutes while someone looked up an answer were now resolved in a minute or two.

Group members felt more confident that their understanding was complete because they could verify claims against multiple sources quickly. The Audio Overview feature turned previously unproductive commute time into effective study time. Practice question generation provided a low-stakes way to test understanding immediately after studying a topic.

The group also noted limitations. NotebookLM occasionally generated explanations that conflated details from different diseases or drug classes. These errors were usually subtle, such as attributing a side effect of one drug to a different drug in the same class. The group established a rule that any NotebookLM-generated content used for study had to be verified against at least one original source.

Methodology Notes and Limitations

This Is Not a Controlled Study

This case study documents the experience of a single study group and does not represent a controlled experiment. The 23% improvement in practice exam scores cannot be attributed solely to NotebookLM adoption. Other factors that may have contributed include the natural progression of study skills over the academic year, increased familiarity with exam format and question styles, the motivation effect of participating in a structured study group, and the Hawthorne effect of tracking and measuring study outcomes.

Sample Size

Six students is too small a sample to draw generalizable conclusions. The results are presented as a detailed qualitative account with quantitative context, not as statistical evidence of efficacy.

Reproducibility Considerations

Other study groups attempting to replicate this approach should note that the initial setup investment of two to four weeks is significant and may feel unproductive in the short term. The quality of results depends heavily on the quality and completeness of uploaded source materials. NotebookLM’s source limit of 50 documents per notebook requires strategic curation, particularly for organ systems with extensive content. AI-generated practice questions require verification and should supplement, not replace, validated question banks. Not all learning styles benefit equally from text-based AI tools.
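For groups that stage their PDFs in local folders before uploading, the 50-source limit mentioned above can be checked ahead of time. The following is a rough sketch under that assumption: one local folder per planned notebook is a hypothetical layout, the limit value is simply the figure quoted in this case study, and nothing here talks to NotebookLM itself.

```python
from pathlib import Path

SOURCE_LIMIT = 50  # per-notebook limit as quoted in this case study


def count_sources(folder: Path) -> int:
    """Count PDF files staged in one local folder (one folder per planned notebook)."""
    return sum(1 for _ in folder.glob("*.pdf"))


def over_limit(counts: dict[str, int], limit: int = SOURCE_LIMIT) -> dict[str, int]:
    """Return how many sources each notebook exceeds the limit by, omitting those within it."""
    return {name: n - limit for name, n in counts.items() if n > limit}


if __name__ == "__main__":
    # Example counts are made up for illustration only.
    planned = {"Cardiovascular": 62, "Renal": 41, "Endocrine": 50}
    for name, excess in over_limit(planned).items():
        print(f"{name}: {excess} sources over the limit; merge or drop files before uploading")
```

A notebook flagged here is a candidate for merging short lecture PDFs into one file per week or for moving shared material into a cross-cutting notebook.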

Implementation Recommendations for Other Medical Students

Start with One Organ System

Rather than attempting to build notebooks for the entire curriculum at once, start with the organ system currently being studied. This provides immediate utility while allowing the group to refine their approach before scaling.

Assign Notebook Ownership

Each group member should be responsible for maintaining specific notebooks. This distributes the upload and curation workload and ensures that each notebook has a consistent level of quality.

Establish Verification Norms

Create a group norm that any AI-generated content must be verified against original sources before being relied upon for exam preparation. Medical education requires accuracy, and AI tools can produce plausible-sounding but incorrect explanations.

Use Audio Overview Strategically

Audio Overview is most effective for review and reinforcement rather than initial learning. Generate audio overviews after the group has discussed a topic, then use the audio for review during commutes or exercise.

Integrate with Existing Tools

NotebookLM does not replace Anki for spaced repetition, UWorld for practice questions, or Sketchy for visual mnemonics. It serves a specific function as an integration and cross-referencing tool that complements these established resources.

Track Your Results

Measure practice exam scores before and after adoption to assess whether the tool is providing value for your specific study group. If scores are not improving after four to six weeks of use, re-evaluate your approach or consider whether NotebookLM fits your group’s learning style.

Cost and Access Considerations

NotebookLM is available at no cost through a Google account, making it accessible to medical students without additional financial burden. The tool runs in a web browser and does not require installation of additional software. Source materials must be legally obtained. Students should use their own textbook purchases, institution-provided lecture materials, and freely available clinical guidelines.

The group’s total time investment for initial setup was approximately 20 hours distributed across six members, or roughly 3.3 hours per person. Ongoing maintenance, including uploading new materials and curating existing notebooks, required approximately one hour per week per member.

Conclusion

This study group’s experience suggests that NotebookLM can serve as an effective integration layer in medical education, bridging the gap between separately taught subjects and enabling rapid cross-referencing that traditional study methods struggle to achieve. The measured improvements in practice exam scores and study efficiency point to a meaningful role for source-grounded AI tools in medical exam preparation. However, the tool requires careful setup, ongoing source curation, consistent verification practices, and realistic expectations about what AI can and cannot do in the context of medical education. It is a study accelerant, not a study replacement, and its value scales with the quality of the materials and the discipline of the students using it.
