How to Use AppExtractAI

A step-by-step guide for faculty and program coordinators on getting the most accurate results from application extraction.

Overview

AppExtractAI works in four steps: (1) create or select an extraction template that defines what data you want, (2) upload application PDFs, (3) preview the output on a single application, and (4) process your full batch and download the results as an Excel spreadsheet.

The quality of your results depends heavily on how clearly you define your extraction fields. This guide will help you write effective field descriptions and avoid common mistakes.

Step 1

Create Your Extraction Template

A template defines the list of fields you want extracted from each application. Your account comes pre-loaded with a standard ophthalmology template, but you can create custom templates or modify the existing one.

Each field has two parts: a title (the column header in your Excel export) and a description (the instructions the AI follows when extracting that field).
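If you keep your field definitions in a script or shared document before entering them, it can help to picture a template as a plain list of title/description pairs. The sketch below is only an illustration of that idea: the class name TemplateField and the two example fields are hypothetical and do not reflect how AppExtractAI stores templates internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateField:
    """One extraction field (hypothetical structure, for illustration only)."""
    title: str        # becomes the column header in the Excel export
    description: str  # the instructions the AI follows for this field


# An illustrative two-field template; the wording is an example, not a built-in.
template = [
    TemplateField(
        title="AOA",
        description="Whether the applicant was awarded AOA. Return 1 if yes, 0 if no.",
    ),
    TemplateField(
        title="First-Author Publications",
        description=(
            "Count ONLY published papers where the applicant is first author "
            "and a numerical PubMed ID is listed. Return only the count, or 0."
        ),
    ),
]

# The titles, in order, are the columns you would expect in the spreadsheet.
column_headers = [f.title for f in template]
print(column_headers)
```

Drafting fields this way makes it easy to review every title and description side by side before you paste them into a template.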

The most important principle

Write field descriptions as if you were giving instructions to a thorough but literal research assistant who has never reviewed applications before. Be explicit about what counts, what does not count, and how to format the output. The more specific your instructions, the more accurate and consistent your results will be.

Writing effective field descriptions

Example: Counting publications

Vague

"Number of publications"

Problem: Does not specify which publications count. The AI may include in-press papers, conference abstracts, book chapters, or papers without PubMed IDs.

Specific

"Count ONLY papers where the applicant is explicitly identified as first author AND a PubMed ID is clearly listed (only a numerical ID counts as a PubMed ID, not a web address) AND the paper is marked as 'Published' (not 'In Press' or 'Accepted'). Include co-first authorships where the applicant is one of the designated first authors. Do not include peer-reviewed book chapters. Return only the numerical count, or 0 if none are found."

This version specifies author position, PubMed ID format, publication status, co-first author handling, exclusions, and output format.
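One way to check that a description like this is unambiguous is to ask whether it could be written as a mechanical rule. The sketch below expresses the specific description above as a Python predicate over invented publication records. It is not how AppExtractAI performs extraction; the Publication structure and the sample entries are assumptions made purely for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Publication:
    """Hypothetical record of one publication entry in an application."""
    is_first_author: bool          # True also for designated co-first authors
    pubmed_id: str                 # as written in the application; may be empty
    status: str                    # e.g. "Published", "In Press", "Accepted"
    is_book_chapter: bool = False


def counts_toward_total(pub: Publication) -> bool:
    """Apply every condition named in the specific field description."""
    has_numeric_pmid = re.fullmatch(r"\d+", pub.pubmed_id.strip()) is not None
    return (
        pub.is_first_author
        and has_numeric_pmid
        and pub.status == "Published"
        and not pub.is_book_chapter
    )


# Invented entries, one per rule in the description.
entries = [
    Publication(True, "31234567", "Published"),                            # counts
    Publication(True, "https://pubmed.ncbi.nlm.nih.gov/31234567/", "Published"),  # web address
    Publication(True, "30000001", "In Press"),                             # not yet published
    Publication(False, "30000002", "Published"),                           # not first author
    Publication(True, "30000003", "Published", is_book_chapter=True),      # book chapter
]

first_author_count = sum(counts_toward_total(p) for p in entries)
print(first_author_count)
```

If you cannot decide how an entry should be classified under your own description, the AI will likely struggle with it too, and the description needs another condition.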

Example: Binary yes/no fields

Vague

"AOA status"

Problem: The AI may return "Yes", "No", "N/A", "Member", "true", or a full sentence. Inconsistent output is hard to sort in a spreadsheet.

Specific

"Whether the applicant was awarded AOA. Return 1 if yes, 0 if no."

Returning 1 or 0 makes it easy to sum, sort, and filter in Excel. Use this pattern for any yes/no field.
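The difference shows up as soon as you try to total a column. The following sketch uses invented values to contrast a clean 1/0 column with the mixed answers a vague description can produce. The to_binary helper and its mapping (for example, treating "Member" as yes) are assumptions for illustration, not part of AppExtractAI.

```python
# A clean 1/0 column (invented values) can be totalled directly.
aoa_column = [1, 0, 1, 1, 0]
total_aoa = sum(aoa_column)

# Mixed answers from a vague description need cleanup before any arithmetic.
mixed_column = ["Yes", "No", "N/A", "Member", "true"]


def to_binary(value):
    """Map a free-text answer to 1 or 0; return None when it is ambiguous."""
    text = str(value).strip().lower()
    if text in {"1", "yes", "true", "member"}:
        return 1
    if text in {"0", "no", "false"}:
        return 0
    return None  # needs a human decision, e.g. "N/A"


cleaned = [to_binary(v) for v in mixed_column]
print(total_aoa, cleaned)
```

Every None in the cleaned list is a cell someone would have to resolve by hand, which is the work a well-specified 1/0 field avoids.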

Example: Complex extraction with context

Well-written

"Extract the applicant's percentile ranking or equivalent designation if explicitly stated in their Medical Student Performance Evaluation (MSPE). Look for specific percentage values, quartile rankings, or special designations that indicate relative standing (such as 'top 10%', 'highest quartile', or 'designation of excellent'). If a designation is provided instead of a direct percentile, include any information that helps interpret this designation (e.g., 'earned designation of excellent, given to approximately 55% of students'). Include both the ranking/designation and any context explaining what this means within their school's evaluation system. If the MSPE does not include percentile information or an equivalent designation, return 'n/a'."

This description tells the AI where to look (MSPE), what formats to expect (percentile, quartile, designation), how to handle ambiguity (include context), and what to return when the information is not found (n/a).

Checklist for every field

Specify exactly what to include and what to exclude
Define the output format (number, 1/0, semicolon-separated list, free text, n/a if not found)
Think about edge cases: What if the information is ambiguous? What if it is missing entirely?
For publication counts, specify author position, PubMed ID requirement, and publication status
For grades, specify which section of the application to look in (e.g., MSPE)
Test your description on a few applications using the preview feature before running a full batch

Step 2

Upload Applications

Select your template, then upload the PDF application files you want to process. You can upload multiple files in a single batch, up to your plan's application limit. Each file should be a standard SFMatch application PDF.

Files are uploaded to encrypted private storage and automatically deleted after 3 days.

Step 3

Preview Before Processing

Before processing your entire batch, AppExtractAI runs the extraction on a single application and shows you a preview of the results. This is your opportunity to check that the extracted data looks correct before committing to a full run.

Use the preview to catch issues early

Review the preview carefully. If a field is returning unexpected results (for example, counting papers that should be excluded, or returning "Yes" instead of "1"), go back and adjust your field description before processing the full batch. It is much easier to fix a field description and re-preview than to reprocess hundreds of applications.

If the preview looks correct, click "Accept Preview" to proceed to processing your full batch.

Step 4

Process and Download Results

Click "Process" to begin extraction on all uploaded applications. You can track progress in real time as each application is processed. When complete, download a ZIP file containing a master Excel spreadsheet with all extracted data.

Start with a small batch

Even after a successful preview, we recommend running your first batch on a small set of 5-10 applications. Check the Excel output to confirm that the data is being extracted correctly and consistently across multiple applications. Once you are satisfied, run the full set. This helps catch any edge cases that a single preview might miss.
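If you are comfortable with a little scripting, a small-batch check can be partly automated. The sketch below assumes you have saved the Excel sheet as CSV; the column names and sample rows are invented. It flags cells that do not match the format a field was supposed to return.

```python
import csv
import io

# Stand-in for a small-batch export saved as CSV (rows are invented).
sample_csv = """Applicant,AOA,First-Author Publications
A001,1,2
A002,Yes,0
A003,0,n/a
"""


def find_format_problems(rows, binary_columns, count_columns):
    """Return (spreadsheet_row, column, value) for every off-format cell."""
    problems = []
    for row_number, row in enumerate(rows, start=2):  # row 1 is the header
        for column in binary_columns:
            if row[column] not in {"0", "1"}:
                problems.append((row_number, column, row[column]))
        for column in count_columns:
            if not row[column].isdigit():
                problems.append((row_number, column, row[column]))
    return problems


rows = list(csv.DictReader(io.StringIO(sample_csv)))
problems = find_format_problems(
    rows,
    binary_columns=["AOA"],
    count_columns=["First-Author Publications"],
)
for problem in problems:
    print(problem)
```

Any flagged cell points to a field description that needs a tighter output format before you run the full set.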

Results are available for download for 3 days, after which they are automatically deleted. Download your Excel file promptly after processing.

Quick Tips

Use 1/0 instead of Yes/No for binary fields. It makes sorting and filtering in Excel much easier.
For list fields (like journal names), specify a separator such as a semicolon so the data stays in one cell.
Always specify what to return when information is not found (e.g., "n/a" or "0").
If a field is consistently inaccurate, the description likely needs to be more specific. Revise and re-preview.
The pre-configured ophthalmology template is a good starting point. You can customize fields for your program's needs.
Remember to download your results within 3 days. After that, data is automatically deleted.
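As an illustration of why a consistent separator and a consistent "not found" value matter, here is a minimal sketch of splitting a semicolon-separated cell back into individual items after export. The function name split_list_cell is hypothetical, and the sketch assumes "n/a" is the value your description asked for when nothing is found.

```python
def split_list_cell(cell, separator=";"):
    """Split one spreadsheet cell into a clean list of items."""
    if cell.strip().lower() in {"", "n/a"}:
        return []  # nothing was found for this applicant
    return [item.strip() for item in cell.split(separator) if item.strip()]


journals = split_list_cell("Ophthalmology; JAMA Ophthalmology; Retina")
none_found = split_list_cell("n/a")
print(journals, none_found)
```

This only works reliably when every row uses the same separator and the same "not found" value, which is why both belong in the field description.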

Need help?

If you have questions about using AppExtractAI or need help writing effective field descriptions, contact us at mac.singer@appextractai.com