PDF reference extractor features and how-to guide

Use this guide to understand what LumaCite extracts from a research PDF, how the review workspace is organized, and when to resolve or export references.

Before upload

Start from the extractor control panel

The first screen is intentionally simple: choose a PDF, try the example workflow, and check the processing path before the workspace opens.

Start with a text-based research PDF. The workflow screenshots in this guide use Chen et al., Cell 2021, a prime-editing paper from David R. Liu's lab.

Core features

What you get from the PDF reference extractor

LumaCite is designed for the practical job after upload: turn a bibliography into records you can verify, correct, and move into the next tool.

01

Reference-list detection

Finds the bibliography or references section, even when the PDF has multi-column text, numbering, or wrapped citation lines.

Find the boundary

02

Citation row splitting

Separates the reference list into individual records so you can review one citation at a time instead of cleaning copied PDF text manually.

Create review rows

03

Structured fields

Extracts title, authors, year, journal or source, volume, issue, pages, DOI, PMID, PMCID, arXiv, ISBN, ISSN, and URLs when present.

Fill citation fields

04

Source-aware review

Keeps the PDF evidence, extracted record, and editable citation fields in one workspace so review work stays anchored to the document.

Check against source

05

Metadata help

Offers Auto-fetch for scholarly metadata lookup and Resolve for a stricter matching pass on records that need help before export.

Resolve uncertain rows

06

Review-first exports

Exports selected references to reference managers, citation processors, spreadsheets, Word workflows, or an audit report for handoff.

Export with context
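
To make row splitting (02) and structured fields (03) more concrete, here is a minimal Python sketch. It is not LumaCite's implementation: the regular expressions, the function names split_rows and extract_fields, and the placeholder references are all assumptions made for illustration. Real bibliographies also include unnumbered lists and hyphenated line breaks that this sketch does not handle.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration only; not LumaCite's actual rules.
ROW_START = re.compile(r"^(?:\[(\d{1,3})\]|(\d{1,3})\.)\s+")  # "[12] " or "12. "
DOI = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
PMID = re.compile(r"\bPMID:\s*(\d+)", re.IGNORECASE)
ARXIV = re.compile(r"\barXiv:\s*(\d{4}\.\d{4,5})(?:v\d+)?", re.IGNORECASE)
YEAR = re.compile(r"\(((?:19|20)\d{2})\)")  # only years written as "(2020)"


def split_rows(reference_text: str) -> list[str]:
    """Split a numbered reference list into one string per citation."""
    rows: list[str] = []
    for line in reference_text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = ROW_START.match(line)
        if match:
            # A numbering marker starts a new row; drop the marker itself.
            rows.append(line[match.end():])
        elif rows:
            # Anything else is treated as a wrapped continuation line.
            rows[-1] += " " + line
        # Lines before the first marker (such as a heading) are skipped.
    return rows


def extract_fields(row: str) -> dict[str, Optional[str]]:
    """Pull a few identifiers out of one citation row, when present."""
    doi = DOI.search(row)
    pmid = PMID.search(row)
    arxiv = ARXIV.search(row)
    year = YEAR.search(row)
    return {
        "raw": row,
        "doi": doi.group(0).rstrip(".,;") if doi else None,
        "pmid": pmid.group(1) if pmid else None,
        "arxiv": arxiv.group(1) if arxiv else None,
        "year": year.group(1) if year else None,
    }


# Made-up placeholder references, not taken from any real paper.
SAMPLE = """References
1. Author A., and Author B. (2020). Placeholder title one.
   Example Journal 1, 1-10. https://doi.org/10.1234/example.one
2. Author C. (2019). Placeholder title two. Example Journal 2, 11-20.
   PMID: 12345678.
[3] Author D. (2021). Placeholder preprint. arXiv:2101.00001v2
"""

if __name__ == "__main__":
    for record in map(extract_fields, split_rows(SAMPLE)):
        print(record)
```

The sketch keeps the raw row next to the extracted fields, which mirrors the guide's point that each record should stay checkable against its source text.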

How to use it

A short workflow from upload to export

The goal is not to click every control. The goal is to know which view answers which question while you move toward a clean export.

1

Choose a text-based research PDF

Use a PDF where text can be selected. The example in this guide uses Chen et al., Cell 2021, a prime-editing paper from David R. Liu's lab with a clear reference list.

2

Read the extraction summary first

The summary gives the reference count, how many rows are ready, which rows need checking, and whether clean export is available. Start here before editing individual records.

3

Use the three-pane workspace

The left pane keeps source evidence visible, the middle pane organizes extracted records, and the right pane lets you inspect and edit the selected reference.

4

Resolve only the rows that need help

Use Auto-fetch for metadata lookup on uncertain records. Use Resolve when a row still needs a stricter matching pass. This keeps review focused on uncertain rows instead of reprocessing clean ones.

5

Export after review

Export selected references when the rows you need are checked. If clean export is blocked, use the audit report to see what still needs attention.

Workspace orientation

Know what each pane is for

The workspace is intentionally dense. It is built for review, not decoration, so each area answers a different question.

Left: Source PDF
Use it to compare the reference row with the original document evidence when source preview is available.
Middle: Extracted records
Use it as the work queue. Status labels show which rows are verified, need checking, or cannot be verified yet.
Right: Review reference
Use it for the selected row: raw extracted text, editable citation fields, identifiers, and metadata notes.
The workspace keeps evidence, records, and editable fields close together so review decisions are visible.

First checkpoint

Use the extraction summary before editing rows

Open this panel first when the workspace loads. It tells you whether the reference set is complete enough to review, resolve, or export.

Reference count
Compare it with the paper when you need a complete bibliography.
Rows to check
Focus review on records with missing fields, weak identifiers, or metadata differences.
Export status
If clean export is blocked, use the audit report to see why.
Use the extraction summary as the first quality checkpoint after processing.
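
As a rough illustration of what the summary reports, the sketch below counts rows, flags the ones that need checking, and decides whether a clean export would be allowed. The required fields, the "ready" and "check" labels, and the clean-export rule are assumptions chosen for this example; LumaCite's actual criteria (for instance metadata differences between sources) are not documented here and are not modeled.

```python
from typing import Optional

# Assumed for this sketch: a row is "ready" when these fields are filled
# and at least one identifier is present. LumaCite's real rules may differ.
REQUIRED_FIELDS = ("title", "authors", "year")
IDENTIFIER_FIELDS = ("doi", "pmid", "arxiv")


def row_status(record: dict) -> str:
    """Return 'ready' or 'check' for one extracted record."""
    missing = [name for name in REQUIRED_FIELDS if not record.get(name)]
    has_identifier = any(record.get(name) for name in IDENTIFIER_FIELDS)
    return "check" if missing or not has_identifier else "ready"


def summarize(records: list[dict], expected_count: Optional[int] = None) -> dict:
    """Build a small summary: counts plus a clean-export flag."""
    statuses = [row_status(record) for record in records]
    to_check = statuses.count("check")
    # If you know how many references the paper lists, compare the counts.
    count_matches = expected_count is None or expected_count == len(records)
    return {
        "reference_count": len(records),
        "ready": statuses.count("ready"),
        "to_check": to_check,
        "clean_export": to_check == 0 and count_matches,
    }


# Made-up placeholder records for illustration.
RECORDS = [
    {"title": "Placeholder one", "authors": ["Author A."], "year": "2020",
     "doi": "10.1234/example.one"},
    {"title": "", "authors": ["Author C."], "year": "2019", "pmid": "12345678"},
    {"title": "Placeholder three", "authors": ["Author D."], "year": "2021"},
]

if __name__ == "__main__":
    print(summarize(RECORDS, expected_count=3))
```

Passing expected_count is optional; it stands in for comparing the reference count with the paper by hand.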

Export

Choose the format based on where the work continues

The export drawer separates citation style, copy actions, downloadable files, and previews so you can choose the right handoff without hunting through the workspace.

RIS

Reference managers

Move reviewed records into Zotero, Mendeley, and other reference managers that import RIS.

Manager handoff

XML

EndNote libraries

Use EndNote XML when the next step is a shared EndNote library.

EndNote workflow

BIB

LaTeX manuscripts

Export BibTeX for Overleaf, LaTeX, and technical writing pipelines.

Technical writing

CSL

Citation processors

Use CSL-JSON for structured citation processors and downstream tools.

Structured data

CSV

Spreadsheet review

Send selected references into QA sheets, screening work, or team review.

Review table

AUD

Audit report

Keep warnings, source checks, and export safety notes with the handoff.

Review trail

The export drawer separates copy, download, preview, and audit actions so you can choose the right handoff.
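
If you want a feel for what two of these formats contain, the sketch below renders one reviewed record as RIS and as a CSL-JSON item. The record layout, the function names to_ris and to_csl_json, and the choice to treat every record as a journal article are assumptions for illustration; the files LumaCite actually writes may include more fields and other reference types.

```python
import json

# Made-up placeholder record; the dictionary layout is an assumption.
RECORD = {
    "title": "Placeholder title one",
    "authors": [
        {"family": "Author", "given": "A."},
        {"family": "Author", "given": "B."},
    ],
    "journal": "Example Journal",
    "year": "2020",
    "volume": "1",
    "doi": "10.1234/example.one",
}


def to_ris(record: dict) -> str:
    """Render one record as an RIS entry (journal article assumed)."""
    lines = ["TY  - JOUR"]
    for author in record.get("authors", []):
        lines.append(f"AU  - {author['family']}, {author['given']}")
    tag_map = [("TI", "title"), ("JO", "journal"), ("PY", "year"),
               ("VL", "volume"), ("DO", "doi")]
    for tag, key in tag_map:
        value = record.get(key)
        if value:  # skip fields the review left empty
            lines.append(f"{tag}  - {value}")
    lines.append("ER  - ")  # end-of-record tag
    return "\n".join(lines)


def to_csl_json(record: dict, item_id: str) -> dict:
    """Render one record as a CSL-JSON item (article-journal assumed)."""
    item = {
        "id": item_id,
        "type": "article-journal",
        "title": record.get("title"),
        "author": list(record.get("authors", [])),
        "container-title": record.get("journal"),
        "volume": record.get("volume"),
        "DOI": record.get("doi"),
    }
    if record.get("year"):
        item["issued"] = {"date-parts": [[int(record["year"])]]}
    # Drop empty values so the output only carries reviewed data.
    return {key: value for key, value in item.items() if value}


if __name__ == "__main__":
    print(to_ris(RECORD))
    print(json.dumps([to_csl_json(RECORD, "ref-1")], indent=2))
```

Both functions skip empty fields rather than inventing values, which fits the guide's review-first approach: a gap in the export points back to a row that still needs checking.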

Review cautions

What to review before trusting an export

LumaCite is built to make review faster, but the final citation set should still be checked when a row is uncertain.

Scanned or low-quality PDFs

If the PDF is image-based, OCR quality can affect row splitting and field extraction. Check these rows against the source pane and audit notes more carefully.

Rows marked Check details

These rows often need a human look because a field is missing, a title is ambiguous, or metadata sources do not fully agree.

Clean export blocks

When LumaCite blocks a clean export, it is usually protecting the output from count, boundary, duplicate, or metadata risks. The audit report is the right next step.

FAQ

Common questions

What PDFs work best?

Text-based research PDFs with selectable text and a clear reference list work best. Scanned PDFs may need OCR before upload.

What are Auto-fetch and Resolve for?

Auto-fetch looks for scholarly metadata for uncertain rows. Resolve runs a stricter matching review for records that still need help.

Should I export everything?

Export the references you need after reviewing status labels and the summary. For team review, export the audit report alongside clean formats.

Where should I go for DOCX, TXT, or pasted bibliographies?

Use the Text Citation Extractor for copied reference lists, DOCX, TXT, RTF, RIS, BibTeX, CSV, or Markdown input.

Ready to try it

Extract references from a research PDF

Upload a PDF, review the extracted rows, and export only after the records you need are ready.

Open the extractor
