PDF reference extraction

Extract references from research PDFs

Upload a paper, manuscript, CV, or research report saved as a PDF. LumaCite finds the reference list, extracts citation fields, helps you review the records, and prepares clean exports.

Source PDF stays visible
Review and edit records
Export in multiple formats

Upload a PDF

PDF only: Standard processing accepts files up to 60 MB.
File handling: Server-side temporary PDFs are deleted when processing finishes.
Using DOCX, TXT, or pasted text? Open the Text Citation Extractor.

Large PDFs can take 1-3 minutes to process.

1. Read
2. Find
3. Split
4. Identify
5. Check
6. Review

What happens after you upload a PDF

LumaCite turns a document's reference list into records you can inspect, correct, and export.

Find the reference list

Locates bibliography headings, numbered lists, and reference blocks in the PDF.
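As an illustration of heading-based detection, here is a minimal Python sketch. It is not LumaCite's implementation: the heading pattern, the `find_reference_start` helper, and the choice to prefer the last matching heading (so a table-of-contents entry is not mistaken for the list itself) are all assumptions made for the example.

```python
import re

# Hypothetical heading pattern; real documents use many more variants.
HEADING_RE = re.compile(
    r"^\s*(?:\d+\.?\s+)?(references|bibliography|works cited|literature cited)\s*:?\s*$",
    re.IGNORECASE,
)

def find_reference_start(lines):
    """Return the index of the last line that looks like a reference heading, or None."""
    hits = [i for i, line in enumerate(lines) if HEADING_RE.match(line)]
    return hits[-1] if hits else None

page_lines = [
    "1 Introduction",
    "Some body text [1].",
    "References",
    "[1] A. Author. A title. 2020.",
]
start = find_reference_start(page_lines)
```

Everything after `start` would then be treated as the candidate reference block.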

Separate references

Splits the list into individual citations, one reviewable row each, and sets aside unrelated text.
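A rough Python sketch of this splitting step for numbered lists follows. The marker pattern and the `split_references` helper are assumptions for illustration only. The approach is naive: a wrapped line that happens to begin with something like "2020." would be mistaken for a new entry.

```python
import re

# Assumed markers: "[12]", "12." or "12)" at the start of a line.
MARKER_RE = re.compile(r"^\s*(?:\[\d+\]|\d+[.)])\s+")

def split_references(lines):
    """Group lines into references; unmarked lines continue the previous entry."""
    refs = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if MARKER_RE.match(line):
            # A new marker starts a new reference; drop the marker itself.
            refs.append(MARKER_RE.sub("", line, count=1).strip())
        elif refs:
            # Wrapped line: append it to the reference currently being built.
            refs[-1] += " " + text
    return refs

block = [
    "[1] Doe J. A first title.",
    "    Journal of Examples. 2020.",
    "[2] Roe R. A second title. 2021.",
]
refs = split_references(block)
```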

Extract citation fields

Finds titles, authors, years, journals, DOIs, PMIDs, PMCIDs, arXiv IDs, and URLs.
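Identifiers are the most pattern-friendly of these fields. The Python sketch below shows one way to pull DOIs, PMIDs, PMCIDs, and arXiv IDs out of a citation string. The regular expressions are simplified assumptions, not LumaCite's rules, and the sample citation uses made-up identifiers.

```python
import re

# Simplified, assumed patterns; production rules need more edge cases.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+", re.IGNORECASE)
PMID_RE = re.compile(r"\bPMID:?\s*(\d{1,8})\b", re.IGNORECASE)
PMCID_RE = re.compile(r"\bPMC\d+\b", re.IGNORECASE)
ARXIV_RE = re.compile(r"\barXiv:\s*(\d{4}\.\d{4,5}(?:v\d+)?)", re.IGNORECASE)

def extract_identifiers(citation):
    """Return a dict of the identifiers found in one citation string."""
    ids = {}
    doi = DOI_RE.search(citation)
    if doi:
        # Trailing punctuation usually belongs to the sentence, not the DOI.
        ids["doi"] = doi.group(0).rstrip(".,;)")
    pmid = PMID_RE.search(citation)
    if pmid:
        ids["pmid"] = pmid.group(1)
    pmcid = PMCID_RE.search(citation)
    if pmcid:
        ids["pmcid"] = pmcid.group(0).upper()
    arxiv = ARXIV_RE.search(citation)
    if arxiv:
        ids["arxiv"] = arxiv.group(1)
    return ids

sample = (
    "Doe J. A title. J Ex. 2020;1(2):3-4. "
    "doi:10.1234/abc.567. PMID: 12345678. PMCID: PMC7654321."
)
found = extract_identifiers(sample)
```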

Compare metadata with the PDF

Opens each record beside its highlighted citation text on the source PDF page.

Prepare review

Checks records against scholarly sources for missing fields, conflicts, and duplicates.
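Duplicate detection can be sketched in Python as a comparison of normalized keys. The record layout (`doi`, `title`, `year`) and the rule used here (match on DOI when present, otherwise on a normalized title plus year) are assumptions for the example, not a description of LumaCite's checks.

```python
import re

def dedupe_key(record):
    """Build a comparison key; returns None when there is nothing to compare."""
    doi = (record.get("doi") or "").strip().lower()
    # Strip common DOI prefixes so URL and bare forms compare equal.
    doi = re.sub(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", "", doi)
    if doi:
        return ("doi", doi)
    title = re.sub(r"[^a-z0-9]+", " ", (record.get("title") or "").lower()).strip()
    if not title:
        return None
    return ("title", title, str(record.get("year") or ""))

def find_duplicates(records):
    """Return (first_index, duplicate_index) pairs for records sharing a key."""
    seen = {}
    dupes = []
    for index, record in enumerate(records):
        key = dedupe_key(record)
        if key is None:
            continue
        if key in seen:
            dupes.append((seen[key], index))
        else:
            seen[key] = index
    return dupes

records = [
    {"title": "An Example Study", "year": 2015, "doi": "10.1234/example.1"},
    {"title": "An example study.", "year": 2015, "doi": "https://doi.org/10.1234/EXAMPLE.1"},
    {"title": "Another paper", "year": 2019},
]
dupes = find_duplicates(records)
```

Exact key matching like this misses near-duplicates with typos, which is one reason flagged records still benefit from manual review.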

Export reviewed references

Downloads records as RIS, EndNote XML, BibTeX, or CSL-JSON for use in reference managers and writing workflows.

Research workflows

Why researchers use LumaCite for PDF references

Turn a PDF bibliography into records you can verify, complete, style, and export.

Useful when a reference list needs to become reliable data

Extract references without copying bibliography entries by hand.
Check records against scholarly sources and source PDF text.
Complete missing fields such as titles, authors, years, journals, and identifiers.
Change citation style and export to reference managers or writing workflows.

Frequently asked questions

PDF reference extraction: common questions

Practical answers about file requirements, processing, review, metadata, and reference-manager exports.

What PDFs work best?

Text-based PDFs with selectable text and a recognizable reference list work best because LumaCite can read and separate each reference more reliably. Papers, manuscripts, CVs, and reports are supported when saved as PDFs.

How large can the PDF be?

The standard extraction endpoint accepts PDFs up to 60 MB. If a file reaches the standard route's size or time limit, LumaCite attempts background processing. Platform upload limits can still apply to very large files.
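If you want to check a file before uploading, a small Python pre-check might look like the sketch below. It assumes "60 MB" means 60 × 1024 × 1024 bytes, which the page does not specify, and the `precheck_pdf` helper is hypothetical. The `%PDF-` header test is strict: a few valid PDFs carry stray bytes before the header and would be rejected here.

```python
MAX_STANDARD_BYTES = 60 * 1024 * 1024  # assumption: "60 MB" read as binary megabytes

def precheck_pdf(data, limit=MAX_STANDARD_BYTES):
    """Return a list of problems found before upload; an empty list means none."""
    problems = []
    if not data.startswith(b"%PDF-"):
        problems.append("not a PDF (missing %PDF- header)")
    if len(data) > limit:
        problems.append("larger than the standard limit")
    return problems

ok = precheck_pdf(b"%PDF-1.7\n...")
bad = precheck_pdf(b"PK\x03\x04 docx bytes")
```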

Are uploaded PDFs stored?

The extraction service uses a temporary server-side PDF file and deletes that file when processing finishes, including when processing fails. Download or export the citation records you want to keep.
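The "deleted even when processing fails" behavior corresponds to a familiar cleanup pattern. Here is a minimal Python sketch of it using `try`/`finally`. The `process_with_temp_pdf` helper and the `extract` callback are hypothetical; this shows the general technique, not LumaCite's server code.

```python
import os
import tempfile

def process_with_temp_pdf(pdf_bytes, extract):
    """Write bytes to a temporary PDF, run extract(path), and always delete the file."""
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(pdf_bytes)
        return extract(path)
    finally:
        # Runs on success and on failure alike.
        if os.path.exists(path):
            os.remove(path)

seen_paths = []

def failing_extract(path):
    seen_paths.append(path)
    raise ValueError("simulated extraction failure")

try:
    process_with_temp_pdf(b"%PDF-1.7\n", failing_extract)
except ValueError:
    pass

file_was_removed = not os.path.exists(seen_paths[0])
```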

Which metadata sources does LumaCite check?

Depending on the identifier and record type, LumaCite may check Crossref, PubMed, Europe PMC, DataCite, OpenAlex, Semantic Scholar, arXiv, Open Library, Google Books, and Unpaywall.
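One way to picture "depending on the identifier" is a routing table from identifier type to candidate sources. The Python mapping below is purely hypothetical: the page lists the sources but does not say which identifier goes to which, and the `isbn` key is an assumption added to give the book sources a home.

```python
# Hypothetical routing table: which sources to try for each identifier type.
SOURCES_BY_IDENTIFIER = {
    "doi": ["Crossref", "DataCite", "OpenAlex", "Unpaywall"],
    "pmid": ["PubMed", "Europe PMC"],
    "pmcid": ["Europe PMC", "PubMed"],
    "arxiv": ["arXiv", "Semantic Scholar"],
    "isbn": ["Open Library", "Google Books"],
}

def candidate_sources(record):
    """List the sources worth querying for a record, without repeats."""
    sources = []
    for id_type, names in SOURCES_BY_IDENTIFIER.items():
        if record.get(id_type):
            for name in names:
                if name not in sources:
                    sources.append(name)
    return sources

sources = candidate_sources({"pmid": "12345678", "pmcid": "PMC7654321"})
```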

What happens with scanned PDFs?

Scanned or image-based PDFs may need OCR before extraction. LumaCite can identify some scanned files and attempt OCR processing, but records extracted from unclear scans, low-resolution pages, or poor text layers should be reviewed carefully.

When should I manually review a reference?

Review records with missing fields, no strong identifier, conflicting metadata, unclear source text, possible duplicates, or a "Check details" status. Before downloading, compare the source text, citation fields, identifiers, and metadata evidence for the records you plan to export.
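If you post-process exported records yourself, the criteria above translate into a simple checklist. The Python sketch below assumes a hypothetical record layout (`title`, `authors`, `year`, identifier keys, and boolean-like `conflicts` and `possible_duplicate` flags). None of these field names come from LumaCite.

```python
REQUIRED_FIELDS = ("title", "authors", "year")   # assumed minimum set
STRONG_IDS = ("doi", "pmid", "pmcid", "arxiv")   # assumed "strong" identifiers

def review_reasons(record):
    """Return the reasons a record deserves manual review; empty means none found."""
    reasons = []
    missing = [field for field in REQUIRED_FIELDS if not record.get(field)]
    if missing:
        reasons.append("missing fields: " + ", ".join(missing))
    if not any(record.get(key) for key in STRONG_IDS):
        reasons.append("no strong identifier")
    if record.get("conflicts"):
        reasons.append("conflicting metadata")
    if record.get("possible_duplicate"):
        reasons.append("possible duplicate")
    return reasons

complete = review_reasons(
    {"title": "A title", "authors": ["Doe J"], "year": 2020, "doi": "10.1234/x"}
)
incomplete = review_reasons({"title": "A title", "year": 2020})
```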

Can I import exports into Zotero, EndNote, or Mendeley?

Yes. Use RIS for broad reference-manager compatibility, EndNote XML for EndNote, or BibTeX and CSL-JSON for compatible tools and writing workflows.
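For orientation, an RIS file is plain text made of two-letter tags, one per line, with each record opened by `TY` and closed by `ER`. The Python sketch below writes one journal-article record. The input record layout is an assumption, only a handful of common tags are covered, and it does not represent LumaCite's exporter.

```python
def to_ris(record):
    """Serialize one journal-article record as an RIS entry."""
    lines = ["TY  - JOUR"]  # assumes a journal article
    for author in record.get("authors", []):
        lines.append(f"AU  - {author}")
    for tag, field in (("TI", "title"), ("JO", "journal"), ("PY", "year"), ("DO", "doi")):
        value = record.get(field)
        if value:
            lines.append(f"{tag}  - {value}")
    lines.append("ER  - ")
    return "\n".join(lines) + "\n"

ris_text = to_ris({
    "authors": ["Doe, J.", "Roe, R."],
    "title": "An example study",
    "journal": "Journal of Examples",
    "year": 2020,
    "doi": "10.1234/example.1",
})
```

Most reference managers can import a file containing one or more such entries saved with a `.ris` extension.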

What should I use for DOCX, TXT, or pasted references?

Use the Text Citation Extractor for DOCX, TXT, RTF, RIS, BibTeX, CSV, Markdown, or bibliography text copied from another document.

Learn more

Explore LumaCite's extraction methods and results.

Extractor features and guide: See extraction features and workflow guidance.
Public benchmark: Review test results and documented limitations.

Quality and audit: Review extraction safety, identifiers, sources, warnings, and integrity checks.