Compiling collaborators for NSF Collaborators and Other Affiliations (COA) form

November 08, 2024
Authors: Beth Cimini, John Kitchin, Andrew Rosen, and Anne Carpenter 

What is the COA form? 

NSF requires you to report all “Co-authors on any book, article, report, abstract or paper with collaboration in the last 48 months” in Table 4 of the COA form, for certain people (e.g. PI, Co-PI, Senior/Key Persons). This is used by NSF to manage the selection of reviewers in order to avoid conflicts of interest; reviewers themselves do not see this form. It has several other sections that are easy to compile manually, which won’t be covered here.

NSF provides a template in the form of an Excel file (https://www.nsf.gov/bfa/dias/policy/coa/coa_template.xlsx), with instructions for it included in the file. 

Do I really need to list all of my co-authors??

Yep (generally)! Not just those you worked closely or directly with. Frequently Asked Questions on the NSF COA form can be found here. One question asks: “For co-authors in the last 48 months, does NSF want just senior investigators or all co-authors?” The answer is: “Regardless of position and/or title, all co-authors in the last 48 months should be listed.”

Now it also says: “If a person has co-authored publications or collaborated with a substantial number of colleagues in the past four years, do all the colleagues need to be listed? Is there an acceptable "filter" that can be used, such that only the closest or top collaborators are listed? In such situations, individuals should follow the instructions in the solicitation to which they are submitting or check with a cognizant NSF Program Officer.” 

Listing all your collaborators manually would be an absolute nightmare for those of us publishing a lot of collaborative research. Some of us have over a thousand co-authors in a given four-year period! Still, if you have tons of collaborations, filtering your collaborators mentally and negotiating with your PO may be even harder than just listing them all. So it may be simpler to just automate creating this table - that’s what you will learn here!

Historical tidbit: In the old days you were required to list your collaborators on the biosketch, rather than in a separate document. But the biosketch had a font size limit and a page limit, such that some of us could not fit our hundreds of people and institutions without breaking the font/page rules and/or having zero other content in our biosketch! So thankfully NSF switched to an unlimited-length Excel file.

All right, how do I do this?

Thankfully there are now a few tools out there to help create this document! The big picture steps are:

  1. Retrieve lists of authors and affiliations from publications
  2. (Optional) Use another tool to get additional papers and/or additional information on people (e.g. better affiliations)
  3. Convert/merge lists
  4. Manually add/edit the list
  5. Paste the list into the official Excel template COA form

Most tools described below offer no technical support, being made and shared by the generosity of the creators. So, try your best to find local help from friends if you run into trouble rather than bothering the creators with troubleshooting.

Step 1: Options for retrieving lists of authors from publications

This is the key decision: pick a source for data that’s already pretty up to date (ideally including preprints) so you don’t spend a ton of time manually adding recent papers/preprints. Options covered below include PubMed, your CV, your website, a citation/reference manager, Google Scholar, and ORCID. It would be nice if one could use My Bibliography or SciENcv, for those of us forced to use those systems and who keep them up to date, but we’ve not seen mechanisms to do so yet.

OPTION A) Conflicts of Shiny (using PubMed as the source)

This works great if your name is unique enough that you can search for just yourself (and all of your publications are in PubMed). Type in your name and you’re all set! https://dobbs-onc-jhmi.shinyapps.io/ConflictsOfShinyApp

The tool allows you to exclude a particular variant of the name and institution name if only a few people in the world have a name that overlaps yours (you can do a PubMed search to see if this is the case). Credit to Elana Fertig and Michael Considine for creating this great tool that works for most people!

Limitations:

- The tool is not actively maintained and, per the authors, may stop functioning at any time. The code is here: https://github.com/ejfertig/NSFBiosketch (at least the R code is; the Shiny pieces may not be).

- It misses papers that are not in PubMed, such as recent papers not yet indexed or preprints from arXiv/bioRxiv. This is in flux, though: NLM is currently in a pilot phase that makes preprints (bioRxiv, medRxiv, arXiv, and Research Square) available via PubMed Central (PMC) and, by extension, PubMed, but ONLY for papers resulting from research funded by the National Institutes of Health.

- For people with common names, the filtering is insufficient to narrow down papers to just your own. That said, you can use this tool to cast a wide net of papers that you’re sure includes yours (even if it captures others’ papers) and then use another list (e.g. Reference Manager BibTeX) as the "source of truth" about which papers to use in the merging step. In other words, you would use the Shiny app list only as a source of affiliations, so you want the widest possible results in which to look for matches.

- For people with a name change (in the past 48 months), you will need to run both names and merge the results.

OPTION B) NSF Coauthor Wrangler (using your CV or website publication list as the source)

Copy the relevant section of your CV or website into a spreadsheet (here's an example), then use the required cell (“Run this cell to install dependencies and mount your Google Drive”) plus the first optional cell (“Turn a Google Sheet OR a tab-delimited text sheet into a CSV with a list of all authors and what paper they were pulled from”) of this Colab notebook to convert to a CSV file: NSF Coauthor Wrangler.ipynb (or take and use the underlying Python code yourself). This notebook was graciously created by Dr Beth Cimini at the Broad Institute.

Limitations: 

- It requires that the list you start from is well-structured; things must always be in the same format, with no exceptions or mistakes.

- It’s only as up-to-date as your CV/source, but given you need to keep your CV up to date too, it’s no extra work.

- It does not have author affiliations, so these either need to be added manually or merged with a source that does in Step 3, like Conflicts of Shiny.
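To illustrate why the starting list must be rigidly structured, here is a minimal sketch of this kind of parsing. It is not the notebook's actual code. It assumes a hypothetical entry format of "Author1, Author2, ... (Year). Title. Journal..." with authors separated by commas; the function name and output keys are invented for illustration. An entry that deviates from the pattern raises an error instead of being silently misparsed.

```python
import re

# Assumed (hypothetical) entry format:
#   "Smith B, Jones AC, Lee D (2023). A study of things. J Things 1:1-10."
ENTRY = re.compile(r"^(?P<authors>.+?)\s*\((?P<year>\d{4})\)\.?\s*(?P<title>[^.]+)")


def parse_entry(line):
    """Return one row per author for a single, well-structured CV entry."""
    match = ENTRY.match(line.strip())
    if match is None:
        # A single inconsistent entry is enough to break the parse.
        raise ValueError(f"Entry does not follow the expected format: {line!r}")
    title = match.group("title").strip()
    year = int(match.group("year"))
    authors = [a.strip() for a in match.group("authors").split(",") if a.strip()]
    return [{"author": a, "year": year, "paper": title} for a in authors]


rows = parse_entry("Smith B, Jones AC, Lee D (2023). A study of things. J Things 1:1-10.")
for row in rows:
    print(row["author"], row["year"], row["paper"], sep="\t")
```

An entry written with "and" between authors, or without the year in parentheses, would need its own handling, which is why a consistently formatted source saves so much effort.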

OPTION C) Citation manager export (using a citation manager as the source)

Go into your citation manager, search for your name, and then export the results as BibTeX. In Step 3, you will convert this file into the needed format. We have used Paperpile, but any citation manager that can export to BibTeX should work. 

Then, use the required cell (“Run this cell to install dependencies and mount your Google Drive”) plus the second optional cell (“turn a Bibtex file into a CSV with a list of all authors”) of this Colab notebook to convert to a CSV file: NSF Coauthor Wrangler.ipynb (or harvest the underlying Python code yourself). This notebook was graciously created by Dr Beth Cimini at the Broad Institute.

Limitations: 

- It’s only as up-to-date as your citation manager library, so if you aren’t keeping a full inventory of your papers here, it’s extra work.

- For people with common names (or a name change), the filtering is insufficient to narrow down papers to just your own, though you can add a tag to your own papers manually as part of your general process for importing your own papers. And most people are likely to have very few extraneous papers in their personal library with another author of the same name.

- This won't add author affiliations, so you'll have to add them later (or merge with a source that does in Step 3, like Conflicts of Shiny or ORG-REF).
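For those who prefer to work with the exported file directly, here is a minimal standard-library sketch of pulling a unique author list out of BibTeX text. It is not the notebook's code, and it makes simplifying assumptions that are labeled in the comments: each author field sits on a single line and names are joined by " and ", as in typical citation manager exports. The function name is hypothetical.

```python
import re

# Assumption: every "author = {...}" or 'author = "..."' field fits on one line.
AUTHOR_FIELD = re.compile(
    r'^[ \t]*author[ \t]*=[ \t]*[{"](.*)[}"][ \t]*,?[ \t]*$',
    re.IGNORECASE | re.MULTILINE,
)


def bibtex_authors(bibtex_text):
    """Return a sorted list of unique author names found in BibTeX text."""
    names = set()
    for field in AUTHOR_FIELD.findall(bibtex_text):
        # BibTeX separates authors with the word "and".
        for name in re.split(r"\s+and\s+", field):
            name = name.strip().strip("{}")
            if name:
                names.add(name)
    return sorted(names)


sample = """@article{a,
  author = {Smith, Bob and Jones, Alice C.},
  title = {One},
}
@article{b,
  author = "Smith, Bob and Lee, Dana",
  title = {Two},
}"""
for name in bibtex_authors(sample):
    print(name)
```

Exports that wrap long author lists across several lines would need a real BibTeX parser instead of this line-based pattern.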

OPTION D) coapy (using Google Scholar as the source)

You can use Andrew Rosen’s code to get publications from Google Scholar, via Scholarly (https://pypi.org/project/scholarly/). If on a Mac, for example, you start by opening the application called Terminal and follow the installation instructions provided. See the documentation page (https://andrew-s-rosen.github.io/coapy/) for instructions and code. 

You can also use the required cell (“Run this cell to install dependencies and mount your Google Drive”) plus the third optional cell (“Get a list of all authors from your Google Scholar page using coapy”) of this Colab notebook (or harvest the underlying Python code yourself) to convert to a CSV file:  NSF Coauthor Wrangler.ipynb

Limitations: 

- Some deduplication still must be done manually. For instance, sometimes a publication’s metadata might have the name “Smith, Bob” but for a separate publication it might have “Smith, B”. The code currently does not try to de-duplicate such instances. This is pretty easy to address by simply pasting the names in a spreadsheet, sorting alphabetically, and deleting the obvious duplicates.

- It’s only as up to date as your Google Scholar page, so if you aren’t keeping a full inventory of your papers here, it’s extra work (you can edit Google Scholar’s listing for yourself). Google Scholar can also take a while to catch updated versions of papers, so if you have a paper where many authors are added in revision, they may get missed.

- This won't add author affiliations, so you'll have to add them later (or merge with a source that does in Step 3, like Conflicts of Shiny or ORG-REF).
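The manual de-duplication mentioned in the first limitation can be narrowed down with a small helper. The sketch below is not part of coapy; it assumes names are in "Last, First" form and groups them by last name plus first initial, so that "Smith, Bob" and "Smith, B" land together. It only flags candidates for a human to review, since two different people can share a last name and initial.

```python
from collections import defaultdict


def duplicate_candidates(names):
    """Group "Last, First" names that share a last name and first initial."""
    groups = defaultdict(set)
    for name in names:
        last, _, first = name.partition(",")
        key = (last.strip().lower(), first.strip()[:1].lower())
        groups[key].add(name.strip())
    # Only groups with more than one distinct spelling need a manual look.
    return [sorted(spellings) for _, spellings in sorted(groups.items())
            if len(spellings) > 1]


names = ["Smith, Bob", "Smith, B", "Jones, Alice", "Smith, Carol"]
for group in duplicate_candidates(names):
    print(" | ".join(group))
```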

OPTION E) ORG-REF (using your ORCID identifier in openalex.org as the source)

There are two ways to use this; let’s start with the easiest: 

1) Join John Kitchin’s Discord server by clicking this link and making an account if needed: https://discord.gg/upZuWP4hdz 

Then go to the #sandbox channel and paste in your ORCID, like this (but use your own ORCID! This is Anne’s):
  /coa https://orcid.org/0000-0003-1555-8261 

In a minute, you will receive a direct message from “coa bot” that has your file! (He also has an openalex bot that lets you follow people and papers and sends you new papers and citations for them!)

2) Use the code here (https://github.com/jkitchin/org-ref) to run this line in Emacs:

   M-x oa-coa

You can download Emacs here (https://www.gnu.org/software/emacs) if you don’t have it installed already. The code’s GitHub page has further instructions for how to access the code if you’ve not done this kind of thing before. Once the file is created, you paste the results in the COA xlsx file template from NSF. 

This method uses your ORCID, which is nice because ORCID includes preprints and many researchers with common names (or names that changed over time) keep their ORCID listing of papers up to date.

Limitations:

- Many affiliations are missing in the output. That said, the NSF template says to provide this “if it is known”.

- Some papers may be missing. This uses OpenAlex (https://openalex.org) for the list of papers, which is not always entirely up to date.

- A few more caveats from the creator are listed here (https://github.com/jkitchin/org-ref/blob/cc4b2b8777c5b54f01a451aae295fb5e23aa1358/openalex.el#L636). As of October 2024, these include: 

  • OpenAlex provides the name in the Firstname Initial Lastname form. I assume this can be split into spaces, and the last word is the last name. That is not always correct, so some manual name fixing may be required.
  • The Institutions are not always reliable. I use the most recent institution if an author is listed multiple times.
  • There may be duplicates for people who have different names in OpenAlex, e.g. missing initials, and differences in abbreviations, including having a period or not.
  • Your name will be included, you will need to delete this manually in the Excel sheet.

OTHER OPTIONS

Here is an overview of some other strategies out there that may require more finagling to get working. If you have any updates on these, please let Anne know! Again, like all options, your list will only be as up-to-date and comprehensive as the publication source being accessed.

Python + OpenAlex REST API: An alternative to Emacs in OPTION E is to use Python with the OpenAlex REST API. We are not aware of a Python package for this yet, but it would be possible to do it the same way it is done in Emacs (which also uses the REST API).
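As a starting point, here is a rough standard-library sketch of what such a script could look like. It is not a port of the Emacs code. The filter names (author.orcid, from_publication_date), the cursor paging, and the response fields (results, meta.next_cursor, authorships, institutions, display_name) reflect our understanding of the OpenAlex works endpoint and should be checked against the current OpenAlex documentation; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

OPENALEX_WORKS = "https://api.openalex.org/works"


def fetch_works(orcid, since):
    """Download all works for an ORCID published on or after `since` (YYYY-MM-DD)."""
    works, cursor = [], "*"
    while cursor:
        query = urllib.parse.urlencode({
            "filter": f"author.orcid:{orcid},from_publication_date:{since}",
            "per-page": 200,
            "cursor": cursor,
        })
        with urllib.request.urlopen(f"{OPENALEX_WORKS}?{query}") as response:
            page = json.load(response)
        works.extend(page["results"])
        cursor = page["meta"].get("next_cursor")  # missing or null on the last page
    return works


def coauthors_from_works(works, own_orcid):
    """Map each co-author name to the first non-empty institution seen."""
    own = own_orcid.rsplit("/", 1)[-1]
    found = {}
    for work in works:
        for authorship in work.get("authorships", []):
            author = authorship.get("author") or {}
            if (author.get("orcid") or "").rsplit("/", 1)[-1] == own:
                continue  # skip yourself
            name = author.get("display_name")
            if not name:
                continue
            institutions = authorship.get("institutions") or []
            affiliation = institutions[0].get("display_name", "") if institutions else ""
            if name not in found or (affiliation and not found[name]):
                found[name] = affiliation
    return found
```

Calling coauthors_from_works(fetch_works(your_orcid, "2020-11-08"), your_orcid) would then give a name-to-affiliation mapping to clean up by hand. The same caveats listed for OPTION E (missing affiliations, name variants) would apply here too.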

Python + Scopus: See https://kitchingroup.cheme.cmu.edu/blog/2016/02/20/Generating-an-alphabetized-list-of-collaborators-from-the-past-five-years/ for an example using Python and Scopus. This hasn’t been run since 2016, but lays out a way to use Scopus if you have access. The more modern interface to Scopus is at https://pybliometrics.readthedocs.io/en/stable/.

R + Web of Science: If you have access to Web of Science, keep it updated, and are comfortable running scripts, check out: https://github.com/BrunaLab/coauthors_nsf/tree/main 

Web app + OpenAlex: If your name is unique, this tool is great: http://bib.experiments.kordinglab.com/nsf-coa

Step 2: (Optional) Use a second tool 

You might want to run a second tool from Step 1, to get additional papers and/or additional information on people (e.g. better/more affiliations).

Step 3: Merge/convert the various sources

Use the required cell (“Run this cell to install dependencies and mount your Google Drive”) plus the last optional cell (“Join any of the previous files to a sheet also containing affiliations”) of this Colab notebook NSF Coauthor Wrangler.ipynb (or take and use the underlying Python code yourself) to merge a CSV without affiliations (the author_table) with one that does have affiliations (but may have too many "extra" rows, as can happen with people with common names using things like Conflicts of Shiny). If it finds a match for an author in the two sources, it brings over the institutional affiliation. This notebook was graciously created by Dr Beth Cimini at the Broad Institute.

For example, Anne’s procedure is to use Conflicts of Shiny (Option A) to extract papers with her name (this yields way too many papers that are not hers, given her common name, but at least it has author affiliations). She then extracts papers from her website (Option B), which is a comprehensive and accurate list of published papers, and manually updates the reference manager Paperpile to make a small library of any preprints and draft manuscripts (using Option C to export the BibTeX for those papers). After manually merging the Option B and C outputs into a single file, she runs the Step 3 cell of NSF Coauthor Wrangler to merge them with the Option A output.

These combinations are covered in the notebook:

  • CV/website list for complete list plus Conflicts of Shiny for available affiliations
  • CV list plus ORG-REF to get available affiliations
  • Citation manager for complete list plus Conflicts of Shiny for available affiliations
  • Citation manager list plus ORG-REF to get available affiliations
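The join performed in this step can be pictured with a short sketch. This is not the notebook's code; the column names ("author", "affiliation") and the name normalization are assumptions made for illustration. Every row of the trusted author table is kept, an affiliation is copied over when a matching name is found, and extra rows in the affiliation source are simply ignored.

```python
def normalize(name):
    """Lowercase a name and drop commas/periods so minor variants still match."""
    return " ".join(name.replace(",", " ").replace(".", " ").lower().split())


def add_affiliations(author_table, affiliation_table):
    """Left-join affiliations onto the author table by normalized name."""
    lookup = {}
    for row in affiliation_table:
        if row.get("affiliation"):
            # Keep the first affiliation seen for each name.
            lookup.setdefault(normalize(row["author"]), row["affiliation"])
    return [
        {**row, "affiliation": lookup.get(normalize(row["author"]), "")}
        for row in author_table
    ]


authors = [{"author": "Smith, Bob"}, {"author": "Lee, Dana"}]
affiliations = [
    {"author": "smith, bob", "affiliation": "Inst A"},
    {"author": "Stranger, Sam", "affiliation": "Inst Z"},  # not a co-author; ignored
]
for row in add_affiliations(authors, affiliations):
    print(row["author"], row["affiliation"], sep="\t")
```

Authors left with an empty affiliation are the ones to fill in by hand in Step 4.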

Step 4: Manually add/edit the list

For all of the above methods, you will need to handle the following: 

  • Unpublished, ongoing projects: Manually add all collaborators on projects that are not yet turned into papers, including funding, awards, graduate research or others in the last 48 months. 
  • Duplicates: Check through for duplicates that were not already eliminated, e.g. due to slightly different names.
  • Collaborators > 48 months: Select the date cutoff. It’s easiest to delete papers/authors based on publication year (rather than the exact 48 months requested by NSF), but if you really care you can drop only those publications that are more than 48 months old. Note that NSF specifically says “(publication date may be later)”. For example, if you collaborated through December 2030 and the paper then went through the review process and was published in 2032 (with no interaction/collaboration in the meantime), you need not list that paper after December 2034. Rarely will it be worth your while to figure out dates so precisely, though!
  • Final check: Do an overall reality check to be sure it looks like nothing is missing. Generally, you want to err on the side of being overly inclusive rather than omitting someone, but it’s also wise to not obsess about this document and waste your time. If you accidentally omit someone from your list and the NSF program officer asks them to review your proposal, they are obligated to report this - so there is a backup plan. But this could be embarrassing!
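The year-based cutoff from the list above can be sketched in a few lines. The function names and the "year" key are hypothetical. Filtering on the cutoff year instead of the exact date keeps a few extra months of papers, which errs on the side of being overly inclusive.

```python
from datetime import date


def cutoff_date(submission, months=48):
    """Return the date `months` before the submission date."""
    total = submission.year * 12 + (submission.month - 1) - months
    year, month_index = divmod(total, 12)
    # Clamp the day to 28 so the result is valid in every month.
    return date(year, month_index + 1, min(submission.day, 28))


def within_window(rows, submission):
    """Keep rows whose publication year is at or after the cutoff year."""
    cutoff_year = cutoff_date(submission).year
    return [row for row in rows if row["year"] >= cutoff_year]


rows = [{"author": "Smith, Bob", "year": 2019}, {"author": "Lee, Dana", "year": 2020}]
print(cutoff_date(date(2024, 11, 8)))
print(within_window(rows, date(2024, 11, 8)))
```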

Step 5: Paste the list into the official template COA form

Voila: copy/paste and you’re done!

Hope this helps! Please let us know if you have found better solutions!