Scientific word list for spell-checkers/spelling dictionaries

It is amazing how hard it is to find any list, much less a comprehensive one, of scientific terms to add to spelling dictionaries, such as those of Microsoft Word, OpenOffice, LibreOffice, etc. Well, I decided to remedy that by putting my lists online for all to download. I’ve accumulated thousands of terms in my Microsoft Word custom dictionary over my career as a scientific editor, and I added literally hundreds of thousands more from the sources listed below[1].

  • I have one file for American English spellings and another for British English spellings:
    custom_scientific_US.txt (670,116 entries)
    custom_scientific_UK.txt (663,703 entries)
    (Updated January 3, 2014)

    Yes, each file has more than 660,000 entries, and that extreme length tends to slow down the Microsoft Word spell check. I have mixed feelings about this. In a document of several thousand words, spell-checking is slower than it would otherwise be, especially when there are lots of in-line citations with surnames that aren’t in the dictionary and aren’t ever going to be. But in some documents, it makes spell-checking much, much faster because, for instance, none of the taxonomic names in the document are flagged. I tend to take an optimistic, long-term view and assume that spell-checking in Microsoft Word will get faster over the years as the RAM and CPU power in our computers increase. So yeah, the spell checker drags now, but it’s not a problem in every document, and when our hardware and software catch up with such large spell-check files, I’ll be happy to have one with almost seven hundred thousand entries.

  • The “#LID 1033” at the top of the US English file and the “#LID 2057” at the top of the UK English file are Microsoft language ID codes for US English and UK English, respectively. They aren’t necessary, but Microsoft Word adds those codes to the top of .dic files when you assign them as custom dictionaries for US or UK English, so I went ahead and added them so that your application won’t accidentally use the US English file for a UK English document or vice versa.
  • To make a .txt file into a custom spelling dictionary for Microsoft Word, first save the .txt file as a .dic file. This requires a text editor, and with such large files, Notepad drags significantly. It can still open, edit, and save a file of any size, but it’s much more convenient to use EditPad Lite or Notepad++. (In Windows, at least. In Linux, gedit has no problem with them. I’m not sure about OS X’s TextEdit.) So, in some non-Notepad text editor, save the file with Unicode encoding (in Windows Notepad’s terms, “Unicode” means UTF-16 little-endian) and with the .txt suffix replaced by .dic. It’s probably best to save it in the folder with all of Microsoft Word’s spelling dictionaries; on my Windows 7 computer, that’s C:\Users\John\AppData\Roaming\Microsoft\UProof.

    Then open Word, go into the Options or Preferences or whatever it’s called in the Word version du jour, find the Proofing or Review or Spelling & Grammar area, and click on Custom Dictionaries. Click on Add…, browse to your new .dic file, and select it. Choose the language(s) that the dictionary will apply to (All languages, English (U.S.), English (U.K.), etc.) from the “Dictionary language” dropdown menu.

    Another way to make a custom dictionary from one of the files above is to go into the spelling and proofing options of Word as above, but click New… instead of Add…. This will prompt you to create a new spelling dictionary, and the dialog will already be open in Microsoft Word’s spelling dictionaries folder. Type in whatever name you want, click Save, and choose the correct language. Then navigate to that folder outside of Word, open the file you’ve just created, and paste the correct list into it.

    Note that you can use multiple spelling dictionaries for any language.
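
    If you would rather script the save-as-.dic step than wait on a text editor, here is a minimal Python sketch. It rests on a few assumptions that are mine and not tested against every Word version: that the downloaded .txt file is UTF-8, that Word wants UTF-16 little-endian with a byte-order mark (what Windows has traditionally labeled “Unicode”), and that Windows-style line endings are appropriate. The function name txt_to_dic is made up for illustration.

```python
from pathlib import Path


def txt_to_dic(txt_path, dic_path, lid=None, source_encoding="utf-8-sig"):
    """Copy a one-term-per-line word list into a UTF-16 .dic file.

    If `lid` is given (e.g. 1033 or 2057), any existing '#LID ...' line is
    replaced by a single '#LID <lid>' line at the top of the output.
    Returns the number of lines written.
    """
    # Assumption: the source list is UTF-8 (with or without a byte-order mark).
    text = Path(txt_path).read_text(encoding=source_encoding)
    lines = [line.strip() for line in text.splitlines()]
    lines = [line for line in lines if line]  # drop blank lines
    if lid is not None:
        lines = [line for line in lines if not line.startswith("#LID")]
        lines.insert(0, f"#LID {lid}")
    # Assumption: UTF-16 little-endian, byte-order mark first, CRLF line endings.
    payload = "\r\n".join(lines) + "\r\n"
    Path(dic_path).write_bytes(b"\xff\xfe" + payload.encode("utf-16-le"))
    return len(lines)


# Hypothetical usage (adjust the paths to wherever you saved the list):
# txt_to_dic("custom_scientific_US.txt", "custom_scientific_US.dic", lid=1033)
```

    After that, the resulting .dic file still has to be registered through Word’s Custom Dictionaries dialog as described above; the script only takes care of the encoding and the file extension.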

  • About the US/UK spelling differences in those files: Each file should be free of spellings from the other dialect. These include differences like ize/ise, yze/yse, hem/haem, esth/aesth, fiber/fibre, gyne/gynae, estr/oestr, aled/alled (as in signaled/signalled), aling/alling (as in signaling/signalling), eled/elled (as in modeled/modelled), eling/elling (as in modeling/modelling), venipuncture/venepuncture, artifact/artefact, titer/titre, and sulf/sulph. But there are a lot of taxonomic names that begin with sulf or have sulf in the middle, and it would be wrong to change those to “sulph”.

    The color/colour spellings probably deserve their own paragraph: I made sure the UK file had colour when necessary, such as in colourimetric, but hundreds of species names have the string color in them, including names that can also be regular English words, like monocolor, bicolor, and decolor. In UK English, these everyday words are spelled monocolour, bicolour, decolour, etc. I thought the best way to handle these potential conflicts was to include both the UK spelling (of the regular word) and the correct spelling of the taxonomic name in the UK English file. So on the off chance that you need to use one of those everyday English words in a UK English document and you accidentally leave it spelled the American way, the spell checker won’t flag it, and it’s your own fault because you’ve been warned!

    If you find any words that are spelled in the wrong dialect, or any other typos, please please please tell me here.
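
    If you want to hunt for wrong-dialect spellings yourself, a rough first pass can be scripted. The Python sketch below flags entries in a US list that contain one of the UK letter strings mentioned above. The marker list and the function name are my own illustrative choices, and ise/yse are left out on purpose because they occur in far too many ordinary words. Every hit still needs a human look, since a Latin taxonomic name may legitimately contain one of these strings.

```python
# Illustrative markers taken from the UK halves of the pairs listed above.
UK_MARKERS = ("haem", "aesth", "fibre", "gynae", "oestr", "sulph", "colour", "titre")


def flag_uk_candidates(words, markers=UK_MARKERS):
    """Return (word, matched_markers) pairs for entries that contain a UK marker.

    Matching is a plain case-insensitive substring test, so this only
    produces candidates for manual review, not a list of confirmed errors.
    """
    flagged = []
    for word in words:
        lowered = word.lower()
        hits = [marker for marker in markers if marker in lowered]
        if hits:
            flagged.append((word, hits))
    return flagged


sample = ["hemoglobin", "haemoglobin", "Sulfolobus", "sulphate", "Colourimetric"]
for word, hits in flag_uk_candidates(sample):
    print(word, hits)
```

    Checking a UK list for US spellings is harder to do with substrings alone (hem, esth, and sulf are all contained in perfectly good words and names), which is part of why reports of stray spellings are so welcome.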

  • Most word processors prefer or require that the terms be arranged alphabetically and then by case (i.e., A…, a…, B…, b…). If you want to add any items manually to each list, you could either open the file in a text editor and find the exact spot where each term belongs, or you could add them to the end of the list and let the word processor sort ’em out later. Microsoft Word, and possibly other word processors, will sort the words exactly how it wants them anyway whenever you run the spell checker, so it probably shouldn’t matter if you go into EditPad and add a term that is out of place.

    (Correction: Actually, it seems that Microsoft Word sorts the words how it wants them whenever you add a word to the custom dictionary during spell check. So running the spell checker and never clicking “Add to dictionary” doesn’t sort the custom dictionary for you. At least not in Microsoft Word 2010.)

    If you do want to sort words by case “manually” (i.e., using Microsoft Excel) for whatever reason, follow these steps:
    First, open the .txt file with Microsoft Excel and choose tab-delimited. Don’t choose to delimit by any other characters or spaces, especially commas, because this would put portions of individual chemical names in separate columns of the spreadsheet. You could also open the .txt file with a text editor (Notepad will be slow but should eventually cooperate; EditPad Lite and Notepad++ are great to download), select all, copy, go to a new spreadsheet in Excel, click on cell A1, and paste.

    Assuming the terms you want to sort begin in cell A1, add the following formula to cell B1:

    =IF(EXACT(A1,UPPER(A1)),"Upper Case",IF(EXACT(A1,LOWER(A1)),"Lower Case",IF(EXACT(A1,PROPER(A1)),"Proper Case","Other")))

    (Be sure that the quotation marks in that formula are straight quotes or normal quotes or whatever, and not curly quotes!) Push your Enter key to accept the formula. Click in cell B1. Hover the mouse over the bottom-right corner of cell B1 so that a little cross or square appears, and double-click. This will make “Upper Case”, “Proper Case”, “Lower Case”, or “Other” propagate down column B. That itself doesn’t sort anything, but this does:

    Highlight all of column A and column B. Click Sort in the Data menu. To sort them alphabetically and then by case (A…a…B…b…), in the Sort dialog box, sort by column A, sort on Values, with order A to Z. Then add a new level, sort by column B, sort on Values, with order Z to A (Upper, then Proper, then Other, then Lower). If instead you want to sort by case and then alphabetically (all Uppercase from A to Z, then all Proper case from A to Z, then Other, then Lower), sort by column B with order Z to A, add a level, choose column A, and sort from A to Z.
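
    For anyone who would rather skip Excel, the same two-level sort can be approximated in a few lines of Python. This is only a sketch: the function names are made up for illustration, str.title() stands in for Excel’s PROPER, and comparing casefolded strings will not match Excel’s collation in every case (accented letters and hyphens, for instance).

```python
# Rank order mirrors the Z-to-A sort on column B: Upper, Proper, Other, Lower.
CASE_RANK = {"Upper Case": 0, "Proper Case": 1, "Other": 2, "Lower Case": 3}


def case_class(word):
    """Classify a term the way the spreadsheet formula above does."""
    if word == word.upper():
        return "Upper Case"
    if word == word.lower():
        return "Lower Case"
    if word == word.title():  # approximation of Excel's PROPER()
        return "Proper Case"
    return "Other"


def sort_alpha_then_case(words):
    """A..., a..., B..., b...: alphabetical first, case class as the tie-breaker."""
    return sorted(words, key=lambda w: (w.casefold(), CASE_RANK[case_class(w)]))


def sort_case_then_alpha(words):
    """All Upper A to Z, then Proper, then Other, then Lower."""
    return sorted(words, key=lambda w: (CASE_RANK[case_class(w)], w.casefold()))


terms = ["abies", "Abies", "DNA", "dna", "pH", "Zea"]
print(sort_alpha_then_case(terms))  # ['Abies', 'abies', 'DNA', 'dna', 'pH', 'Zea']
print(sort_case_then_alpha(terms))  # ['DNA', 'Abies', 'Zea', 'pH', 'abies', 'dna']
```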

  • One important, tangentially related note about Microsoft Excel: If you run the “Remove Duplicates” function on a list, it ignores case, so for instance the words “Case” and “case” will be treated as duplicates and only the first one will remain. This is maddening, but I know no way around it that doesn’t involve writing some complicated script in the Microsoft Visual Basic Editor. So if you’re going to run the Remove Duplicates command on any (portion of a) spell-check list, don’t run it on your master spell-check list, because every Proper case/lowercase pair will be collapsed into a single entry.
  • Both of my documents include several foreign words related to academic/research institutions, such as Recherche and Tecnología, because I encounter them often enough that it was preferable to add them instead of clicking “Ignore all” every damn time and because they aren’t so similar to any English words that they could be typos that would need flagging in an English-language document.
  • Please download these, copy them, share them, spread them, host them, correct them, add to them, and use them! The more such lists exist on the internet (and, eventually, in software programs’ native spell-check files), the better off every scientific writer, editor, researcher, and student is.
  • If you want a smaller US or UK custom dictionary file that won’t cause your word processor to drag, then the old versions of them might be your cup of tea:
    custom_scientific+US_old.txt
    custom_scientific+UK_old.txt

    The main thing that makes my custom US and UK files so large is the taxonomic names, so if you don’t want all of those, you can download the pre-packaged chemistry and OpenMedSpel dictionaries (see below) and add their terms to these smaller files.

    Finally, there’s also this old version of the spell-check dictionary with all of the US- and UK-specific spellings removed: custom_scientific.txt (49,552 entries). It’s so much smaller because I made this file before I added most of the sources listed below and because it would be very time-consuming to go through either of the new files and remove all dialect-specific spellings.
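
    Adding another dictionary’s terms to one of these smaller files will almost certainly create duplicates, and as noted above, Excel’s Remove Duplicates can’t be trusted with them because it ignores case. Outside of Excel, a case-sensitive merge is short. Here is a minimal Python sketch; the function name is hypothetical, and it assumes each list has already been read into memory with one term per item.

```python
def merge_word_lists(*lists):
    """Merge word lists, keeping the first occurrence of each exact spelling.

    Python string comparison is case-sensitive, so 'Case' and 'case' are
    treated as different terms and both survive.
    """
    seen = set()
    merged = []
    for words in lists:
        for word in words:
            word = word.strip()
            if word and word not in seen:
                seen.add(word)
                merged.append(word)
    return merged


small_list = ["Case", "case", "Case"]
extra_terms = ["case", "benzene"]
print(merge_word_lists(small_list, extra_terms))  # ['Case', 'case', 'benzene']
```

    The merged list comes out in first-seen order, so it may still need sorting before you hand it to a word processor.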

Sources
Catalogue of Life 2012 database
The Plant List
International Seed Testing Association
Rawge’s Scientific Names Spell Checker Dictionaries
this list of European species names
Shortgrass Steppe Long-Term Ecological Research, Colorado State University
David A. Kendall: Insects & other Arthropods
The Bugwood Network
Be Wild, Virginia terrestrial insect list [PDF]
Butterflies and Moths of North America
Wikipedia: Lists of spider species
IOC World Bird List
Rxlist.com drug names
OpenMedSpel
Chemistry Dictionary for Word Processors v.20
World Register of Marine Species
some more Wikipedia lists