Mapping U.S. FDA National Drug Codes (NDC) to Drug Classes and Codes
Mapping NDCs to Anatomical Therapeutic Chemical (ATC) Level 5, Veterans Affairs Drug Classes, MeSH Pharmacological Actions, SNOMED Clinical Terms, and other Drug Classification Systems and Terminologies
This script provides the drug class (or classes), from a given drug classification system (e.g. ATC), for each FDA National Drug Code (NDC), if any is available. It does so by querying the online RxNorm API at https://rxnav.nlm.nih.gov/. The script is just a helper to query the API in bulk and write the responses to a convenient CSV file -- the mappings themselves are maintained and provided for free by RxNorm. The program can read the input NDCs from a flat list (text file, one NDC per line) or from one column of a CSV file.
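As a rough illustration of the two-step lookup involved (NDC -> RxCUI, then RxCUI -> drug classes), the sketch below builds and parses RxNav REST requests. This is Python for illustration only -- the real script is written in R, the function names are mine, and the endpoint paths reflect the RxNav documentation at the time of writing, so check it before relying on them.

```python
import json
from urllib.parse import urlencode

RXNAV = "https://rxnav.nlm.nih.gov/REST"

def rxcui_url(ndc: str) -> str:
    # Step 1: look up the RxCUI(s) for an NDC (RxNorm "findRxcuiById").
    return f"{RXNAV}/rxcui.json?" + urlencode({"idtype": "NDC", "id": ndc})

def class_url(rxcui: str, rela_source: str = "ATC") -> str:
    # Step 2: look up the classes for that RxCUI in one classification
    # system (RxClass "getClassByRxNormDrugId"); relaSource selects
    # the system, e.g. "ATC" or "VA".
    return f"{RXNAV}/rxclass/class/byRxcui.json?" + urlencode(
        {"rxcui": rxcui, "relaSource": rela_source})

def parse_rxcui(payload: str) -> list:
    # Extract the RxCUI list from the JSON returned by step 1.
    doc = json.loads(payload)
    return doc.get("idGroup", {}).get("rxnormId", [])
```

Fetching the two URLs in sequence (with any HTTP client) and parsing the responses yields the NDC-to-class rows that the script accumulates into its CSV output.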
It is also possible to request:
This work is an update of what was presented as a poster at the 2017 Annual Symposium of the American Medical Informatics Association (AMIA). If you are going to use this script or its NDC <-> drug class maps, please take the time to understand the numbers contained in the poster (PDFs are in this repository), because they CAN affect data analyses. At the very minimum, you need to understand the issues regarding coverage (missing codes) and ambiguity (duplication of codes).
I have also published a deeper analysis and comparison of drug classification systems in a paper (Desiderata for Drug Classification Systems for their Use in Analyzing Large Drug Prescription Datasets -- https://github.com/fabkury/ddcs/blob/master/2016-dmmi-fk.pdf). TL;DR: unless your use case is particularly specific, ATC is the best drug classification system for most large-dataset analyses. The Veterans Affairs Drug Classes attain the same high level of NDC coverage as ATC, but they don't provide a comprehensive and accessible hierarchy like ATC does.
This script should work out of the box if you follow the instructions in the .R file under the heading "How to execute this script."
If you cannot run this script yourself for any reason, but need an NDC-to-drug class map for your project, I can offer you two options.
You do not. The script does not consider dates, only NDCs. Although in principle a single NDC can have been recycled, i.e. can have represented different drugs at different points in time (https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfcfr/CFRSearch.cfm?fr=207.35, paragraph 4.ii), the combination of coincidences required for this to happen makes the probability vanishingly small. I have verified myself that NDC "collisions" are negligible even in nationwide (USA) datasets with over one billion filled prescriptions across 10 years. Moreover, this regulation was changed in 2017 (https://www.fda.gov/Drugs/GuidanceComplianceRegulatoryInformation/DrugRegistrationandListing/ucm2007058.htm) and NDCs are no longer recycled.
This is not necessary because RxNorm natively supports any valid NDC format. If for some other reason you need to convert NDC formats yourself, here is a guide to the equivalencies: https://phpa.health.maryland.gov/OIDEOR/IMMUN/Shared%20Documents/Handout%203%20-%20NDC%20conversion%20to%2011%20digits.pdf
That is not a problem, because RxNorm transparently supports any valid NDC format.
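For reference, the 10-to-11-digit conversion described in the guide linked above amounts to left-padding the short segment of a hyphenated NDC so the result has the 5-4-2 layout. A minimal sketch (the function name is mine; it assumes hyphenated input, because an unhyphenated 10-digit NDC is ambiguous -- you cannot tell whether it is 4-4-2, 5-3-2, or 5-4-1):

```python
def ndc10_to_ndc11(ndc: str) -> str:
    """Convert a hyphenated 10-digit NDC (4-4-2, 5-3-2, or 5-4-1)
    to the 11-digit 5-4-2 form by zero-padding the short segment."""
    labeler, product, package = ndc.split("-")
    return labeler.zfill(5) + product.zfill(4) + package.zfill(2)
```

Again, you should not need this to use the script, since RxNorm accepts any valid NDC format.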
There are extra features that could be implemented. See the TODOs in the .R file.
The RxNav API Terms of Service (https://rxnav.nlm.nih.gov/TermOfService.html) require users to make no more than 20 requests per second per IP address. This script respects that limit. It also caches its internet calls in RDS files, so it won't ask the API for the same thing twice unless you intentionally remove the cache files.
The cache makes the script run faster as it approaches the end of the list, because more and more NDCs map to RxCUIs that have already appeared before, which lets the script skip the second online API call (the one asking the server for details about the RxCUI). Independently of the 20-requests-per-second limit, in my experience on a laptop over gigabit wi-fi I have seen 12 to 18 hours of run time per 100,000 unique NDCs, i.e. about 1.5 to 2.3 distinct NDCs per second. If execution is interrupted, intentionally or not, the next run will start from the beginning of the list, but the prior work will be in the cache (the RDS files), so the script will progress extremely fast until it reaches novel NDCs (which require online queries) -- very close to the exact point in the list where it was interrupted.
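The caching-plus-throttling strategy can be sketched as follows. This is Python for illustration only (the actual script is in R and stores its cache in RDS files), and all names here are made up:

```python
import os
import pickle
import time
from functools import wraps

def throttled_cached(cache_path, max_per_second=20.0):
    """Disk-cached, rate-limited wrapper for a one-argument lookup.

    Cache hits return immediately; cache misses wait as needed to stay
    under the rate limit, then persist the result to disk so that an
    interrupted run can resume where it left off.
    """
    interval = 1.0 / max_per_second
    cache = {}
    if os.path.exists(cache_path):          # resume from a previous run
        with open(cache_path, "rb") as f:
            cache = pickle.load(f)

    def decorator(fn):
        last_call = [0.0]

        @wraps(fn)
        def wrapper(key):
            if key in cache:                # hit: no API call, no delay
                return cache[key]
            wait = interval - (time.monotonic() - last_call[0])
            if wait > 0:                    # miss: honor the rate limit
                time.sleep(wait)
            last_call[0] = time.monotonic()
            cache[key] = fn(key)
            with open(cache_path, "wb") as f:   # persist after every miss
                pickle.dump(cache, f)
            return cache[key]

        return wrapper

    return decorator
```

Because the cache is written after every miss, killing the process loses at most the one in-flight request.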
This script allows you to query multiple coding systems in the same run (e.g. ATC codes and VA Drug Classes). However, because of the caching and the limit of queries per second, I advise against using that feature: it is unlikely to be faster than doing separate full runs. In fact, it could even be slower, because rows are duplicated whenever one NDC maps to multiple codes, and the duplications from one coding system multiply those from the others. For example, if one NDC maps to 3 ATC codes and 2 VA Drug Class codes, there will be 3 * 2 = 6 rows in the results, one for each combination of these codes.
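To see the multiplication concretely, here is a toy example (all codes below are made up for illustration):

```python
from itertools import product

# Hypothetical single NDC that maps to 3 ATC codes and 2 VA Drug
# Class codes when both systems are requested in the same run.
ndc = "00000000000"
atc_codes = ["N02BE01", "R05DA20", "N02AJ06"]
va_codes = ["CN103", "CN101"]

# The output contains one row per combination of codes:
rows = [(ndc, atc, va) for atc, va in product(atc_codes, va_codes)]
print(len(rows))  # 3 * 2 = 6 rows for this one NDC
```

With separate runs you would instead get 3 rows in one output file and 2 in the other, which is usually easier to work with.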
All contents of this repository are under an Attribution-NonCommercial-ShareAlike 4.0 International license. Please see details at http://creativecommons.org/licenses/by-nc-sa/4.0/.
Please feel free to contact me about this work! Reading and using someone else's code can become so much easier after a quick conversation.
Contact me at [email protected]. --Fabrício Kury
Search tags: thesaurus drug class map equivalence correspondence classification