Literal text analysis of death certificates

Using Literal Text From the Death Certificate to Enhance Mortality Statistics: Characterizing Drug Involvement in Deaths – National Vital Statistics Reports – December 20, 2016

Extracting more accurate data from death certificates is critical to the study of drug overdoses and drug suicides.

This report describes the development and use of a method for analyzing the literal text from death certificates to enhance national mortality statistics on drug-involved deaths.

Drug-involved deaths include drug overdose deaths as well as other deaths where, according to death certificate literal text, drugs were associated with or contributed to the death. 


To address drug-involved mortality as a public health concern, many researchers use National Vital Statistics System mortality data (NVSS–M) to describe trends in these deaths and to monitor the populations most at risk.

The NVSS–M data are based on information from the death certificates filed in the 50 states and the District of Columbia.

The data set includes cause-of-death, demographic, and geographic information extracted from death certificates for all decedents in the United States (5).

The NVSS–M data are coded using a standardized classification system, the International Classification of Diseases and Related Health Problems, Tenth Revision (ICD–10) (6).

While this classification system allows for consistency in identifying the underlying and contributory causes of death, there are limitations in the use of ICD–10-coded data to study drug-involved mortality.

Specifically, in the ICD–10 classification system, only a few drugs (e.g., heroin, methadone, and cocaine) are assigned a unique classification code (T40.1, T40.3, and T40.5, respectively) under certain circumstances (e.g., when the death is an overdose).

Most drugs, however, are assigned to broad categories (e.g., both oxycodone and morphine are categorized to T40.2, Poisoning: Other opioids) (7).

The use of broad categories in ICD–10 makes it difficult to use ICD–10 coded data to monitor trends in deaths involving specific drugs that are not already uniquely classified in ICD–10.
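The loss of specificity can be illustrated with a minimal Python sketch. The mapping below contains only the drugs and codes named in the text above; the dictionary and the function name are hypothetical and are not part of any NCHS software.

```python
# Illustrative mapping limited to the drugs and ICD-10 codes named in the text.
# This is not a complete or official lookup table.
ICD10_T_CODE = {
    "HEROIN": "T40.1",
    "METHADONE": "T40.3",
    "COCAINE": "T40.5",
    "OXYCODONE": "T40.2",  # Poisoning: Other opioids
    "MORPHINE": "T40.2",   # Poisoning: Other opioids
}


def drugs_sharing_code(code):
    """Return the drugs in the mapping that collapse to the same ICD-10 code."""
    return sorted(drug for drug, c in ICD10_T_CODE.items() if c == code)


# Two distinct drugs become indistinguishable once coded to T40.2.
print(drugs_sharing_code("T40.2"))  # ['MORPHINE', 'OXYCODONE']
# Heroin keeps its own code.
print(drugs_sharing_code("T40.1"))  # ['HEROIN']
```

Given only the code T40.2, an analyst cannot tell whether oxycodone or morphine was reported, which is the gap that literal text analysis is meant to close.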

Analysis of literal text has been used to enhance mortality statistics in investigations of sudden infant death syndrome, Creutzfeldt-Jakob disease, influenza and pneumonia, cancer, and drug poisonings (8–13).

The literal text often includes information beyond the general classification captured in an ICD–10 code description.

Data source

This report describes the collaborative efforts of the National Center for Health Statistics (NCHS) and the U.S. Food and Drug Administration (FDA) to develop and assess a method for using literal text from death certificates to identify specific drugs involved in deaths, that is, drug overdose deaths and deaths with other types of drug involvement.

This report accompanies a study that highlights the specific drugs most frequently involved in drug overdose deaths from 2010 through 2014 (14).

NCHS uses a software program to code the literal text from the death certificate according to the rules of ICD–10 (17).

These processes involve the identification of statements from death certificate literal text, such as “MYOCARDIAL INFARCTION” and “DIABETES MELLITUS.”

Some statements, such as “METHADONE INTOXICATION,” refer to drug-involved mortality. The identified statements are translated into ICD–10 codes.

For example, the identified statement “OXYCODONE POISONING” is coded to ICD–10 codes T40.2, Poisoning: other opioids, and X42, Accidental poisoning by and exposure to narcotics and psychodysleptics (hallucinogens), not elsewhere classified.


The analysis method uses search terms to identify drugs mentioned in electronic death certificate literal text (i.e., the cause-of-death statements on the death certificate). Unless contextual information suggests otherwise, drugs mentioned in the death certificate literal text are assumed to be involved in the death. Therefore, the method also analyzes literal text surrounding the identified search terms to determine whether the drugs mentioned were not involved in the death (e.g., “METHICILLIN RESISTANT STAPHYLOCOCCUS AUREUS INFECTION”). The processed data resulting from applying the method include all identified drug mentions and contextual information on drug involvement.
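The two steps described above (find drug search terms, then check the surrounding text for evidence of non-involvement) might be sketched as follows. This is a simplified illustration, not the program used in the report: the search-term list, the single context rule, and all names are assumptions made for the example.

```python
import re

# Hypothetical search terms; the report's actual term list is far larger.
SEARCH_TERMS = ["METHADONE", "OXYCODONE", "METHICILLIN"]

# Hypothetical context rule: a drug name directly followed by "RESISTANT"
# describes an organism, so the drug is treated as not involved in the death.
NOT_INVOLVED_AFTER = re.compile(r"^[\s-]+RESISTANT\b")


def find_drug_mentions(literal_text):
    """Return each drug mention with a flag for assumed involvement."""
    text = literal_text.upper()
    mentions = []
    for term in SEARCH_TERMS:
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            following = text[match.end():]
            # Involvement is the default; the context rule can override it.
            involved = NOT_INVOLVED_AFTER.match(following) is None
            mentions.append({"drug": term, "involved": involved})
    return mentions


print(find_drug_mentions("METHADONE INTOXICATION"))
print(find_drug_mentions("METHICILLIN RESISTANT STAPHYLOCOCCUS AUREUS INFECTION"))
```

The first call flags methadone as involved; the second still records the methicillin mention but marks it as not involved, mirroring the default-to-involved logic described in the text.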

ICD–10 codes reflect the conditions reported on the death certificate. During the coding process, the software program assigns ICD–10 codes to 1 underlying cause and up to 20 multiple causes of death. Records rejected by the software program are reviewed by trained nosologists, and ICD–10 codes are manually assigned. In general, nosologists manually code about one-fifth of the death records. For deaths with an underlying cause of drug overdose (deaths with an underlying cause code of X40–X44, X60–X64, X85, or Y10–Y14), about two-thirds are coded manually (18). Entity axis ICD–10 codes include the ICD–10 code and information on the placement of the coded condition on the death certificate.
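The drug overdose definition given above (underlying cause code X40–X44, X60–X64, X85, or Y10–Y14) can be expressed as a small check. The function name is hypothetical, and the sketch assumes codes are supplied as strings whose first three characters are the ICD–10 category, with or without a decimal subcode.

```python
def is_drug_overdose_underlying_cause(icd10_code):
    """Return True if the code falls in X40-X44, X60-X64, X85, or Y10-Y14."""
    # Only the three-character category matters; any subcode is ignored.
    category = icd10_code.strip().upper()[:3]
    if len(category) != 3 or not category[1:].isdigit():
        return False
    letter, number = category[0], int(category[1:])
    if letter == "X":
        return 40 <= number <= 44 or 60 <= number <= 64 or number == 85
    if letter == "Y":
        return 10 <= number <= 14
    return False


print(is_drug_overdose_underlying_cause("X42"))    # True
print(is_drug_overdose_underlying_cause("X45"))    # False
print(is_drug_overdose_underlying_cause("T40.2"))  # False
```

T40.2 returns False because it is a multiple-cause (nature of injury) code, not one of the underlying cause codes listed in the definition.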

View the annotated PDF to see the figures:

Figure 1. U.S. standard death certificate

Figure 3. Example of the application of the DMI program logic to the literal text


The identification of specific drugs provides flexibility in analyses.

Specific drugs can be categorized according to classification schemes different than those of the ICD–10 categories.

Identifying specific drugs also allows comparisons between drugs within a particular class.

In addition, identifying specific drugs allows for more detailed analysis on deaths involving multiple drugs that are classified to the same or even different categories.
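As a rough illustration of the multiple-drug analysis mentioned above, the sketch below counts how often pairs of specific drugs are identified in the same death. The records are invented for the example and the function name is hypothetical; real input would come from the processed literal text data.

```python
from collections import Counter
from itertools import combinations

# Invented example records: death ID -> set of specific drugs identified.
deaths = {
    "A": {"OXYCODONE", "MORPHINE"},
    "B": {"OXYCODONE"},
    "C": {"OXYCODONE", "MORPHINE", "COCAINE"},
}


def count_drug_pairs(records):
    """Count the deaths in which each pair of specific drugs appears together."""
    pairs = Counter()
    for drugs in records.values():
        # Sorting gives each pair one canonical ordering.
        for pair in combinations(sorted(drugs), 2):
            pairs[pair] += 1
    return pairs


pairs = count_drug_pairs(deaths)
print(pairs[("MORPHINE", "OXYCODONE")])  # 2
print(pairs[("COCAINE", "OXYCODONE")])   # 1
```

Oxycodone and morphine are both coded to T40.2, so their co-occurrence would be invisible in ICD–10-coded data alone, while cocaine (T40.5) falls in a different category.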

The application of the literal text analysis methodology described in this report can be used to enhance mortality statistics by facilitating the identification of specific drugs involved in drug overdose deaths and deaths with other drug involvement.

ICD–10 (6), which has historically been used to classify the drugs involved in the deaths in NVSS–M, is limited in that the vast majority of drugs are classified into broad categories.

For example, oxycodone, hydrocodone, and morphine are all classified to T40.2 (Poisoning: Other opioids) (7).

There are a few notable exceptions, such as heroin (T40.1), methadone (T40.3), and cocaine (T40.5), which are separately coded in the case of a drug overdose death.

In contrast, the methods described in this report allow for the identification of drugs that are not uniquely identified in ICD–10.


This report details a new method that was developed to extract information from the National Vital Statistics System death certificate literal text to improve national monitoring of drug-involved mortality.

The literal text analysis method described in this report leverages existing information on the death certificates for statistical monitoring of drug-involved deaths.

Assessments conducted during the methods development process demonstrate that these methods have high accuracy in identifying the drugs mentioned and involved in mortality as well as the corresponding deaths.

These methods could be applied to analyze mortality data for causes of death classified to broad ICD categories or for emerging causes of death with no ICD code assigned.

Although the methods are limited by the level of drug-specific detail provided in the death certificate literal text, these methods are an enhancement to current ICD–10-coded mortality data.

1 thought on “Literal text analysis of death certificates”

  1. Kathy C

    The Death Certificates and ICD Codes are deliberately vague. That way they can leave a lot of things up to interpretation, and obfuscate that Data. The Industries wanted it this way. They also underfunded Coroners, and Medical Investigators. They just find the Facts inconvenient. Older people who die often have their C.O.D misidentified too. They just don’t have the funding to look into why they died. This leaves an even bigger area up to interpretation, in areas like the number of deaths by Diseases and this obscures Medical Mistakes and bad or ineffective Medical care. Like a lot of things these Data Points are left unspecific for a reason.


