Data Processing


Processing the emissions data for MapEcos was a major undertaking.  We tried to be as unbiased and informative as possible within the constraints set by the size of the location point bubbles.  We also tried to be careful and accurate, but we are certain that some errors remain.  This page provides more detailed information about how we processed the reported data.

Corporate Ownership Information

Firms report the identity of their corporate owner to the TRI by reporting a name and a Dun and Bradstreet identifier.  Unfortunately, the name is often reported inconsistently, and the ID is often invalid or does not match the name of the reported corporate owner.  To identify the corporate owner of each facility, we used a computerized matching program to match each facility to the National Establishment Time-Series (NETS) Database.
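The matching program itself is not described on this page.  As a rough illustration of how inconsistently reported names can be matched, here is a minimal sketch that normalizes names and compares them with Python's standard difflib module.  The suffix list, the 0.85 threshold, and the function names are all assumptions made for this example; they are not the MapEcos method.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of corporate suffixes to ignore when comparing names.
SUFFIXES = {"INC", "CORP", "CORPORATION", "CO", "COMPANY", "LLC", "LTD"}


def normalize(name):
    """Upper-case, replace punctuation with spaces, and drop common suffixes."""
    tokens = re.sub(r"[^A-Z0-9 ]", " ", name.upper()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def best_match(reported, candidates, threshold=0.85):
    """Return the candidate most similar to `reported`, or None if none is close.

    The threshold is an assumed cut-off, not a value taken from MapEcos.
    """
    target = normalize(reported)
    scored = [
        (SequenceMatcher(None, target, normalize(c)).ratio(), c)
        for c in candidates
    ]
    score, name = max(scored, default=(0.0, None))
    return name if score >= threshold else None


owners = ["ACME CHEMICAL COMPANY", "Apex Steel Inc"]
print(best_match("Acme Chemical Co.", owners))
print(best_match("Zenith Paper", owners))
```

A real matcher would likely also use addresses and the Dun and Bradstreet identifier where it is valid; this sketch looks only at names.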

We report our estimates for the most recently released TRI year – 2005.


We obtained location data from the Council on Environment Cooperation, the EPA, and automated requests to Google to geocode street addresses.  We attempted to identify the most accurate location information by comparing data from multiple sources, but location errors undoubtedly persist.  If you would like to report such an error, please click on the “report inaccurate information” link.


Emissions data were calculated from the 2007 release (2005 reporting year) of the EPA’s Toxics Release Inventory (TRI).


Hazard scores in the “summary” tab are calculated as:

  Oral Hazard = RSEI* score for oral toxicity × lbs emitted to water

  Air Hazard = RSEI* score for inhalation toxicity × lbs emitted to air

  Total Hazard = Oral Hazard + Air Hazard

* Risk-Screening Environmental Indicators

Note that we recentered the RSEI scores so that the average chemical in the TRI has a toxicity of 1.  Thus, if a firm's hazard score is lower than its lbs emitted, it is emitting chemicals that are less toxic than average.
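The hazard calculation can be sketched in a few lines of Python.  This is an illustration under stated assumptions, not the code used for MapEcos: the field names are hypothetical, and the recentering here divides by a plain (unweighted) mean across chemicals, which is one possible reading of "average chemical".

```python
from dataclasses import dataclass


@dataclass
class Release:
    """One chemical reported by a facility (field names are hypothetical)."""
    oral_tox: float        # recentered RSEI oral toxicity score
    inhalation_tox: float  # recentered RSEI inhalation toxicity score
    lbs_water: float       # pounds emitted to water
    lbs_air: float         # pounds emitted to air


def recenter(scores):
    """Rescale scores so that their unweighted mean is 1 (assumed method)."""
    mean = sum(scores) / len(scores)
    return [s / mean for s in scores]


def hazard_scores(releases):
    """Return (oral hazard, air hazard, their sum) for one facility."""
    oral = sum(r.oral_tox * r.lbs_water for r in releases)
    air = sum(r.inhalation_tox * r.lbs_air for r in releases)
    return oral, air, oral + air


print(recenter([0.5, 1.5, 4.0]))
facility = [Release(0.25, 0.75, 100.0, 200.0), Release(2.0, 2.0, 0.0, 10.0)]
print(hazard_scores(facility))
```

In this made-up example the facility emits 310 lbs in total but scores 195.0, so its mix of chemicals is less toxic than the average TRI chemical.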

Dot color and level

Most facilities in the US have low emissions, while a few facilities have enormous emissions.  We wanted to take this into account when calculating our levels and setting the color of the facility marker dot, so we created an exponential level scheme.  In this scheme, the bottom 50% of the firms are in level 0, the next 25% in level 1, and so on.  Although the number of firms shrinks with each level, the total emissions from all the facilities in a level actually increase with the level (e.g. emissions from all facilities in level 9 > emissions from all facilities in level 8).






< 1 lb
1 to 10 lbs
10 to 100 lbs
100 to 1,000 lbs
1,000 to 10,000 lbs
10,000 to 100,000 lbs
100,000 to 1,000,000 lbs
1,000,000 to 10,000,000 lbs
10,000,000 to 100,000,000 lbs
> 100,000,000 lbs
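The halving rule for levels described above (50% in level 0, 25% in level 1, and so on) can be sketched as follows.  This is an illustration only: the cap at level 9 is inferred from the text's mention of level 9, ties are broken by input order, and the function names are hypothetical.

```python
def level_for_rank(rank, n, max_level=9):
    """Level for the facility at 0-based `rank` among `n` sorted ascending.

    Level k holds the facilities whose share of lower-ranked peers lies in
    [1 - 2**-k, 1 - 2**-(k + 1)), so each level is half as big as the last.
    """
    remaining = n - rank  # facilities ranked at or above this one
    level = 0
    while level < max_level and remaining * 2 ** (level + 1) <= n:
        level += 1
    return level


def assign_levels(emissions, max_level=9):
    """Return one level per facility, in the same order as `emissions`."""
    order = sorted(range(len(emissions)), key=lambda i: emissions[i])
    levels = [0] * len(emissions)
    for rank, i in enumerate(order):
        levels[i] = level_for_rank(rank, len(emissions), max_level)
    return levels


print(assign_levels([5, 1, 300, 2, 40, 3, 9000, 4]))
```

With eight facilities, the four smallest land in level 0, the next two in level 1, and the two largest in levels 2 and 3.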

Line graphs

To create the line graphs, we first estimated the average for the facility's industry (based on 2-digit Standard Industrial Classification (SIC) codes).  We then took the logarithm of this value to make graphing easier.  The graphs report this logged “average” facility relative to the focal facility.
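A minimal sketch of the industry comparison follows.  Several details are assumptions, since the page does not specify them: the quantity averaged is taken to be lbs emitted, the logarithm is base 10, "relative to" is read as a difference of logs, and SIC codes are passed as strings so that leading zeros survive.

```python
import math
from collections import defaultdict


def industry_log_averages(facilities):
    """Map each 2-digit SIC group to log10 of its mean emissions.

    `facilities` is an iterable of (sic_code, lbs) pairs, where sic_code
    is a string such as "2911".
    """
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum of lbs, count]
    for sic, lbs in facilities:
        group = sic[:2]  # first two digits identify the SIC major group
        totals[group][0] += lbs
        totals[group][1] += 1
    # Groups with zero total emissions are skipped: log10(0) is undefined.
    return {g: math.log10(s / c) for g, (s, c) in totals.items() if s > 0}


def log_gap(focal_lbs, log_average):
    """Difference between the focal facility's log10 emissions and the average."""
    return math.log10(focal_lbs) - log_average


data = [("2911", 1000.0), ("2999", 19000.0), ("3312", 100.0)]
averages = industry_log_averages(data)
print(averages)
print(log_gap(100000.0, averages["29"]))
```

On a base-10 log scale, a gap of about 1 means the focal facility emits roughly ten times the industry average.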


To create the histogram graphs, we first calculated the log of emissions for the facility with the most emissions in the comparison group (state, county, or industry).  We then divided this number into 6 groups and counted the number of firms in each group.
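The binning step can be sketched as below.  The sketch assumes base-10 logs, equal-width groups running from 0 up to the log of the largest emitter, and that facilities below 1 lb fall into the first group while facilities reporting zero are left out; none of these details is confirmed by the page.

```python
import math


def log_histogram(emissions_lbs, groups=6):
    """Count facilities in `groups` equal-width bins on a log10 scale."""
    logs = [math.log10(x) for x in emissions_lbs if x > 0]  # zeros dropped
    counts = [0] * groups
    if not logs:
        return counts
    top = max(logs)  # log of the largest emitter sets the upper edge
    for v in logs:
        idx = int(v * groups / top) if top > 0 else 0
        # Clamp: values below 1 lb go to the first bin, the maximum to the last.
        counts[min(max(idx, 0), groups - 1)] += 1
    return counts


print(log_histogram([1, 10, 100, 1000, 10**6, 10**6]))
```

Here the largest emitter has a log of 6, so each group spans one order of magnitude and the two largest facilities share the last group.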




Developed by MapMundi

© MapEcos