Data Critique 📈

The National Walkability Index was generated by weighting four variables from the original SLD (Smart Location Database) and applying them to every block group in the nation, with the goal of measuring how likely a member of the community is to walk as their main mode of transportation. A block group is a unit of census geography that is smaller than a census tract but larger than a census block. The variables taken from the SLD were intersection density, proximity to transit stops, employment type mix, and employment and household mix; these feed a simple formula that calculates walkability on a scale of 1-20 (1 being the lowest ranking and 20 the highest). The ranked values are combined with this formula so that individual block groups get a score from 1-20, and so can larger areas made up of several block groups. Scores are organized as such: 1-5.75 (least walkable), 5.76-10.5 (below average walkable), 10.51-15.25 (above average walkable), 15.26-20 (most walkable). The Smart Location Database is provided by the US EPA Smart Growth Program. Alexander Bell, Kevin Ramsey, Jerry Walters, and Gustavo Jiminez were involved in preparing the data. Nick Vanderkwaak and Richard Kuzymyak provided feedback, so both the EPA and outside organizations were involved.
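To make the weighting concrete, the calculation can be sketched in a few lines of Python. This is a minimal sketch rather than the EPA's own code. It assumes the weights given in the EPA's published methodology (one third each for intersection density and transit proximity, one sixth each for the two mix measures) and assumes each input has already been converted to a ranked score from 1 to 20. The function and parameter names are hypothetical and are not SLD column names.

```python
def walkability_index(intersection_rank, transit_rank,
                      employment_mix_rank, emp_household_mix_rank):
    """Combine four ranked scores (each 1-20) into one index value (1-20).

    Assumed weights: 1/3, 1/3, 1/6, 1/6 (they sum to 1).
    """
    ranks = (intersection_rank, transit_rank,
             employment_mix_rank, emp_household_mix_rank)
    if not all(1 <= r <= 20 for r in ranks):
        raise ValueError("each ranked score must be between 1 and 20")
    return (intersection_rank / 3 + transit_rank / 3
            + employment_mix_rank / 6 + emp_household_mix_rank / 6)


def walkability_category(score):
    """Map an index value to one of the four score ranges listed above."""
    # Each threshold is the lower bound of the next category.
    if score < 5.76:
        return "least walkable"
    if score < 10.51:
        return "below average walkable"
    if score < 15.26:
        return "above average walkable"
    return "most walkable"


# Hypothetical block group: strong employment mix, weak transit access.
score = walkability_index(12, 6, 18, 9)
print(score, walkability_category(score))
```

Because intersection density and transit proximity each carry twice the weight of either mix measure, a block group with diverse land uses but poor transit access can still land in a below-average category, as the hypothetical example shows.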

The organization that funded the creation of the dataset is the US EPA, which means the federal government is backing it. With the means to compare and analyze walkability among US communities, the government could be looking to improve its image as a nation. Additionally, the data may contain biases because it was sourced from another dataset that was not prepared solely by the EPA, but also by outside organizations like Renaissance Planning Group and Fehr and Peers Transportation Consultants. These companies could have had their own motivations in preparing the data, which could skew it toward supporting the need for their services as city planning professionals or consultants. The dataset contains a lot of information about cities across the entire U.S. Some areas were excluded, though, primarily U.S. territories like the Virgin Islands and Puerto Rico. Depending on the dataset from which a variable was sourced (e.g. different types of Census data), observations for non-state U.S. territories were not included. Consequently, data columns derived from those variables do not contain information about Puerto Rico, potentially forcing us to remove observations about non-state U.S. territories entirely. In this way, non-state U.S. territories have been silenced in this dataset: it limits our ability to apply the walkability index, or any insight about its importance and impact, to them. Non-state U.S. territories are often overlooked in discourse about the states, and this is just one example.

Although the dataset has abundant variables, the index focuses on 3 criteria: 1) intersection density, as higher density correlates with more walk trips (variable D3b, “street intersection density”); 2) proximity to transit stops, as a shorter distance between the population center and the closest transit stop correlates with more walk trips (variable D4a, “distance from the population-weighted centroid to nearest transit stop”); 3) diversity of land uses, measured by a) diversity of employment types within a block group, as a more diverse mix correlates with more walk trips (variable D2b_E8MixA, “8-tier employment entropy”), and b) diversity of employment and household units within a block group, as a more diverse mix correlates with more walk trips (variable D2a_EpHHm, “employment and household entropy”). Each of these variables is well considered and has a logical connection to walkability, and no single one overpowers the others. However, other variables that could plausibly relate to walkability were left out of the formula, such as D5br and D5be, which focus on the number of jobs and the working-age population within a 45-minute transit commute, weighted by time decay (walk network travel time, GTFS schedules).
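To give a sense of what an “entropy” variable measures, here is a minimal sketch of a normalized Shannon entropy, one common way to score how evenly activity is spread across categories. It only illustrates the general idea. The exact formulas and category definitions behind D2b_E8MixA and D2a_EpHHm are documented by the EPA and may differ from this. The function name and the sample job counts are hypothetical.

```python
import math


def normalized_entropy(counts):
    """Return a 0-1 evenness score for a list of non-negative category counts.

    0 means everything falls in one category; 1 means an even split
    across all categories supplied.
    """
    total = sum(counts)
    positive = [c for c in counts if c > 0]
    if total == 0 or len(positive) < 2:
        return 0.0
    shares = [c / total for c in positive]
    entropy = -sum(p * math.log(p) for p in shares)
    # Divide by the maximum possible entropy for this many categories.
    return entropy / math.log(len(counts))


# Hypothetical job counts in eight employment categories.
single_use = [80, 0, 0, 0, 0, 0, 0, 0]
mixed_use = [10, 10, 10, 10, 10, 10, 10, 10]
print(normalized_entropy(single_use), normalized_entropy(mixed_use))
```

Under this sketch, a block group whose jobs all fall in one category scores 0, while one with jobs spread evenly across all eight scores close to 1, which matches the intuition that a more diverse mix goes with more walk trips.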

As previously stated, the dataset covers every block group in the US, providing a comprehensive view of walkability across various communities. The National Walkability Index offers a standardized measurement of walkability, allowing for easy comparison and analysis across different areas. The methodology for calculating the index is transparent and based on well-defined criteria: intersection density, proximity to transit stops, and land use diversity. However, the dataset also has weaknesses. It lacks factors that affect pedestrian activity, such as safety measures, street design, and socio-economic conditions. There are also gaps in geographic representation, such as the excluded U.S. territories, that can result in incomplete insights into walkability patterns. Even so, our dataset is a powerful tool that can inform discussion on sustainable urban development by highlighting the relationship between walkability, transportation choices, and environmental outcomes such as carbon emissions and energy consumption.

Information and files regarding our dataset can be found on the EPA’s website under “Smart Location Mapping”.