Multi-Scale Dataset

Mobility Trend by Community from Google

We’ve added “mobility trend” time series data to location_properties.json. For every geo-location in that JSON file, a new key called “google_mobility” is added. The data is obtained from: https://www.google.com/covid19/mobility/. Each mobility trend is a set of time series, one per place category, for a location (community). It shows the percent change in visits to places like grocery stores and parks within a geographic area.

Place categories:

Grocery & pharmacy (places like grocery markets, food warehouses, farmers markets, specialty food shops, drug stores, and pharmacies)
Parks (places like local parks, national parks, public beaches, marinas, dog parks, plazas, and public gardens)
Transit stations (places like public transport hubs such as subway, bus, and train stations)
Retail & recreation (places like restaurants, cafes, shopping centers, theme parks, museums, libraries, and movie theaters)
Residential (places of residence)
Workplaces (places of work)

Data info details:

These datasets show how visits and length of stay at different places change compared to a baseline. You can interpret the baseline as “normal” times: it is the median value, for the corresponding day of the week, during the 5-week period Jan 3–Feb 6, 2020, and the change for each day is compared to the baseline value for that day of the week. Google calculated these changes from Google Maps data from users who have opted in to Location History for their Google Account, so the data represents a sample of Google's users. As with all samples, this may or may not represent the exact behavior of a wider population.
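As a rough illustration of the baseline definition above, the sketch below computes a per-weekday median over the baseline window and then a percent change against it. The visit counts are made up, and the function names (`weekday_baselines`, `percent_change`) are hypothetical; Google publishes only the resulting percent changes, so this is not their implementation.

```python
from datetime import date, timedelta
from statistics import median

# Baseline window stated in the text: Jan 3 to Feb 6, 2020 (5 weeks).
BASELINE_START = date(2020, 1, 3)
BASELINE_END = date(2020, 2, 6)


def weekday_baselines(visits):
    """Median visit count per weekday (0 = Monday) inside the baseline window."""
    by_weekday = {}
    for day, count in visits.items():
        if BASELINE_START <= day <= BASELINE_END:
            by_weekday.setdefault(day.weekday(), []).append(count)
    return {weekday: median(counts) for weekday, counts in by_weekday.items()}


def percent_change(visits, day):
    """Percent change on `day` relative to the baseline for its weekday."""
    baseline = weekday_baselines(visits)[day.weekday()]
    return round(100 * (visits[day] - baseline) / baseline)


# Made-up counts: 100 visits on every baseline day, 80 on a later Monday.
visits = {BASELINE_START + timedelta(days=i): 100 for i in range(35)}
visits[date(2020, 3, 9)] = 80
change = percent_change(visits, date(2020, 3, 9))
```

With these invented counts, `change` is a 20 percent drop relative to the Monday baseline.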

Format:

If that geo-location does not have any mobility trend data, the key-value pair in that dictionary will be:

"google_mobility": null

else:

"google_mobility": {
    "transit_stations": {
            "2020-03-08": "0", 
            "2020-03-03": "11", 
            "2020-02-24": "3",
            ... },
    "residential": {
            "2020-03-08": "0", 
            "2020-03-03": "11", 
            "2020-02-24": "3",
            ... },
    "workplaces": {
            "2020-03-08": "0", 
            "2020-03-03": "11", 
            "2020-02-24": "3",
            ... },
    "parks": {
            "2020-03-08": "0", 
            "2020-03-03": "11", 
            "2020-02-24": "3",
            ... },
    "grocery_and_pharmacy": {
            "2020-03-08": "0", 
            "2020-03-03": "11", 
            "2020-02-24": "3",
            ... },
    "retail_and_recreation": {
            "2020-03-08": "0", 
            "2020-03-03": "11", 
            "2020-02-24": "3",
            ... }
}
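The following is a minimal sketch of reading one category's time series from this structure. It assumes the values are numeric strings (as in the example above) or null, and that region names are the top-level keys of location_properties.json; the helper names and the sample entries are hypothetical.

```python
import json


def load_properties(path="location_properties.json"):
    """Load the region -> properties dictionary from the JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def mobility_series(properties, region, category):
    """Return [(date, value)] sorted by date; [] if the series is missing.

    Values are converted to float; null entries stay None.
    """
    mobility = properties[region].get("google_mobility")
    if mobility is None:
        return []
    series = mobility.get(category) or {}
    return [
        (day, None if value is None else float(value))
        for day, value in sorted(series.items())
    ]


# Made-up sample with the same shape as the file (not real data).
sample = {
    "XX/Somewhere": {
        "google_mobility": {
            "parks": {"2020-03-08": "0", "2020-03-03": "11", "2020-02-24": None}
        }
    },
    "XX/Nowhere": {"google_mobility": None},
}
series = mobility_series(sample, "XX/Somewhere", "parks")
```

ISO-formatted dates sort correctly as plain strings, which is why `sorted` is enough here. On the real file, the call would be `mobility_series(load_properties(), region_name, "parks")`.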

A Few Notes:

1- A missing value is always null in the JSON file (None once loaded in Python).
2- You may encounter empty fields for certain places, dates, regions, or categories, because Google leaves a value out when it does not have enough data to be statistically significant. Hence, the entire time series for a category of a geo-location may be missing.
3- For New York City, the value is the average over Kings County, New York County, Queens County, Richmond County, and Bronx County (we checked: summing the counties does not give a correct value).
4- The following are the statistics you need to pay attention to:

(1). min_date, max_date: ('2020-02-15', '2020-06-05'). 
     Note: as mentioned in note 2, not every time series spans the full range from min_date to max_date, and there may be missing values between the dates of a time series.
(2). ('Among all US scale2 locations, has google trend v.s. all: ', 51 v.s. 56, 0.91)
     ('Among all US scale3 locations, has google trend v.s. all: ', 2821 v.s. 3811, 0.74)
     ('Among all scale1 locations (countries), has google trend v.s. all: ', 131 v.s. 206, 0.64)     
     ('Among all regions apart from US, has google trend v.s. all: ', 539 v.s. 5203, 0.1)
     Takeaway:
       - Among 56 US scale2 locations, mobility trend is available for 51. The missing ones are the territories.
       - 74% US scale3 locations (2821 out of 3811) have mobility trend data.
       - 131 out of 206 countries have mobility trend data.
       - Outside of US, most scale2 and scale3 locations do not have mobility trend data.

5- Regarding data approximation: if approximation is necessary, do not approximate missing data by summation or division; use an average or a copy instead. For example, if a state is missing the mobility value for ‘residential’ on 2020-03-21, you can copy a neighboring state’s value for ‘residential’ on 2020-03-21, or copy the country’s value. If a country is missing the mobility value for ‘residential’ on 2020-03-21, you can approximate it with the average of its children’s ‘residential’ values on that day (similar to what I did for New York City in note 3).
6- Location accuracy and the understanding of places varies from region to region, so Google does not recommend using this data to compare changes between countries, or between regions with different characteristics (e.g. rural versus urban areas).
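The two approximation strategies from note 5 (average of children, copy from parent) can be sketched as follows. This assumes children are identified by the '/'-separated region name prefix and that values are numeric strings or null; the function names and the sample regions are hypothetical, not part of the dataset tooling.

```python
def value_on(properties, region, category, day):
    """Return the value as a float, or None if it is missing at any level."""
    mobility = (properties.get(region) or {}).get("google_mobility") or {}
    value = (mobility.get(category) or {}).get(day)
    return None if value is None else float(value)


def average_of_children(properties, parent, category, day):
    """Average the available values of the parent's direct children."""
    prefix = parent + "/"
    values = []
    for region in properties:
        # Keep direct children only: the prefix plus exactly one more scale.
        if region.startswith(prefix) and "/" not in region[len(prefix):]:
            value = value_on(properties, region, category, day)
            if value is not None:
                values.append(value)
    return sum(values) / len(values) if values else None


def copy_from_parent(properties, region, category, day):
    """Copy the value of the enclosing region (one scale up)."""
    if "/" not in region:
        return None  # a country has no parent
    parent = region.rsplit("/", 1)[0]
    return value_on(properties, parent, category, day)


# Made-up regions and values, for illustration only.
sample = {
    "XX": {"google_mobility": {"residential": {"2020-03-21": None}}},
    "XX/A": {"google_mobility": {"residential": {"2020-03-21": "10"}}},
    "XX/B": {"google_mobility": {"residential": {"2020-03-21": "20"}}},
    "XX/A/Town": {"google_mobility": {"residential": {"2020-03-21": "99"}}},
    "YY": {"google_mobility": {"residential": {"2020-03-21": "7"}}},
    "YY/C": {"google_mobility": None},
}
country_estimate = average_of_children(sample, "XX", "residential", "2020-03-21")
state_estimate = copy_from_parent(sample, "YY/C", "residential", "2020-03-21")
```

Copying a neighbor's value works the same way as `copy_from_parent`, with the neighbor looked up in the border edge files instead of the name prefix.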

You can check Global_Mobility_Report.csv to understand the various cases of data fields.

Region Border Edges

We have added:

1- border edges that connect world-wide countries in edges_country_border.csv.
2- border edges that connect US states in edges_us_state_border.csv.
3- border edges that connect US counties in edges_us_county_border.csv.

Format:

The format is an edge list, similar to "flight.csv", but with no header row!

1. border edges that connect world-wide countries.

Every line is country_1_iso2, country_2_iso2, e.g. US, CA. It means that country_1 shares border with country_2. Use the following code to load the data:

>>> import csv
>>> with open('edges_country_border.csv','rt') as f: 
...     edges_country_border = list(csv.reader(f))
... 
>>> edges_country_border[0]
['TP', 'ID']

2. border edges that connect US states.

Every line is US_state_1, US_state_2, e.g. US/New York, US/New Jersey. It means that US_state_1 shares a border with US_state_2. Use the following code to load the data:

>>> import csv
>>> with open('edges_us_state_border.csv','rt') as f: 
...     edges_us_state_border = list(csv.reader(f))
... 
>>> edges_us_state_border[0]
['US/Alabama', 'US/Florida']

3. border edges that connect US counties.

Every line is US_county_1, US_county_2, e.g. US/New Jersey/Hudson, US/New York/New York City. It means that US_county_1 shares a border with US_county_2. Use the following code to load the data:

>>> import csv
>>> with open('edges_us_county_border.csv','rt') as f: 
...     edges_us_county_border = list(csv.reader(f))
... 
>>> edges_us_county_border[0]
['US/Alabama/Autauga', 'US/Alabama/Chilton']

A Few Notes:

1- The graph formed by each type of border edge is bi-directed, i.e., you will observe both edge (node1, node2) and edge (node2, node1) in the above data files.
2- The graph contains no self loops! I.e., a node does not have an edge pointing to itself in the above data files.
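The two properties in these notes can be checked with a short sketch like the one below, which builds an adjacency map from an edge list. It assumes each row has exactly two columns; the helper names are hypothetical, and the small in-memory edge list stands in for the CSV files.

```python
import csv
from collections import defaultdict


def load_edges(path):
    """Read a header-less two-column edge list, stripping stray whitespace."""
    with open(path, newline="") as f:
        return [tuple(cell.strip() for cell in row) for row in csv.reader(f) if row]


def build_adjacency(edges):
    """Map each node to the set of nodes it has an edge to."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
    return adjacency


def is_symmetric(adjacency):
    """True if every edge (a, b) has a matching edge (b, a)."""
    return all(
        a in adjacency.get(b, ())
        for a, neighbors in adjacency.items()
        for b in neighbors
    )


def has_self_loops(adjacency):
    """True if any node has an edge pointing to itself."""
    return any(node in neighbors for node, neighbors in adjacency.items())


# A few edges in the documented format, used here instead of the real file.
edges = [
    ("US/Alabama", "US/Florida"),
    ("US/Florida", "US/Alabama"),
    ("US/New York", "US/New Jersey"),
    ("US/New Jersey", "US/New York"),
]
adjacency = build_adjacency(edges)
```

On the real data, `build_adjacency(load_edges('edges_us_state_border.csv'))` should satisfy `is_symmetric(...)` and not `has_self_loops(...)` if the notes above hold.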

Region Identity

(Throughout this repo, “region” == “geo-location” == “location”, they are equivalent concepts.)

Regions and their properties can be found in location_properties.json. Region name is the key in the above json. Region name in the finest-granularity is in the form of “country_iso2/scale2/scale3/IATA_code”, for example, "US/New Jersey/Newark/EWR". Scales are separated by '/' in the region name. You can use all_locations.txt to see all region names.

Scales:

Scale	Region Type
1	Country ISO2
2	State / Province
3	County / City / Town / Census Area / Borough / Municipality / Parish / Island
4	Airport IATA
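A region name can be split into its scales with a small helper such as the sketch below. It assumes that no individual component contains a '/' character; the function and field names are hypothetical.

```python
# Field names for scales 1 to 4, following the table above.
SCALE_FIELDS = ("country_iso2", "scale2", "scale3", "airport")


def scale_of(region):
    """Scale of a region name: 1 for a country, up to 4 for an airport."""
    return region.count("/") + 1


def parse_region(region):
    """Split a region name into a dict keyed by scale field."""
    parts = region.split("/")
    if not 1 <= len(parts) <= len(SCALE_FIELDS):
        raise ValueError("unexpected number of scales in %r" % region)
    return dict(zip(SCALE_FIELDS, parts))


def parent_of(region):
    """Name of the enclosing region one scale up, or None for a country."""
    if "/" not in region:
        return None
    return region.rsplit("/", 1)[0]


parsed = parse_region("US/New Jersey/Newark/EWR")
```

Here `parsed` maps `country_iso2` to `US`, `scale2` to `New Jersey`, `scale3` to `Newark`, and `airport` to `EWR`.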

Statistics:

(‘number of scale 1 nodes (countries): ‘, 206)

(‘number of scale 2 nodes (states): ‘, 1445)

(‘number of scale 3 nodes (counties, cities, towns): ‘, 7420)

(‘number of airports: ‘, 2890)

(‘total number of locations: ‘, 11961)

(‘number of flights: ‘, 35711)

(‘us flights (within): ‘, 7604)

(‘us flights (arrive): ‘, 1054)

(‘us flights (leave): ‘, 1059)

(‘number of locations missing centroids: ‘, 0)

(‘number of scale2 locations in US: ‘, 56)

(‘number of scale3 locations in US: ‘, 3811)

— missing populations US

(‘number of scale2 locations missing population in US: ‘, 0)

(‘number of scale3 locations missing population in US: ‘, 0)

(‘total US population: ‘, u’331002647’)

— missing populations entire dataset (including US)

(‘number of locations missing population (apart from airports): ‘, 4750)

(‘number of scale1 locations missing population in entire dataset: ‘, 0)

(‘number of scale2 locations missing population in entire dataset: ‘, 1389)

(‘number of scale3 locations missing population in entire dataset: ‘, 3361)

A Few Notes:

1- We tried to be consistent with the JHU COVID-19 database, with these exceptions: ‘Diamond Princess’, ‘MS Zaandam’, ‘Unassigned’, and ‘Baltimore City’ are NOT considered nodes (see note 8 for additional details).
2- admin2 in JHU database can be county, or city (e.g., ‘Nassau, Florida, US’ is a county, but ‘Nassau, New York, US’ or ‘Montgomery, New York, US’ are for a town of some county. ‘St. Lawrence, New York, US’ is a county. We also have ‘New York City, New York, US’ in JHU database).
3- Correct: [‘AU/New South Wales/Bathurst/Bathurst Airport’, ‘CA/New Brunswick/Bathurst/Bathurst Airport’]
4- Correct: [‘US/Florida/Melbourne/Melbourne International Airport’, ‘AU/Victoria/Melbourne/Melbourne International Airport’]
5- Name for the city New York can only be ‘New York City’ (check consistency for this when adding new data), because in JHU, we have ‘New York City’ in admin2 (scale3) not ‘New York’
6- Fixed ‘US/New York/Jamaica/John F Kennedy International Airport’ issue (it is now ‘US/New York/New York City/John F Kennedy International Airport’)
7- Names of all locations have been ASCII-encoded to remove non-ASCII characters (e.g. accented French letters). The code used was: airport = airport.rstrip().encode('ascii', 'ignore')
8- JHU database has Baltimore and Baltimore City simultaneously appearing:

— Baltimore, Maryland, US: 353 3 0 0 (Confirmed Deaths Recovered Active)
— Baltimore City, Maryland, US: 265 2 0 0 (Confirmed Deaths Recovered Active)
— The centroid of Baltimore is 39.45784712 -76.62911955 and the centroid of Baltimore City is 39.30211911 -76.61151012

Baltimore should contain Baltimore City. We will only keep Baltimore. Hence, in our graph, we also have Baltimore and Baltimore City simultaneously appearing (with same properties and edges).
9- New York City actually contains 5 counties: New York County (Manhattan), Kings County (Brooklyn), Bronx County (The Bronx), Richmond County (Staten Island), and Queens County (Queens). However, the JHU database lists New York City and its 5 counties simultaneously, and the county numbers do not add up to the New York City number:

— New York City: 12305 99 0 0 (Confirmed Deaths Recovered Active)
— Queens County (Queens): 0 0 0 0 (Confirmed Deaths Recovered Active)
— Richmond County (Staten Island): 0 0 0 0 (Confirmed Deaths Recovered Active)
— Bronx County (The Bronx): 0 0 0 0 (Confirmed Deaths Recovered Active)
— Kings County (Brooklyn): 0 0 0 0 (Confirmed Deaths Recovered Active)
— New York County (Manhattan): Missing data

Resource: https://github.com/CSSEGISandData/COVID-19/blob/93f4ccb579d7ce0323c5ad34084981f891951c0e/csse_covid_19_data/csse_covid_19_daily_reports/03-23-2020.csv. Hence, in our graph, we only have a New York City node (as Google does), not its 5 county nodes.

US State “Stay at Home” Policy

We’ve added “stay at home” policy data for all US scale2 (state) nodes in location_properties.json. For every geo-location in the above json file, a new key called “policy” is added.

Format:

If that geo-location does not have any policy data, the key-value pair in that dictionary will be:

"policy": {"stay-at-home": null}

else:

"policy": {
    "stay-at-home": {
        "start": "2020-04-03", 
        "end": "2020-04-30", 
        "no_stay_at_home_but_other_restrictions": null,
        "description": "Gov. Ron DeSantis, a Republican, said that the state would take a \"small, deliberate, methodical\" approach to reopening by allowing restaurants and stores to operate at 25 percent capacity starting May 4. Movie theaters will remain closed. So will bars, gyms and personal services such as hairdressing. For now, the reopening will exclude Miami-Dade, Broward and Palm Beach Counties, the state's most populous, which have seen a majority of coronavirus cases.",
        "link": "https://www.nytimes.com/interactive/2020/us/coronavirus-stay-at-home-order.html"
    }
}

A Few Notes:

1- In the “policy” sub-dictionary, if the value for a certain key is missing, it will be null (None once loaded in Python).

2- If the start date and end date for “stay at home” are both null, then you need to check the “no_stay_at_home_but_other_restrictions” field. E.g., “US/Oklahoma”:

"policy": {
    "stay-at-home": {
        "start": null, 
        "end": null, 
        "no_stay_at_home_but_other_restrictions": "TRUE",
        "description": "Oklahoma was among a handful of states where governors did not issue formal stay-at-home orders. Gov. Kevin Stitt, a Republican, lifted restrictions on businesses starting with salons, barbers and pet groomers on April 24. Restaurant dining, movie theaters, gyms, houses of worship and sporting venues are expected to reopen statewide, with certain restrictions starting May 1.", 
        "link": "https://www.nytimes.com/interactive/2020/us/coronavirus-stay-at-home-order.html"
    }
}

3- Not all states have both a start date and an end date; some states have a start date but no end date released yet (e.g., “US/California”).

You can check stay-at-home_US.csv to understand the various cases of data fields.
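The cases described in these notes can be handled with a small classifier like the sketch below. It makes two assumptions that the notes do not state: the end date is treated as inclusive, and an entry with an end date but no start date is reported as having no data. The function name, the status labels, and the sample entries are hypothetical.

```python
from datetime import date


def stay_at_home_status(entry, on):
    """Classify one location's stay-at-home status on the date `on`."""
    order = (entry.get("policy") or {}).get("stay-at-home")
    if order is None:
        return "no data"
    start, end = order.get("start"), order.get("end")
    if start is None and end is None:
        # Note 2: fall back to the "other restrictions" flag.
        flag = order.get("no_stay_at_home_but_other_restrictions")
        return "other restrictions only" if flag == "TRUE" else "no data"
    if start is None:
        return "no data"  # end without start is not described in the notes
    if on < date.fromisoformat(start):
        return "not in effect"
    # Assumption: the order still applies on the end date itself.
    if end is not None and on > date.fromisoformat(end):
        return "not in effect"
    return "in effect"  # covers note 3: a start date with no end date yet


# Made-up entries with the documented shape (descriptions and links omitted).
closed_range = {"policy": {"stay-at-home": {
    "start": "2020-04-03", "end": "2020-04-30",
    "no_stay_at_home_but_other_restrictions": None}}}
open_ended = {"policy": {"stay-at-home": {
    "start": "2020-03-20", "end": None,
    "no_stay_at_home_but_other_restrictions": None}}}
restrictions_only = {"policy": {"stay-at-home": {
    "start": None, "end": None,
    "no_stay_at_home_but_other_restrictions": "TRUE"}}}
missing = {"policy": {"stay-at-home": None}}

status = stay_at_home_status(closed_range, date(2020, 4, 15))
```

`date.fromisoformat` requires Python 3.7 or later; on older versions, `datetime.strptime(value, "%Y-%m-%d").date()` is an equivalent substitute.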
