In my last post I created a CLI tool to display NYC Covid-19 test results by zip code using Perl, my favorite language for the moment. I would like to do the same using Python, purely as an excuse to learn Python. This script will download the same data from the NYC health department’s GitHub page and create a JSON file, which I can use as a very basic database for later analysis.
Here is a sample of the downloaded raw data.
"MODZCTA","Positive","Total","zcta_cum.perc_pos"
NA,1558,1862,83.67
"10001",309,861,35.89
"10002",870,2033,42.79
"10003",396,1228,32.25
"10004",27,85,31.76
"10005",54,206,26.21
"10006",21,91,23.08
"10007",49,204,24.02
"10009",607,1745,34.79
This is the first iteration of my script.
from __future__ import print_function
import datetime, json, requests, os, re, sys

RAW_ZCTA_DATA_LINK = 'https://raw.githubusercontent.com/nychealth/coronavirus-data/master/tests-by-zcta.csv'
ALL_ZCTA_DATA_CSV = 'all_zcta_data.csv'

# -------------------------------------------------------------------------------------------------
# Functions
# -------------------------------------------------------------------------------------------------
def get_today_str():
    # Today's date as 'yyyymmdd', e.g. 20200401
    today = datetime.date.today().strftime("%Y%m%d")
    return today

def find_bin():
    # Directory that contains this script
    this_bin = os.path.abspath(os.path.dirname(__file__))
    return this_bin

def create_dir_if_not_exists(base_dir, dir_name):
    the_dir = base_dir + '/' + dir_name
    if not os.path.isdir(the_dir):
        os.mkdir(the_dir)
    return the_dir

def create_db_dirs():
    # Creates db/yyyy_mm next to the script and returns its path
    this_bin = find_bin()
    db_dir = create_dir_if_not_exists(this_bin, 'db')
    today_str = get_today_str()
    year_month = today_str[0:4] + '_' + today_str[4:6]
    year_month_dir = create_dir_if_not_exists(db_dir, year_month)
    return year_month_dir

def get_covid_test_data_text():
    r = requests.get(RAW_ZCTA_DATA_LINK)
    print("Resp: " + str(r.status_code))
    return r.text

def create_list_of_test_data():
    test_vals = []
    covid_text = get_covid_test_data_text()
    for l in covid_text.splitlines():
        # Raw string so that \s reaches the regex engine unchanged
        lvals = re.split(r'\s*,\s*', l)
        # Skip the header row
        if lvals[0] == '"MODZCTA"':
            continue
        zip_dic = { 'zip' : lvals[0], 'positive': lvals[1], 'total_tested': lvals[2], 'cumulative_percent_of_those_tested': lvals[3]}
        test_vals.append(zip_dic)
    return test_vals

def write_todays_test_data_to_file():
    year_month_dir = create_db_dirs()
    test_data = create_list_of_test_data()
    print(test_data[0])
    today_str = get_today_str()
    todays_file = year_month_dir + '/' + today_str + '_tests_by_ztca.json'
    # The with block closes the file even if json.dump raises
    with open(todays_file, 'w') as out_file:
        json.dump(test_data, out_file, indent=2)
    print("Created todays ZTCA tests file,{todays_file}".format(**locals()))

# -------------------------------------------------------------------------------------------------
write_todays_test_data_to_file()
Just a few snippets of interesting code here.
To get today’s date as a string in the format ‘yyyymmdd’, for example 20200401, I used the datetime module.
today = datetime.date.today().strftime("%Y%m%d")
Python has an interesting syntax for slicing strings or lists up into pieces. I used it here to create a directory name using the current year and month.
year_month = today_str[0:4] + '_' + today_str[4:6]
The ‘[0:4]’ gets the first four characters of the string. The ‘[4:6]’ grabs the subsequent 2 characters of the string. These are combined to create a sub-directory name like ‘2020_05’.
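Here is a small sketch of how the date formatting and the slicing fit together. It uses a fixed date (my assumption, so the result is reproducible) instead of today’s date, and the helper name year_month_from is made up for this example.

```python
import datetime

def year_month_from(date_str):
    """Build a 'yyyy_mm' directory name from a 'yyyymmdd' string."""
    return date_str[0:4] + '_' + date_str[4:6]

# Fixed date chosen for illustration only; the script uses date.today().
sample_day = datetime.date(2020, 5, 3)
sample_str = sample_day.strftime("%Y%m%d")

print(sample_str)                   # 20200503
print(year_month_from(sample_str))  # 2020_05
```

The same directory name could probably be produced in one step with strftime("%Y_%m"), which would skip the slicing altogether.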
To get the directory location of this script, somewhat like the FindBin module in Perl, I used the dirname and abspath functions of the os.path module.
this_bin = os.path.abspath(os.path.dirname(__file__))
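To see what the two calls do separately, here is a minimal sketch using a made-up script path in place of __file__ (which is not defined in an interactive session). The path itself is purely an assumption for illustration.

```python
import os

# Hypothetical script location, used only for illustration.
script_path = os.path.join('tools', 'covid', 'get_tests.py')

# dirname drops the file name and keeps the directory part.
script_dir = os.path.dirname(script_path)

# abspath resolves a relative path against the current working directory.
absolute_dir = os.path.abspath(script_dir)

print(script_dir)
print(os.path.isabs(absolute_dir))  # True
```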
The raw test data for the current date is downloaded from the NYC department of health GitHub page using the requests library.
r = requests.get(RAW_ZCTA_DATA_LINK)
print("Resp: " + str(r.status_code))
return r.text
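The snippet above prints the status code but returns the body whatever the status was. Here is a sketch of a stricter variant, assuming you would rather have the script stop on a failed download. The function name fetch_text, the 10 second timeout, and the http_get parameter (a hook so the function can be tried without network access) are my own assumptions and not part of the original script.

```python
def fetch_text(url, http_get=None, timeout=10):
    """Download url and return the body text, raising on HTTP errors."""
    if http_get is None:
        # Third-party library; imported here so the http_get hook
        # can be used even where requests is not installed.
        import requests
        http_get = requests.get
    response = http_get(url, timeout=timeout)
    # Raises an exception for 4xx and 5xx responses.
    response.raise_for_status()
    return response.text
```

In the script this could be called as fetch_text(RAW_ZCTA_DATA_LINK), which would use requests.get as before.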
It is then split up using the ‘re’ module, which seems to be Python’s rather awkward way of doing regular expression matching.
lvals = re.split(r'\s*,\s*', l)
This will split each line of input data similar to this,
"10003",396,1228,32.25
The resulting values can then be inserted into a Python dictionary like this,
{
"zip": "10003",
"yyyymmdd": "20200503",
"positive": "396",
"total_tested": "1228",
"cumulative_percent_of_those_tested": "32.25"
}
This is appended to the end of a list of similar dictionaries.
You may notice that the way I create the file path string is a little kludgy.
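One detail of splitting on commas with a regular expression is that the double quotes around the zip code stay part of the value. As an alternative sketch (not what the script above does), the standard csv module understands quoted fields. The sample rows are copied from the data shown earlier, the function name rows_to_dicts is hypothetical, and Python 3 is assumed.

```python
import csv
import io

SAMPLE = '''"MODZCTA","Positive","Total","zcta_cum.perc_pos"
NA,1558,1862,83.67
"10001",309,861,35.89
'''

def rows_to_dicts(csv_text):
    """Parse the CSV text into a list of dictionaries, one per data row."""
    # DictReader uses the first line as the header and strips the quotes.
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            'zip': row['MODZCTA'],
            'positive': row['Positive'],
            'total_tested': row['Total'],
            'cumulative_percent_of_those_tested': row['zcta_cum.perc_pos'],
        }
        for row in reader
    ]

rows = rows_to_dicts(SAMPLE)
print(rows[1]['zip'])  # 10001
```

The values are still strings here, just as in the regex version; converting them to numbers would be a separate step.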
todays_file = year_month_dir + '/' + today_str + '_tests_by_ztca.json'
I have since learned that there’s a better way to do this using the os.path module, which I’ll do the next time.
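A minimal sketch of what that might look like, assuming os.path.join is the function in question. The helper name todays_file_path and the sample directory and date are made up for the example.

```python
import os

def todays_file_path(year_month_dir, today_str):
    """Join the directory and file name with the platform's separator."""
    return os.path.join(year_month_dir, today_str + '_tests_by_ztca.json')

# Sample values for illustration only.
path = todays_file_path(os.path.join('db', '2020_05'), '20200503')
print(path)
```

The benefit is that the separator is chosen for the current platform, so there is no hard-coded '/' in the script.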
To print the data in JSON format to a file, Python provides the aptly named ‘json’ library. To dump the data to a file, simply,
json.dump(test_data, out_file, indent=2)
The “indent=2” isn’t necessary, but it makes the output more readable.
To read JSON data from the file,
test_data = json.load(in_file)
Read more about it here, Python JSON docs.
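To show the write and the read together, here is a small round-trip sketch. It writes to a temporary directory rather than the db directory, and the sample record and file name are assumptions for the example (Python 3 assumed).

```python
import json
import os
import tempfile

# Sample record for illustration only.
test_data = [{'zip': '10003', 'positive': '396', 'total_tested': '1228'}]

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, 'sample.json')

    # Write the list of dictionaries as indented JSON.
    with open(file_path, 'w') as out_file:
        json.dump(test_data, out_file, indent=2)

    # Read it back into Python objects.
    with open(file_path) as in_file:
        loaded = json.load(in_file)

print(loaded == test_data)  # True
```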
In the next post I will extend the script to add location details for each zip code where the tests were conducted, using a NYC Zip Code database file.