In my last post I created a CLI tool, written in Perl (my favorite language at the moment), to display NYC Covid-19 test results by Zip code. I’d like to do the same in Python, purely as an excuse to learn the language. This script downloads the same data from the NYC health department’s GitHub page and creates a JSON file that I can use as a very basic database for later analysis.
Here is a sample of the downloaded raw data.
"MODZCTA","Positive","Total","zcta_cum.perc_pos"
NA,1558,1862,83.67
"10001",309,861,35.89
"10002",870,2033,42.79
"10003",396,1228,32.25
"10004",27,85,31.76
"10005",54,206,26.21
"10006",21,91,23.08
"10007",49,204,24.02
"10009",607,1745,34.79
This is the first iteration of my script.
from __future__ import print_function
import datetime, json, requests, os, re, sys

RAW_ZCTA_DATA_LINK = 'https://raw.githubusercontent.com/nychealth/coronavirus-data/master/tests-by-zcta.csv'
ALL_ZCTA_DATA_CSV = 'all_zcta_data.csv'

# -------------------------------------------------------------------------------------------------
# Functions
# -------------------------------------------------------------------------------------------------
def get_today_str():
    # Today's date as 'yyyymmdd', e.g. '20200503'
    today = datetime.date.today().strftime("%Y%m%d")
    return today

def find_bin():
    # Absolute path of the directory containing this script
    this_bin = os.path.abspath(os.path.dirname(__file__))
    return this_bin

def create_dir_if_not_exists(base_dir, dir_name):
    the_dir = base_dir + '/' + dir_name
    if not os.path.isdir(the_dir):
        os.mkdir(the_dir)
    return the_dir

def create_db_dirs():
    # Builds <script dir>/db/<yyyy_mm>, creating each level if needed
    this_bin = find_bin()
    db_dir = create_dir_if_not_exists(this_bin, 'db')
    today_str = get_today_str()
    year_month = today_str[0:4] + '_' + today_str[4:6]
    year_month_dir = create_dir_if_not_exists(db_dir, year_month)
    return year_month_dir

def get_covid_test_data_text():
    r = requests.get(RAW_ZCTA_DATA_LINK)
    print("Resp: " + str(r.status_code))
    return r.text

def create_list_of_test_data():
    test_vals = []
    covid_text = get_covid_test_data_text()
    for l in covid_text.splitlines():
        # Raw string so '\s' is passed to the regex engine unchanged
        lvals = re.split(r'\s*,\s*', l)
        # Skip the CSV header row
        if lvals[0] == '"MODZCTA"':
            continue
        zip_dic = {'zip': lvals[0].strip('"'),  # drop the CSV quotes around the Zip code
                   'positive': lvals[1],
                   'total_tested': lvals[2],
                   'cumulative_percent_of_those_tested': lvals[3]}
        test_vals.append(zip_dic)
    return test_vals

def write_todays_test_data_to_file():
    year_month_dir = create_db_dirs()
    test_data = create_list_of_test_data()
    print(test_data[0])
    today_str = get_today_str()
    todays_file = year_month_dir + '/' + today_str + '_tests_by_ztca.json'
    # 'with' closes the file even if json.dump raises
    with open(todays_file, 'w') as out_file:
        json.dump(test_data, out_file, indent=2)
    print("Created today's ZCTA tests file, {todays_file}".format(todays_file=todays_file))

# -------------------------------------------------------------------------------------------------
write_todays_test_data_to_file()
Here are a few snippets of interesting code.
To get today’s date as a string in the format ‘yyyymmdd’ (for example, 20200401), I used the datetime module.
today = datetime.date.today().strftime("%Y%m%d")
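A quick sketch of how the strftime format codes map onto the string, using a fixed date instead of today’s so the result is predictable, and how strptime parses the string back into a date:

```python
import datetime

# Format a fixed date the same way get_today_str() formats today's date
d = datetime.date(2020, 4, 1)
date_str = d.strftime("%Y%m%d")
print(date_str)  # 20200401

# strptime reverses the process, turning 'yyyymmdd' back into a date
parsed = datetime.datetime.strptime(date_str, "%Y%m%d").date()
print(parsed == d)  # True
```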
Python has an interesting syntax for slicing strings and lists into pieces. I used it here to create a directory name from the current year and month.
year_month = today_str[0:4] + '_' + today_str[4:6]
The ‘[0:4]’ gets the first four characters of the string (indices 0 to 3), and ‘[4:6]’ grabs the next two characters (indices 4 and 5). These are combined to create a sub-directory name like ‘2020_05’.
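Here is the slicing on a fixed example string, next to an alternative I’d assume works just as well: asking strftime for the year and month directly with the "%Y_%m" format, which skips the slicing altogether.

```python
import datetime

today_str = "20200503"  # example 'yyyymmdd' value
# [0:4] -> indices 0..3 (the year); [4:6] -> indices 4..5 (the month)
year_month = today_str[0:4] + '_' + today_str[4:6]
print(year_month)  # 2020_05

# Alternative: let strftime produce the 'yyyy_mm' name directly
alt = datetime.date(2020, 5, 3).strftime("%Y_%m")
print(alt)  # 2020_05
```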
To get the directory location of this script, similar to FindBin in Perl, I used os.path.abspath and os.path.dirname from the os.path module.
this_bin = os.path.abspath(os.path.dirname(__file__))
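To see what find_bin() returns without depending on where the script lives, here it is wrapped in a small function that takes the path as an argument (script_dir and script_dir_pathlib are hypothetical names), alongside a pathlib equivalent. One difference to keep in mind: Path.resolve() also follows symlinks, which os.path.abspath does not, so the two can disagree.

```python
import os
from pathlib import Path

def script_dir(script_path):
    # Same approach as find_bin(): directory part of the path, made absolute
    return os.path.abspath(os.path.dirname(script_path))

def script_dir_pathlib(script_path):
    # pathlib alternative: make the path absolute (resolving symlinks), then take its parent
    return str(Path(script_path).resolve().parent)

# In the real script you would pass __file__; an example path is used here
print(script_dir('/home/user/covid/covid_tests.py'))  # /home/user/covid (on Linux/macOS)
print(script_dir_pathlib('/home/user/covid/covid_tests.py'))
```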
The raw test data for the current date is downloaded from the NYC Department of Health GitHub page using the requests library.
r = requests.get(RAW_ZCTA_DATA_LINK)
print("Resp: " + str(r.status_code))
return r.text
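Each line is then split on commas using the ‘re’ module, which seems to be Python’s rather awkward way of doing regular expression matching. The pattern is written as a raw string (the r prefix) so that ‘\s’ reaches the regex engine unchanged.
lvals = re.split(r'\s*,\s*', l)
This splits each line of input data, such as this one,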
"10003",396,1228,32.25
The resulting values can then be inserted into a Python dictionary like this,
{
"zip": "10003",
"yyyymmdd": "20200503",
"positive": "396",
"total_tested": "1228",
"cumulative_percent_of_those_tested": "32.25"
}
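Here is the split and the dictionary put together on the sample line. Worth knowing: re.split leaves the CSV double quotes around the Zip code, so this sketch strips them with str.strip('"') to get a plain "10003" as in the example above. The yyyymmdd value is hard-coded purely for illustration.

```python
import re

line = '"10003",396,1228,32.25'
lvals = re.split(r'\s*,\s*', line)
print(lvals)  # ['"10003"', '396', '1228', '32.25']

zip_dic = {
    'zip': lvals[0].strip('"'),   # remove the surrounding CSV quotes
    'yyyymmdd': '20200503',       # fixed example date for illustration
    'positive': lvals[1],
    'total_tested': lvals[2],
    'cumulative_percent_of_those_tested': lvals[3],
}

# Each dictionary is appended to a running list
test_vals = []
test_vals.append(zip_dic)
print(test_vals[0]['zip'])  # 10003
```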
Each dictionary is appended to the end of a list of similar dictionaries.
You may notice that the way I build the file path string is a little kludgy.
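An alternative I would consider (not what the script above does) is the standard library’s csv module, which understands quoted fields, so the quotes disappear on their own and the header row is consumed automatically. A minimal sketch using csv.DictReader on a few of the sample rows, renaming the columns to the same keys the script uses:

```python
import csv
import io

# A few rows of the sample data shown earlier
raw_text = '''"MODZCTA","Positive","Total","zcta_cum.perc_pos"
NA,1558,1862,83.67
"10001",309,861,35.89
"10002",870,2033,42.79
'''

test_vals = []
# DictReader uses the header row as keys and removes the CSV quoting
for row in csv.DictReader(io.StringIO(raw_text)):
    test_vals.append({
        'zip': row['MODZCTA'],
        'positive': row['Positive'],
        'total_tested': row['Total'],
        'cumulative_percent_of_those_tested': row['zcta_cum.perc_pos'],
    })

print(len(test_vals))  # 3
print(test_vals[1])
```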
todays_file = year_month_dir + '/' + today_str + '_tests_by_ztca.json'
I have since learned that there’s a better way to do this using os.path.join from the os.path module, which I’ll use next time.
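A sketch of what that might look like, using os.path.join for the file path and os.makedirs with exist_ok=True (Python 3.2+) in place of the create_dir_if_not_exists helper. The base directory is a temporary directory so the example can run anywhere, and build_todays_file_path is a hypothetical name.

```python
import os
import tempfile

def build_todays_file_path(base_dir, today_str):
    # 'yyyymmdd' -> 'yyyy_mm' sub-directory, as in create_db_dirs()
    year_month = today_str[0:4] + '_' + today_str[4:6]
    year_month_dir = os.path.join(base_dir, 'db', year_month)
    # Creates db/ and db/yyyy_mm/ if needed; no error if they already exist
    os.makedirs(year_month_dir, exist_ok=True)
    return os.path.join(year_month_dir, today_str + '_tests_by_ztca.json')

base = tempfile.mkdtemp()
path = build_todays_file_path(base, '20200503')
print(path)
```

os.path.join uses the correct separator for the current OS, so the same code works on Windows as well as Linux and macOS.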
To write the data to a file in JSON format, Python provides the aptly named ‘json’ library. Dumping the data to a file is simply:
json.dump(test_data, out_file, indent=2)
The “indent=2” isn’t necessary, but it makes the output more readable.
To read the JSON data back from the file:
test_data = json.load(in_file)
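A round trip that writes a small list of dictionaries with json.dump and reads it back with json.load. The with statements close the files automatically; the temporary file name is only for the example.

```python
import json
import os
import tempfile

test_data = [
    {'zip': '10003', 'positive': '396', 'total_tested': '1228',
     'cumulative_percent_of_those_tested': '32.25'},
]

path = os.path.join(tempfile.mkdtemp(), 'example_tests.json')

# Write: indent=2 pretty-prints the output
with open(path, 'w') as out_file:
    json.dump(test_data, out_file, indent=2)

# Read it back into Python lists and dictionaries
with open(path) as in_file:
    loaded = json.load(in_file)

print(loaded == test_data)  # True
```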
Read more about it in the Python json module documentation.
In the next post I’ll add location details for each Zip code where the tests were conducted, using a NYC Zip Code database file.