Reading Google Fit data. Part 2

After identifying the data sources held by #GoogleFit (see the previous post), the next step is to extract the information contained within each source in a format that can be used to replicate the graphs on the https://fit.google.com/ site, such as the one below.

[Image: Google Fit web report]

The first step is to catalogue the data sources and determine whether they contain data. The following #python function does that.


def getSummariesForDataSources(dataSources, timeWindow):
    ## accessToken and dataDir are assumed to be defined at module level
    ## (see the previous post)
    step5CodeTemplate = 'curl https://www.googleapis.com/fitness/v1/users/me/dataSources/' + \
        '<dataSource>/datasets/<timeWindow>?access_token=<accessToken>'
    nPoints = []
    sourceList = []
    lastBeginTime = []
    lastEndTime = []
    for dSource in dataSources:
        step5 = step5CodeTemplate.replace('<accessToken>', accessToken) \
            .replace('<dataSource>', dSource) \
            .replace('<timeWindow>', timeWindow)
        try:
            dataInput = getAllJSON(step5)
            nP, lastBegin, lastEnd = getDataTimes(dataInput)
            nPoints.append(nP)
            lastBeginTime.append(lastBegin)
            lastEndTime.append(lastEnd)
            sourceList.append(dSource)
        except Exception as e:
            print("error reading " + dSource + ": " + str(e))
    ## summary statistics by data source: number of points, source name, last begin and end times
    for i in range(len(sourceList)):
        print(i, nPoints[i], sourceList[i], lastBeginTime[i], lastEndTime[i])
    f = open(dataDir + "fitDataSourceList.csv", "w")
    f.write("number,nPoints,SourceId,lastBeginTime,lastEndTime\n")
    for i in range(len(sourceList)):
        f.write(str(i) + "," + str(nPoints[i]) + "," + sourceList[i] +
                ",'" + lastBeginTime[i] + "','" + lastEndTime[i] + "'\n")
    f.close()
    return sourceList
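
The helper functions getAllJSON, getDataTimes and msToTime are not listed here. A minimal sketch of how they could be implemented is shown below; the versions used in these posts may differ, and the sketch assumes the json structure described later in this post (a 'point' list whose entries carry nanosecond startTimeNanos and endTimeNanos timestamps).

import json
import urllib.request
import datetime

def getAllJSON(curlCommand):
    ## the command string built above starts with 'curl '; in this sketch
    ## the url is fetched directly with urllib rather than shelling out to curl
    url = curlCommand.replace('curl ', '', 1)
    return json.loads(urllib.request.urlopen(url).read())

def msToTime(nanos):
    ## Google Fit timestamps are nanoseconds since the epoch
    return datetime.datetime.utcfromtimestamp(int(nanos) / 1e9) \
        .strftime('%Y-%m-%d %H:%M:%S')

def getDataTimes(dataInput):
    ## number of points plus the start and end times of the last point
    points = dataInput.get('point', [])
    if not points:
        return 0, '', ''
    return (len(points),
            msToTime(points[-1]['startTimeNanos']),
            msToTime(points[-1]['endTimeNanos']))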

The two functions below extract the information from the available data sources and write it to csv files. The createCsvFile function retrieves the data in json format; the json can be examined directly in that form, or written out as a csv file by the outputOneSourceStats function.


import pprint

def outputOneSourceStats(dataInput, outFileName):
    pprint.pprint(len(dataInput))
    points = dataInput['point']
    pprint.pprint(str(points[0]['value']))
    startTimes = []
    startNanos = []
    endTimes = []
    endNanos = []
    keys = []
    values = []
    for point in points:
        ## startTimeNanos/endTimeNanos are nanosecond epoch timestamps;
        ## msToTime formats them as readable times
        startNanos.append(point['startTimeNanos'])
        startTimes.append(msToTime(point['startTimeNanos']))
        endNanos.append(point['endTimeNanos'])
        endTimes.append(msToTime(point['endTimeNanos']))
        key = []
        value = []
        for i in range(len(point['value'])):
            key.append(list(point['value'][i].keys())[0])
            value.append(list(point['value'][i].values())[0])
        if len(keys) == 0:
            keys = key
        values.append(value)
    ## write csv file
    header = ""
    for j in range(len(keys)):
        header += "," + keys[j] + str(j + 1)
    f = open(outFileName, "w")
    f.write("begin,end" + header + "\n")
    for i in range(len(values)):
        v = ""
        for j in range(len(keys)):
            v += "," + str(values[i][j])
        f.write("'" + startTimes[i] + "','" + endTimes[i] + "'" + v + "\n")
    f.close()

def createCsvFile(fileId, dataSource, accessToken, timeWindow):
    ## fetch one data source over the given time window and write its
    ## points to a csv file named after fileId
    step5CodeTemplate = 'curl ' \
        'https://www.googleapis.com/fitness/v1/users/me/dataSources/<dataSource>/datasets/<timeWindow>' + \
        '?access_token=<accessToken>'
    step5 = step5CodeTemplate \
        .replace('<accessToken>', accessToken) \
        .replace('<dataSource>', dataSource) \
        .replace('<timeWindow>', timeWindow)
    dataInput = getAllJSON(step5)
    outputOneSourceStats(dataInput, dataDir + "fitData" + fileId + ".csv")
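
For reference, the dataInput passed to outputOneSourceStats has roughly the shape sketched below for a step-count source. Only the fields used by the code are shown, and the 'intVal' key is an assumption that applies to integer-valued data types.

## Illustrative only: a trimmed example of the dataset json
dataInput = {
    'point': [
        {
            'startTimeNanos': '1448668800000000000',
            'endTimeNanos': '1448669100000000000',
            'value': [{'intVal': 312}],
        },
        ## ... one entry per recorded interval ...
    ],
}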

After executing the steps in the previous post to create dataSources and accessToken, the functions are called using the following code. They create csv files containing a summary of the available sources and the contents of each data source.



import sys

timeWindow = '0-4102462801000000000'  # 1970 to 01/01/2100, in nanoseconds
dataDir = ''

sourceList = getSummariesForDataSources(dataSources, timeWindow)

for i in range(len(sourceList)):
    sys.stdout.write(str(i) + " ")
    createCsvFile(str(i), sourceList[i], accessToken, timeWindow)

As a reference, on my current Google login there are 34 sources, of which 27 contain data. Of these, ‘derived:com.google.step_count.delta:com.google.android.gms:estimated_steps’ is the id of the data source whose contents match the graphs on the Google Fit web page. Within that data source, the daily activity is the sum of the entries grouped by the date of each interval’s end time.
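
As an illustration of that aggregation, the sketch below sums the step counts in the estimated_steps dataset by the calendar date of each interval's end time. It assumes the count is held in the 'intVal' entry of each point's first value, as in the example shown earlier.

import collections
import datetime

def dailyStepTotals(dataInput):
    ## sum step_count.delta entries grouped by the date of each
    ## interval's end time; assumes the count is in the first
    ## value's 'intVal' entry
    totals = collections.defaultdict(int)
    for point in dataInput.get('point', []):
        day = datetime.datetime.utcfromtimestamp(
            int(point['endTimeNanos']) / 1e9).date()
        totals[day] += point['value'][0].get('intVal', 0)
    return dict(totals)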
