FocalCXM builds enterprise mobile applications with modern cloud architecture and tries to solve many life sciences problems. In this journey, we came across this use case, in one of our Product called Folio.
A little intro about Folio: Folio is a Strategic Project Management Platform that leverages the LogFrame methodology to solve complex problems. It empowers project teams with the right set of tools to identify the right problems and to enable continuous feedback within the project teams to help them achieve desired outcomes.
The Folio mobile Application captures the feedback and pushes them to S3 in JSON format.
Currently, we are writing custom scripts to generate a simple CSV report from the JSON captured. But we have a un full-filled dream to show customers a beautiful visualization dashboard generated on the cloud.
As we moved in modern cloud architecture most of our services are from AWS. AWS has a comprehensive blend of services for storage, transformation, and Visualization that can determine our very common request.
Below are the AWS services that are required to solve this use case.
AWS S3: Is a simple storage service that can be used to store and retrieve any amount of data. It’s an object store and very reliable.
AWS Athena: Amazon Athena is an interactive query service that makes it easy to analyse data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. We will see more on the Athena querying in the blog.
AWS QuickSight: Amazon QuickSight is a fast, cloud-powered business intelligence service that makes it easy to deliver insights to everyone in your organization.
Architecture:
- After the user submits the data from the mobile applications, the application will push them into an S3 bucket.
- From Athena, we will connect it to the above S3 bucket and create the Athena DB. Where you can query the data. (we will learn more on how to write Athena queries)
- We will create a New Analysis and connect the Athena DB into the QuickSight and create a simple dashboard.
Creating a sample data set in S3:
As per our research, we found such rules where your S3 data should look like when you are going to query them through Athena.
Rule1: Maintain the unique file format. In our example, we will use the. JSON.
Rule2: Amazon Athena requires the data to be “one record per line” in the object files. If you have multiple JSON objects in a single file. Arrange them in multiple lines. Below is an image for more reference.
upload the Sample.json file into the S3 bucket and create a folder called inputFiles. The path looks like s3://{bucketName}/inputFiles/
Creating Our Athena Database and Table
Athena Database is not real database they don’t store anything, only table schema. To create an Athena Database. Please follow the below steps.
Step1: Navigate to Athena from the AWS services console.
Step 2: Use query Editor, create the database as foliodb
Step 3: You should see the new database in the database dropdown.
Step 4: To create the table under the database. Execute the create table query.
How to write Athena create Table query:
Amazon Athena uses Presto with ANSI SQL support and works with a variety of standard data formats, including CSV, JSON, ORC, Avro, and Parquet. Athena is ideal for quick, ad-hoc querying but it can also handle complex analysis, including large joins, window functions, and arrays.
We are writing our Athena Create table query on top of this below JSON. This documentation will explain how to write Athena query for nested dictionaries and nested arrays.
{
"assessment_id": 27,
"id": "380649fb-20dd-4c03-8fd5-cb7d74f28cab",
"user_id": "1",
"project_id": 231,
"created_by": "10740d86-86a9-40f4-9563-5aab6495772a",
"team_id": 181,
"result": {
"extensions": {
"response": [
{
"assessmentId": 27,
"assessmentName": "Bits Assessment",
"assessmentCreatedDate": "01/21/2020",
"responseData": [
{
"questionId": 70,
"answers": [
{
"answerId": 107,
"answerText": "C"
}
],
"questionText": "Single select the skills provided below",
"questionType": "SELECT"
},
{
"questionId": 68,
"answers": [
{
"answerId": 0,
"answerText": 100
}
],
"questionText": "Provide rating on your technical skills",
"questionType": "LIKERT_SCALE"
},
{
"questionId": 67,
"answers": [
{
"answerId": 0,
"answerText": "selenium"
}
],
"questionText": "Enter your technical skills",
"questionType": "TEXT"
},
{
"questionId": 69,
"answers": [
{
"answerId": 104,
"answerText": "Java"
},
{
"answerId": 105,
"answerText": ".net"
}
],
"questionText": "Multi Select the skills below mentioned",
"questionType": "MULTI_SELECT"
}
],
"userFullName": "Nandini Dyawarkonda"
}
]
},
"completion": true,
"Platform": "FolioLite",
"app_version": "1.2.5"
},
"verb": {
"id": "http://focalcxm.com/expapi/verbs/Completed"
},
"actor": {
"mbox": "mailto:nandini.d@focalcxm.com",
"name": "Nandini Dyawarkonda"
},
"timestamp": 1579613068579,
"version": "1.0.0"
}
Follow the below Rules to write the query.
Rule 1: Initially understand the JSON thoroughly before writing the query. How many Objects, lists are available?
Rule 2: The name of the column matches the names of the JSON keys and followed datatype.
Rule 3: The syntax for the Object is Struct. For every Object, you need to have a struct syntax like below.
struct< >
E.g.: In the above JSON, the dictionary starts from the result. So
result struct<extensions:……………….>
- The Extensions in the JSON is also another dictionary. So, the syntax looks like this.
result struct<extensions:<struct<response:………>
Rule 4: The syntax for the List is.
array< >
E.g. : In the above JSON, the responses are in an array.
result struct<extensions:<struct<response:array<struct<assessmentId: int,assessmentName: string,………>>>>
Rule 5: Please consider commas (,) you usually get errors due to this
Tip: The simple trick to maintain commas is to compare the JSON and apply the similar comma structure in the Athena Query.
Each time you write and execute the query and test how the table structure is created. For my example after the create table query, my result data it looks like below.
All the root keys in the JSON are created as individual columns and rest result JSON comes under a single column, which is not a JSON. It is now a quarriable object. That means you can now able to drill down with required fields under them.
The next step is to flatten the response object. Where AssessmentName, QuestionText, AnswerText, and other variables should have each column separately. To achieve this, we required a select query on top of the table we created in the previous step.
How to write Athena Select Query:
Writing a select query in Athena is pretty simple. Mention each key and its alias name if required.
- If you would like to create a column for the key from inside the array. You can access by mentioning the index of the array.
- To loop/flatten the array, we use the function called Transform and CROSS JOIN unnest() Which loops the array and create different entries(rows) in the result.
Execute the query to see the results table. You can see the different columns in the table.
Let’s Visualize the above result in the QuickSight
Search and navigate to quickSight from the AWS-Console and follow the below steps to create visualize.
Step 1: Click on New Analysis
Step 2: To create a new data set. Select on New data set.
Step 3: Select the source of your data set. In our use case, our source of data set is Athena. Select Athena in the list shown.
Step 4: Enter Data Source Name as AwsDataCatalog and create a data source.
Step 5: Chose your database and table
Step 6: Choose Import to SPICE for quicker analytics, click on Visualize.
Step 7: You should able to select all the fields and use an appropriate graph to visualize the data. Below a few examples where I tried it out.
Hope you find this useful.