Now that you know what an Intersect data notebook is and what you can do with it, let’s create your first notebook. Together, we will build a notebook that works through the steps below.
Downloading the CSV file
For this notebook, we will be using a CSV file, which you can download by clicking here.
Creating your new notebook
Log in to your Intersect data studio and click on the “New Notebook” button at the top right. You can name it anything you want; let’s name it “Uber trips analysis”.
First things first, we need to upload the CSV file we just downloaded to our new notebook. Click on the box that says “Add an import block to get started…”. That should take you to a block gallery listing the blocks you can use to build your notebook.
We need to upload a file, so find the block called “Read/upload file” and click on “Add”.
This will create an upload box in your notebook. Drag and drop your file into that box and click on “Add step” to start the upload. With a file of this size, the upload should finish within a few seconds. You will know it succeeded when you see a green checkmark next to the box.
To see the “result” of this step (in other words, the data you uploaded), click on the “Result” toggle that appears below the upload box. Take some time to familiarize yourself with the data. You will notice that every row is a separate Uber trip and contains attributes such as when the trip was requested, when it started, when it ended, the mileage, the fare, etc.
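Intersect loads the file for you, but if it helps to picture what the upload produces, here is a minimal Python sketch (standard library only) that parses CSV text into one record per trip. The column names and sample values are illustrative assumptions, not taken from the actual file. For the real download you would open the file instead, e.g. `open("uber_trips.csv", newline="")`, where the filename is hypothetical.

```python
import csv
import io
from typing import Dict, List

def load_trips(csv_text: str) -> List[Dict[str, str]]:
    """Parse CSV text into one dict per row; each row is one Uber trip."""
    return list(csv.DictReader(io.StringIO(csv_text)))

# Hypothetical sample rows; the real file has more columns (times, mileage, etc.).
SAMPLE_CSV = """City,Product Type,Trip or Order Status,Fare
San Francisco,UberX,COMPLETED,12.50
San Francisco,UberPOOL,CANCELED,0
"""

trips = load_trips(SAMPLE_CSV)
print(len(trips))        # 2
print(trips[0]["City"])  # San Francisco
```

Note that every value comes back as a string; Intersect infers column types for you, whereas in plain Python you would convert fares and timestamps yourself.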
Filtering for real trips
You might notice that some of the rows contain trips that ended up being canceled, and some of the trips took place outside the US. The trips also use a variety of products (UberX, Uber Pool, etc.).
For the purposes of this notebook, we only want to look at UberX trips in the US that were completed. In other words, we need to filter our data.
To add a “Filter” block, click on “Work With Data” at the bottom. That should bring up the block gallery again. You can find the block you want in one of two ways:
- Type “filter” in the search bar at the top; or
- Navigate to the Filter block by clicking on Analyze Data > Filter or Sort.
Once you see the “Filter Data” block, click on the “Add” button.
This will add an empty block to your notebook that you can fully configure. Blocks in Intersect are filled out Mad Libs style, so you can read each one out loud as a sentence describing what you want done. For most blocks, the first thing you select is the data you want the block to operate on. The dataset can be any output from a previous step; the most recent output is selected by default.
You can use the following figure as a reference to configure this block. When you’re done, click on “Add Step” and wait for the green checkmark to appear. As a sanity check, the “Result” toggle should report 96 rows.
PS: Don’t worry if the name of the dataset looks different; it is generated automatically and is unique in every notebook.
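The Filter block keeps only the rows that satisfy every condition you set. As a rough sketch of that logic (not Intersect’s implementation), the Python below keeps completed UberX trips in the US. The column names `Product Type`, `Country` and `Trip or Order Status`, and their values, are hypothetical; check the actual headers in your data. The 96-row check above applies to the real file, not to this made-up sample.

```python
from typing import Dict, List

# Hypothetical column names and values; adjust them to match your CSV headers.
def is_real_trip(row: Dict[str, str]) -> bool:
    """True only for completed UberX trips in the US (all conditions must hold)."""
    return (
        row["Product Type"] == "UberX"
        and row["Country"] == "US"
        and row["Trip or Order Status"] == "COMPLETED"
    )

def filter_trips(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the rows that pass every condition, preserving their order."""
    return [row for row in rows if is_real_trip(row)]

trips = [
    {"Product Type": "UberX", "Country": "US", "Trip or Order Status": "COMPLETED"},
    {"Product Type": "UberX", "Country": "US", "Trip or Order Status": "CANCELED"},
    {"Product Type": "UberPOOL", "Country": "US", "Trip or Order Status": "COMPLETED"},
    {"Product Type": "UberX", "Country": "CA", "Trip or Order Status": "COMPLETED"},
]
print(len(filter_trips(trips)))  # 1
```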
Calculating trip duration and time to pickup
Next, we will do some date/time arithmetic, which can be a pain to do in other tools.
Once again, click on “Work With Data”, find the “Difference between two date/time columns” block by navigating to “Analyze Data” > “Date Time Operations”, and click on the “Add” button.
Now configure the block to calculate the “time to pickup” (Begin Trip Time - Request Time) as shown below. You can verify that this step worked by clicking on the “Result” toggle and scrolling all the way to the right to find the newly created column.
Repeat the same steps to calculate the “trip duration” (Dropoff Time - Begin Trip Time).
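Each of these calculations is a timestamp subtraction converted to minutes. The sketch below shows that arithmetic with Python’s `datetime`. It assumes timestamps in `YYYY-MM-DD HH:MM:SS` form with no time zone, which may not match the real file, and the output column names are modeled on the “time to pickup (minutes)” column Intersect creates (`trip duration (minutes)` is an assumed name).

```python
from datetime import datetime

def minutes_between(start: str, end: str) -> float:
    """Return (end - start) in minutes; assumes 'YYYY-MM-DD HH:MM:SS' strings."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 60

# Hypothetical trip with the three timestamps used in this section.
trip = {
    "Request Time": "2016-01-05 08:00:00",
    "Begin Trip Time": "2016-01-05 08:06:30",
    "Dropoff Time": "2016-01-05 08:21:30",
}
# time to pickup = Begin Trip Time - Request Time
trip["time to pickup (minutes)"] = minutes_between(trip["Request Time"], trip["Begin Trip Time"])
# trip duration = Dropoff Time - Begin Trip Time
trip["trip duration (minutes)"] = minutes_between(trip["Begin Trip Time"], trip["Dropoff Time"])
print(trip["time to pickup (minutes)"])  # 6.5
print(trip["trip duration (minutes)"])   # 15.0
```

Subtracting full date/time values rather than clock times is what makes trips that cross midnight come out right.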
Next, we will summarize the pickup time across cities, to see if there are any patterns.
Once again, click on “Work With Data” and find the “Aggregate Data” block by navigating to “Analyze Data” > “Summarize, Group and Enrich Data”. Add that block.
Aggregate Data is a very powerful block that lets us group similar rows together and calculate summary statistics for each group.
In this case, we want to group by “City” and calculate the “mean” of the “time to pickup (minutes)” column we created earlier.
The output should have one row per unique city in our data and two columns: the city and its mean time to pickup.
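Conceptually, Aggregate Data collects the rows for each city and then applies the summary statistic to each group. A minimal standard-library sketch of group-by-and-mean follows; the sample values are made up, and this is not how Intersect implements the block.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

def mean_by_group(rows: List[dict], group_col: str, value_col: str) -> Dict[str, float]:
    """Group rows by `group_col`, then average `value_col` within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[value_col])
    return {key: mean(values) for key, values in groups.items()}

rows = [
    {"City": "San Francisco", "time to pickup (minutes)": 4.0},
    {"City": "San Francisco", "time to pickup (minutes)": 6.0},
    {"City": "New York", "time to pickup (minutes)": 3.0},
]
summary = mean_by_group(rows, "City", "time to pickup (minutes)")
print(summary)  # {'San Francisco': 5.0, 'New York': 3.0}
```

One entry per city with two fields (city, mean) mirrors the "one row per city, two columns" shape described above.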
Finally, let’s visualize this data.
Click on the “Visualize Data” button at the bottom and select “Bar Chart”. We want City on the x-axis and the bar heights to represent the mean “time to pickup”, as shown below.
Once again, you can take a look at the resulting chart by clicking on the Result toggle. You can edit the title by clicking on the pencil icon next to the Result toggle.
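Intersect renders the chart for you. If you ever want a quick look at the same summary outside Intersect without installing a plotting library, a rough text stand-in can scale each bar relative to the largest mean. This is purely illustrative, and the sample means are made up.

```python
from typing import Dict

def text_bar_chart(values: Dict[str, float], width: int = 40) -> str:
    """Render one '#' bar per label, scaled so the largest value spans `width`."""
    longest = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    for label, value in values.items():
        length = round(width * value / longest) if longest > 0 else 0
        lines.append(f"{label.ljust(label_width)} | {'#' * length} {value:.1f}")
    return "\n".join(lines)

mean_pickup = {"San Francisco": 5.0, "New York": 2.5}  # hypothetical means, in minutes
print(text_bar_chart(mean_pickup, width=20))
```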
Congratulations! 🎊 You have created your first notebook.
(Extra credit) Relationship between trip duration and fare
Let’s say you want to make a scatter plot with the trip duration on the x-axis and the Fare on the y-axis, so you can see whether there’s any relationship between the two. Easy!
Just like above, select “Visualize Data” and add the “Scatter Plot” block. You will notice that the selected dataset does not have these columns, because they went away when you aggregated the data.
But there is a dataset that has those columns!
In the first form field (the one after “From Dataset”, which we have been ignoring so far), select the second-to-last dataset. This one does have the columns you want. Go ahead and select the correct x- and y-axis columns, and click on “Add Step”.
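The earlier dataset works because it still has one row per trip, so each trip contributes one (duration, fare) point. As an optional complement to eyeballing the scatter plot, the sketch below builds those points and computes the Pearson correlation coefficient, which measures how close the points lie to a straight line. The column names `trip duration (minutes)` and `Fare` and the sample values are assumptions.

```python
import math
from typing import Dict, List, Tuple

def scatter_points(rows: List[Dict[str, float]], x_col: str, y_col: str) -> List[Tuple[float, float]]:
    """One (x, y) point per row, i.e. per trip (not per aggregated group)."""
    return [(row[x_col], row[y_col]) for row in rows]

def pearson_r(points: List[Tuple[float, float]]) -> float:
    """Pearson correlation: +1 or -1 is a perfect linear relationship, 0 is none.
    Raises ZeroDivisionError if either variable is constant."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    var_y = sum((y - mean_y) ** 2 for _, y in points)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-trip rows (the pre-aggregation dataset).
trips = [
    {"trip duration (minutes)": 10.0, "Fare": 8.0},
    {"trip duration (minutes)": 20.0, "Fare": 14.0},
    {"trip duration (minutes)": 30.0, "Fare": 20.0},
]
points = scatter_points(trips, "trip duration (minutes)", "Fare")
print(points)             # [(10.0, 8.0), (20.0, 14.0), (30.0, 20.0)]
print(pearson_r(points))  # 1.0
```

Keep in mind that r only captures linear relationships, so it is worth looking at the plot as well.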