Analyzing my own talk transcript with natural language processing

Written by srobtweets | Published 2017/01/27


As a speaker, watching videos of my talks is an essential part of continuously improving (though I’ll admit it’s incredibly painful). But what about running natural language processing (NLP) on one of my own talks?

Since it’s almost time for our Google Cloud Next conference, I thought it would be interesting to take a look at my talk from last year’s conference by sending the transcript through the Natural Language API. I wanted to look at sentiment, entities, and language trends.

How positive was my talk?

My talk was 37 minutes long and contained 532 sentences. I extracted the text from the captions YouTube provides, which you can get by clicking “More” > “Transcript” on your video:

Since it was a technical talk, I didn’t expect the sentiment to be too strong, but I wanted to see how the NL API performed at analyzing it. Here’s a histogram of the sentences in my talk bucketed by sentiment score. The score is a number from -1 to 1 indicating whether a sentence is negative or positive:
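If you want to run the same analysis on your own transcript, here’s a minimal sketch of the sentiment step. It assumes the google-cloud-language Python client (v2+); the transcript.txt file name and the text histogram are just placeholders for illustration:

```python
from collections import Counter

from google.cloud import language_v1

client = language_v1.LanguageServiceClient()

# Hypothetical path: the transcript text copied out of YouTube's caption panel
transcript_text = open("transcript.txt").read()

document = language_v1.Document(
    content=transcript_text,
    type_=language_v1.Document.Type.PLAIN_TEXT,
)

# analyze_sentiment returns an overall document score plus one score per sentence
response = client.analyze_sentiment(request={"document": document})

# Bucket the per-sentence scores (-1.0 to 1.0) to the nearest tenth
histogram = Counter(round(s.sentiment.score, 1) for s in response.sentences)
for bucket in sorted(histogram):
    print(f"{bucket:+.1f} {'#' * histogram[bucket]}")
```

Each sentence also comes back with a magnitude alongside its score, which is handy if you want to separate strongly worded sentences from neutral filler.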

This is about what I’d expect for a tech talk — the majority of sentences are close to neutral sentiment with slightly more on the positive side. Let’s take a closer look at the most positive sentences according to the NL API:

Based on this, the NL API did a good job picking out the positive sentences. It also looks like I could benefit from a thesaurus search for ‘awesome’. And upon further examination, I used the word ‘so’ 179 times — yikes!
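Both of those checks fall out of the same analyze_sentiment response. Continuing from the sketch above (so transcript_text and response are the variables defined there), something like this prints the most positive sentences and counts the ‘so’s:

```python
import re

# Five most positive sentences, by per-sentence sentiment score
top = sorted(response.sentences, key=lambda s: s.sentiment.score, reverse=True)[:5]
for sentence in top:
    print(f"{sentence.sentiment.score:+.2f}  {sentence.text.content}")

# Case-insensitive, whole-word count of 'so' in the raw transcript
so_count = len(re.findall(r"\bso\b", transcript_text, flags=re.IGNORECASE))
print(f"'so' appeared {so_count} times")
```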

How ‘cool’ was my talk?

Here are the top 15 adjectives I used:

Some of these adjectives are specific to the technology: ‘real’ goes with ‘real time’, ‘new’ refers to creating a new Firebase instance, etc. Other adjectives could definitely use some variation: ‘cool’, ‘easy’, and ‘great’.
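Getting those counts is a job for the syntax endpoint. A rough sketch, again assuming the google-cloud-language Python client and the same hypothetical transcript.txt, that tallies tokens tagged as adjectives:

```python
from collections import Counter

from google.cloud import language_v1

client = language_v1.LanguageServiceClient()
document = language_v1.Document(
    content=open("transcript.txt").read(),  # hypothetical path to the captions
    type_=language_v1.Document.Type.PLAIN_TEXT,
)

# analyze_syntax tags every token with a part of speech
response = client.analyze_syntax(request={"document": document})

adjectives = Counter(
    token.text.content.lower()
    for token in response.tokens
    if token.part_of_speech.tag == language_v1.PartOfSpeech.Tag.ADJ
)

for word, count in adjectives.most_common(15):
    print(f"{word}: {count}")
```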

What topics did I cover?

One nifty (thanks thesaurus) thing about the NL API’s entity analysis is that it pulls entities even if they aren’t proper nouns and don’t have a Wikipedia URL. Here are the top 15 entities from my talk:

Just from that short list we get a good overview of what I covered in my talk. It focused on Firebase (‘data’, ‘app’, ‘database’, ‘users’, ‘security rules’). There was also a robot demo that made use of the Cloud Vision API. Imagine if I were storing thousands of talk transcripts in a database — this sort of metadata would be very useful.
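Entity extraction follows the same pattern. Here’s a minimal sketch that lists the most salient entities the API finds, again assuming the Python client and the transcript.txt placeholder:

```python
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()
document = language_v1.Document(
    content=open("transcript.txt").read(),  # hypothetical path to the captions
    type_=language_v1.Document.Type.PLAIN_TEXT,
)

response = client.analyze_entities(request={"document": document})

# Each entity has a type, a salience score, and optional metadata (e.g. a Wikipedia URL)
for entity in sorted(response.entities, key=lambda e: e.salience, reverse=True)[:15]:
    entity_type = language_v1.Entity.Type(entity.type_).name
    print(f"{entity.name} ({entity_type}): salience {entity.salience:.3f}")
```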

The NL API was also able to extract proper noun entities from my talk and find the correct Wikipedia page for each one; there were 30 in total:

Except for one (UID), the API found the correct Wikipedia page associated with each entity. I was particularly impressed that it picked up my separate references to both Taylor Swift (the singer) and Swift (the programming language).
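The Wikipedia links live in each entity’s metadata, so filtering the results down to the linked proper nouns is a short step (this continues from the entity sketch above):

```python
# Entities whose metadata includes a Wikipedia URL are the linked proper nouns
linked = [e for e in response.entities if "wikipedia_url" in e.metadata]
print(f"{len(linked)} entities with Wikipedia pages")
for entity in linked:
    print(f"{entity.name}: {entity.metadata['wikipedia_url']}")
```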

What’s Next (literally)

Though slightly embarrassing, this NLP analysis was useful in figuring out how I can improve as a speaker. To run this analysis on your talks, grab the transcript from YouTube and check out the NL API quickstart guide.

If you want to learn more about the NL API and other parts of Google Cloud Platform in person, our Google Next conference is coming up in San Francisco this March. Here are the talks I’m most excited about:

I’ll also be speaking on Google Cloud’s machine learning APIs. Say hi if you’ll be there (even if you just want to count the number of times I use the word ‘so’).

Have thoughts on this NL analysis or questions about Google Next? Find me on Twitter @SRobTweets.

