By Mohamed Chaouchi and Anasse Bari
The three presidential debates of the 2016 election year were of historic importance. This election campaign is likely to shape the country’s political discourse to an unprecedented degree. The stakes are high and the candidates unique.
For the very first time in US history, a major party nominated a woman as its presidential candidate: Hillary Clinton, who has been in politics all her life as a first lady, a senator from New York, and Secretary of State. Her resume is so weighty, and her experience so extensive, that we wonder why she isn’t President already. On the opposing ticket is the billionaire real-estate mogul Donald Trump, a heavyweight candidate in his own right. He defeated all the mainstream, party-backed candidates single-handedly and by a landslide.
So we tuned in, as citizens and as data scientists. We followed all the debates, prepared our computer programs to collect tweets, analyzed the transcripts, and kept an eye on Google Trends during each debate. The goal was to analyze the data and, we hoped, find some nuggets we could share with our readers. Here is what we found.
First, a few words on our method. We collected tweets related to each debate in near real time, starting one hour before the debate and continuing throughout it. We ran sentiment analysis algorithms on the tweets to learn more about the mood of the electorate. We also analyzed what the candidates said, using the full transcripts of their responses. It turned out that “words matter,” as Hillary Clinton put it. One caveat: as Nate Silver and many data scientists before him are fond of pointing out, data is noisy. The small sample we collected was no exception, and some of the tools we used are still not mature enough.
For example, in this small experiment, we fed each candidate’s responses from each debate transcript to the IBM Watson personality analysis application, and to our surprise, the results were almost identical. The same was true when we combined each candidate’s responses from all three debates and ran the personality analysis on the combined text. Both candidates were judged similarly; no significant differences emerged between Donald Trump and Hillary Clinton. We know this to be false because the personalities of the two candidates could not be more different. The sentiment analysis on those transcripts did not get us any further, either.
The sentiment analysis on the tweets showed that the percentages of tweets with positive, neutral, and negative mood were unchanged for Trump across the three debates. For Clinton, the percentage of tweets with positive mood increased substantially during the last debate, while the percentage with negative mood decreased markedly.
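As a rough illustration of how such mood percentages can be computed, here is a minimal Python sketch. It is not the algorithm we ran: the two word lists, the `label` and `mood_shares` helpers, and the sample tweets are all hypothetical, and a real sentiment model would be considerably more sophisticated than a word lookup.

```python
import re
from collections import Counter

# Tiny illustrative lexicons (assumptions for this sketch only).
POSITIVE = {"great", "strong", "win", "love"}
NEGATIVE = {"bad", "weak", "lose", "hate"}

def label(tweet):
    """Label a tweet positive, negative, or neutral by counting lexicon hits."""
    words = set(re.findall(r"[a-z]+", tweet.lower()))
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def mood_shares(tweets):
    """Return the percentage of tweets carrying each mood."""
    counts = Counter(label(t) for t in tweets)
    total = len(tweets) or 1  # avoid dividing by zero on an empty sample
    return {mood: 100.0 * counts[mood] / total
            for mood in ("positive", "neutral", "negative")}

# Made-up tweets, not data from our collection.
sample_tweets = [
    "Great debate, strong answers",
    "That was a bad night",
    "Watching the debate now",
    "Love it",
]
print(mood_shares(sample_tweets))
```

Running the same function on the tweets collected for each debate, once per candidate, would give the kind of debate-by-debate comparison described above.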
A simple word count on the candidates’ transcripts for the three debates revealed a few things of interest, and Google Trends showed us something of value as well. Hillary Clinton’s three most frequently used words in the debates were “people”, “think”, and “Donald”, while Donald Trump’s were “going”, “people”, and “country”.
Clinton greeted Trump at the beginning of the first debate by saying: “How are you, Donald?” The third most frequently used word by Clinton, across all three debates, was “Donald”, while any reference to Clinton, either by name or by political title, did not register among Trump’s most frequent words. We can only venture that Clinton deliberately addressed Trump by his first name, not for familiarity’s sake, but to show disdain and to deny him an asset he is so proud of: his last name.
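The word count mentioned above takes only a few lines of Python. This is a minimal sketch rather than our exact script: the stop-word list, the `top_words` helper, and the sample sentence are assumptions made for illustration, and the choice of which common words to exclude will change the ranking.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption; a fuller list would be used in practice).
STOP_WORDS = {
    "the", "a", "an", "and", "to", "of", "in", "that", "is", "it", "i",
    "we", "you", "are", "was", "for", "on", "have", "be", "with", "this",
    "but", "not", "they", "he", "she", "so", "what", "our", "do", "at", "as",
}

def top_words(text, n=3):
    """Return the n most frequent words in text, ignoring stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

# Made-up sentence, not a quotation from the transcripts.
sample = ("People want jobs. I think people know that. "
          "Donald, I think people deserve better, Donald.")
print(top_words(sample))
```

Feeding each candidate’s combined responses from all three debates into `top_words` would produce a ranking like the one reported above.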
The Wall Street Journal published an article detailing 13 topics it would be monitoring during the debates. For each topic, it listed relevant associated terms. We used that list to analyze the transcript of each debate. The following graph summarizes our findings:
It is worth noting that Foreign Policy was prominent in all three debates, with the Economy a distant second. Immigration was widely covered during the last debate. Equally important, many topics were not at the center of any debate, notably Environment and Education.
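The topic tally behind these observations amounts to counting, for each topic, how often its associated terms appear in a transcript. Here is a minimal Python sketch of that idea. The `TOPIC_TERMS` mapping is hypothetical and shows only three topics with invented terms; it does not reproduce the Wall Street Journal’s list, and the `topic_counts` helper and sample text are likewise for illustration only.

```python
import re
from collections import Counter

# Hypothetical topic-to-terms mapping (not the Wall Street Journal's actual list).
TOPIC_TERMS = {
    "Foreign Policy": ["isis", "russia", "iran", "nato"],
    "Economy": ["jobs", "taxes", "trade"],
    "Immigration": ["border", "wall", "immigration"],
}

def topic_counts(transcript, topic_terms=TOPIC_TERMS):
    """Count how many times each topic's terms occur in the transcript."""
    words = Counter(re.findall(r"[a-z]+", transcript.lower()))
    return {topic: sum(words[term] for term in terms)
            for topic, terms in topic_terms.items()}

# Made-up text, not a quotation from the transcripts.
sample = "We need jobs and fair trade. Russia and Iran are watching. Jobs matter."
print(topic_counts(sample))
```

This sketch matches single words only; multi-word terms would need phrase matching instead of a per-word lookup.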
During the last debate, Google Trends showed a huge spike in queries related to Donald Trump at exactly 10:16 pm. At that moment, Hillary Clinton asked the American people to Google Trump’s support for the Iraq invasion, and they flocked to answer that call.
During the last debate, the word “bigly” was trending, well, bigly on Google. It remains to be seen whether this is a fad or whether the word will make it into our daily lexicon.
Lastly, while taking snapshots of Google Trends during each debate, we were surprised to find that people across the United States were searching for Hillary Clinton and topics associated with her, while the rest of the world, namely Canada, South America, Europe, and Asia, was searching for topics related to Donald Trump. The world may be preoccupied by, or may just want to know more about, Trump, while here at home, perhaps people are getting ready to embrace the next President of the United States of America.
About the Authors:
Mohamed Chaouchi has conducted extensive research using predictive analytics and data mining in both the health and financial domains. Mohamed holds a patent for a data-mining platform to analyze cancer development. Mohamed has over 15 years of experience in software development and project management in the public and private sectors. He is an application architect, responsible for building software applications with high business impact and visibility. His technical expertise includes service-oriented architecture, web services, and application security. Mohamed holds a master’s degree in computer science from the George Washington University. Connect with him on LinkedIn.
Dr. Anasse Bari holds a Ph.D. in Computer Science with a focus on Data Mining. Dr. Bari is a clinical assistant professor of computer science at New York University, Courant Institute of Mathematical Sciences. He was previously a professor of computer science at George Washington University, where he received the computer science professor of the year award in 2014 and was recognized by the Carnegie Foundation for the Advancement of Teaching for his nomination for the United States Professor of the Year Award in 2015. Anasse is a renowned speaker, and his research focuses on predictive analytics, data mining, and information retrieval. Connect with him on LinkedIn.
Chaouchi and Bari co-authored Predictive Analytics For Dummies, the second edition of which is coming out on November 7, 2016.