Dr. Sid J Reddy is chief scientist at Conversica and will be speaking at AI with the Best about creating an automated sales assistant that engages in genuine human conversation! I thought I’d ask him for his insights into the world of machine learning, natural language processing and our AI-powered future!

Sid and the Conversica platform

What is machine learning?

I’m always eager to get new explanations of machine learning, as people in different fields and with different expertise often explain it from different perspectives! Here is Sid’s explanation for us:

“Machine learning can be seen as a more advanced version of the principle of mathematical induction that we are familiar with from our school days: we observe the values of a function for the variables we have encountered (in machine learning parlance, training data), use the intuition and experience that come with solving it (in the machine learning world, decision trees and other algorithms that try various combinations and develop an “intuition” of what works and what does not), and finally come up with a formula that predicts the values for upcoming variables (a model, taught by the training data, that is fed features or attributes from new data, also called test data or production data, to predict values).

Folks from statistics, biomedical and social sciences might be able to relate to the similarity between machine learning and statistics. For example, descriptive statistics, which is used to describe a population, is similar to unsupervised machine learning algorithms such as “K-means”, which is used for clustering. Inferential statistics, which is used to make inferences and predictions, is similar to regression algorithms, which predict the value of a numeric variable, and to classification algorithms, which predict the class label of a categorical variable.”
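
To make the comparison concrete, here is a minimal pure-Python sketch on small made-up numbers. `fit_line` learns a formula from training data and applies it to an unseen input; it uses ordinary least squares, swapped in as a simpler stand-in for the decision trees Sid mentions. `kmeans_1d` clusters unlabeled numbers with the standard k-means loop. Both function names are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Supervised regression: fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Raises ZeroDivisionError if every x is identical (no slope can be learned).
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def kmeans_1d(points, centroids, iterations=10):
    """Unsupervised clustering: plain k-means (Lloyd's algorithm) on 1-D data."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty).
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


# Training data: observed values of an unknown function, here f(x) = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope * 10 + intercept)  # prediction for an unseen ("test") input: 21.0

# Unlabeled data with two obvious groups; the centroids settle at 2 and 11.
centroids, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], centroids=[0, 5])
print(centroids, clusters)
```

The regression half needs labeled examples (the observed y values), while k-means only ever sees the raw points, which mirrors the split between inferential and descriptive statistics in the quote.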

When asked for his favourite example of a well-thought-out machine learning system, Sid pointed to the Turing machine:

“I am always fascinated by the Turing machine, which was invented in 1936 but is still fundamentally relevant as a model of computation and bears a close resemblance to many modern artificial intelligence systems. This reminds me to always think about the generalizability of the systems we develop, as opposed to focusing on the short term.” — Dr. Sid J Reddy
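
For readers who haven't met one, a Turing machine is just a tape of symbols, a read/write head and a table of rules of the form (state, symbol) → (symbol to write, move, next state). The minimal simulator below is an illustrative sketch; the function name `run_turing_machine` and the binary-increment rule table are hypothetical examples, not anything from the interview.

```python
def run_turing_machine(tape, rules, state, head, halt_state="halt", blank="_", max_steps=10_000):
    """Simulate a single-tape Turing machine and return the final tape contents."""
    cells = dict(enumerate(tape))  # sparse tape: position -> symbol
    steps = 0
    while state != halt_state:
        if steps >= max_steps:
            raise RuntimeError("machine did not halt within max_steps")
        symbol = cells.get(head, blank)
        write, move, state = rules[(state, symbol)]  # KeyError means no rule applies
        cells[head] = write
        head += 1 if move == "R" else -1
        steps += 1
    # The head moves one cell at a time, so the visited positions are contiguous.
    return "".join(cells[i] for i in range(min(cells), max(cells) + 1)).strip(blank)


# Rules for adding 1 to a binary number, starting with the head on the rightmost bit.
INCREMENT = {
    ("carry", "1"): ("0", "L", "carry"),  # 1 + carry = 0, keep carrying left
    ("carry", "0"): ("1", "L", "halt"),   # 0 + carry = 1, done
    ("carry", "_"): ("1", "L", "halt"),   # ran off the left edge: new leading 1
}

print(run_turing_machine("1011", INCREMENT, state="carry", head=3))  # 1100
```

The same loop runs any rule table, which is the generality Sid points to: the machine is fixed, and only the table changes.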

What do you need to get into machine learning?

People in the field of machine learning and AI are the best to ask about what skills and concepts are most important to learn — they’d know what they actually do each day! So I asked Sid what he thought people needed to know.

“One would need good mentoring, well-defined real world problems and passion for lifelong learning.” — Dr. Sid J Reddy

He then pointed out that there are three career tracks for most college-educated people: data scientist, data engineer and data analyst. Here is what Sid says each role needs:

  • Data Scientist — requires a background in probability, statistics, linear algebra, algorithms and a curious mind.
  • Data Engineer — requires solid technical and programming skills and a passion to keep up with constantly evolving big data technologies such as Hadoop and Spark.
  • Data Analyst — requires a mix of Extract, Transform and Load skills using SQL and basic scripting, combined with knowledge of using AI systems set up for them and an incredible amount of patience.
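
As a rough illustration of the Extract, Transform and Load work mentioned for data analysts, here is a minimal sketch using only Python's standard library (`csv` and `sqlite3`). The lead records, column names and function names are made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with inconsistent whitespace and casing.
RAW_CSV = """lead_id,email,replied
1, Alice@Example.com ,yes
2,bob@example.com,no
3, carol@example.com,YES
"""


def extract(text):
    """Extract: parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast types and normalize whitespace and casing."""
    return [
        (int(r["lead_id"]), r["email"].strip().lower(), r["replied"].strip().lower() == "yes")
        for r in rows
    ]


def load(records, conn):
    """Load: write the cleaned records into a SQL table (booleans are stored as 0/1)."""
    conn.execute("CREATE TABLE leads (lead_id INTEGER PRIMARY KEY, email TEXT, replied INTEGER)")
    conn.executemany("INSERT INTO leads VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
replied = conn.execute("SELECT email FROM leads WHERE replied = 1 ORDER BY lead_id").fetchall()
print(replied)  # [('alice@example.com',), ('carol@example.com',)]
```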

If you’re keen to get into any of those areas, there are some skills you’ll want to work on! Sid also points out that people whose skills cross over between the sets above are often the star performers:

“Most people don’t think of machine learning and NLP in terms of the three broad sets of skills that I described above. One thing that I have realized personally, and have also witnessed, is that those who have a combination of two of the three skill sets and attitudes become outstanding data scientists, and those who have all three skill sets and attitudes tend to be the exceptional data scientists and go-to star performers.” — Dr. Sid J Reddy

Any fears about the exponential growth of AI and the Singularity?

Of course, when speaking with people who work in the field of AI, it’s always good to see what they think about its exponential growth and the Singularity, which some believe will send us rocketing towards extinction while others believe it will bring humanity to new heights. Here are Sid’s thoughts:

“AI technology in its current form is limited to weak or narrow AI, meaning that it can only specialize in one area and perform singular tasks really well. There is no doubt AI technology is rapidly advancing and that reaching the singularity is a real possibility. However, I don’t think that it’s necessarily a bad thing. As with any new technology, I believe it is important to discuss ethical considerations and establish standards for best practices.”

“On a philosophical and optimistic note, my hope is that AI will gradually free us, work and earn for us (reverse taxes), care for us, and we will have an increased opportunity to pursue what matters most to us as individuals.” — Dr. Sid J Reddy

How does Natural Language Processing (NLP) work?

It’s amazing to think about how much is involved in every single sentence and phrase we speak. Natural Language Processing (NLP) needs to automatically extract information from a whole range of different sources and from different levels of language, including documents, words, grammar, meaning and context. Written language processing (also known as “text mining”) can mean working through lots and lots of text-based sources! Sid explained a bit about what’s necessary at each level:


Documents

We start with documents and split them into individual sentences. Sid points out that there are different methods for splitting text into sentences: “simple rules such as [those] based on punctuation marks and capital letters could help in achieving a reasonable accuracy, [but] more accurate methods employ statistical techniques.”
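
A minimal sketch of the rule-based approach, assuming English text in which a sentence ends with `.`, `!` or `?` followed by whitespace and a capital letter. The small abbreviation list is a hypothetical stand-in for the kind of exception handling rule-based splitters need, and it hints at why statistical methods, which learn such boundaries from data, tend to be more accurate.

```python
import re

# Candidate boundary: whitespace preceded by . ! or ? and followed by a capital letter.
BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")
# Hypothetical exception list: a period after these tokens is not a sentence end.
ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "Ms.", "Prof."}


def split_sentences(text, abbreviations=ABBREVIATIONS):
    sentences = []
    for piece in BOUNDARY.split(text.strip()):
        if not piece:
            continue
        # Undo a split that happened right after a known abbreviation.
        if sentences and sentences[-1].split()[-1] in abbreviations:
            sentences[-1] += " " + piece
        else:
            sentences.append(piece)
    return sentences


print(split_sentences("Dr. Smith arrived. He was late! Was he tired? Yes."))
# ['Dr. Smith arrived.', 'He was late!', 'Was he tired?', 'Yes.']
```

Any abbreviation missing from the list ("e.g. Paris") or a sentence starting in lower case would still be split wrongly, which is exactly the gap a trained model is meant to close.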


Words

The splitting of sentences into individual words is known as “tokenization”. Sid says this has commonly been done using rules, but machine learning techniques have recently started to be used as well. Each word can also be reduced to its base form, a process known as “lemmatization”. Sid explains, “The base form of the word, also known as lemma, is independent of its part of speech or a morphological variation. Stem, usually used as an approximation for lemma, is that part of the word which is common to all inflected forms. For example, from “deduced”, the lemma is “deduce”, but the stem is “deduc”, because there are words such as “deduction”.” A lot of this can be done automatically using methods such as the “Porter Stemmer algorithm” to “strip the suffixes and normalize the inflections automatically.”
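
The three steps in this paragraph can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the Porter stemmer: the suffix list is a hypothetical, much-reduced rule set, and the lemma table stands in for the dictionary a real lemmatizer would consult. It reproduces the “deduced” example, where the lemma is “deduce” while the stem is “deduc”.

```python
import re

# Hypothetical, heavily reduced suffix rules, longest first (the real Porter
# algorithm has several ordered steps with conditions on the remaining stem).
SUFFIXES = ("ations", "ation", "tion", "ing", "ed", "es", "e", "s")
# Tiny stand-in for a lemma dictionary; real lemmatizers also use the part of speech.
LEMMAS = {"deduced": "deduce", "deducing": "deduce", "ate": "eat", "beans": "bean"}


def tokenize(text):
    """Rule-based tokenization: words (with simple contractions), numbers, punctuation."""
    return re.findall(r"[A-Za-z]+(?:'[a-z]+)?|\d+|[^\w\s]", text)


def stem(word, min_stem=3):
    """Strip the first matching suffix if at least `min_stem` letters are left over."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Dictionary lookup; unknown words are returned lower-cased and unchanged."""
    return LEMMAS.get(word.lower(), word.lower())


tokens = tokenize("He deduced it, didn't he?")
print(tokens)  # ['He', 'deduced', 'it', ',', "didn't", 'he', '?']
print([stem(w) for w in ("deduced", "deduction", "deducing")])  # ['deduc', 'deduc', 'deduc']
print(lemmatize("deduced"))  # deduce
```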


Grammar

There are eight “parts of speech” in traditional English grammar: noun, pronoun, verb, adverb, adjective, conjunction, preposition and interjection. Sid explains that a word’s part of speech can be determined using “statistical models that are customized for specific genres of text”.
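
The simplest statistical tagger is a most-frequent-tag baseline: count how often each word carries each tag in hand-labeled text and pick the winner. The sketch below assumes a tiny, made-up training corpus and a simplified tagset (modern tagsets typically add categories such as determiners, `DET`). In this toy setting, “customizing” the model for a genre would simply mean training it on tagged text from that genre. Real taggers also use the surrounding words, which this baseline ignores.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged training sentences.
TRAINING = [
    [("John", "NOUN"), ("will", "VERB"), ("eat", "VERB"), ("the", "DET"), ("beans", "NOUN")],
    [("The", "DET"), ("dog", "NOUN"), ("will", "VERB"), ("run", "VERB")],
    [("Will", "NOUN"), ("ate", "VERB")],
]


def train_unigram_tagger(tagged_sentences):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(tokens, model, default="NOUN"):
    """Tag tokens; unseen words fall back to `default` (an assumption, not a rule)."""
    return [(token, model.get(token.lower(), default)) for token in tokens]


model = train_unigram_tagger(TRAINING)
print(tag(["The", "cat", "will", "eat"], model))
# [('The', 'DET'), ('cat', 'NOUN'), ('will', 'VERB'), ('eat', 'VERB')]
```

Note that “Will” the name would be tagged `VERB` here, because the counts for “will” favour the verb: a context-free model cannot tell the two apart.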

NLP also needs to understand “phrases” in grammar, which are two or more words that function meaningfully together within a sentence. There are different types of these, including noun phrases, verb phrases, adjective phrases, prepositional phrases and more. Sid explains that NLP identifies these word groupings and their structure using “phrase chunking and parsing”.

  • Phrase chunking — Sid says this is also known as “shallow parsing” and is “the process of identifying all the phrases that are not nested”. Sid gives the example of “John Smith will eat the beans.”, which has a noun phrase “John Smith”, a verb phrase “will eat” and a noun phrase “the beans”. NLP needs to be able to identify each of these in every sentence!
  • Parsing — Sid explains this as “the process of determining the complete grammatical structure of a sentence with respect to a given formal grammar”. Basically, it involves working out things like the phrases and which words are the subject or object of a verb. There are different parsers out there; Sid mentions the Stanford parser, which he explains “utilizes phrase structure grammar to represent the output”. In particular, he continues, “In phrase structure grammar, rules are represented by a tree whose nodes are the different phrases in the sentence and the edges indicate relationship between different phrases.” Sid explains that “some other parsers, such as SyntaxNet utilize dependency grammar where a dependency graph is used to represent the output. In a dependency graph, nodes are the different tokens in the sentence and the edges indicate relationship between the individual tokens.”
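
Phrase chunking over part-of-speech tags can be approximated with regular expressions. This minimal sketch assumes the sentence has already been tagged, using a hypothetical simplified tagset that separates auxiliaries (`AUX`) from main verbs, and it uses two made-up patterns: a noun phrase is an optional determiner, any adjectives and one or more nouns; a verb phrase is any auxiliaries followed by verbs. It reproduces Sid’s “John Smith will eat the beans.” example. Each match is a flat, non-overlapping span, so the output never contains nested phrases, which is what separates chunking from full parsing.

```python
import re

# One character per tag so that phrase patterns can be written as regular expressions.
TAG_CODES = {"DET": "D", "ADJ": "J", "NOUN": "N", "AUX": "A", "VERB": "V"}
CHUNK_PATTERN = re.compile(r"(?P<NP>D?J*N+)|(?P<VP>A*V+)")


def chunk(tagged):
    """Return flat (phrase_type, text) chunks from a list of (word, tag) pairs."""
    codes = "".join(TAG_CODES.get(tag, "O") for _, tag in tagged)  # O = any other tag
    return [
        (match.lastgroup, " ".join(word for word, _ in tagged[match.start():match.end()]))
        for match in CHUNK_PATTERN.finditer(codes)
    ]


sentence = [("John", "NOUN"), ("Smith", "NOUN"), ("will", "AUX"), ("eat", "VERB"),
            ("the", "DET"), ("beans", "NOUN"), (".", "PUNCT")]
print(chunk(sentence))
# [('NP', 'John Smith'), ('VP', 'will eat'), ('NP', 'the beans')]
```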


Meaning

When it comes to the level of semantics and meaning in our language, Sid says that we’ve traditionally used dictionaries or lists of words, and that there’s a newer method called Distributional Semantics that “is helpful to automatically infer the similarity of words using unannotated text data.”
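
The core idea of distributional semantics, that words appearing in similar contexts have similar meanings, can be shown with raw co-occurrence counts and cosine similarity. The sketch assumes a made-up three-sentence corpus and a context window of two words on each side; real systems use far larger unannotated corpora and usually reweight or compress the counts.

```python
import math
from collections import Counter

CORPUS = [
    "the patient took aspirin daily",
    "the patient took ibuprofen daily",
    "the nurse checked the chart",
]


def context_vectors(sentences, window=2):
    """Count, for every word, the words that appear within `window` positions of it."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            counts = vectors.setdefault(word, Counter())
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    counts[tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


vectors = context_vectors(CORPUS)
print(round(cosine(vectors["aspirin"], vectors["ibuprofen"]), 3))  # 1.0
print(round(cosine(vectors["aspirin"], vectors["chart"]), 3))      # 0.0
```

No one told the program that aspirin and ibuprofen are both drugs; the similarity comes entirely from the unannotated text around them.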


Context

Sid says that the level of pragmatics, or context, can be worked out from cues such as section names.
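
As a small illustration of section names acting as context, consider a clinical-style note: a mention of “diabetes” under a family history heading describes a relative, not the patient. The sketch assumes a hypothetical note format in which headings are upper-case words followed by a colon, and it simply attaches the most recent heading to every line of text. The note itself is made up.

```python
import re

# Hypothetical heading format: upper-case words followed by a colon, e.g. "MEDICATIONS:".
HEADING = re.compile(r"^([A-Z][A-Z ]+):\s*(.*)$")


def lines_with_section(note):
    """Pair each non-empty line of text with the heading it appears under."""
    section, results = None, []
    for line in note.splitlines():
        line = line.strip()
        match = HEADING.match(line)
        if match:
            section, line = match.group(1), match.group(2)
        if line:
            results.append((section, line))
    return results


NOTE = """FAMILY HISTORY: Mother had diabetes.
MEDICATIONS:
Metformin 500 mg daily.
"""
print(lines_with_section(NOTE))
# [('FAMILY HISTORY', 'Mother had diabetes.'), ('MEDICATIONS', 'Metformin 500 mg daily.')]
```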

Challenges today with NLP

Sid explains that “in the current millennium, extensive use of computers and the internet caused an exponential increase in information”. With all this information, “few research areas are as important as information extraction, which primarily involves extracting concepts and the relations between them from free text”. However, there are limitations that make information extraction a challenge at times:

“Limitations in the size of training data, lack of lexicons and lack of relationship patterns are major factors for poor performance in information extraction. This is because the training data cannot possibly contain all concepts and their synonyms; and it contains only limited examples of relationship patterns between concepts. Creating training data, lexicons and relationship patterns is expensive, especially in the biomedical domain (including clinical notes) because of the depth of domain knowledge required of the curators.”

Sid also points out that dictionary data alone just isn’t enough to identify concepts, because of how descriptive human language is!

“Dictionary-based approaches for concept extraction in this domain are not sufficient to effectively overcome the complexities that arise because of the descriptive nature of human languages. For example, there is a relatively higher amount of abbreviations (not all of them present in lexicons) compared to everyday English text.”
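
The limitation is easy to reproduce with a naive dictionary-based tagger. In this sketch the two-entry lexicon, its labels and the example sentences are hypothetical. The tagger does a greedy longest-match lookup: it finds both concepts when they are spelled out, and finds nothing when the same facts are written with common clinical abbreviations (“MI” for myocardial infarction, “ASA” for aspirin).

```python
import re

# Hypothetical lexicon mapping token sequences to concept types.
LEXICON = {("myocardial", "infarction"): "PROBLEM", ("aspirin",): "TREATMENT"}
MAX_LEN = max(len(entry) for entry in LEXICON)


def dictionary_tag(text):
    """Greedy longest-match lookup of lexicon entries in lower-cased word tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in LEXICON:
                found.append((" ".join(span), LEXICON[span]))
                i += n
                break
        else:  # no lexicon entry starts at this token
            i += 1
    return found


print(dictionary_tag("History of myocardial infarction, given aspirin."))
# [('myocardial infarction', 'PROBLEM'), ('aspirin', 'TREATMENT')]
print(dictionary_tag("Pt w/ hx of MI, given ASA."))  # []
```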

Sid’s NLP approach!

During his PhD, Sid proposed a new NLP approach that used “distributional semantics and sentence simplification”. As he explains,

“Distributional semantics is an emerging field arising from the notion that the meaning or semantics of a piece of text (discourse) depends on the distribution of the elements of that discourse in relation to its surroundings. Distributional information from large unlabeled data is used to automatically create lexicons for the concepts to be tagged, clusters of contextually similar words, and thesauri of distributionally similar words. These automatically generated lexical resources are shown to be more useful than manually created lexicons for extracting concepts from both literature and narratives. Further, machine learning features based on distributional semantics are shown to improve the accuracy of state of the art systems, and could be used in other machine learning systems.”
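
One way distributional information can become a machine learning feature is to give each token the identifier of the word cluster it belongs to. A word never seen in the labeled training data can then still share features with contextually similar words that were. The sketch assumes the clusters have already been computed from unlabeled text; the cluster map, its IDs and the feature names are hypothetical. The one-dict-per-token output is a common input format for sequence labelers such as CRFs.

```python
# Hypothetical clusters induced beforehand from unlabeled text.
CLUSTERS = {"aspirin": "C7", "ibuprofen": "C7", "acetaminophen": "C7", "took": "C2", "given": "C2"}


def token_features(tokens, i, clusters=CLUSTERS):
    """Surface features plus distributional (cluster) features for tokens[i]."""
    word = tokens[i]
    features = {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "suffix3": word[-3:].lower(),
        "cluster": clusters.get(word.lower(), "UNK"),  # distributional feature
    }
    if i > 0:
        features["prev.cluster"] = clusters.get(tokens[i - 1].lower(), "UNK")
    return features


seen = token_features(["took", "aspirin"], 1)
unseen = token_features(["took", "ibuprofen"], 1)
print(seen["cluster"] == unseen["cluster"], seen["prev.cluster"])  # True C2
```

If only “aspirin” was labeled as a drug in training, a model that has learned to trust the `cluster` feature can still pick out “ibuprofen”, even though their surface features differ.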

“Dictionary-based approaches are not sufficient to effectively overcome the complexities that arise because of the descriptive nature of human languages. Supervised machine learning based approaches offered a promising alternative. All supervised machine learning algorithms such as Conditional Random Fields require a training set labeled with concepts. Since such methods are statistical, a large corpus with as many relevant examples as possible yields an accurate system. My PhD work used unannotated data from large amounts of texts to design a semi-supervised machine learning approach for the purpose of extracting concepts. I constructed a vector-based similarity model using Random Indexing which is much faster than previous methods and is thus scalable to huge unannotated corpora and will promote widespread use of unannotated data for the task of clinical concept extraction.

I also worked on a limitation in association extraction to deal with sentences that are significantly more complex than those in general. My approach to minimizing the limitation in association extraction was simplifying sentences and then using the simpler sentences for extraction of associations between concepts and thus improving the overall accuracy.”
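
Random Indexing, which Sid mentions, avoids building a large word-by-word co-occurrence matrix. Every word gets a fixed, sparse random “index vector” with a few +1 and -1 entries, and a word’s context vector is the running sum of the index vectors of its neighbours. Because the dimensionality is fixed up front, new text can be added incrementally. The minimal sketch below follows that general recipe. The dimensionality, number of non-zero entries, window size and toy corpus are arbitrary assumptions, and this is not the implementation from Sid’s thesis.

```python
import math
import random
from collections import defaultdict

CORPUS = [
    "the patient took aspirin daily",
    "the patient took ibuprofen daily",
    "the nurse checked the chart",
]


def index_vector(rng, dim, nonzeros):
    """Sparse ternary random vector stored as {position: +1 or -1}."""
    return {pos: rng.choice((1, -1)) for pos in rng.sample(range(dim), nonzeros)}


def random_indexing(sentences, dim=512, nonzeros=8, window=2, seed=0):
    """Accumulate context vectors by summing the index vectors of neighbouring words."""
    rng = random.Random(seed)
    index = {}
    context = defaultdict(lambda: defaultdict(float))
    for sentence in sentences:
        tokens = sentence.lower().split()
        for token in tokens:
            if token not in index:  # each new word gets its index vector once
                index[token] = index_vector(rng, dim, nonzeros)
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    for pos, value in index[tokens[j]].items():
                        context[word][pos] += value
    return context


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


vectors = random_indexing(CORPUS)
print(round(cosine(vectors["aspirin"], vectors["ibuprofen"]), 3))  # 1.0 (identical contexts)
print(cosine(vectors["aspirin"], vectors["chart"]))  # close to 0: no shared context words
```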


Today, Sid spends his time as chief scientist at Conversica, a company that provides a cloud-based conversational artificial intelligence platform. Their flagship product is the Conversica AI Sales Assistant, which Sid says took about 7 years to develop and over 240 million messages to train. Sid explains that “the assistant improves sales efficiency by automating the initial, key stages of the sales cycle. Through personalized, two-way email or SMS conversations, the sales assistant automatically engages and qualifies leads, then expertly hands those leads off to a salesperson to close the deal – and even follows up afterwards to ensure a great experience.” Overall, the goal was to “help sales teams set more appointments from a pool of leads and ultimately close more business in less time”.

What does a “chief scientist” do at a place like Conversica? Sid explains,

“My role here as a Chief Scientist is to constantly make sure we are using and beating the state of the art in natural language processing, classification and generation, and to help Conversica become the category leader in applying artificial intelligence to business conversations.” — Dr. Sid J Reddy

AI with the Best

Sid will be speaking at AI with the Best, the world’s biggest online AI conference for developers! Attend from anywhere in the world! He’ll be covering Conversica’s “artificial intelligence approach to creating, deploying, and continuously improving, an automated sales assistant that engages in a genuinely human conversation, at scale, with every one of an organization’s sales leads”. The talk will also cover the benefits of these systems, including potentially hiring more people, not fewer, and explore why the assistant has been successful so far in improving sales, marketing and business intelligence gathering. Sounds like a pretty fascinating talk!

Thanks to Sid for his time and for sharing some of his knowledge around AI, machine learning and Natural Language Processing! Get yourself a ticket for AI with the Best and see his talk!

Thanks for reading! Dev Diner is a new hub for developers keen to keep up with emerging tech.