
Realities of AI in Software from [:] futureDev Meetup, Part I

Peter Farago, Friday July 21, 2017

Rainforest QA recently held its latest installment of [:] futureDev, a San Francisco Bay Area meetup that gathers software development and other leaders to discuss top issues in technology.

This meeting’s topic was "Realities & Challenges of AI in Software" and for good reason. Everywhere you turn, it seems like yet another company boasts how AI is the new way forward. But with so much hype, few are sharing the challenges, realities and limitations of AI in software development and startups. We assembled a panel of AI experts to help cut through what’s working and what isn’t as they apply their expertise across their respective companies.


The panel boasted a lineup of deep AI experts, who are all actively working in industry: Michael Kim PhD, CTO & Cofounder of Outlier AI; Jake Klamka, CEO & Cofounder of Insight; Alex Jaimes PhD, Head of R&D for DigitalOcean; Clare Corthell, Data Product Manager for Clover Health; and Maciej Gryka PhD, Head of Data Science at Rainforest QA.

VentureBeat participated as a media partner and the panel was moderated by Blair Hanley Frank, AI and Cloud Reporter for the technology news outlet.

AI: What’s in a Name?

This may seem like common sense, but AI means different things to different people. Because AI is so hyped today, with many companies claiming investments in the field, definitions vary widely. The panel began by discussing what AI means to them. And even though our panelists are all deep experts, both academically and through practical application, their interpretations differed.

Jake Klamka, CEO of Insight, explained that "AI is a broad field with a lot of subsystems." And even though there exists a more purist definition around "the very new developments in neural networks, deep learning, natural language and vision," in fact, "very few teams work on this." He concluded that the broader application is about "data teams that use machine learning to drive business and product results."

Maciej Gryka, Head of Data Science for Rainforest QA, agreed that some of the confusion in how to define AI stems from the fact that it's "an old field that has been popularized by a recent explosion of activity." For him, the definition is simpler: "It's just about machines making decisions. That's the heuristic I use. There may be simple ways of machines making decisions that some may not actually consider AI. Neural nets definitely fall into AI, where more complex decisions are being made and a lot of things are taken into account – 'Random Forest' is a good example." But for Maciej, even if a data science effort includes only "a simple regression or if-then statement, it can be understood as AI in some context."
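Maciej's heuristic can be made concrete with a toy sketch: a single hand-written if-then rule and an ensemble of such rules (a crude stand-in for a Random Forest, not a real one) are both "machines making decisions" – they differ in complexity, not in kind. The metrics and thresholds below are invented for illustration and are not from the panel.

```python
def simple_rule(errors):
    # The simplest form of machine decision-making: one if-then statement.
    return "alert" if errors > 5 else "ok"

def toy_ensemble(errors, latency_ms, retries):
    # A crude ensemble in the spirit of a Random Forest: several weak
    # rules vote, and the majority decides. Many things are taken into
    # account, but it is still just machines making decisions.
    votes = [
        errors > 5,
        latency_ms > 200,
        retries > 3,
    ]
    return "alert" if sum(votes) >= 2 else "ok"

print(simple_rule(7))           # the if-then rule fires on errors alone
print(toy_ensemble(7, 150, 1))  # the ensemble stays quiet: only one vote
```

Whether either end of this spectrum "counts" as AI is exactly the definitional question the panel debated.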


Which Comes First: The Data or the Scientist?

As companies look to take advantage of AI, where should they start? Views ranged from "it’s never too early to hire a data scientist" to cautionary tales of waiting until well after a critical mass of data has been collected. Since AI depends on machines making decisions based on data – ideally lots of it – gathering and understanding that data is a key step. Clare Corthell, who consulted on data science prior to her current data product manager role at Clover Health, explained that for many companies the “chicken or egg is a common dilemma. Lots of product leaders need help assessing whether the data they already had was useful to the business or could open up a new product.”

Mike Kim, CTO and cofounder of Outlier, shared a "mistake" he had made earlier in his career that he implored others to avoid; namely, hiring data scientists too early. While leading data science at a previous company, anticipating exponential growth in his data set, he hired a ‘world class’ machine learning expert. Unfortunately, he later realized that he would not have enough data to truly leverage that person’s skills for years. His advice was to “make sure you have the scale of data and understand the type of problem you’re trying to solve before hiring a person to build and manage the platform and pipeline.”

Alex Jaimes, DigitalOcean’s Head of R&D, quickly countered. “It’s never too early to get started. It’s more about hiring the right profile.” Addressing the chicken-or-egg dilemma, he acknowledged that “if you don’t have the data, you can’t do too much machine learning,” but added that “if you don’t have the expert in machine learning, you don’t even know what data you have.” Overall, he explained that a top-down strategy needs to be considered early in a company’s history, so that decisions about collecting the right data, how it can help the business, and who can work with it are made as early as possible.

Mike further added that if a company “is only collecting 2,000 data points a month they don’t need to start staffing PhDs,” as a lot of press stories seem to suggest. Alex elaborated that “the biggest challenge that most early stage startups have is that they don’t understand the value of the data, and the work, beforehand.” Companies often don’t understand the need until they start getting traction, which can be too late. This can be avoided by viewing “data science as an investment, not an expense.”


The Importance of Labeled Data

As the conversation shifted to misconceptions companies have about implementing artificial intelligence and machine learning, the topic of data management emerged. Maciej from Rainforest explained that a lack of understanding about having the right kind of data can “bite people.” Having awareness of just how valuable data is, especially training data, is key. “Most AI these days is about supervised learning,” which is when labeled training data is used to infer a new function through machine learning. “So that means you not only have to have data, but you also have to know what it means. Labeling your data points takes a huge amount of effort.” Maciej pointed to examples where it can take years to manually label data in order to create a useful data set.
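A tiny, hypothetical sketch makes Maciej's point tangible: in supervised learning, the same feature vectors are useless without the labels that say what they mean, and producing those labels is the human effort he describes. The data and the 1-nearest-neighbor rule below are placeholders, not anything discussed on the panel.

```python
def nearest_neighbor(labeled_data, query):
    """Predict a label for `query` from the closest labeled example."""
    def sq_dist(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))
    features, label = min(labeled_data, key=lambda pair: sq_dist(pair[0], query))
    return label

# Each point is (features, label); the label is the expensive part --
# someone had to decide what each feature vector actually means.
labeled = [
    ([0.9, 0.1], "bug"),
    ([0.8, 0.2], "bug"),
    ([0.1, 0.9], "not-bug"),
    ([0.2, 0.8], "not-bug"),
]

print(nearest_neighbor(labeled, [0.85, 0.15]))  # -> bug
```

Strip the labels from `labeled` and the function has nothing to infer from, which is why having data alone is not enough.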

Underscoring Maciej’s point, Mike from Outlier added that “if you have a supervised machine learning problem, 80% of the output is dependent on how good your data is before you start. If you don’t have good data out of the gate, you’re toast. Data trumps everything else that you’ve got. Then it’s how well you tune the parameters of whatever decision system you’ve chosen.”

Maciej further expanded that companies like Google and Facebook have done a great service to the community by releasing massive data sets anyone can use. However, while this can be generally helpful, he cautioned that it can only really help your company “if your problem is similar enough to one of the previous problems” the data was used to solve. “If your problem is novel, then you have to go through that expensive process to gather labeled data.”

Maciej emphasized the importance of data, arguing that “labeled data is the new oil. Everyone wants it. There’s a lot of competition to get it and it’s an expensive resource that is necessary to do just about anything.”

While AI is an emerging field, it’s already being practically applied to many businesses, including Rainforest QA. Understanding the practical challenges and realities, as well as the purpose for mounting an AI effort, is paramount. In the next installment we’ll learn what the panelists think about the future of AI, bias in data sets, and whether robots can in fact replace humans.

To learn more about what our panelists are working on, check out their profiles below:

- Clare Corthell, Data Product Manager, Clover Health
- Michael Kim PhD, CTO & Cofounder, Outlier AI
- Jake Klamka, CEO & Founder, Insight
- Maciej Gryka PhD, Head of Data Science, Rainforest QA
- Alex Jaimes PhD, Head of R&D, DigitalOcean

Filed under: ai, machine learning, and future dev