Checklist 84: Machine learning. Deep learning. Neural networks. Artificial intelligence

April 12, 2018 • 15 min read

Machine learning. Deep learning. Neural networks. Artificial intelligence… it’s starting to feel like we’re in an Isaac Asimov novel! For this week’s episode of The Checklist, we’re diving into the latest ways computers are being trained to complete complex tasks, make inferences, and do more overall. From Google training computers to identify specific objects in images to the way anti-malware software learns to detect and stop new variants, there are plenty of cool applications for this technology. However, machine learning can do some pretty creepy things, too, and it could pose a threat to your privacy and security in the future. On our list for today’s discussion:

What is machine learning?
Using machine learning to fight malware
How this technology enters our daily lives
The creepy side of machine learning
Machine learning in the real world: what the future holds

Let’s kick things off with a quick look at what it means when we talk about computers “learning.” It’s not quite a classroom full of server racks, but it is an incredibly interesting new development in technology.

What is machine learning?

It’s helpful if we start off with a basic definition to work from as we unpack what machine learning is and what it means. As you might imagine, this is a complex field that grows more rapidly each year, and there’s no central authority overseeing research and development. Instead, many teams work on different projects, using similar ideas and frameworks but always chasing that next innovation.

So, let’s look at it this way: machine learning is about creating algorithms that can recognize patterns in new information based on information they have previously encountered. In other words, after “learning” what certain patterns (such as the arrangement of features on a human face) look like generally, the algorithm can interpret new data based on what it already “knows.” Instead of simply creating a program that has a built-in answer for every possible input, we create programs that know how to find the answer from past experience. As an ML algorithm receives more “training,” the accuracy of the answers it returns generally improves.

If it sounds like science fiction, it’s because a few decades ago, it very well might have been. Today, machine learning is growing very sophisticated, and in many contexts it gets lumped together with “artificial intelligence.” That’s a bit of a misnomer, though, as machine learning systems cannot reason about an unfamiliar situation the way a true general intelligence would. Even a game-playing program such as Google DeepMind’s AlphaGo can only improve itself based on the data and practice its creators use to “train” its “brain.” These systems can draw broad, general conclusions from some very specific data, but there’s no need to worry about Skynet going active tomorrow. At least, not yet.

Machine learning encompasses a broad field, though, and not every type of ML algorithm works in the same way. Typically, they break down into one of several categories based on how they learn and their ultimate purpose. Let’s discuss a few of the main types before diving into some of the ways in which we can use ML for very important applications.

The type we’ve just been talking about might best be called a “classification” type of machine learning, where the algorithm learns how to identify pattern data based on a series of control subjects. For example, facial recognition algorithms learn as researchers show them a series of portraits. A shopping suggestion algorithm might learn what purchases or shopping behaviors indicate that a customer is a suitable candidate for a specific discount or account upgrade. This is a very controlled learning experience with specific desired outcomes.
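To make the classification idea a little more concrete, here is a minimal sketch in Python. It uses a nearest-centroid classifier, which is just one simple technique we picked for illustration, not necessarily what any real shopping site uses. The customer features (orders per month, average basket in dollars) and the label names are made up.

```python
from math import dist
from statistics import mean

def train_centroids(examples):
    """Average the feature vectors for each label: one 'typical' point per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }

def classify(centroids, features):
    """Pick the label whose typical point is closest to the new example."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Hypothetical labeled examples: (orders per month, average basket in dollars)
training = [
    ((1, 20), "occasional"),
    ((2, 25), "occasional"),
    ((1, 30), "occasional"),
    ((8, 90), "upgrade_candidate"),
    ((9, 110), "upgrade_candidate"),
    ((10, 100), "upgrade_candidate"),
]

centroids = train_centroids(training)
print(classify(centroids, (7, 95)))  # a customer the program has never seen
print(classify(centroids, (2, 22)))
```

Notice that nothing in the program spells out a rule like “more than five orders means upgrade.” The boundary comes entirely from the labeled examples, which is the “controlled learning experience” described above.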

Clustering is similar but does not begin with a clear control group of data. Instead, algorithms are fed very large data sets and learn to group related items together based on their characteristics, even though the algorithm does not “know” what the actual relationship between the objects is. Neural networks can be applied to both kinds of problems, but instead of a single algorithm, a neural network is a group of “nodes” that work together to learn in concert, almost like an artificial brain.
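Clustering can be sketched in a few lines, too. The example below uses k-means, a common clustering method that we are assuming here purely for illustration. The points are invented, and the starting centroids are simply the first k points so the result is repeatable (real implementations usually start randomly).

```python
from math import dist

def k_means(points, k, iterations=10):
    """Group points into k clusters by repeatedly moving each centroid
    to the average of the points currently nearest to it."""
    centroids = list(points[:k])  # deterministic start, for this sketch only
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: dist(point, centroids[i]))
            clusters[nearest].append(point)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return clusters

# Hypothetical unlabeled data: no one tells the algorithm which group is which
points = [(1, 1), (9, 8), (1, 2), (8, 9), (2, 1), (9, 9)]
clusters = k_means(points, k=2)
for number, members in enumerate(clusters):
    print(number, sorted(members))
```

The algorithm ends up with two groups without ever being told what the groups mean. Deciding that is still up to a human.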

In another form of ML called regression, algorithms learn to make numerical predictions based on historical data. If you’ve ever looked at your home on the real estate website Zillow and seen their “Zestimate” for your home’s value, then you’ve seen a product of machine learning in the wild. There’s also anomaly detection, which is similar to playing a game of “Spot the Difference,” except it’s a computer learning to identify strange deviations in massive data sets. Setting up an anomaly-detection algorithm is a definite priority for the scientists out there searching for extraterrestrial life. It’s obvious that all these different types of learning could allow us to accomplish a lot more than we could before, and they’re already poised to revolutionize some industries, like our very own security sector.
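Here is a rough sketch of both ideas in Python. The regression part fits a straight line with ordinary least squares, and the anomaly part flags values that sit far from the average. Both are simple stand-ins we chose for illustration (we have no idea how Zillow actually computes a Zestimate), and all of the numbers are invented.

```python
from statistics import mean, stdev

def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar

def find_anomalies(values, threshold=2.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) > threshold * sigma]

# Hypothetical past sales: square feet vs. price in thousands of dollars
sizes = [1000, 1500, 2000, 2500]
prices = [200, 300, 400, 500]
slope, intercept = fit_line(sizes, prices)
estimate = slope * 1800 + intercept  # predicted price for an 1,800 sq ft home
print(round(estimate))

# Hypothetical sensor readings with one value that does not belong
readings = [10, 11, 9, 10, 12, 10, 11, 50]
print(find_anomalies(readings))
```

Real systems use many more inputs and far more forgiving statistics, but the shape is the same: learn what “normal” looks like from history, then predict from it or flag what doesn’t fit.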

Using machine learning to fight malware

Machine learning is a major part of the next wave in the development of anti-malware tools, much the same way that signature analysis changed the game many years ago. In the past, we primarily looked for malware by trying to detect its “signature” on a machine — key components that stayed the same or snippets of offending code. By comparing files to a database of known malware signatures in the form of virus definitions, anti-malware programs can easily identify many common strains of malware old and new.
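For a feel of how signature matching works, here is a deliberately simplified sketch. It treats a “signature” as the SHA-256 hash of an entire file, which is our assumption for illustration. Real virus definitions are more flexible and often match snippets or patterns inside a file. The payload bytes and the name “Example.Trojan.A” are made up.

```python
import hashlib

def signature(data: bytes) -> str:
    """Fingerprint a file's contents with SHA-256."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical "virus definitions": known-bad fingerprints mapped to names
known_bad = {
    signature(b"pretend-malicious-payload"): "Example.Trojan.A",
}

def scan(data: bytes):
    """Return the malware name if the file matches a known signature, else None."""
    return known_bad.get(signature(data))

print(scan(b"pretend-malicious-payload"))   # exact match with the database
print(scan(b"pretend-malicious-payload!"))  # one extra byte, no longer matches
```

The second call shows the weakness of exact matching: a tiny change to the file produces a completely different fingerprint, so a trivially modified variant slips past until the definitions are updated.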

Today, this is still one of the most important tools in fighting the bad guys simply because it’s effective enough to catch many of the most common threats. As malware evolves and grows more sophisticated, though, signature matching alone is not enough to keep up, and that’s where machine learning tools become very important. Remember that some ML algorithms can make general conclusions from specific data, and others have the ability to group objects together without knowing their features at the start. That means that with well-trained ML systems and powerful enough hardware, we can start predicting where an exploit in a system could arise or how to identify new malware threats.

First and foremost, though, machine learning can allow us to take a lot of the grunt work out of security analysis. With so many different types of malware, adware, and other threats popping up on the web every day, it would take far too much time and effort for researchers to unpack and probe each one of them. Not only is that usually not necessary (many variants emerge simply to evade previous detection methods but are fundamentally the same), but it would be a drain on resources too. Instituting machine learning to identify, categorize, and classify malware into existing family types can help researchers focus on bigger or novel threats while still enabling users to remain protected from these threats. An anti-malware program might even use machine learning to detect and stop irregular behavior when a user encounters a threat, even if the algorithm doesn’t understand yet that it is a known threat.
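One rough way to picture sorting new variants into existing families is a nearest-neighbor comparison: break each sample into small overlapping chunks of bytes and see which known family it overlaps with most. This is a toy sketch under our own assumptions (the family names, the sample “code,” and the 0.5 cutoff are all invented), not a description of how any particular anti-malware engine works.

```python
def ngrams(data: bytes, n: int = 3) -> set:
    """All overlapping n-byte chunks that appear in the sample."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def jaccard(a: set, b: set) -> float:
    """Share of chunks the two samples have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def nearest_family(sample: bytes, labeled: dict, min_similarity: float = 0.5):
    """Return the most similar known family, or None if nothing is close enough."""
    sample_grams = ngrams(sample)
    best_family, best_score = None, 0.0
    for family, example in labeled.items():
        score = jaccard(sample_grams, ngrams(example))
        if score > best_score:
            best_family, best_score = family, score
    return best_family if best_score >= min_similarity else None

# Hypothetical labeled samples standing in for known malware families
labeled = {
    "FamilyA": b"open socket; download payload; write to disk; run payload",
    "FamilyB": b"read contacts; encrypt documents; show ransom note",
}

variant = b"open socket; download payload; write to disk; run payload v2"
novel = b"completely unrelated bytes 0123456789"

print(nearest_family(variant, labeled))  # lightly modified copy of a known family
print(nearest_family(novel, labeled))    # nothing similar on file
```

A lightly tweaked variant still lands in its original family, while something genuinely new comes back unmatched, which is exactly the kind of sample a human researcher would want to look at first.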

In the latest advancements coming out over the past few years, we’ve even seen efforts to create ML systems that can identify zero-day malware threats and stop them before they do any damage. Think about that: researchers are pioneering ways to stop malware before we ever know the security loophole it’s exploiting even exists. ML can be used to help us identify potential zero-days present in existing code as well, opening a whole range of possibilities for stronger security going into the future.

Of course, there are always going to be bad guys looking for a way around these tactics, and we’ve seen some of these methods used recently. Attempts to fool these defenses focus on either working out what leads the algorithm to draw spurious conclusions and then tricking it with crafted input, or using ML components of their own to probe anti-malware engines for areas where detection might fail. So far, these counterattacks are in their earliest stages and don’t work well. While it often feels like we’re stuck in an arms race with malicious hackers, ML can give us a leg up on the “competition.” Of course, it can do more than that.

How this technology enters our daily lives

Machine learning is awesome for helping us detect malware thrown at us by bad actors, but it’s good for a lot of other things, too. It’s why you’ve probably noticed so much more buzz about “AI” in the media these days. From aiding in cancer diagnosis to driving our cars, machine learning is set to power many of the world’s next big technological revolutions. To understand just how far these concepts can go outside the world of anti-malware crusades, let’s consider a few of the ways these smart machines are inching their way into our daily lives.

Perhaps one of the most memorable appearances of ML concepts — and the one most people would be familiar with — is IBM’s Watson system. Remember the time a computer won a game of Jeopardy! on national TV? Though Watson isn’t entirely based on machine learning (it’s just a component of a much more complex system), it uses many of the concepts to understand human language, parse its content, and return replies based on the input. Today, Watson technology appears in many places that have nothing to do with answering trivia questions. From identifying cancers and assisting in difficult diagnoses and formulating treatment plans to helping tax preparers find every possible deduction, it’s entering many industries. It’s perhaps one of the biggest and best examples of what this tech can do in a mature state.

Self-driving cars are increasingly coming to rely upon ML as well, rather than rely on pre-programmed instructions and behaviors. In one recent experiment, a self-driving car successfully navigated a real-world road course after spending a large amount of time observing controlled situations of humans driving. With the constantly evolving scene on busy roadways, it’s imperative these systems can recognize issues and make changes extremely quickly. Of course, there’s also the question of how we teach self-driving cars the importance of safety regarding human life—but that’s another challenge altogether.

We also see ML at work in systems that develop the ability to play complex games from many sessions or observation periods. AlphaGo could routinely defeat the world’s highest-level Go players, playing in a way that its human opponents found practically impossible to counter. While computers surpassed the best human chess players many years ago, chess is still often used to train ML systems. Some of these algorithms learn strong play from experience rather than from hand-coded strategy. Even pros at Texas hold’em poker, a game with a huge amount of variance and many potential scenarios, have recently fallen in defeat to ML-based software.

This tech also powers the latest in language translation technology, and it even helps Netflix tell you what you might want to watch next. Amazon and others use similar recommendation systems, and for the most part, these personalization efforts don’t cross the line from helpful to creepy. However, it doesn’t take much effort to step into a territory where we might start to wonder about the ethics in play.

The creepy side of machine learning

Wouldn’t it be nice if one of these cool innovative technologies emerged and didn’t bring with it the potential for security problems and invasions of privacy? Unfortunately, we don’t live in a perfect world, and machine learning can potentially be used in nefarious ways for some pretty creepy stuff. It’s not just the bad guys who use it, either; big companies such as Facebook invest heavily in finding ways to apply ML concepts to their websites. Some governments, such as China’s, are also looking for ways to use it for applications such as “reducing crime” or implementing censorship. So, what do you need to know about the dark side of these learning processes? For now, it’s mostly a privacy problem, although as we mentioned earlier, it’s possible that hackers could one day use ML concepts themselves to develop and deploy stronger attacks.

One of the biggest stories in the news recently that has machine learning behind the scenes has to do with Facebook. It hasn’t been a secret for some time that Facebook employs ML in a variety of capacities, especially when it comes to identifying objects and people in photos. For years, uploading a photo to Facebook meant that it automatically detected the faces of those in it, so the uploader could easily and quickly “tag” individuals. Now, Facebook has decided to go a step further — and it’s not just identifying faces, it’s associating those faces with their owners.

In other words, whenever someone uploads a photo on Facebook in which you appear, the company’s ML algorithms identify your face and draw a connection directly to your actual Facebook account. Since it already knows you from your own pictures, the system looks for you wherever else you may appear, then lets you know. The purpose, Facebook says, is to make it easier for you to tag yourself or to know when someone has uploaded pictures of you without your permission. It sounds like a fine reason, but it amounts to intruding on our privacy in the name of protection.

The good news is that you can disable this feature. Log in to your Facebook account, visit the Settings page, and locate the new “Face Recognition” menu. Here, you can direct Facebook not to attempt to recognize your face in other photos. While it is nice that there is an opt-out feature, it still means the company has been scanning millions of photos and storing our likenesses in a digital format for some time. They’ve even been slapped with a lawsuit over it — and the company continues to lobby for laws that would exempt facial data from other biometric protections.

What about the China story we mentioned a moment ago? Recently, motorists stopped at a highway checkpoint in China were greeted by police wearing special glasses that could identify faces or license plates and match them against a list of wanted suspects. ML algorithms underpin this technology. While it was only a small-scale test, it’s easy to imagine a future where such a technology is abused. Similar concerns about large-scale facial recognition exist elsewhere, too.

Machine learning in the real world: what the future holds

So where do we go from here? We’ve heard everyone from billionaire Elon Musk to theoretical physicist Stephen Hawking (may he rest in peace) sound dire warnings about the influence of artificial intelligence on the future, and ML is certainly one of the big stepping stones on the road to developing more advanced AI. It’s hard to predict exactly what we’ll see two, five, or even ten years from now. It may be that we simply come to accept these things as a regular part of daily life rather than seeing them as groundbreaking inventions.

That said, the less savory aspects of what it can do mean we should exercise caution. It’s not fun to talk about rules and regulations, but should Facebook be able to search your identity in someone else’s data by default? What about the risks of ML algorithms that learn how to create text that sounds as if someone else wrote it themselves? Imagine how easily a phisher could fool someone into believing they’d received a legitimate email. Similarly, the potential for manipulation of opinions and conversations on sites like Twitter is very real. As machine learning concepts continue to develop, there may come a time when it’s necessary to put some serious rules in place to ensure nothing runs amok and causes unforeseen, and potentially very impactful, problems.

This field has the potential to revolutionize many areas of our lives. Unfortunately, as we’ve seen, it’s not always for the better. From dystopian facial recognition glasses to Facebook looking for you in every photo that hits their servers, we need to cautiously watch the way these efforts develop over the next few years. Where it goes from here is truly anyone’s guess, but it can’t hurt to hope for the best and plan for the worst.

That concludes our discussion for today. Do you have questions about machine learning, or think you know of a related topic that would be excellent for its own episode? Let us know by sending us an email at Checklist@SecureMac.com — we’re always excited to hear from our listeners, and your feedback is appreciated as well.

Want to check out our previous episodes, like the time we covered how Facebook builds “shadow profiles” of users who don’t have profiles yet? That episode, and every other edition of the show, is always available right here in our archives next to detailed show notes so you can quickly catch up on what you missed.