Hands-on Tutorials

Analysing non-linear relationships with partial dependence plots (PDPs), mutual information and feature importance

When you first start driving you are less experienced and, sometimes, more reckless. As you age, you gain more experience (and sense) and it becomes less likely that you’re involved in an accident. However, this trend won’t continue forever. When you reach old age your eyesight may deteriorate or your reactions may slow. Now, as you age, it becomes more likely that you’re involved in an accident. This means the probability of an accident has a non-linear relationship with age. Finding and incorporating relationships like these can improve the accuracy and interpretation of your models.

Source: Author
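To make the idea concrete, here is a minimal sketch in plain Python of why a linear measure can miss a relationship like this. It assumes a made-up U-shaped risk curve with its lowest point at age 54. The numbers are illustrative, not data from the article. Pearson correlation comes out at roughly zero. A simple histogram-based estimate of mutual information is clearly positive. The choice of six equal-width bins is arbitrary.

```python
import math
from collections import Counter

def pearson(x, y):
    """Pearson correlation: only measures linear association."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def binned(values, n_bins):
    """Assign each value to one of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # Clamp the maximum value into the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information(x, y, n_bins=6):
    """Histogram estimate of mutual information (in nats)."""
    bx, by = binned(x, n_bins), binned(y, n_bins)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    mi = 0.0
    for (i, j), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[i] / n) * (py[j] / n)))
    return mi

ages = list(range(18, 91))
# Hypothetical U-shaped accident risk: lowest at age 54, higher for young and old drivers
risk = [0.01 + 0.00005 * (a - 54) ** 2 for a in ages]

print("Pearson correlation:", pearson(ages, risk))
print("Mutual information:", mutual_information(ages, risk))
```

Binned mutual information depends on the number of bins. Treat the value as a rough signal that the two variables are related, not as a precise measurement.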

In this article, we will…


GETTING STARTED

Creating your first sentiment analysis model with Python

With 11,768,848 comments, ‘Dynamite’ by BTS is the most commented video on YouTube. Suppose a BTS member wanted to know how these commenters felt about the song. Even if he read one comment per second, it would take him over 4 months. Luckily, using machine learning, he could automatically label each comment as positive or negative. This is known as sentiment analysis. Similarly, through online reviews, survey responses and social media posts, businesses have access to large amounts of customer feedback. Sentiment analysis has become essential for analysing and understanding this data.

Source: flaticon
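As a rough illustration of labelling comments automatically, the sketch below trains a tiny multinomial Naive Bayes classifier from scratch. This is one common baseline for sentiment analysis, not necessarily the model the article goes on to build. The eight training comments are invented for the example. A real model would need far more labelled data.

```python
import math
from collections import Counter

# Hypothetical toy training data: (comment, label)
TRAIN = [
    ("i love this song", "positive"),
    ("amazing vocals and great dance", "positive"),
    ("this is so good i love it", "positive"),
    ("best song of the year", "positive"),
    ("i hate this song", "negative"),
    ("boring and annoying", "negative"),
    ("this is bad i hate it", "negative"),
    ("worst song of the year", "negative"),
]

def tokenize(text):
    return text.lower().split()

def train(examples):
    """Count words per label and labels overall."""
    word_counts = {}          # label -> Counter of words
    label_counts = Counter()  # label -> number of comments
    vocab = set()
    for text, label in examples:
        words = tokenize(text)
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return word_counts, label_counts, vocab

def predict(text, word_counts, label_counts, vocab):
    """Return the label with the highest log posterior."""
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)  # add-one (Laplace) smoothing
        score = math.log(n / total)                # log prior
        for w in tokenize(text):
            if w in vocab:  # skip words never seen during training
                score += math.log((counts[w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAIN)
print(predict("love the dance", *model))  # positive
print(predict("boring and bad", *model))  # negative
```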

In this article, we’ll go through the process of building…


Human nature is hard to predict. Even more so when we are trying to be unpredictable.

Source: flaticon

Machine learning can be used to predict if a tumour is benign or malignant. Now, imagine if the tumour was conscious and could change its appearance to avoid detection. This suddenly becomes a much harder problem to solve. A self-aware tumour may be a bit of a contrived analogy for a fraudster, but the problems faced by fraud detection systems are similar. You are trying to predict behaviour that can actively change to avoid prediction. We’ll discuss this idea in more depth, starting with what fraud is and how we can predict it. …


An introduction to manipulating machine learning models

Machine learning models are complicated things and, often, we can have a poor understanding of how they make predictions. This can leave hidden weaknesses that could be exploited by attackers. They could trick the model into making incorrect predictions or give away sensitive information. Fake data could even be used to corrupt models without us knowing. The field of adversarial machine learning aims to address these weaknesses.

Source: flaticon
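An evasion attack is one of the simplest ways to trick a model: nudge an input just enough to change its prediction. The sketch below assumes a hypothetical linear classifier with hand-picked weights. For a linear score, the gradient with respect to the input is the weight vector itself. Shifting every feature by epsilon against the sign of its weight therefore lowers the score as much as possible when no feature may move by more than epsilon. This is the idea behind the fast gradient sign method.

```python
def score(w, b, x):
    """Linear decision score w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    return 1 if score(w, b, x) > 0 else 0

def sign(v):
    return (v > 0) - (v < 0)

def evade(w, x, epsilon):
    """Move each feature by epsilon against the sign of its weight,
    which pushes the score down (towards class 0)."""
    return [xi - epsilon * sign(wi) for wi, xi in zip(w, x)]

# Hypothetical trained weights, bias and input
w = [2.0, -1.0, 0.5]
b = -1.0
x = [1.0, 0.5, 0.4]

x_adv = evade(w, x, epsilon=0.3)
print(predict(w, b, x), predict(w, b, x_adv))  # 1 0
```

Each feature moves by at most 0.3. Even so, the score drops by 0.3 × (2 + 1 + 0.5) = 1.05, from 0.7 to -0.35, which is enough to flip the predicted class.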

In the rest of this article, we’ll explore this field in a bit more depth. We’ll start by discussing the types of attacks that adversarial ML aims to prevent. We’ll then move on…


With great power comes great responsibility

For some, the term Artificial Intelligence can provoke thoughts of progress and productivity. For others, the outlook is less positive. Many concerns, such as unfair decisions, workers being replaced, and a lack of privacy and security, are valid. To make things worse, many of these issues are unique to AI. This means existing guidelines and laws are not well suited to addressing them. This is where Responsible AI comes in. It aims to address these issues and create accountability for AI systems.

Source: flaticon

Why we need Responsible AI

When we talk about AI, we usually mean a machine learning model that is used within a system to…


The 6 benefits of writing data science articles

Today marks one year since I posted my first data science article on Medium. It did surprisingly well and that initial success gave me a lot of motivation. I’ve posted 11 other articles since. Unfortunately, it became obvious I’d fallen victim to a bit of beginner's luck when the others did not do as well. Even so, I’ve still managed to write and post articles consistently. This is because I am not only motivated by views but also by the many other benefits of writing.

Source: flaticon

Learn and improve technical skills

The first benefit is that it has helped me to learn more about data science…


Using machine learning to predict if you are good at coding

If you’ve ever applied for a technical role you’ve probably sent the company a link to your GitHub profile. The information on this profile can give a good indication of your coding ability and fit within a team. The downside to all this information is that it may take a recruiter a long time to assess it. To save time, machine learning could potentially be used to automatically rate your coding ability.

Source: flaticon

In this article, we walk you through the process of building such a model. We discuss how we collected data from GitHub and created model features using this…


Fairness and Bias

An introduction to the field that aims to understand and prevent bias in machine learning models

At first, the concept of an unfair machine learning model may seem like a contradiction. How can machines, with no concept of race, ethnicity, gender or religion, actively discriminate against certain groups? But algorithms do and, if left unchecked, they will continue to make decisions that perpetuate historical injustices. This is where the field of algorithm fairness comes in.

Source: flaticon
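A common first check for this kind of bias is to compare how often a model gives the favourable outcome to each group. The sketch below computes per-group selection rates and their ratio, often called disparate impact. It uses invented predictions for two hypothetical groups, A and B. A ratio well below 1 means one group receives positive predictions much less often. US employment guidelines use 0.8 as a rough threshold, known as the four-fifths rule.

```python
def selection_rates(preds, groups):
    """Share of positive (1) predictions within each group."""
    rates = {}
    for g in sorted(set(groups)):
        in_group = [p for p, grp in zip(preds, groups) if grp == g]
        rates[g] = sum(in_group) / len(in_group)
    return rates

def disparate_impact(preds, groups, unprivileged, privileged):
    """Ratio of the unprivileged group's selection rate to the privileged group's."""
    rates = selection_rates(preds, groups)
    return rates[unprivileged] / rates[privileged]

# Invented model outputs (1 = favourable outcome) for two hypothetical groups
preds = [1, 1, 1, 0, 1, 1, 0, 0, 0, 1]
groups = ["A"] * 5 + ["B"] * 5

print(selection_rates(preds, groups))  # {'A': 0.8, 'B': 0.4}
ratio = disparate_impact(preds, groups, unprivileged="B", privileged="A")
print(ratio)  # 0.5
if ratio < 0.8:
    print("Possible adverse impact under the four-fifths rule")
```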

In this article, we will explore the concept of model bias and how it relates to the field of algorithm fairness. To highlight the importance of this field, we will discuss examples of biased models and their consequences. These include models…


Analysing interactions using feature importance, Friedman’s H-statistic and ICE Plots

The side effects of medication can depend on your gender. Inhaling asbestos increases the chance of lung cancer more for smokers than non-smokers. If you are more moderate or liberal, your acceptance of climate change tends to increase with higher levels of education. The opposite is true for the most conservative. These are all examples of interactions in data. Identifying and incorporating them can drastically improve the accuracy and change the interpretation of your models.
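Friedman's H-statistic measures how much of the variation in two features' joint partial dependence is not explained by their individual partial dependences. A value of 0 means no interaction, and a value of 1 means the joint effect is pure interaction. Below is a brute-force sketch of the pairwise version in plain Python. It uses a hypothetical model, f = x1 + x2 + x1*x2 + x3, evaluated on a small grid of points. It is meant to show the calculation, not to be efficient, since it makes O(n²) model calls.

```python
from itertools import product

def partial_dependence(model, data, features, values):
    """Average prediction with `features` fixed to `values` in every row."""
    total = 0.0
    for row in data:
        modified = list(row)
        for f, v in zip(features, values):
            modified[f] = v
        total += model(modified)
    return total / len(data)

def centred(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def h_statistic(model, data, j, k):
    """Pairwise H = sqrt(sum (PD_jk - PD_j - PD_k)^2 / sum PD_jk^2), with centred PDs."""
    pd_jk = centred([partial_dependence(model, data, (j, k), (r[j], r[k])) for r in data])
    pd_j = centred([partial_dependence(model, data, (j,), (r[j],)) for r in data])
    pd_k = centred([partial_dependence(model, data, (k,), (r[k],)) for r in data])
    num = sum((a - b - c) ** 2 for a, b, c in zip(pd_jk, pd_j, pd_k))
    den = sum(a ** 2 for a in pd_jk)
    return (num / den) ** 0.5 if den > 0 else 0.0

# Hypothetical models: x1 and x2 interact in the first but not the second
def interacting_model(x):
    return x[0] + x[1] + x[0] * x[1] + x[2]

def additive_model(x):
    return x[0] + x[1] + x[2]

# Small full grid of feature values: x1, x2 in {-1, 0, 1}; x3 in {0, 1}
data = list(product([-1, 0, 1], [-1, 0, 1], [0, 1]))

print(h_statistic(interacting_model, data, 0, 1))  # 0.5
print(h_statistic(additive_model, data, 0, 1))     # 0.0
```

On this grid the interaction term accounts for a quarter of the variance of the joint partial dependence (H² = 0.25). As a result, H is 0.5 for the interacting model and 0 for the additive one.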

In this article, we explore different ways of analysing interactions in your dataset. We discuss how to use scatterplots and ICE Plots to visualise them. We then…


How to create time-series choropleths of US election results

This US election has brought with it high tensions, unfounded fraud allegations and, most importantly, some great visualisations. Well, important to data scientists at least. It seems like you can’t look anywhere without seeing some novel way of presenting election results. So why not add a few more to the mix? In this tutorial, you will learn how to create some of your own visualisations using Python.

Source: Author

You will learn how to create two interactive choropleths of US presidential election results from 1976 to 2016. The first map has a time slider. As you move the slider, the map will…

Conor O'Sullivan

Risk Data Scientist — Building credit risk and fraud models for the man. Exploring AI topics for myself.
