All the code appearing in this post is available as part of the dython library on my GitHub page.
For any code-related questions, please open an issue on the library’s GitHub page.

Not long ago I stumbled across a dataset of mushrooms on Kaggle, where over 20 different features of edible and poisonous mushrooms were collected and sorted into categories. …


The deep dive into the world of model assessment metrics that you didn’t know you needed to know — until now

The code used to generate the graphs and the KS Area Between Curves computation in this post is available as part of the dython library

Image by lloorraa from Pixabay

Here’s a classic scenario you have probably run into as a Data Scientist: you need to train a binary classifier, say a simple logistic regression, on a rather small and imbalanced dataset. For example, CTR prediction (Click-Through Rate, meaning: will a user click on an ad or not). So again, you don’t have a lot of data, and it’s highly imbalanced, with a negative/positive ratio somewhere around 1000:1.

Your goal, then, is to…


“We can’t be consumed by our petty differences anymore. We will be united in our common interests”

Image taken from Jason’s Movie Blog

One of my favorite movies as a kid was Independence Day (here in Israel it was known as The Third Day, as it takes place on July 4th, and, well, most of the world doesn’t celebrate independence on that day). Briefly, it tells the story of an alien invasion of Earth, and how pretty much the entire human race unites in its do-or-die battle against the extraterrestrial colonialists. The fiery speech of President Bill Pullman still echoes in my head: “The 4th of July will no longer be known as an American holiday, but as the day when the world…


Breaking down my reading list of what’s new in the AI world

Image by Free-Photos from Pixabay

The field of Machine Learning and Artificial Intelligence is changing rapidly. Five years ago, classical Machine Learning was the hottest trend; now it’s just like an iPhone 6S: outdated. Deep Learning dominates the market these days, and if you come back to this post in 2025, there’s a good chance we’ll have moved way past this (self-note: I bet on Deep Reinforcement Learning).

Being a data scientist requires you to keep up with the latest innovations and discoveries, but there’s so much information coming in from so many directions, it’s easy to get lost in the stream. So what…


Answering frequently asked questions by junior data scientists and data scientists-to-be about first steps towards a DS career

Photo by StartupStockPhotos from Pixabay

In case you missed it, there’s a pandemic out there, and it forces all of us to shut down all public events. As time goes by, we all begin to understand the impact of the lockdowns, social distancing and absence of gatherings. One of the things we realized, and by “we” I refer to the Algo group at Taboola, where I work, is the impact this has on those who are just beginning their career path or are about to shift it.

We used to host and attend many data science meetups and conferences, and noticed that many junior data…


Make your awesome code conveniently available to the world, because you’re awesome too

Original image by Vadim_P from Pixabay

This blogpost is now available in Polish too; read it on BulldogJob.pl

About two years ago I published my very first data-science-related blogpost. It was about Categorical Correlations, and I honestly thought no one would find it useful. It was just experimental, and for myself. 1.7K claps later, I’ve learned that I cannot determine what other people will find useful, and I’m quite happy I can assist others on the web like others on the web assist me.

I was also quite new to Python and Github at that time, so I also experimented with writing the code to these…


Learn how to use ROC curves and AUC scores for more than just saying “I think this model performs well”

The code that generates the ROC graphs in this post is available as part of the dython library, which can be found on my GitHub page. Examples seen in this post are also available as a notebook.

ROCking hard (original image by Nadine_Em from Pixabay)

Assessing the predictions of any machine-learning model is probably the most important task of a Data Scientist — perhaps even more than actually developing the model. After all, while building super complex algorithms is the coolest thing, not knowing how to estimate their output properly is not the coolest thing.

There are several algorithms and tools dedicated to allowing a clearer view of how…


Taking another step towards truly understanding the basics of Reinforcement Learning

Implementations of all algorithms discussed in this blogpost can be found on my GitHub page.

The Qrash Course Series:

  1. Part 1: Introduction to Reinforcement Learning and Q-Learning
  2. Part 2: Policy Gradients and Actor-Critic

The previous (and first) Qrash Course post took us from knowing pretty much nothing about Reinforcement Learning all the way to fully understanding one of the most fundamental algorithms of RL: Q-Learning, as well as its Deep Learning version, Deep Q-Network. Let’s continue our journey and introduce two more algorithms: Policy Gradients and Actor-Critic. …


How to perform valuable exploration for brand new items with nothing more than a simple trick on our items’ embeddings

This blog post was originally published on Taboola’s Engineering Blog.

Our core business at Taboola is to provide the surfers-of-the-web with personalized content recommendations wherever they might surf. We do so using state-of-the-art Deep Learning methods, which learn what to display to each user from our growing pool of articles and advertisements. But as we challenge ourselves to build better models and better predictions, we also find ourselves constantly facing another issue: how do we not listen to our models? Or in other words: how do we explore better?

As I’ve just mentioned, our pool of articles…


When interview-like challenges attack in real life, we save the day with math!

This blog post was originally published on Taboola’s Engineering Blog.

If you happen to write code for a living, there’s a pretty good chance you’ve found yourself explaining to yet another interviewer how to reverse a linked list or how to tell if a string contains only digits. Usually, the necessity of this B.Sc. material ends once a contract is signed, as most of these low-level questions are dealt with for us under the hood by modern programming languages and external libraries.

Still, not long ago we found ourselves facing one such question in real life: find an efficient algorithm for real-time weighted sampling

Shaked Zychlinski

Algorithm Engineer at Taboola. Lives in Tel Aviv, Israel. Actual beard may vary. See me on shakedzy.xyz
