An Overview of Each Approach

Outliers are a recurring problem for data scientists, much like missing values. Popular methods for identifying them include John Tukey’s fences and the standard score method. My focus here is not on algorithms for identifying outliers, but on algorithms for dealing with them once they’re identified. Two standard approaches are trimming and Winsorizing. Trimming simply removes the outliers from the dataset. Winsorizing, on the other hand, changes the value of each outlier to that of the nearest inlier.¹
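To make the difference concrete, here is a minimal sketch of both approaches using only the Python standard library. It assumes outliers are flagged with Tukey's fences at the conventional multiplier of 1.5; the function names (`tukey_fences`, `trim`, `winsorize`), the sample data, and the choice of the "inclusive" quartile method are my own illustrative assumptions, not part of any particular library.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return the (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    # "inclusive" treats the data as the full population when interpolating.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def trim(values, k=1.5):
    """Trimming: drop every value that falls outside the fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if lower <= v <= upper]


def winsorize(values, k=1.5):
    """Winsorizing: replace each outlier with the nearest inlier value."""
    inliers = trim(values, k)
    lo, hi = min(inliers), max(inliers)
    # Clamp to the most extreme inliers, not to the fences themselves.
    return [min(max(v, lo), hi) for v in values]


data = [2, 3, 3, 4, 4, 5, 5, 6, 40]  # hypothetical sample; 40 is the outlier

print(trim(data))       # [2, 3, 3, 4, 4, 5, 5, 6]
print(winsorize(data))  # [2, 3, 3, 4, 4, 5, 5, 6, 6]
```

Notice that trimming shortens the dataset, while Winsorizing keeps its length and pulls the outlier in to 6, the largest inlier.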

Random distribution Winsorized at the 5th and 95th percentiles.

Sometimes the term “Winsorizing” refers to the more specific method of clipping…


A High-Level Scraping Solution

Nothing gets my blood pumping like web scraping. Think of how much data is floating around out there in the wild — no associated API, no official download links. Don’t you just want to help yourself to that data, like your hunter-gatherer ancestors helped themselves to wild beasts?

Your ancestors helping themselves to some kind of armadillo thing.

If you’re a red-blooded data scientist with an interest in web scraping, you should learn Scrapy. Scrapy is the single most powerful scraping tool for Python — it is to the Requests library what Pandas is to the Python dictionary. It’s fast, flexible, feature-packed, and well-documented. …

Nick Gigliotti

Aspiring Data Scientist and Jack of Many Trades
