Lessons learned in the real world

Photo by Dan Dimmock on Unsplash

Today marks 5 years of my journey in Data Science & Machine Learning. When I took my first course in data science, I was working as a Data Engineer and decided to invest some time in exploring this field. Since then it has been a great experience with lots of learning. The journey has no shortcuts, and I still think I have a lot to learn and explore.

The Machine Learning field is evolving every day, and it has space for everyone from every industry. In this post I won't recount my journey and how I became a self-taught Data…

Spark making ML deployment easier

Photo by Chris Liverani on Unsplash

A while ago, I wrote a post about Productionizing Machine Learning Models, where I described strategies for deploying Machine Learning models into production. This article is about one of the most widely used approaches: Batch Prediction using Spark. With this approach, you can schedule a job to run the predictions at a specific time and write them to a database, file system, stream, or any other persistent layer.

Batch Prediction Solutions

Nowadays many solutions and tools are available for batch prediction. The simplest is to write the code in Python and schedule it with cron, but this approach has 2 problems:

1. With each prediction…
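The excerpt stops before the Spark job itself. The sketch below shows only the batch pattern it describes: read the inputs, score them with an already-trained model, and write the predictions to a persistent layer. It uses the Python standard library and is Spark-free. The linear "model", the table names and the columns are all hypothetical, and an in-memory SQLite database stands in for the real source and sink. In Spark, the scoring step would typically run distributed across the cluster rather than inside a single process.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical stand-in for a trained model: a fixed linear scorer.
WEIGHTS = {"age": 0.02, "income": 0.00001}
BIAS = -1.0


def predict(features: dict) -> int:
    """Return a 0/1 label from the hypothetical linear model."""
    score = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1 if score > 0 else 0


def make_demo_db() -> sqlite3.Connection:
    """In-memory SQLite database standing in for the input table and the output layer."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, age REAL, income REAL)")
    conn.execute("CREATE TABLE predictions (customer_id INTEGER, label INTEGER, scored_at TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, 30, 50000), (2, 20, 20000)],
    )
    conn.commit()
    return conn


def run_batch_prediction(conn: sqlite3.Connection) -> int:
    """One scheduled run: read all input rows, score them, persist the labels."""
    scored_at = datetime.now(timezone.utc).isoformat()  # same timestamp for the whole batch
    rows = conn.execute("SELECT id, age, income FROM customers").fetchall()
    results = [
        (row_id, predict({"age": age, "income": income}), scored_at)
        for row_id, age, income in rows
    ]
    conn.executemany("INSERT INTO predictions VALUES (?, ?, ?)", results)
    conn.commit()
    return len(results)


if __name__ == "__main__":
    conn = make_demo_db()
    print(run_batch_prediction(conn), "rows scored")
    print(conn.execute("SELECT customer_id, label FROM predictions").fetchall())
```

A scheduler (cron or an orchestrator) would call `run_batch_prediction` at a fixed time. Each run appends one timestamped batch of predictions.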

Ensemble is an art and science

Photo by Jeswin Thomas on Unsplash

In my previous post about ensemble learning, I explained what ensemble learning is, how it relates to Bias and Variance in machine learning, and what the simple ensemble learning techniques are. If you haven't read that post, please refer here.

In this post I will cover the types of ensemble learning and the advanced ensemble learning methods (Bagging, Boosting, Stacking and Blending), with code samples. At the end I will explain some pros and cons of using ensemble learning.

Ensemble Learning Types

Ensemble learning methods can be categorized into two groups:

1. Sequential Ensemble Methods

In this method, the base learners are dependent on the…
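Boosting is the textbook example of a sequential ensemble. The sketch below is a minimal standard-library version. It assumes a 1-D regression problem and uses one-split decision stumps as base learners; both choices are ours, not necessarily the post's. Each new stump is fit to the residuals left by the stumps trained before it, so every learner depends on its predecessors. The names `fit_stump`, `fit_sequential_ensemble` and `learning_rate` are hypothetical.

```python
from typing import Callable, List


def fit_stump(xs: List[float], targets: List[float]) -> Callable[[float], float]:
    """Fit a one-split regression stump that minimizes squared error on targets."""
    best = None
    for threshold in sorted(set(xs)):
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)  # never empty: threshold is one of the xs
        right_mean = sum(right) / len(right) if right else left_mean
        error = sum((t - left_mean) ** 2 for t in left) + sum(
            (t - right_mean) ** 2 for t in right
        )
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_sequential_ensemble(xs, ys, n_learners=10, learning_rate=0.5):
    """Boosting-style ensemble: each stump is trained on the ensemble's current residuals."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    learners = []
    current = [base] * len(xs)
    for _ in range(n_learners):
        # What the learners trained so far still get wrong.
        residuals = [y - p for y, p in zip(ys, current)]
        stump = fit_stump(xs, residuals)
        learners.append(stump)
        # Add a damped version of the new learner's correction.
        current = [p + learning_rate * stump(x) for p, x in zip(current, xs)]

    def predict(x: float) -> float:
        return base + sum(learning_rate * learner(x) for learner in learners)

    return predict


if __name__ == "__main__":
    model = fit_sequential_ensemble([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5])
    print(model(2), model(5))
```

With one learner, the prediction at `x = 2` is still halfway between the mean and the target. Each additional learner halves the remaining error, which is the sequential dependency at work.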

Ensemble can make things simple

Image Source: pixabay

If you are reading this post about ensemble learning, I hope you are familiar with at least a few machine learning models and their implementation. When we train an ML model on any dataset, we all face our error companions: Bias and Variance. Most of the time, either we are not getting satisfactory accuracy or the model is overfitting the data. Then we adjust the data again, create or drop features, retrain the model, and this process can repeat several times. …
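Max voting, averaging and weighted averaging are widely used simple ensemble techniques. They combine the outputs of several already-trained models, and averaging models whose errors are not perfectly correlated tends to reduce variance. The sketch below is a minimal standard-library version. It assumes each model's predictions arrive as a list aligned by sample. The function names and the weights are hypothetical and not necessarily what the post uses.

```python
from collections import Counter
from statistics import mean
from typing import Hashable, List, Sequence


def max_voting(predictions_per_model: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Classification: each sample gets the label predicted by the most models.

    Ties go to the label seen first (Counter keeps insertion order).
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions_per_model)]


def averaging(predictions_per_model: Sequence[Sequence[float]]) -> List[float]:
    """Regression or probabilities: plain mean of the models' outputs per sample."""
    return [mean(outputs) for outputs in zip(*predictions_per_model)]


def weighted_averaging(
    predictions_per_model: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Like averaging, but better models (higher weight) count more."""
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, outputs)) / total
        for outputs in zip(*predictions_per_model)
    ]


if __name__ == "__main__":
    # Three classifiers, three samples: one inner list per model.
    print(max_voting([[1, 0, 0], [1, 1, 0], [0, 1, 0]]))
    # Two regressors, two samples.
    print(averaging([[2.0, 4.0], [4.0, 8.0]]))
    print(weighted_averaging([[2.0, 4.0], [4.0, 8.0]], [1, 3]))
```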

Deploying models into production is a complex affair

Photo by Stephen Dawson on Unsplash

The biggest issue in the life cycle of an ML project isn't creating a good algorithm, generalizing the results, or getting good predictions and better accuracy. The biggest issue is putting ML systems into production. One well-known truth of the Machine Learning world is that only a small part of a real-world ML system is composed of ML code. A big part is model deployment, model retraining, maintenance, ongoing updates and experiments, auditing, versioning and monitoring. These steps make up a large share of an ML system's technical debt, as it exists at the system/platform level rather…

A technique that makes a difference in model performance

Photo by KDnuggets

Many people still think that Data Science/Machine Learning is all about executing algorithms to get results. Yes, of course you will get results, but are they good enough? Have you ever noticed that in Data Science/Machine Learning competitions only a few people are the winners? There are a handful of algorithms that everyone is using, so where is the difference?

The answer is very simple: it is the DATA. The data on which you run your machine learning algorithms is what makes the difference.

There is a simple ML rule…

Save up to 90% of memory with simple tricks in Pandas

Photo by Swaathi Kakarla on activestate.com

A few days back, while working on sentiment analysis for a huge dataset, I ran into memory issues multiple times. I was cleaning up memory, nullifying objects and not loading all the columns at once, but I still had memory issues. I knew it was pandas, so I looked for a solution.

Pandas, as many of you already know, started as a data wrangling and analysis tool, but thanks to its capabilities and ease of use it has now become a production-level API.

While working with small-sized data, there are no issues…
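The excerpt stops before the tricks themselves, so the sketch below is not necessarily the post's method. It shows two widely used pandas memory techniques, assuming a DataFrame with numeric columns and repetitive text columns. First, numeric columns are downcast with `pd.to_numeric(..., downcast=...)`. Second, low-cardinality strings are converted to the `category` dtype. The demo columns and the `max_unique_ratio` threshold of 0.5 are hypothetical choices, and the actual savings depend on the data.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype, is_string_dtype


def optimize_memory(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy of df with smaller dtypes where the values allow it."""
    optimized = df.copy()
    for col in optimized.columns:
        series = optimized[col]
        if is_integer_dtype(series):
            # int64 -> smallest signed integer type that holds every value
            optimized[col] = pd.to_numeric(series, downcast="integer")
        elif is_float_dtype(series):
            # float64 -> float32 when the values survive the cast
            optimized[col] = pd.to_numeric(series, downcast="float")
        elif (
            is_string_dtype(series)
            and len(series) > 0
            and series.nunique() / len(series) < max_unique_ratio
        ):
            # Repetitive text -> category: each distinct string is stored once
            optimized[col] = series.astype("category")
    return optimized


def memory_mb(df: pd.DataFrame) -> float:
    """Total memory in MB, counting the actual string payloads (deep=True)."""
    return df.memory_usage(deep=True).sum() / 1024**2


def make_demo_frame(n: int = 10_000) -> pd.DataFrame:
    """Hypothetical sentiment-style data: ids, scores and a low-cardinality label."""
    rng = np.random.default_rng(0)
    return pd.DataFrame(
        {
            "user_id": np.arange(n, dtype=np.int64),
            "score": rng.random(n),
            "sentiment": rng.choice(["positive", "negative", "neutral"], size=n),
        }
    )


if __name__ == "__main__":
    df = make_demo_frame()
    optimized = optimize_memory(df)
    print(f"{memory_mb(df):.2f} MB -> {memory_mb(optimized):.2f} MB")
    print(optimized.dtypes)
```

The same dtypes can also be requested at load time through the `dtype` and `usecols` parameters of `pd.read_csv`. That way the large default types are never materialized.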

Charu Makhijani

Data Science & Machine Learning Enthusiast | Writing about Data Science, AI, ML, DL, Stats | https://github.com/charumakhijani
