What I Learnt During My Days as a Data Scientist

Before I started blogging about tech and productivity, I spent a fair bit of time in the world of data science. It was a wild ride — filled with cryptic algorithms, endless datasets, and the occasional Excel sheet that still haunts me. Although I’ve since moved on to other ventures, my time as a data scientist left me with some valuable lessons, both technical and personal. So, in the spirit of ‘building in public’, let me share a few of the key things I learnt during my stint as a data scientist.

1. Data Cleaning is 80% of the Job (And It’s Not Glamorous)

If you imagine data science is all about crafting slick machine learning models, then I’ve got news for you: roughly 80% of the work is just cleaning data. Missing values, inconsistent formats, weird outliers: you name it, you’ll find it. I used to think that figure was a bit of an exaggeration, but in my experience it isn’t. And no, it’s not fun.

Think of it like preparing a house for painting. You can’t just slap a fresh coat of paint on top of grime and hope for the best. You’ve got to clean the walls, patch the holes, and sand down the rough edges. Data science is no different. If your data is messy, your results will be too, no matter how fancy your model is.

Lesson: Master data wrangling tools like Python’s Pandas library, SQL, or even Excel. These are your weapons in the battle against dirty data.
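
To make that concrete, here’s a minimal sketch of what this kind of wrangling can look like in Pandas. The data, the column names, and the 10,000 spend cap are all made up for illustration; a real project would need its own rules for what counts as missing or implausible.

```python
import pandas as pd

# Hypothetical raw data showing the three problems named above:
# missing values, inconsistent formats, and a weird outlier.
raw = pd.DataFrame(
    {
        "customer": [" Alice", "bob ", "CAROL", "dave", None],
        "signup_date": ["2021-01-05", "2021-02-05", "not a date", "2021-03-15", "2021-04-01"],
        "monthly_spend": ["120.5", "95", None, "100000", "110"],
    }
)

# Assumed domain rule: spend above this is treated as a data-entry error.
MAX_PLAUSIBLE_SPEND = 10_000


def clean(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Inconsistent formats: trim whitespace and normalise capitalisation.
    out["customer"] = out["customer"].str.strip().str.title()
    # Unparseable dates and numbers become NaT/NaN instead of raising.
    out["signup_date"] = pd.to_datetime(
        out["signup_date"], format="%Y-%m-%d", errors="coerce"
    )
    out["monthly_spend"] = pd.to_numeric(out["monthly_spend"], errors="coerce")
    # Outliers: blank out implausible values so they are handled as missing.
    out["monthly_spend"] = out["monthly_spend"].where(
        out["monthly_spend"] <= MAX_PLAUSIBLE_SPEND
    )
    # Missing values: drop rows with no customer, fill spend with the median.
    out = out.dropna(subset=["customer"]).reset_index(drop=True)
    out["monthly_spend"] = out["monthly_spend"].fillna(out["monthly_spend"].median())
    return out


cleaned = clean(raw)
print(cleaned)
```

Filling gaps with the median is only one option. Whether to fill, drop, or flag a value depends on what the column means, which is why this step eats so much time.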

2. The Power of Simple Models

Early on, I would spend days tuning complex machine learning models like random forests or deep neural networks. I’d tinker with hyperparameters, adjusting learning rates and hidden layers, expecting dramatic improvements. Sometimes it worked. But more often than not, a simple linear regression or decision tree would get me about 90% of the way there with 10% of the effort.

The lesson? Don’t let complexity fool you. A model that’s easy to interpret and explain can often be more valuable than one that’s incredibly accurate but a nightmare to understand. In business, that interpretability can be the difference between your solution being adopted and being ignored.

Lesson: Always start with the simplest model first. Don’t underestimate the power of a basic approach.

3. Communication is King

I can’t tell you how many times I thought I had a breakthrough, only to be greeted by blank stares when I tried to explain it to my colleagues. Data science is as much about storytelling as it is about analysis. It’s not enough to find insights; you’ve got to explain them in a way that your audience (often non-technical) can understand and appreciate.

Executives don’t want to hear about ROC curves or hyperparameters; they want to know how the insights will impact the business. Will sales go up? Will costs go down? Will the sky turn purple? (Okay, maybe not the last one, but you get the point.)

Lesson: Practice turning your analysis into digestible narratives. Tools like PowerPoint and basic data visualisation libraries (think Matplotlib or Seaborn in Python) are your friends here.

4. You Can’t Predict Everything

There’s a reason “forecasting” and “fortune telling” both involve a bit of mystery. No matter how good your data is, or how sophisticated your model, predicting the future is tricky. I spent a lot of time building predictive models, and while some of them worked, many didn’t. This is especially true in areas where human behaviour comes into play (like marketing or social trends).

External factors like market crashes, new government regulations, or even a global pandemic (hello, COVID-19) can render your forecasts obsolete overnight. And when things go sideways, it’s crucial to admit that no model can account for everything.

Lesson: Be comfortable with uncertainty. Build robust models, but remember that a forecast is never a guarantee.
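
A small habit that helps here is reporting a range instead of a single number. The sketch below is one simple way to do that, assuming you have kept a record of how far off past forecasts were. The error history and the function name `forecast_range` are hypothetical, and the approach is a rough empirical interval, not a formal statistical method.

```python
from statistics import quantiles

# Hypothetical history: actual value minus forecast for past periods.
past_errors = [-12.0, -8.0, -5.0, -3.0, -1.0, 0.0, 2.0, 4.0, 6.0, 9.0, 15.0]


def forecast_range(point_forecast, errors):
    """Return (low, high) using the 10th and 90th percentiles of past errors."""
    # quantiles(..., n=10) returns the nine decile cut points.
    deciles = quantiles(errors, n=10, method="inclusive")
    return point_forecast + deciles[0], point_forecast + deciles[-1]


low, high = forecast_range(100.0, past_errors)
print(f"Point forecast: 100.0, likely range: {low} to {high}")
```

The range is only as good as the assumption that future errors will look like past ones. A market crash or a pandemic breaks exactly that assumption, so the range describes ordinary uncertainty and says nothing about shocks.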

5. Data Science is a Team Sport

One of the biggest misconceptions about data scientists is that we sit in a dark room with a coffee pot, headphones on, coding away in isolation. But here’s the thing: data science is a team sport. I collaborated closely with business analysts, engineers, project managers, and even sales teams.

A lot of the job is about understanding the business problem, gathering the right data (often from other teams), and communicating findings to decision-makers. I had to break out of my techie bubble and work with people who spoke a completely different “language.”

Lesson: Get out of your silo and collaborate. Learn to speak the language of the business as much as you learn to code.

6. The Value of Domain Knowledge

I entered data science thinking my skills in Python and statistics were all I needed. Spoiler alert: I was wrong. Understanding the domain you’re working in is just as important as your technical skills. Whether it’s healthcare, finance, retail, or marketing, the most successful data scientists know the industry they’re working in.

It’s all well and good to know how to build a churn prediction model, but if you don’t understand why customers leave in your particular industry, your model will be missing context. You’ll be asking the wrong questions, and no algorithm can save you from that.

Lesson: Immerse yourself in the domain you’re working in. Talk to people who aren’t data scientists but know the business inside out.

7. It’s an Ever-Changing Field

Lastly, data science is constantly evolving. New tools, algorithms, and techniques are popping up all the time. Just when you think you’ve mastered one thing, something new comes along. Early on, I thought I’d eventually reach some magical point where I knew everything. Turns out, that point doesn’t exist.

The key to staying relevant is embracing a mindset of lifelong learning. Whether it’s brushing up on the latest machine learning frameworks or reading up on how AI is disrupting industries, there’s always something new to absorb.

Lesson: Stay curious. Continuous learning is the only way to stay sharp in this field.

Wrapping It Up

My days as a data scientist were full of lessons — some hard-earned and others discovered through trial and error. Data science, at its core, is about solving problems and delivering value through insights. While the technical skills are crucial, the real magic happens when you blend those skills with domain knowledge, collaboration, and the ability to communicate effectively.

So if you’re considering diving into data science, remember: it’s not all glamorous, but it’s definitely rewarding. And hey, if you ever need help cleaning data, hit me up. I might not do it anymore, but I’ve got some serious battle scars that make for good stories.

Happy analysing!

This post is licensed under CC BY 4.0 by the author.