How being a developer made me a better data scientist

When I started working as an algorithm developer, I knew I was making a big change in my career, but I didn’t realize how big. I had gone from data analyst to business analyst and then to data scientist, but I never thought I would end up writing C# code. Although it wasn’t (and still isn’t) easy, this change “freed my mind” and let me see the real world outside the research lab.

“Welcome to the real world”

So now I’m trying to develop the best feasible features, algorithms and tools while staying aware of real-world constraints such as performance, resources, scale and real, active users.

From data scientist to algorithm developer

As a data scientist, I was focused on asking the right questions (I hope) and finding the best answers and solutions, without thinking in depth about real-world implementation. Working with a version control tool seemed redundant, while writing unit tests for an algorithm was odd and felt like a waste of time, since I believed my logic and code could handle any data input.

But I was wrong. I didn’t take into account development time, integration with the application and its environment, resources, performance and so on.

My first feature was an end-to-end task that required Scala, R, SQL, C#, HTML and JavaScript, and it had to be part of the company’s continuous integration framework. I had to learn new languages, frameworks, tools and company methodologies such as design reviews and automated tests. I was very proud when it was deployed to production. Then, however, I realized that my solution wasn’t as good as I thought.

I didn’t have the best design, I wasn’t familiar with SOLID principles, and my code wasn’t clear enough. My design made it hard to extend the feature over time, to keep it resilient in production and to address other engineering concerns.

I’m not developing in my little lab any more

I’m only at the beginning of my journey, and I’ve already faced some interesting challenges that I’d like to share.

Being a junior with 5 years’ experience

I’ve moved out of my comfort zone, and moved a lot: learning new methodologies, concepts and languages from scratch and putting aside my previous knowledge for a while. It’s not just learning new stuff, it’s also a big change in mindset, to think like an engineer and not like a researcher. It’s a trade-off between velocity and in-depth research and, like most things in life, the truth is somewhere in the middle.

Start the data revolution

My company uses and understands data. We track goals and KPIs, report analytics and metrics, conduct A/B testing and use a self-made recommendation system and algorithms.

Working as a data scientist has taught me that data can be much more than that. A/B testing is more than a split test; statistics has a key role in decision-making; deep and active data analysis can drive ideas and features. And that’s without even talking about data-mining and machine-learning methods.

As a data scientist, one of my challenges is to guide both R&D and the Product team through the evolution of data and how we as a company can benefit from it, in order to build the products our users will really love and need. For example, I explain why it is better to understand the metrics, the goals and how to measure them before writing a single line of code (and to make this an integral part of every design). I explain the importance of data analysis and ROI before and after writing a feature, even though it takes more time. And I explain A/B testing: why and how to do it right, and why it is important enough that we should invest time in it instead of developing another feature.
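
As a rough illustration of why an A/B test is more than a split test, here is a minimal Python sketch of a two-proportion z-test on conversion rates. The function name and the numbers are hypothetical, chosen only for illustration, and this is one common textbook approach, not necessarily the method used at the company described here. It assumes large samples and independent users.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.

    Returns (z, p_value). Uses the pooled-proportion standard error,
    which assumes large samples and independent users.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of "no difference".
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; no variation in the data")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution, via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers, for illustration only: variant B converts
# 230 of 10,000 users versus 200 of 10,000 for variant A.
z, p = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=230, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these made-up numbers, variant B looks 15% better in relative terms, yet the p-value stays above the usual 0.05 threshold. That is the kind of result worth explaining to a team before anyone ships the "winner".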

ETA, predictability and time-capped tasks

Predictability was always hard for me, especially in research. Besides the data exploration part that never fits my plan :), I always want to improve my feature/model/tool a bit more. For a data scientist that is sometimes acceptable, but for an algorithm developer/engineer it isn’t.

As an engineer, I need to be able to work in layers, starting with an MVP or lean feature and then adding more layers and improving it. And that’s hard, especially with models/algorithms. The million-dollar question is when to stop, when it’s enough, and I needed to develop this muscle.

Where to put my focus?

I love to learn new things and I can’t sit back. But there are only 24 hours in a day. How much effort should I put into data/model/algorithms/statistics/business (my core skills) compared to the new developer skills I recently gained? The answer to this question keeps changing in my head.

Final words

While there’s a constant fight in my head about what I want to be when I grow up, or what combination of developer and researcher I should choose, I know that each side has made me a more complete engineer, a kind of “full stack data scientist”.

From the developer side I’ve learned that I need to think about the feasible solution, constraints, reality and ETA. And from the data scientist side, I’ve learned to start by asking: Why? What are we going to gain from it? And how are we going to measure its success? Only then do we decide whether we are going to start working on it.

So the journey continues. And while I’m not sure I have the right answer yet, I know that I keep learning and understanding the big picture a bit more — so for now that’s good enough for me.