12 years of 'machine learning' engineering in a blink

Note
This article is non-technical and personal. The reader may not get much out of it, as it's intended to help me shape the next decade of objectives to pursue. Making it public is a deliberate constraint to flesh it out more clearly.
Intro
I became a father three weeks ago, and my life's priorities have changed forever. It's time to take a step back and reflect on what I've achieved in this first part of my professional career.
I'm amazed by how fast time has passed. It's been 12 years already, forging machine learning applications for a living.
From the start-ups of my early days to entrepreneurship, then freelancing and big companies, it has always been thrilling to apply cutting-edge research across anywhere from dozens to millions of customer devices.
What a lucky man I am, paid to read scientific articles, implement and improve upon ideas at the frontier of what's known to be possible, in one of the hottest fields of our time: deep learning.
This post looks at my past experience, to better understand what to focus on in future work and career decisions.
Retrospective
The journey has been good to me so far.
The first startup I worked with was more of a second family that helped us grow. I'm forever thankful to Jerry & Nico, the two veteran founders, for that. Their humanity and 'warping' optimism, alongside a good dose of 'deliver fast' and 'push to prod,' worked wonders. They gave me a chance while they were looking for a seasoned machine learning engineer—a job that was very uncommon in the market—and I had little to no experience back then. There, I learned a lot about how to build data pipelines, create models, and scale workloads in service of various business goals, with what most would call traditional (trad) ML today.
Two years later, I was too comfortable. They had switched toward more classical business with less need for data science, and I felt it was time to say goodbye and try something I had dreamed of doing since university: starting my own company, in speech understanding for call centers. The core idea was that we needed to stop punishing 'operators' and instead cherish the customer relationship, through deeper conversational and emotional analysis of customer calls. These analyses could be measured and aggregated on global dashboards linked back to all the detailed conversations.

It was a drastic change. A far broader skill set became necessary, and to be clear, I was (and still am) a clueless beginner in most of these areas: sales, pricing, marketing, front-end, product development... I never worked so hard, BUT damn, I enjoyed trying on those new hats! This period of my professional life was extremely exciting. It was incredible to finally set objectives based on my own convictions, to solve big real-world problems. So many people I shared a demo with were impressed by what had been achieved in such a short amount of time. But after almost two years, things were still not good enough, and I started to feel drained.

A key failure in this adventure was likely spending too much time on feature creep, on things I was already good at building. Not enough time was allocated to the core value proposition of the company: a good, adaptive, tailored speech recognition system for the clients. At the beginning, I took too long to recognize that external speech recognition API providers weren't good enough, and charged too much, to build a viable business on. Later, I spent too little time training speech models, due to a lack of focus and too little data gathered for the 'locale'. A cardinal sin that I will try never to forget.

Secondly, loneliness took a profound toll on me. That's probably my biggest regret: I failed to gather a proper team around the project. In retrospect, there were opportunities; I was probably too young and too introverted. Despite my own first clients proposing that we become partners, and a fundraiser approaching me to run a proper 'seed' round, I declined, postponing indefinitely, wrongly thinking that this kind of project could be bootstrapped on early client subscriptions.
In 2018, I was out of money and started looking for freelance work to buy some time. After searching the market, I landed a permanent job at a Parisian startup. That was not the original goal, but I figured I would just quit after some time. They also focused on speech recognition, their end product being a privacy-by-design 'voice smart assistant' that could be tailored to client needs and run fully on a mere Raspberry Pi 3. It was an opportunity I could not refuse: first because it was a chance to strengthen my knowledge of deep learning and speech recognition, but also because, after so long, it was a chance to meet people who shared my passion for building ML systems and experimenting with cutting-edge technologies. Funnily enough, this company was at the opposite end of the spectrum from what I had been trying to build: around 50 people gathered from around the world, a product pivot every year, money raised again and again without many paying clients. I met some really smart, energetic people who allowed me to grow again; some became friends, and time passed one crazy event at a time. In 2020, the startup was acquired by a big company. I felt I still had so much to learn that I could not leave.
As with most acquisitions, things changed: big companies attract a different kind of talent than startups. People slowly started jumping ship, one at a time; a few hiccups in management and product releases accelerated the process. Among the numerous things I learned in this new era, one stands apart: the process of conducting experimental research with successive written logbook reports, followed by a global progress report (something I very much intend to do on this blog from time to time). This was probably the thing most alien to me and my 'startup' mindset. The first time I did it, I felt so illegitimate. The second time, I felt that so much time was being spent on something so ad hoc. Then it became a habit, and today I cannot even imagine doing serious ML research or advanced engineering without this process. Having written material to anchor discussions, or being able to look back at projects from 2020 and recover the work done back then, is simply invaluable.
During these last years, beyond working on large language model fine-tuning and quantization, I also got the opportunity to give some of that joy back to the community: through an article on 'small footprint voice identification' accepted at ICASSP in 2021, through the open-source library torch-to-nnef published this year, which serves as a strong bridge between PyTorch and the Neural Network Exchange Format 'NNEF' (introduction article here), and through regular contributions to tract, an open-source neural network inference engine written in Rust and maintained by Sonos. I never thought it would be possible for a simple 'applied' research engineer like me to attend conferences like ICASSP 2021, 2022 and NeurIPS 2023, experiencing these incredible events from the inside, seeing and talking with the authors behind the latest advances in deep learning, ASR, large language models, reinforcement learning... Meeting people of such diverse backgrounds and experiences, continually willing to share and improve technologies openly with the world, made a positive imprint on me and helped shape who I am.
The journey was good, but I would be lying if I said it's been a perfect honeymoon.
So many projects, so many urgent features that were of core importance back then. Most were merely useful at some point, and so few have withstood the test of time. Sometimes it feels like our industry is just a house of cards, slowly but continually churning. Dinosauria, all of us.
With this background established, I think it's time to think about the future by reflecting on what kind of work attracts me most. In a nutshell: public projects and research experiments.
1. The public projects
The public projects that stand the test of time seem to be where programming has its biggest impact on our societies: they form the foundation of most technological projects I know of. From programming languages to operating systems to the libraries used within every piece of software, private or not, we build on top of each other's work, one brick at a time. Public projects are a strange, living human construct. Fed by the attention of the open crowd, they become very hard to compete against once a big community gathers around them.
At their core, there are always at least a few stubborn people who maintain them and make them evolve beyond reason, mostly because they enjoy playing with the problems they framed for themselves. Which is funny when you think about it: we are driven by passion to build one of the most 'logical' technologies in existence.
torch-to-nnef
In that respect, torch-to-nnef is probably my main attempt to contribute.
Why do I take on the burden of starting and maintaining such a project? The safe answer would cite the value proposition: since 2022, it has been a solution for reliably exchanging our PyTorch models with the tract neural inference engine. Still, that does not explain my motives. If I'm being realistic, ONNX could fit the bill, aside from a few limitations here and there (and some of those will likely shrink with time). The project is too big for a single person: bridging two libraries that evolve this fast, with no control over one side, is extremely time-consuming. We will likely never reach the final goal of being a complete general-purpose exchange format. More importantly, this 'exchange format' project is rather far from my beloved playground of ML formulation and optimization experiments.
So, what's the deeper 'reason'?
It may be, in part, that it helps me understand in depth the linear algebra primitives that compose our deep learning models, and how they fit together. Putting myself in that position allows me to better understand the latest neural network architectures.
It's also an enabler when it comes to my 'unusual' quantization attempts, performance squeezing, or more recently, exporting a model that does not fit in RAM in full precision.
Still, this project will probably always be 'niche,' and the community it attracts is tightly tied to tract's, which is not very large to begin with.
Don't get me wrong: open-sourcing torch-to-nnef is a milestone I'm happy with, and I plan to continue maintaining it. It's just not enough to quench my thirst to help our field move forward with more innovative public projects.
2. The experiments that challenge the status quo
The second technical area I enjoy most is likely experimentation. Optimizing predictive models toward the final end-product goal is so fun. From building solid datasets with classical data science to challenging our evaluations and metrics to designing the neural network itself, I've done all those steps so many times for so many projects.
Selecting the most convincing recently published techniques, reformulating them toward our goal, and improving on them by validating or discarding hypotheses one at a time is probably the most thrilling part for me. The critical thinking, and the knowledge shared and created in the discussions that follow the experimental reports, are wonderful and so fruitful. In my experience, this is what a lot of start-ups and time-constrained projects miss; this time to explore brings unforeseen opportunities. Without it, we are just good copyists, greasing an engine, doomed to small incremental improvements.
Yet, it's impressive how subtle our work can be. Optimization is such a slippery beast. Do not assume that reported prior work was always handled perfectly; a working final model is not proof of correctness. As a friend of mine would say, a good scientist is first and foremost a good engineer: basics should be reassessed regularly, with low ego. I vividly recall a deep learning model that was core to our stack. It had a very simple framing, with years of work invested. Talented people had completely missed resetting a proper learning rate decay and training time; they probably never revisited those parameters after new data was ingested. The impact was crazy: it meant all subsequent experiments were inconclusive (that was a couple of years of work). After fixing that and doing a few reassessments over the following two months, we had increased the quality of the model by 70% relative to our baseline. The sketch below shows how that kind of stale schedule goes wrong.
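To make the failure mode concrete, here is a minimal, purely illustrative sketch (all values are made up, not the real project's numbers): a cosine learning rate decay whose horizon was tuned for a small dataset, reused unchanged after the dataset grew tenfold.

```python
import math

# Illustrative values only: a decay horizon set long ago for a small dataset,
# versus the number of steps actually needed after the dataset grew 10x.
base_lr, min_lr = 3e-4, 1e-6
t_max_stale = 10_000
total_steps = 100_000

def cosine_lr(step: int, t_max: int) -> float:
    """Cosine decay from base_lr to min_lr over t_max steps, flat afterwards."""
    step = min(step, t_max)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * step / t_max))

for step in (0, t_max_stale, total_steps // 2, total_steps):
    print(f"step {step:>7}: lr = {cosine_lr(step, t_max_stale):.2e}")
# With the stale horizon, the LR hits its floor after 10% of training:
# the remaining 90% of the much larger dataset is seen at a near-zero LR,
# so every experiment run on top of it compares noise against noise.
```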
In recent years, a few core principles have stood out to keep this kind of work sane. Fast iteration cycles, proper reproducibility, and clear reporting are of core importance. If validating a hypothesis takes more than two weeks of training, the iteration loop is broken, and launching experiments asynchronously is not a solution: managing parallel experiments gets exponentially harder as their number grows, well beyond 'human' multitasking limits. For example, imagine you tweak hyperparameter A from a baseline, then parameters B and C from the same baseline. If A is effective and C is too, does A+C still lead to an improvement? Guess what: sometimes the union worsens model quality (or marginalizes one of the benefits), as the sketch below illustrates. Training big models does not escape this pitfall. Every trick that allows faster hyperparameter 'validation' is a blessing: for example, a draft run on a smaller model, or a fine-tune when possible, to validate a first prototype can save a lot of resources.
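A tiny, purely hypothetical experiment ledger makes the point (the metric values are invented for illustration): two changes that each beat the baseline do not necessarily beat it together, so every combination has to be re-validated against the same baseline.

```python
# Hypothetical ledger: word error rate (WER), lower is better.
# All numbers are invented for illustration.
baseline_wer = 12.0

runs = {
    "A: longer LR decay":   11.1,  # helps alone
    "B: heavier augment":   12.4,  # hurts alone -> dropped
    "C: larger batch size": 11.5,  # helps alone
    "A+C combined":         11.8,  # worse than A alone: gains are not additive
}

for name, wer in runs.items():
    delta = wer - baseline_wer
    verdict = "keep" if delta < 0 else "drop"
    print(f"{name:<22} WER {wer:5.1f} (vs baseline {delta:+.1f}) -> {verdict}")
```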
Navigating this practitioner-led meta-optimization process is never the same twice, and damn, it's hard to avoid hitting local minima. Even if intuition about what works comes with time, determining the infamous limiting factor of a good baseline is a game of hide-and-seek, where we are forced to make guesses backed by partial clues in a hugely complex POMDP. More often than not, we are limited to observing correlations whose causality is impossible to confirm, building degrees of trust rather than certainty.
This leaves the question of which 'domains' I would like to experiment with next. Reinforcement learning has certainly been a key area on my radar since I read the Sutton & Barto book. I also very much enjoy my recent work on compression and efficiency techniques such as quantization. In the long term, every piece of technology that allows a trained system to be more autonomous and adaptive feels interesting and intriguing.
Conclusion
Opportunities will come. As time passes, we will see how this article ages. The experienced ML practitioner in me can already tell you: this post is certainly not a good predictor of the future 😉 Still, once the page is turned, maybe I will be able to use it as a compass to my past self. In the meantime, rest assured that the next articles will be far more technical!