Interview with Petr Gusev: How To Manage an ML Team the Right Way


by Javier Ortega-Araiza, November 21st, 2023

Too Long; Didn't Read

The "Interview with Petr Gusev: How to Manage an ML Team" draws on Petr Gusev's extensive experience in ML engineering and product management. Gusev, an ML Tech Lead at Deliveroo, shares the management strategies that have worked for him and how he handles challenges within an ML team. The interview covers project management, delegation, effective communication, challenges during model development, resource allocation, alignment with business goals, impact measurement, and fostering a collaborative and inclusive team environment. Gusev emphasizes aligning ML projects with business goals through regular communication and OKRs, and tracking progress with methods such as A/B testing and quarterly degradation testing. He also discusses building a collaborative team environment, encouraging continuous learning through reading clubs, R&D days, and ML hackathons, and resolving conflicts through open communication, involving HR when necessary. The interview offers a comprehensive view of his approach to managing ML teams, useful for ML practitioners, team leads, and anyone interested in how ML projects are managed in practice.


Petr Gusev is an ML expert with over six years of hands-on experience in ML engineering and product management. As an ML Tech Lead at Deliveroo, Gusev built a proprietary internal experimentation product from scratch as its sole owner. As an ML Engineer at Yandex, he worked in the innovation stream that added podcast listening to Yandex Music, building a podcast recommendation system from scratch that improved target metrics by 15%. Additionally, as Head of Recommendations at SberMarket, he drove a tech-focused roadmap that increased average order value (AOV) by 2% and gross merchandise value (GMV) by 1%.

Can you provide examples of specific ML projects that you have managed successfully? How did you ensure the project's success?

I led a project to personalize the ranking of products featured on the main page of a grocery delivery service. Our main task was to increase business metrics through personalized product offerings.


To make sure the product would succeed, I took the following steps:


  • Conducted user interviews to understand what users value and which products they would like to see on the main page. I also invited ML engineers to these interviews so that they could gain context by hearing directly from users.


  • Checked whether the hypotheses that emerged from the user interviews were confirmed by the data we had gathered.


  • Recorded the target metrics we expected this functionality to improve.


  • Before starting production development, built a prototype model and verified that it delivered the required quality and could plausibly achieve the target results in an A/B test. This is a very important point: developing an ML model is a research project that may not produce the results we expect, and if engineering work has already started, the whole team's time is wasted without achieving those results.


  • Assembled a cross-functional team of analysts, developers, and ML engineers and made sure that all people working on the project were on the same page throughout every stage of product development.
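The pre-development validation step above can be pictured as an offline go/no-go gate. Below is a minimal sketch, assuming binary relevance taken from held-out purchases and a popularity ranking as the baseline; NDCG@k, the 5% `min_lift` threshold, and all function names are illustrative assumptions, not the metric or threshold the team actually used.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k positions of a ranked list."""
    return sum(gain / math.log2(pos + 2) for pos, gain in enumerate(gains[:k]))


def ndcg_at_k(ranked_items, relevant_items, k):
    """Binary-relevance NDCG@k: 1.0 when every relevant item is ranked at the top."""
    gains = [1.0 if item in relevant_items else 0.0 for item in ranked_items]
    ideal = dcg_at_k([1.0] * min(len(relevant_items), k), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


def mean_ndcg(rankings, held_out, k):
    """Average NDCG@k over users; both arguments are dicts keyed by user id."""
    scores = [ndcg_at_k(rankings.get(user, []), items, k) for user, items in held_out.items()]
    return sum(scores) / len(scores)


def passes_offline_gate(candidate, baseline, held_out, k=10, min_lift=0.05):
    """Hypothetical go/no-go rule: the prototype must beat the baseline by min_lift (relative)."""
    cand = mean_ndcg(candidate, held_out, k)
    base = mean_ndcg(baseline, held_out, k)
    return cand >= base * (1 + min_lift), cand, base
```

An offline gate like this does not replace the A/B test; it only filters out prototypes that are unlikely to win one, before engineering time is committed.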


What are the specifics of managing an ML team?

  • A key aspect of managing an ML team is defining the project scope precisely, given the exploratory nature of ML projects. This requires clear, structured tasks. For instance, to define tasks, I often use a template like "Implementing X will impact metric Y by Z." This approach helps us set tangible goals and measure progress accurately.


  • Talent acquisition and skills development are also vital. Building a team with the right mix of skills is essential, and continuous learning should be encouraged to keep up with the rapidly evolving field of machine learning. This includes nurturing both technical skills and soft skills, such as problem-solving and teamwork.


  • Cross-functional collaboration is another important element. Especially when building and integrating a new ML model, the ML team needs to work closely with other departments, such as data engineering, product management, and marketing, to get the model into production.


  • Lastly, in this field, risk management and experimentation are crucial. Managing an ML team involves balancing and mitigating the risks associated with new and untested models while encouraging innovation and experimentation. It's about creating a safe space for trial and error, learning from failures, and continuously improving the models and techniques used.
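One way to make the "Implementing X will impact metric Y by Z" template checkable is to store each task as a small record that can later be compared with the measured result. This is a hypothetical sketch: the `Hypothesis` class, its fields, and the example values are illustrative, not a tool described in the interview.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """Hypothetical encoding of the 'Implementing X will impact metric Y by Z' template."""
    change: str           # X: what we implement
    metric: str           # Y: the business or proxy metric it should move
    expected_lift: float  # Z: expected relative change, e.g. 0.02 for +2%

    def statement(self) -> str:
        return f"Implementing {self.change} will impact {self.metric} by {self.expected_lift:+.1%}"

    def is_confirmed(self, observed_lift: float) -> bool:
        """True if the observed relative change reaches the expected one in the expected direction."""
        if self.expected_lift >= 0:
            return observed_lift >= self.expected_lift
        return observed_lift <= self.expected_lift


h = Hypothesis("personalized main-page ranking", "average order value", 0.02)
print(h.statement())
```

Writing Z down before the work starts is what makes the outcome measurable: the observed lift from an experiment either reaches it or does not.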


How do you delegate tasks and responsibilities within your ML team to maximize efficiency and productivity?

  • Structuring Work by Product Functions: I organize responsibilities based on different product functions such as consumer, operations, and marketing. This approach ensures that team members can specialize and develop deep expertise in specific areas, leading to more efficient and effective problem-solving.


  • Rotating Team Members Between Streams: Periodically, I rotate Machine Learning Engineers (MLEs) between different streams. This not only prevents monotony but also broadens their experience and perspective. For instance, an MLE who has completed an initial project in the consumer stream may be moved to improve a model in a different stream. This rotation fosters a well-rounded skill set, promotes adaptability, and reduces the team's "bus factor" (the risk that critical knowledge sits with only one or two people).


  • Assigning Tasks Based on Seniority and Growth Potential: Senior MLEs are often tasked with more complex, end-to-end projects or major research initiatives, such as building initial model prototypes. Meanwhile, junior MLEs are assigned to support these projects, providing them with opportunities to learn and hone their skills. This mentor-mentee dynamic accelerates the growth of junior team members and ensures effective knowledge transfer.


  • Setting Clear Objectives and Expectations: For each task or project, I clearly define the objectives, expected outcomes, and timelines. This clarity helps team members understand their roles and responsibilities, enabling them to work more autonomously and efficiently.


What strategies do you employ to ensure effective communication and coordination between data scientists, engineers, and other team members?

  • Scheduled regular meetings where team members from different functions can update each other on their progress, discuss challenges, and brainstorm solutions.


  • Ensured all work, whether code, data analysis, or research findings, is well-documented and accessible on shared platforms.


  • Agreed with team members on dedicated communication channels for different needs, such as urgent queries, brainstorming, or status updates.


  • Involved all relevant team members in goal-setting and project-planning sessions.


How do you handle challenges that may arise during the development and deployment of ML models? What potential bottlenecks can occur when managing an ML team?

A frequent bottleneck is the model's execution speed. First, it's crucial to understand the specific constraints, as these can vary across different parts of a product. For instance, a model might be required to deliver results in less than X milliseconds or to run on less powerful machines.


To manage this, we should develop a rapid prototype to evaluate the model's performance in a production setting. If the constraints cannot be met, it's important to negotiate with stakeholders to decide what to prioritize: the model's accuracy or its execution speed and resource consumption.

Another common challenge, particularly in ML teams with a stronger focus on science and mathematics than on engineering, is optimizing models for high performance in production. Such teams may excel at creating robust models but lack the skills to make them run efficiently. Therefore, it is vital to build a team with diverse profiles, combining engineering and scientific backgrounds.
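The rapid prototype check described above usually comes down to measuring tail latency against the budget. A minimal sketch, assuming the model is exposed as a plain Python callable and that the budget is defined on p99 latency; `predict`, the warm-up count, and the percentile choice are illustrative assumptions.

```python
import statistics
import time


def measure_latency_ms(predict, requests, warmup=10):
    """Call predict on each request and return per-call latencies in milliseconds."""
    for request in requests[:warmup]:
        predict(request)  # warm-up calls are excluded so one-off setup costs don't skew results
    latencies = []
    for request in requests:
        start = time.perf_counter()
        predict(request)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def meets_latency_budget(latencies_ms, budget_ms, percentile=99):
    """Compare the chosen latency percentile (p99 by default) with the budget."""
    # statistics.quantiles with n=100 returns the 99 cut points p1..p99
    cut_points = statistics.quantiles(latencies_ms, n=100)
    observed = cut_points[percentile - 1]
    return observed <= budget_ms, observed
```

Checking a high percentile rather than the mean matters because a budget of "less than X milliseconds" is usually broken by the slowest requests, not the typical one. Measurements should be taken on hardware that matches production.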


Additionally, managing the required accuracy constraints of the models is a critical aspect of the process. Tasks often necessitate careful balancing of precision and recall. For instance, in scenarios where false positives have significant implications, prioritizing precision might be more crucial. Conversely, in cases where missing true positives is costly, focusing on recall becomes essential. Tailoring the model to effectively manage these trade-offs according to the specific requirements of the task is fundamental to the successful deployment of machine learning models.
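For the precision-first case, a common approach is to pick the decision threshold that maximizes recall while keeping precision above a floor. A minimal sketch, assuming a binary classifier that outputs scores and a labeled validation set; `pick_threshold` and `min_precision` are illustrative names, not part of any specific library.

```python
def precision_recall_at(threshold, scores, labels):
    """Precision and recall when every score >= threshold is predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pick_threshold(scores, labels, min_precision):
    """Highest-recall threshold whose precision is at least min_precision (None if none is)."""
    best = None
    for t in sorted(set(scores)):
        p, r = precision_recall_at(t, scores, labels)
        if p >= min_precision and (best is None or r > best[2]):
            best = (t, p, r)
    return best


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
print(pick_threshold(scores, labels, min_precision=0.75))
```

The recall-first case is symmetric: swap the roles and maximize precision subject to a recall floor.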


Can you share an example of a time when you had to make tough decisions regarding resource allocation within your ML team?

Problem: With a fast-growing user base, our team ran out of storage for personalized recommendations, which were generated offline.


Short-Term Adaptation: As an immediate measure, I stopped generating personalized recommendations for users with insufficient feedback history. Since the quality of these recommendations would not have been optimal due to the lack of data, I replaced them with non-personalized recommendations. These leaned more towards popular items and were produced by a different ML model, which resolved the immediate storage issue.
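The short-term fix can be sketched as a history threshold that decides who gets a stored personalized list, plus a shared fallback list for everyone else. This is a minimal sketch under stated assumptions: `MIN_HISTORY` and all function names are hypothetical, and simple purchase counts stand in for the separate ML model the team actually used for the non-personalized recommendations.

```python
from collections import Counter

MIN_HISTORY = 5  # hypothetical cut-off for "insufficient feedback history"


def users_to_precompute(history, min_history=MIN_HISTORY):
    """Only these users get an offline-generated, stored personalized list."""
    return {user for user, events in history.items() if len(events) >= min_history}


def popular_items(purchase_log, k):
    """Stand-in fallback: the k most purchased items across all users (one shared list)."""
    counts = Counter(item for _, item in purchase_log)
    return [item for item, _ in counts.most_common(k)]


def recommend(user_id, history, personalized_store, popular, min_history=MIN_HISTORY):
    """Serve the stored personalized list if the user qualifies, otherwise the shared fallback."""
    if len(history.get(user_id, [])) >= min_history and user_id in personalized_store:
        return personalized_store[user_id]
    return popular
```

The storage saving comes from the fallback being a single list shared by all low-history users instead of one stored list per user.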

Long-Term Solution: To address the problem sustainably, my team developed an online recommendation system. Unlike the previous approach, it did not require storing recommendations for every user. Instead, it generated recommendations dynamically, in real time, whenever a client requested them. This resolved the storage constraints and made the recommendation process more scalable and responsive to user demand.
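The shift from precomputed to on-request recommendations can be illustrated with a tiny scoring function that ranks items when a request arrives. A minimal sketch, assuming hypothetical user and item embedding vectors produced by some offline-trained model; the interview does not describe the actual architecture, and a production system would typically retrieve candidates from an approximate nearest-neighbour index rather than scan the whole catalogue.

```python
import heapq


def score(user_vector, item_vector):
    """Dot-product relevance between a user embedding and an item embedding."""
    return sum(u * i for u, i in zip(user_vector, item_vector))


def recommend_online(user_vector, item_vectors, k=10, exclude=()):
    """Rank the catalogue for one user at request time; nothing is stored per user."""
    candidates = (
        (score(user_vector, vec), item)
        for item, vec in item_vectors.items()
        if item not in exclude
    )
    return [item for _, item in heapq.nlargest(k, candidates)]


items = {"milk": [1.0, 0.0], "bread": [0.5, 0.5], "wine": [0.0, 1.0]}
print(recommend_online([0.9, 0.1], items, k=2))
```

Only the user and item vectors need to be stored, so storage grows with the number of users and items rather than with users times recommendation-list length.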


How do you ensure that the ML team's work aligns with the business goals and objectives of the organization?

Ensuring that the ML team's work aligns with the organization's business goals and objectives involves a strategic and proactive approach:


  • Regular Business Context Sharing: I have established a process for consistently sharing the relevant business context of our project with the team. This involves frequent updates and discussions about the company's broader goals, market trends, and customer needs, ensuring that the team's work remains aligned with the overarching business objectives.


  • Setting Quarterly OKRs in Alignment with Company Goals: In collaboration with the team, we set quarterly Objectives and Key Results (OKRs) that are directly derived from and support the company’s own OKRs. This ensures that our projects and initiatives are contributing to the company's strategic objectives.


  • Bi-Weekly Progress Tracking: To monitor our progress towards these OKRs, we conduct bi-weekly check-ins. These sessions allow us to assess our current standing, identify any challenges early, and make necessary adjustments to stay on course for achieving our objectives.


  • Monthly Stakeholder Updates: I proactively engage with stakeholders, providing them with monthly progress updates. This not only keeps them informed about the ML team's contributions and advancements but also fosters transparency and trust. It allows stakeholders to provide feedback and ensures that our efforts are in sync with the company's evolving priorities and needs.


How do you measure and track the impact of ML projects?

  • Pre-Development Impact Estimation: Before developing the ML model, I conduct a thorough data analysis to estimate the project's potential impact. This step involves analyzing historical data to forecast the expected outcomes and benefits of the ML model.


  • A/B Testing: I implement A/B testing to empirically assess the effectiveness of the ML model and its impact on business metrics. Before initiating the experiment, I identify the key business metrics that are most likely to be influenced by the model. These metrics are carefully selected based on their relevance to the project goals and are analyzed to gauge the model's performance.


  • Dashboard Monitoring: Post-deployment, I actively monitor a set of dashboards that track various company metrics and proxy ML-model metrics. These include, but are not limited to, the number of purchases facilitated by the recommendation system and the distribution of the travel time model's predictions. Reviewing these dashboards regularly, typically weekly, allows for ongoing assessment of the model's impact on business operations.


  • Quarterly Degradation Testing: To further understand the value added by the ML models, I conduct quarterly degradation tests. These tests involve temporarily disabling the ML models in specific parts of the business flow, or across the entire application, for a small, randomly selected user group. This helps quantify the impact of the ML systems by observing how business metrics change in their absence. However, it's very important to align with all the departments involved so that everyone is aware of and actively agrees to such a test.
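Both the A/B tests and the degradation tests described above need a stable random split of users and a significance check on the resulting metric. A minimal sketch, assuming hash-based bucketing and conversion rate as the business metric; hashing user ids and the two-proportion z-test are common techniques, not necessarily the ones Gusev's team used, and `in_holdout`, the experiment name, and the 5% share are illustrative.

```python
import hashlib
import math


def in_holdout(user_id: str, experiment: str, share: float) -> bool:
    """Deterministically place roughly `share` of users in the holdout (e.g. ML disabled)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 2**32  # uniform value in [0, 1)
    return bucket < share


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between groups A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p_b - p_a, z, p_value


# Group A: holdout without ML; group B: users served by the ML model (toy numbers).
print(two_proportion_ztest(200, 10_000, 260, 10_000))
```

Salting the hash with the experiment name gives each test an independent split, while the same user always lands in the same group within one test.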


What steps do you take to foster a collaborative and inclusive team environment where all members feel motivated and valued?

  • Valuing Every Team Member's Opinion: I emphasize the importance of every team member's perspective and actively encourage open communication. It's crucial not only to state that all opinions are valued but to demonstrate this in practice by giving equal attention and consideration to everyone's ideas and suggestions.


  • Maintaining Team Diversity: I prioritize building and maintaining a diverse team. This includes diversity in skills, backgrounds, experiences, and perspectives. A diverse team is more likely to generate innovative ideas and solutions, and it also fosters a richer, more inclusive work culture.

  • Inclusive Meeting Management: During team meetings, I ensure that everyone has the opportunity to speak and contribute. I actively facilitate discussions in a way that every member feels comfortable sharing their thoughts. This might involve directly inviting quieter team members to share their views or setting up meeting structures that ensure balanced participation.

  • Encouraging Team Building and Networking: Organizing team-building activities and informal networking opportunities helps in building rapport and understanding among team members.


  • Providing Opportunities for Professional Growth: I support the professional development of each team member through opportunities like personal development plans, workshops, and conferences.


  • Recognizing and Celebrating Contributions: I regularly acknowledge and celebrate the achievements and contributions of team members.


How do you encourage continuous learning and professional development within your ML team? Are there any specific initiatives or programs you have implemented?

  • Reading Club: We have established a reading club where team members take turns presenting and discussing a machine learning research paper relevant to our work. This not only keeps the team updated on the latest research but also encourages critical thinking and knowledge sharing.


  • Biweekly Research and Development Days: Every two weeks, we dedicate a day to R&D, where each team member explores a research project or tests new hypotheses. This initiative is designed to foster innovation and practical application of new ideas that could benefit our daily work.


  • Quarterly ML Hackathons: To stimulate creativity and team bonding, we organize quarterly ML hackathons. These events span a few days and are focused on working on exciting, out-of-the-box projects. They provide an opportunity for the team to experiment with new technologies, collaborate in different team configurations than usual, and offer a fun and stress-free environment to innovate.


How do you handle conflicts or disagreements within the ML team? Can you provide an example of a conflict that you resolved successfully?

First, I hold one-on-one meetings with the people involved. This allows me to understand each person's perspective and the core issues at hand. It's important to listen actively and empathetically during these discussions. After the individual meetings, I arrange a joint meeting with both parties. There, I share my observations, facilitate a constructive discussion, and encourage both individuals to express their views. This step often helps clear up misunderstandings or mismatched expectations.


Most conflicts are resolved at this stage. However, if the disagreement persists, I involve HR to assist in finding a resolution, ensuring that all parties feel heard and that a fair solution is reached.


For technical disagreements, I advocate for a data-driven approach to conflict resolution. I encourage team members to support their arguments with measurable metrics or empirical evidence. For example, a conflict arose over which algorithm would perform faster for a particular application. To resolve this, I asked both parties to provide benchmarks and empirical data supporting their claims. This approach not only resolved the conflict but also fostered a culture of evidence-based decision-making.
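A fair benchmark of the kind described above runs both candidates on identical input, checks that they agree, and repeats the timing. A minimal sketch using the standard `timeit` module; the interview does not name the algorithms in dispute, so a full sort versus `heapq.nlargest` for top-k selection is a hypothetical stand-in, as are the function names.

```python
import heapq
import random
import timeit


def top_k_sort(scores, k):
    """Candidate 1: sort everything, keep the first k."""
    return sorted(scores, reverse=True)[:k]


def top_k_heap(scores, k):
    """Candidate 2: heap-based selection of the k largest values."""
    return heapq.nlargest(k, scores)


def benchmark(funcs, scores, k, repeat=5, number=20):
    """Best-of-`repeat` time in seconds per call for each candidate on the same input."""
    outputs = {name: tuple(func(scores, k)) for name, func in funcs.items()}
    if len(set(outputs.values())) != 1:
        raise ValueError("candidates disagree on output; settle correctness before speed")
    results = {}
    for name, func in funcs.items():
        timings = timeit.repeat(lambda: func(scores, k), repeat=repeat, number=number)
        results[name] = min(timings) / number
    return results


random.seed(0)
data = [random.random() for _ in range(100_000)]
print(benchmark({"sort": top_k_sort, "heap": top_k_heap}, data, k=10))
```

Taking the minimum over repeats reduces noise from other processes; the data size and k should match the real application, since the faster option can change with both.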


How do you ensure that your team is also up-to-date with the latest technologies and techniques?

I arrange for experts from other companies within the industry to visit and share their experiences with our team. This exposure to external expertise provides valuable insights into new practices, tools, and methodologies being used in the industry. It's an opportunity for the team to learn from the successes and challenges faced by others in similar fields.


I actively encourage team members to both attend and present at relevant conferences. This serves a dual purpose. Attending conferences keeps them informed about the latest trends and breakthroughs in machine learning, while presenting their own work deepens their understanding of their areas of expertise and enhances their professional profiles. This involvement in the professional community not only benefits individual team members but also brings fresh perspectives and ideas back to our team.