Key things to remember when building data teams

Three factors determine the success of AI solutions: use cases, work culture and teams. In my previous post, I discussed AI use cases and how startup founders can incorporate them. Once the wheels are in motion, the next step is to assemble an AI team that can bring the idea to life. Culture is just as important for nurturing a data-driven environment, but that’s for another article.

The AI team consists of two groups: an “external team” (external to the data team but still part of the core AI team) that works closely with the data team, and the core data team itself. The diagram below is a high-level representation of the main stakeholders and their roles in an AI project. It shows the broad stages of the project and where the different teams come in. The data team is involved at every stage. We will look at each of these teams in detail.

The External Data Team

The external team is spread across the AI project. AI projects are usually initiated by senior management with a high-level idea: for example, a CEO who wants to create a dynamic pricing system. A product manager or program manager sets the vision for the AI solution and drives it by building a team around the solution. The design team then steps in with the design plan and user experience, after which the engineering team designs and builds the entire solution. The data team is involved in both ideation and execution throughout the project lifecycle.

Credit: author

Let’s explore the roles of the “external data team” in detail.

Account Managers/CEO/Product Manager/Sales: One of these roles usually initiates an AI project. They may propose the solution and act as the project sponsor. They keep other stakeholders informed about the status of the project and provide high-level guidance on requirements.

Product Managers: They are responsible for the vision of the AI solution. In addition to establishing metrics, they align stakeholders on the solution, secure sponsorship, and lead the delivery of the overall solution with other teams such as engineering and design.

Their role is to bind the whole project and its stakeholders (developers, designers and executives) together. Product managers are usually responsible for design, product-market fit, and product delivery. An AI project, compared to other projects, brings additional responsibilities and challenges such as dealing with unknowns, non-deterministic outcomes, new tools and technologies, and new infrastructure.

Engineering Managers: They are responsible for the architecture, construction and maintenance of specific technology components. They look after the growth and development needs of the team, and they anticipate and work on the reliability of the engineering systems. For an AI project, the engineering lead designs the solution, provides the infrastructure for the final AI solution, and ensures the integration works. Overall, they are responsible for the reliable execution of the AI solution.

Designers: Acting as the voice of the customer with internal teams, designers build the user experience of products. They design the right experience for the AI solution: how recommendations should appear, where push notifications should appear, or how a customer is notified of a taxi’s estimated arrival time. Designers of an AI solution also suggest user-experience-focused metrics and safeguards. Machines can often lose sight of the spirit of the customer experience, and designers have a role to play in making the solution more empathetic. They ensure that the customer experience is at the heart of the product or solution that the company is building.

Core Data Team

The AI data pipeline, shown in the figure below, is a good way to represent the functions of the different data roles. The pipeline follows the journey of data, from ingestion to insights and prediction. Data teams work on different parts of the pipeline. We will look at each of these roles in detail.

Credit: author

Data Engineer: Data engineers are responsible for the plumbing: building data pipelines for high-speed, high-volume data. They design data processing systems to ensure usability and reliability, which makes life much easier for downstream data consumers. In a nutshell, they do all the work necessary to collect, store and process the data.

A data engineer should be familiar with common programming languages such as Java, Scala, Python, and Ruby. For data collection and ingestion, familiarity with tools such as Kafka, Fluentd and Logstash is useful. For storage, knowledge of NoSQL databases and data warehouses such as Amazon Redshift is highly desirable. Other required skills include Spark, SQL and the ELK Stack.

Data Analyst: The role of a data analyst is to determine how data can be used to answer questions and solve problems. Data analysts analyze large-scale data and provide reports and dashboards that support decision-making. They advise stakeholders on questions about specific data points and how these can be improved over time.

A data analyst should have knowledge of SQL, data visualization tools, and business intuition.

Business Analysts: Compared to data analysts, business analysts are more involved in the business. While data analysts prepare data in formats that can be easily analyzed, business analysts apply their assumptions, their thought process, and their understanding of the business and the product to provide actionable recommendations. They use data to create business insights and recommend appropriate actions an organization can take. They work closely with others across the organization to implement changes based on their findings.

A business analyst needs skills in languages such as SQL, R, and Python. Generally, business analyst is a good role even for people from a non-technical background.

Data Scientists: Data science is the automation of thought. Some of the data science that takes place at scale in industry is nothing but applied research. Data scientists are analytical experts with a good understanding of the problem at hand, basic math, programming skills, and the underlying data systems. They use industry knowledge and contextual understanding to address business challenges. Data scientists use machine learning to enable “micro-decisions” at scale, with an impact across multiple areas of the business.

An ideal data scientist should have a deep understanding of data, algorithmic knowledge, a strong mathematical foundation, proficiency in a programming language, and know-how of deployment systems.

ML Engineer: A machine learning engineer combines software engineering and machine learning. ML engineers take algorithmic models and apply them to large-scale consumer data. An ML engineer should have algorithmic knowledge, a strong mathematical foundation, good programming skills, and the know-how to work with large-scale systems.

Many of these skills (algorithmic knowledge, mathematical foundations, programming proficiency, and knowledge of deployment systems) overlap with those of a data scientist. What sets an ML engineer apart is the ability to run algorithms at scale and rewrite them if necessary.

The skill map

Credit: author

The illustration above describes the skills required for each role. A double tick indicates strong expertise and a single tick indicates basic knowledge. For example, a data scientist should be strong in areas such as Python and algorithmic knowledge, whereas basic SQL know-how and business intuition would suffice.

The skills typical of the roles within the data function overlap. This overlap creates opportunities for people to grow into different roles within the function.
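One way to make the skill map concrete is to treat it as a small lookup table. The sketch below is illustrative only and does not reproduce the author's figure. The data scientist levels follow the example given in the text. The data analyst skills come from that role's description, but their levels are assumed. The names `Level`, `SKILL_MAP`, `shared_skills` and `skill_gaps` are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    BASIC = 1   # single tick in the skill map
    STRONG = 2  # double tick in the skill map


SKILL_MAP = {
    # Levels for the data scientist follow the example given in the text.
    "data_scientist": {
        "python": Level.STRONG,
        "algorithms": Level.STRONG,
        "sql": Level.BASIC,
        "business_intuition": Level.BASIC,
    },
    # Skills come from the data analyst description; the levels are assumed.
    "data_analyst": {
        "sql": Level.STRONG,
        "data_visualization": Level.STRONG,
        "business_intuition": Level.STRONG,
    },
}


def shared_skills(role_a, role_b):
    """Return the skills that appear in both roles, sorted by name."""
    return sorted(SKILL_MAP[role_a].keys() & SKILL_MAP[role_b].keys())


def skill_gaps(current_role, target_role):
    """Return skills where the target role expects a higher level than the current role has."""
    current = SKILL_MAP[current_role]
    target = SKILL_MAP[target_role]
    return {
        skill: level
        for skill, level in target.items()
        if current.get(skill, 0) < level
    }


print(shared_skills("data_scientist", "data_analyst"))
print(skill_gaps("data_analyst", "data_scientist"))
```

Under these assumed entries, the shared skills are the overlap that makes a move between roles feasible, and the gaps are what someone would need to build up to make that move.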


They say it takes a whole village to raise a child. The same is true of an AI solution: several teams are involved in creating it. The trick is to build a team that understands the issues, aligns well with the larger goal, and delivers to the best of its abilities.

This article is written by a member of the AIM Leaders Council. AIM Leaders Council is an invitation-only forum of senior executives from the data science and analytics industry.
