Wednesday, July 17, 2019

Case study: how to create a high-performance data science team



Artificial intelligence (AI) has the potential to transform industries across the board, but few organizations manage to capture its value and achieve a real return on investment. The reality is that the transition to artificial intelligence and data-driven analysis is difficult and not well understood. The problem is twofold: first, the technology needed to do this has only recently become mainstream; second, most data scientists lack experience in the business domains they work in. Despite all the uncertainty surrounding this topic, one hedge fund has managed to overcome these challenges and achieve what many companies cannot: a high-performance data science team that delivers a real return on investment (ROI).

This is the story of an outlier

Business Science was recently invited inside the walls of Amadeus Investment Partners, a hedge fund that has unlocked the power of artificial intelligence to achieve superior results in one of the most competitive industries in the world: investing. Amadeus Investment Partners has spent the last five years building a high-performance data science team. What they have built is nothing short of extraordinary.

In this article, we will find out what makes Amadeus Investment Partners an outlier and why it is unique in the data science space. We will examine the key ingredients of the recipe that is driving Amadeus' ROI with artificial intelligence, and analyze what it takes to build a high-performance data science team.

Data Science team structure, Amadeus Investment Partners

We will then describe how Business Science is using this information to develop best-in-class data science education, in the form of both custom on-site workshops and on-demand virtual workshops. We will show how we are integrating the same cutting-edge technology into our Data Science For Business programs.

This is all about one thing: developing a system for creating best-in-class data science teams.

If you are interested in developing a best-in-class data science team, then read on.

Examining an anomaly

Amadeus Investment Partners is a hedge fund that fuses traditional fundamental investment principles with cutting-edge quantitative techniques to create "Quantamental" strategies that identify assets with excellent returns while minimizing risk to their investors. Their goal is to provide their investors with higher risk-adjusted returns.

Amadeus' strategy is working. Here is an overview of the backtest results since 2005 compared to the S&P 500, a difficult benchmark to outperform. Over the backtest period, the Amadeus strategy has generated "alpha", meaning it has produced excess returns (performance) beyond the benchmark's returns.
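The backtest claim rests on comparing strategy returns with benchmark returns. As a minimal sketch, assuming made-up monthly return series rather than Amadeus data, the Python below computes the informal "alpha" described above (total strategy return minus total benchmark return). For comparison, it also computes a regression-based alpha: the intercept of an ordinary least-squares fit of strategy returns on benchmark returns. The zero risk-free rate and the function names are assumptions for illustration, not the method Amadeus used.

```python
from statistics import mean


def cumulative_return(returns):
    """Compound a series of periodic returns into one total return."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def excess_return(strategy, benchmark):
    """Informal 'alpha': total strategy return minus total benchmark return."""
    return cumulative_return(strategy) - cumulative_return(benchmark)


def regression_alpha_beta(strategy, benchmark):
    """OLS fit of strategy = alpha + beta * benchmark.

    alpha is the per-period intercept. A zero risk-free rate is assumed,
    so raw returns stand in for excess returns over cash.
    """
    mx, my = mean(benchmark), mean(strategy)
    cov = sum((x - mx) * (y - my) for x, y in zip(benchmark, strategy))
    var = sum((x - mx) ** 2 for x in benchmark)
    beta = cov / var
    alpha = my - beta * mx
    return alpha, beta


# Hypothetical monthly returns, for illustration only.
benchmark = [0.010, -0.020, 0.015, 0.030, -0.010, 0.005]
strategy = [0.008, -0.005, 0.012, 0.020, 0.001, 0.006]
print(f"excess return: {excess_return(strategy, benchmark):.4f}")
alpha, beta = regression_alpha_beta(strategy, benchmark)
print(f"alpha per period: {alpha:.4f}, beta: {beta:.2f}")
```

A positive regression alpha means the strategy earned more than its benchmark exposure (beta) alone would explain. This is a stricter test than simply beating the benchmark's total return.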

Quantamental long-short strategy performance

Risk-return performance, Amadeus Quantamental Long-Short strategy

From the Growth of $10,000 since 2005 chart, Amadeus appears to be a strong-performing hedge fund. However, the magic only comes to light when we dive into the risk and return profile. The Sharpe ratio, a commonly used measure of reward relative to risk in investing, is almost double that of the S&P 500 over this period. This means that Amadeus is taking less risk per unit of return than the S&P 500. In addition, the maximum drawdown (the largest peak-to-trough loss over the period) was about half that of the S&P 500 over the same period. Ultimately, this means that Amadeus is delivering exceptional returns while taking less risk, which is very attractive to investors.
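The three figures in this paragraph can all be derived from a single return series. The sketch below assumes hypothetical monthly returns, a zero risk-free rate by default, and annualization by the square root of 12. The article does not state the conventions behind its chart. The growth curve compounds returns from a $10,000 starting value. The Sharpe ratio divides mean excess return by its sample standard deviation. Maximum drawdown tracks the worst fall from a running peak.

```python
from math import sqrt
from statistics import mean, stdev


def growth_of_investment(returns, initial=10_000.0):
    """Equity curve: value of an initial investment after each period's return."""
    values = [initial]
    for r in returns:
        values.append(values[-1] * (1.0 + r))
    return values


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=12):
    """Annualized Sharpe ratio.

    Mean per-period excess return divided by its sample standard deviation,
    scaled by sqrt(periods_per_year). risk_free is a per-period rate.
    """
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess) * sqrt(periods_per_year)


def max_drawdown(values):
    """Largest peak-to-trough decline of an equity curve, as a negative fraction."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)
        worst = min(worst, v / peak - 1.0)
    return worst


# Hypothetical monthly returns, for illustration only.
monthly = [0.02, -0.01, 0.03, -0.05, 0.04, 0.01]
curve = growth_of_investment(monthly)
print(round(curve[-1], 2), round(sharpe_ratio(monthly), 2), round(max_drawdown(curve), 4))
```

The two risk measures capture different properties. The Sharpe ratio ignores the order of returns, while maximum drawdown depends on it. For example, the same set of returns produces a deeper drawdown when the losses come back to back.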

But how does Amadeus achieve these results?

A radically different organization

In our meetings with Amadeus, we identified three key components of their high-performance data science team. Each is of paramount importance to Amadeus' successful execution of its radically different data-driven strategy. Amadeus:

  1. Finds and trains talent in the most unlikely ways

  2. Has a well-designed team structure and culture

  3. Provides access to cutting-edge technology

We will go through each of these key ingredients that make up the recipe for data-driven success.

Key 1: find and train talent in the most unlikely way

The first piece of the puzzle is finding and developing the talent needed to execute the vision. This is where Amadeus has excelled: finding talent in the most unlikely places.

Over the last few years, Amadeus has worked tactically with leading educational institutions in Canada to gain selective access to the top students in …

Business programs

Yes: students at the top of their classes in business programs. If you look at the makeup of their team, most members have no background in mathematics or physics. If you're familiar with the conventional data science team, full of mathematics and computer science graduates, this may surprise you.

This unusual hiring practice is based on the belief that the subject matter knowledge and communication skills that top business students bring are critical advantages in Amadeus' data-driven approach. At the end of the day, data science is a tool that people use to answer the questions they care about, and hiring staff with relevant expertise ensures that the right questions are asked. Amadeus then turns these business-minded people into data scientists by building up their math and programming skills on the job.

"Hiring people with subject matter expertise ensures that the right questions are asked."

-Rafael Nicolas Fermin Cota

When it comes to training new hires, Amadeus has a distinct advantage. One of its founders, Rafael Nicolas Fermin Cota, was a professor at Western University's Ivey Business School, one of the best business schools in Canada. In his courses, he taught students how to make business decisions using data science. He says:

"My job is to teach students how to think. They may forget the specific course material. But if they learn how to think, they will be able to solve the problems they face in their professional careers."

-Rafael Nicolas Fermin Cota

It is this spirit of learning and critical thinking that you experience when you meet the Amadeus data science team. You also come away with a sense of the structured approach behind this intellectual curiosity. Each member told the story of their start at Amadeus, and they all began the same way: learning to program, studying statistics, and receiving a great deal of mentorship. It takes six months of education and training before a new hire is ready to be an integral part of the team. The core curriculum includes the following concepts:

  1. Database management: Obtaining data from various sources and storing it efficiently for later access.

  2. Data manipulation: Working with raw data (often in many different formats) and turning it into a single
    organized data set that can be easily analyzed.

  3. Exploratory data analysis: Exploring the data to determine the various characteristics of the data set (missing
    values (NA), mean, standard deviation, data type, etc.).

  4. Predictive modeling: Using the available data to predict future outcomes with machine learning and other
    artificial intelligence concepts.

  5. Visualization: Presenting the results of exploratory data analysis and predictive modeling to various
    audiences.
This core training gives team members a shared understanding to draw on in discussions, making communication much more efficient.

To support ongoing education and professional development, team members are free to purchase books, courses, or other training material as needed.

Key 2: well-designed team structure and collaborative culture

Once the initial training is finished, each new hire is ready to be integrated into a functional part of the team. Integration involves finding the role that best matches both their abilities and Amadeus' needs. This approach lets new hires take on a position they are interested in while benefiting the organization.

The team structure has been carefully designed to make the most of team members' talents and to clearly reflect the desired interaction between them. Think of the high-performance team structure as the blueprint for success.

Data Science team structure, designed for high performance

It involves four key roles:

  1. Subject matter experts
  2. Data engineering experts
  3. Data science experts
  4. User interface experts

Subject Matter Experts (SME)

Amadeus has four SMEs, who are involved at both the start and the end of the investment strategy development process. At the beginning of the process, SMEs are responsible for generating initial ideas for new strategies. These ideas are based on company fundamentals and are meticulously researched before being discussed with the Data Engineering and Data Science teams. SMEs are also responsible for the end of the process: implementing the strategies. This ensures that investment execution stays in line with the original strategy design.

Relevant skills:

  • Accounting & Finance: Deep knowledge of financial analysis and capital markets is needed to build initial strategy ideas
  • Excel: Excel is used to store initial strategy ideas
  • R: R is used to perform data exploration and to work efficiently with data

Data Engineering Experts (DEE)

When SMEs come up with new strategy ideas, the Data Engineering team is called on to collect the data the Data Science team needs to test those ideas and make it available. With petabytes of financial data available, DEEs must master programming techniques that make data transfer and computation as efficient as possible. In addition, Amadeus focuses on data quality, because any further analysis is only meaningful if it is based on good-quality data. Financial data is often noisy and contains many missing values. It also requires joins on timestamps, which is very difficult because of the size of the data and because global data sources rarely align.
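Timestamps from different sources rarely line up exactly, so joins on them are usually handled with an "as-of" join. Each record is matched to the most recent record from the other source at or before its timestamp, optionally within a tolerance. The sketch below uses standard-library Python and assumes both sources are lists of (timestamp, value) pairs, with the right-hand list sorted. The function name and the tolerance parameter are hypothetical, and this is not Amadeus' implementation. At scale, R's data.table offers rolling joins and pandas offers merge_asof for the same task.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def asof_join(left, right, tolerance=None):
    """Attach to each left record the latest right record at or before its timestamp.

    left, right: lists of (timestamp, value) tuples; right must be sorted by timestamp.
    tolerance: optional timedelta; matches older than this become None (missing).
    """
    right_times = [t for t, _ in right]
    joined = []
    for t, left_value in left:
        i = bisect_right(right_times, t) - 1  # index of the last right timestamp <= t
        match = None
        if i >= 0 and (tolerance is None or t - right_times[i] <= tolerance):
            match = right[i][1]
        joined.append((t, left_value, match))
    return joined


# Hypothetical trade prices and FX quotes from two unaligned feeds.
trades = [(datetime(2019, 7, 17, 9, 30, 5), 101.2), (datetime(2019, 7, 17, 9, 31, 40), 101.6)]
fx_quotes = [(datetime(2019, 7, 17, 9, 30, 0), 1.3050), (datetime(2019, 7, 17, 9, 30, 58), 1.3052)]
for row in asof_join(trades, fx_quotes, tolerance=timedelta(seconds=30)):
    print(row)
```

With a 30-second tolerance, the second trade gets None because the most recent quote is 42 seconds old. Without a tolerance, the join would silently attach that stale quote, which is one way noisy, misaligned data leaks into an analysis.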

Relevant skills:

  • C++: C++ is a high-performance language at the heart of their data engineering work. Parallelizing computations and developing distributed systems in C++ allows Amadeus to take full advantage of big data
  • SQL: SQL is the language used to interact directly with their databases
  • R: The data.table package is mainly used to scale R for speed when moving strategies from exploration to production

Data Science Experts (DSE)

The DSEs at Amadeus are fundamental to exploring the properties of ideas generated by SMEs and to developing the algorithms each strategy requires. They draw on their expertise in statistical analysis, machine learning (supervised and unsupervised), time series analysis, and text analysis. Their main challenge is keeping up with the stream of hypotheses generated by SMEs and developing analyses rapidly. The DSEs identify patterns or anomalies in the data and produce concise reports that let SMEs interpret the results quickly. They also determine when a project's ROI has diminished and new projects need to be launched.

Relevant skills:

  • R: R is used for exploratory data analysis (EDA) and visualization because it is easy to use for exploration. The tidyverse is mainly used for rapid data transformation before exploration.
  • Python: Python is used for advanced machine learning and deep learning on high-performance NVIDIA GPUs. All major deep learning frameworks are available in Python and can be easily deployed using the tools provided in NVIDIA GPU Cloud.

User Interface Experts (UIE)

Amadeus develops interactive web applications to support internal decision making and operations. Building dashboards brings new challenges: each application must be tailored to its problem, but it must also perform well when it comes to interactivity. Given these constraints, building a performant application often comes down to choosing the right tools. The UIEs use R + Shiny for lightweight applications, or Python, Django, and JavaScript when performance and interactivity are important concerns.

Relevant skills:

  • Databases: Data-driven web applications start with the database. Knowledge of the appropriate query language (SQL, MongoDB, etc.) is necessary to manage data effectively.
  • Data analysis: R + Shiny can be used for quick proofs of concept, while Python + Django is used for production-level performance.
  • Web development: HTML, CSS, and JavaScript are a necessity when creating sophisticated web-based user interfaces.

Emphasis on communication

An often overlooked part of a data science team is the team aspect itself: communicating ideas and analysis throughout the workflow. In most other organizations, departments work in silos and interact with each other only at the senior management level. This prevents members from seeing the big picture and creates internal competition at the expense of organizational performance.

At Amadeus, a collaborative culture is encouraged: each project is carried out by a cross-functional team that includes at least one person from each of the four functional roles described above. This way, projects benefit from the team members' different perspectives, and the research process runs smoothly, without friction between phases.

In addition, weekly team meetings are held to track individual progress and to give team members a forum to share ideas and suggestions.

Key 3: access to cutting-edge technology

NVIDIA V100

As mentioned earlier, it takes a huge effort to find and train talent and get people to work collaboratively. All of this effort would be wasted if there were a technological bottleneck in the research process.

Data science team members have full access to computational infrastructure for GPU-intensive work (DL, NLP) and CPU-intensive work (data cleansing, report generation, EDA). These systems give every team member immediate access to high-performance computing resources, minimizing the time it takes to run calculations. This allows the team to iterate through ideas quickly.

At Amadeus, each team has its own compute stack so that it does not interfere with the other teams' work. The stacks are interconnected to allow interaction between teams.

  • Data engineering: Systems optimized for populating and querying databases. DEEs provide a custom API that gives all other teams immediate access to the data.

  • Data science: High-performance CPUs and GPUs, ideal for training machine learning models and running EDA.

  • UI / Web Applications: Systems designed specifically to host web applications and internal Shiny / Django applications. UIEs can use the DSE infrastructure when the back end requires high-performance computation.

  • Subject matter experts: Access to high-performance data through front-end APIs, plus hardware designed specifically for their execution needs.

Amadeus has collaborated with NVIDIA, a pioneer of next-generation computing hardware for artificial intelligence research and deployment. The team actively uses high-performance computing in its internal analytics technology stack, which features the NVIDIA DGX-1, the world's fastest deep learning system.

Business Science assisted the Amadeus data science team in training a text classifier on financial news data to predict the sentiment of the articles. The NVIDIA DGX-1 produced results in minutes. The same work would have taken several hours, if not days, on a CPU system or even on a GPU system not optimized for deep learning.
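The article does not say which model was trained on the DGX-1, although the DSE section points to Python deep learning frameworks. To illustrate only the general task, sentiment classification of news text, the sketch below uses a multinomial Naive Bayes classifier with add-one smoothing in standard-library Python. This is a swapped-in and much simpler technique, not the GPU deep learning model described above. The class name and the example headlines and labels are made up.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter(labels)      # label -> number of documents
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training headlines and labels, for illustration only.
headlines = [
    "Shares surge after record quarterly profit",
    "Company beats earnings estimates and raises guidance",
    "Stock plunges on weak sales and profit warning",
    "Regulators open probe, shares tumble",
]
labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayesSentiment().fit(headlines, labels)
print(model.predict("Profit surge lifts shares"))
```

A baseline like this trains instantly on a CPU. The hours-versus-minutes gap described above applies to deep learning models trained on large news corpora, not to simple baselines of this kind.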

Best-In-Class Data Science Education

Turning knowledge into education

Business Science has drawn the following insights from the Amadeus case study:

  1. Recruiting talent with subject matter expertise and then educating them in data science has proven effective in building a high-performance team

  2. Communication between different teams is important, and education must support that communication

  3. Teams must be equipped with the latest technology to reach their full potential

Unfortunately, data science education is still in its infancy because most educational institutions do not understand what it takes to do real-world data science. Most programs focus on theory or tools. This does not work. Real-world data science is learned only through application and integration, and people who understand the business have an advantage.

This is the reason why Business Science is different.

We are building a best-in-class educational program that incorporates lessons learned:

  • Through studying an outlier: a radically different, extremely high-performing data science team that is successfully generating ROI for its organization.

  • Through our applied consulting experience, which has successfully generated ROI for organizations

  • Through our experience building the tools and software needed to solve business problems

The next-generation Business Science education offers two options that complement this knowledge:

  1. Custom on-site workshops

  2. On-demand virtual workshops

Custom on-site workshops

The workshops are short but powerful. In less than two days, we can teach what normally takes data scientists far longer to learn. The key is our approach:

  • We do 6 weeks of preparatory work with your team

  • We use data relevant to your company and your industry

  • Our instructors are experts in every aspect of data science

  • We focus on business application

S&P Global Workshop

Business Science custom learning workshop, Client: S&P Global

On-demand virtual workshops

At Business Science University, we are building two tracks, one focused on R and one on Python. They teach the same tools Amadeus' data science experts use for exploratory analysis and machine learning, but on demand and self-paced. The Data Science For Business tracks focus on a real business problem, and students apply many of the most popular machine learning tools over several weeks.

Data Science For Business Tracks

DS4B course tracks, Business Science University

The Business Science University course roadmap, with a 12-month timeline, focuses in particular on team roles. It integrates the tools that match the Amadeus team's skills with the Business Science team's software development and consulting experience.

Course schedule and calendar

Course timetable and calendar, Business Science University

Over the next 12 months, we are focusing in particular on Data Science and UI / Web Applications:

  • Data Science For Business with R and Python: DS4B 201-R (available now) and DS4B 201-P (Q4 2018)

  • Web application development: DS4B 301-R using R + Shiny (Q3 2018) and DS4B 301-P using Python + Django (TBD)

  • Time series analysis: Virtual workshop on fundamental time series concepts, machine learning, and deep learning (H1 2019)

  • Text analysis: Virtual workshop on text analysis fundamentals and deep learning (H2 2019)

  • Crash courses (not shown on roadmap): Short courses that prepare students in R, Python, Spark, and SQL, which are needed for the 200-series courses

The program will expand into data engineering, including high-performance languages (e.g. C++), big data tools (e.g. Spark), and distributed computing.

Create your data science team today

Building a Data Science team? Business Science can help.

We are your educational partner. We are here to support your transition by providing the best data science training. No matter where you are, we'll take you where you need to go. Contact us to learn more about our data science education offerings.
