Data Science

Data Science combines the scientific method, maths and statistics, specialised programming, advanced analytics, Artificial Intelligence (A.I.) and storytelling to uncover and explain the business insights buried in data. It is a multidisciplinary approach to extracting actionable insights from the large and ever-increasing volumes of data collected and created by today’s organisations.

It involves preparing data for analysis, performing advanced analysis on it and presenting the results in a way that reveals the patterns in the organisation’s business, enabling stakeholders to make informed decisions.

Data Science Pipeline

The data lifecycle, or data science pipeline as it is sometimes called, can be split into as many as sixteen steps, but listed below are the most common steps we work through with businesses:

Capture

This entails gathering all the raw structured and unstructured data by whatever method is appropriate, from web scraping and manual entry to capturing data from systems and devices in real time.
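
To give a flavour of the capture step, here is a minimal Python sketch of scraping a tabular page into raw records. The URL and table layout are placeholders, not a real data source, and the requests and beautifulsoup4 packages are assumed to be installed.

    # Capture sketch: pull raw rows from a (hypothetical) public web page.
    import requests
    from bs4 import BeautifulSoup

    def scrape_table(url: str) -> list[dict]:
        """Fetch a page and return each table row as a raw, unvalidated record."""
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")

        table = soup.find("table")
        if table is None:
            return []
        headers = [th.get_text(strip=True) for th in table.find_all("th")]
        rows = []
        for tr in table.find_all("tr")[1:]:
            cells = [td.get_text(strip=True) for td in tr.find_all("td")]
            if cells:
                rows.append(dict(zip(headers, cells)))
        return rows

    raw_records = scrape_table("https://example.com/figures")  # placeholder URL
    print(f"Captured {len(raw_records)} raw records")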

Prepare and Maintain

This requires putting raw data into a consistent format for analytics and for machine- and deep-learning models. It can include everything from cleansing, de-duplicating and reformatting the data to using ETL (Extract, Transform, Load) or other integration technologies to combine it into a data warehouse, data lake or other unified store for analysis.
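
A hedged sketch of this prepare-and-maintain step using pandas follows: cleanse, de-duplicate and reformat raw records, then load them into a unified store (SQLite stands in here for a data warehouse or data lake; the column names are assumptions, not from a real dataset).

    # Prepare sketch: cleanse, de-duplicate, reformat, then load into a store.
    import sqlite3
    import pandas as pd

    def prepare(raw: pd.DataFrame) -> pd.DataFrame:
        df = raw.copy()
        df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]  # consistent names
        df = df.drop_duplicates()                                               # de-duplicate
        df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")    # reformat types
        df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
        return df.dropna(subset=["order_date", "amount"])                       # drop unusable rows

    raw = pd.DataFrame({
        "Order Date": ["2024-01-03", "2024-01-03", "not a date"],
        "Amount": ["120.50", "120.50", "80"],
    })
    clean = prepare(raw)

    # "Load" step: append the cleansed data into a local analytical store.
    with sqlite3.connect("analytics.db") as conn:
        clean.to_sql("orders", conn, if_exists="append", index=False)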

Pre-process or Process

At this point we examine biases, patterns, ranges and distributions of values within the data to determine its suitability for predictive analytics, machine learning and/or deep learning algorithms.
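
The sketch below shows the kind of profiling this step involves, again in pandas. The dataset and column names are illustrative only.

    # Profiling sketch: inspect ranges, distributions and potential bias before modelling.
    import pandas as pd

    df = pd.DataFrame({
        "region": ["north", "north", "south", "south", "south", "east"],
        "amount": [120.5, 80.0, 300.0, 150.0, 95.0, 210.0],
    })

    print(df["amount"].describe())                    # range, mean and spread of a numeric field
    print(df["region"].value_counts(normalize=True))  # class balance: is one group over-represented?
    print(df.groupby("region")["amount"].mean())      # does the value differ sharply by group?
    print(df.isna().mean())                           # share of missing values per column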

Analyse

This is where the discovery happens, as we apply statistical analysis, predictive analytics, regression and machine- and deep-learning algorithms to extract insights from the prepared data.
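
As a small, hedged example of this analysis step, the sketch below fits a simple regression model with scikit-learn and checks how well it generalises. The features and data are synthetic stand-ins rather than a client dataset.

    # Analysis sketch: fit a regression model and evaluate it on held-out data.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))                                        # three illustrative features
    y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)  # synthetic target

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
    model = LinearRegression().fit(X_train, y_train)

    print("coefficients:", model.coef_)
    print("held-out R^2:", r2_score(y_test, model.predict(X_test)))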

Communicate

Finally, we take the insights and present them as reports, charts and other data visualisations that make them, and their impact on the business, easier for decision-makers to understand.
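
A minimal sketch of this communication step follows, turning an aggregated result into a chart a decision-maker can read at a glance. The figures and labels are placeholders.

    # Communication sketch: export a simple chart for a report or slide deck.
    import matplotlib.pyplot as plt

    regions = ["North", "South", "East", "West"]
    avg_order_value = [120.5, 181.7, 210.0, 95.3]   # illustrative aggregates

    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(regions, avg_order_value)
    ax.set_title("Average order value by region")
    ax.set_ylabel("Average order value (£)")
    fig.tight_layout()
    fig.savefig("avg_order_value_by_region.png")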

As hard-core, data-driven people we also need the ability to write and run code to build models.

Two of the most popular languages are:

R

R is a programming language and environment for statistical computing and graphics. It provides a broad variety of libraries and tools for cleansing and prepping data, creating visualisations, and training and evaluating machine- and deep-learning algorithms.

Python

Python is a general-purpose, object-oriented, high-level programming language that emphasises code readability through its distinctive use of significant whitespace.
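
The tiny snippet below illustrates that point: indentation itself defines the structure of the code, so the logic reads like an outline. The function and thresholds are purely illustrative.

    def classify_order(amount: float) -> str:
        # Indentation marks each block; no braces or end keywords are needed.
        if amount >= 500:
            return "large"
        elif amount >= 100:
            return "medium"
        return "small"

    print(classify_order(120.5))   # -> "medium"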

Not only are our people proficient in the above programming languages, they are also experienced with big data processing platforms such as Apache Spark and Apache Hadoop, and skilled in a wide range of data visualisation tools such as Tableau, Microsoft Power BI, and open-source tools like D3.js and RAW Graphs. There is a far deeper pool of tools we can dive into should the need arise.
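
To give a flavour of the first of those platforms, here is a minimal PySpark sketch that reads a file and computes a grouped aggregate. The file path and column names are placeholders, and a working Spark installation is assumed.

    # Spark sketch: read a CSV and summarise it by group.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("orders-summary").getOrCreate()

    orders = spark.read.csv("orders.csv", header=True, inferSchema=True)  # placeholder path
    summary = (
        orders.groupBy("region")
              .agg(F.avg("amount").alias("avg_amount"),
                   F.count("*").alias("order_count"))
    )
    summary.show()
    spark.stop()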