Data Management and Governance

  • Created by: DataGal
  • Created on: 06-10-22 17:11
Data Mining
The process of extracting previously unknown, valid, and actionable information from large databases and then using the information to make crucial business decisions.
Artificial Intelligence (AI)
A wide-ranging branch of computer science concerned with building smart machines capable of performing tasks that usually require human intelligence.
Machine Learning
The science (and art) of programming computers so they can learn from data.
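To make "learning from data" concrete, here is a minimal sketch with made-up data and a hypothetical function name (fit_line). The intercept a and slope b of a straight line y = a + b*x are estimated from example points by ordinary least squares instead of being written in by hand.

```python
from statistics import mean

def fit_line(xs, ys):
    """Learn intercept a and slope b of y = a + b*x by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: co-variation of x and y divided by the variation of x
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    b = num / den
    a = y_bar - b * x_bar
    return a, b

# Hypothetical training data that happens to follow y = 1 + 2x
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
a, b = fit_line(xs, ys)
print(a, b)  # 1.0 2.0
```

The program never contains the rule "y = 1 + 2x"; it recovers it from the examples, which is the essence of the definition above.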
Steps in data mining
1. Business Objective Determination
2. Data Understanding and Preparation
Data selection, data pre-processing/cleaning, data quality problems (noise, outliers, missing values), curse of dimensionality, sampling
3. Data Mining/Modelling
4. Analysis of Results
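The steps above can be sketched as a tiny pipeline. This is only an illustration under stated assumptions: the records, field layout, spend threshold and all function names (determine_objective, prepare, model, analyse) are hypothetical, and the "model" is a deliberately simple comparison of churn rates rather than a real mining algorithm.

```python
# Hypothetical raw records: (customer_id, monthly_spend, churned); None marks a missing value
RAW = [(1, 120.0, False), (2, None, True), (3, 35.0, True), (4, 150.0, False), (5, 40.0, True)]

def determine_objective():
    # Step 1: state the business question the mining should answer
    return "Do low-spending customers churn more often?"

def prepare(records):
    # Step 2: select the relevant fields and clean them (here: drop rows with missing spend)
    return [(spend, churned) for _, spend, churned in records if spend is not None]

def model(rows, threshold=100.0):
    # Step 3: churn rate below vs. at/above a spend threshold
    # (assumes both groups are non-empty, which holds for this toy data)
    low = [c for s, c in rows if s < threshold]
    high = [c for s, c in rows if s >= threshold]
    return sum(low) / len(low), sum(high) / len(high)

def analyse(rates):
    # Step 4: interpret the result in terms of the business objective
    low_rate, high_rate = rates
    return "low spenders churn more" if low_rate > high_rate else "no clear difference"

objective = determine_objective()
result = analyse(model(prepare(RAW)))
print(objective, "->", result)
```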
Data Transformation (Data Wrangling/ Data *******)
Deciding which features to use and converting the raw data into a form suitable for mining.
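A minimal sketch of such a transformation, assuming dictionary records with hypothetical field names: keep only the chosen features, then rescale a numeric one to the range [0, 1] with min-max normalisation (one common transformation among many).

```python
def select_features(records, features):
    # Keep only the chosen features from each record
    return [{f: r[f] for f in features} for r in records]

def min_max_scale(records, feature):
    # Rescale one numeric feature to [0, 1]: (x - min) / (max - min)
    values = [r[feature] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    return [{**r, feature: (r[feature] - lo) / span} for r in records]

# Hypothetical records; "name" is dropped because it is not a useful feature here
records = [
    {"name": "A", "age": 20, "income": 30000},
    {"name": "B", "age": 40, "income": 50000},
    {"name": "C", "age": 60, "income": 70000},
]
selected = select_features(records, ["age", "income"])
scaled = min_max_scale(selected, "age")
print(scaled)
```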
5 Main Features to Look for when Evaluating your Data (C, A, C, V, U)
Consistency - Is your data consistent across your datasets?
Accuracy - Is your data close to the true values?
Completeness - Does your data include all required information?
Validity - Does your data correspond with business rules and/or restrictions?
Uniformity - Is your data specified using the same units of measure?
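The five checks can be sketched in code. Everything here is an assumption for illustration: two hypothetical customer datasets, a made-up business rule (age between 0 and 120), a made-up "verified" age used as the true value for accuracy, and uniformity read as "every record uses the same unit of measure".

```python
# Hypothetical customer records from two systems; None marks a missing value
crm = {1: {"age": 34, "country": "UK", "height_unit": "cm"},
       2: {"age": None, "country": "FR", "height_unit": "cm"},
       3: {"age": 150, "country": "DE", "height_unit": "in"}}
billing = {1: {"country": "UK"}, 2: {"country": "FR"}, 3: {"country": "AT"}}
# Hypothetical verified ages, used as the "true values" for the accuracy check
verified_age = {1: 35}

def completeness(records, field):
    # Completeness: share of records in which the required field is present
    return sum(r[field] is not None for r in records.values()) / len(records)

def invalid_ids(records):
    # Validity: business rule "age must be between 0 and 120"
    return [k for k, r in records.items()
            if r["age"] is not None and not 0 <= r["age"] <= 120]

def inconsistent_ids(a, b, field):
    # Consistency: the same customer should have the same value in both datasets
    return [k for k in a if k in b and a[k][field] != b[k][field]]

def is_uniform(records, field):
    # Uniformity: every record should use the same unit of measure
    return len({r[field] for r in records.values()}) == 1

def accuracy_errors(records, truth, field):
    # Accuracy: absolute difference between stored and verified true values
    return {k: abs(records[k][field] - v) for k, v in truth.items()}

print(completeness(crm, "age"), invalid_ids(crm),
      inconsistent_ids(crm, billing, "country"),
      is_uniform(crm, "height_unit"), accuracy_errors(crm, verified_age, "age"))
```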
Examples of data quality problems.
Noise and outliers, missing values, duplicate data, inaccurate data, fake data
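Duplicate data is often hidden by small formatting differences. A minimal sketch, assuming hypothetical records and treating two records as duplicates when their key fields match after trimming spaces and ignoring case:

```python
def find_duplicates(records, key_fields):
    # Records are duplicates when their normalised key fields match
    seen, duplicates = {}, []
    for i, r in enumerate(records):
        key = tuple(str(r[f]).strip().lower() for f in key_fields)
        if key in seen:
            duplicates.append((seen[key], i))  # (first occurrence, repeat)
        else:
            seen[key] = i
    return duplicates

# Hypothetical contact list: the third entry repeats the first
people = [
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": "Bob Ray", "email": "bob@example.com"},
    {"name": "ann lee ", "email": "ANN@example.com"},
]
print(find_duplicates(people, ["name", "email"]))  # [(0, 2)]
```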
Noise
Modification of original values, e.g. random errors introduced when the data is measured or recorded.
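Noise can be simulated by adding a random error to each original value. This sketch assumes Gaussian noise with a chosen standard deviation sigma and hypothetical temperature readings; the seed only makes the result reproducible.

```python
import random

def add_noise(values, sigma, seed=0):
    # Each original value is modified by a random Gaussian error with mean 0
    rng = random.Random(seed)
    return [v + rng.gauss(0, sigma) for v in values]

true_temps = [20.0, 21.5, 19.8, 22.1]  # hypothetical true measurements
measured = add_noise(true_temps, sigma=0.5)
print(measured)
```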
Noisy Data
One or more variables whose values are significantly out of line with what is expected for those variables.
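One common way to flag values that are out of line is the interquartile range (IQR) rule, swapped in here as an illustration: anything below Q1 - 1.5*IQR or above Q3 + 1.5*IQR is flagged. The 1.5 multiplier is a convention, and the sales figures are hypothetical. statistics.quantiles needs Python 3.8 or later.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    # Flag values outside [Q1 - k*IQR, Q3 + k*IQR]
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical daily sales figures; 95 is far out of line with the rest
sales = [10, 12, 11, 13, 12, 95, 11]
print(iqr_outliers(sales))  # [95]
```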
Missing at Random (MAR)
The probability that a value is missing depends on other observed variables, but not on the missing value itself.
Missing Completely at Random (MCAR)
The probability that a value is missing is the same for every observation and unrelated to any variable, observed or not.
Missing Not at Random (MNAR)
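The probability that a value is missing depends on the missing value itself. Common causes of missing values: information not collected, information not applicable to some people, values lost through human error.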
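The three mechanisms can be contrasted on a hypothetical income survey. This is a simplified sketch: MCAR hides incomes purely at random, while the MAR and MNAR rules are deliberately deterministic (under-30s skip the question; incomes above a made-up limit are withheld) so the dependence is easy to see. Real missingness is usually probabilistic.

```python
import random

# Hypothetical survey responses: (age, income)
people = [(22, 18000), (25, 21000), (34, 40000), (41, 52000), (58, 90000), (63, 120000)]

def mcar(rows, p=0.3, seed=1):
    # MCAR: every income has the same chance of being missing, unrelated to any value
    rng = random.Random(seed)
    return [(age, None if rng.random() < p else inc) for age, inc in rows]

def mar(rows):
    # MAR: missingness depends on an observed variable (age), not on income itself
    return [(age, None if age < 30 else inc) for age, inc in rows]

def mnar(rows, limit=80000):
    # MNAR: missingness depends on the missing value itself (high earners withhold income)
    return [(age, None if inc > limit else inc) for age, inc in rows]

for name, fn in [("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)]:
    print(name, fn(people))
```

Under MNAR the observed incomes are biased low, and nothing in the remaining data reveals this; that is why MNAR is the hardest case to handle.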
How to deal with missing values?
Eliminate the affected observations, eliminate the variable, substitute the mean of the observed values, or estimate the missing value with predictive modelling
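The four options can be sketched on hypothetical records (None marks a missing value). The predictive-modelling option is shown as a simple linear-regression imputation, one choice among many; the function names are illustrative only.

```python
from statistics import mean

# Hypothetical records; None marks a missing value
rows = [
    {"age": 25, "income": 30000, "score": None},
    {"age": 35, "income": None, "score": 7},
    {"age": 45, "income": 50000, "score": None},
    {"age": 55, "income": 60000, "score": None},
]

def drop_observations(rows):
    # Eliminate every observation that has any missing value
    return [r for r in rows if None not in r.values()]

def drop_variable(rows, var):
    # Eliminate a variable (useful when it is mostly missing)
    return [{k: v for k, v in r.items() if k != var} for r in rows]

def mean_impute(rows, var):
    # Substitute the mean of the observed values
    m = mean(r[var] for r in rows if r[var] is not None)
    return [{**r, var: m if r[var] is None else r[var]} for r in rows]

def regression_impute(rows, target, predictor):
    # Predictive modelling: fit target = a + b*predictor on complete rows, then predict gaps
    known = [(r[predictor], r[target]) for r in rows if r[target] is not None]
    x_bar = mean(x for x, _ in known)
    y_bar = mean(y for _, y in known)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in known)
         / sum((x - x_bar) ** 2 for x, _ in known))
    a = y_bar - b * x_bar
    return [{**r, target: a + b * r[predictor] if r[target] is None else r[target]}
            for r in rows]

no_score = drop_variable(rows, "score")   # 'score' is mostly missing
complete = drop_observations(no_score)    # then drop the row with missing income
print(len(complete))                      # 3
print(mean_impute(rows, "income")[1]["income"])
print(regression_impute(rows, "income", "age")[1]["income"])
```

Dropping observations straight away would remove every row here, which is why the mostly-missing variable is dropped first. The complete rows lie exactly on income = 1000*age + 5000, so the regression fills the gap with about 40000, while the mean gives about 46667.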
Representative Sample
Has approximately the same properties (of interest) as the original set of data
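Stratified sampling is one way to make a sample representative with respect to a chosen property, here the proportion of each group; a simple random sample only matches it on average. This sketch assumes a hypothetical population of 700 "student" and 300 "staff" records and samples the same fraction from each group.

```python
import random
from collections import Counter

def stratified_sample(population, key, fraction, seed=0):
    # Sample the same fraction from each group so group proportions are preserved
    rng = random.Random(seed)
    groups = {}
    for item in population:
        groups.setdefault(key(item), []).append(item)
    sample = []
    for members in groups.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 70% students, 30% staff
population = [("student", i) for i in range(700)] + [("staff", i) for i in range(300)]
sample = stratified_sample(population, key=lambda r: r[0], fraction=0.1)
print(Counter(r[0] for r in sample))  # Counter({'student': 70, 'staff': 30})
```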
Big Data
Datasets that grow so large that they become difficult to capture, store, manage, share, analyse and visualise with typical database software tools.
3Vs related to Big Data
Velocity
Volume
Variety