Data Mining


What is Data Mining?

  • The process of discovering patterns in large data sets 
  • Aims to discover patterns and trends that go beyond simple analysis
  • Combines AI, statistics and database systems 
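
As a rough illustration of "discovering patterns", the sketch below counts which pairs of items occur together most often in a set of transactions. The basket data, the `frequent_pairs` function and the `min_support` threshold are all hypothetical, made up for this example. Real data mining tools use far more sophisticated techniques on far larger data sets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "milk", "jam"},
]

def frequent_pairs(baskets, min_support):
    """Return item pairs that appear together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort the items so each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

patterns = frequent_pairs(transactions, min_support=2)
print(patterns)  # {('bread', 'milk'): 3}
```

Here the "pattern" is that bread and milk are bought together in three of the four baskets, which is the kind of summary that could then be passed on for further analysis.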

What is the aim of Data Mining?

  • The main aim is to extract information from a data set and transform it into an appropriate format for future use
  • Patterns are presented as a summary of the input data and used for further analysis. They may be put into decision support systems to support future prediction.
  • Used to support tactical, operational and strategic decision making

What is Big Data?

  • A term associated with large data sets
  • So complex that traditional databases and other processing applications are unable to capture, curate, manage and process them within an acceptable time frame
  • Defined by the 3 V's: Volume - the amount of data, Variety - the number of data types, and Velocity - the speed at which data is generated and processed

The use of Digital Technology in Data Mining

  • Social media is one of the biggest sources of Big Data. Consumer goods companies actively scan social media websites to decipher user preferences, choices and perceptions towards their brands.
  • Digital technology allows us to collect data for further analysis using methods such as online forms, email data and market research.
  • Data sources can be internal or external. Internal sources include customer details, sales data etc. External sources include business partners, the internet etc.
  • The most commonly used data sources are social media, machine data (e.g. from RFID chip readers) and transactional data

Big Data Storage

  • For many organisations, big data storage is not defined by one specific amount of data - it relates to the need to store data sets that cumulatively range from terabytes to many petabytes.
  • Key requirements of big data storage are that it can handle large amounts of data and keep scaling as data sets grow
  • It must also provide the high-speed input/output operations needed to deliver data analytics as they are carried out.
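
To give a feel for the scaling requirement, here is a minimal sketch that projects how quickly a data set could grow from terabytes into petabytes. The starting size of 200 TB and the 50% annual growth rate are assumptions chosen only for the arithmetic, and both function names are hypothetical.

```python
TB_PER_PB = 1000  # decimal units: 1 petabyte = 1000 terabytes

def projected_size_tb(current_tb, annual_growth, years):
    """Project the data volume after a number of years of compound growth."""
    return current_tb * (1 + annual_growth) ** years

def years_until_exceeded(current_tb, annual_growth, limit_tb):
    """Count the whole years of growth needed before the limit is exceeded."""
    if annual_growth <= 0:
        raise ValueError("annual_growth must be positive")
    size = current_tb
    years = 0
    while size <= limit_tb:
        size *= 1 + annual_growth
        years += 1
    return years

# Assumed figures for illustration only: 200 TB today, growing 50% a year.
years_needed = years_until_exceeded(200, 0.5, 1 * TB_PER_PB)
size_then = projected_size_tb(200, 0.5, years_needed)
print(years_needed, size_then)  # 4 1012.5
```

Under these assumed figures the data set passes one petabyte within four years, which is why storage that cannot keep scaling quickly becomes a bottleneck.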

Big Data Practitioners' Storage

  • Google, Facebook etc. run hyperscale computing environments, which consist of a vast number of servers with direct-attached storage (DAS)
  • Each unit is mirrored, so a unit that suffers a hardware failure can simply be taken out of service, as its mirror will already have taken over
  • Each unit will generally have PCIe storage devices to support data storage and high access speeds to data sets.

Importance of Big Data to Organisations

  • Can help companies gain insight into potential revenue increases, determine how to gain or retain customers, and improve operations.