Unit 6: Database Systems
Database: an organised collection of data which can be accessed in many different ways.
- All data stored in a single table
- The headings are the field names
- There is no sharing data
- Can be created using a spreadsheet
Problems with Flat Files:
- Duplication of data - data is entered more than necessary
- Inconsistencies can arise - a change of address results in records having old addresses and new addresses.
- Very restrictive - only suitable for small sotres of data.
Unit 6: Database Systems
RELATIONAL: A large collection of data items stored im tables containing links between them structured in a way that it can be accesssed by a number of different applications programs. The tables are linked together by primary and foreign keys.
Features of a Relational Database:
- Data is held in two or more tables
- Links between the tables
- Data from any of the tables can be extracted
- Greater knowledge is needed to create them
Unit 6: Key 'Data' Words
Data Consistency: Is where the relationship between the input data, the processed data and the output data are related. If the system works correctly the data will be correct at each stage so said to be consistent.
Data Redundancy: Data which is repeated unncessarily or stored twice in a database.
Data Integrtity: The correctness of the data. Truthful and Accurate.
Data Independance: The data and the applications programs used to access it are independant/ seperate.
Unit 6: Data Warehousing
- Used to store all an organisations historical data and is used by management information systems to extract information which will help make decisions.
- Is a corporate resource which everyone in the organisation has access to.
E.g. finding out the day of the week on which a particular store sold the greatest number of a certain product.
E.g. how employee sickness varied over the last year between Manchester and Newcastle.
- Can then be mined
- Allows the company to store all details of what is sold to each customer.
- Company can see who uses a loyalty card and what they have brought.
- Can compare information like sickness data from other stores.
- Sotring all the historical data better equips managers to make decisions.
Unit 6: Data Mining
Once data has been stored in a data warehouse it then needs to be mined to discover:
- patterns in the data
- associations in the data (people who read the times are more likely to drink red wine)
- Trends over time (person buying more healthy food and drinking less alcohol)
- By mining the mass data it allows users to understand the data more
- Looking for patterns presents the results in the form of tables and graphs.
Can produce information such as:
- Lists of customers likely to buy a product
- Comparisons with competitors
- Analysis of best sites for shops
- Predictions for future sales
- Customer buying patterns
Unit 6: Applications of Data Mining
Helping fight against terrorism since 9/11 -
Government has been analysing peoples travel, spending and communications habits using data mining, in order to spot patterns of abnormal behavioural that could lead it to terrorists.
Fighting shop lifting in clothes retailer Jaeger -
Using data mining with information about transactions and position of the clothes in the stores, they found that most of the clothes stolen were situated near doors desptite them being security tagged.
Identification of customer needs -
Virgin Media who supply broadband, telephone and cable TV packages use data mining to segment the target customers most likely to buy new services or upgrade those services they already have.
Unit 6: Distributed Databases
A single logical database whose parts are physically located in more than one place, each location is accessible accross a network.
Why use a distributed database?
- Many organisations have branches and offices located accross the country or world.
- A supermarket stock system with many branches or travel company with branches off it.
A central database updated over night as a batch operation, but day to day work in the branches are carried out using the local part of the database. If the network fails then they carry on as normal until it's fixed.
Duplicated Database: A local copy of the entire database is kept at each location.
Unit 6: Disadv & Adv of Distributed Databases
- Very heavy use of the network
- Whole database may be huge requiring extra storage
- Difficult to keep all the copies in synchronisation
- If the database is fairly small, then this may be a practical solution.
Unit 6: Partitioned Database
The database is split into conveneient data sets depending on the specific needs of the organisation. E.g. a supermarket company that has dozens of stores dotted around the country needs an efficient stock control database so each store has its own section of the stock control database.
- Not so useful if each branch needs information on other branches
- The database must be carefully positioned to keep each section as local as possible
- Each store is independent problems in one store dont affect others
- Network load is less
- High performance as no network bottleneck
Unit 6: Partitioned + Index Database
Includes an index of all the remote database records as well. Still keeps all records as local as possible but also keeps the indexes updated on a nightly batch run.
- The system must now keep all indexes up to date
- More complicated
- High performance as queries and updates remain local
- Can efficiently access remote records by using the index rather than a network query to a central database.
Unit 6: Security Issues
Multiple Entry Points - there are many data entry points to the system. Each 'node' of the system has to be kept physically and logically secure.
Encryption Keys - parts of the system exchange secret keys that are then used to encrypt network traffic. So the more the keys are spread around the more likely they are to be compromised.
Corrupted Node - if one node is compromised by a virus or hacking attempt then the rest of the system is vulnerable as well.
Unit 6: Adv and Disadv of Distributed Database
- Resilient - a problem in one part will not stop the other branches from working.
- Security - staff access can be limited to only their part of the database
- Network traffic reduced - bandwidth costs reduced
- Local database still works if the company network is broken
- High performance - queries and updates are largely local so network bottleneck
- Complexitiy - more comlpicated to set up and maintain compared to a central database
- Security 0 there are many remote entry points in the system compared to a central database
- Data Integrity - more complex to make sure data and indexes are not corrupted
- Data needs to be carefully parititioned to make the system as efficient as possible
- Not so efficient if there is heavy interaction between branches in which case a central database is a better option.