Unit 6: Flat File Databases
Database: an organised collection of data which can be accessed in different ways.
Features of a flat file database:
- All data stored in a single table
- The headings are the field names
- There is no sharing of data
- Can be created using a spreadsheet
Problems with FLAT FILES:
- Duplication of data - the same data is entered more times than necessary
- Inconsistencies can arise - if a customer changes address, some records may be updated and others not, so records hold both the old and the new address
- Very restrictive - only suitable for small stores of data.
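The inconsistency problem can be shown with a short Python sketch. The customers, addresses and items below are hypothetical, and the flat file is modelled as a list of rows, much like a single spreadsheet sheet. Because each order row repeats the customer's address, updating only one row leaves the customer with two different addresses.

```python
# Hypothetical flat file: every row repeats the customer's name and address.
orders = [
    {"order_id": 1, "customer": "A. Smith", "address": "12 High St", "item": "Kettle"},
    {"order_id": 2, "customer": "A. Smith", "address": "12 High St", "item": "Toaster"},
    {"order_id": 3, "customer": "B. Jones", "address": "4 Mill Lane", "item": "Lamp"},
]

def addresses_for(customer, rows):
    """Return the set of distinct addresses stored for one customer."""
    return {r["address"] for r in rows if r["customer"] == customer}

# A. Smith moves house, but only one of her rows gets updated.
orders[0]["address"] = "7 Park Road"

print(sorted(addresses_for("A. Smith", orders)))  # ['12 High St', '7 Park Road']
```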
Unit 6: Relational Databases
RELATIONAL: a large collection of data items and the links between them, structured in such a way that it can be accessed by a number of different application programs.
Features of a relational database:
- Data is held in 2 or more tables
- Links between the tables
- Data from any of the tables can be extracted
- Greater knowledge is needed to create them
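The same hypothetical orders can be held relationally. The sketch below uses Python's built-in sqlite3 module with an in-memory database, which is an assumption made for illustration only. Customer details are stored once in a customer table, orders link to them through customer_id, and a JOIN extracts data from both tables together. A change of address is then made in one place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two tables: customer details are stored once, orders refer to them by key.
cur.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, address TEXT)")
cur.execute("""CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id),
    item TEXT)""")

cur.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                [(1, "A. Smith", "12 High St"), (2, "B. Jones", "4 Mill Lane")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 1, "Kettle"), (2, 1, "Toaster"), (3, 2, "Lamp")])

# A change of address is made in one place only.
cur.execute("UPDATE customer SET address = ? WHERE customer_id = ?", ("7 Park Road", 1))

# The link (customer_id) lets data from both tables be extracted together.
rows = cur.execute("""SELECT o.order_id, c.name, c.address, o.item
                      FROM orders o JOIN customer c ON o.customer_id = c.customer_id
                      ORDER BY o.order_id""").fetchall()
for row in rows:
    print(row)
```

Both of A. Smith's orders now show "7 Park Road", because her address is stored in only one row.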
Unit 6: Key Database Terms
Data consistency - the relationship between the input data, the processed data and the output data, as well as other related data. e.g. if the system is working correctly, the data will be correct at each stage and is said to be consistent.
Data redundancy - where an item of data is stored more than once. It refers to the unnecessary duplication of data, e.g. a company may hold its data in different files. This is wasteful because some data may need to be input twice.
Data integrity - the correctness, truthfulness and accuracy of the information. e.g. if a customer's address is updated in some records but one record is left unchanged, the data is no longer wholly correct.
Data independence - the data and the application programs used to access it are independent/separate.
Unit 6: Data Warehousing
Data Warehousing: used to store all of an organisation's historical data, which a management information system uses to extract information to help make decisions. It is a corporate resource which everyone in the organisation has access to. E.g. finding out how employee sickness varied over the last year between Newcastle and Derby.
- Can then be mined easily
- Allows the company to store all the details of what is sold to every customer.
- The company can see who uses a loyalty card, exactly what they have bought and what payment method they used.
- Can compare information, such as sickness data, between different stores.
- Storing all historical data better equips managers to make decisions.
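The Newcastle and Derby sickness example is essentially a grouped query over historical records. The sketch below is a minimal illustration under assumed conditions. It uses an in-memory sqlite3 table, the table and column names are hypothetical, and the figures are invented for the example. A real data warehouse would hold far more history and would usually sit on a dedicated platform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table of days lost to sickness per site per month.
conn.execute("CREATE TABLE sickness (site TEXT, month TEXT, days_lost INTEGER)")
# Invented historical records, for illustration only.
conn.executemany("INSERT INTO sickness VALUES (?, ?, ?)", [
    ("Newcastle", "2023-01", 14), ("Newcastle", "2023-02", 9),
    ("Derby", "2023-01", 6), ("Derby", "2023-02", 11),
])

# Compare total days lost between sites.
totals = conn.execute("""SELECT site, SUM(days_lost) FROM sickness
                         GROUP BY site ORDER BY site""").fetchall()
print(totals)  # [('Derby', 17), ('Newcastle', 23)]
```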
Unit 6: Data Mining
Data Mining: once data has been stored in a data warehouse, it needs to be mined, i.e. interrogated to find useful information.
- Investigates potential patterns in the data
- Associations in the data (people who read The Times are more likely to drink red wine)
- Trends over time (a person buying more healthy food and drinking less alcohol)
- Mining large amounts of data helps users understand it better
- Patterns that are found can be displayed in tables and graphs.
Can produce information such as:
- Lists of customers likely to buy a product
- Comparisons with competitors
- Analysis of best sites for shops
- Predictions for future sales
- Customer buying patterns
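An association such as "people who read The Times are more likely to drink red wine" is commonly measured with a confidence score. This is the fraction of customers with the first item who also have the second, compared with how common the second item is overall. The baskets below are hypothetical, and confidence is one standard measure from association-rule mining rather than the method any particular system uses.

```python
# Hypothetical loyalty-card baskets: each set is what one customer bought or reads.
baskets = [
    {"The Times", "red wine", "bread"},
    {"The Times", "red wine"},
    {"The Times", "milk"},
    {"red wine", "cheese"},
    {"milk", "bread"},
]

def confidence(antecedent, consequent, baskets):
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    with_a = [b for b in baskets if antecedent in b]
    if not with_a:
        return 0.0
    return sum(consequent in b for b in with_a) / len(with_a)

def overall_rate(item, baskets):
    """Fraction of all baskets that contain `item`."""
    return sum(item in b for b in baskets) / len(baskets)

print(round(confidence("The Times", "red wine", baskets), 2))  # 0.67
print(overall_rate("red wine", baskets))                         # 0.6
```

In this made-up data, 67% of Times readers buy red wine, compared with 60% of customers overall. That difference is the kind of pattern data mining reports.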
Unit 6: Distributed Database
Distributed Database: has data stored on a number of computers at different locations but appears as one logical database.
E.g. many organisations have branches and offices located around the world. E.g. a supermarket stock system with many different branches.
Duplicated Database: a local copy of the entire database kept at each location.
- Very heavy use of the network
- Whole database may be huge requiring extra storage
- If database is fairly small then it may be a practical solution
Unit 6: Partitioned Database
Partitioned Database: split into convenient data sets depending on the specific needs of the organisation. E.g. a supermarket company has hundreds of stores in the UK and needs efficient stock control, so each store has its own section of the database.
- Not so useful if each branch needs info on other branches
- The database must be carefully positioned to keep each section local
- Each store is independent, so problems in one store don't affect the rest
- Network load is lower as only the central database needs to be kept in sync
- High performance as no bottleneck
Unit 6: Partitioned & Index Database
Partitioned & Index Database: a partitioned database that also includes an index of all remote database records. It still keeps all records as local as possible, and the indexes are updated in a nightly batch run.
- The system must now keep all indexes up to date.
- More complicated
- High performance as queries and updates remain local
- Can efficiently access remote records by using the index rather than a network query to a central database.
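A partitioned and indexed database can be sketched in Python as follows. The store names, product codes and the build_index/stock_level helpers are all hypothetical. Each store's partition is a plain dictionary, standing in for what would really be a database on another site reached over the network. The index is rebuilt by a function that represents the nightly batch run.

```python
# Hypothetical partitioned stock database: each store holds only its own records.
partitions = {
    "Leeds": {"P100": 40, "P200": 3},
    "Bristol": {"P200": 25, "P300": 12},
}

def build_index(partitions):
    """Stands in for the nightly batch run: map each product code to the stores holding it."""
    index = {}
    for store, stock in partitions.items():
        for product in stock:
            index.setdefault(product, []).append(store)
    return index

index = build_index(partitions)

def stock_level(store, product):
    """Answer locally if possible; otherwise use the index to go straight to a remote store."""
    local = partitions[store]
    if product in local:
        return store, local[product]
    remotes = index.get(product, [])
    if remotes:
        remote = remotes[0]
        # In a real system this lookup would be a network request to the remote site.
        return remote, partitions[remote][product]
    return None

print(stock_level("Leeds", "P100"))  # ('Leeds', 40) - answered locally
print(stock_level("Leeds", "P300"))  # ('Bristol', 12) - found via the index
```

Between batch runs the index can be out of date. This is why the notes say the system must keep all indexes up to date.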
Unit 6: Security Issues
Multiple entry points - there are many data entry points to the system. Each 'node' of the system has to be kept physically and logically secure.
Encryption Keys - parts of the system exchange secret keys that are then used to encrypt network traffic. The more widely the keys are spread, the more likely they are to be compromised.
Corrupted Node - if one node is compromised by a virus or hacking attempt, the rest of the system is vulnerable as well.
Unit 6: Methods of overcoming Security Issues
- Computers are located on a number of sites, so it is important to ensure only authorised computers can access the system.
- This can be achieved by issuing passwords to authorised users and regularly updating the passwords to increase the level of security.
- Data is regularly transmitted between different sites, so it may become corrupt or be tampered with during transmission.
- Encryption of transmitted data.
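The password point can be sketched in Python using only the standard library. This is a minimal illustration, not a recommended production design. The 90-day maximum password age, the user record and the check_login helper are all assumptions. Passwords are stored as salted PBKDF2 hashes rather than as plain text, and a correct password that is too old is reported as expired so that it has to be updated.

```python
import hashlib
import hmac
import os
from datetime import date, timedelta

MAX_PASSWORD_AGE = timedelta(days=90)  # assumed policy, not from the notes

def hash_password(password, salt):
    """Salted PBKDF2 hash so the password itself is never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

# Hypothetical record for one authorised user at a branch.
salt = os.urandom(16)
user = {"salt": salt, "hash": hash_password("correct horse", salt), "set_on": date(2024, 1, 1)}

def check_login(user, password, today):
    """Return 'ok', 'expired' (right password but too old) or 'denied'."""
    attempt = hash_password(password, user["salt"])
    if not hmac.compare_digest(attempt, user["hash"]):
        return "denied"
    if today - user["set_on"] > MAX_PASSWORD_AGE:
        return "expired"  # the user must choose a new password
    return "ok"

print(check_login(user, "correct horse", date(2024, 2, 1)))  # ok
print(check_login(user, "wrong", date(2024, 2, 1)))          # denied
print(check_login(user, "correct horse", date(2024, 6, 1)))  # expired
```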
Unit 6: Normalisation
Normalisation: the organisation of data into tables which relate to a single entity.
Unit 6: Advantages & Disadvantages of Distributed
Advantages:
- Resilient - a problem in one part will not stop the other branches from working
- Security - staff access can be limited to only their part of the database
- Network traffic reduced - bandwidth costs reduced
- Local database still works if the company network is broken
- High performance - queries and updates are largely local so no network bottleneck.
- Easier to keep errors local rather than the entire organisation being affected.
Disadvantages:
- Complexity - more complicated to set up and maintain compared to a central database.
- Security - there are many remote entry points in the system compared to central database.
- Data integrity - more complex to make sure data and indexes are not corrupted.
- Data needs to be carefully partitioned to make the system as efficient as possible.
- Not so efficient if there is heavy interaction between branches in which case a central database is a better option.