Data, information, knowledge and processing
All you need for the first section of the ICT exam, more to follow.
- Created by: Frankie
- Created on: 05-05-13 22:19
Data
Data: a collection of alphanumeric characters without meaning
Data is the raw facts and figures before they have been processed.
'Raw facts and figures' means facts and figures without meaning.
An example of data is: 093234 RDJKAE912.
There is no way to tell what they mean - they are just a random series of numbers and letters. To explain what they mean requires turning them into information.
Information
Information: Data + Context + Structure + Meaning
Context: taking the data and giving it an environment where our prior knowledge and understanding can make sense of it.
Structure: the presentation of data, including formatting.
Meaning: an understanding of what the data relates to.
Information is made by taking the data and processing it. Processing is performing some action on the data. This might be sorting, searching or editing.
Information = Data + [Context] + [Structure] + Meaning
In some cases, the data does not need to have a context and a structure in order to become information.
Examples of Information
Data: 17645
Context: UK £
Structure: nnn.nn
Meaning: £176.45 (an amount paid into a bank account)
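The step from data to information can be sketched in a few lines of Python. This is a minimal illustration, not a standard method: the function name `to_information` is hypothetical. It adds a structure (two decimal places, nnn.nn) and a context (the £ symbol) to the raw data 17645.

```python
def to_information(data: str, currency: str = "£", decimal_places: int = 2) -> str:
    """Give raw data a structure (a decimal point) and a context (a currency symbol)."""
    pounds, pence = data[:-decimal_places], data[-decimal_places:]
    return f"{currency}{pounds}.{pence}"


print(to_information("17645"))  # £176.45
```

Without the structure and context, 17645 could just as easily be a house number or a product code. The meaning only comes once both are applied.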
KEY POINTS:
- Information is processed data
- Information means something
- All examples given must use the context of the question
Representation Methods
There are advantages and disadvantages of all representation methods.
The key methods to focus on are text, graphics, sound, moving pictures and light-emitting diodes (LEDs).
Text:
- Clear to understand
- There is a lot of detail
- You have to be able to read to understand it
- You need to understand the language
- Lots of text cannot be read quickly
- Directional
Graphics:
- Do not need language to understand an image
- Can match to what you see
- Can be confusing if you do not understand the symbols used
- Directional
Sound:
- No fixed position
- No line of sight required
- Good for visually impaired people
- Isn't good in large areas - distortion of sound
- Usually language based
- May not know the sound - e.g. different alarms have different sounds
- Need to be able to hear - not suitable for people who are hearing impaired.
Moving Pictures:
- Lots of information conveyed
- Not language dependent
- Can illustrate text
- Linear - if you do not see the beginning you may not understand
- Problems if there is sound
LED:
- Can allow data to be kept secure
- Can be used in noisy places
- Similar to graphics
- Directional
- The meaning of combinations of lights may need to be learned before it can be understood.
KEYWORDS:
Text: alphanumeric characters put together to deliver a message.
Graphic: picture, image or drawing.
Sound: a range of frequencies which can be used to transmit information.
Video: the technology of electronically capturing, recording, processing, storing, transmitting and reconstructing a sequence of still images representing scenes in motion.
LED: a single point of light; LEDs can be grouped together to form multiple points.
Use of representation methods
Things to consider...
Language: must the audience be able to understand a specific language?
Cultural: are there cultural differences that may mean that words or signs have different meanings?
Visibility: can the audience see information from where they are or are there line of sight issues?
Complexity of information: a picture is worth a thousand words, but sometimes it can give more information than is needed.
Attention: will the audience know the message has been given?
Physical disability: sight/hearing problems are an issue in transferring information to an audience.
Knowledge
KEYWORDS
Knowledge: application of information (to a given context) resulting in understanding.
Application: using information within a given context.
Converting information into knowledge...
- Information is based on certainties: things that occur in the same way, or mean the same thing, every time.
- They do not change or alter for a specific event.
- Knowledge, however, can change. This does not mean that it always will, but it is possible.
- More information can be added to our knowledge and as we add more information we revise our knowledge.
Data Types
Data types define the type of information that can be stored. For example, dates, numbers or characters.
There are five data types to know:
Boolean: One of only two values (true or false)
Real: Numbers with decimals (87.09)
Integer: Whole numbers (43)
String: Alphanumeric characters (Good Luck)
Date/Time: Numbers and letters (08/11/13)
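As a rough sketch, here is how the five data types map onto Python's built-in types. The variable names and values are made up for illustration, and the date assumes the UK day/month/year reading of 08/11/13 (8 November 2013).

```python
from datetime import date

passed: bool = True                  # Boolean: only True or False
price: float = 87.09                 # Real: a number with a decimal part
age: int = 43                        # Integer: a whole number
message: str = "Good Luck"           # String: alphanumeric characters
exam_day: date = date(2013, 11, 8)   # Date/Time: stored as a date, not as text

for value in (passed, price, age, message, exam_day):
    print(type(value).__name__, value)

# Formatting the date back into the dd/mm/yy style used above
print(exam_day.strftime("%d/%m/%y"))  # 08/11/13
```

Storing the date as a real date value (rather than the text "08/11/13") is what allows dates to be sorted and compared correctly.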
Static and Dynamic data
Static Data: data that remains the same after a refresh
Dynamic Data: data that can (but does not have to) change after a refresh.
CD-ROM/Magazines: Static Data
- There is a limited amount of information available
- Does not require access to the internet
- More reliable source of information
- Data cannot be updated very quickly
- The CD can be scratched/broken
- If there are errors, erratum notices would have to be sent out to people
- You have to collect it or wait for it to arrive
- Magazines do not have sound or multimedia
The Internet: Dynamic Data
- The internet has a large volume of information
- Only people with internet access can access the data
- The information is not always reliable
- Can access through mobile/PDA
- Data can be updated very quickly
- If the internet connection is not working, there will be difficulty accessing the pages.
- The internet has many different opinions
- Available all the time from anywhere
- Internet has a range of multimedia
Data Sources
Direct: no one or thing comes between the source of the data and yourself when collecting the data.
Indirect: something comes between yourself and the data collection - it might be bought, another person collects it or it is collected for a different purpose.
It is easiest to think of direct data as data that has been physically collected by you. This could be through questionnaires, interviews, or physically viewing or collecting the data.
Indirect data has two interpretations:
- Data that has been used for a purpose different to that for which it was originally collected.
- The people/companies involved in collecting the data are different to those using the data. Typically these might be organisations that conduct market surveys and then sell the results to other companies, who use them in advertising.
Advantages and Disadvantages of data sources
Direct:
- The source and collection method is known and verified.
- The exact data required can be collected.
- Can change the data being collected in response to answers.
- May not get a large range of data.
- Data may not be available - location/time.
Indirect:
- Large range of data available.
- Data can be available from different locations and time periods.
- Analysis might already have been completed on some of the data.
- Cannot be certain of accuracy of the recording of data.
- May not have all the information about how, when and where it was collected to form a valid opinion on its usefulness.
- Do not know if any bias was placed on collection.
Quality of information
Accuracy: The information needs to be correct. If it is not accurate you cannot rely on it.
Relevance: The information must relate to the topic. Having information that is not relevant increases the volume of data and can take time to look at.
Age: Information from the past may not be relevant today. Age can affect the accuracy of information.
Completeness: The information must be complete. Incomplete information gives only part of the picture.
Presentation: This is how easy it is to extract the information. It is related to format and layout.
Level of detail: The volume of information - there may be too much or too little.
Encoding Data
Encoding: transforming the data from one format into another. Can include shortening, symbol replacement or abbreviations.
Advantages:
- Less memory required: codes are shorter than the data they replace, so less memory is needed to store them.
- Security: If the codes are not apparent then it is difficult to know and understand the meaning of the codes.
- Speed of input: the codes take less time to enter, therefore it is quicker to input a large volume of data.
- Data Validation: since the codes follow a strict set of numbers and letters they are easy to validate.
- Organisation of data: if the data is in a standardised format, it can be compared and organised.
Disadvantages:
- Precision of data coarsened: An example being, Light Blue encoded as Blue.
- Encoding of value judgements: for example, 'Was the film good?' encoded as a rating from 1-4. Different people will rate the same film differently, which makes comparisons difficult.
- The user needs to know the codes used: If the user does not know the codes they cannot use them.
- Limited number of codes: if codes are made up of a limited range of letters and numbers, the number of possible codes will be limited.
- Difficult to track errors: Validation will ensure the code is entered correctly but the nature of the code will make it difficult to see if the code is actually correct.
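A small sketch of encoding and decoding, including the loss of precision described above. The colour codes (R, B, G) and the list of shades are hypothetical, chosen only to mirror the Light Blue to Blue example.

```python
# Hypothetical codes for illustration only.
COLOUR_CODES = {"Red": "R", "Blue": "B", "Green": "G"}

# Shades with no code of their own are mapped to a broader colour (coarsening).
SHADES = {"Light Blue": "Blue", "Dark Blue": "Blue", "Lime": "Green"}


def encode(colour: str) -> str:
    """Replace a colour name with its short code."""
    base = SHADES.get(colour, colour)  # Light Blue -> Blue: precision is lost here
    return COLOUR_CODES[base]          # KeyError if there is no code for this colour


def decode(code: str) -> str:
    """Turn a code back into a colour name (the user has to know the codes)."""
    names = {code_: name for name, code_ in COLOUR_CODES.items()}
    return names[code]


print(encode("Red"))                 # R
print(decode(encode("Light Blue")))  # Blue, not Light Blue
```

The round trip shows both sides: each code is one character (less storage, quicker input, easy to validate), but the original shade cannot be recovered once it has been encoded.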
Validation
Validation: A check that is performed by the computer as the data is being entered. It tries to prevent entry of any data that does not conform to pre-set rules.
There are many different rules that can be created; however, they will not stop incorrect data being entered, they will just ensure that the data that is entered is:
- sensible
- reasonable
- within acceptable boundaries
- complete
Range Checks:
A range check sets an upper and a lower boundary for the data. The data entered must lie between these two values.
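A minimal range check might look like this. It assumes both boundaries are inclusive, and the 0-100 exam-mark rule is a made-up example.

```python
def range_check(value: float, lower: float, upper: float) -> bool:
    """Return True if value lies between lower and upper (inclusive)."""
    return lower <= value <= upper


# Hypothetical rule: an exam mark must be between 0 and 100.
print(range_check(67, 0, 100))   # True
print(range_check(105, 0, 100))  # False
```

Note that a mark of 76 entered as 67 would still pass: the check only proves the value is sensible, not that it is correct.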
Type Checks:
This makes sure that the data entered is of the correct type. Types of data include: Numeric, String, Boolean and Date/Time.
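Because everything typed at a keyboard arrives as text, a type check can be sketched as "try to read the text as the required type and reject it if that fails". The converter table and the dd/mm/yy date format are assumptions for illustration.

```python
from datetime import datetime


def parse_boolean(text: str) -> bool:
    """Accept only 'true' or 'false' (any case); anything else is the wrong type."""
    lowered = text.strip().lower()
    if lowered not in ("true", "false"):
        raise ValueError(f"not a Boolean: {text!r}")
    return lowered == "true"


# Each converter raises ValueError when the text is not of that type.
CONVERTERS = {
    "Integer": int,
    "Real": float,
    "Boolean": parse_boolean,
    "Date/Time": lambda text: datetime.strptime(text, "%d/%m/%y"),  # assumes dd/mm/yy
}


def type_check(text: str, data_type: str) -> bool:
    """Return True if the text entered can be read as the given data type."""
    convert = CONVERTERS[data_type]  # KeyError for an unsupported data type
    try:
        convert(text)
    except ValueError:
        return False
    return True


print(type_check("43", "Integer"))         # True
print(type_check("Good Luck", "Integer"))  # False
print(type_check("31/02/13", "Date/Time")) # False (no 31st of February)
```

There is no entry for String: any input is already a string, so a type check has nothing to reject.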
Length Checks:
When any data is entered into a computer it has a length. A single character has a length of 1; 'Hello' has a length of 5. Length checks ensure that the length of the data entered is between a set minimum and maximum.
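A length check is a range check on the number of characters. The limits of 1 to 20 (for example, a first-name field) are hypothetical.

```python
def length_check(text: str, minimum: int, maximum: int) -> bool:
    """Return True if the number of characters is between minimum and maximum (inclusive)."""
    return minimum <= len(text) <= maximum


print(len("Hello"))                  # 5
print(length_check("Hello", 1, 20))  # True
print(length_check("", 1, 20))       # False (too short)
```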
Lookup Validation:
This looks up the data in another table to ensure that it is valid. It might also return additional information - entering the postcode and house number returns the rest of the address.
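Lookup validation can be sketched with a dictionary standing in for the other table. The postcodes, streets and towns below are made up, and the function name `lookup_address` is hypothetical.

```python
from typing import Optional

# A made-up table standing in for a real postcode database.
STREETS_BY_POSTCODE = {
    "AB1 2CD": ("High Street", "Exampletown"),
    "EF3 4GH": ("Station Road", "Sampleton"),
}


def lookup_address(postcode: str, house_number: str) -> Optional[str]:
    """Validate the postcode against the table and, if found, build the full address."""
    key = postcode.strip().upper()
    entry = STREETS_BY_POSTCODE.get(key)
    if entry is None:
        return None  # not in the table, so the postcode is rejected
    street, town = entry
    return f"{house_number} {street}, {town}, {key}"


print(lookup_address("ab1 2cd", "12"))  # 12 High Street, Exampletown, AB1 2CD
print(lookup_address("ZZ9 9ZZ", "1"))   # None
```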
Picture Checks:
This is also known as a format check. It checks the type of character (e.g. letter or digit) in each position to make sure the data conforms to a set pattern.
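Here is a sketch of a picture check. The pattern notation is an assumption made for this example: 'A' means any letter, '9' means any digit, and any other character must appear exactly as written.

```python
import string


def picture_check(text: str, picture: str) -> bool:
    """'A' = any letter, '9' = any digit, any other character must match exactly."""
    if len(text) != len(picture):
        return False
    for char, rule in zip(text, picture):
        if rule == "A":
            if char not in string.ascii_letters:
                return False
        elif rule == "9":
            if char not in string.digits:
                return False
        elif char != rule:
            return False
    return True


print(picture_check("FR1234", "AA9999"))  # True
print(picture_check("176.45", "999.99"))  # True
print(picture_check("17645", "999.99"))   # False (wrong length, no decimal point)
```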
Check Digit:
A check digit is calculated from a set of numbers and then added to the end of them. When the code is created, the check digit is calculated and added to the code. Before the code is processed, the check digit is recalculated and compared with the one in the code. If they are the same, processing continues. If they are not, an error has occurred and the code needs to be re-entered.
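The notes do not say how the check digit is calculated. One common method is modulus 11, as used by ISBN-10: each digit is multiplied by a weight (counting down to 2), the products are added up, and the check digit makes the total divisible by 11 (a result of 10 is written as X). The sketch below assumes that method.

```python
def mod11_check_digit(digits: str) -> str:
    """Calculate a modulus-11 check digit (ISBN-10 style weighting)."""
    weights = range(len(digits) + 1, 1, -1)  # e.g. 10, 9, ..., 2 for nine digits
    total = sum(int(d) * w for d, w in zip(digits, weights))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid(code: str) -> bool:
    """Recalculate the check digit and compare it with the last character of the code."""
    return mod11_check_digit(code[:-1]) == code[-1]


print(mod11_check_digit("030640615"))  # 2
print(is_valid("0306406152"))          # True
print(is_valid("0306406512"))          # False (two digits swapped)
```

The weighting is what lets the check digit catch transposition errors: swapping two digits changes the weighted total even though the digits themselves are the same.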
Validation and Data Types
Not all validation rules can be applied to all data types.
Validation check: data types it can be applied to
- Range: all data types
- Type: all data types except string
- Presence: all data types
- Length: all data types
- Picture: all data types
- Check digit: integer only
More than one rule can be applied to the same field. For example, a phone number must be stored as a string (so a type check cannot be applied), but you could apply a presence check, a length check and a picture check.
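Combining several checks on one phone-number field could look like this. The rules are assumptions for the example, not from the notes: exactly 11 characters, starting with 0, digits only.

```python
from typing import List


def validate_phone(number: str) -> List[str]:
    """Return the names of the checks that failed; an empty list means the entry is valid."""
    if not number.strip():
        return ["presence"]  # presence check: nothing was entered
    failed = []
    if len(number) != 11:  # length check (assumed: exactly 11 characters)
        failed.append("length")
    # picture check (assumed): a 0 followed only by digits
    if not (number[0] == "0" and all(c in "0123456789" for c in number)):
        failed.append("picture")
    return failed


print(validate_phone("01234567890"))  # []
print(validate_phone(""))             # ['presence']
print(validate_phone("1234567890"))   # ['length', 'picture']
```

This also shows why the field is a string: int("01234567890") gives 1234567890, silently losing the leading 0.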
Verification
Verification cannot ensure the accuracy of data entered.
Verification ensures that data is entered correctly, not that it is correct.
- Verification is making sure that the information on the source document (paper based copy) is the same as the information on the object document (computer copy).
- It is making sure that the same information that is on the paper has been entered and not changed in any way - it is an exact copy.
There are two main methods of verification:
1. Computer Verification: data is entered twice and the computer performs the task of matching the two sets of data looking for mistakes.
2. Manual verification: the individual is responsible for making sure the object and source are the same - either the same person re-reading or another person reading both copies.
The two main errors are transcription errors (data miscopied) and transposition errors (numbers or letters swapped round into the wrong order).
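Computer verification (double entry) can be sketched as comparing the two typed copies. Labelling a mismatch as a "possible transposition" or "possible transcription" error is a simple heuristic added for illustration: the computer cannot know which copy is right, only that they differ.

```python
def compare_entries(first: str, second: str) -> str:
    """Compare two entries of the same data and describe any mismatch."""
    if first == second:
        return "match"
    if len(first) == len(second):
        diffs = [i for i, (a, b) in enumerate(zip(first, second)) if a != b]
        # Two neighbouring characters swapped round -> likely a transposition error
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and first[diffs[0]] == second[diffs[1]]
                and first[diffs[1]] == second[diffs[0]]):
            return "possible transposition"
    return "possible transcription"


print(compare_entries("12345", "12345"))  # match
print(compare_entries("12345", "12435"))  # possible transposition
print(compare_entries("12345", "12845"))  # possible transcription
```

If both copies contain the same mistake (for example, the source document itself was wrong), they still match, which is why verification cannot guarantee the data is correct.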
Back-up and Archiving
Back-up:
This is keeping a copy of the current data. If there is a failure of the computer system, then the back-up can be used to restore the data.
It is important because it means data is not lost. Information is valuable and needs to be protected. Once created, the back-up should not be stored in the same location - it should be on removable media and taken to another location.
Archiving:
This is for long-term storage of data that is not required immediately. Archived data is often not required at all. It is taken off the system and stored in case it is required for an investigation in the future. When archiving, data is written to a large capacity storage device at long intervals, unlike back-ups, which should be written at short intervals.
Files should be archived when they are no longer needed immediately, and when you don't want to delete them permanently.
When you archive old files, you eliminate the waste of time and media that results from backing up unused files, free up hard drive space, and improve the performance of the system.
Cost of producing information
Hardware: can be used to collect the information, process it and output it. It can also be used to store the information for use at a later date. It may be necessary for the organisation to purchase items of hardware. Ongoing hardware costs include repair and maintenance costs.
Software: needs to be purchased. In addition to the operating system this may include (depending on the context) DTP, graphics and website software.
Consumables: are items that get used - paper, printer ink, toner and electricity being the main items.
Personnel: these are the costs related to people working in the organisation. People are required to collect, collate, enter, process and output information. There may also be costs involved in sending people on software training courses, and the associated costs of covering their jobs whilst they are absent. Depending on the information that is being produced, there may be additional costs that involve checking the accuracy of the data.
Input, processing, output, storage and feedback
Input: This is taking information that is external to the system and entering it into the system. This may be manual input or automated input (e.g. OMR). It may also be input by electronic means - via a network or CD/disk.
Processing: This is an action performed on the data. Processing can include sorting, searching or performing calculations on the data.
Output: This is taking information that was in the system and outputting it. The method used may result in printed output, screen output or electronic output (e.g. disk/CD).
Storage: This is where data is held. It may be the data that has been input, data required during processing or the results of processing. This is data that is still within the system.
Feedback: This is where the output from the system forms part of the input to the system. Feedback is usually applied to real-time situations. If the response to the feedback is automatic then the process is a closed loop. If there is an operator involved then the process is an open loop.
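A closed-loop feedback system can be sketched as a simulated thermostat: the temperature is the input, the processing compares it with a target, and the output (heater on or off) changes the next input with no operator involved. The 0.5 degree change per step is a made-up figure for the simulation.

```python
from typing import List


def thermostat(target: float, start: float, steps: int) -> List[float]:
    """Simulate a heater controlled by feedback; return the temperature after each step."""
    temperature = start
    history = []
    for _ in range(steps):
        heater_on = temperature < target              # processing: compare input with target
        temperature += 0.5 if heater_on else -0.5     # output feeds back into the next input
        history.append(temperature)
    return history


print(thermostat(target=20.0, start=18.0, steps=6))
# [18.5, 19.0, 19.5, 20.0, 19.5, 20.0]
```

If a person had to read the temperature and switch the heater on and off by hand, the same process would be an open loop.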