SCD - Unit 1.4.2

?

Unit 1.4.2 T1 - Arrays, Tuples, Records

Data Structures:

  • Elementary - Char, real, Integer, Boolean
  • Built-in - Strings, Arrays, Lists, Records

Structured data types are usually made of elementary e.g. a string can have letters, numbers, etc.

2D arrays (and any other) have to have their size delclared e.g. [2,2]

  • Arrays with any number of dimensions can be defined, it is a set of elements with the same data type
  • Array elements can be referenced using arrayname[arrayindex]

Tuple - An ordered set of values that are immutable (values cannot change)

Records - Fixed number of fields, different data types, implemented as an object and treated like a tuple, when writing records are usually written as a single line

Dynamic Data Structures - Changes size when required

Static Data Structures - Cannot change size (arrays in most languages do not automatically change size; some provide resizing functions but must be programmed)

1 of 24

Unit 1.4.2 T2 - Queues

4 distint operators of a queue:

  • Add item at rear of queue
  • Remove item from front of queue
  • Check if empty
  • Check if full

When items leave the queue they remain as it is wasteful of CPU cycles to remove and fill positions with blanks instead pointers are used:

  • Front - Holds index (number) of next position to be removed
  • Rear - Holds index of last position added

Full queues cannot be added to and empty queues cannot have items removed

A queue may be full as when initialising the max no. of values it can hold needs to be declared

Another valirable size may be needed to hold the number of items currently in the queue

2 of 24

Unit 1.4.2 T2 - Queues - Functions/Methods

  • enQueue(item) - Adds item to rear of queue
  • deQueue() - Remove and return item from the front
  • isEmpty() - Inicates if queue is empty
  • isFull() - Indicates if queue is full

Queues can become circular to overcome the problem of reusing spaces freed by dequeuing from the front of the array as pointers go 0 >1>2>3>4>0>1, the MOD function allows array positions to be changed

Psuedocode: (1st two are for standard queues, 2nd two are for circular queues)

3 of 24

Unit 1.4.2 T2 - Queues - Priority Queue

Priority queues allow items to jump the queue by assigning priorities to arriving items

Example: Each item has a priority associated with it,e.g.

  • Buses ending with A are high priority
  • Buses ending with B are medium priority
  • Buses ending with C are low priority

92B, 64C, 142A, 25C, 87B > 142A, 92B, 87B, 64C, 25C

An algorithm such as an insertion sort can be used to insert buses into the queue. If 142A leaves and 75B joins the queue now look like this: 92B, 87B, 75B, 64C, 25C

4 of 24

Unit 1.4.2 T3 - Lists and Linked Lists

Lists can be static (cannot change size) or dynamic(grow and shrink)

Programming Operations for a list:

  • isEmpty() - Test for empty list 
  • append(item) - Add a new item to end of list
  • remove(item) - Remove first occourrance of an item from list
  • count(item) - Return the number of occurrences of item in list
  • len(list) - Return no. of items in list
  • index(item) - Return position of item
  • insert(pos,item) - Add new item as position pos
  • pop() - Remove and return last item in list
  • pop(pos) - Remove and return item at position pos

Using a dynamic structure such as a list to implement a queue does not need maxSize and isFull as it is never full

5 of 24

1.4.2 T3 - Linked Lists

Linked lists, like lists are dynamic abstract data structures and is also implemented as an array and uses pointers

Linked lists have nodes with 2 parts:

  • Data (can be complex data structure)
  • Pointer of next node

A nextfree pointer is used to show the index of the next free space in the array

A start pointer is used to point to the first node in the list (if empty this is null as with any pointers with nothing to point to)

The last node in the list has the pointer null and each pointer holds the next index of the next node depending on how it is sorted (numerical/alphabetical order)

An alphabetically ordered list may look like this

6 of 24

1.4.2 T3 - Linked Lists - Adding & Deleting Nodes

When adding a new node:

  • Data is placed in node pointed to be nextfree
  • Follow pointers to where the new node needs to be linked in
  • Adjust pointers

When deleting a node:

  • Adjust pointers
  • Pointers of previous item before deleted node needs to point to index of new next node in list
  • nextfree pointer adjusted to deleted node if list is full; else it remains the same
7 of 24

1.4.2 T3 - Linked lists - Peeking Ahead

Peeking:

Data and pointer in the current node (p) can be examined without altering the data

8 of 24

1.4.2 T4 - Stacks

Another example of an Abstract data type (ADT) such as queues, lists and linked lists

Stack is a Last in First Out (LIFO) list unlike a queue which is a First in First Out (FIFO) list

Used in many cases such as an interrupt

Opertations of modelling a stack:

  • Add item to the top
  • Remove item from the top
  • Check if stack is full
  • Check if stack is empty

Programming Operations:

  • Push(item) – adds an item to the top of the stack
  • Pop() – removes and returns the item on the top of the stack
  • isFull()
  • isEmpty()
  • peek() – Returns an item from the top without removing it
  • size() – Return the number of items in a stack
9 of 24

1.4.2 T4 - Stacks as lists

Stacks can bne implemented as a list using operations for functions such as:

  • Append() – Add an item to the stack
  • Pop() – remove an item from the stack
  • isEmpty() – Check if the stack is empty
  • len(stack) – Check the number of items in the stack

Overflow - Attempting to push onto a full stack (unless dynamic)

Underflow - Attempting to pop from a stack that is empty

Stack overflow error may be implemented in a dynamic list but as an overflow cannot happen it will only error if the computer runs out of memory

10 of 24

1.4.2 T4 - Call Stack

Call Stack

System level data structure (on CPU level) that provides mechanism for passing parameters and return addresses to subroutines, this is hidden from the user in high-level languages

An example of this is comparing numbers:  Bigger = max(num1,num2) as the programmer doesn't need to know how the arguments are sent to the function or how the result is returned. The values and the return address are saved on the stack and the values are popped when the function completes

11 of 24

1.4.2 T4 - Subroutines

A subroutine can also be called using the following method:

  • Parameters are saved onto stack
  • Address to which execution returns after the end of the subroutine is reached and saved onto stack
  • Execution is transferred to subroutine code

This is then executed as follows

  • Stack space is allocated for local variables
  • Subroutine code executes
  • Return address is retrieved
  • Parameters are popped
  • Execution is transferred back to the return address
12 of 24

1.4.2 T5 - Hash Tables

A hash table is another ADT which can find any record in a data set almost instantly, no matter the size where the address of the data is calculated from the data itself using a hashing algorithm

A hash table is what is formed when a set of data has a hashing algorithm/hash function applied to it

A problem however is that sometimes the address of a piece of data may be the same as another e.g. Assign 1000 slots to hold records - Address - key mod (number of slots)

When the same address is generated for different primary keys this is called a hashing collision (or a synonym)

To deal with this you can add 1 to the address or putting it in the next free slot, this however can lead to clustering in the table. Alternatively the algorithm increments the skip value by adding 1,3,5,7 etc. this skip value needs to be such that all slots are included which can be easily avoided by using a table with a prime number of slots

13 of 24

1.4.2 T5 - Load Factor + Searching a Hash Table

As a hashing table fills up its load factor (no. of items against table size) increades and more collisions occour, for this reason a table should not have a load factor outside of 50-70% of the table size so it is big enough but no too big

Psuedocode for searching a hash table:

For items that aren’t in the next slot keep going on until the next free slot is found, this means that the value is not in the table.

Many methods exist to search but all use the mod function at some point as the hash address has to be in the range of table addresses

14 of 24

1.4.2 T5 - Other Hashing Methods

Mid Square:

  • Square item (assuming it is numeric)
  • Extract the middle digits
  • Perform mod function to get an address

Folding:

  • Divide item e.g. 34721062 into equal size pieces > 34, 72, 10, 62 (last piece may not be of equal size)
  • Add pieces together
  • Perform mod function to get an address

Dealing with alphanumeric data - can be converted to numeric by using the ASCII code for each character then apply the hashing alogoithm to the reulting digits

Dealing with deletions - itmes to be deleted must be replaced with a dummy item (e.g. Deleted/##) as when searching for an item that hashes to that spot, if not found the algoithm will continue until the value is found or a free space is located. If a new item is added to the address with deleted it will replace the dummy item

15 of 24

1.4.2 T5 - Dictionary

Dictionaries are an ADT made up of associated pairs

  • Each pair has a key and a value
  • Value is only accessed by the key
  • example (python): ID = {145: "Ken", 675: "Belinda")
  • Multiple items can share the same key

Used in ASCII or Unicode table in a computer system, translations program, trees and graphs

Operations:

  • Add new key:value pair to the dictionary
  • Delete a key:value pair from the dictionary
  • Amend the value in a key:value pair
  • Return the value associated with key k
  • Return True or False if the key is in the dictionary
  • Return length of dictionary

Address of key:vlaue paur is calculated in a hashing algorithm so they are not held in any particular sequence

For keys with multiple items enclose the items in square brackets[]

16 of 24

1.4.2 T6 - Graphs

A graph is another ADT that represents complex relationships

Graphs contain Nodes/Vertex, Edges/Arcs and Edge Weights

Types of graph:

  • Undirected - No arrows indicating direction (bidirectional)
  • Directed - Edges are all one-way
  • Unweighted - Edges do not have a value associated
  • Weighted - Edges which indicate a cost of traversal
17 of 24

1.4.2 T6 - Representing Graphs (Adjacency Matrix)

A computer needs to represent the information about distances and connections in a structured, numerical way, this is implemented using an Adjacency Matrix or an Adjacency List

Adjacency Matrix - Graph is represented by a table, each row and column in a table represents a nodeThe item in the cell [row, column] indicates a connection (in an unweighted graph, this can be a 1). If the graph is undirected the matrix will be symmetric, if it is weighted the values in the cells represent the weights.

  • Advantages - Easier to work with and adding edges is simple
  • Disadvantages - A sparse graph will leave cells empty wasting memory space
18 of 24

1.4.2 T6 - Representing Graphs (Adjacency List)

Adjacency List - List of nodes is created and each node points to a list of adjacent (connected) nodes

Weighted graphs can be represented as a dictionary of dictionaries, each key = node, value = dictionary of adjacent edges and weights). Example: AdjacencyList = {A: {C:7, D:13}, B:{C:4, D:5}}

1.4.2 T6 - Graph Traversal + Applications

There are 2 methods for traversing a graph:

Depth First - Follow each connection all the way to the end before going to the next one (often left first) then backtrack

Breadth first – Check each neighbouring connection before moving to the next level

Graphs are used in: Networks, states in a finite state machine, social networks, web pages, chemistry, project managment, maps

20 of 24

1.4.2 T7 - Graphs

Tree is another common ADT that has a root, branches and leaves, the difference in computer science is that the root is on the top

A tree is also a connected, undirected graph with no cycles (unique path from one node to any other node)

In a rooted tree a node is identified as the root that every node apart from it has one unique parent

  • Node - Contains tree data
  • Edge - connects 2 nodes
  • Root - Only node with no incoming edges
  • Child - Set of nodes that have incoming edges from the same node
  • Parent - Node at the top of connecting edges
  • Subtree - Set of Nodes with a parent and descendants (can also be a leaf)
  • Leaf - Node with no children (at bottom)
21 of 24

1.4.2 T7 - Binary Trees

Binary Tree - Rooted tree where each node has a maximum of 2 children

Each node consists of: Data, Left pointer to a child, Right pointer to a child

Bulding a binary tree:

  • Nodes are added in order given, first item is the root
  • Starting at the root each time, if item is less than root it is added to the left, otherwise it goes to the right, this same rule applies at the root of each sub-tree

22 of 24

1.4.2 T7 - Trees as an array + Traversal

Trees can also be represented as an array with numbers in the left and right columns indicating the index position of the array, and a -1 indicating that there is no sub-tree

Traversal Methods:

  • Pre-order – Visit root then traverse the left sub-tree, then the right sub-tree, Place dot before letter
  • In-order – Traverse the left sub-tree, then visit the root, then traverse the right sub-tree, Place dot under letter
  • Post-order – Traverse the left sub-tree, then traverse the right sub-tree, then visit the root, Place dot after letter
23 of 24

1.4.2 T7 - Tree Traversal Algorithms

In-Order - Traversing the nodes in sequential order, alphabetic or numeric, goes down left tree until -1 reached, back up up to previous node, checking right tree until at the top, then right tree traversed

Pre-Order - Used to produce Pre-fix/Polish Notation for a mathematical expression so the expression - +12 15 4 is the pre-order form of the expression 12 + 15 -4

Post-Order - Used to produce post-fix or Reverse Polish notation for an expression. e.g. expression 12 15 + 4 - is the post order form of the expression 12 + 15 - 4

Algorithm:

  • Procedure traverse(p)
  •  print (tree[p].data) – Pre-Order Only
  •   if tree[p].left <> -1 then
  •   traverse(tree[p].left)
  •  endif
  •  print (tree[p].data) – In-Order Only
  •   if tree[p].right <> -1 then
  •  traverse(tree[p].right)
  •  endif
  •  print (tree[p].data) – Post-Order Only
  • endprocedure
24 of 24

Comments

No comments have yet been made

Similar Computing resources:

See all Computing resources »See all Data structures resources »