ISC 4350
Written Assignment 3: Database and Data Mining Security
2) Using the two-step commit presented in the beginning of this chapter, describe how to avoid assigning one seat to two people, as in the airline example. That is, list precisely which steps the database manager should follow in assigning passengers to seats.
Intent:
1) Check the value of COMMIT-FLAG in the database. If it is set, wait until it is unset; if it is not set, proceed to step 2.
2) The agent checks the system to see if there are seats available on the flight.
3) The agent then prepares the seat selection in a separate seat file, but does not yet apply it to the database.
Commit:
1) Set COMMIT-FLAG in the database, then confirm that the selected seat is still unassigned. If it has been taken, unset COMMIT-FLAG and return to the intent phase.
2) Copy seat file to database.
3) Prepare ticket, indicate transaction completed in log.
4) Unset COMMIT-FLAG.
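The agent prepares to reserve the seat in the intent phase by gathering all the necessary information without changing the database. During the commit phase the agent sets the COMMIT-FLAG so that no other agent can update the seat records at the same time, and confirms that the chosen seat is still unassigned, because another agent may have committed it since the intent phase. Only then does the agent copy the seat file to the database. After updating the database, the agent releases it by unsetting the COMMIT-FLAG.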
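The steps above can be sketched in Python. This is a minimal, single-process illustration under stated assumptions: `SeatDatabase`, `intent`, `commit` and `reserve` are hypothetical names, a `threading.Lock` stands in for COMMIT-FLAG (checking and setting the flag has to happen as one atomic action, or two agents could both see it unset), and the commit phase re-checks the seat after setting the flag because the availability seen during the intent phase may be stale.

```python
import threading


class SeatDatabase:
    """Hypothetical seat store that follows the two-phase commit steps above."""

    def __init__(self, seats):
        self.assignments = {seat: None for seat in seats}  # seat -> passenger or None
        self.log = []  # completed transactions
        # COMMIT-FLAG modeled as a lock: acquiring it is an atomic test-and-set,
        # so two agents can never both find the flag unset and both set it.
        self.commit_flag = threading.Lock()

    def intent(self, passenger, seat):
        """Intent phase: gather information only; the database is not changed."""
        # Step 1: wait while COMMIT-FLAG is set (acquire, look, release).
        with self.commit_flag:
            # Step 2: check that the seat exists and is still free.
            available = self.assignments.get(seat, "no such seat") is None
        if not available:
            return None
        # Step 3: prepare the change in a separate "seat file" (shadow record).
        return {"seat": seat, "passenger": passenger}

    def commit(self, seat_file):
        """Commit phase: apply the prepared change while holding COMMIT-FLAG."""
        # Step 1: set COMMIT-FLAG (blocks while another agent holds it).
        with self.commit_flag:
            seat = seat_file["seat"]
            # Re-check: another agent may have committed this seat after our
            # intent phase; without this, both agents would get the seat.
            if self.assignments[seat] is not None:
                return False
            # Step 2: copy the seat file into the database.
            self.assignments[seat] = seat_file["passenger"]
            # Step 3: record the completed transaction in the log.
            self.log.append(("assigned", seat, seat_file["passenger"]))
        # Step 4: leaving the with-block unsets COMMIT-FLAG.
        return True

    def reserve(self, passenger, seat):
        """Run both phases; True only if this passenger got the seat."""
        seat_file = self.intent(passenger, seat)
        return seat_file is not None and self.commit(seat_file)


if __name__ == "__main__":
    db = SeatDatabase(["12A", "12B"])
    # Both agents finish the intent phase for 12A before either commits.
    alice = db.intent("Alice", "12A")
    bob = db.intent("Bob", "12A")
    print(db.commit(alice), db.commit(bob))  # True False
    print(db.assignments["12A"])  # Alice
```

In the usage example both agents see seat 12A as free during the intent phase, but only the first commit succeeds; the second agent finds the seat taken once it holds the flag and must start over.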
6) Can a database contain two identical records without a negative effect on the integrity of the database? Why or why not?
In general it is probably not a good idea to have identical records, because they tend not to stay identical. If one record for John Doe has its address updated but the other does not, the system will hold conflicting information. To keep duplicate records and still maintain the integrity of the database, the records would have to be linked in such a manner that updating one also updates the other.
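A short Python illustration of the update anomaly described above. The record type, names and addresses are hypothetical; the "linked" version models linking by having both entries refer to one stored record, which is only one way such linking could be done (in a relational database, both rows would instead reference a single master row).

```python
from dataclasses import dataclass


@dataclass
class CustomerRecord:
    name: str
    address: str


# Two independent but identical records.
copy_1 = CustomerRecord("John Doe", "12 Elm St")
copy_2 = CustomerRecord("John Doe", "12 Elm St")
copy_1.address = "9 Oak Ave"  # only one copy is updated
print(copy_1 == copy_2)  # False: the database now disagrees with itself

# "Linked" duplicates: two rows that point at one stored record.
stored = CustomerRecord("John Doe", "12 Elm St")
rows = {"row_1": stored, "row_2": stored}
rows["row_1"].address = "9 Oak Ave"
print(rows["row_2"].address)  # 9 Oak Ave
```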
10) Suppose query Q1 obtains the median M1 of a set S1 of values. Suppose query Q2 obtains the median M2 of a subset S2 of S1. If M1 < M2, what can be inferred about S1, S2 and the elements of S1 not in S2?
Since M2 > M1, the values in S2 are concentrated above the median of S1: at least half of S2 lies above M1, while at most half of S1 does. The elements of S1 not in S2 are therefore skewed low: at least half of them are no greater than M1, so their median is at most M1 and therefore below M2. For example, if I say that S1 is the financial aid amounts of all students in the dorm and S2 is the amounts for the boys, then M1 < M2 means the boys' median award is higher than the dorm's median, and the students not in S2 (the girls) must have a median award no higher than M1. Two harmless-looking median queries thus reveal something about records that neither query asked about.
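The inference can be checked numerically with Python's `statistics.median`. In this sketch the student names and aid amounts are made-up illustrative data, and `median_inference` is a hypothetical helper that treats S1 as a mapping from records to values and S2 as a subset of its keys.

```python
from statistics import median


def median_inference(s1, s2_keys):
    """Return (M1, M2, median of S1 - S2) for S1 given as {record: value}
    and S2 given as a set of S1's record keys."""
    m1 = median(s1.values())
    m2 = median(s1[key] for key in s2_keys)
    rest = [value for key, value in s1.items() if key not in s2_keys]
    return m1, m2, median(rest)


if __name__ == "__main__":
    aid = {"Ann": 0, "Bob": 3000, "Cal": 2000, "Dee": 0,
           "Eve": 1000, "Fay": 500, "Gus": 1500}
    boys = {"Bob", "Cal", "Gus"}
    print(median_inference(aid, boys))  # (1000, 2000, 250.0)
```

Here the boys' median award (2000) exceeds the dorm median (1000), and the median award among everyone else is 250, below M1 as argued above.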
11) Disclosure of the sum of all financial aid for students in Smith dorm is not sensitive because no individual student is associated with an amount. Similarly, a list of names of students receiving financial aid is not sensitive because no amounts are specified. However, the combination of these two lists reveals the amount for an individual student if only one student in Smith dorm receives aid. What computation would a database management system have to perform to determine that the list of names might reveal sensitive data? What records would the database management system have to maintain on what different users know in order to determine that the list of names might reveal sensitive data?
The database should use the rule of "n items over k percent", so that data will be withheld if n items represent over k percent of the result reported. To apply it, the DBMS must compute, for each statistic it releases, how many records contribute to it and how large the largest contributions are relative to the total; in the Smith dorm case a single record makes up 100 percent of the sum, so the rule flags it. Information must also be maintained on what has been released to each user, so that no user can compile enough information to infer sensitive data. Example: A is not sensitive, B is not sensitive, and C is sensitive, with A + B = C. The system would need to block someone who has queried A from also querying B and thereby inferring C. The n-item k-percent rule keeps results that depend on a few low-frequency elements from being displayed. Suppression and/or concealing can also be used to protect sensitive data.
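Both checks can be sketched in Python. This is an illustration under stated assumptions, not a standard implementation: the function and class names, the query labels, and the choice of n = 1 and k = 90 are hypothetical, and the auditor assumes the DBMS has already worked out which combinations of queries are sensitive (for Smith dorm, the aid total together with a name list that has only one entry).

```python
from collections import defaultdict


def violates_n_k_rule(contributions, n, k):
    """n-item k-percent rule: True (withhold) if the n largest
    contributions make up more than k percent of the reported total."""
    total = sum(contributions)
    largest_n = sum(sorted(contributions, reverse=True)[:n])
    return largest_n > total * k / 100


class QueryAuditor:
    """Hypothetical per-user record of what each user has been told."""

    def __init__(self, sensitive_combinations):
        # Each combination is a set of query names that together reveal a
        # sensitive value, e.g. {"A", "B"} when A + B = C and C is sensitive.
        self.sensitive = [frozenset(c) for c in sensitive_combinations]
        self.history = defaultdict(set)  # user -> queries already answered

    def allow(self, user, query):
        """Answer (and remember) the query unless it completes a combination."""
        would_know = self.history[user] | {query}
        if any(combo <= would_know for combo in self.sensitive):
            return False
        self.history[user].add(query)
        return True


if __name__ == "__main__":
    # Smith dorm: a single aid recipient makes up 100% of the total.
    print(violates_n_k_rule([4200], n=1, k=90))  # True
    print(violates_n_k_rule([1000, 1200, 900, 1100, 1000], n=1, k=90))  # False
    auditor = QueryAuditor([{"smith_aid_total", "smith_aid_names"}])
    print(auditor.allow("u1", "smith_aid_total"))  # True
    print(auditor.allow("u1", "smith_aid_names"))  # False
    print(auditor.allow("u2", "smith_aid_names"))  # True
```

Because the history is kept per user, this sketch does nothing to stop two users from pooling their answers.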