Saturday, February 1, 2014

35. SQL Injection


SQL Injection: What is it?


SQL Injection is one of the many web attack mechanisms used by hackers to steal data from organizations. It is perhaps one of the most common application layer attack techniques used today. It is the type of attack that takes advantage of improper coding of your web applications that allows hacker to inject SQL commands into say a login form to allow them to gain access to the data held within your database.

In essence, SQL Injection arises because the fields available for user input allow SQL statements to pass through and query the database directly.



SQL Injection: An In-depth Explanation


Web applications allow legitimate website visitors to submit and retrieve data to/from a database over the Internet using their preferred web browser. Databases are central to modern websites – they store data needed for websites to deliver specific content to visitors and render information to customers, suppliers, employees and a host of stakeholders. User credentials, financial and payment information, company statistics may all be resident within a database and accessed by legitimate users through off-the-shelf and custom web applications. Web applications and databases allow you to regularly run your business.

SQL Injection is the hacking technique which attempts to pass SQL commands (statements) through a web application for execution by the backend database. If not sanitized properly, web applications may result in SQL Injection attacks that allow hackers to view information from the database and/or even wipe it out.

Such features as login pages, support and product request forms, feedback forms, search pages, shopping carts and the general delivery of dynamic content, shape modern websites and provide businesses with the means necessary to communicate with prospects and customers. These website features are all examples of web applications which may be either purchased off-the-shelf or developed as bespoke programs.

These website features are all susceptible to SQL Injection attacks which arise because the fields available for user input allow SQL statements to pass through and query the database directly.


SQL Injection: A Simple Example


Take a simple login page where a legitimate user would enter his username and password combination to enter a secure area to view his personal details or upload his comments in a forum.

When the legitimate user submits his details, an SQL query is generated from these details and submitted to the database for verification. If valid, the user is allowed access. In other words, the web application that controls the login page will communicate with the database through a series of planned commands so as to verify the username and password combination. On verification, the legitimate user is granted appropriate access.

Through SQL Injection, the hacker may input specifically crafted SQL commands with the intent of bypassing the login form barrier and seeing what lies behind it. This is only possible if the inputs are not properly sanitised (i.e., made invulnerable) and sent directly with the SQL query to the database. SQL Injection vulnerabilities provide the means for a hacker to communicate directly to the database.

The technologies vulnerable to this attack are dynamic script languages including ASP, ASP.NET, PHP, JSP, and CGI. All an attacker needs to perform an SQL Injection hacking attack is a web browser, knowledge of SQL queries and creative guess work to important table and field names. The sheer simplicity of SQL Injection has fuelled its popularity.


Why is it possible to pass SQL queries directly to a database that is hidden behind a firewall and any other security mechanism?


Firewalls and similar intrusion detection mechanisms provide little or no defense against full-scale SQL Injection web attacks.


Since your website needs to be public, security mechanisms will allow public web traffic to communicate with your web application/s (generally over port 80/443). The web application has open access to the database in order to return (update) the requested (changed) information.

In SQL Injection, the hacker uses SQL queries and creativity to get to the database of sensitive corporate data through the web application.

SQL or Structured Query Language is the computer language that allows you to store, manipulate, and retrieve data stored in a relational database (or a collection of tables which organise and structure data). SQL is, in fact, the only way that a web application (and users) can interact with the database. Examples of relational databases include Oracle, Microsoft Access, MS SQL Server, MySQL, and Filemaker Pro, all of which use SQL as their basic building blocks.

SQL commands include SELECT, INSERT, DELETE and DROP TABLE. DROP TABLE is as ominous as it sounds and in fact will eliminate the table with a particular name.

In the legitimate scenario of the login page example above, the SQL commands planned for the web application may look like the following:



SELECT count(*)
FROM users_list_table
WHERE username=’FIELD_USERNAME’
AND password=’FIELD_PASSWORD”



In plain English, this SQL command (from the web application) instructs the database to match the username and password input by the legitimate user to the combination it has already stored.

Each type of web application is hard coded with specific SQL queries that it will execute when performing its legitimate functions and communicating with the database. If any input field of the web application is not properly sanitised, a hacker may inject additional SQL commands that broaden the range of SQL commands the web application will execute, thus going beyond the original intended design and function.

A hacker will thus have a clear channel of communication (or, in layman terms, a tunnel) to the database irrespective of all the intrusion detection systems and network security equipment installed before the physical database server.

Is my database at risk to SQL Injection?




SQL Injection is one of the most common application layer attacks currently being used on the Internet. Despite the fact that it is relatively easy to protect against SQL Injection, there are a large number of web applications that remain vulnerable.

According to the Web Application Security Consortium (WASC) 9% of the total hacking incidents reported in the media until 27th July 2006 were due to SQL Injection. More recent data from our own research shows that about 50% of the websites we have scanned this year are susceptible to SQL Injection vulnerabilities.

It may be difficult to answer the question whether your web site and web applications are vulnerable to SQL Injection especially if you are not a programmer or you are not the person who has coded your web applications.

Our experience leads us to believe that there is a significant chance that your data is already at risk from SQL Injection.

Whether an attacker is able to see the data stored on the database or not, really depends on how your website is coded to display the results of the queries sent. What is certain is that the attacker will be able to execute arbitrary SQL Commands on the vulnerable system, either to compromise it or else to obtain information.

If improperly coded, then you run the risk of having your customer and company data compromised.

What an attacker gains access to also depends on the level of security set by the database. The database could be set to restrict to certain commands only. A read access normally is enabled for use by web application back ends.

Even if an attacker is not able to modify the system, he would still be able to read valuable information.


What is the impact of SQL Injection?



Once an attacker realizes that a system is vulnerable to SQL Injection, he is able to inject SQL Query / Commands through an input form field. This is equivalent to handing the attacker your database and allowing him to execute any SQL command including DROP TABLE to the database!

An attacker may execute arbitrary SQL statements on the vulnerable system. This may compromise the integrity of your database and/or expose sensitive information. Depending on the back-end database in use, SQL injection vulnerabilities lead to varying levels of data/system access for the attacker. It may be possible to manipulate existing queries, to UNION (used to select related information from two tables) arbitrary data, use subselects, or append additional queries.

In some cases, it may be possible to read in or write out to files, or to execute shell commands on the underlying operating system. Certain SQL Servers such as Microsoft SQL Server contain stored and extended procedures (database server functions). If an attacker can obtain access to these procedures, it could spell disaster.

Unfortunately the impact of SQL Injection is only uncovered when the theft is discovered. Data is being unwittingly stolen through various hack attacks all the time. The more expert of hackers rarely get caught.


Example of a SQLInjection Attack


Here is a sample basic HTML form with two inputs, login and password.

<form method="post" action="http://testasp.vulnweb.com/login.asp">

<input name="tfUName" type="text" id="tfUName">

<input name="tfUPass" type="password" id="tfUPass">

</form>


The easiest way for the login.asp to work is by building a database query that looks like this:


SELECT id
FROM logins
WHERE username = '$username'
AND password = '$password’



If the variables $username and $password are requested directly from the user's input, this can easily be compromised. Suppose that we gave "Joe" as a username and that the following string was provided as a password: anything' OR 'x'='x



SELECT id
FROM logins
WHERE username = 'Joe'
AND password = 'anything' OR 'x'='x'



As the inputs of the web application are not properly sanitised, the use of the single quotes has turned the WHERE SQL command into a two-component clause.



The 'x'='x' part guarantees to be true regardless of what the first part contains.



This will allow the attacker to bypass the login form without actually knowing a valid username / password combination!


How do I prevent SQL Injection attacks?


Firewalls and similar intrusion detection mechanisms provide little defense against full-scale web attacks. Since your website needs to be public, security mechanisms will allow public web traffic to communicate with your databases servers through web applications. Isn’t this what they have been designed to do?


Patching your servers, databases, programming languages and operating systems is critical but will in no way the best way to prevent SQL Injection Attacks.


34. Difference between Where and Having Clause in SQL?



In SQL, what’s the difference between the having clause and the where clause?


The difference between the having and where clause is best illustrated by an example. Suppose we have a table called emp_bonus as shown below. Note that the table has multiple entries for employees A and B.


emp_bonus
Employee
Bonus
A
1000
B
2000
A
500
C
700
B
1250

If we want to calculate the total bonus that each employee received, then we would write a SQL statement like this:


select employee, sum(bonus) from emp_bonus group by employee;

The Group By Clause


In the SQL statement above, you can see that we use the "group by" clause with the employee column. What the group by clause does is allow us to find the sum of the bonuses for each employee. Using the ‘group by’ in combination with the ‘sum(bonus)’ statement will give us the sum of all the bonuses for employees A, B, and C.


Running the SQL above would return this:

EmployeeSum(Bonus)
A1500
B3250
C700


Now, suppose we wanted to find the employees who received more than $1,000 in bonuses for the year of 2007. You might think that we could write a query like this:

BAD SQL:
select employee, sum(bonus) from emp_bonus 
group by employee where sum(bonus) > 1000;

The WHERE clause does not work with aggregates like SUM


The SQL above will not work, because the where clause doesn’t work with aggregates – like sum, avg, max, etc.. Instead, what we will need to use is the having clause. The having clause was added to sql just so we could compare aggregates to other values – just how the ‘where’ clause can be used with non-aggregates. Now, the correct sql will look like this:

GOOD SQL:
select employee, sum(bonus) from emp_bonus 
group by employee having sum(bonus) > 1000;

Difference between having and where clause


So we can see that the difference between the having and where clause in sql is that the where clause can not be used with aggregates, but the having clause can. One way to think of it is that the having clause is an additional filter to the where clause.


33. RTM: Requirements Traceability Matrix

What is Requirements Traceability Matrix in Software Testing?

Requirements tracing is the process of documenting the links between the user requirements for the system you're building and the work products developed to implement and verify those requirements. These work products include Software requirements, design specifications, Software code, test plans and other artifacts of the systems development process. Requirements tracing helps the project team to understand which parts of the design and code implement the user's requirements, and which tests are necessary to verify that the user's requirements have been implemented correctly.

Requirements Traceability Matrix Document is the output of Requirements Management phase of SDLC.




The Requirements Traceability Matrix (RTM) captures the complete user and system requirements for the system, or a portion of the system. The RTM captures all requirements and their traceability in a single document, and is a mandatory deliverable at the conclusion of the lifecycle. 

The RTM is used to record the relationship of the requirements to the design, development, testing and release of the software as the requirements are allocated to a specific release of the software. Changes to the requirements are also recorded and tracked in the RTM. The RTM is maintained throughout the lifecycle of the release, and is reviewed and baselined at the end of the release.
It is very useful document to track Time, Change Management and Risk Management in the Software Development. 



Here I am providing the sample template of Requirement Traceability Matrix, which gives detailed idea of the importance of RTM in SDLC.



The RTM Template shows the Mapping between the actual Requirement and User Requirement/System Requirement.

Any changes that happens after the system has been built we can trace the impact of the change on the Application through RTM Matrix. This is also the mapping between actual Requirement and Design Specification. This helps us in tracing the changes that may happen with respect to the Design Document during the Development process of the application. Here we will give specific Document unique ID, which is associated with that particular requirement to easily trace that particular document.


Requirements Traceability Matrix Template Instructions:


This document presents the requirements traceability matrix (RTM) for the 'Project Name'  and provides traceability between the customer approved requirements, design specifications, and test scripts. 


The table below displays the RTM for the requirements that were approved for inclusion in Application Name/Version. The following information is provided for each requirement:



1. Requirement ID
2. Risks
3. Requirement Type (User or System)
4. Requirement Description
5. Trace to User Requirement/Trace From System Requirement
6. Trace to Design Specification
7. UT * Unit Test Cases
8. IT * Integration Test Cases
9. ST * System Test Cases
10. UAT * User Acceptance Test Cases
11. Trace to Test Script



Disadvantages of not using Traceability Matrix

What happens if the Traceability factor is not considered while developing the software?


a) The system that is built may not have the necessary functionality to meet the customers and users needs and expectations
b) If there are modifications in the design specifications, there is no means of tracking the changes
c) If there is no mapping of test cases to the requirements, it may result in missing a major defect in the system
d) The completed system may have 'Extra' functionality that may have not been specified in the design specification , resulting in wastage of manpower, time and effort.
e) If the code component that constitutes the customer's high priority requirements is not known, then the areas that need to be worked first may not be known thereby decreasing the chances of shipping a useful product on schedule
f) A seemingly simple request might involve changes to several parts of the system and if proper Traceability process is not followed, the evaluation of the work that may be needed to satisfy the request may not be correctly evaluated



Where can a Traceability Matrix be used?


Is the Traceability Matrix applicable only for big projects?



The Traceability Matrix is an essential part of any Software development process, and hence irrespective of the size of the project, whenever there is a requirement to build a Software this concept comes into focus.
The biggest advantage of Traceability Matrix is backward and forward traceability. (i.e) At any point of time in the development life cycle the status of the project and the modules that have been tested could be easily determined thereby reducing the possibility of speculations about the status of the project.

Developing a Traceability Matrix




How is the Traceability Matrix developed?



In the diagram, based on the requirements the design is carried out and based on the design, the codes are developed. Finally, based on these the tests are created. At any point of time , there is always the provision for checking which test case was developed for which design and for which requirement that design was carried out. Such a kind of Traceability in the form of a matrix is the Traceability Matrix.

In the design document, if there is a Design description A, which can be traced back to the Requirement Specification A, implying that the design A takes care of Requirement A. Similarly in the test plan, Test Case A takes care of testing the Design A, which in turn takes care of Requirement A and so on.


There has to be references from Design document back to Requirement document, from Test plan back to Design document and so on.

Usually Unit test cases will have Traceability to Design Specification and System test cases /Acceptance Test cases will have Traceability to Requirement Specification. This helps to ensure that no requirement is left uncovered (either un-designed / un-tested).


Requirements Traceability enhances project control and quality. It is a process of documenting the links between user requirements for a system and the work products developed to implement and verify those requirements. It is a technique to support an objective of requirements management, to make certain that the application will meet end-users needs.



Traceability Matrix in testing



Where exactly does the Traceability Matrix gets involved in the broader picture of Testing?



The Traceability Matrix is created even before any test cases are written, because it is a complete list indicating what has to be tested. Sometimes there is one test case for each requirement or several requirements can be validated by one test scenario. This purely depends on the kind of application that is available for testing.



32. Difference between Web service and Windows service?

  • A Windows service is an application that runs without a user being logged into the system, usually to process some data on the machine that needs no user intervention to work with.
  • A Web service is a website that, when contacted, returns XML (typically) in one of several standard formats for the service consumer to process.
  • A Windows Service runs as a local application, while a web service is typically accessed remotely using a stateless protocol. Communicating with a Windows Service uses local machine protocols, while a web service uses XML.

Points to consider:


  • Windows service is not a web based whereas the web service is web-based
  • Windows service will start as windows will start whereas the web service is located at remote location on the server.
  • Windows service can be seen in the windows administrative tools>>Services
  • Finally we can say that Windows services are used locally on the machine and the Web services are used on internet based web applications.

31. What is a Formal Review?


Formal reviews follow a formal process. It is well structured and regulated.

A formal review process consists of six main steps:
  • Planning
  • Kick-off
  • Preparation
  • Review meeting
  • Rework
  • Follow-up

1. Planning: The first phase of the formal review is the Planning phase. In this phase thereview process begins with a request for review by the author to the moderator (or inspection leader). A moderator has to take care of the scheduling like date, time, place and invitation of the review. For the formal reviews the moderator performs the entry check and also defines the formal exit criteria. The entry check is done to ensure that the reviewer’s time is not wasted on a document that is not ready for review. After doing the entry check if the document is found to have very little defects then it’s ready to go for the reviews. So, the entry criteria are to check that whether the document is ready to enter the formal review process or not. Hence the entry criteria for any document to go for the reviews are:
  • The documents should not reveal a large number of major defects.
  • The documents to be reviewed should be with line numbers.
  • The documents should be cleaned up by running any automated checks that apply.
  • The author should feel confident about the quality of the document so that he can join the review team with that document.

Once, the document clear the entry check the moderator and author decides that which part of the document is to be reviewed. Since the human mind can understand only a limited set of pages at one time so in a review the maximum size is between 10 and 20 pages. Hence checking the documents improves the moderator ability to lead the meeting because it ensures the better understanding.

2. Kick-off: This kick-off meeting is an optional step in a review procedure. The goal of this step is to give a short introduction on the objectives of the review and the documents to everyone in the meeting. The relationships between the document under review and the other documents are also explained, especially if the numbers of related documents are high. At customer sites, we have measured results up to 70% more major defects found per page as a result of performing a kick-off, [van Veenendaal and van der Zwan, 2000].

3. Preparation: In this step the reviewers review the document individually using the related documents, procedures, rules and checklists provided. Each participant while reviewing individually identifies the defects, questions and comments according to their understanding of the document and role. After that all issues are recorded using a logging form. The success factor for a thorough preparation is the number of pages checked per hour. This is called the checking rate. Usually the checking rate is in the range of 5 to 10 pages per hour.

4. Review meeting: The review meeting consists of three phases:

Logging phase: In this phase the issues and the defects that have been identified during the preparation step are logged page by page. The logging is basically done by the author or by a scribe. Scribe is a separate person to do the logging and is especially useful for the formal review types such as an inspection. Every defects and it’s severity should be logged in any of the three severity classes given below:

– Critical:The defects will cause downstream damage.
– Major: The defects could cause a downstream damage.
– Minor: The defects are highly unlikely to cause the downstream damage.

During the logging phase the moderator focuses on logging as many defects as possible within a certain time frame and tries to keep a good logging rate (number of defects logged per minute). In formal review meeting the good logging rate should be between one and two defects logged per minute.

Discussion phase: If any issue needs discussion then the item is logged and then handled in the discussion phase. As chairman of the discussion meeting, the moderator takes care of the people issues and prevents discussion from getting too personal and calls for a break to cool down the heated discussion. The outcome of the discussions is documented for the future reference.

Decision phase: At the end of the meeting a decision on the document under review has to be made by the participants, sometimes based on formal exit criteria. Exit criteria are the average number of critical and/or major defects found per page (for example no more than three critical/major defects per page). If the number of defects found per page is more than a certain level then the document must be reviewed again, after it has been reworked.

5. Rework: In this step if the number of defects found per page exceeds the certain level then the document has to be reworked. Not every defect that is found leads to rework. It is the author’s responsibility to judge whether the defect has to be fixed. If nothing can be done about an issue then at least it should be indicated that the author has considered the issue.

6. Follow-up: In this step the moderator check to make sure that the author has taken action on all known defects. If it is decided that all participants will check the updated documents then the moderator takes care of the distribution and collects the feedback. It is the responsibility of the moderator to ensure that the information is correct and stored for future analysis.

30. Difference between Inspection and Walkthrough?

Walkthrough: It is a step-by-step presentation by the author of a document in order to gather information and to establish a common understanding of its content.

It is review of a software work product by colleagues of the producer of the product for the purpose of identifying defects and improvements.

Inspection: It is a type of peer review that relies on visual examination of documents to detect defects, e.g. violations of development standards and non-conformance to higher level documentation. The most formal review technique and therefore always based on a documented procedure. Its purpose is to the identifying defects and improvements.

But I always believe in remembering anything in my own language. And everyone should do in the same way. If you will try to remember these definitions, you may not understand properly also you may forget soon.

Walkthrough :-


1.) It's a type of Semi Formal Review.
2.) 2 to 7 People are involved.
3.) Author is Presenter.
4.) Lead by Author only.
5.) Reviewers are not aware of the subject/topic.
6.) Other people are Inspectors or Reviewers.

Inspection :-


1.) It's totally a Formal Review.
2.) 2 to 10 or more People are involved.
3.) Author is not presenter. Some one else is giving presentation.
4.) Lead by Moderator.
5.) Reviewers are aware & well prepared for the subject/topic.
6.) Other people are Inspectors or Reviewers.
7.) Minute Taker is writing MOM.
8.) Recorder is noting down everything. Like defects, changes, improvements etc.
9.) Moderator's responsibility is to make sure that meeting is successful.


Walkthrough: Author presents their developed artifacts to an audience of peers. Peers question and comment on the artifacts to identify as many defects as possible. It involves no prior preparation by the audience. Usually, it involves minimal documentation of either the process or any arising issues. Defect tracking in walkthroughs is inconsistent.

Inspection: is used to verify the compliance of the product with the specified standards and requirements. It is done by examining, comparing the product with the designs, code, artifacts and any other documentation available. It needs proper planning and overviews are done to ensure that inspections are held properly. Lots of preparations are needed, meetings are held to do the inspections and then on the basis of the feedback of the inspection, rework is done.

Sunday, January 26, 2014

29. What is Isolation Testing?

Isolation testing is the process of breaking down the system into various modules so that defects can be spotted easily in isolation. It happens especially when the bug is difficult to locate and resolve by development team.


Characteristics of isolation Testing:

  • It is a time-consuming process and there should be stubs and drivers available to retest each one of them individually.
  • It is expensive as all the items are to be broken into several pieces to make it atomic in nature.
  • It verifies the output of each one of those interfaces/subsystems precisely.

Isolation testing requires that a test be done that has caused a problem previously. An example of this can be to perform a certain action that is known to cause system failure. Isolation testing helps testers to isolate a problem that is causing a specific error to occur.

28. What is Referential Integrity?


Referential integrity is a relational database concept in which multiple tables share a relationship based on the data stored in the tables, and that relationship must remain consistent.

The concept of referential integrity, and one way in which it’s enforced, is best illustrated by an example. Suppose company X has 2 tables, an Employee table, and an Employee Salary table. In the Employee table we have 2 columns – the employee ID and the employee name. In the Employee Salary table, we have 2 columns – the employee ID and the salary for the given ID.

Now, suppose we wanted to remove an employee because he no longer works at company X. Then, we would remove his entry in the Employee table. Because he also exists in the Employee Salary table, we would also have to manually remove him from there also. Manually removing the employee from the Employee Salary table can become quite a pain. And if there are other tables in which Company X uses that employee then he would have to be deleted from those tables as well – an even bigger pain.

By enforcing referential integrity, we can solve that problem, so that we wouldn't have to manually delete him from the Employee Salary table (or any others). Here’s how: first we would define the employee ID column in the Employee table to be our primary key. Then, we would define the employee ID column in the Employee Salary table to be a foreign key that points to a primary key that is the employee ID column in the Employee table. Once we define our foreign to primary key relationship, we would need to add what’s called a ‘constraint’ to the Employee Salary table. The constraint that we would add in particular is called a ‘cascading delete’ – this would mean that any time an employee is removed from the Employee table, any entries that employee has in the Employee Salary table would also automatically be removed from the Employee Salary table.

Note in the example given above that referential integrity is something that must be enforced, and that we enforced only one rule of referential integrity (the cascading delete). There are actually 3 rules that referential integrity enforces:
1. We may not add a record to the Employee Salary table unless the foreign key for that record points to an existing employee in the Employee table. 
2. If a record in the Employee table is deleted, all corresponding records in the Employee Salary table must be deleted using a cascading delete. This was the example we had given earlier. 
3. If the primary key for a record in the Employee table changes, all corresponding records in the Employee Salary table must be modified using what's called a cascading update. 

It’s worth noting that most RDBMS’s – relational databases like Oracle, DB2, Teradata, etc. – can automatically enforce referential integrity if the right settings are in place. But, a large part of the burden of maintaining referential integrity is placed upon whoever designs the database schema – basically whoever defined the tables and their corresponding structure/relationships in the database that you are using. 

27. Difference between Primary Key and Foreign Key?



Primary Key
Foreign Key
Primary key uniquely identify a record in the table.
Foreign key is a field in the table that is primary key in another table.
Primary Key can't accept null values.
Foreign key can accept multiple null value.
By default, Primary key is clustered index and data in the database table is physically organized in the sequence of clustered index.
Foreign key do not automatically create an index, clustered or non-clustered. You can manually create an index on foreign key.
We can have only one Primary key in a table.
We can have more than one foreign key in a table.

Define Primary key and Foreign key


--Create Parent Table
CREATE TABLE Department
(
DeptID int PRIMARY KEY,
--define primary key
Name varchar (50) NOT NULL,
Address varchar(100) NULL
)
GO
--Create Child Table
CREATE TABLE Employee
(
EmpID int PRIMARY KEY, --define primary key
Name varchar (50) NOT NULL,
Salary int NULL,
--define foreign key
DeptID int FOREIGN KEY REFERENCES Department(DeptID)
)




26. Clustered and Non- Clustered Index?


What is the difference between a Clustered and Non Clustered Index?


A clustered index determines the order in which the rows of a table are stored on disk. If a table has a clustered index, then the rows of that table will be stored on disk in the same exact order as the clustered index. An example will help clarify what we mean by that.

An example of a clustered index

Suppose we have a table named Employee which has a column named EmployeeID. Let’s say we create a clustered index on the EmployeeID column. What happens when we create this clustered index? Well, all of the rows inside the Employee table will be physically – sorted (on the actual disk) – by the values inside the EmployeeID column. What does this accomplish? Well, it means that whenever a lookup/search for a sequence of EmployeeID’s is done using that clustered index, then the lookup will be much faster because of the fact that the sequence of employee ID’s are physically stored right next to each other on disk – that is the advantage with the clustered index. This is because therows in the table are sorted in the exact same order as the clustered index, and the actual table data is stored in the leaf nodes of the clustered index.

Remember that an index is usually a tree data structure – and leaf nodes are the nodes that are at the very bottom of that tree. In other words, a clustered index basically contains the actual table level data in the index itself. This is very different from most other types of indexes as you can read about below.

When would using a clustered index make sense?


Let’s go through an example of when and why using a clustered index would actually make sense. Suppose we have a table named Owners and a table named Cars. This is what the simple schema would look like – with the column names in each table:

Owners
Owner_Name
Owner_Age

Cars
Car_Type
Owner_Name

Let’s assume that a given owner can have multiple cars – so a single Owner_Name can appear multiple times in the Cars table. Now, let’s say that we create a clustered index on the Owner_Name column in the Cars table. What does this accomplish for us? Well, because a clustered index is stored physically on the disk in the same order as the index, it would mean that a given Owner_Name would have all his/her car entries stored right next to each other on disk. In other words, if there is an owner named “Joe Smith” or “Raj Gupta”, then each owner would have all of his/her entries in the Cars table stored right next to each other on the disk.

When is using a clustered index an advantage?


What is the advantage of this? Well, suppose that there is a frequently run query which tries to find all of the cars belonging to a specific owner. With the clustered index, since all of the car entries belonging to a single owner would be right next to each other on disk, the query will run much faster than if the rows were being stored in some random order on the disk. And that is the key point to remember!

Why is it called a clustered index?


In our example, all of the car entries belonging to a single owner would be right next to each other on disk. This is the “clustering”, or grouping of similar values, which is referred to in the term “clustered” index.

Note that having an index on the Owner_Name would not necessarily be unique, because there are many people who share the same name. So, you might have to add another column to the clustered index to make sure that it’s unique.

What is a disadvantage to using a clustered index?


A disadvantage to using a clustered index is the fact that if a given row has a value updated in one of it’s (clustered) indexed columns what typically happens is that the database will have to move the entire row so that the table will continue to be sorted in the same order as the clustered index column. Consider our example above to clarify this. Suppose that someone named “Rafael Nadal” buys a car – let’s say it’s a Porsche – from “Roger Federer”. Remember that our clustered index is created on the Owner_Name column. This means that when we do a update to change the name on that row in the Cars table, the Owner_Name will be changed from “Roger Federer” to “Rafael Nadal”.

But, since a clustered index also tells the database in which order to physically store the rows on disk, when the Owner_Name is changed it will have to move an updated row so that it is still in the correct sorted order. So, now the row that used to belong to “Roger Federer” will have to be moved on disk so that it’s grouped (or clustered) with all the car entries that belong to “Rafael Nadal”. Clearly, this is a performance hit. This means that a simple UPDATE has turned into a DELETE and then an INSERT – just to maintain the order of the clustered index. For this exact reason, clustered indexes are usually created on primary keys or foreign keys, because of the fact that those values are less likely to change once they are already a part of a table.

A comparison of a non-clustered index with a clustered index with an example:


As an example of a non-clustered index, let’s say that we have a non-clustered index on the EmployeeID column. A non-clustered index will store both the value of the EmployeeID AND a pointer to the row in the Employee table where that value is actually stored. But a clustered index, on the other hand, will actually store the row data for a particular EmployeeID – so if you are running a query that looks for an EmployeeID of 15, the data from other columns in the table like EmployeeName, EmployeeAddress, etc. will all actually be stored in the leaf node of the clustered index itself.

This means that with a non-clustered index extra work is required to follow that pointer to the row in the table to retrieve any other desired values, as opposed to a clustered index which can just access the row directly since it is being stored in the same order as the clustered index itself. So, reading from a clustered index is generallyfaster than reading from a non-clustered index.
A table can have multiple non-clustered indexes

A table can have multiple non-clustered indexes because they don’t affect the order in which the rows are stored on disk like clustered indexes.

Why can a table have only one clustered index?


Because a clustered index determines the order in which the rows will be stored on disk, having more than one clustered index on one table is impossible. Imagine if we have two clustered indexes on a single table – which index would determine the order in which the rows will be stored? Since the rows of a table can only be sorted to follow just one index, having more than one clustered index is not allowed.

Summary of the differences between clustered and non-clustered indexes


Here’s a summary of the differences:

A clustered index determines the order in which the rows of the table will be stored on disk – and it actually stores row level data in the leaf nodes of the index itself. A non-clustered index has no effect on which the order of the rows will be stored.
Using a clustered index is an advantage when groups of data that can be clustered are frequently accessed by some queries. This speeds up retrieval because the data lives close to each other on disk. Also, if data is accessed in the same order as the clustered index, the retrieval will be much faster because the physical data stored on disk is sorted in the same order as the index.
A clustered index can be a disadvantage because any time a change is made to a value of an indexed column, the subsequent possibility of re-sorting rows to maintain order is a definite performance hit.
A table can have multiple non-clustered indexes. But, a table can have only one clustered index.
Non clustered indexes store both a value and a pointer to the actual row that holds that value. Clustered indexes don’t need to store a pointer to the actual row because of the fact that the rows in the table are stored on disk in the same exact order as the clustered index – and the clustered index actually stores the row-level data in it’s leaf nodes.

25. Difference between Primary Key and Unique Key?



Primary Key
Unique Key
Primary Key can't accept null values.
Unique key can accept only one null value.
By default, Primary key is clustered index and data in the database table is physically organized in the sequence of clustered index.
By default, Unique key is a unique non-clustered index.
We can have only one Primary key in a table.
We can have more than one unique key in a table.
Primary key can be made foreign key into another table. 
In SQL Server, Unique key can be made foreign key into another table.



Define Primary key and Unique key

CREATE TABLE Employee
(
EmpID int PRIMARY KEY, --define primary key
Name varchar (50) NOT NULL,
MobileNo int UNIQUE, --define unique key
Salary int NULL
)

In one of the differences, I have used 'clustered' and 'non clustered index'. In order to understand the concept, please follow this link:  http://cracksoftwaretestinginterviews.blogspot.in/2014/01/26-clustered-and-non-clustered-index.html