When evaluating a study, it is best to follow GRAVES for the top marks. This stands for Generalisability, Reliability, Application to real life, Validity, Ethics and Studies (that support or go against).
Generalisability - This is whether or not the results from the study can be generalised to everyone. You have to base this on the sample used. For example, in Milgram's study, the sample was all male (no females), so the results cannot be generalised to females, as they are genetically different, which may have an effect on their behaviour. Another thing to question is the culture of the participants: if the study was only carried out on Americans, it may not be generalisable to English people (or any other culture), as we are brought up differently, which again could affect our behaviour.
Reliability - If a study is reliable, it is replicable (it can be done again and will achieve more or less the same results). A study is more reliable when controls are in place, meaning a lab experiment is the most reliable as it is highly controlled, followed by a field experiment, as there are some controls in place but the setting is natural for the participants, and then a natural experiment, which isn't classed as reliable as no controls are in place.
In the exam, make sure you give examples of the controls used within the study, as the examiner cannot give you marks for just saying 'this study is reliable as controls were in place'; what controls?