Low in reliability
Inter-rater reliability is the extent to which a procedure, task, or measure produces the same results when two observers record the same behaviour independently.
There were two observers in this study. Observer 1 recorded the sex, race and location of passengers in the critical area of the carriage, while observer 2 recorded the sex, race and location of passengers in the adjacent area.
Because the two observers recorded different things, the inter-rater reliability of their observations could not be checked. Therefore, the consistency of the recorded data cannot be confirmed, which makes the findings hard to replicate. Possible sources of error include miscalculation, misinterpretation of conversations, and an observer's view being blocked. The consistency of the response categories could not be checked either.
High in ecological validity
The extent to which the findings of research in one situation would generalise to other situations. This depends on whether the situation represents the real world effectively or whether the task is relevant to real life.
For example, the participants were ordinary train passengers taking the New York subway (IND) from 59th Street station to 125th Street station.
The setting is normal to everyday life, so most of the participants are likely to have been regular train passengers. Since they were unaware that they were part of an experiment, demand characteristics were reduced, and they behaved naturally because they believed the emergency to be real. Since the responses shown were genuine and true to real life, this study is high in ecological validity.
High in validity (use of stooge)
The extent to which a researcher is testing what they claim to be testing.
The study used a stooge to act as a drunk or ill victim. The ill victim always carried a cane, and the drunk victim always smelled of alcohol and carried a liquor bottle in a brown paper bag. Both the ill and drunk stooges were standardised to wear an Eisenhower jacket, old slacks and no tie. The victim always collapsed about 70 seconds after the train left the station.

Because participants were not aware that the stooges were part of the study, demand characteristics were reduced, as participants were more likely to believe the stooge's behaviour was real.

The use of stooges allows standardisation across the ill and drunk conditions. Researchers can be more confident that participants were responding to the manipulated drunk or ill condition rather than to extraneous variables such as clothing or the time given for participants to react.

Ethical issues (informed consent)
Participants should be asked if they want to take part in a study, be given all relevant information about the study, and be allowed to leave.
Participants in this study were 4450 men and women who were travelling between 11am and 3pm. People who just happened to be on the train at that time of day became part of the study.
Because participants were unaware that they were involved in an experiment, researchers were not able to provide relevant information about the study. Therefore, participants were not given an opportunity to decide whether they were willing to be observed and recorded as part of a study. This breaches the guideline for informed consent.
Low in generalisability
How widely findings apply to other settings and populations.
For example, the participants in this study were all subway passengers in New York City, and the study was conducted between 11am and 3pm, which means the sample might not be representative of the whole population.
Since it is possible that the same commuters travelled on the same train at the same time, the findings might not apply to other New York citizens who do not commute by train. The sample was also limited by the time window: passengers who take the train earlier than 11am or later than 3pm might respond differently. Thus, this study is low in generalisability.
Qualitative data (strength)
Data that describes meaning and experience in the form of words rather than numerical values for behaviour (not restricted by fixed options).
For example, the observers also recorded qualitative data, including the remarks and movements made by the passengers during each trial.
Since qualitative data captures what participants say and do in their own terms, it is not reductionist, and it allows researchers to understand the thoughts and behaviours associated with helping in an emergency in more detail. Hence, the qualitative data is a strength.

Strength: participants were unaware that they were being observed, so this should have reduced demand characteristics in their helping behaviour.

Strength: observations were in the field, so the helping behaviour should have been spontaneous and natural, raising validity.

Weakness: because the observations were covert, the participants could not give informed consent, raising ethical issues.

Weakness: because the situation being observed was staged and trials were repeated, participants may have been suspicious and may not have reacted realistically.

Usefulness: the study told us that the type of victim can affect how long people take to help, or whether they help at all. This could be used to educate people that in an emergency we should help others quickly no matter who they are: the longer it takes to help, the more the victim may suffer in the long term (especially if medical attention is needed).

As participants will not know they are taking part in a study, there will be few or no demand characteristics so behaviour is more likely to be natural and valid.

The setting was a subway train which is not artificial (it is a real situation that many people find themselves in daily). Even the event is something that could easily happen so the study does have ecological validity.

As the setting was natural and no one was aware that the whole situation was staged, there was very little chance that anyone would have shown behaviour to fit the aim of the study. The behaviour shown by the participants was natural and therefore valid.

Situational variables can be difficult to control so sometimes it is difficult to know if it is the IV affecting the DV. It could be an uncontrolled variable causing the DV to change.

The positioning of people in the carriages could not be controlled (this is just one example). Therefore, some passengers may not have noticed the incident, or may have ignored it (e.g. because they were reading), so it may not have been the type of victim affecting helping levels.

As participants do not know they are taking part in a study, there are issues with breaking ethical guidelines relating to informed consent and deception.

Participants on the train did not know it was a study, so they were deceived and informed consent was not obtained. This goes against ethical guidelines (although formal guidelines were not around at the time of the study).

Generalisation: it would be difficult to generalise past the sample itself as participants were all urban dwellers who were (presumably) used to travelling on a subway train. People in urban areas are more used to deindividuating (losing their sense of identity) and feeling “anonymous” whereas people who live in rural areas or even a different city might act differently.