Occupational performance measurement issues and methodologies

This essay was produced in 1996 as coursework in the Professional Studies component of a two-year accelerated postgraduate diploma in Occupational Therapy. Students and prospective students should not be put off by the daunting essay question; there was a much more understandable alternative question. I'm still reasonably happy with it, but would welcome any responses, which could be posted as appendices. At the time of writing I was moving away from a one-sided defence of hard science to a more measured view.

Professional Studies Essay: Year One

Mike Griffin

The Question:

"Measuring the functional performance of people with disabilities requires us to address technical measurement issues and methodologies beyond classical post hoc reliability and validity investigations" Smith (1992)

Discuss this statement in relation to occupational therapy assessment.

Marks awarded: 67% Word count: 3030



The aim of occupational therapy is to assess and to remediate human performance deficits. Occupational performance can be defined as the ability to perform those tasks that make it possible to carry out occupational roles in a satisfying manner that is appropriate to the individual's developmental stage, culture and environment (Llorens, 1991). Measuring functional performance is therefore of crucial interest to occupational therapy. Its importance is clear in assessing a person's level of functioning, in assessing the impact of therapy on that level of functioning and, finally, in establishing the efficacy of interventions. This raises the question of which components of functioning are to be measured and how. Several associated questions arise. What is the role of standardised assessments? Why does a plethora of home-made assessment tools populate the field of professional practice? What theoretical approaches exist as an alternative to a reliance on positivist reliability and validity testing? As occupational functioning operates at varying levels of complexity, it is first necessary to consider which levels of occupational dysfunction therapists need to measure.

What do occupational therapists measure?

The World Health Organisation (1980) produced a classification of impairments based on a functional hierarchy that can serve as a useful starting point for an examination of the levels of function assessed by occupational therapists. The lowest level of complexity is that of impairment, which refers to a dysfunction of organs that may or may not impair functional ability. An example is diabetes - a dysfunction of the pancreas which, if properly controlled, need not involve any impairment of occupational performance. Occupational therapists sometimes assess at this level, in manual muscle testing, for example, but it is far more typical of the assessments made by other health professionals such as doctors. Because the phenomena dealt with here usually involve bodily organs, they are much easier to subject to positivist testing than the higher-level functions.

Trombly (1995) gives a comprehensive summary of technologies for measuring biomechanical and physiological aspects of motor performance, covering aspects such as joint range of motion (ROM) and muscle strength. These are basic quantities that are easily measured with devices such as a goniometer. They are of a lower order than measures of occupational performance, and the abilities being measured can properly be regarded as component parts of occupational performance. The relative importance of each of these abilities for occupational performance varies greatly from one individual to the next.

Disability is used to refer to the impairment of task function resulting from an impairment. By definition, this functional category can be eliminated through the use of, for example, assistive equipment. If a person were able to function normally apart from transferring out of the bath, this difficulty would fall within the category of disability. This is still a relatively straightforward category - it is possible to be confident in assessing whether a person can perform a particular physical task. However, we also need to address issues such as motivation, which can have a great influence on whether a person effectively overcomes a disability. Motivation is relevant, for example, when considering whether a person will regularly wear an orthotic device, so it is necessary to compare compliance rates across devices as well as establishing that each will alleviate a disability. Law (1995) describes the Functional Independence Measure, which seeks to be a measure of disability. It rates 18 items, such as self-care activities, sphincter control and mobility, on a seven-point scale. As it cannot effectively determine the relative importance of each activity for an individual, it is a measure of disability and not of occupational performance.
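The point about relative importance can be made concrete with a small sketch. It assumes an 18-item instrument rated from 1 to 7 with higher ratings meaning greater independence; the ratings, the importance weights and the function `weighted_score` are hypothetical illustrations and are not part of the Functional Independence Measure itself.

```python
# Hypothetical illustration: an unweighted total versus an importance-weighted score.
ITEM_COUNT = 18
SCALE_MIN, SCALE_MAX = 1, 7


def total_score(ratings):
    """Unweighted sum of item ratings (every item counts equally)."""
    if len(ratings) != ITEM_COUNT:
        raise ValueError("expected one rating per item")
    if any(not SCALE_MIN <= r <= SCALE_MAX for r in ratings):
        raise ValueError("ratings must lie on the seven-point scale")
    return sum(ratings)


def weighted_score(ratings, importance):
    """Importance-weighted mean rating; weights are supplied by the client (assumed)."""
    if len(ratings) != len(importance):
        raise ValueError("one weight per item is required")
    total_weight = sum(importance)
    if total_weight <= 0:
        raise ValueError("at least one item must carry weight")
    return sum(r * w for r, w in zip(ratings, importance)) / total_weight


# Invented data: independent on 17 items, heavily dependent on one.
ratings = [7] * 17 + [2]

# Client A regards the difficult item as unimportant; client B regards it as vital.
importance_a = [1] * 17 + [0]
importance_b = [1] * 17 + [10]

print(total_score(ratings))
print(weighted_score(ratings, importance_a))
print(weighted_score(ratings, importance_b))
```

Both hypothetical clients receive the same unweighted total, yet the weighted scores diverge once each person's own priorities are taken into account, which is the gap between measuring disability and measuring occupational performance.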

The highest level of complexity exists at the level of what the WHO refers to as handicap. This term refers to the hindering of successful role performance by a disability. This level is of great relevance for occupational therapists as they seek to find ways of overcoming performance deficits. The complexity of assessment at this level lies in the fact that roles are socially defined phenomena rather than easily quantifiable physical abilities. Smith (1992) discusses many of the problems encountered in measuring a person's ability to fulfil a social role. The relative importance of dysfunctions is variable: "for example, how important is learning to dress independently if an individual has a spouse who is willing to help dress (and speeds the process to one-tenth of the time)". Smith proposes that some of these issues can be addressed through the use of non-classical test and measurement models. Because of the influence and complexity of social meanings, however, a detailed understanding of the construction of social roles necessitates a simultaneous consideration of the individual and of the totality to which the individual belongs. It can be argued that there is a need here for an approach that goes beyond positivist testing. Measurements need to be understood in their social and individual contexts if they are to be meaningful both for individual function and for theory development.

Reliability and validity

Reliability can generally be defined as the consideration of the relation of a measuring device to itself (Thorndike, 1976). The device must give consistent measurements when it is applied repeatedly under the same conditions.

Tests of reliability might typically involve the replication of the same measurement with the same or an identical subject. This "test-retest" procedure is relatively unproblematic when dealing with inanimate subjects such as the interaction of two chemical compounds. Human occupational performance, however, is permeated with complexities and interrelationships with other factors such as social norms and values. Human subjects are also characterised by their ability to learn, so that subjecting a person to the same test twice may distort the results through a learning effect. This can be partially addressed by, for example, dividing a test into two, administering the first part in the initial test and saving the second part for the re-test. Even here, however, there is the possibility that the principles of the test can be learned. IQ tests, for example, incorporate tasks that follow a consistent problem-solving method (see Eysenck, 1977, for examples of tests whose methods can be learned with practice). It is evident, then, that when measuring some of the more complex facets of human occupational functioning we are faced with formidable problems in the realm of reliability. Intelligence tests also illustrate a related problem of validity, which is discussed next. Some have been criticised as culturally biased because they include items requiring a knowledge of vocabulary that is more familiar to one cultural group than to another. The tests were accused of lacking validity because they were actually measuring knowledge of vocabulary, not intelligence.
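Test-retest reliability is conventionally summarised as the correlation between scores from the two occasions. The sketch below computes a Pearson correlation by hand and also shows the Spearman-Brown formula, a standard classical correction used when two half-length tests are correlated; that formula is brought in here for illustration and is not mentioned in the sources cited above. All scores are invented.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Estimate full-length reliability from the correlation of two halves."""
    return 2 * r_half / (1 + r_half)


# Invented scores for five people tested on two occasions.
first_occasion = [12, 15, 9, 20, 17]
second_occasion = [14, 16, 11, 21, 19]

retest_reliability = pearson_r(first_occasion, second_occasion)
print(round(retest_reliability, 3))
```

Note that in this invented data every retest score is higher than the first score, as a learning effect would produce, yet the correlation remains high because the rank order of the five people is preserved. A reliability coefficient of this kind therefore does not, on its own, reveal a systematic practice effect.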

Validity refers to the question of whether a measuring device really measures what it is intended to measure. Two of the main components of validity are content validity and criterion-related validity.

Content validity refers to the extent to which the range of behaviours or aptitudes measured accurately and meaningfully reflects the full range of behaviours under consideration. For example, we might assess an individual's social functioning across a given range of functions but remain uncertain how far these measures predict that individual's functioning in areas not included in the test. Betz and Weiss (1976) suggest subjective judgement as an index of content validity. This suggests that the project of applying positivist testing in these cases is flawed: the positivist method has to be supplemented with a degree of subjective judgement, which places the measurement outside the realm of positivist testing. Betz and Weiss also suggest factor analysis as a means of assessing content validity: if the factor analysis yields several factors, it can be concluded that the tests are actually measuring more than one trait.

Criterion-related validity refers to the degree of correlation between the measure and an independent measure of the trait being investigated. Psychometric tests used in recruitment, for example, are intended to assess how well an individual would perform if appointed. A measure of criterion-related validity might examine correlations between the test scores and the job performance of individuals who have been appointed (although we would then need to examine the reliability and validity of the measure of performance). There are problems here also. Betz and Weiss (1976) describe how IQ tests were compared to school performance in order to assess their validity. It is not difficult to argue that school performance reflects a great deal more than simply intelligence and is therefore invalid as an index of the criterion-related validity of intelligence tests. Here again we see that the complex and varied nature of human occupational performance evades attempts to enclose it within the strictures of positivist testing. The main issue is that tests based on validity measures, while necessary for some assessments, are limited in their usefulness for the development of theory.
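The parenthetical point about the reliability of the performance measure can be illustrated numerically. Classical test theory offers Spearman's correction for attenuation, which estimates what a validity coefficient would be if both the test and the criterion were measured without error. The technique is introduced here purely as an illustration and is not drawn from the sources cited above; the coefficients are invented.

```python
from math import sqrt


def disattenuated_validity(r_xy, r_xx, r_yy):
    """Spearman's correction for attenuation.

    r_xy: observed correlation between test and criterion
    r_xx: reliability of the test
    r_yy: reliability of the criterion measure
    """
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_xy / sqrt(r_xx * r_yy)


# Invented figures: a modest observed validity coefficient, a reasonably
# reliable test, and an unreliable rating of job performance.
observed = 0.40
corrected = disattenuated_validity(observed, r_xx=0.80, r_yy=0.50)
print(round(corrected, 2))
```

The less reliable the criterion, the more the observed coefficient understates the underlying relationship. The correction addresses only measurement error, however: it does nothing about the deeper objection that the criterion may reflect much more than the trait in question.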

The role of standardised tests

Occupational therapy uses both standardised tests and non-standardised tests. Standardised tests are based on published work and can be assumed to have evidenced reliability and validity. Pedretti (1995) states that standardised tests have a role to play in increasing professional confidence both within and outside occupational therapy. There are, however, relatively few standardised tests in occupational therapy. This may in part be due to the relative youth and low academic level of the profession (there are comparatively few occupational therapists with doctoral degrees). It may also reflect the enormous variety of human behaviours with which occupational therapy is concerned. Practice settings have greatly differing areas of concern, reflecting both the variance between their client groups and the variance of individuals within these groups. Standardised tests also have potential use for theory development. Because the tests are standardised, the results obtained from a wide range of practice settings can be brought together to provide general results for large populations.

Home-made tests

Home-made tests are non-standardised tests that lack the evidenced reliability and validity of standardised tests. They rely much more on the subjective evaluations of the assessor. The scarcity of standardised tests is a major incentive to use non-standardised tests. The differing demands of practice settings also necessitate their use. A local authority occupational therapist may find, for example, that some of the items assessed in a standardised test are not provided for by the local authority, rendering their measurement superfluous. Equally, the standardised tests available may not include items of specific relevance for the client group in question, necessitating the use of home-made tests.

Limitations of positivist testing

Lunz and Stahl (1993) concluded from a literature review that "persons who rate, judge or examine have unique perspectives that interact with the examination materials and the candidates' performances. Applying standardised grading criteria to standardised protocols and administration procedures defines the examination process but will not remove differences in rater perspective and severity". Severity in this context refers to the tendency of each rater to judge a client's performance more or less favourably than another rater would. Lunz and Stahl discuss means of adjusting for the differing severity of raters, but it is worth asking whether the problem lies in the subjectivity of the rater, in other words in an imperfect measurement technology, or in the nature of the phenomena being measured. Measurement techniques usually focus on the individual. Sometimes they focus only on component parts of the individual. Occupational functioning, however, is a phenomenon that transcends individuality. Even when we perform a solitary occupational function, such as preparing a meal for our own consumption, the situation is permeated with social influences. Should we eat healthily to avoid being a drain on health services? What does our choice of food, crockery and so on say about our position in society? (Would we feel a lack of self-respect if eating from dirty, chipped plates?) What can we afford to eat? A positivist measurement of a person's functioning in this context cannot hope to capture the nuances of the social influences on individual functioning. We can reasonably argue that there are many aspects of occupational functioning that cannot be captured by positivist testing of an individual, no matter how well trained the rater, because that which is being measured is not a phenomenon contained within the individual. Rather, the phenomenon of occupation is an expression of the individual's interaction not only with nature but with society. Only in the works of Defoe and Adam Smith do we find the self-contained individual whose occupational functions can be divorced from the influences of a wider society. Our clients are not Robinson Crusoes but social beings functioning in the context of complex societies.

Kielhofner (1993) develops this point and argues "A dialectical evaluation must produce information about the person-environmental interface and that locates problems in the interface between the individual and the social collective". The concept of dialectics here referred to originates in Hegel and was developed by Marx as a process which "comprehends things and their representations, ideas, in their essential connection, concatenation, motion, origin, and ending" (Engels, 1973b). It is a rejection of approaches which "investigate things as given, as fixed and stable" (Engels, 1973a). The usefulness of this philosophical approach, which rejects the application of the methods of natural science to the study of humans as social beings, lies in its insistence that the subjects of its investigations should not be seen as interacting, but otherwise unchanging, objects. Rather, it attempts to grasp reality as a social process. Lukacs (1990) provides a brilliant exposition of this approach: "the objective forms of all social phenomena change constantly in the course of their ceaseless dialectical interactions with each other. The intelligibility of objects develops in proportion as we grasp their function in the totality to which they belong".

The dialectical approach, in contrast to that of positive testing, is holistic, because it relates the object of its investigations to totality; humanistic, because it deals with the question of the construction of social reality by dynamic social beings; and, most importantly, encompasses meaning. Hawes (1996) articulates this well: "meaning always takes place in a wider historical and material context which, while it does not determine either form or content of human transactions, fundamentally influences both". In the context of a study of occupation, this means that we must understand the ways in which social norms, for example, are continually created and recreated and the dialectical interaction with society in which the individual understands himself and his actions.


Occupational therapy measurements deal with different levels of complexity ranging from impairment to handicap (WHO, 1980). The higher the level of function being assessed, the greater the importance of social factors. The result is that in order to properly understand function, we need to go beyond a consideration of an isolated individual and conceive of the person as a social being. This requires a methodology that transcends the examination of individual phenomena and situates the individual in the context of the social totality of which he is a part. An understanding of the dialectical relationships between individual and society is called for, and to achieve it we need an appreciation of the construction of social reality. We need to understand how roles are created and how individuals relate to those roles. This is something that positivist testing cannot encapsulate. More easily achieved than this ambitious project, however, is the project of developing and refining reliable and valid assessment procedures to address the lower levels of the functional hierarchy. The use of non-classical test and measurement models needs to be explored as an alternative to the use of classical reliability and validity models.


