J. Anim Sci.
J. Anim. Sci. 2002. 80:1835-1845
© 2002 American Society of Animal Science

Decision support system for overall welfare assessment in pregnant sows B: Validation by expert opinion

M. B. M. Bracke*,2, J. H. M. Metz*, B. M. Spruijt† and W. G. P. Schouten‡

* Institute of Agricultural and Environmental Engineering (IMAG), Wageningen University and Research Centre, 6700 AA Wageningen, The Netherlands; † Animal Welfare Centre, Faculty of Veterinary Medicine, University of Utrecht, Yalelaan 17, 3584 CL, Utrecht, The Netherlands; and ‡ Department of Animal Sciences, Ethology Group, Wageningen University and Research Centre, 6700 AH, Wageningen, The Netherlands

2 Correspondence: Inst. of Animal Science and Health (ID-Lelystad), Wageningen University and Research Centre, P.O. Box 65, 8200 AB Lelystad, The Netherlands (phone: +31-(0)320-238205; fax: +31-(0)320-238050; E-mail: m.b.m.bracke@id.wag-ur.nl).


    Abstract
This paper examines the validity of a model that is embedded in a computer-based decision support system to assess the welfare status of pregnant sows in housing and management systems. The so-called SOWEL (SOw WELfare) model was constructed using a formalized procedure to identify and weight welfare-relevant attributes of housing systems in relation to the animal’s needs, and evidenced by scientific statements collected in a database. The model’s predictions about welfare scores for 15 different housing systems and weighting factors for 20 attributes were compared with expert opinion, which was solicited using a written questionnaire for pig-welfare scientists.

The experts identified tethering and individual housing in stalls as low welfare systems. The group of mid-welfare systems contained indoor group-housing systems and an individual-housing system with additional space and substrate. The five best systems were all systems with outdoor access and the provision of some kind of substrate such as straw. The highest weighting factors were given for the attributes "social contact," "health and hygiene status," "water availability," "space per pen," "foraging and bulk," "food agonism," "rooting substrate," "social stability," and "movement comfort." The degree of concordance among the experts was reasonable for welfare scores of housing systems, but low for weighting factors of attributes. Both for welfare scores and weighting factors the model correlated significantly with expert opinion (Spearman’s Rho: 0.92, P < 0.001, and 0.72, P < 0.01, respectively). The results support the validity of the model and its underlying procedure to assess farm-animal welfare in an explicit and systematic way based on available scientific knowledge.

Key Words: Animal Welfare • Housing • Indexes • Management • Pigs


    Introduction
In the previous paper (Bracke et al., 2002) we described the computer-based decision support system with the SOWEL model for welfare assessment in the case of pregnant sows. In the development phase of the model we had elicited expert opinion for the seven main housing systems for pregnant sows to serve as a benchmark (Bracke et al., 1999b), but we did not consult experts about their reasons to select and weight attributes. We used a common conceptual framework (Anonymous, 2001), a list of needs (Bracke et al., 1999d; Spruijt et al., 2001), and factual scientific statements about what scientists had found in empirical research to construct the model. Using a formalized procedure we tried to capture the experts’ implicit reasoning processes to weight attributes and to assess welfare.

The decision support system is designed to be adaptable, that is, new insights can be incorporated when these become available. Validation of such a dynamic model will, therefore, be an ongoing process. In part A we showed that our model performs in accordance with previous expert consultation and with several other models. In this paper we attempt to validate the model further by comparing it with expert opinion.

For empirical validation a variety of measures would have to be used, but there is no generally accepted way to combine them into one overall judgment (Fraser, 1995; Bracke et al., 1999c). Without a standard, expert opinion seems to be the best available, though not entirely independent, standard for welfare assessment. We suggest that if the model were found to be in accordance with expert opinion, this could be regarded as a confirmation that experts implicitly, as does our model explicitly, use an assessment of welfare based on scientific findings and the biological needs of the animals.

The main aim of this paper, therefore, is a "validation" of the welfare model using expert opinion.


    Materials and Methods
Expert Consultation
In February 2000 we sent a written questionnaire to 29 pig-welfare experts of 14 different nationalities. They were all scientists, members of the International Society for Applied Ethology, and known for their work on pig welfare. The questionnaire contained two main parts. One part concerned the weighting of 20 welfare-relevant attributes. The other part concerned the welfare assessment of 15 housing systems for pregnant sows. Part order was reversed for half of the experts.

One part of the questionnaire contained 20 randomly presented attributes, which were a selection from the attributes in the model in order to limit the length of the questionnaire. The attributes were described by their levels, which we had ranked from best to worst, but we had omitted all minimum-requirement levels, such as a poor health status or very slippery floors, because these convey an overriding weight making welfare "poor" (i.e. low), no matter what else is true about the other attributes.

For each attribute we asked the experts to assign a score between 0 and 10 expressing its weighting factor relative to the other attributes in the list. The least important attribute was to be given a weighting factor of 0 and the most important attribute a weighting factor of 10. We also asked them to verify the ranking of the levels, and to indicate whether any of the levels, in the expert’s opinion, constituted a minimum requirement for welfare. Finally, we asked them to give one confidence score (scale 0 to 10) for the entire set of weighting factors to express the expert’s confidence in the validity of his/her own set of weighting factors.

The set of 15 housing systems in the other part of the questionnaire included the seven main housing systems for pregnant sows, which are the reference systems that had been used for model development and which had been evaluated in previous interviews (Bracke et al., 1999d). In addition, eight "novel" systems were added to test the model’s predictions (Table 1).


Table 1. Housing systems for pregnant sows (ID: identification number, ranked according to our model from low to high welfare); each description includes the main reference used to define the system
 
In the questionnaire the housing systems were described in a standardized way on cards (Figure 1). For all systems a moderate climate, suitable soil conditions, suitable building construction, and stabilized conditions with animals that are used to the system were presumed. The experts were asked first to rank the cards, which were presented in a randomized order, and then to give a relative welfare score for each of these systems on a scale from 0 for the worst system to 10 for the best system. For each system we also asked them to state the main arguments for the score and an indication of what we called the "90% range." This range indicates uncertainty about the welfare score. It is expressed as the number of points above and below the welfare score that covers 90% of the farms with the system. The 90% range indicates uncertainty due to differences in familiarity of the expert with the different systems and due to differences between farms (e.g., between stock handlers and other factors not described on the card).



Figure 1. Example of a card used in the questionnaire describing housing system 9 "zigzag" (Brent, 1986).

 
Finally, for the worst and the best systems only (with relative welfare scores of 0 and 10, respectively), we asked them to give an "absolute" welfare score, also on a scale from 0 to 10, where all conceivable production systems defined the scale, rather than only the ones we had described on the cards. In the remainder of this paper the term scores will refer to relative welfare scores unless specified otherwise.

The response rate of this enquiry was 79% (n = 23; 12 different nationalities including, in alphabetical order, Belgium, Canada, Czechia, Denmark, France, Germany, the Netherlands, Spain, Sweden, Switzerland, United Kingdom, United States). The average time needed to complete the questionnaire was 125 min, with 44 min for the assessment of the attributes, and 81 min for the assessment of the housing systems.

Of the 23 experts that had assigned weighting factors one expert was excluded, because he had used a scale from 7 to 10, which resulted in many extreme values upon transformation to a scale from 0 to 10.
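The effect of such a compressed scale can be illustrated with a small sketch. The paper does not state which transformation was applied; the code below assumes a simple linear (min-max) rescaling, and the function name `rescale` and the raw scores are hypothetical.

```python
def rescale(scores, low, high, new_low=0.0, new_high=10.0):
    """Linearly map scores from [low, high] onto [new_low, new_high].

    Assumption: a plain min-max transformation; the paper does not
    specify the formula that was actually used.
    """
    span = high - low
    return [new_low + (s - low) * (new_high - new_low) / span for s in scores]


# Made-up weighting factors from an expert who only used the range 7 to 10.
raw = [7, 7.5, 8, 9, 10]
rescaled = rescale(raw, 7, 10)
print([round(v, 2) for v in rescaled])
```

Under this assumption a half-point difference on the compressed scale becomes a difference of more than 1.5 points after rescaling, which is one way in which many extreme values could arise.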

Statistics
Nonparametric statistics were used (cf. Siegel and Castellan, 1988), because the scores were bound between 0 and 10, the variance in welfare scores for housing systems and weighting factors was not constant, and the scores were not always normally distributed.

Spearman rank correlation coefficients (Rho) were used to compare the various sets of scores. In particular, Rho was used to determine the correlation between our model and the median expert scores. Rho was also used to determine the model’s performance compared to the experts individually using the median expert scores as a reference. Because the set of Rhos of the individual experts as correlated with median expert scores was not normally distributed, we expressed the performance of our model as the percentile level, where a value of, for example, 75 indicates that the model’s performance is at least as good as 75% of all the individual experts.
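The two steps described above (Rho against the median expert scores, then the model's percentile level among the experts) can be sketched as follows. This is a minimal illustration in plain Python rather than the SPSS procedure used in the study; the scores are made up, and the helper names `average_ranks`, `spearman_rho`, and `percentile_level` are hypothetical.

```python
from statistics import mean, median


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def spearman_rho(x, y):
    """Spearman's Rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def percentile_level(model_rho, expert_rhos):
    """Percentage of experts whose Rho does not exceed the model's Rho."""
    return 100.0 * sum(r <= model_rho for r in expert_rhos) / len(expert_rhos)


# Made-up data: rows are experts, columns are five housing systems.
experts = [
    [0, 1, 5, 6, 10],
    [0, 2, 4, 7, 10],
    [1, 0, 6, 5, 10],
    [0, 1, 7, 4, 10],
]
model = [0, 1, 4, 6, 10]

medians = [median(col) for col in zip(*experts)]
model_rho = spearman_rho(model, medians)
expert_rhos = [spearman_rho(e, medians) for e in experts]
level = percentile_level(model_rho, expert_rhos)
print(round(model_rho, 3), level)
```

Note that each expert's own scores contribute to the median against which that expert is correlated, so the expert Rhos are not fully independent of the reference.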

Kendall’s coefficient of concordance (W) was used to examine the degree of consensus between the experts, and between the experts and the model.
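As an illustration, Kendall's W can be computed from the rank sums per item, and the model can then be appended as if it were one more rater. The sketch below uses the textbook formula without the correction for ties, which is an assumption (SPSS applies a tie correction); the data and function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def kendalls_w(score_rows):
    """Kendall's W for m raters (rows) scoring n items (columns).

    W = 12 * S / (m^2 * (n^3 - n)), where S is the sum of squared
    deviations of the item rank sums from their mean.
    No tie correction is applied in this sketch.
    """
    m, n = len(score_rows), len(score_rows[0])
    rank_rows = [average_ranks(row) for row in score_rows]
    rank_sums = [sum(col) for col in zip(*rank_rows)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Made-up scores of three experts for three housing systems.
experts = [[0, 3, 10], [1, 4, 9], [5, 0, 10]]
model = [0, 4, 10]

w_experts = kendalls_w(experts)
w_with_model = kendalls_w(experts + [model])
print(round(w_experts, 3), round(w_with_model, 3))
```

In this toy example W rises when the model is added, because the model's ranking agrees with the majority of the raters.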

The Friedman two-way analysis of variance by ranks and the post-hoc multiple comparisons test were used to determine whether there were significant differences in ranks between housing systems and between attributes (Siegel and Castellan, 1988, pp 174–183). Because the Friedman test cannot handle missing values, for the set of welfare scores data from two experts and for the set of weighting factors data from three experts had to be excluded listwise because they each had one missing value. However, because this resulted in discarding many valuable data, we used overall medians, rather than the Friedman mean ranks, to represent expert opinion (with n = 23 experts for welfare scores and n = 22 for weighting factors).
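The listwise exclusion and the Friedman statistic can be sketched as follows. This is a minimal illustration, assuming missing values are coded as `None` and omitting both the tie correction and the P-value; the data and the function name `friedman_statistic` are hypothetical, and the post-hoc multiple comparisons are not shown.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def friedman_statistic(score_rows):
    """Return (chi-square statistic, number of experts used).

    Rows with any missing value (None) are dropped listwise, mirroring
    the exclusion described above. Uses the uncorrected formula
    12 / (N k (k + 1)) * sum(R_j^2) - 3 N (k + 1).
    """
    complete = [row for row in score_rows if None not in row]
    n, k = len(complete), len(complete[0])
    rank_rows = [average_ranks(row) for row in complete]
    rank_sums = [sum(col) for col in zip(*rank_rows)]
    stat = 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)
    return stat, n


# Made-up scores; the fourth expert left one system unscored.
scores = [[0, 3, 10], [1, 4, 9], [5, 0, 10], [2, None, 8]]
stat, n_used = friedman_statistic(scores)
print(round(stat, 3), n_used)
```

The example also shows the cost of listwise deletion: the fourth expert's two valid scores are discarded, which is why medians over all respondents were preferred for representing expert opinion.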

The data were analyzed using SPSS 10.0 (SPSS Inc., Chicago, IL).


    Results
Attributes
Table 2 lists the attributes sorted according to the weighting factors calculated with our model.


Table 2. Weighting factors for 20 welfare-relevant attributes sorted according to the weighting factors calculated with our model (transformed to a scale between 0 and 10); missing ID numbers represent attributes in the model that were not included in the questionnaire
 
Differences between attributes were found (Friedman test, P < 0.001), but multiple comparisons identified only 51 significant differences out of 190 combinations, that is, 27% of all cases (cf. Table 2).

According to the experts the most important attributes were the attributes "social contact" (median score: 8.94 on a scale from 0 to 10), "health and hygiene status" (8.57), "water availability" (8.54), "space per pen" (8.00), "foraging and bulk" (7.89), "food agonism" (7.89), "rooting substrate" (7.57), "social stability" (7.14), and "movement comfort" (7.14). The least important attributes were "light" (3.88), "visually isolated areas" (3.75), "food palatability" (3.17), "nestbuilding (resting nest)" (2.68), "scratching" (2.36), and "wallowing" (1.38) (Table 2).

Figure 2 shows a boxplot of the scores obtained from the experts as well as the predictions made with our model.



Figure 2. Boxplot of weighting factors per attribute (cf. Table 2) as assigned by the experts, showing the median score, the quartiles (box), and the whiskers, that is, those values that are not outliers. Outliers are identified as O for ≥ 1.5 times the interquartile range. Stars represent weighting factors calculated with the model.

 
The variance around the median weighting-factor scores was considerable. The interquartile ranges vary from 1.7 for attribute 19 ("resting comfort") to 5.7 for attribute 2 ("health and hygiene status"), with a median interquartile range of 3.5 (cf. Figure 2).
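For readers who wish to reproduce this kind of spread measure, the sketch below computes the median and interquartile range for one attribute. It assumes Python's default "exclusive" quartile method, which need not match the quartile definition used by SPSS; the scores and the function name `interquartile_range` are hypothetical.

```python
from statistics import median, quantiles


def interquartile_range(scores):
    """Return (q1, q3, iqr) using Python's default 'exclusive' quartiles.

    Assumption: SPSS may use a different quartile definition, so values
    can differ slightly from those reported in the paper.
    """
    q1, _, q3 = quantiles(scores, n=4)
    return q1, q3, q3 - q1


# Made-up weighting factors from seven hypothetical experts for one attribute.
scores = [2, 4, 5, 7, 8, 9, 10]
q1, q3, iqr = interquartile_range(scores)
mid = median(scores)
print(f"median={mid}, Q1={q1}, Q3={q3}, IQR={iqr}")
```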

The degree of concordance among the experts is significant, but low (Kendall’s W is 0.43, P < 0.001), and the experts themselves were also only moderately confident that their own weighting factors represented the actual importance of the attributes as they gave a median confidence score of 7.00 on a scale from 0 to 10 (interquartile range: 5.0 to 8.0).

The model contributed positively to the concordance, because when we included its scores as if the model were an additional expert, the concordance increased slightly (new W is 0.44).

Our model correlated moderately with the median expert scores (Spearman’s Rho is 0.73, P < 0.001). With this Rho the model performed at the 55 percentile level (i.e., 55% of the experts had a lower correlation coefficient and 45% had a higher correlation than our model).

Large absolute differences between the model and the experts were found for several attributes. Our model gave lower scores for attribute 25 "movement comfort" (-6.0 points; 0 percentile of the expert scores for this attribute), attribute 15 "water availability" (-4.8 points; 5 percentile), and attribute 19 "resting comfort" (-3.37 points; 10 percentile). For attribute 4 "exposure to cold" our model gave a higher score (3.2 points; 95 percentile).

Two further, more qualitative, indicators of the model’s quality are the number of minimum requirements and changed level rankings. The model’s attributes were presented to the experts without minimum-requirement levels and with their levels sorted as done in the model from best to worst. Therefore, if the experts fully "agreed" with the model, they would never assign a minimum requirement or change the level ranking.

Minimum requirements were assigned legitimately in only 2.1% of the cases (8 out of 378 instances assigned by five experts). These concerned the attributes "space per pen" (three times), "health and hygiene status" (twice), "air quality" (once), "social contact" (once), and "water availability" (once).

The level rankings were changed only 42 times out of 399 cases (11%). The attributes "social contact," "rooting substrate," "exposure to cold," and "social stability" were changed most often (nine, seven, five, and six times, respectively). A few patterns of changed level rankings could be identified, but these did not point to obvious conceptual errors in the model, although they could be used as a starting point for further analysis.

Housing Systems
One expert assessed the housing systems very differently from all other experts. He accounts for seven outlier values in Figure 3. Principal Components Analysis gave a factor score of 3.8 for this expert, which may be considered extreme (SPSS Inc.). Although this expert may be considered to be an outlier, his main arguments and weighting factors presented a consistent view. He highly valued protection during feeding and the control of feed intake (e.g., using feeding stalls and[or] an electronic sow feeding system [ESF]) and placed a low value on the provision of space and rooting substrate such as straw and pasture. As this was a consistent view of only one out of 23 experts (with only minor effects on the overall scores), we decided not to exclude this expert from the dataset.



Figure 3. Boxplot of (relative) welfare scores per housing system (cf. Table 1) as assigned by the experts, showing the median score, the quartiles (box), and the whiskers, that is, those values that are not outliers. Outliers are identified as ○ for ≥ 1.5 and * for ≥ 3 times the interquartile range. Stars represent welfare scores calculated with our model.

 
Table 3 lists the housing systems according to the welfare scores calculated with our model. The experts gave the tethered system (1) and individual housing in stalls (2) the lowest median welfare scores (0.00 and 0.56, respectively). Group-housed indoor systems (3, 4, 5, 6, 7, 8, and 10) and one individual housing system with additional space and substrate (Zigzag, 9) scored considerably higher (between 4.00 and 6.00; Table 3).


Table 3. Welfare scores for 15 housing systems sorted according to the scores calculated with our model (transformed to a scale between 0 and 10)
 
The five systems with the highest scores were all systems with outdoor access including the Family Pen system (which uses open-fronted buildings). Their scores range from 7.78 for system 12 "huts with stalls" to 9.50 for system 15 "semi-natural."

Differences in welfare scores between housing systems were found (Friedman, P < 0.001), and multiple comparisons identified 47 significant differences (where P < 0.05) out of 105 combinations (i.e., 45% of all cases).
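The numbers of combinations quoted for the multiple comparisons follow from counting unordered pairs: 15 housing systems give 105 pairs, and the 20 attributes give the 190 pairs reported for the attribute weights. A small sketch of this arithmetic (the function name `share_significant` is hypothetical):

```python
from math import comb


def share_significant(n_items, n_significant):
    """Return (number of unordered pairs, percentage of significant pairs)."""
    pairs = comb(n_items, 2)
    return pairs, 100 * n_significant / pairs


# Counts taken from the text: 51 of the attribute pairs and
# 47 of the housing-system pairs differed significantly.
attr_pairs, attr_pct = share_significant(20, 51)
sys_pairs, sys_pct = share_significant(15, 47)
print(attr_pairs, round(attr_pct), sys_pairs, round(sys_pct))
```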

No significant differences between tethered and individual housing in stalls could be detected with the Friedman test, but the sign test was significant (P < 0.001).

Figure 3 shows a boxplot of the (relative) welfare scores obtained from the experts as well as predictions made by our model. This figure suggests that according to the experts the set of 15 housing systems can be divided into roughly three groups: 2 low-, 7 or 8 mid-, and 5 high-welfare systems, with system 10 "Hurnik-Morris" being an intermediate mid- to high-welfare system.

However, the distinctions between low- and mid-welfare systems and between mid- and high-welfare systems were not confirmed with the Friedman test (Table 3). This may be due to the fact that for the Friedman test two experts with missing values had to be excluded, and that it only considers ranks and does not take the distance between scores into account. ANOVA with Bonferroni correction, which does take distances into account, did show significant differences between low- and mid-welfare and between mid- and high-welfare systems, whereas system 10 "Hurnik-Morris" was identified as an intermediate system between mid- and high-welfare.

The variance around the median expert scores was smaller than for the weighting factors of the attributes. The interquartile ranges for the 15 housing systems range from 0.0 for the tethered system to 4.0 for system 9 "zigzag," with a median interquartile range of 2.00. Especially the low-welfare systems, 1 and 2, had low interquartile ranges.

The experts themselves assigned a median 90% range score of 2.00 (i.e., an uncertainty margin of two welfare points above and two points below the welfare score). The median 90% range scores correlated positively with the median welfare scores (Spearman’s Rho was 0.73, P < 0.05). This higher uncertainty about more welfare-friendly housing systems may result from the fact that relatively little is known about welfare in supposedly welfare-friendly housing systems, because up to now most research has focused on identifying and understanding welfare problems in prevalent conditions.

The absolute scores given by the experts differed considerably from the scores calculated with the model. With the model we calculated a range of absolute scores from 3.9 to 8.1 for the tethered and semi-natural systems, respectively (calculated from scores from Bracke et al., 2002). These scores fall just within the ranges of absolute scores given by the experts, which were 0.0 to 4.0 for the worst (tethered) system and 8.0 to 10.0 for the best systems. For the tethered system and individual housing in stalls the experts gave median absolute scores of 0.00 and 1.00, respectively. The semi-natural system received the highest absolute score (9.00), and the ESF with pasture system had the second highest absolute score (8.83).

For the experts the absolute scores for the set of 15 housing systems range from 0.00 to 9.00, whereas their relative scores range from 0.00 to 9.50. The relative scores, therefore, cover as much as 92% of the experts’ absolute welfare scale. By contrast, the relative scores calculated with our model cover only 42% of the model’s absolute welfare scale. This difference between the experts and the model is not very important. It may be due to the fact that our model is based solely on the logical possibilities to alter welfare within the model’s domain of housing systems and ignores aspects of economical feasibility and moral acceptability, which may have affected the experts’ choice of the end-points of their welfare scales.
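The 42% figure for the model is consistent with dividing the span between its lowest and highest absolute scores (3.9 and 8.1) by the width of the 0-to-10 scale. The sketch below shows that arithmetic; it is an inference from the reported numbers rather than a formula given in the paper, and the function name `scale_coverage` is hypothetical. The experts' 92% is not reproduced here because the text does not state how it was derived.

```python
def scale_coverage(lowest, highest, scale_min=0.0, scale_max=10.0):
    """Percentage of the absolute scale spanned by the lowest and highest score."""
    return 100 * (highest - lowest) / (scale_max - scale_min)


# Absolute model scores for the tethered and semi-natural systems (from the text).
model_coverage = scale_coverage(3.9, 8.1)
print(round(model_coverage))
```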

For the set of all 15 housing systems the concordance between the experts in assigning relative welfare scores was 0.73 (Kendall’s W, P < 0.001), which increased to 0.79 (P < 0.001) when the one outlier expert was excluded. For the various subsets of housing systems W varied from 0.19 for the five high-welfare systems to 0.82 for the seven reference systems (all P < 0.01; Table 4). When our model was added as if it were an additional expert, W increased slightly (by about 0.01 point).


Table 4. Kendall’s coefficient of concordance among the experts (W) and Spearman rank correlation coefficients (Rho) between our model and expert opinion for various sets of housing systems and for the set of 20 attributes
 
The model correlated well with expert opinion. For the set of 15 housing systems, Spearman’s Rho was 0.92 (P < 0.01, Table 4). For the eight novel systems, which were those systems that had not been used to develop the model, Rho was 0.78 (P < 0.05).

In order to examine whether the model was also capable of making finer distinctions, we calculated Rho for the eight mid-welfare systems and for the five high-welfare systems separately. For these subsets Rho was 0.50 (not significant) and 0.90 (P < 0.05), respectively.

For the eight novel systems and for the eight mid-welfare systems our model performed as an "average" expert in that its correlation coefficient was at the 45 and 50 percentile level of how the individual experts correlated with the median expert scores. For all 15 systems and for the five high-welfare systems our model compared well to the individual experts, in that it performed at the 70 and 80 percentile level, respectively (Table 4).

We did not find large differences between our model and the median expert scores. The largest difference was found for system 4 "free-access stalls" (-1.34 points difference, with the model score lying at the 25 percentile level of the expert scores for this system).

The expert data can be used to estimate a confidence zone for scores obtained with the model. These zones represent measurement error, which for practical purposes is often centered around the scores (Nunnally, 1970, pp 554–556). Two factors that affect the confidence zone are the standard error and the correlation between the model and the experts. Spearman’s Rho is 0.92 (P < 0.01). As indicators of the "standard error" we have the median interquartile range around the experts’ welfare scores (which is 2.00), the median 90% range (which is also 2.00), and the maximum difference in welfare scores between the model and the experts (which is 1.34 welfare points for system 4 "free-access stalls"). A confidence zone of about 3.00 (i.e., 1.5 points above and below the score calculated with the model) would, therefore, seem to be proper. This is substantial and defines the scope for potential improvements in further research.


    Discussion
Consensus Among Experts
We believe that we have found a sufficient degree of consensus among experts to use the results to "validate" the model. Most importantly, we found clear significant differences between housing systems and between attributes. Furthermore, for both welfare scores and weighting factors Kendall’s coefficient of concordance among the experts was highly significant (P < 0.001). Finally, with a much larger group of different experts we confirmed in the present study the previous findings (Bracke et al., 1999b) for three levels of welfare for the seven main housing systems for pregnant sows.

These observations led us to conclude that we succeeded in probing the degree of consensus among experts using relative scores on a scale from 0 to 10, and that the results provide a frame of reference for welfare assessment in pregnant sows.

This is not to deny that differences exist between experts. Expert opinion is not fail-safe (cf. Siegel and Castellan, 1988, p 271). We consulted a substantial proportion of the world’s senior pig-welfare scientists, because we were interested in overall welfare (rather than in some aspect of welfare). For this reason we did not select topic-experts such as veterinary scientists, nutritionists, and climatic experts to validate our model, even though their views could be valuable to further improve the model. In our study some welfare aspects may have been missed. For example, one expert systematically identified wrong slat directions of the floor as a main argument for assigning welfare scores in our questionnaire. To our knowledge this has never been published, and, therefore, could not have been included in our science-based model, but if it were shown, it could be incorporated.

We believe that the degree of concordance among the experts was acceptable for welfare scores (Kendall’s W 0.73, P < 0.001), because it was within the range of what McDowell and Newell (1987, p 32) reported as typical findings (0.65 to 0.90) in the field of health measurement scaling, even though it falls below what they would call satisfactory (0.85). However, for the weighting of attributes Kendall’s W was only 0.43 (P < 0.001), which is low. This low concordance for attribute weighting is what we expected on theoretical grounds, and this was the reason why in the model we separated the calculation of weighting factors from assigning attribute scores (Bracke et al., 1999a). The low concordance relates to the larger variance of weighting factors (median interquartile range of 3.5 compared to 2.0 for welfare scores) and the lower percentage of significant differences found (Friedman test: 27% for attributes vs 45% for housing systems). The experts were also only moderately confident about the validity of their weighting factors as indicated by their median confidence score of 7.00, and they took only half as much time for weighting the 20 attributes compared to assessing the 15 housing systems (44 vs 81 min). Assessing housing systems may have seemed to be the more complex task because it is more comprehensive, but it is also more in accordance with the scientists’ profession. Logical errors were also encountered most evidently in the part on attribute weighting (e.g., some experts assigned weighting factors lower than 10 to minimum-requirement attributes, and one expert commented that he had assigned a weighting factor of 6 to an attribute that contained both an important component with weight 9 and an unimportant component with weight 1, whereas an overall weighting factor of at least 9 would have been correct).

Our finding that welfare assessment of housing systems has a higher degree of concordance than the weighting of attributes points toward a problem as well as a solution for the drafting of animal welfare legislation. The problem is that legislation focuses on prescribing many minimum-requirement attributes. A potential solution could be to focus instead on prescribing a minimum overall welfare status.

The experts assigned the highest weighting factors to the attributes "social contact," "health and hygiene status," "water availability," "space per pen," "foraging and bulk," "food agonism," "rooting substrate," "social stability," and "movement comfort." This list is largely in accordance with welfare priorities set by a group of 22 welfare scientists who independently contributed to a larger paper on farm animal welfare assessment (Anonymous, 2001). There, the main design criteria for pregnant sows were preliminarily identified as space (quantity and quality), substrate, social contact, and social stability (mixing) (which were all classified as most important), and the main welfare performance criteria were abnormal behavior, aggression, behavioral restrictions, and health problems.

Our study was the first to express weighting factors of attributes and welfare scores of housing systems as scores on a scale from 0 to 10. This allows a more quantitative assessment of what constitutes a substantial improvement for overall welfare, and what is only of minor importance. The relative scale used by the experts to assess housing systems covered as much as 92% of their absolute scale. This implies that the data set included in their opinion both truly high and truly low welfare systems, which the experts appear to have classified into three main groups (see Figure 3): low-, mid- and high-welfare systems.

In accordance with previous findings (Bracke et al., 1999b), the experts gave very similar welfare scores to tethered housing and individual housing in stalls (median scores of 0.00 and 0.56, respectively), with the latter system often receiving a slightly higher score (sign test, P < 0.01). In other words, the experts indicated that individual housing in stalls constitutes an improvement for welfare compared to tethered housing, but this improvement is not substantial.

A more substantial step seems to be made going from individual stalls (0.56) to group housing (with scores of 4.00 and up). However, we may ask whether group housing is necessary to improve welfare substantially. The experts identified social contact as the most important attribute (score 8.94). However, one of the mid-welfare systems was the zigzag system, which is an individual housing system with substrate and additional space. Therefore, group housing cannot be a necessary condition for welfare, and other attributes such as space and substrate for pregnant sows seem to be able to compensate, at least partly, for the lack of social contact in individual housing systems. This, again, identifies a problem for legislation or for any other attempt to capture welfare solely in terms of cut-off points. Welfare is a quantitative and multifactorial state, which allows at least some compensation between positive and negative attributes. All housing systems contain a mixture of such attributes, and it is the balance between them that results in a high or low welfare status of its animals.

A final implication of the experts’ opinion about welfare is that natural conditions are not normative for welfare. In the group of high-welfare systems the semi-natural system was scored as the best system (median absolute score: 9.00). This system, however, did not provide fully natural conditions in that predation, starvation, and extreme weather were all presumed to be absent. It is hard to imagine that these conditions would reduce, rather than improve, the overall welfare status. The second best system was ESF with pasture, which contains an electronic sow identification system for computer-controlled sow feeding (absolute score: 8.82). The Hurnik-Morris system was the most high-tech system. It was found to be intermediate between mid- and high-welfare systems (absolute score: 6.00), and several more natural systems received a lower score. These three systems show that the experts do not just equate welfare with the provision of natural conditions, but suggest that their views are more in accordance with an assessment based on the fulfillment of the animals’ needs as is done for the first time in an explicit way in our model.

Model Validation
Our model was designed to provide overall welfare assessment with a scientific basis. For this we aimed to formalize the experts’ implicit reasoning steps to assess welfare based on available scientific knowledge. In the development process of the model we used information collected from experts about their views on welfare in the seven main housing systems for pregnant sows to act as a reference. The present study confirmed that the model calculates welfare scores for these systems in accordance with expert opinion. Spearman’s Rho was 0.94 (P < 0.01). We also found a lower, but significant, correlation between the model and the experts for the eight novel housing systems (Rho was 0.78, P < 0.05). This indicates that the model can predict the welfare status for systems that have not been used to construct the model. A cautionary note, however, is that all housing systems in this study were derived from the literature. We have not shown that the model can predict welfare scores for truly novel concepts in pregnant sow housing and husbandry systems.
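The rank correlation used for this comparison can be sketched as follows. This is a minimal illustration of Spearman’s Rho computed as the Pearson correlation of average ranks; the helper names and the score values are hypothetical and do not reproduce the model output or the expert medians reported in this study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores for seven housing systems (illustrative values only)
model_scores = [0.5, 1.0, 4.2, 5.0, 6.1, 8.0, 9.2]
expert_medians = [0.0, 0.6, 4.0, 5.5, 5.2, 8.8, 9.0]
rho = spearman_rho(model_scores, expert_medians)
print(round(rho, 4))
```

Because only the rank order enters the calculation, a high Rho shows that the model orders the systems as the experts do; it does not show that the absolute scores agree.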

We also examined the model’s correlation with expert opinion for the whole set of 15 housing systems and for the mid- and high-welfare systems separately (Table 4). Except for the mid-welfare systems, significant correlations were found, and the model performed at or above the 45th percentile of the individual experts’ performance.

Further evidence that we were successful in modeling the experts’ implicit reasoning process was found for the weighting of attributes, where we also found a significant correlation (Rho was 0.73, P < 0.001) and a model performance at the 55th percentile. Furthermore, when the model was added as if it were an additional expert, Kendall’s coefficient of concordance increased (slightly), both for the different sets of housing systems and for the weighting of attributes, indicating that the model was a "good" expert.
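The effect of adding the model as an extra rater can be illustrated with a minimal sketch of Kendall’s coefficient of concordance (W). The function names and all scores below are hypothetical, and the sketch assumes no tied scores within a rater, so the tie correction described in standard texts is omitted.

```python
def ranks(scores):
    """1-based ranks of scores; assumes no ties within a rater."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def kendalls_w(raters):
    """Kendall's coefficient of concordance W (no tie correction)."""
    m, n = len(raters), len(raters[0])
    rank_rows = [ranks(row) for row in raters]
    totals = [sum(row[j] for row in rank_rows) for j in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Hypothetical scores: three imaginary experts rating five housing systems
experts = [
    [0.5, 3.0, 5.0, 7.0, 9.0],
    [1.0, 2.0, 6.0, 5.0, 8.5],
    [0.0, 4.0, 3.5, 8.0, 9.5],
]
model = [0.8, 2.5, 5.5, 7.5, 9.1]  # hypothetical model output

w_experts = kendalls_w(experts)
w_with_model = kendalls_w(experts + [model])
print(round(w_experts, 4), round(w_with_model, 4))
```

In this made-up example W rises when the model is added, because the model’s ranking matches the consensus ranking. A rater whose ranking conflicted with the consensus would lower W instead.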

In general, the differences between the scores predicted with the model and the median expert scores were relatively small for the housing systems, but for some attributes, such as water availability, resting comfort, and exposure to cold, the discrepancy was large. For the attribute "movement comfort" our model even predicted a score as much as 6 points below expert opinion. Unlike our model, in which space was a separate attribute, the experts may have included space aspects in the attribute "movement comfort." For some reason three experts also changed the level rankings of this attribute, thereby preferring more slippery floors to less slippery ones (sic). Further research would be needed to determine the reasons for such differences in weighting between the model and the experts. One reason may be that in the model attributes were technically defined so as to reduce the overlap between them, whereas the experts may have used the "concept" of an attribute in a more integrated way. More detailed descriptions of the attributes, with techniques such as fuzzy logic, could possibly be useful to clarify the "fuzzy" borders between different attributes and between different levels of an attribute, especially those involving minimum requirements.

Despite some differences, we believe that our model performed sufficiently well in predicting median expert scores as regards overall welfare scores and weighting factors of attributes. Therefore, we may well have succeeded in making the underlying reasoning process explicit. Our model is the first to derive overall welfare scores from scientific statements in a formalized way. This has two main advantages. It makes a quantitative welfare assessment possible, which has considerable utility (e.g., to evaluate measures for improved welfare). It also makes a structured assessment possible, where hidden assumptions and intermediate steps are made explicit. It enforces generic rules such that personal biases are controlled to a large extent, because it is not possible to make exceptions for individual cases unless there is general knowledge available that justifies making such an exception.

The model still requires an experienced user (Bracke et al., 2002), and we have not tested its inter- and intra-observer reliability. We have only validated the model with expert opinion, which, though not a truly independent measure, is the best available standard at present. Empirical validation would be welcome, but it is presently not known how to measure the overall welfare status directly or indirectly. Nevertheless, future scientific measurements can be used to further "validate" the model in an ongoing process. When such new knowledge is also incorporated in the decision support system to upgrade the model, the decision support system could remain the best possible assessment based on available scientific knowledge. Eventually, such a system may exceed human performance as it continuously integrates information. If this decision support system were generalized to other species, its value might increase even further, because experts are not available for every species as they are for pigs.


    Implications
We found a considerable degree of consensus among pig welfare scientists when assessing housing systems for pregnant sows as regards the overall welfare status of the sows. We also found a certain, but less pronounced, degree of consensus when assigning weighting factors to attributes. These findings imply that welfare scientists are able to formulate an identifiable position and perspective on animal welfare. In addition, we found that our model correlated well with expert opinion, especially for overall welfare scores. This supports the validity of our model and its underlying assessment procedure, which makes explicit the scientists’ reasoning steps when assessing animal welfare. This implies that integrated farm animal welfare assessment, rather than being purely subjective, can now be performed in a structured and transparent way based on available scientific knowledge.


    Footnotes
 
1 This work was funded by the Technology Foundation (STW) (through a joint grant from the Netherlands Organization for Scientific Research and the Netherlands Ministry of Agriculture, Nature Management and Fisheries), and the Dutch Society for the Protection of Animals.

Received for publication February 1, 2001. Accepted for publication January 29, 2002.


    Literature Cited


Anonymous. 2001. Scientists’ assessment of the impact of housing and management on animal welfare. J. Appl. Anim. Welfare Sci. 4:1–52.

Backus, G. B. C. (ed.) 1997. Comparison of four housing systems for non-lactating sows [In Dutch]. Rep. P.1.171. Research Institute for Pig Husbandry, Rosmalen.

Baxter, S. 1984. Intensive Pig Production: Environmental Management and Design. Granada, London.

Bokma, S., and H. W. J. Houwers. 1988. Studiereis Zweden. Rep. P.3.20. Research Inst. for Pig Husbandry, Rosmalen.

Bracke, M. B. M., J. H. M. Metz, and B. M. Spruijt. 1999a. Overall animal welfare reviewed. Part 2: Assessment tables and schemes. Neth. J. Agric. Sci. 47:293–305.

Bracke, M. B. M., J. H. M. Metz, B. M. Spruijt, and A. A. Dijkhuizen. 1999b. Overall welfare assessment of pregnant sow housing systems based on interviews with experts. Neth. J. Agric. Sci. 47:93–104.

Bracke, M. B. M., B. M. Spruijt, and J. H. M. Metz. 1999c. Overall animal welfare assessment reviewed. Part 1: Is it possible? Neth. J. Agric. Sci. 47:279–291.

Bracke, M. B. M., B. M. Spruijt, and J. H. M. Metz. 1999d. Overall animal welfare reviewed. Part 3: Welfare assessment based on needs and supported by expert opinion. Neth. J. Agric. Sci. 47:307–322.

Bracke, M. B. M., B. M. Spruijt, J. H. M. Metz, and W. G. P. Schouten. 2002. Decision support system for overall welfare assessment in pregnant sows A: Model structure and weighting procedure. J. Anim. Sci. 80:1819–1834.

Brent, G. 1986. Housing the Pig. Farming Press, Ipswich.

Fraser, D. 1995. Science, values, and animal welfare: Exploring the ‘inextricable connection’. Anim. Welf. 6:187–205.

McDowell, I., and C. Newell. 1987. Measuring Health: A Guide to Rating Scales and Questionnaires. Oxford University Press, Oxford.

Morris, J. R., and J. F. Hurnik. 1990. An alternative housing system for sows. Can. J. Anim. Sci. 70:957–961.

Nunnally, J. C. Jr. 1970. Introduction to Psychological Measurement. McGraw-Hill, New York.

Ober, J., and D. M. Blendl. 1969. Schweineställe: Planung, Bau, Einrichtung. 6th ed. BLV Verlagsgesellschaft, München.

Pig Welfare Advisory Group. 1997a. Yards or Kennels with Floor Feeding. Booklet No. 7. Ministry of Agriculture, Fisheries, and Food Publications, London.

Pig Welfare Advisory Group. 1997b. Outdoor Sows. Booklet No. 8. Ministry of Agriculture, Fisheries, and Food Publications, London.

Rist, M. 1989. Artgemässe Nutztierhaltung: Ein Schritt zum wesensgemässen Umgang mit der Natur. 2nd ed. Verlag Freies Geistesleben, Stuttgart.

Spruijt, B. M., R. van den Bos, and F. T. A. Pijlman. 2001. A concept of welfare based on reward evaluating mechanisms in the brain: Anticipatory behaviour as an indicator for the state of reward systems. Appl. Anim. Behav. Sci. 72:145–171.

Siegel, S., and N. J. Castellan, Jr. 1988. Nonparametric Statistics for the Behavioral Sciences. 2nd ed. McGraw-Hill, New York.

Stolba, A. 1981. A family pen system in enriching pens as a novel method of pig housing. In: Anonymous (ed.) Alternatives in Intensive Husbandry Systems. Proc. Universities Federation for Animal Welfare, Potters Bar. pp 52–67.

Stolba, A., and D. G. M. Wood-Gush. 1989. The behaviour of pigs in a semi-natural environment. Anim. Prod. 48:419–425.



