Facial esthetic outcome of functional followed by fixed orthodontic treatment of class II division 1 patients

Objectives: To assess the perceived facial changes in class II division 1, convex profile patients treated with functional followed by fixed orthodontic appliances.
Subjects and methods: The study sample consisted of 36 pairs of pre- and post-treatment photographs (frontal and profile, at rest) of 12 patients treated with activator, 12 with twin-block, and 12 controls with normal profiles, treated without functional appliances. All photographs were presented in pairs to 10 orthodontists, 10 patients, 10 parents, and 10 laypersons. Visual analog scale (VAS) ratings of changes in facial appearance were assessed.
Results: The patient groups were similar in sex distributions, age, and treatment duration. The different rater groups showed strong to excellent agreement. There were no significant differences among treatment groups (F = 0.91; P = 0.526; Wilks lambda = 0.93), raters (F = 1.68; P = 0.054; Wilks lambda = 0.83), and when testing the combined effect of treatment and rater on the results (F = 0.72; P = 0.866; Wilks lambda = 0.85). The raters detected slightly more positive changes in the activator and twin-block groups, compared to the control group, regarding the lower face and the lips, but these findings did not reach significance. Furthermore, their magnitude hardly exceeded 1/20th of the total VAS length.
Limitations: Retrospective study design.
Conclusions: The perceived facial changes of convex profile patients treated with functional, followed by fixed orthodontic appliances, did not differ from those observed in normal profile patients, when full-face frontal and profile photos were simultaneously assessed. Consequently, professionals should be skeptical regarding the improvement of a patient’s facial appearance when this treatment option is used.


Introduction
Facial esthetics play a significant role in everyday life and interpersonal relationships [1]. Orthognathic and orthodontic irregularities are frequently accompanied by suboptimal facial esthetics. This includes class II malocclusions with convex profiles and a retruded mandibular position of hard and soft tissues [2-4].
During active growth, class II patients can be treated effectively with functional orthopedic appliances, where orthodontists attempt to modify the skeletal growth [5]. Activator and twin-block are two commonly used appliances aiming to enhance mandibular growth in patients with convex profiles due to a retrognathic mandible [5,6]. However, a recent systematic review revealed a relatively small improvement of the facial outline when removable functional appliances were used [5].
Improvement of facial esthetics, including the dental appearance, is the main reason for which patients seek orthodontic treatment [7]. Therefore, patients' satisfaction is fulfilled when their facial appearance is actually improved and not only when proper dentoskeletal relations are restored, according to objective measurements [8]. Thus, when assessing the outcome of orthodontic treatment that aims to improve facial esthetics, studies need to consider the opinions of different groups of evaluators, including the subjective opinion of laypersons, who comprise the target group of our treatments. Previously, only a small favorable change in facial appearance was perceived when raters were asked to evaluate the esthetic outcome of functional orthodontic treatment on convex profile class II division 1 patients [9]. This underlines the need for more studies investigating the perceived improvement of facial appearance, achieved by treatments that have such aims.
Current literature focuses on the improvement of patients' facial profile following skeletal and dentoalveolar class II correction [2,3,5,10-16]. Indeed, functional appliances may influence to some degree the patients' facial profile [5,10,11,13,15] and this might be perceivable by the human eye as more attractive [9,13,14]. However, facial attractiveness is evaluated in everyday life from different angles and not only from the profile view and this may impact esthetic assessments [17]. Furthermore, other characteristics, such as hair or skin texture, may also influence the perception of facial profile esthetics [16].
So far, there is only one study that investigated the esthetic improvement of convex profile patients, after treatment with functional followed by fixed orthodontic appliances, using actual images [9]. In that study, raters assessed actual facial profile photographs before and after the orthodontic intervention and perceived a slight improvement in the esthetic appearance. The primary aim of this study was to assess treatment outcomes taking into consideration the total facial appearance as viewed from profile and frontal photographs. Secondarily, possible differences between groups of raters, activator and twin-block appliances, and parts of the face were explored.

Material and methods
To allow for valid comparisons, the sample was identical and the design similar to that used in a previous study [9]. Data from that study were used to perform a power analysis, which showed that the sample size was adequate [9]. The sample was obtained consecutively, from the most recent patient records of the Department of Orthodontics at the Aristotle University of Thessaloniki that fulfilled the inclusion criteria. Two test groups and one control group, consisting of 12 persons each, were formed. Pre-treatment diagnostic records were used for sample selection. Post-treatment records were only reviewed to confirm availability.
The eligibility criteria for the test groups were (1) full pre- and post-treatment diagnostic records, (2) class II (more than half molar cusp bilaterally) division 1 malocclusion, (3) convex profile defined by facial contour angles (formed by the glabella-subnasale line and the extension of the subnasale-pogonion line) greater than 15° for males and greater than 17° for females on the initial lateral cephalometric radiograph, (4) mixed dentition at start of the orthodontic intervention, (5) complete treatment with activator or twin-block followed by fixed orthodontic appliance treatment, (6) non-extraction treatment, (7) white racial background, and (8) no craniofacial malformations, syndromes, clefts, teeth absences, severe facial asymmetries, or functional mandibular shift over 1 mm [9].
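Criterion (3) can be illustrated with a short calculation. The following is only a minimal sketch under stated assumptions, not the measurement procedure used in the study: the function names, the coordinate convention (x pointing forward, y pointing upward, in millimetres), and any landmark coordinates passed in are hypothetical. Only the 15° and 17° thresholds are taken from the text.

```python
import math

# Thresholds from eligibility criterion (3); the dictionary layout is an assumption.
CONVEXITY_THRESHOLD_DEG = {"male": 15.0, "female": 17.0}


def facial_contour_angle(glabella, subnasale, pogonion):
    """Return the angle (degrees) between the glabella-subnasale line and the
    extension of the subnasale-pogonion line.

    Each landmark is a hypothetical (x, y) tuple in millimetres.
    A perfectly straight profile gives 0 degrees.
    """
    # Direction along the upper face (glabella -> subnasale).
    ux, uy = subnasale[0] - glabella[0], subnasale[1] - glabella[1]
    # Direction along the lower face (subnasale -> pogonion).
    lx, ly = pogonion[0] - subnasale[0], pogonion[1] - subnasale[1]
    cross = ux * ly - uy * lx
    dot = ux * lx + uy * ly
    # atan2 of cross and dot gives the signed angle; only the size is used here.
    return abs(math.degrees(math.atan2(cross, dot)))


def is_convex_profile(angle_deg, sex):
    """Apply the sex-specific threshold; raises KeyError for unknown labels."""
    return angle_deg > CONVEXITY_THRESHOLD_DEG[sex]
```

For example, an angle of 16° would count as convex for a male patient but not for a female patient under these thresholds.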
The control group consisted of 12 patients who fulfilled the same criteria as the test groups but differed in the following: (1) class I or class II with less than a half-cusp distal molar relation bilaterally, (2) normal facial contour, and (3) complete treatment with fixed appliances, without the use of any functional orthodontic appliances.
As reported previously [9], the treatment groups were similar in sex distributions, age, and treatment duration. The activator and twin-block groups were also similar in facial convexity and pre-treatment overjet, but they differed significantly from the control group in these parameters. Post-treatment overjet was within normal values in all groups, suggesting successfully treated patients in this aspect. More detailed information on the sample characteristics is available in Additional file 1: Table S1, as well as in a previous publication [9].
The final sample consisted of pre- and post-treatment photographs of 36 patients (18 male and 18 female). During image acquisition, patients were positioned with the Frankfort horizontal plane parallel to the ground, teeth in maximum intercuspation, and lips at rest. All photographs were in digital form and were edited to have a white background and similar brightness and contrast. Any skin imperfections and any jewelry were digitally removed (Adobe Photoshop CS6, Adobe Systems, San Jose, CA, USA). This image processing was performed to avoid bias from factors that affect facial attractiveness but are not related to the tested hypothesis.
The photographs were presented to 120 raters, who formed four different groups: 30 orthodontists (15 male, 15 female), 30 patients (15 male, 15 female), 30 parents of corresponding patients (15 male, 15 female), and 30 laypersons (15 male, 15 female). The patients' group of evaluators comprised class II division 1 patients who were treated in a local private practice during the study and were between 9 and 16 years of age. The rest of the groups consisted of adults between 20 and 65 years of age. When the questionnaires were distributed to the parents and laypersons, care was taken to include adults of various socioeconomic statuses, educational levels, and fields. None of the raters had any relation with the patients in the sample, and the orthodontists were not involved in their treatment. All raters were randomly selected and were the first 30 of each group who agreed to participate in the study.
All pre- and post-treatment photographs were presented in pairs. They were printed on A4-size pages, in landscape orientation, and were arranged in three photo albums (12 patients per album, six males and six females), according to a previously verified setup [9]. Each album consisted of four patients of each treatment approach (four activator, four twin-block, and four controls). This way, 10 assessments of each patient were obtained by each rater group. In each album, half of the patients were presented with the pre-treatment photographs on the left and the post-treatment photographs on the right and half were presented in reverse order. In addition, half of all patients in an album were presented with the profile photograph before the frontal one and half in reverse order. All photographs were aligned based on the lateral canthus of the eyes and were adjusted to be of the same size (Fig. 1).
The raters were asked to complete a standardized and previously validated questionnaire [9,18,19], while looking at the presented album. First, they were asked to provide the following demographic information: gender, date of birth, profession, and education level. Afterwards, for each set of photographs, the raters answered five questions using a visual analog scale (VAS) of 0-100 mm; the left side of the scale was described as extremely negative and the right side was described as extremely positive. Each of the five questions referred to different regions of the face and was accompanied by an illustration to be easily perceived (Fig. 2).
The questionnaires were printed and distributed to all evaluators by a single researcher (M.Z.). The rating of the photographs was conducted in a quiet, nonclinical environment with adequate lighting. The researcher was in the room during the procedure to answer potential questions, without, however, interfering in the evaluation. At the beginning, standardized instructions were given regarding the assessment process. Participants were not informed that the photographs showed orthodontic patients before and after treatment to ensure that their judgments were not biased. After viewing each photograph for as long as they considered necessary, raters made a mark on the VAS, according to their perception of facial change. An evaluator required approximately 12 min to complete a questionnaire. This time is considered acceptable to avoid fatigue of the raters that might lead to unreliable responses [9].
Ratings were transformed to continuous metric variables for statistical analysis by measuring the distance between the start and the marks on the VAS with a digital caliper. All measurements were recorded in a Microsoft Excel sheet (Microsoft Office 365). In half of the sample, when the post-treatment condition was presented to the left, VAS measurements were adjusted by subtracting each value from 100 to conform with the other half of the ratings.
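The adjustment described above can be sketched in a few lines. This is an illustrative example only: the function name, the constant, and the boolean flag are hypothetical, and the sketch assumes that a raw rating always describes the change from the left photograph to the right one, so that subtracting from 100 re-expresses it as a pre- to post-treatment change.

```python
VAS_LENGTH_MM = 100.0  # total length of the visual analog scale used in the study


def adjust_vas(measured_mm, post_treatment_on_left):
    """Return the rating expressed as change from pre- to post-treatment.

    measured_mm: caliper distance from the start of the scale to the mark.
    post_treatment_on_left: True when the post-treatment photograph was shown
    on the left, in which case the value is mirrored around the scale centre.
    """
    if not 0.0 <= measured_mm <= VAS_LENGTH_MM:
        raise ValueError("VAS measurement must lie between 0 and 100 mm")
    if post_treatment_on_left:
        return VAS_LENGTH_MM - measured_mm
    return measured_mm
```

A mark at the exact centre of the scale (50 mm) is unaffected by the adjustment, which matches its reading as "no perceived change" in either presentation order.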
One month after the measurements of the questionnaires, the same researcher re-measured 30 VAS scores to assess method error. To assess repeatability of ratings, 12 raters (3 orthodontists, 3 patients, 3 parents, 3 laypeople; 6 males, 6 females) reassessed the images after a 4-week washout period.

Statistical analysis
The statistical analysis was carried out by using SPSS software (version 20.0; IBM, Armonk, NY, USA). Levene's test showed homogeneity of variances in all cases. Data were tested for normality with the Shapiro-Wilk test and were not normally distributed in a few cases. Thus, parametric and nonparametric statistics were applied depending on normality.
Treatment group similarity was tested previously and proved adequate [9].
Intraexaminer agreement on the repeated VAS measurements was tested with the Wilcoxon signed-rank test. Random error was assessed with Dahlberg's formula.
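For reference, Dahlberg's formula estimates the random error of a double measurement as the square root of the sum of squared differences divided by twice the number of pairs. The sketch below shows the calculation; the function name is hypothetical and any input values are illustrative, not study data.

```python
import math


def dahlberg_error(first, second):
    """Dahlberg's random error: sqrt(sum(d_i ** 2) / (2 * n)).

    first, second: equally long sequences holding the first and the repeated
    measurement of the same n VAS marks (in mm).
    """
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("both measurement series must be non-empty and equally long")
    squared_differences = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(squared_differences / (2 * len(first)))
```

Applied to the 30 re-measured VAS scores, a calculation of this kind would yield the random error reported in the Results.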
Intrarater agreement (test-retest reliability) of repeated VAS ratings was tested with the Wilcoxon signed-rank test and the intraclass correlation coefficient (two-way random model, absolute agreement, average measures). A one-sample t test was used for testing if the mean differences between the two measurements are statistically different from 0.
Fig. 1 Photographs of a selected patient as presented to raters. The post-treatment photograph is presented to the left and the pre-treatment photograph to the right
Internal consistency for professionals, patients, parents, and laypeople was assessed by the calculation of the Cronbach's alpha for each test group separately. The Cronbach's alpha was based on median scores of the assessors in each group. The effect of deleting each item once from a subscale on the obtained alpha values was also examined. A level above 0.8 was considered high consistency and above 0.7 was considered acceptable.
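The internal consistency check can be sketched as follows. This is an illustration under assumptions rather than the SPSS procedure used in the study: the function names and the data layout (one list of scores per questionnaire item, with one entry per rated patient) are hypothetical. The sketch uses the standard formula, alpha = k / (k - 1) × (1 - sum of item variances / variance of the total score), and repeats it with each item left out once.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k items.

    items: list of k lists; items[i][j] is the score of case j on item i
    (for example, the median VAS score of a rater group for one patient).
    """
    k = len(items)
    n_cases = len(items[0])
    if k < 2 or any(len(item) != n_cases for item in items):
        raise ValueError("need at least two items with the same number of cases")
    # Total score of each case across all items.
    totals = [sum(item[j] for item in items) for j in range(n_cases)]
    sum_item_variances = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


def alpha_if_item_deleted(items):
    """Alpha recomputed with each item removed once, in item order."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]
```

If removing an item raised alpha noticeably, that item would be a candidate for exclusion from the subscale.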
The interrater agreement among groups was determined by means of intraclass correlation coefficients (two-way mixed model, absolute agreement, average measures). Each patient was rated by 10 members from each rater group; therefore, the median VAS score for each item was used to obtain a more representative approximation of each group's assessments for the specific patient.
A level above 0.7 was considered strong agreement and moderate agreement was at 0.5 and 0.6. The validation of the specific questionnaire on a similar population has been published previously [9] and was further tested in the present study.
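The interrater analysis described above can be sketched in two steps: collapsing the individual ratings of each rater group into a median, and computing an average-measures, absolute-agreement intraclass correlation from the two-way ANOVA mean squares. This is a minimal sketch, not the SPSS routine used in the study. The function names and data layout are hypothetical, and the formula (MSR - MSE) / (MSR + (MSC - MSE) / n) is the commonly cited absolute-agreement, average-measures form, assumed here to correspond to the reported coefficient.

```python
from statistics import mean, median


def group_medians(ratings):
    """ratings[p][g] is the list of individual VAS scores given to patient p
    by the raters of group g; returns a patients x groups matrix of medians."""
    return [[median(scores) for scores in patient] for patient in ratings]


def icc_absolute_average(matrix):
    """Average-measures, absolute-agreement ICC for an n x k matrix
    (rows: rated patients, columns: rater groups)."""
    n, k = len(matrix), len(matrix[0])
    grand = mean(x for row in matrix for x in row)
    row_means = [mean(row) for row in matrix]
    col_means = [mean(row[j] for row in matrix) for j in range(k)]
    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in matrix for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)
```

Because absolute agreement is requested, a constant offset between rater groups lowers the coefficient even when the groups rank the patients identically.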
Two-way multivariate analysis of variance was used to evaluate differences among group ratings. The assessment score for each patient was calculated as described above for interrater agreement. Responses to the five items of the questionnaire were the five dependent variables, and the treatment groups (activator, twin-block, control group) and the rater groups (orthodontists, patients, parents, laypeople) were the independent variables. Equality of covariances of the dependent variables was tested with Levene's test for equality of error variances. Post hoc pairwise comparisons were performed with the Fisher's least significant difference test.
In all cases, a two-sided significance test was carried out at an alpha level of 0.05. A Bonferroni correction was applied for pairwise a posteriori multiple comparison tests.

Results
There was no statistically significant difference between the first and second VAS measurements (intraexaminer error; P > 0.05); random error was minimal (0.27 mm). There was no statistically significant difference between repeated VAS ratings (intrarater agreement; P > 0.01) of all 12 raters. There was a strong to almost perfect intrarater agreement for less than half of the cases tested. Moderate to weak agreement was evident for the rest (Table 1). Mean differences between the two repeated ratings performed by 12 raters were minimal. The one-sample t test showed that in all cases, mean differences between the repeated ratings were lower than 7 VAS values (7%) and not significantly different from 0. However, the observed variation was high (Table 2).
Fig. 2 The questionnaire provided to the raters. "Extremely negative" corresponds to 0 and "extremely positive" corresponds to 100 VAS value
The internal consistency of the items of the questionnaire was generally acceptable both within and between groups, with a Cronbach's alpha value higher than 0.9 in all cases, except for the patient group of raters, where the value was lower, but still above 0.7. The exploratory elimination of individual items did not increase alpha values significantly in any case. Thus, it was reasonable to keep all items (Table 3).
The different rater groups showed strong to excellent agreement upon rating of each treatment group in all items, although the confidence intervals for certain cases were wide (Table 4).
Raters assessed changes induced by aging and treatment as slightly positive in all treatment groups, although a wide individual variation was evident (Figs. 3 and 4). Tests of between-subjects effects did not reveal any significant differences among treatment groups (P > 0.05).
Although changes in the activator and twin-block groups were judged as slightly more positive than in the control group, particularly in the lower face and the lips, these findings were not statistically significant. Furthermore, their magnitude was negligible, since it hardly exceeded 1/20th of the total VAS length in its highest value (Table 5).

Discussion
Class II malocclusions, which are common in contemporary societies, are reflected in the appearance of the lower face, leading to a convex facial profile. This feature may negatively affect the esthetic appearance of the face. Thus, the improvement of facial convexity comprises a major aim of any such orthodontic treatment. The present study is the continuation of a previous one [9] evaluating the esthetic improvement of convex profile patients, after treatment with functional appliances followed by fixed orthodontic appliances. In that study, raters assessed facial profile photographs before and after the orthodontic intervention and perceived an improvement in the esthetic appearance of the face. In contrast, the present study, where the raters simultaneously assessed frontal and profile photographs, did not reveal any significant treatment effect on patients' facial appearance.
The methodology was identical to that of the previous study [9], enabling a direct comparison of the two, and allowing for an assessment of the potential influence that the addition of the frontal photograph has on evaluating orthodontic treatment outcomes. We found that any favorable treatment effects previously identified on facial profiles [9] diminished when a more global assessment of facial appearance was performed. This suggests that when the raters assessed only the profile images, the treatment effect was evident, since the assessments focused exactly on the treatment target area, namely the facial profile. However, in the overall assessment of facial appearance, the raters probably did not focus solely on the profile, but also on other facial features. These findings are in accordance with another study showing that facial convexity does not affect facial esthetic assessment of frontal photos at rest [20]. For proper interpretation of the findings, it should be considered that the objective profile improvement achieved in the present sample (Additional file 1: Table S1) is similar to the one reported in the literature for this treatment approach [5]. Furthermore, the overjet of the class II division 1 patients was considerably improved by treatment, reaching normal values, which also implies that the treatment was completed successfully. Our findings add doubt to the premise that functional orthodontic treatment has a substantial favorable effect on patients' facial appearance. Based on objective measurements, there was a definite improvement of the facial profile due to treatment and growth, though it did not reach control values; at T1, the median facial contour angle in the activator and twin-block groups was 17° (T0, 20°), whereas in the class I group, it was 12° (T0, 12°) (Additional file 1: Table S1).
However, if this improvement is not perceivable by the human eye when the overall facial appearance is considered, then no positive effect of treatment on patients' lives is expected.
It is a common strategy in previous studies to use profile silhouettes, facial outlines, or black and white images in an attempt to control for confounding factors that may affect judgments of facial esthetics [16]. However, modified photos do not reflect the real conditions in everyday interactions and may also affect ratings inconsistently [9]. To our knowledge, this and the previous study [9] are the only studies that used actual patient images to investigate the esthetic improvement of convex profile patients, after treatment with functional appliances followed by fixed orthodontic appliances. However, the present study design is a better simulation of actual human interaction, since people look at each other from various angles during social interactions. It should be noted that both studies assessed the effect of treatment on static facial appearance, since that was the original aim. Facial expressions, such as the smile, may also influence the perception of facial esthetics [21]. Thus, a favorable effect of treatment on facial appearance during function cannot be excluded based on the present findings. This might also be attributed to favorable changes in the dental appearance, which can then affect the overall facial appearance perception, though to a limited extent [22]. The intrarater error and the variation of assessments were higher in the present study compared to the previous one [9], suggesting that an increase of the given information added complexity to the way change was perceived by the human eye. However, in an actual everyday interaction, the amount of information that the human eye transfers to the brain is considerably greater, even compared to the present setup. Thus, the present design can be considered more representative of actual conditions than the previous one of profile assessment [9], but it still represents an oversimplification of the actual interactions between people in real-life conditions.
Subjective factors, such as personal opinion, environmental influence, ethnicity, and education, can affect the assessment of beauty and attractiveness by an individual [23,24]. Experts may focus on achieving "flawless" skeletal and dentoalveolar class I relations, while laypersons may evaluate an individual's appearance based on their personal experiences [25]. Therefore, the goals of orthodontic treatment set by professionals may not meet patients' and parents' expectations and may differ from laypersons' assessments [3,9,18]. Nonetheless, orthodontic treatment should be able to improve a patient's appearance in his/her eyes and in the eyes of laypeople, when such treatment goals have been set during planning.
A previous study on convex profile patients treated with surgical advancement of the mandible reported that a favorable treatment outcome was also seen on frontal photographs, though to a lesser degree compared to the profile assessments [24]. Thus, it could be argued that the changes induced by conventional orthodontic treatment did not reach certain thresholds, with regard to magnitude, to affect facial esthetic perception considerably. Furthermore, the same study found that the perceived improvement was doubled when the raters were aware of the treatment status, which supports our decision not to disclose this to the raters.

Limitations
The most important limitation of the study is the retrospective collection of the rated cases. Retrospective studies are more susceptible to selection bias. To account for this, strictly defined eligibility criteria were applied to cases identified through a consecutive search of the archives. Thus, all patients that fulfilled these criteria were included, until the pre-determined sample size was reached. A further measure to minimize selection bias was the assessment of only the pre-treatment diagnostic records in the sample selection process. Thus, the risk of selecting cases based on the outcome was diminished. Post-treatment records were only used after the inclusion of a subject in the study. A fully prospective randomized design would be ideal, but it might be unrealistic to implement due to time considerations.
A group of untreated class II division 1 patients would also have been an appropriate control for the effect of growth. We searched for such a group, but it was not possible to find one. It would have also been problematic to try to generate it, since not providing treatment to patients in need raises ethical and legal concerns. Even if such a group had been available, it is not expected to have considerably affected the findings, since the effect of treatment and growth on a patient's profile was found to be minimal, even in the treated group. The present control was suitable to test the effects of aging and setting factors, and thus, it met the needs of the study. The change of facial appearance perceived in the control group was mainly due to aging, since the profile was straight before and after treatment. In the convex profile group, more favorable change would have been seen if treatment had provided the desirable outcomes, those that are perceivable by people.

Conclusions
The perceived facial changes of convex profile patients treated with functional appliances, followed by fixed orthodontic appliances, did not differ from those observed in normal profile patients, when full-face frontal and profile photos were simultaneously assessed. Consequently, professionals should be skeptical regarding the improvement of a patient's facial appearance when this treatment option is used. Perhaps more drastic approaches should be considered in the case of convex profile patients with significantly compromised facial esthetics, especially when the patients' and parents' esthetic demands are high.
Fig. 4 Box plots showing the assessed changes from pre- to post-treatment condition in VAS values (y-axis), grouped by rater type. The upper limit of the black line represents the maximum value, the lower limit the minimum value, the box the interquartile range, and the horizontal black line the median value. Outliers (> ± 3SD) are shown as black dots