Clinical effectiveness of Invisalign® orthodontic treatment: a systematic review

Background The aim was to systematically search the literature and assess the available evidence regarding the clinical effectiveness of the Invisalign® system. Methods Electronic database searches of published and unpublished literature were performed. The reference lists of all eligible articles were examined for additional studies. Reporting of this review was based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Results Three RCTs, 8 prospective, and 11 retrospective studies were included. In general, the level of evidence was moderate and the risk of bias ranged from low to high, given the low risk of bias in the included RCTs and the moderate (n = 13) or high (n = 6) risk of bias of the other studies. The lack of standardized protocols and the high degree of clinical and methodological heterogeneity across the studies precluded a valid interpretation of the actual results through pooled estimates. However, there was substantial consistency among studies that the Invisalign® system is a viable alternative to conventional orthodontic therapy in the correction of mild to moderate malocclusions in non-growing patients who do not require extractions. Moreover, Invisalign® aligners can predictably level, tip, and derotate teeth (except for cuspids and premolars). On the other hand, limited efficacy was identified in arch expansion through bodily tooth movement, extraction space closure, correction of occlusal contacts, and larger antero-posterior and vertical discrepancies. Conclusions Although this review included a considerable number of studies, no clear clinical recommendations based on solid scientific evidence can be made, apart from non-extraction treatment of mild to moderate malocclusions in non-growing patients. Results should be interpreted with caution due to the high heterogeneity. 
Electronic supplementary material: The online version of this article (10.1186/s40510-018-0235-z) contains supplementary material, which is available to authorized users.


Background
Orthodontic developments, especially in recent years, have been accompanied by a significant increase in patients' esthetic demands. Patients often express the need to influence, or even determine, treatment aspects or objectives together with the orthodontist, driven by the effects that orthodontic appliances have on their appearance. Conventional orthodontic methods have been associated with a general compromise in facial appearance [1], raising a major concern among patients seeking orthodontic treatment [2]. Thus, esthetic materials and techniques have been introduced into clinical practice to overcome these limitations [3].
Since its development in 1997, Invisalign® technology has been established worldwide as an esthetic alternative to labial fixed appliances [4][5][6][7]. CAD/CAM stereolithographic technology is used to forecast treatment outcomes and to fabricate a series of custom-made aligners from a single silicone or digital impression [6]. Since its introduction, the system has been substantially developed and continually improved in many respects: different attachment designs, new materials, and new auxiliaries, such as "Precision Cuts" and "Power Ridges", were designed to enable additional treatment biomechanics. According to the manufacturer, Invisalign® can effectively perform major tooth movements, such as bicuspid derotation up to 50° and root movements of upper central incisors up to 4 mm [8]. Despite the advocated efficiency of the treatment, its clinical potency remains controversial among professionals: advocates are convinced by successfully treated cases, as indicated by clinical evidence, whereas opponents point to significant limitations, especially in the treatment of complex malocclusions [5,9-11].
Despite the available body of literature pertaining to Invisalign® technology, its clinical performance has been analyzed less thoroughly and a synthesis of the results still remains vague. Four systematic reviews about clear aligners exist in the literature: the first one was published back in 2005 and assessed the treatment effects of Invisalign; it included, nevertheless, only two studies [12]. More recently, another three reviews have been published. The first one was last updated in June 2014; it included 11 studies and evaluated the control of the clear aligners on orthodontic tooth movement [13]. The second one evaluated the periodontal health during clear aligner therapy and was published in the same year [14], and the most recent one was undertaken in October 2014 and included four studies, since it focused on the comparison between clear aligners and conventional braces [15].
Therefore, the purpose of the present review was to systematically search the literature and summarize the currently available scientific evidence regarding the clinical effectiveness of the Invisalign® system as the principal orthodontic therapy in patients of any age, comparing either different groups treated with Invisalign® or Invisalign® with conventional braces, and evaluating its efficacy in various malocclusions.

Types of studies
Randomized clinical trials (RCTs), controlled clinical trials (CCTs), and prospective and retrospective studies were considered eligible for inclusion in this review. Eligible studies addressed the clinical aspects of treatment with Invisalign®, with no restrictions regarding language, patient age, publication status, or the inclusion of extraction cases.

Types of participants
Orthodontic patients of any age who were treated with Invisalign® either as the intervention or as the control group.

Types of interventions
Invisalign® therapy. All other aligner systems were excluded.

Outcome
Primary outcomes were any effects of Invisalign® treatment on clinical efficiency, effectiveness, treatment outcomes, movement accuracy, or the tooth movement predicted in ClinCheck®, including changes in alignment or occlusion, treatment duration, and completion rate. Adverse events and unwanted effects were also recorded.

Search methods for identification of studies
Detailed search strategies were developed and appropriately revised for each database, considering the differences in controlled vocabulary and syntax rules. The following electronic databases were searched: MEDLINE (via Ovid and PubMed, Appendix, from 1946 to August 28, 2017), Embase (via Ovid), the Cochrane Oral Health Group's Trials Register, and CENTRAL.
Unpublished literature was searched on ClinicalTrials.gov, the National Research Register, and the ProQuest Dissertation Abstracts and Thesis database.
The search attempted to identify all relevant studies irrespective of language. The reference lists of all eligible studies were examined for additional studies.

Selection of studies
Study selection was performed independently and in duplicate by the first two authors of the review, who were not blinded to the identity of the study authors, their institutions, or the results of their research. The study selection procedure comprised title-reading, abstract-reading, and full-text-reading stages. After exclusion of ineligible studies, the full reports of publications considered eligible for inclusion by either author were obtained and assessed independently. Disagreements were resolved by discussion and consultation with the third and last authors. A record of all decisions on study identification was kept.

Data extraction and management
The first two authors performed data extraction independently and in duplicate. Disagreements were resolved by discussion or by involving two collaborators (the third and last authors). The following data were recorded on a customized data collection form:
-Author/title/year of study
-Design/setting of the study
-Number/age/gender of participants
-Intervention and comparator/treatment duration
-Type of clinical outcome
-Method of outcome assessment

Measures of treatment effect
For continuous outcomes, descriptive measures such as mean differences and standard deviations were used to summarize the data from each study. For dichotomous data, the number of participants with events and the total number of participants in the experimental and control groups were analyzed.
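The review does not state how these summary measures were computed. As an illustration only, the sketch below assumes the usual formulas for the mean difference between two independent arms and its standard error, and uses the risk ratio as one common summary for dichotomous counts (the text mentions only event counts and group sizes, not a specific ratio). `ArmSummary`, `mean_difference`, and `risk_ratio` are hypothetical names, and the numbers are invented.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ArmSummary:
    """Reported summary of one study arm for a continuous outcome (hypothetical)."""
    mean: float
    sd: float
    n: int


def mean_difference(exp: ArmSummary, ctrl: ArmSummary) -> tuple[float, float]:
    """Return (MD, SE) for two independent arms.

    MD = mean_exp - mean_ctrl
    SE = sqrt(sd_exp**2 / n_exp + sd_ctrl**2 / n_ctrl)
    """
    if exp.n <= 0 or ctrl.n <= 0:
        raise ValueError("arm sizes must be positive")
    md = exp.mean - ctrl.mean
    se = math.sqrt(exp.sd ** 2 / exp.n + ctrl.sd ** 2 / ctrl.n)
    return md, se


def risk_ratio(events_exp: int, total_exp: int,
               events_ctrl: int, total_ctrl: int) -> float:
    """Risk ratio from event counts and group sizes (assumed summary measure)."""
    if total_exp <= 0 or total_ctrl <= 0:
        raise ValueError("group sizes must be positive")
    if events_ctrl == 0:
        raise ValueError("no control-group events: risk ratio undefined")
    return (events_exp / total_exp) / (events_ctrl / total_ctrl)


# Invented example values, not data from any included study.
md, se = mean_difference(ArmSummary(mean=2.0, sd=1.0, n=4),
                         ArmSummary(mean=1.0, sd=1.0, n=4))
rr = risk_ratio(events_exp=5, total_exp=10, events_ctrl=2, total_ctrl=10)
```

Keeping the raw arm summaries rather than only the derived differences is what allows the same data to be re-summarized later if pooling ever becomes feasible.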

Unit of analysis issues
In all cases, the unit of analysis was the patient.

Dealing with missing data
We contacted study authors by e-mail to request missing data where necessary. If they did not respond or did not provide the missing data, only the available reported data were analyzed.

Data synthesis
A meta-analysis was planned only if there were at least two studies of low or unclear risk of bias, reporting similar comparisons, and similar outcomes at similar time points. Otherwise, qualitative synthesis of the included studies would be performed.
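The planned decision rule can be sketched as a simple grouping check. This is a minimal illustration, not the authors' procedure: `StudyRecord` and its fields are hypothetical, and "similar" comparisons, outcomes, and time points are approximated here by exact label matches, which is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass

# Risk-of-bias judgments that the planned rule accepts for pooling.
ELIGIBLE_RISK_OF_BIAS = {"low", "unclear"}


@dataclass(frozen=True)
class StudyRecord:
    study_id: str
    risk_of_bias: str   # e.g. "low", "unclear", "moderate", "high"
    comparison: str     # e.g. "Invisalign vs fixed appliances"
    outcome: str        # e.g. "treatment duration"
    time_point: str     # e.g. "end of active treatment"


def poolable_groups(studies: list[StudyRecord]) -> dict[tuple[str, str, str], list[str]]:
    """Group eligible studies by (comparison, outcome, time point) and keep
    only the groups that contain at least two studies."""
    groups: dict[tuple[str, str, str], list[str]] = {}
    for s in studies:
        if s.risk_of_bias not in ELIGIBLE_RISK_OF_BIAS:
            continue  # studies at moderate or high risk of bias are not pooled
        key = (s.comparison, s.outcome, s.time_point)
        groups.setdefault(key, []).append(s.study_id)
    return {key: ids for key, ids in groups.items() if len(ids) >= 2}


def synthesis_mode(studies: list[StudyRecord]) -> str:
    """'meta-analysis' if any group qualifies, otherwise 'qualitative synthesis'."""
    return "meta-analysis" if poolable_groups(studies) else "qualitative synthesis"
```

In practice, judging whether comparisons and outcomes are "similar" remains a reviewer decision; the exact-match key only makes that decision explicit once it has been taken.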

Quality assessment of included studies
The risk of bias for RCT studies was assessed by two review authors, independently and in duplicate, using the Cochrane risk of bias tool [16].
Risk of bias was assessed and judged for seven separate domains.
1. Sequence generation: was the allocation sequence adequately generated?
2. Allocation concealment: was allocation adequately concealed?
3. Blinding of participants and investigators: was knowledge of the allocated intervention adequately prevented during the study?
4. Blinding of outcome assessors: was knowledge of the allocated intervention adequately prevented before assessing the outcome?
5. Incomplete outcome data: were incomplete outcome data adequately addressed?
6. Selective outcome reporting: were reports of the study free of suggestion of selective outcome reporting?
7. Other sources of bias: was the study apparently free of other problems that could put it at a high risk of bias?
Each study received a judgment of low risk, high risk, or unclear risk of bias (indicating either lack of sufficient information to make a judgment or uncertainty over the risk of bias) for each of the seven domains. Studies were finally grouped into the following categories:
-Low risk of bias (plausible bias unlikely to seriously alter the results) if all key domains of the study were at low risk of bias.
-Unclear risk of bias (plausible bias that raises some doubt about the results) if one or more key domains of the study were unclear.
-High risk of bias (plausible bias that seriously weakens confidence in the results) if one or more key domains were at high risk of bias.
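The grouping rules above can be expressed as a small function. The text does not say which of the seven domains count as "key", so this sketch treats all seven as key by default; it also assumes that a study with both unclear and high-risk key domains is classed as high risk, since the stated categories overlap in that case. The function and constant names are hypothetical.

```python
from __future__ import annotations

SEVEN_DOMAINS = (
    "sequence generation",
    "allocation concealment",
    "blinding of participants and investigators",
    "blinding of outcome assessors",
    "incomplete outcome data",
    "selective outcome reporting",
    "other sources of bias",
)
VALID_JUDGMENTS = {"low", "unclear", "high"}


def overall_risk_of_bias(judgments: dict[str, str],
                         key_domains: tuple[str, ...] = SEVEN_DOMAINS) -> str:
    """Summarize per-domain judgments into an overall risk-of-bias category."""
    values = [judgments[d] for d in key_domains]
    invalid = set(values) - VALID_JUDGMENTS
    if invalid:
        raise ValueError(f"unexpected judgment(s): {sorted(invalid)}")
    if "high" in values:      # one or more key domains at high risk
        return "high"
    if "unclear" in values:   # otherwise, one or more key domains unclear
        return "unclear"
    return "low"              # all key domains at low risk
```

Checking "high" before "unclear" encodes the assumed precedence; passing a narrower `key_domains` tuple would reflect a review that designates only some domains as key.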
Prospective and retrospective studies were graded as low, moderate, or high risk of bias according to the following criteria, adapted from the Bondemark scoring system [17]:
-Low risk of bias (all criteria should be met): randomized clinical study or prospective study with a well-defined control group; defined diagnosis and endpoints; diagnostic reliability tests and reproducibility tests described; blinded outcome assessment.
-Moderate risk of bias (all criteria should be met): cohort study or retrospective case series with a defined control or reference group; defined diagnosis and endpoints; diagnostic reliability tests and reproducibility tests described.
-High risk of bias (one or more of the following conditions): large attrition; unclear diagnosis and endpoints; poorly defined patient material.
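The adapted Bondemark criteria can be sketched as a decision function. Two points are assumptions rather than statements from the review: prospective non-randomized studies are treated as cohort studies when they miss the low-risk criteria, and a study that meets neither full set of criteria nor any explicit high-risk condition is graded high, conservatively. `StudyFeatures` and `bondemark_grade` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StudyFeatures:
    design: str                          # "rct", "prospective", or "retrospective"
    has_control_or_reference: bool       # defined control or reference group
    defined_diagnosis_and_endpoints: bool
    reliability_tests_described: bool    # diagnostic reliability/reproducibility tests
    blinded_outcome_assessment: bool
    large_attrition: bool
    poorly_defined_patient_material: bool


def bondemark_grade(f: StudyFeatures) -> str:
    """Grade a study as 'low', 'moderate', or 'high' risk of bias."""
    # High risk: any single listed condition is sufficient.
    if (f.large_attrition
            or f.poorly_defined_patient_material
            or not f.defined_diagnosis_and_endpoints):
        return "high"
    controlled = f.design == "rct" or f.has_control_or_reference
    # Low risk: randomized, or prospective with a control group,
    # plus reliability tests and blinded outcome assessment.
    if (controlled and f.design in {"rct", "prospective"}
            and f.reliability_tests_described
            and f.blinded_outcome_assessment):
        return "low"
    # Moderate risk: control/reference group plus reliability tests.
    if controlled and f.reliability_tests_described:
        return "moderate"
    # Not covered explicitly by the criteria; graded conservatively (assumption).
    return "high"
```

Under these assumptions, a prospective study with a control group but no blinded outcome assessment is graded moderate, while a blinded study without any control group falls through to high.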
The Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach [16] was implemented to assess the overall quality of evidence for the studies included in this systematic review; under GRADE, the overall evidence is rated as high, moderate, low, or very low. The outcomes included in GRADE were grouped according to the different parameters assessed in the primary studies.
-High quality of evidence implies that the true effect lies close to the estimate of the effect.
-Moderate quality of evidence implies that the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different.
-Low quality of evidence implies that our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect.
-Very low quality of evidence implies that the true effect is likely to be substantially different from the estimate of the effect.

Study selection
The electronic search initially identified 227 relevant articles. One hundred fifty-eight papers remained after exclusion on the basis of title-reading, and five articles were added through hand-searching. After removal of 49 duplicates, 114 papers were screened; after abstract-reading, 85 studies were excluded, leaving 29 articles to be read in full text. After application of the inclusion and exclusion criteria, another seven articles were removed. In total, 22 studies were considered eligible for inclusion in the final analysis (Fig. 1).
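The counts in this selection flow can be checked stage by stage. A minimal sketch follows; the function and key names are hypothetical, and it assumes, as the paragraph implies, that hand-searched articles were added before duplicates were removed.

```python
from __future__ import annotations


def prisma_flow(identified: int, kept_after_titles: int, hand_searched: int,
                duplicates: int, excluded_on_abstract: int,
                excluded_on_full_text: int) -> dict[str, int]:
    """Recompute each stage of a study-selection flow from the reported counts."""
    stages = {
        "excluded_on_titles": identified - kept_after_titles,
        "screened": kept_after_titles + hand_searched - duplicates,
    }
    stages["full_text"] = stages["screened"] - excluded_on_abstract
    stages["included"] = stages["full_text"] - excluded_on_full_text
    negative = [name for name, value in stages.items() if value < 0]
    if negative:
        raise ValueError(f"inconsistent counts at stage(s): {negative}")
    return stages


# Counts as reported in the study selection paragraph.
flow = prisma_flow(identified=227, kept_after_titles=158, hand_searched=5,
                   duplicates=49, excluded_on_abstract=85,
                   excluded_on_full_text=7)
```

With the reported numbers, the recomputed stages are 69 records excluded on titles, 114 screened, 29 read in full text, and 22 included, matching the text.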

Quality analysis
The quality assessment of the 22 studies is shown in Tables 3 and 4.

RCTs
The three RCTs [18][19][20] were judged to be at an overall low risk of bias, due to the low risk of bias that applied to each domain based on the Cochrane risk of bias tool [16] (Table 3).

Prospective studies
Three prospective studies [21,26,35] were graded as moderate and five [5,22,24,25,27] as high risk of bias. Although all were of prospective design, blinded outcome assessment was reported in only one study [27], which, among other limitations, lacked a control group (Table 4).
… include any diagnostic reliability and reproducibility tests (Table 4).

Qualitative synthesis of the included studies
Study settings
An overview of the experimental design of the included studies is presented in Table 1. To investigate treatment efficacy, eight studies [5,21,22,24,30,34-36] used the patients' virtual ClinCheck® models of the predicted tooth movement as the control group, aided by ToothMeasure® [5,21,22,24,34-36] or Geomagic Qualify [30]. More specifically, they evaluated the extent to which the actual initial and final models differed from the corresponding virtual models after treatment. However, two of these studies had samples and outcomes similar to those of two other studies, namely [5] with [24] and [35] with [36]. We decided not to exclude any of these studies, since they provided additional information.
All studies mainly included non-growing patients, and most of them included patients with an average age of 30 years [5, 19-21, 29-31, 34-38]. Nine studies used non-extraction cases as study samples [18, 28-33, 37, 38]. Treatment duration differed among and within studies, as expected given differences in malocclusion severity and in the implemented intervention. Six studies [18,22,29,34-36] did not report treatment duration. Finally, only one study [37] reported post-retention outcomes, comparing the changes in patients treated with Invisalign® with those in patients treated with traditional fixed appliances. The evaluation was conducted up to 3 years after appliance removal, and all patients had undergone at least 1 year of retention. Table 2 gives an overview of the results of the included studies regarding clinical parameters, grouped into the following three subject categories.
B. Invisalign® vs traditional fixed appliances
Seven studies [18,19,23,28,33,37,38] compared Invisalign® treatment outcomes with those of conventional fixed appliances. A recent RCT [18] found no significant difference in the amount of mandibular incisor proclination produced by Invisalign® and fixed labial appliances in mild crowding cases; this was supported by a retrospective study [23], which also concluded that treatment duration in these cases was similar for the two methods, although Invisalign was less successful in root alignment. Gu et al. [28] reported similar outcomes, but a shorter duration with Invisalign, for mild to moderate malocclusions. However, Invisalign performed worse in more severe cases, a finding also supported by Djeu et al. [38]. Along the same lines, in an RCT, Li et al. [19] concluded that both therapeutic approaches can succeed in class I adult extraction cases, although Invisalign required more time and was less able to correct bucco-lingual inclination and occlusal contacts. These latter findings agree with those of two retrospective studies [33,38].
Differences between the two methods in post-retention changes were investigated in one retrospective study at moderate risk of bias [37]. Greater relapse was found 1-3 years after treatment with Invisalign® than after conventional orthodontic therapy with fixed appliances.
C. Invisalign groups only
In an early exploratory study, Vlaskalic and Boyd [25] concluded that Invisalign® may be more beneficial for patients in the permanent dentition with mild to moderate malocclusions, after careful treatment planning. Another early exploratory RCT [20] also concluded that milder malocclusions treated without extractions have a greater chance of being treated successfully with Invisalign.
Three recent retrospective studies also tested various Invisalign groups. One showed that Invisalign has a moderate ability to manage overbite [29]: normal overbite was well maintained, deep bite was partially corrected through mandibular incisor proclination, and open bite was also partially corrected, but mainly through incisor extrusion. A second study [31] reported that Invisalign could bodily distalize maxillary molars in adult nonextraction mild class II cases (≤ ½ cusp), with no changes in facial height. Finally, a third study [32] showed that Invisalign could correct mild to moderate crowding in nonextraction cases without causing significant changes in mandibular incisor position and inclination. In contrast, such changes (protrusion and proclination) were induced in cases with severe crowding (≥ 6 mm).
The Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach [16] was used to assess the overall quality of evidence for the outcomes assessed by two or more of the included studies; the corresponding GRADE tables are provided in Additional files 1, 2, and 3.

Quantitative synthesis of the included studies
The lack of standardized protocols impeded a valid interpretation of the actual results through pooled estimates. Substantial differences in the implemented interventions, participants' characteristics (age and gender distribution), treatment duration, and investigated outcomes indicated significant methodological heterogeneity. Therefore, a meta-analysis was not feasible.

Discussion
In order to successfully deliver orthodontic treatment, clinicians need to carefully plan an appropriate therapeutic approach based on the current scientific evidence. Although this is not the only determining factor for the final decision, as clinical experience and patient's opinion also play an important role, this information needs to be taken into consideration to assess the possibilities and limitations of each treatment modality.
With regard to Invisalign®, four systematic reviews on the clinical effects of the system are available to date [12][13][14][15], one of which [14] evaluated periodontal health. Given the limited evidence available in certain earlier attempts [12,15] and the evaluation of the effectiveness of Invisalign® within the wider spectrum of clear aligners [13,15], strong conclusions regarding the clinical efficiency of the Invisalign® system were not feasible. This uncertainty surrounding an increasingly popular treatment approach prompted the present systematic search of the literature and assessment of the available scientific evidence on the clinical outcomes of Invisalign® orthodontic treatment. Because the topic is relatively unexplored, an attempt was made to conduct the present systematic review to a high standard, in order to minimize the risk of bias while including all available information.
Considerable differences in participants' characteristics, types of interventions, reporting of clinical outcomes, and treatment duration were evident, which prevented the implementation of a meta-analysis. More specifically, the number of patients recruited ranged from 6 [22] to 152 [19], indicating strong methodological differences among the study protocols and in the strength of the stated results. The age of the patients treated with Invisalign® varied between 13 [34] and 61 [30] years; all studies primarily included non-growing patients, most of them with an average age of 30 years [5, 19-21, 29-31, 34-38], and most of these studies were at moderate [21,29-31,35-38] or high [5,34] risk of bias. This reveals a marked lack of information for growing individuals and indicates that Invisalign® is at present a preferred treatment option for late adolescent and adult patients, who usually have higher esthetic demands.
As for overall treatment duration, completion criteria and outcomes varied among and within studies. Compared with conventional appliances, the Invisalign® system showed significantly shorter treatment duration in three studies [28,33,38], while no difference was reported in another study [23]. All these studies evaluated nonextraction treatment of mild to moderate malocclusions and were graded as moderate risk of bias. In contrast, one study of extraction treatment, at low risk of bias, reported a longer duration for Invisalign treatment [19]. Thus, Invisalign might treat mild nonextraction cases faster, but it seems to require more time than fixed appliance treatment in more complex cases.
Substantial variation in the investigated clinical outcomes was noted among studies. The majority of them focused on the accuracy of Invisalign® or on its comparison with conventional fixed appliances. Accuracy was found to be sufficient when certain malocclusion features, such as overjet or anterior arch length discrepancy, were tested [35,36] or for maxillary molar distalization [34]. The efficacy of maxillary molar distalization (≤ ½ cusp) was also supported by another clinical study [31]. However, important limitations were reported for bodily expansion of the maxillary posterior teeth [21,30], canine [5,24] and premolar [34] rotational movements, extrusion of maxillary incisors [5], and overbite control [35,36]. Most of these studies were graded as moderate, and the rest as high, risk of bias according to the Bondemark scoring system [17]. Based on these findings, the use of additional attachments or overcorrection was commonly suggested in the literature for these types of movement. As for the comparison with fixed appliances, studies with moderate [23,28] to low [18] risk of bias suggest that Invisalign performs well in mild to moderate non-extraction cases [18,23,28], but it is less successful in more difficult cases, including extraction cases [19,27,28,33,38]. Tooth inclinations and occlusal contacts seem to be among the major limitations of Invisalign [19,33,38]; most of the relevant comparative studies were judged as moderate [23,33,38] and only two as low [18,19] risk of bias. The results from studies that included only different Invisalign groups are in agreement with the abovementioned findings [20,25,29,32].
In addition, only one study [37], graded as moderate risk of bias, included a post-treatment observation period investigating the stability of treatment outcomes with Invisalign®, indicating a general lack of information regarding retention. Although the evidence is limited, this study showed more relapse in the Invisalign cases than after fixed appliance treatment, which might be attributed to inadequacies in achieving certain bodily movements and solid occlusal contacts.
Overall, evidence was of moderate quality. Apart from the three RCTs [18][19][20], which were judged to be at low risk of bias, the remaining prospective and retrospective studies were graded as moderate [21,23,26,28-33,35-38] or high [5,22,24,25,27,34] risk of bias. The review revealed a high degree of heterogeneity in methodology and outcome reporting, which impeded a valid interpretation of the actual results through pooled estimates. However, there was substantial consistency among researchers that the Invisalign® system is a viable alternative to conventional orthodontic therapy in correcting mild to moderate malocclusions without extractions. Moreover, when treatment is carefully planned, Invisalign® aligners can safely straighten dental arches in terms of leveling and derotating teeth, except for canines and premolars. Finally, crown tipping can be performed easily. On the other hand, important limitations include arch expansion through bodily tooth movement, extraction space closure, correction of occlusal contacts, and larger antero-posterior and vertical discrepancies.
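As a cross-check of the counts given in the abstract (three RCTs at low risk, 13 studies at moderate and 6 at high risk of bias, 22 studies in total), the study-level grades can be tallied. This sketch uses the reference numbers from the quality analysis and treats study [34] as high risk of bias, as in the discussion of patient age; the names `LOW`, `MODERATE`, `HIGH`, and `tally` are illustrative, and the function raises an error if any study is listed in two categories.

```python
from __future__ import annotations

from collections import Counter

# Study-level risk-of-bias grades, by reference number.
LOW = [18, 19, 20]                                             # the three RCTs
MODERATE = [21, 23, 26, 28, 29, 30, 31, 32, 33, 35, 36, 37, 38]
HIGH = [5, 22, 24, 25, 27, 34]


def tally(low: list[int], moderate: list[int], high: list[int]) -> dict[str, int]:
    """Count studies per category, rejecting any study graded twice."""
    refs = [*low, *moderate, *high]
    duplicated = sorted(ref for ref, count in Counter(refs).items() if count > 1)
    if duplicated:
        raise ValueError(f"studies graded twice: {duplicated}")
    return {"low": len(low), "moderate": len(moderate),
            "high": len(high), "total": len(refs)}


summary = tally(LOW, MODERATE, HIGH)
```

Listing study [34] under both moderate and high risk would make the tally fail, which is the kind of inconsistency this check is meant to catch.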
All things considered, more high-quality research of prospective design on the clinical outcomes of Invisalign® is clearly needed. A standardized methodology including control samples would be valuable for obtaining results comparable with those of conventional approaches. Furthermore, although more than half of the studies included in the present review were published in the last 5 years (range 2012-2017), the findings of the review should be interpreted with some caution: the continuous improvement of the Invisalign system (especially the introduction of SmartTrack® material in 2013) [39] may not allow direct synthesis and valid comparisons between older and more recent studies, as including data from different iterations of Invisalign material may introduce bias. This is a major consideration when synthesizing study results into clinical evidence, in an era in which software, scanners, and 3D printers are increasingly affordable and in-house printing of aligners is growing rapidly. Last but not least, long-term effectiveness in terms of retention outcomes also needs further investigation, and evidence for growing patients is entirely lacking.

Conclusions
Although orthodontic treatment with Invisalign® is a widely used treatment option, no clear recommendations based on solid scientific evidence can be made about indications of the system other than non-extraction treatment of mild to moderate malocclusions in non-growing patients.
Although this review included a considerable number of studies, treatment outcomes need to be interpreted with caution due to the high heterogeneity. Further research with parallel-arm RCTs or well-designed prospective trials is needed to form robust clinical recommendations for a wide spectrum of malocclusions and for growing patients.
Despite these limitations, the following conclusions can be drawn from the available evidence:
-Invisalign might treat mild non-extraction cases faster, but it requires more time than fixed appliance treatment in more complex cases.
-Invisalign® aligners can safely straighten dental arches in terms of leveling and derotating teeth (except for canines and premolars, for which a small inadequacy was reported).
-Crown tipping can be performed easily.
-Tooth inclinations and occlusal contacts seem to be among the limitations of Invisalign® with respect to the accuracy of planned movements achieved with aligners.
-The use of additional or novel attachments might be more effective for various types of movement, such as bodily expansion of the maxillary posterior teeth, canine and premolar rotational movements, extrusion of maxillary incisors, and overbite control.