Study design and patient selection
The sample comprised 79 patients (mean age, 30.8 years; SD, 12.0), of whom 23 were men. The patients were treated on both arches for a mean of 9.8 months (SD, 3.8), with an average of 27 clear aligners (SD, 15) in the maxillary arch and 25 clear aligners (SD, 11) in the mandibular arch.
Subjects were recruited prospectively at the Department of Orthodontics of the University of Turin, which was the coordinating center, and at five private Italian orthodontic offices. All co-investigating orthodontists have recognized clinical and teaching skills. Before joining the trial, practitioners completed an enrollment questionnaire that collected information on the practitioners and their practices. The inclusion criteria for practitioners were as follows: being a certified orthodontist with extensive and up-to-date experience in Invisalign treatments; being able to collect intraoral scans and upload the resulting files via the internet to a central repository; affirming that the practice could devote sufficient time in patient scheduling to allow focused recording of all data required for the study; and not anticipating retiring, selling the practice or moving during the study [25]. Signed, written informed consent was required before inclusion in the trial.
The five selected orthodontists had a mean age of 45.6 years (SD, 8.2) at the beginning of the study.
Patients were selected according to the following inclusion criteria: complete permanent dentition, with the exception of third molars; Invisalign aligner treatment on both arches; active tooth movements programmed at the standard rate recommended by the Align Tech technician; no intermediate corrections or additional aligners; aligner change every 7 to 14 days. Exclusion criteria were the need for oral surgery, dental restorations or combo treatments (i.e., combination of aligners with any other orthodontic appliance); reported previous orthodontic treatment; presence of prosthetic restorations and/or periodontal problems; signs and/or symptoms of temporomandibular disorders. All participants included in this prospective observational study had Class I or mild Class II malocclusion with mild-to-moderate crowding or spacing in the maxillary and mandibular dental arches (non-extraction cases). Neither chewies to improve aligner seating nor intermaxillary elastics were used. Interproximal enamel reduction was performed as prescribed in each patient's virtual treatment plan.
To assess treatment achievement, the actual post-treatment .stl file, obtained from the final intraoral scan, was superimposed on the planned post-treatment .stl file, exported from the virtual setup. This procedure was repeated for each patient. Thus, a total of 2212 teeth (79 patients with 28 teeth each) were measured in the entire sample.
Ethics approval was obtained from the Research Ethics Board (Città della Salute e della Scienza di Torino #157/2020), and informed consent was acquired from each subject before entering the study.
The study protocol was registered on ClinicalTrials.gov (#NCT05356780).
Control appointments were scheduled at 6-week intervals in both the University and the private settings. At the delivery appointment, patients were instructed to wear their aligners for 22 h per day. Patients were informed that they were part of a research study and that honest reporting of their compliance was critical. Compliance was also verbally confirmed at each appointment.
Measurement of predicted and obtained orthodontic tooth movement
Digital models were exported from the ClinCheck® software (Align Technology, San José, CA, USA) as stereolithography files. Final-stage .stl files were labeled as “predicted outcome.” Stereolithography files were also obtained from the intraoral scans of the “refinement” stage or of the retention stage and labeled as “achieved outcome” since they represented the actual outcome after the first set of aligners [26].
All .stl files were deidentified, and soft tissues were digitally removed to ensure that the evaluation was based solely on tooth surface characteristics. The superimposition of the post-treatment .stl file (achieved outcome) on the planned final-stage .stl file (predicted outcome) was performed using Geomagic® Qualify software (3D Systems, Rock Hill, SC, USA).
The dental arches were first superimposed using the landmark-based method. The three anatomical landmarks were the mesio-vestibular cusps of the first molars (1–2) and the mesial-incisal point of the right central incisor (3). The results of the overlay performed with the landmark-based method are presented in Fig. 1. Superimposition accuracy was then increased by applying the surface-based method (best-fit alignment) [27].
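The landmark-based step can be illustrated with a minimal sketch. It assumes the three landmarks are available as 3D coordinates in each scan and builds an orthonormal frame from them; the function names and coordinates are hypothetical, and this frame-matching shortcut is not the algorithm implemented in Geomagic® Qualify, which is not described in the text.

```python
import math

def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])
def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def frame(p1, p2, p3):
    """Origin (centroid) and orthonormal axes from three non-collinear landmarks."""
    origin = tuple((a + b + c) / 3 for a, b, c in zip(p1, p2, p3))
    ex = unit(sub(p2, p1))                       # molar cusp 1 -> molar cusp 2
    ez = unit(cross(sub(p2, p1), sub(p3, p1)))   # normal to the landmark plane
    ey = cross(ez, ex)                           # completes a right-handed frame
    return origin, (ex, ey, ez)

def align(point, src_landmarks, dst_landmarks):
    """Express a point of the source scan in the destination scan's coordinates."""
    o_s, axes_s = frame(*src_landmarks)
    o_d, axes_d = frame(*dst_landmarks)
    local = [dot(sub(point, o_s), e) for e in axes_s]   # coordinates in source frame
    return tuple(o_d[i] + sum(c * e[i] for c, e in zip(local, axes_d))
                 for i in range(3))

# Hypothetical landmarks (mm): the destination scan is the source scan
# rotated 90 degrees about z and shifted by (5, -3, 2).
src = [(0.0, 0.0, 0.0), (40.0, 0.0, 0.0), (22.0, 35.0, 2.0)]
dst = [(5.0, -3.0, 2.0), (5.0, 37.0, 2.0), (-30.0, 19.0, 4.0)]
moved = align((10.0, 10.0, 5.0), src, dst)
```

When the landmarks of the two scans are not exactly congruent, as is expected after tooth movement, this gives only an initial pose, which is why a surface-based refinement follows.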
Next, three reference planes were identified on the virtual treatment plan model (Fig. 2). The occlusal plane was created considering the midpoint of the right and left incisal edges and the tips of the mesio-buccal cusps of the right and left first molars [28]. The coronal plane passes through the midpoint between the facial axis (FA) points of teeth 17 and 27 and through the midpoint of the right and left incisal edges, and it is perpendicular to the occlusal plane. The median plane passes through the midpoint between the incisors and is perpendicular to both the occlusal and coronal planes [29].
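The plane construction can be sketched as follows, taking the definitions exactly as worded in the paragraph above. Each plane is stored as a point plus a unit normal; all coordinates and names are hypothetical illustrations, and the measurement software may construct its planes differently.

```python
import math

def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])
def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)
def midpoint(a, b): return tuple((x + y) / 2 for x, y in zip(a, b))

def reference_planes(incisal_mid, cusp_16, cusp_26, fa_17, fa_27):
    """Return (point, unit normal) for the occlusal, coronal and median planes."""
    # Occlusal plane: through the incisal midpoint and both first-molar cusp tips.
    n_occ = unit(cross(sub(cusp_16, incisal_mid), sub(cusp_26, incisal_mid)))
    # Coronal plane (as worded in the text): contains the line from the 17/27
    # FA midpoint to the incisal midpoint and is perpendicular to the occlusal plane.
    line = sub(incisal_mid, midpoint(fa_17, fa_27))
    n_cor = unit(cross(n_occ, line))
    # Median plane: through the incisal midpoint, perpendicular to both others.
    n_med = cross(n_occ, n_cor)
    return {"occlusal": (incisal_mid, n_occ),
            "coronal": (incisal_mid, n_cor),
            "median": (incisal_mid, n_med)}

# Hypothetical coordinates in millimeters.
planes = reference_planes(
    incisal_mid=(0.0, 30.0, 0.0),
    cusp_16=(-22.0, 0.0, 0.0), cusp_26=(22.0, 0.0, 0.0),
    fa_17=(-27.0, -10.0, 1.0), fa_27=(27.0, -10.0, 1.0),
)
```

By construction, the three normals are mutually orthogonal, so the planes form a coordinate system anchored to the arch.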
The facial axis of the clinical crown (FACC) and FA points [30] were then placed on the post-treatment model too. The post-treatment model was segmented to isolate each tooth as a separate object. The software then superimposed each tooth from the segmented post-treatment model on the corresponding tooth of the non-segmented virtual treatment model using the best-fit surface-based algorithm. Finally, the differences between the achieved and predicted position of each tooth were calculated.
The following variables were considered for the analysis: angulation (mesial or distal tip), inclination (in–out, measured as the angle between the occlusal plane and a tangent plane passing through the FA point [30]), rotation (the angle between the median plane and the vector passing through the mesial and distal points of each tooth), mesio-distal movement (distance between the FA points and the coronal plane), vertical movement (distance between the FA points and the occlusal plane), and buccal/lingual movement (distance between the FA points and the median plane).
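Two of these measurements reduce to elementary geometry once each reference plane is expressed as a point and a unit normal. The sketch below shows a point-to-plane distance and a vector-to-plane angle; the function names and sample coordinates are hypothetical, and sign conventions (for example mesial versus distal) are left out.

```python
import math

def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def dot(a, b): return sum(x * y for x, y in zip(a, b))

def distance_to_plane(point, plane_point, unit_normal):
    """Signed distance (mm) from a point, e.g. an FA point, to a reference plane."""
    return dot(sub(point, plane_point), unit_normal)

def angle_to_plane_deg(vector, unit_normal):
    """Unsigned angle (degrees) between a vector and a plane."""
    length = math.sqrt(dot(vector, vector))
    s = abs(dot(vector, unit_normal)) / length
    return math.degrees(math.asin(min(1.0, s)))   # clamp against rounding error

# Hypothetical example: the plane z = 0 with its normal along z.
origin, normal = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
vertical_mm = distance_to_plane((1.0, 2.0, 3.0), origin, normal)
rotation_deg = angle_to_plane_deg((1.0, 0.0, 1.0), normal)
```

Applying the same two functions to the predicted and achieved models and subtracting the results yields the per-tooth differences analyzed in this study.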
Because the software can detect differences that are too small to be clinically relevant, the threshold values were chosen with reference to the American Board of Orthodontics (ABO) model grading system for case evaluation [31]. According to the criteria of the “model grading system,” discrepancies of 0.5 mm or greater in the alignment of the contact points and marginal ridges result in point subtraction. A marginal ridge discrepancy of 0.5 mm is equivalent to a crown tip deviation of 2° for an average-sized molar. Therefore, differences of 0.5 mm or more in the mesial–distal, bucco-lingual and occlusal–gingival directions and differences of 2° or more in tip, torque and rotation were considered clinically relevant [13].
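The thresholds can be expressed as a small classification rule. The following is only an illustrative sketch: the movement labels and function name are hypothetical, while the cut-off values are those stated above.

```python
LINEAR_THRESHOLD_MM = 0.5     # mesio-distal, bucco-lingual, vertical
ANGULAR_THRESHOLD_DEG = 2.0   # angulation (tip), inclination (torque), rotation

ANGULAR_MOVEMENTS = {"angulation", "inclination", "rotation"}
LINEAR_MOVEMENTS = {"mesio-distal", "vertical", "bucco-lingual"}

def is_clinically_relevant(movement, difference):
    """True when the predicted-versus-achieved difference reaches the threshold."""
    if movement in ANGULAR_MOVEMENTS:
        threshold = ANGULAR_THRESHOLD_DEG
    elif movement in LINEAR_MOVEMENTS:
        threshold = LINEAR_THRESHOLD_MM
    else:
        raise ValueError(f"unknown movement type: {movement}")
    return abs(difference) >= threshold   # the sign of the deviation is ignored

# Hypothetical differences for one tooth.
flags = {m: is_clinically_relevant(m, d)
         for m, d in [("rotation", 1.9), ("angulation", -2.0), ("vertical", 0.5)]}
```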
Statistical analysis
The lack of correction (LC), defined as the difference between the prescribed and the achieved correction, represents the primary outcome of the study. The amount of prescribed movement (PM), extracted for each patient from the Invisalign website of each orthodontist, constitutes the primary exposure. Both LC and PM are continuous variables measured in millimeters for linear measurements and in degrees for angles. Another variable, type of movement (TM), identifies the movement associated with PM and LC. It comprises six categories: angulation, inclination, rotation, and mesio-distal, vertical and bucco-lingual movements.
The categorical variable teeth was included in the dataset by coding each individual tooth as a category, for 28 categories in total. A new multi-level variable, teeth group (TG), was then generated by assigning each tooth to a category of teeth with similar characteristics (maxillary arch: central incisors, lateral incisors, cuspids, bicuspids, first molars and second molars; mandibular arch: incisors, cuspids, bicuspids, first molars and second molars).
In addition, for each patient and each TG, the mean values of LC and PM were calculated by averaging the corresponding values of the teeth included in the same category. These means were stored in a new dataset as two variables, mLC (LC means) and mPM (PM means), together with the variable TG and the patient identifier.
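The grouping and averaging step might look like the sketch below. It assumes FDI two-digit tooth numbers (the text refers to teeth 17 and 27) and a hypothetical record layout of (patient, tooth, LC, PM); the group labels are illustrative, and the type of movement could be added to the grouping key in the same way.

```python
from collections import defaultdict
from statistics import mean

# Assumed mapping from FDI tooth number to the 11 teeth groups (TG).
TEETH_GROUP = {}
for quadrant in (1, 2):   # maxillary arch
    for pos, label in [(1, "max central incisors"), (2, "max lateral incisors"),
                       (3, "max cuspids"), (4, "max bicuspids"), (5, "max bicuspids"),
                       (6, "max first molars"), (7, "max second molars")]:
        TEETH_GROUP[quadrant * 10 + pos] = label
for quadrant in (3, 4):   # mandibular arch
    for pos, label in [(1, "mand incisors"), (2, "mand incisors"),
                       (3, "mand cuspids"), (4, "mand bicuspids"), (5, "mand bicuspids"),
                       (6, "mand first molars"), (7, "mand second molars")]:
        TEETH_GROUP[quadrant * 10 + pos] = label

def group_means(records):
    """Average LC and PM per (patient, teeth group); returns {key: (mLC, mPM)}."""
    buckets = defaultdict(list)
    for patient_id, tooth, lc, pm in records:
        buckets[(patient_id, TEETH_GROUP[tooth])].append((lc, pm))
    return {key: (mean(lc for lc, _ in values), mean(pm for _, pm in values))
            for key, values in buckets.items()}

# Hypothetical records for a single patient.
records = [(1, 14, 0.2, 1.0), (1, 15, 0.4, 2.0), (1, 36, 0.1, 0.5)]
means = group_means(records)
```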
The other tested predictors were age (in years), treatment time (in months), frequency of aligner change (every 7 days, 10 days or 14 days) and the three-level categorical “attachment” variable (none, conventional and optimized).
Specific aim #1
The first specific aim of the study was to establish whether the final position of the teeth achieved after treatment was equal to that of the virtual model obtained in ClinCheck®. In other words, the null hypothesis stated that the variable mean lack of correction (mLC) was equal to zero. Before starting the analysis, we applied a logarithmic transformation to the data to normalize the distribution, because mLC was not normally distributed: the graph did not show a bell shape and the Shapiro–Wilk test was significant (P < 0.001). Then, to test the null hypothesis, we ran a one-sample t test for each type of movement within each group of teeth. The descriptive statistics of mLC and mPM, stratified by group of teeth and type of movement, were summarized as median and interquartile range.
Specific aim #2
The second specific aim of the study was to determine whether the primary outcome LC was affected by predictors such as the pretreatment prescription, the moved teeth, treatment duration, the use of attachments and the frequency of aligner change, as well as the patients’ age and sex. To this end, we built a multiple linear regression model using the cluster option to indicate that the observations were clustered within patients. Consequently, correlation among the tooth movements belonging to the same patient was allowed. In addition, to perform a cautious analysis, the Huber–White sandwich estimator was used to obtain robust standard errors.
The primary outcome LC and the predictor PM were included in the dataset as log-transformed variables, because neither PM nor LC followed a normal distribution (Shapiro–Wilk test, P < 0.001). In addition, PM and LC comprised the observations derived from all types of movements, which were interpreted as units of movement and not as linear or angular values.
During the model-building procedure, the other covariates were tested by forward stepwise selection: frequency of aligner change (every 7 days, 10 days or 14 days) and the three-level categorical variable “attachment” (none, conventional and optimized), along with age, sex and treatment time. When the result was nonsignificant, the variable was excluded from the final model. This was the case for the last three predictors, namely age, sex and treatment time.
The goodness of fit of the model was estimated by means of the coefficient of determination (R²). Afterward, we checked that the key assumptions of multiple linear regression had been met. First, the linear relationship between the outcome and the continuous predictors was tested with scatter plots. Then, the assumption of normality of residuals was confirmed by graphing a standardized normal probability plot, and collinearity was excluded after estimating the Variance Inflation Factor (VIF; mean, 1.25). Finally, homoscedasticity was examined with the plot of standardized residuals versus predicted values. The α-level was fixed at 0.05. All data were analyzed using STATA 14.2 (StataCorp LP, College Station, Tex).
Sample size and reliability of the measurements
The sample size of the study was estimated a priori assuming an average lack of correction of 50%, or a difference of 0.5, as reported by Haouili et al. [8], a power of 90% and an α-level of 0.05 (one-sample means t test). The standard deviation was assumed to be 1. Under those conditions, the required sample size amounted to 43 patients.
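As a plausibility check, the reported figure can be reproduced with the usual normal-approximation formula for a one-sample comparison of means, n = ((z₁₋α/₂ + z₁₋β)·σ/δ)². The sketch below assumes a two-sided test, which the text does not state explicitly; the function name is hypothetical, and a calculation based on the t distribution can give a marginally larger number.

```python
import math
from statistics import NormalDist

def one_sample_n(delta, sd, alpha=0.05, power=0.90):
    """Sample size for a one-sample mean test (normal approximation, two-sided)."""
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)             # about 1.28 for 90% power
    return math.ceil(((z_alpha + z_beta) * sd / delta) ** 2)

n = one_sample_n(delta=0.5, sd=1.0)
print(n)  # 43
```

With a difference of 0.5 and a standard deviation of 1, the unrounded value is about 42.03, which rounds up to the 43 patients stated above.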
The reliability of the measurements was assessed using the intraclass correlation coefficient (ICC). The operator (MV) who performed all digital measurements repeated 20 measurements twice for each of the six prescribed movements (240 measurements in total), with a 21-day interval between the two sessions. The ICC amounted to 0.99, showing excellent agreement between the repeated measurements.
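For illustration, an ICC for paired repeated measurements can be computed as sketched below. The text does not specify which ICC form was used, so the one-way random-effects, single-measurement form is an assumption made here, and the sample values are hypothetical.

```python
from statistics import mean

def icc_one_way(pairs):
    """One-way random-effects ICC for k repeated measurements per item."""
    n, k = len(pairs), len(pairs[0])
    grand = mean(x for row in pairs for x in row)
    row_means = [mean(row) for row in pairs]
    # Between-item and within-item mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for row, m in zip(pairs, row_means) for x in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical first and second readings (mm) of four measurements.
readings = [(0.50, 0.52), (1.10, 1.08), (0.20, 0.21), (1.90, 1.88)]
icc = icc_one_way(readings)

# Total measurements in the reliability exercise: 20 items, 6 movements, 2 sessions.
total_measurements = 20 * 6 * 2
```

Values close to 1 indicate that the differences between the two sessions are small relative to the differences between the measured items.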