Abstract

The ability of government agencies to assign accurate ages of fish is important to fisheries management. Accurate ageing allows for most reliable age-based models to be used to support sustainability and maximize economic benefit. Assigning age relies on validating putative annual marks by evaluating accretional material laid down in patterns in fish ear bones, typically by marginal increment analysis. These patterns often take the shape of a sawtooth wave with an abrupt drop in accretion yearly to form an annual band and are typically validated qualitatively. Researchers have shown key interest in modeling marginal increments to verify the marks do, in fact, occur yearly. However, it has been challenging in finding the best model to predict this sawtooth wave pattern. We propose three new applications of time series models to validate the existence of the yearly sawtooth wave patterned data: autoregressive integrated moving average (ARIMA), unobserved component, and copula. These methods are expected to enable the identification of yearly patterns in accretion. ARIMA and unobserved components account for the dependence of observations and error, while copula incorporates a variety of marginal distributions and dependence structures. The unobserved component model produced the best results (AIC: −123.7, MSE 0.00626), followed by the time series model (AIC: −117.292, MSE: 0.0081), and then the copula model (AIC: −96.62, Kendall’s tau: −0.5503). The unobserved component model performed best due to the completeness of the dataset. In conclusion, all three models are effective tools to validate yearly accretional patterns in fish ear bones despite their differences in constraints and assumptions.

1. Introduction

Sustainable fisheries rely on the use of age-based demographic models that account for reproduction and growth to maximum economic benefit. To achieve this goal, it is required that fish be ascribed correct ages [1]. Ascribing correct ages relies on validating putative annual marks on fish bones [2]. The most commonly used technique to validate marks is marginal increment analysis (MIA) [2, 3]. MIA evaluates the width of newly accreted material laid down on the edge of fish ear bones measured over a year compared to material accreted over the previous year [2] (Figure 1). The ratio is typically averaged by month over all age groups and plotted over a single year. The plot shows a single dip if one band is produced yearly (Figure 2). This dip validates the pattern of a putative annual ring, allowing for accurate ageing of the fish to the nearest year.

For this pattern verification, few instances exist of statistically validated quantitative approaches because of irregular patterns observed due to changing environmental conditions. One such irregular pattern often looks like a sawtooth wave with an abrupt drop in accretion yearly to make an annual band [57]. Such sawtooth wave patterns provide challenges in fitting data to statistical models. Attempts have been made to overcome the difficulty of modeling the sawtooth wave pattern. Okamura et al. [6] used circular statistics and conversion of the month of capture to the median of the month and then another conversion of the median of the month to radians with accretion ratios as the linear vector. The circular method provides a short-term forecast that assumes the data are uniformly distributed around one preferred direction [8]. In contrast, Phelps et al. [7] used analysis of variance to identify significant differences in increment widths between months, but not in the years’ trend. Analysis with a time series model is appropriate for yearly patterned data because time series models identify seasonal patterns and should detect a yearly accretion pattern.

The advantage of time series analysis is that it accounts for the dependence of the observations and errors in the dataset [9]. Time series analysis of marginal increments can allow for the objective determination of the temporal, seasonal, and cyclic nature of growth accretion. Time series models, such as autoregressive integrated moving average (ARIMA), unobserved component models (UCMs), and Gaussian copula, can pinpoint seasonal changes through time.

A time series approach works especially well when data are complete and abundant. Other assumptions of ARIMA include the independence and identical distribution under the Gaussian nature of the error terms in the model statement, as well as the constant variance assumption or stationarity. Unfortunately, the sawtooth wave pattern in MIA data complicates finding a well-fitted time series model.

The UCM can break a time series of data into trend, cyclical, seasonal, autoregressive, and lagged components [10, 11]. Bian et al. [12] gave a summary of UCM applications but did not include analysis in fisheries research. To our knowledge, this paper is the first to use UCM to model MIAs. UCMs use smoothing models that allow for improved analysis and put more weight onto observations that are closer to each other due to the nonignorable correlation of time series measurements [13]. The single-error model is in fact broken up into its component calibration.

The Gaussian copula construction emanates from Joe [14] and other authors [15, 16]. However, just as in the UCM case, its use in fisheries research is still in its infancy [17, 18]. Copula models easily incorporate a variety of marginal distributions and different dependence structures, unlike ARIMA whose assumptions rely on identical joint distributions at all-time values and identical Gaussian marginal distributions [9]. As a first advantage, the copula approach for parameter estimation in multistage models captures dependence from the temporal samples. A second advantage is that the copula circumvents the nontrivial computations in the variances of estimates. Gaussian copulas are very common in economics, epidemiology, time series analysis, and climate change [1923].

We propose the comparison of the ARIMA, UCM, and copula-based statistical time-series methods while incorporating serial dependence. The goal is to evaluate the feasibility and preliminary efficacy of the models on fish ear bone accretion data. We are fortunate in this first test to have an unusually complete dataset without often-missing seasonal components due to conditions that prohibit fishing such as adverse weather or lack of fish availability.

The paper is organized as follows. In Section 2, the data are described with the ARIMA, UCM, and copula models and the statistical analysis methods are presented. In Section 3, the solutions are presented. In Section 4, we end with a discussion of the performance of these models using our complete dataset.

2. Materials and Methods

2.1. Data Collection

The data for this study were extracted from the data included by Barbieri et al. [4] using DataThief III, Version 1.7 (2015) to extract the data points from a graph. According to the authors’ methods, fish were collected each month between June 1988 and June 1991 from commercial fisheries in the Chesapeake Bay and brought to their laboratory. Next, individual biometrics were recorded, and ear bones were excised.

Fish ear bones (sagittal otoliths) were sectioned and then mounted on microscope slides and magnified to a range from 0.8x to 8x and photographed using an Olympus DP71 camera and program Cells Sense (Figure 1). The images were then uploaded to program Image-Pro Plus v. 6.2.0.424 (Media Cybernetics Inc.) for marginal increment measurement, in micrometers. The first marginal increment was measured from the last dark band to the outer edge of the ear bone. The second is measured from the last dark band to the previous dark band. These measurements are then combined as a ratio of the last partial accretion to the last full accretion band (Figure 1). The average ratio was then calculated for each age class by its month of capture. The average was plotted by month and visually inspected for the drop in the sawtooth wave. Fish were categorized by the age at capture, regardless of year of capture. The youngest age caught was 1 year.

2.2. Model Diagnostics and Goodness of Fit Measurements

Consider the data as observations of type , the likelihood function can be written as follows:whereis the vector of p parameters associated with the model equation.

The log-likelihood equation can then be formulated as follows:

Akaike’s information criterion is then calculated by the following equation:where is the number of parameters in the model [24, 25].

The mean squared error is formulated asand Kendall’s tau that measures the difference in the measure of concordance and discordance between the marginal CDFs isfor pairs of observations (X1, Y1) and (X2, Y2), and can also be formulated as follows:as by Sun et al. [16], where C represents the copula function that will be described in Section 2.5, with u and capturing the marginal distribution of X and Y, respectively.

2.3. Time Series Model

The data from the time series process are described as an ARIMA (p, d, q) model and are defined as follows:where p, d, and q are non-negative integers, based on a sample size of , withwhere is the transformed series with the mean retrieved and B is the backward shift operator [26].

The parameter p is associated with the autoregressive portion of the process, while q is associated with the moving average portion of the process. To help estimate the autoregressive parameter (p), the sample autocorrelation function can be used once graphed:where is the sample autocovariance function and is defined as follows:

To help estimate the moving average parameter (q), the sample partial autocorrelation function can be used once graphed:where is the last component ofwhere

The likelihood function is as follows [27]:where is the variance/covariance matrix of the observed time series data.

R package “astsa” is used to loop through possible combinations of p and q ranging from zero to three. [28]. AIC, log likelihood, and MSE were used to decide the best model.

2.4. Unobserved Component Model

Components of the models can be assumed unobserved and must be estimated under a time series model. The unobserved component model (UCM) offers such flexibility. Unobserved components can be modeled using the following equation:where represents the trend component, represents the seasonal component, represents the cycle trend, is the autoregressive term, is a regressive term involving the lagged dependent variables, is the error term assumed to be independent with an identical Gaussian distribution, and , , , and are assumed to be independent of each other [13].

The likelihood function is as follows [27]:where .

Analysis is conducted using SAS “proc ucm.” Six models are tested using various combinations of the level, slope, cycle, and season and described in Table 1. Model 1 uses a stochastic level, slope, cycle, and season; model 2 uses a stochastic level and slope, no cycle, and stochastic season; model 3 uses a stochastic level, fixed slope, and stochastic cycle and season; model 4 uses a stochastic level, no slope, and stochastic cycle and season; model 5 uses a stochastic level, fixed slope, no cycle, and stochastic season; model 6 uses a stochastic level, no slope or cycle, and stochastic season.

2.5. Copula Model

A copula is defined as follows:which satisfies the following requirement:

In addition, Sklar’s theorem states that for the following equation,a unique function of can be found if and are continuous [16, 19].

This paper will use beta marginals described as follows:where and are the shape and scale parameters and is the Gamma function. The likelihood function is as follows [29]:

The equation used in the copula analysis waswhere , , and are the trigonometric functions and are the error terms.

R package “gcmr” was used to fit the copula models [30]. For the sake of interpretability of smoothness, ARIMA (1, 1, 0) was selected. Due to time dependency, trigonometric functions were used in the model equation [3133]. The copula is more parsimonious than UCM or time series. It includes only the time series parameters in a more succinct form based on the marginal distributions of s. The copula equation was estimated and reported along with the log likelihood and AIC. Kendall’s tau was calculated to estimate a correlation between marginal accretion width over consecutive time periods.

3. Results

3.1. Data

Ear bones from a total of 1185 fish were prepared, and increments were measured by Barbieri et al. [4]. The monthly average increment by age is graphed in Figure 2. A clear decrease in accretion width can be observed in May of each year, as well as a decreasing variance over time.

3.2. ARIMA Model

The model with the highest log likelihood (65.6458), lowest AIC (−117.292), and MSE (0.00805) had an autoregressive order of 2 and a moving average order of 3 (Table 2). All the estimated parameters were significant except the third moving average term, and when this third term is removed, the constant term becomes insignificant (Table 3). The QQ plots of these two models show the majority of the data with a normal distribution (Figure 3).

3.3. Unobserved Component Model

The six models tested using various combinations are presented in Table 4. Model 4 provided the highest log likelihood (67.874) and lowest AIC (−123.7), while model 1 produced the lowest MSE (0.00626). The AIC and MSE were smaller than those of the ARIMA model, while the log likelihood is larger than the best of the ARIMA models. (Table 4). All models produced an identical graph with an extremely narrow confidence interval (Figure 4).

The residuals for models 2, 3, and 5 had a distinct sinusoidal pattern, while the others had a slightly more random appearance (Figure 5). The QQ plots of the residuals show most of the data follow a normal distribution (Figure 6). Models 2, 5, and 6 had a sinusoidal pattern in the ACF graphs (Figure 7).

3.4. Copula Model

Figure 8 describes the graph of the copula function. Tangent was added to the equation to properly model the drop of the sawtooth wave. The copula produced a log likelihood, AIC, and Kendall tau of −52.31, −188.9633, and −0.5503, respectively. Kendall’s tau is negative since as time increases, the marginal increment decreases. Such a result has not been captured under ARIMA or UCM. The AIC and log likelihood were smaller than those of ARIMA or UCM. The variance of the copula model was larger than that of either ARIMA or UCM (Figure 9). The conditional residual plot showed dips corresponding to May of each year, and QQ plots showed the majority of the data within a normal distribution (Figure 9). The marginal residual plot showed dips corresponding to May of each year, and the QQ plot showed the majority of the data within a normal distribution (Figure 9). The pattern in the MIA residuals shows that alternative models are better fits.

4. Discussion

The sawtooth wave pattern of MIA data proved challenging to model, but when analyzed with ARIMA, UCM, and copula, these methods provided precise timing of accretional patterns in fish ear bones. The models demonstrated that, for Atlantic croaker, dark bands had formed by May and occurred only once during the year. Thus, providing a model to validate the formation of dark bands was evident after a sharp drop in accretion while also providing statistical metrics of model fit, attributes missing from qualitative measures.

The first step in our approach was to formulate more flexible assumptions about the dependence structure of the process. More precisely, the joint density of the accretion process could directly and conveniently describe the stochastic process. Using the dependence structure of ARIMA, the conditional probability of the sawtooth pattern was well described. The UCM model matched the results from ARIMA as the autocorrelation was captured under Gaussian white noise and is also extendable when one considers the copula types of distributions. In this paper, we applied the copula ideas of Salinas-Gutierrez et al. [34] and Alqawba and Diawara [35], with beta marginals.

ARIMA and UCM produced the best results. ARIMA is well established and widely used in analysis of complete datasets. The UCMs are valuable extensions of the time series with variance decreasing over time. The confidence intervals in both contain most of the recorded data and matched the seasonal pattern well. In both models, parsimony is obtained mainly because of complete data. The cycle was automatically obtained, giving us close to perfect time-varying predictions.

Although the copula did not perform well, the results were still acceptable for its use in MIA. The variance estimate is higher in this case than in ARIMA and UCM. The challenge may be due to the choice of the marginal distribution not being as good fit to these data. The way ARIMA and UCM components were captured, as regularity of cycle and seasonality, provides less of an emphasis on sampling. The strength of copulas is in capturing a flexible correlation structure when additional variables, such as temperature or length, are measurable. In conclusion, all three models are effective tools to validate yearly accretional patterns in fish ear bone despite their differences in constraints and assumptions.

Overall, we validated the annual pattern of the MIA data with all three models (ARIMA, UCM, and copula). In this case, we had a full dataset and have shown we can use these three methods with quantitative results that validate the qualitative visual results shown by Barbieri et al. [4] and Foster [3]. We anticipate that copulas will outperform ARIMA and UCMs when challenged with incomplete data where imputation is necessary, when some variable transformations are not recommended, when missingness cannot be avoided, and the effects of covariates are not removable. In the future, we will test the performance of copulas when challenged with both the sawtooth wave pattern and incomplete datasets (Kirch et al., in process). We hope to increase the use of copula in this field and generalize this method of estimation for higher dimension problems. In further research, the exploration of incomplete datasets and other copulas should bring in interesting results.

Data Availability

The data used to support the study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest.