Abstract
This paper develops a macroscopic cell-based lane change prediction model for a complex urban environment and integrates it into the cell transmission model (CTM) to improve the accuracy of macroscopic traffic state estimation. First, a binary logistic lane change model is developed from observed traffic data to formulate lane change occurrence. Second, the binary logistic lane change model is integrated into the CTM by refining the CTM formulations that govern how vehicles move longitudinally from one cell to another and how cell occupancy is updated after lane changes. The performance of the proposed model is evaluated by comparing its simulated cell occupancies with the cell occupancies of the US-101 Next Generation Simulation (NGSIM) data. The results indicate no significant difference between the mean cell occupancy of the proposed model and that of the actual data, with a root-mean-square error (RMSE) of 0.04. Similar results were obtained when the proposed model was further tested with I-80 highway data: the mean cell occupancy of the I-80 data did not differ significantly from that of the proposed model, with an RMSE of 0.074 (0.3 on average).
1. Introduction
An autonomous vehicle (AV) can detect its surroundings and identify obstacles and signage, which in turn enables it to plan its navigation path and move along the road network without human control. Current AV research has focused on developing technologies that improve vehicle safety, particularly when performing fundamental tasks such as car following, lane keeping, and lane changing [1–6]. Because lane changing is a complex process involving interactions between the host vehicle and its following and leading vehicles, little attention has been given to AV lane change prediction models. Existing lane change algorithms (e.g., trajectory planning and maneuver planning algorithms) focus on maximizing the individual vehicle's benefit [1–6]. In addition, these algorithms require detailed microscopic traffic variables (i.e., relative speeds and positions) of the surrounding vehicles and the gaps between the host vehicle and its following and leading vehicles, which often depend not only on the behavior and movement of the surrounding vehicles but also on the macroscopic traffic dynamics.
To assist AV control systems in predicting lane change occurrence under various traffic conditions, a macroscopic cell-based lane change prediction model that considers lateral vehicle movement from one lane to another was introduced in [7]. This cell-based macroscopic model discretizes the road section into uniform cells from which macroscopic traffic variables (e.g., traffic density and average vehicle speed) can be easily generated. These macroscopic variables serve as indicators of traffic dynamics that can be further used to predict lane change occurrence. By focusing on the characteristics of the surrounding traffic, the aggregate behaviour of lane-changing vehicles in a traffic stream is much easier to observe than the disaggregate behaviour of individual vehicles. This approach is also motivated by the lack of studies that include lane change dynamics when computing multilane traffic flow under macroscopic conditions. To compute multilane traffic flow solutions that satisfy the conservation law of traffic, a macroscopic traffic flow model that can describe drivers' lane change behaviour on multilane highways is therefore needed.
Therefore, this study proposes a macroscopic cell-based lane change prediction model for a complex urban environment and integrates it into the cell transmission model (CTM). This is done by refining the CTM flow and occupancy formulations that describe how vehicles transfer longitudinally from one cell to another and how cell occupancy is updated after a lane change.
The remainder of this paper is organized as follows. Section 2 reviews the CTM and existing CTM-based lane change models. Section 3 introduces the basic terminology of the CTM and then presents the proposed methodology leading to the proposed macroscopic traffic simulation tool; in particular, it describes how the binary logistic lane change model is developed to formulate lane change occurrence and how it is integrated into the CTM by refining the CTM flow and occupancy formulations. Section 4 describes the steps taken to aggregate the real data in preparation for validating the simulated CTM. Numerical results comparing the CTM with the field data are given in Section 5. Finally, the discussion and conclusions are provided in Section 6.
2. Cell Transmission Model and CTM Based Lane Change Model
The CTM is one of the most widely used cell-based macroscopic traffic flow models, developed by [8, 9]. It provides a convergent approximation to a simplified version of the Lighthill, Whitham, and Richards (LWR) hydrodynamic model [10] and is known as a numerical approximation of the LWR model. It has been recognized as a straightforward approach in which only a few parameters are needed to explain the evolution of traffic features and dynamics [8]. Although it needs fewer parameters than microscopic models, some limitations remain. In the original CTM, multilane highway traffic is modelled as a single stream, assuming a uniform lateral distribution of traffic across lanes [8, 9]. Such an assumption may misrepresent the heterogeneous traffic flow distribution (i.e., lane-specific flow, density, speed, and vehicle types) observed in real-world traffic, which varies over time and space [11, 12]. Hence, the complex features of multilane roadways, specifically lane change behavior, may not be well captured when they are modelled as a single traffic stream [13, 14]. Frequent LC events are also known to cause traffic flow instabilities through capacity drop and the formation and propagation of stop-and-go waves [15–20].
To fill these gaps, [21], [22], and [23] introduced multilane LWR models in which a single link consists of two or more parallel lanes. This extension allows inflows from and outflows to neighboring lanes to be included in the conservation equation at each point, thereby accommodating lane changes. A multicommodity model was developed by [24] based on the LWR theory [10, 25]; it introduced the concept of lane change intensity, a new definition of the fundamental diagram, and an entropy condition. Other methods use the gas-kinetic traffic flow model [26], as well as continuum and noncontinuum models [27]. Modeling the lane flow distribution at specific sites of the traffic network has also been addressed [28]: for the free-flow and congested regimes, [28] established a link between total density and lane densities. Subsequently, the number of lane changes was evaluated as a function of numerous incentives, with the findings indicating that the densities in the origin and target lanes are the most significant [29]. Duret et al. [30] examined real data from a three-lane highway in free-flow conditions and proposed a simple linear model that accounts for the lane distribution of traffic flow. Other studies of lane change in macroscopic models can be found in [24, 26, 27, 31, 32], and studies focusing on specific road locations can be found in [28, 30, 33].
A few studies have considered lane change features in the development of the CTM. Wang et al. [34], for instance, assigned a fixed percentage of left-turn flow (i.e., 30%) when formulating the diverge movement to simulate oversaturated arterials. Their improved CTM, which introduced a novel conditional cell at the intersection, was shown to be consistent with the traffic flow conditions given in the Highway Capacity Manual (HCM). Carey et al. [31] developed analytic network models that include lane changing by extending the single-lane discretized CTM to two or more lanes and allowing traffic to move between lanes. This extension focused on mandatory lane changing, assuming that lane change motivation is influenced by the driver's position relative to the entry and exit links and the corresponding level of urgency. Carey et al. [31] treated the number of vehicles that wish to change lanes at cell i and time step t as a fixed value used to determine the cell occupancies in the following time step. While the theoretical development of the CTM has been continuously improved over the years, most of these studies focused only on improving the accuracy and computational effort of previous methods, and the accuracy and stability of the methods have not been validated with actual field data [35].
Pan et al. [36] developed a mesoscopic multilane CTM that simultaneously simulates discretionary and mandatory lane change behaviours. The model realistically simulates the dynamics of heterogeneous lane flow distribution based on lane-specific fundamental diagrams, and different priority levels are assigned according to lane change motivations and their respective levels of urgency. A recursive lane change demand estimation technique was devised that accounts for the effect of urgency on the longitudinal probability distribution of lane change movements. Most existing models assume that drivers are unaware of downstream traffic conditions before deciding to change lanes, which is unrealistic. In contrast, [36] proposed a model in which drivers take downstream traffic conditions into account when making lane change decisions, with the influence of those conditions decreasing exponentially as their distance from the driver's current location increases. Their empirical analysis also included some major lane change effects (e.g., capacity loss and the flow-balancing impact of discretionary lane changes).
Limitations remain when incorporating lane change into the CTM, particularly when fixed values are assigned to estimate the influence of lane change on inflow and outflow traffic. This assumption might not represent actual lane change events empirically, because the surrounding traffic environment (e.g., speed and density differences between lanes) [13, 37–39] or unobserved factors (e.g., driving attitude) might affect the occurrence of a lane change.
3. Development of Cell Transmission Model Based Logistic Lane Change Model
Before introducing the extended CTM with lane changing, the basic principles and formulation of the single-lane CTM are presented in Section 3.1.
3.1. Basic Principles of Cell Transmission Model
The original CTM [8, 9] is derived from a simplified version of the LWR model, which describes the flow-density relationship in a simplified form based on the fundamental diagram.
This relationship is represented as follows:

$$f = \min\{v k,\; Q,\; w\,(k_{\text{jam}} - k)\}$$

where f is the traffic flow, k is the traffic density, v is the free-flow speed (the speed of vehicles when density approaches zero), w is the backward propagation speed of disturbances when traffic is congested, Q is the capacity flow into the next section, and $k_{\text{jam}}$ is the jam density of traffic.
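The three branches of the fundamental diagram can be evaluated directly. The sketch below is illustrative only: the function name and the parameter values in the usage lines (v = 66 km/h, w = 20 km/h, Q = 1800 veh/h, k_jam = 150 veh/km) are assumptions, not values calibrated from the US-101 data.

```python
def ctm_flow(k: float, v: float, w: float, q_max: float, k_jam: float) -> float:
    """Flow (veh/h) at density k (veh/km): f = min(v*k, Q, w*(k_jam - k)).

    v and w are in km/h; the result is clipped at zero for k > k_jam.
    """
    return max(0.0, min(v * k, q_max, w * (k_jam - k)))


# Illustrative (assumed) parameters, not the calibrated US-101 values.
V, W, Q_MAX, K_JAM = 66.0, 20.0, 1800.0, 150.0

for k in (10.0, 30.0, 140.0):
    print(k, ctm_flow(k, V, W, Q_MAX, K_JAM))
```

At low density the free-flow branch governs (660 veh/h at 10 veh/km), at intermediate density the capacity cap applies, and close to jam density the congested branch takes over (200 veh/h at 140 veh/km).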
The CTM discretizes a road section into homogeneous cells of equal length and uses equal time intervals. The cell length, $L$, is the distance a typical vehicle travels at free-flow speed in a single time interval $\Delta t$ (i.e., $L = v\,\Delta t$). Under free-flow conditions with no congestion, all the vehicles in a cell are expected to advance to the next cell in each time interval. This cell evolution can be represented as follows:

$$n_{i+1}(t+1) = n_i(t)$$

where $n_i(t)$ is the number of vehicles in cell $i$ at time $t$.
Boundary conditions are provided to control the flow of vehicles entering the first cell and exiting the last cell of every lane; these are referred to as the entrance and exit gates, respectively. The entrance gates prevent excessive vehicles from entering and overflowing the first cell, while the exit gates control the queue accumulated in the last cell. The cell representation and the corresponding notation are shown in Figure 1.

Each cell, denoted by its position $(i, j)$, where $j$ refers to a specific lane, has the following properties at every time step $t$:
(1) $n_{i,j}(t)$, the number of vehicles in the cell (also known as the occupancy). Based on the directional movement of vehicles in each cell, $n_{i,j}(t)$ is updated at every time step after the flows into and out of the cell are known.
(2) $N_{i,j}(t)$, the maximum number of vehicles that can be stored in the cell, expressed as $N_{i,j}(t) = k_{\text{jam}} L$, where the jam density $k_{\text{jam}}$ is empirically identified from the actual data processed into discretized cells.
(3) $Q_{i,j}(t)$, the maximum flow into the downstream cell at time $t$, expressed as $Q_{i,j}(t) = q_{i,j}\,\Delta t$ (with $\Delta t$ in hours), where $q_{i,j}$ is the saturation flow rate (veh/h). To reflect the variation in saturation flow rate observed empirically in the datasets, a random value of $q_{i,j}$ within a range bounded by the observed quartiles is assigned to each cell. For now, $q_{i,j}$ is assumed to remain the same for all time steps.
(4) $k_{i,j}(t)$, the density in cell $(i, j)$ at time $t$, expressed as $k_{i,j}(t) = n_{i,j}(t)/L$ (veh/km).
(5) $\delta_{i,j}$, the ratio of the backward propagation speed to the free-flow speed in each cell, determined as $\delta_{i,j} = w/v$.
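The static cell properties can be derived from a handful of inputs. This is a minimal sketch: the cell size (L = 110 m, Δt = 6 s) is the one used later in this paper, while the jam density (150 veh/km), saturation flow (1800 veh/h), and backward wave speed (20 km/h) are assumed illustrative values; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Static properties of one CTM cell (i, j); names are hypothetical."""
    N: float      # holding capacity, N = k_jam * L (veh)
    Q: float      # max flow per time step, Q = q * dt (veh/step)
    delta: float  # w / v


def make_cell(k_jam: float, cell_len_km: float, sat_flow_vph: float,
              dt_s: float, w: float, v: float) -> Cell:
    return Cell(
        N=k_jam * cell_len_km,
        Q=sat_flow_vph * dt_s / 3600.0,  # convert dt from seconds to hours
        delta=w / v,
    )


def density(n: float, cell_len_km: float) -> float:
    """k = n / L in veh/km."""
    return n / cell_len_km


# Cell size used in the paper: L = 110 m, dt = 6 s, so v = L / dt = 66 km/h.
L_KM, DT_S = 0.110, 6.0
v_free = L_KM / (DT_S / 3600.0)
cell = make_cell(k_jam=150.0, cell_len_km=L_KM, sat_flow_vph=1800.0,
                 dt_s=DT_S, w=20.0, v=v_free)
```

With these assumptions a cell stores 16.5 vehicles, passes at most 3 vehicles per step, and five vehicles in one cell correspond to a density of about 45 veh/km.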
3.2. The Improved Cell Transmission Model with Lane Change
Figure 2 illustrates the trajectories of vehicles in the cells of Lanes 1 and 2 over the time windows T1, T2, and T3. The lateral movement between lanes can be illustrated as follows: taking L1 and L2 as the origin and target lanes, respectively, a lane change occurs from a cell in lane 1 (L1) to a cell in lane 2 (L2). As shown in Figure 2, the end of a curve in a cell denotes a lane change to a different lane, whereas the beginning of a new curve denotes the arrival of a vehicle from a different lane. Figure 2 shows that no lane change occurs in windows C1T1 and C1T2, whereas a lane change occurs in C1T3. A vertical red line marks the reference instant at which the lane change starts.

The density at the instant when the vehicle makes the lane change is identified from the number of intersecting points on the reference line. In Figure 2, the density of C1T3 is 5 vehicles in lane 1, including the LC vehicle, and 2 vehicles in lane 2. After the lane change, the density of C1T3 decreases to 4 vehicles in lane 1 and increases to 3 vehicles in lane 2. Figure 1 also shows the macroscopic schematic of lane changes between cells on a multilane road, which illustrates why the traditional CTM cannot incorporate the addition or reduction of inflow traffic due to lane changes. This lateral traffic between cells is not well represented in the traditional CTM, and its effects have not been well explored previously. Therefore, the extended CTM improves three main formulations of the traditional CTM to accommodate the changes in inflow and cell occupancy.
To incorporate lateral vehicle movement (i.e., changing from one lane to another) into the existing CTM, an improved cell transmission formulation is introduced. It consists of three main components: (i) the flow model, (ii) the lane change model, and (iii) the occupancy model. The flow model describes how vehicles are transferred longitudinally from one cell to another, the lane change model formulates the lane change event, and the occupancy model describes how cell occupancy is updated considering the lane change component.
3.2.1. The Flow Model
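Three quantities limit the flow of vehicles from cell $i-1$ into cell $i$, $y_i(t)$: the vehicles in the upstream cell waiting to enter, $n_{i-1}(t)$; the capacity of the successor cell, $Q_i(t)$; and the space available in the successor cell when a queue forms, $\delta_i[N_i(t) - n_i(t)]$. This is in accordance with the original concept defined by [8, 9] and is described mathematically as follows (the lane index $j$ is omitted for brevity):

$$y_i(t) = \min\{n_{i-1}(t),\; Q_i(t),\; \delta_i\,[N_i(t) - n_i(t)]\}$$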
Here, a slight variation is applied to the inflow to the first cell and the outflow from the last cell. The inflow to the first cell is the minimum of the actual demand entering the first cell at time step $t$, $D(t)$, and the space available in that cell:

$$y_1(t) = \min\{D(t),\; \delta_1\,[N_1(t) - n_1(t)]\}$$
For the outflow leaving the last cell $I$, $y_{I+1}(t)$, the empty space in the successor cell is unknown, so the flow is the minimum of the number of vehicles in the last cell, $n_I(t)$, and the actual outflow leaving the last cell at time step $t$, $O(t)$:

$$y_{I+1}(t) = \min\{n_I(t),\; O(t)\}$$
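The three flow rules (interior cells, first cell, last cell) can be written as small functions. This is a minimal sketch assuming scalar per-cell quantities; the function names are hypothetical and the values in the usage line are illustrative.

```python
def interior_flow(n_up: float, Q_down: float, N_down: float,
                  n_down: float, delta: float) -> float:
    """y_i(t) = min{n_{i-1}(t), Q_i(t), delta * [N_i(t) - n_i(t)]}."""
    return min(n_up, Q_down, delta * (N_down - n_down))


def first_cell_inflow(demand: float, N_first: float, n_first: float,
                      delta: float) -> float:
    """y_1(t) = min{D(t), delta * [N_1(t) - n_1(t)]}."""
    return min(demand, delta * (N_first - n_first))


def last_cell_outflow(n_last: float, observed_outflow: float) -> float:
    """y_{I+1}(t) = min{n_I(t), O(t)}; downstream space is unknown."""
    return min(n_last, observed_outflow)


# Illustrative values: upstream cell holds 5 veh, successor holds 10 of 16.5,
# so the flow is limited by the remaining space (about 1.95 veh).
y = interior_flow(n_up=5, Q_down=3, N_down=16.5, n_down=10, delta=0.3)
```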
3.2.2. The Lane Change Model
The lane change model was developed macroscopically from the US-101 NGSIM microscopic trajectory data and generalized with a refined binary logistic regression [7, 33]. From these data, the main contributing traffic characteristics were aggregated over varying cell size configurations to identify the parameters for average traffic density, flow, and speed. The general form of the basic logistic regression model is given as follows:

$$P = \frac{1}{1 + e^{-z}}, \qquad z = \ln\!\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m$$

where $P$ is the probability of an event occurring; $x_1, x_2, \ldots, x_m$ are the explanatory variables; $\beta_0, \beta_1, \ldots, \beta_m$ are model parameters (or coefficients) that may be determined using maximum likelihood estimation [40]; $m$ is the number of variables considered; and $z$ is the logit transformation of $P$, i.e., the natural log of the odds (the ratio of the probability of occurrence to the probability of nonoccurrence).
The maximum likelihood estimation approach is used to calculate the β values (i.e., the model parameters) when fitting the logistic regression model. In this study, $P$ is the estimated probability of lane change, P (LC), fitted to the actual LC status (0 (“LC”) or 1 (“NLC”)) and the corresponding independent variables observed in the field data. The independent variables $x_1$, $x_2$, and $x_3$ are the density difference (DD), the speed difference (SD), and the lane change direction, which were the main contributing traffic characteristics at the macroscopic level.
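Once the coefficients are known, evaluating the model is a direct application of the two expressions above. The sketch below assumes hypothetical coefficient values (the fitted values are reported in [7]) and a hypothetical function name; it only shows that the logistic function and the logit are inverses of each other.

```python
import math
from typing import Sequence


def lane_change_probability(betas: Sequence[float], x: Sequence[float]) -> float:
    """P = 1 / (1 + exp(-z)) with z = b0 + b1*x1 + ... + b_m*x_m."""
    b0, *coefs = betas
    if len(coefs) != len(x):
        raise ValueError("expected one coefficient per explanatory variable")
    z = b0 + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for (DD, SD, direction); not the fitted values.
betas = [0.5, 1.0, -0.2, 0.0]
p = lane_change_probability(betas, [2.0, 1.0, 0.0])
z = math.log(p / (1.0 - p))  # the logit recovers the linear predictor (about 2.3)
```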
DD and SD are defined as the differences between the origin and the target cell and are computed using equations (8) and (9), respectively:

$$DD = k_{\text{origin}} - k_{\text{target}} \tag{8}$$

$$SD = v_{\text{origin}} - v_{\text{target}} \tag{9}$$

The aggregated cell density at the instant when the vehicle makes the lane change is identified from the number of intersecting points on the reference line. In Figure 2, the density of C1T3 is 5 vehicles in lane 1, including the lane-changing vehicle, and 2 vehicles in lane 2. The speeds in the origin and target cells are likewise obtained at the instant of the lane change. In Figure 2, the reference line separates the vehicle's path in the origin cell from its path in the target cell. The instantaneous speed of the vehicle before and after the lane change is obtained manually from the gradient of its trajectory, plotted as displacement $x$ against time $t$: the slope to the left of the reference line in the origin lane is the instantaneous speed before the lane change, whereas the slope to the right of the reference line in the target lane is the instantaneous speed after the lane change. A positive density difference (DD) means that the origin lane has a higher density than the target lane, while a positive speed difference (SD) means that the origin lane has a higher speed than the target lane.
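Equations (8) and (9) and the slope reading can be sketched as follows. The Figure 2 counts (5 and 2 vehicles) come from the text; the trajectory samples used for the speeds are invented for illustration, and the function names are hypothetical.

```python
def density_difference(k_origin: float, k_target: float) -> float:
    """Eq. (8): positive when the origin cell is denser than the target cell."""
    return k_origin - k_target


def speed_difference(v_origin: float, v_target: float) -> float:
    """Eq. (9): positive when the origin cell is faster than the target cell."""
    return v_origin - v_target


def trajectory_speed(x0: float, x1: float, t0: float, t1: float) -> float:
    """Slope of a displacement-time trajectory between two samples."""
    return (x1 - x0) / (t1 - t0)


# Figure 2, cell C1T3 at the reference line: 5 vehicles in lane 1 (origin),
# 2 vehicles in lane 2 (target), i.e. counts per 110 m cell.
dd_cell = density_difference(5, 2)                 # 3 veh/cell
dd_km = density_difference(5 / 0.110, 2 / 0.110)   # about 27.3 veh/km

# Hypothetical trajectory samples (m, s) on either side of the reference line.
v_before = trajectory_speed(0.0, 30.0, 0.0, 2.0)   # 15 m/s in the origin lane
v_after = trajectory_speed(30.0, 62.0, 2.0, 4.0)   # 16 m/s in the target lane
sd = speed_difference(v_before, v_after)           # -1 m/s
```

A negative SD here means the vehicle moved into a faster target lane.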
A fixed-size CTM-based model was developed and validated with observed data considering the speed, density differences, and lane change direction. Figure 3 summarizes the steps to identify the coefficients of the parameters for the base model using actual datasets.

Given the probability of lane change formulated by equation (7) from the observed data, the occurrence of lane change is determined using the receiver operating characteristic (ROC) curve as a classification measure [41]. The decision variable that determines the flow of vehicles from one lane to another, $LC$, is obtained by comparing the predicted probability with the cutoff value $P_c$ obtained from the binary logistic regression. The optimum $P_c$, based on the model's ability to discriminate correctly between the presence and absence of LC, was used as the deciding factor [41]; refer to Ng et al. [7] for details on identifying the optimum $P_c$. The decision is thus:

$$LC_{i,j \to j'}(t) = \begin{cases} 1, & P(LC) \ge P_c \\ 0, & P(LC) < P_c \end{cases}$$

where $j'$ is a target lane adjacent to lane $j$.
Note that the independent variables, speed difference (SD) and density difference (DD), are each defined as the value in the origin lane minus that in the target lane. Thus, P (LC) in the cell transmission model is evaluated separately for lane changes from left to right and from right to left.
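Because DD and SD are always taken as origin minus target, each adjacent target lane gets its own probability and its own 0/1 decision. This is a minimal sketch under stated assumptions: lane 1 is the leftmost lane, the direction variable is coded 1 for right-to-left and 0 for left-to-right, and the coefficient order is (b0, b_DD, b_SD, b_direction); these codings and all names are hypothetical, not taken from [7].

```python
import math
from typing import Dict, Sequence


def lc_probability(b: Sequence[float], dd: float, sd: float, direction: int) -> float:
    # Assumed coefficient order: (b0, b_DD, b_SD, b_direction).
    z = b[0] + b[1] * dd + b[2] * sd + b[3] * direction
    return 1.0 / (1.0 + math.exp(-z))


def lane_change_decisions(k: Dict[int, float], v: Dict[int, float], lane: int,
                          n_lanes: int, b: Sequence[float],
                          p_cutoff: float) -> Dict[int, int]:
    """0/1 lane change decision for each adjacent target lane of one cell.

    k and v hold density and speed per lane at the same longitudinal cell;
    lane 1 is assumed to be the leftmost lane.
    """
    decisions = {}
    for target in (lane - 1, lane + 1):
        if not 1 <= target <= n_lanes:
            continue  # leftmost/rightmost lane: no neighbour on this side
        direction = 1 if target < lane else 0  # assumed coding: 1 = right-to-left
        dd = k[lane] - k[target]  # origin minus target, eq. (8)
        sd = v[lane] - v[target]  # origin minus target, eq. (9)
        p = lc_probability(b, dd, sd, direction)
        decisions[target] = 1 if p >= p_cutoff else 0
    return decisions
```

For a middle lane this returns two decisions; for an edge lane only one, which matches the reduced set of lateral movements at the road edges.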
The inflow from the on-ramp is represented as a lane change rate, defined as the time per lane change event (s/LC) estimated from the US-101 data. Given that the average on-ramp inflow was 140 veh per 15 minutes, a rate of 1 veh per 6 s time step is assumed in this study.
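As a quick check on this assumption, the observed on-ramp rate converts to a headway and a per-step inflow as follows:

$$\frac{15 \times 60\ \text{s}}{140\ \text{veh}} \approx 6.4\ \text{s/veh}, \qquad \frac{140\ \text{veh}}{900\ \text{s}} \times 6\ \text{s} \approx 0.93\ \text{veh per time step}$$

so assuming one on-ramp vehicle per 6 s step slightly overstates the observed average inflow, by roughly 7%.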
3.2.3. The Occupancy Model
The number of vehicles in cell $i$ at the next time step, $n_i(t+1)$, is updated based on the conservation of traffic flow [8, 9]: it equals the number of vehicles in the cell at the current time step, plus the number of vehicles that entered, minus the number of vehicles that left:

$$n_i(t+1) = n_i(t) + y_i(t) - y_{i+1}(t)$$

where $i$ indicates the cell; $i+1$ and $i-1$ represent the downstream and upstream cells of cell $i$, respectively; $y_i(t)$ is the inflow of vehicles entering cell $i$ from cell $i-1$ during time $t$; and $y_{i+1}(t)$ is the outflow of vehicles leaving cell $i$ into cell $i+1$ at time $t$. However, this equation considers only the longitudinal movement of vehicles and ignores lateral movement between lanes. When lateral movement is considered, the equation for cell $(i, j)$ becomes

$$n_{i,j}(t+1) = n_{i,j}(t) + y_{i,j}(t) - y_{i+1,j}(t) + \sum_{j'} \left[ LC_{i,j' \to j}(t) - LC_{i,j \to j'}(t) \right]$$

where $j'$ ranges over the lanes adjacent to lane $j$ and $LC_{i,j \to j'}(t)$ is the number of vehicles changing from lane $j$ to lane $j'$ within longitudinal cell $i$ at time $t$.
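Figure 4 shows an example of the flows surrounding a single cell. For a cell with adjacent lanes on both sides, the occupancy is updated as follows:

$$n_{i,j}(t+1) = n_{i,j}(t) + y_{i,j}(t) - y_{i+1,j}(t) + LC_{i,j-1 \to j}(t) + LC_{i,j+1 \to j}(t) - LC_{i,j \to j-1}(t) - LC_{i,j \to j+1}(t)$$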

Note that some movements may not be available depending on the road layout, particularly in the leftmost and rightmost lanes. However, the principle of adding inflows and subtracting outflows still holds.
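The occupancy update, including the reduced set of lateral terms at the edge lanes, is a single line of bookkeeping. A minimal sketch with hypothetical names; the usage lines reproduce the Figure 2 example, where lane 1 drops from 5 to 4 vehicles and lane 2 rises from 2 to 3 when one vehicle changes lanes and no longitudinal flow is counted.

```python
from typing import Iterable


def update_occupancy(n: float, y_in: float, y_out: float,
                     lc_in: Iterable[float], lc_out: Iterable[float]) -> float:
    """n_{i,j}(t+1) = n + y_in - y_out + sum(lc_in) - sum(lc_out).

    lc_in / lc_out list the lateral flows from / to the adjacent lanes;
    pass fewer terms (or none) for the leftmost and rightmost lanes.
    """
    n_next = n + y_in - y_out + sum(lc_in) - sum(lc_out)
    if n_next < 0:
        raise ValueError("outflows exceed the vehicles available in the cell")
    return n_next


# Figure 2, C1T3: one vehicle moves from lane 1 (edge lane) to lane 2.
lane1_next = update_occupancy(5, 0, 0, lc_in=[], lc_out=[1])
lane2_next = update_occupancy(2, 0, 0, lc_in=[1], lc_out=[])
```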
In general, the cell transmission model is initialized by filling in the occupancy of each cell at the initial time step, $t = 0$. To implement this method in a real-life scenario, the initial occupancy can be obtained from the actual field data by counting the vehicles in each discretized cell at $t = 0$. The flows of vehicles between cells are then computed for $t = 0$. For the next time step, $t = 1$, the cell occupancies are updated as described in the previous section. For the remaining time steps, the flow and occupancy computations are repeated as many times as desired for comparison with the actual traffic flow conditions.
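Putting the flow, lane change, and occupancy steps together gives the simulation loop described above. This is a minimal sketch under stated assumptions: a single `delta` for all cells, lateral moves applied after the longitudinal update using the occupancies at the start of the step, and demand, outflow, and lane change inputs supplied as callables. `run_ctm` and its arguments are hypothetical names, not the authors' implementation, and the sketch does not guard against a vehicle leaving a cell both longitudinally and laterally in the same step.

```python
from typing import Callable, Dict, List, Tuple

Grid = List[List[float]]                   # grid[lane][cell]
Moves = Dict[Tuple[int, int, int], float]  # (lane, cell, target_lane) -> vehicles


def run_ctm(n0: Grid, N: Grid, Q: Grid, delta: float,
            demand: Callable[[int, int], float],
            outflow: Callable[[int, int], float],
            lane_changes: Callable[[Grid, int], Moves],
            steps: int) -> List[Grid]:
    """Return the occupancy grids for t = 0, 1, ..., steps."""
    n = [row[:] for row in n0]          # initial state from field counts
    history = [[row[:] for row in n]]
    cells = len(n[0])
    for t in range(steps):
        new = [row[:] for row in n]
        for j in range(len(n)):
            # y[i] = flow into cell i; y[cells] = flow leaving the last cell
            y = [0.0] * (cells + 1)
            y[0] = min(demand(j, t), delta * (N[j][0] - n[j][0]))
            for i in range(1, cells):
                y[i] = min(n[j][i - 1], Q[j][i], delta * (N[j][i] - n[j][i]))
            y[cells] = min(n[j][cells - 1], outflow(j, t))
            for i in range(cells):
                new[j][i] += y[i] - y[i + 1]
        # lateral moves decided from the occupancies at the start of the step
        for (j, i, target), m in lane_changes(n, t).items():
            new[j][i] -= m
            new[target][i] += m
        n = new
        history.append([row[:] for row in n])
    return history
```

The `lane_changes` callable is where either a logistic decision rule or a fixed-percentage rule would be plugged in; the loop itself is the same for all cases.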
4. Preparation of Field Data for Validation with CTM
This section demonstrates how the raw microscopic data from the field are postprocessed to extract the relevant information needed to be used as the input for the CTM model.
4.1. US-101 NGSIM Data
In this study, a dataset of individual microscopic vehicle trajectories from the well-known and widely accessible US-101 NGSIM database was used to extract the inputs needed to develop the cell transmission model. The dataset, collected on US Highway 101 (Hollywood Freeway), covers six lanes in a single direction. The section, which includes an on-ramp and an off-ramp connected by a short auxiliary lane, is used to validate the simulated CTM. The lane numbering of the study area is shown in Figure 5.

For validation, the dataset is re-sorted and discretized into cells defined by space and time. The cell size and the start and end points of the cells were matched between the processed US-101 data and the simulated CTM. Here, the cell size of the study area is fixed at t = 6 s and L = 110 m (≈361 ft) [42, 43]. In the original field dataset, Local_X_f (t) defines the lateral position (in ft) used to identify lanes, starting from lane 1, while Local_Y_f (t) defines the longitudinal position (in ft), starting approximately 150 m before Ventura Blvd. To avoid noise at the start and end of the section, the first cell begins at Local_Y_f (t) = 60 ft and the last cell ends at 2,028 ft. The lane and cell positions are then redefined according to the specified cell size.
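One possible way to bin the trajectory records into cells and time steps is sketched below, assuming each record's lane is already known and only the longitudinal position and the frame index need binning. NGSIM trajectories are recorded at 10 frames per second, so a 6 s step spans 60 frames. The constants come from the text; the function names are hypothetical.

```python
from typing import Optional

Y_START_FT, Y_END_FT = 60.0, 2028.0  # studied stretch of Local_Y (ft)
CELL_LEN_FT = 361.0                  # L = 110 m
FRAMES_PER_STEP = 60                 # 10 frames/s x 6 s


def cell_index(local_y_ft: float) -> Optional[int]:
    """1-based cell number for a Local_Y position, or None outside the stretch."""
    if not Y_START_FT <= local_y_ft < Y_END_FT:
        return None
    return int((local_y_ft - Y_START_FT) // CELL_LEN_FT) + 1


def time_step(frame_id: int, first_frame: int) -> int:
    """0-based 6 s time step for a frame index."""
    return (frame_id - first_frame) // FRAMES_PER_STEP
```

Because (2028 − 60)/361 ≈ 5.45, the last index returned here (6) covers only part of a cell; how the original processing treated this remainder is not stated in the text.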
4.2. Extraction of Input Data
To simulate the CTM, the following information from US-101 is used as input data: (i) the number of vehicles in the initial state, (ii) the demand entering the first cell, (iii) the inflow at the on-ramp, and (iv) the outflow leaving the last cell. The raw data contain the vehicle IDs present in every time frame of the study period. At each time step, the number of vehicles entering and leaving each cell can be counted by tracing the vehicle IDs across all time steps. A MATLAB algorithm was developed to process the vehicle IDs. It recognizes the differences in vehicle IDs found in each cell at every time step, which helps remove duplicated vehicle IDs between time steps and yields the demand entering and the outflow leaving each cell.
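The ID-tracing step can be sketched with set differences. This is an illustrative Python version, not the authors' MATLAB algorithm; it assumes the trajectories have already been binned into (lane, cell) keys per time step, and all names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Key = Tuple[int, int]            # (lane, cell)
Snapshot = Dict[Key, Set[int]]   # key -> vehicle IDs present at one time step


def count_transfers(snapshots: List[Snapshot]) -> Tuple[Dict, Dict]:
    """Vehicles entering/leaving each cell between consecutive time steps.

    Returns (inflow, outflow) keyed by (t, key): IDs present at step t but
    not at t-1 count as inflow, and the reverse as outflow. A vehicle that
    stays in the same cell across steps is therefore not counted again.
    """
    inflow, outflow = {}, {}
    for t in range(1, len(snapshots)):
        prev, cur = snapshots[t - 1], snapshots[t]
        for key in set(prev) | set(cur):
            before, after = prev.get(key, set()), cur.get(key, set())
            inflow[(t, key)] = len(after - before)
            outflow[(t, key)] = len(before - after)
    return inflow, outflow
```

The initial occupancy of a cell is then simply the size of its ID set in the first snapshot.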
Since the highway was not empty at the start of the study period, the initial occupancy of each link had to be estimated using the same method; otherwise, the simulation would contain fewer vehicles than the actual traffic. Initial occupancy estimates were obtained by counting the vehicles on the highway at the start of the study period. Vehicles are converted to passenger car units in the traffic flow simulation, where 1 heavy vehicle (HV) = 3.5 passenger cars and 1 motorcycle = 0.5 passenger cars [44].
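The passenger car unit conversion is a weighted count. The class labels below are hypothetical; the factors (HV = 3.5, motorcycle = 0.5, and 1.0 for a passenger car by definition) follow the text.

```python
from typing import Iterable

PCU_FACTORS = {"car": 1.0, "heavy_vehicle": 3.5, "motorcycle": 0.5}


def to_pcu(vehicle_classes: Iterable[str]) -> float:
    """Passenger car units for the vehicles counted in one cell."""
    return sum(PCU_FACTORS[c] for c in vehicle_classes)


cell_pcu = to_pcu(["car", "car", "heavy_vehicle", "motorcycle"])  # 6.0
```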
5. Numerical Results
To test the performance of the proposed model against the observed data, this section integrates the logistic regression lane change model into the CTM for various initial demand and initial occupancy scenarios. The first scenario uses an initial demand of 5 vehicles/time step and an initial occupancy of 0 vehicles/cell, with the cell size fixed at t = 6 s and L = 110 m. This cell size was chosen following the logistic regression lane change model of [37], whose author suggested using longer time steps to obtain stronger probabilities. Using the same dataset, [33] showed that multiple lane changes occur within a cell when the cell size exceeds t = 6 s and L = 110 m, which affects the reliability of the model's accuracy.
Figure 6 shows the changes in total cell occupancy over time for the different lanes. These totals are the occupancies of all cells within the same lane over a period of 348 time steps. Figure 6 shows that the model captures the formation, propagation, and dissipation of queues within lanes and cells: an increasing trend represents a growing number of vehicles in lanes where queues form, while a declining trend shows the dissipation of these queues. The steep slope in the first 10–20 seconds represents the initialization (warm-up) period of the simulation model as traffic reaches equilibrium.

5.1. Validation of CTM with US-101
Next, RMSE and analysis of variance (ANOVA) tests were used to compare the prediction accuracy of the CTM with and without LC models against the actual dataset. Specifically, cell occupancies were compared for the following experimental cases:
Case 1: CTM without LC model
Case 2: CTM with a fixed percentage of LC
Case 3: CTM with the percentage of LC fixed at 60% (RL) and 35% (LR)
Case 4: CTM with logistic regression
The network topology, including the lane and cell positions used for the above cases, is shown in Appendix A1. Boundary cells are placed immediately upstream of the first cell and downstream of the last cell of the simulation model.
To avoid bias from the training data when comparing the CTMs, the same dataset with the same demand and the same start and end times was used. The real-world US-101 data, discretized into cells of the same size, are adopted as the ground truth for evaluation. The simulated occupancies for the different CTM cases (Cases 1–4) are compared with the observed occupancies from the discretized US-101 data by computing the root-mean-square error (RMSE):

$$RMSE = \sqrt{\frac{1}{R} \sum_{r=1}^{R} \left(S_r - O_r\right)^2}$$

where $S_r$ are the simulated cell occupancies from the CTM, $O_r$ are the observed cell occupancies, and $R$ is the total number of observations.
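The RMSE above can be computed as a small function (hypothetical name) over paired simulated and observed cell occupancies:

```python
import math
from typing import Sequence


def rmse(simulated: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error between paired cell occupancies."""
    if len(simulated) != len(observed) or not simulated:
        raise ValueError("need two non-empty sequences of equal length")
    sq = sum((s - o) ** 2 for s, o in zip(simulated, observed))
    return math.sqrt(sq / len(simulated))
```

The pairing matters: each simulated value must refer to the same lane, cell, and time step as the observed value it is compared with.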
Note that the integrated binary logistic lane change model was validated against the NGSIM data in macroscopic cell-based form in [7]. The accuracy of the lane change model was assessed using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, and the results showed that the model fits the test data reasonably well, with an AUC of 0.76.
Case 1: CTM without LC model
In Case 1, only the flow model, which allows vehicles to flow from one cell to another, is considered, given the same input demand, initial state, and outflow at every time step. In this case, all vehicles are assumed to travel within the same lane, and no vehicles change lanes.
Appendix A2 presents the results of Case 1, showing the cell occupancy trend for all cells and comparing the CTM with the US-101 data at the 50th, 100th, and 200th time steps. The average occupancies over the 348 time steps are also provided, where the x-axis represents the cell positions compared between the CTM (Case 1) and the US-101 data.
To examine whether the occupancies estimated by the CTM differ from those of the US-101 data, an analysis of variance (ANOVA) was conducted. The ANOVA tests whether the mean occupancies are equal:

$$H_0: \mu_{\text{CTM}} = \mu_{\text{US-101}}, \qquad H_a: \mu_{\text{CTM}} \neq \mu_{\text{US-101}}$$

where $\mu_a$ is the mean cell occupancy for approach $a$. If the type I error is controlled at $\alpha = 0.05$, then $F(0.95; 1, 65) = 3.99$, where 1 and 65 are the degrees of freedom associated with the factor level and the error term of the given data. The decision rule is thus:

$$\text{if } F^* \le 3.99, \text{ conclude } H_0; \qquad \text{if } F^* > 3.99, \text{ conclude } H_a$$

Case 2 and Case 3: CTM with fixed LC percentage
Cases 2 and 3 assign a fixed percentage of LC to the CTM. In Case 2, a fixed probability of 30% is allocated to lateral movement between lanes in both the left-to-right and right-to-left directions [25]; this percentage has been used in past studies to develop the CTM. The decision to change lanes (either 0 or 1) is predicted based on a random probability assigned to the specific cells.
In Case 3, 60% is assigned to lane changes from the right to the left lanes and 35% to lane changes in the opposite direction. Under right-hand traffic, the leftmost lane carries the highest speed, and speed decreases gradually toward the right. Because the study area lies within the ramp section rather than far upstream of it, a higher proportion of discretionary lane changes (i.e., for speed comfort) is expected from the right to the left lanes. The comparisons of occupancies between the CTM and US-101 for Cases 2 and 3 are shown in Appendices A3 and A4, respectively.
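The fixed-percentage rules of Cases 2 and 3 amount to one Bernoulli draw per candidate lane change. A minimal sketch, assuming one uniform random draw per decision; the function name and the "RL"/"LR" labels are hypothetical.

```python
import random


def fixed_percentage_lc(direction: str, rng: random.Random,
                        p_rl: float = 0.60, p_lr: float = 0.35) -> int:
    """Case 3 rule: a 0/1 lane change drawn with a direction-specific probability.

    Case 2 corresponds to p_rl = p_lr = 0.30.
    """
    p = p_rl if direction == "RL" else p_lr
    return 1 if rng.random() < p else 0


rng = random.Random(42)  # seeded so repeated runs give the same draws
share_rl = sum(fixed_percentage_lc("RL", rng) for _ in range(20000)) / 20000
```

Unlike the logistic model, this rule ignores the density and speed in the origin and target cells, which is the limitation the comparison with Case 4 is meant to expose.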
5.1.1. CTM with Logistic Regression
In Case 4, the same formulation of traffic flows is applied to the CTM. However, instead of assigning a fixed lane change percentage as before, a lane change model estimated from real data is added to the flow model so that lane changes are predicted from the differences in speed and density between lanes. Figures 7–9 compare the cell occupancies in Case 4.
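To illustrate the Case 4 idea, the sketch below computes a binary logistic lane change probability from the speed and density differences between the current and target lanes and converts it to a 0/1 decision. The coefficient values, the sign convention of the differences, and the 0.5 cut-off are all assumptions chosen for illustration; they are not the estimates reported for the US-101 model.

```python
import math

# Hypothetical coefficients for illustration only; the estimated values
# of the US-101 model are not reproduced here.
BETA_0 = -0.5
BETA_SPEED = 0.08     # per unit of speed difference (target lane minus current lane)
BETA_DENSITY = -0.04  # per unit of density difference (target lane minus current lane)


def lc_probability(delta_speed, delta_density):
    """Binary logistic model: P(lane change = 1 | speed and density differences)."""
    z = BETA_0 + BETA_SPEED * delta_speed + BETA_DENSITY * delta_density
    # Numerically stable logistic function for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lc_decision(delta_speed, delta_density, threshold=0.5):
    """Turn the probability into a 0/1 decision (the 0.5 cut-off is assumed)."""
    return 1 if lc_probability(delta_speed, delta_density) >= threshold else 0


# A faster, less dense target lane raises the probability under these coefficients.
print(lc_probability(10.0, -5.0), lc_decision(10.0, -5.0))
print(lc_probability(0.0, 0.0), lc_decision(0.0, 0.0))
```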

Table 1 summarizes the estimated average occupancy, the RMSE, and the results of the ANOVA tests for all cases. In all four cases, the average occupancy per cell was underestimated relative to the actual US-101 data, which averaged 6.93 veh/cell. Case 4 came closest to the actual data, underestimating it by approximately 10.9%, followed by Case 1 (14.0%), Case 3 (15.7%), and Case 2 (16.3%). The RMSE values confirm that these differences are small: overall, the RMSE between the CTM and US-101 was around 0.04 (0.2 on average), indicating that these models predict the cell occupancies well.
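The two error measures used in Table 1 can be sketched as follows: an RMSE over paired cell occupancies and the percentage by which the simulated mean falls below the observed mean. The percentage is assumed here to be taken relative to the observed mean, since the text does not state its base; the helper names are illustrative.

```python
import math


def rmse(simulated, observed):
    """Root-mean-square error between paired cell occupancies."""
    if len(simulated) != len(observed) or not observed:
        raise ValueError("need two non-empty series of equal length")
    squared = [(s - o) ** 2 for s, o in zip(simulated, observed)]
    return math.sqrt(sum(squared) / len(squared))


def underestimation_pct(simulated_mean, observed_mean):
    """Percentage by which the simulated mean falls below the observed mean."""
    return 100.0 * (observed_mean - simulated_mean) / observed_mean


# Toy values (not the Table 1 data).
print(rmse([6.0, 7.0, 8.0], [6.0, 7.5, 8.5]))
print(underestimation_pct(9.0, 10.0))  # 10.0
```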
From the ANOVA results, the p-value in all cases is greater than 0.05, and the critical value Fcrit = 3.99 is greater than the F statistic estimated in every case. The null hypothesis that all means are equal therefore cannot be rejected: the occupancies estimated from the CTM are consistent with equal population means across groups. In other words, the estimated occupancies do not differ significantly from the observed ones.
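The ANOVA decision rule applied above can be reproduced with a small, dependency-free Python sketch of a one-way ANOVA. It hard-codes the critical value F(0.95; 1, 65) = 3.99 reported in the text and refuses to apply it to other degrees of freedom; the function names and toy groups are illustrative, and a library routine such as SciPy's `scipy.stats.f_oneway` could replace the hand-written F statistic.

```python
from statistics import mean

# Critical value reported in the text: F(0.95; 1, 65) = 3.99.
F_CRIT = 3.99
CRIT_DF = (1, 65)


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    means = [mean(g) for g in groups]
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def decide(f_stat, df, f_crit=F_CRIT, crit_df=CRIT_DF):
    """Decision rule: conclude H0 if F* <= Fcrit, otherwise conclude Ha."""
    if tuple(df) != crit_df:
        raise ValueError(f"critical value is for df={crit_df}, got df={df}")
    return "conclude H0" if f_stat <= f_crit else "conclude Ha"


# Toy groups of 33 and 34 values give df = (1, 65), matching the text.
ctm = [i % 5 for i in range(33)]
observed = [i % 5 + 0.1 for i in range(34)]
f_stat, df_b, df_w = one_way_anova_f(ctm, observed)
print(round(f_stat, 3), decide(f_stat, (df_b, df_w)))
```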
To validate the lane change behavior predicted by the logistic regression within the CTM, the number of LCs in each direction is summed over all lanes for the whole simulation run, and these totals are compared with the numbers of LCs observed in the US-101 data. In the CTM simulation, left LCs exceed right LCs by 73% on average; the US-101 data show a similar difference of 72% between the two directions. In both the proposed CTM model and the field data, lane changes from the right lane to the left lane outnumber those in the opposite direction.
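The directional comparison above can be sketched by summing per-lane counts for each direction and expressing the excess of left over right lane changes as a percentage. The bookkeeping structure, the toy counts, and the choice of the right-direction total as the base of the percentage are assumptions, since the text does not specify how the 73% and 72% figures were normalized.

```python
def total_lc_by_direction(lc_counts):
    """Sum left and right lane changes over all lanes for the whole run.

    lc_counts maps a lane id to (left_count, right_count); this structure
    is a hypothetical bookkeeping choice for the sketch.
    """
    left = sum(left_n for left_n, _ in lc_counts.values())
    right = sum(right_n for _, right_n in lc_counts.values())
    return left, right


def directional_difference_pct(left, right):
    """Percentage by which left lane changes exceed right ones.

    The base (the right-direction total) is an assumption; the text does
    not say how its percentages are normalized.
    """
    return 100.0 * (left - right) / right


# Toy counts (not NGSIM data); lane 1 is taken as the leftmost lane,
# so it cannot record a further left lane change.
toy_counts = {1: (0, 4), 2: (10, 3), 3: (7, 3)}
left, right = total_lc_by_direction(toy_counts)
print(left, right, directional_difference_pct(left, right))
```

Applying the same two functions to the simulated and to the observed counts gives the pair of percentages that the text compares.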
5.2. Comparison with I80 Highway
In addition to comparing the occupancies simulated by the CTM with the US-101 data, the CTM with logistic regression is also applied to a different highway: the I80 highway, which is also part of the NGSIM data. This highway has a similar layout and approximately the same length as the US-101 highway, minus the off-ramp and the auxiliary lanes, plus an additional main lane. Figures 10(a) and 10(b) show the layout and the CTM topology of this highway, which uses the same cell size as in the previous cases (t = 6, L = 110).

The CTM with logistic regression is computed in the same way as before, except that the input, initial state, and output traffic flows are based on the actual conditions of the I80 highway. Table 2 lists the parameters of each variable and their significance, estimated from the US-101 dataset; these parameters are applied to this highway to test whether the model can reproduce the actual I80 outcome. The variable “direction of lane change” distinguishes left from right lane changes in order to reduce bias in the lane change prediction between the two directions. It is introduced as a categorical variable, where a value of 0 or 1 denotes a left or a right lane change, respectively, so that the probabilities of left and right lane changes can be identified separately from the model.
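The role of the categorical direction variable can be sketched with a logistic model that adds a 0/1 indicator (0 = left, 1 = right) to the linear predictor. With all other inputs fixed, the indicator's coefficient is the log odds ratio of a right versus a left lane change. All coefficient values below are hypothetical, not the Table 2 estimates, and the function names are illustrative.

```python
import math

# Hypothetical coefficients for illustration; Table 2's estimates are not reproduced.
COEF = {"const": -0.5, "delta_speed": 0.08, "delta_density": -0.04, "direction": -0.6}
LEFT, RIGHT = 0, 1  # coding of the categorical "direction of lane change"


def lc_probability_by_direction(delta_speed, delta_density, direction):
    """Logistic lane change probability with a 0/1 direction indicator."""
    z = (COEF["const"]
         + COEF["delta_speed"] * delta_speed
         + COEF["delta_density"] * delta_density
         + COEF["direction"] * direction)
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)


p_left = lc_probability_by_direction(5.0, -2.0, LEFT)
p_right = lc_probability_by_direction(5.0, -2.0, RIGHT)
# The odds ratio of right vs left equals exp(direction coefficient).
print(p_left, p_right, odds(p_right) / odds(p_left), math.exp(COEF["direction"]))
```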
Figures 11–13 compare the occupancies simulated by the CTM with the actual I80 data. The CTM underestimated the average occupancy of the actual data by approximately 5.2%: the simulated average was 7.38 veh/cell, while the actual I80 average was 7.76 veh/cell. The overall RMSE was 0.074 (0.3 on average), which is still reasonably small. Furthermore, as shown in Table 3, the ANOVA test (F = 0.97) indicates no significant difference between the occupancies simulated by the CTM and those observed in the actual I80 data.

For evaluation and validation, a split-sample method was used in which the data were randomly divided into two sets: (i) training data to develop the model and (ii) test data to measure its performance. Because the sample size was sufficiently large, 80% of the samples were used to train the model, and the remaining 20% were held out as an independent set for testing the logistic regression model. The split was performed once, at random, so that the test set had characteristics similar to those of the training set. The agreement between the predicted lane change probabilities and the actual outcomes gives an overall correct prediction rate of 66.41%. Even though the logistic lane change model was trained on a different layout, it still estimated occupancies close to the actual data at another location. This suggests that the traffic variables used to predict lane changes (i.e., speed and density differences) are viable for estimating the traffic state at various locations.
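The split-sample evaluation can be sketched as a single random 80/20 split followed by an overall correct-prediction rate on the test set. The 0.5 probability cut-off used to turn predicted probabilities into 0/1 outcomes is an assumption (the text does not state the threshold), and the helper names are hypothetical.

```python
import random


def split_sample(records, train_frac=0.8, seed=None):
    """Shuffle once and split records into training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def overall_correct_pct(probabilities, outcomes, threshold=0.5):
    """Percentage of test cases whose 0/1 prediction matches the observed outcome."""
    predicted = [1 if p >= threshold else 0 for p in probabilities]
    hits = sum(1 for yhat, y in zip(predicted, outcomes) if yhat == y)
    return 100.0 * hits / len(outcomes)


train, test = split_sample(range(100), seed=1)
print(len(train), len(test))  # 80 20
print(overall_correct_pct([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 0]))  # 75.0
```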




6. Conclusion
A good traffic simulation model should be robust and reliable, which requires variations in traffic behavior to be adequately taken into account. This paper examines whether the existing cell transmission model, augmented with a logistic regression lane change model, can more realistically replicate actual traffic conditions. Logistic regression has been used to model lane changes in previous studies, but such a model has not previously been integrated into traffic simulation models.
Given the potential of a generalized lane change model that predicts lane change events from the differences in speed and density between lanes, a new methodology for integrating the logistic lane change model into the cell transmission model has been proposed. The logistic lane change model has been modified so that it produces reasonable estimates of the lane change probability for any change in cell size. Because the modified lane change model is generalized, simple, and easy to construct, it is readily integrated into the CTM formulation.
To demonstrate the usefulness of the proposed methodology, the reliability of the improved CTM was tested by comparing its outputs with real data. The findings suggest that the improved model can simulate a traffic state close to the observed one. In particular, the occupancies simulated by the CTM reasonably followed the observed patterns, with an overall RMSE of around 0.03–0.04 (0.2 on average) between the CTM and US-101. This indicates that the model predicts cell occupancies well, and its reliability was also demonstrated at another location, with an overall RMSE of 0.074 (0.3 on average).
However, the results also show that the simulated CTM generally underestimated the occupancies. This could be due to human error introduced when the trajectory dataset was discretized into a cell-based form. The discretization involved tracing vehicle IDs to identify the lane-by-lane input demands and outflows at every simulation time interval, and the large size of the dataset further complicated this identification because of missing vehicles and random errors in the video processing of the trajectories. Similar work by [45], which developed a CTM for arterial traffic modeling using the NGSIM Lankershim data, also noted the lack of necessary traffic data. This simulation showed that some inputs to even a simple macroscopic model remain inaccessible or not directly measurable. Although detailed NGSIM trajectory data were used to obtain most of the model inputs, the data cover a short time period and a small network, which precludes a more comprehensive simulation study covering more traffic phenomena over a larger area and a longer time period.
Moreover, data from point detectors (such as loop detectors) may not be available on most freeways, which leaves model inputs such as the initial occupancy unavailable. In that case, the boundary flows of the highway and the maximum exit flows cannot be measured either. Constructing a freeway traffic model therefore requires detectors that provide the necessary measurements accurately.
Model accuracy could be improved if the demand inflows and outflows were easier to obtain. Fitting the inflows and outflows with suitable distributions is therefore a promising next step; this would involve developing a separate set of models that is beyond the scope of this paper. Furthermore, other outputs such as lane change rates, delays, queues, and travel times may also be useful for comparing the simulated and real data. These will be addressed in future research.
Appendix
A1. Lane and Cell Position for US-101 Highway
The lane and cell positions for the US-101 highway are shown in Table 4 and illustrated in Figure 14.
A2. Validation of CTM with US-101 for Case 1
Comparison of the occupancies between CTM and US-101 at the: (a) 50th, (b) 100th, and (c) 200th time step of all cells.
Comparison of the average occupancies over the total time duration in each cell between CTM and US-101.
Comparison of the occupancies between CTM and US-101 in the specific cell: (a) C2L3, (b) C3L3, and (c) C4L3 (Figure 15).
A3. Validation of CTM with US-101 for Case 2
Comparison of the occupancies between CTM and US-101 at the: (a) 50th, (b) 100th, and (c) 200th time step.
Comparison of the average occupancies over the total time duration in each cell between CTM and US-101.
Comparison of the occupancies between CTM and US-101 in the specific cell: (a) C2L3, (b) C3L3, and (c) C4L3 (Figure 16).
A4. Validation of CTM with US-101 for Case 3
Comparison of the occupancies between CTM and US-101 at the: (a) 50th, (b) 100th, and (c) 200th time step.
Comparison of the average occupancies over the total time duration in each cell between CTM and US-101.
Comparison of the occupancies between CTM and US-101 in the specific cell: (a) C2L3, (b) C3L3, and (c) C4L3 (Figure 17).
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare no conflicts of interest.
Authors’ Contributions
CN, SS, and KM conceptualized the study. CN and SS developed the methodology. CN and SS performed formal analysis. CN and SS performed validation. CN, SS, and KM performed investigation. CN wrote the original draft. CN, SS, and KM reviewed and edited the article. CN performed visualization. SS, KM, and IC supervised the study. SS performed project administration. SS, KM, and IC were responsible for funding acquisition.
Acknowledgments
This work was supported through NGSIM data provided by the Federal Highway Administration (FHWA) of the US Department of Transportation. The authors wish to thank Mr. Chua Boon Liang for his technical contribution and Mr. Joseph Ng for reviewing this paper. This research was funded by the Ministry of Education Malaysia (MOHE) under the Fundamental Research Grant Scheme (FRGS) [Project code FRGS/1/2019/TK01/MUSM/03/1] and was supported in part by the Japan Society for the Promotion of Science (JSPS) Grant-in-Aid for Scientific Research (C) 20K04531.