Abstract
In this study, we developed a roadside LiDAR object detection solution that combines two unsupervised learning algorithms. The 3D point clouds are first converted into spherical coordinates and filled into an elevation-azimuth matrix using a hash function. The raw LiDAR data are then rearranged into this data structure to store range, azimuth, and intensity information. Next, the dynamic mode decomposition (DMD) method decomposes the LiDAR data into low-rank backgrounds and sparse foregrounds based on patterns in the intensity channel. The coarse-fine triangle algorithm (CFTA) automatically finds the dividing value that separates moving targets from the static background according to the range information. After intensity and range background subtraction, the foreground moving objects are detected using a density-based detector and encoded into a state-space model for tracking. The output of the proposed solution includes vehicle trajectories that can enable many mobility and safety applications. The method was validated at both the path and point levels and outperformed the state of the art. In contrast to previous methods that operate directly on the scattered and discrete point clouds, the proposed decomposition method establishes a simpler linear structure over the 3D measurement data, capturing the spatial-temporal patterns that we often desire.
1. Introduction
Light detection and ranging (LiDAR) is a high-precision sensor that uses a laser transmitter and receiver to measure the distance to surrounding objects and provides 3D information about the environment. The LiDAR sensor meets the requirements of most scenarios and is particularly suitable for moving object detection and localization. LiDAR has recently gained escalating traction for smart city and connected infrastructure applications such as intelligent intersections that ensure pedestrian and bicycle safety, parking and construction management, and drone-based traffic monitoring. LiDAR technology is beneficial for object motion detection, especially under low-light conditions, as it can perceive the surrounding environment both day and night. LiDAR sensors provide high-resolution measurements, whereas radar lacks sufficient resolution. Although LiDAR accuracy can be degraded in specific scenarios by phantom reflections, for instance in foggy weather, the sensor is reliable for data collection under all lighting conditions in a multimodal traffic monitoring system. LiDAR sensors generate data for scene depth understanding, whereas camera-based systems cannot produce precise depth estimates directly. Another advantage of the LiDAR sensor is that the 3D point cloud data do not introduce privacy concerns, which is critical for security purposes. The detection results could be used for real-time traffic signal optimization to reduce pedestrian and cyclist waiting time and to protect vulnerable road users at signalized intersections. With the LiDAR data collection tool, traffic managers can learn mobility patterns to understand the causes of nonrecurrent or recurrent congestion. Connected vehicle (CV) applications also rely on real-time data acquisition through vehicle-to-everything (V2X) communications to address safety and mobility challenges.
The value of LiDAR technology has also been questioned: does the LiDAR sensor deliver enough benefit to be a good investment? LiDAR is viewed as a complementary sensor to cameras and radars in a connected infrastructure solution, and critics have argued that the LiDAR sensor is merely a crutch on the path toward the vision zero of future traffic. In general, combining different sensors increases reliability and benefits the analysis layer. LiDAR also serves as a nonintrusive approach that provides sufficient coverage. If powered with the next-generation communication network, the 3D data can be accessed in real time to enable safety-critical applications and cyber-physical modeling for computational decision-making.
The majority of LiDAR-based object detection models were developed for self-driving applications. Current deep learning approaches for autonomous driving LiDAR are often inadequate for roadside LiDAR because the models make ineffective predictions in new scenarios with no training data. Infrastructure-based LiDAR has unique characteristics compared with mobile or airborne LiDAR, which lead to different object detection approaches. As roadside LiDAR contains mostly static backgrounds, an efficient and robust background modeling approach is proposed in this study. More specifically, our proposed method automatically extracts background features from intensity and range information. As an unsupervised learning method for roadside LiDAR applications, it has better explanatory capability and does not need any labeled data. The data-driven algorithm is built on a more solid theoretical foundation than earlier roadside LiDAR methods that rely on observational indicators. The method requires few parameters, making it suitable as a benchmark for future work. After a thorough evaluation, this method achieves a new state of the art by relating the rich body of background modeling techniques to LiDAR point clouds.
1.1. Related Work
1.1.1. Roadside LiDAR Detection
LiDAR-based object detection algorithms are categorized into mobile LiDAR and static LiDAR object detection methods. Most LiDAR data processing algorithms were developed to use precise 3D geometric information for autonomous driving vehicles. Commonly used data structures include point-based [1], voxel-based [2], pillar-based [3], projection-based [4–7], and graph-based [8] methods. The mobile LiDAR sensors are carried through complex environments, and all captured data are used for scene understanding. Given the many differences between autonomous driving and roadside applications, roadside LiDAR processing methods are primarily based on background filtering. The first step is to separate foreground and background; the second step is to cluster the moving points into vehicles or non-vehicles. Tracked road users are then used for speed estimation and safety analysis. Similar to autonomous driving LiDAR, the roadside LiDAR methods can also be categorized by data representation: point-based, voxel-based, projection-based, and spherical angular-based.
(i) Point-Based Method: Xiao et al. [9] developed nearest-point methods to subtract background points based on the assumption that background points have more neighbors than moving targets within a time window. Zhang et al. [10] developed a roadside vehicle detection and tracking method based on ground plane removal; the processing steps include selection of the region of interest, ground plane point removal, vehicle clustering, and tracking. The data association (DA) method [11] preserves a pure background frame and identifies background points in new frames by comparing them with all points in the reference frame against a predefined threshold (T).
(ii) Voxel-Based Method: the 3D density statistic filtering (3D-DSF) method in [12] was developed to track vehicles using roadside LiDAR sensors for connected vehicle applications. The algorithm includes background filtering, lane identification, and vehicle speed tracking. In the background filtering step, the 3D space is divided into multiple cubes to estimate point density; in the tracking step, the average point is used as the tracking point representing the detected vehicle. Rasterization is another technique used for voxel-based background modeling [13], which stores the voxel cubes in an array format. Zhao et al. [14] implemented the 3D-DSF voxel-based method to detect and track pedestrians and vehicles at intersections; the performance of their model is affected by point density, occlusions, and perspective shadow.
(iii) Projection-Based Method: this method reduces the 3D LiDAR data to a 2D plane to leverage image-based modeling methods at the cost of losing 3D information. Zhang et al. [15] developed an image-based vehicle tracking method for roadside LiDAR after converting the 3D data into 2D images; image registration is exploited to retrieve the transformation parameters.
(iv) Spherical Angular-Based Method: Lee and Coifman [16] developed a vehicle detection and classification method, assuming that the background range at a given angle is constant; the background range is set as the median value at each angle observed over multiple frames. Zhang et al. [17] proposed an automatic background construction and object detection method for roadside LiDAR data in which the background dataset is constructed from the farthest distance at each horizontal-vertical angular value.
By comparing the background dataset with new data at the same horizontal and vertical angular values, object points are extracted and clustered for pedestrian and vehicle detection. Reference [18] created an azimuth-height background filtering method using an azimuth-height table, which compares the height of each point with the height of the background.
Other researchers advanced this area by integrating roadside LiDAR into traffic management. Zhao et al. [19] studied lane- and movement-based traffic volume data collection using infrastructure-based LiDAR under different congestion levels and traffic compositions, covering signalized intersections, pedestrian crossings, work zones, stop-sign intersections, metered/unmetered ramps, and rural highways. Lv et al. [20] proposed a LiDAR-enhanced connected infrastructure solution that collects data on traffic participants using roadside LiDAR and broadcasts the messages through DSRC to enable connected vehicle applications. Reference [21] used LiDAR-detected vehicle trajectories to generate connected vehicle messages through the roadside unit to support connected vehicle applications.
The aforementioned methods generally use basic summaries of aggregated frames, such as maximum value, mean value, distance threshold, gradients, or density, to filter out backgrounds. Most methods lack transferability because their parameters rely mainly on engineers' experience. For instance, the size of the 3D cube has a significant influence on detection performance but is hard to calibrate: smaller cube sizes significantly increase the computational load, while larger cube sizes lower the accuracy. Given these limitations, a robust and intelligent LiDAR background modeling method needs to be developed. To address these challenges, we developed techniques based on pattern decomposition and dynamic clustering that offer scalability and easy maintenance.
1.2. Background Modeling
The background modeling method is the first step of many video surveillance applications to understand video sequences. Each video frame is compared with the background model to identify foreground objects with precise localization information. Bouwmans [22] provides a comprehensive survey. The background modeling method has three main steps: (1) background initialization using the first N frames; (2) classification of pixels into foreground and background; and (3) background model maintenance over time. The survey also identified 13 challenging situations for background modeling: noisy image; camera jitter; automatic camera adjustment; illumination changes; bootstrapping; camouflage; foreground aperture; moved background objects; inserted background objects; dynamic backgrounds; beginning moving object; sleeping foreground object; and shadows. Reference [23] classified background modeling methods into the following categories: basic background modeling; statistical background modeling; fuzzy background modeling; background clustering; neural network background modeling; wavelet background modeling; and background estimation. The basic model uses the mean [24], median [25], or histogram [26] to describe background pixels. The statistical model uses statistical variables such as the Gaussian distribution [27, 28] or kernel density estimation [29] to classify pixels. The fuzzy background model [30, 31] uses a fuzzy running average or a type-2 fuzzy mixture of Gaussians. The background clustering model uses the K-means algorithm [32] or a codebook [33]. The neural network background model [34, 35] trains a set of weights on N clean background frames. The wavelet background model uses the discrete wavelet transform (DWT) [36]. The background estimation model estimates the background with a filter such as a Wiener filter [37], a Kalman filter [38], or a Chebyshev filter [39]. Goyal and Singhai [40] reviewed several Gaussian mixture models for background/foreground detection, conducted a comparative analysis, and analyzed the scope for improving them.
2. Methodology
In this section, two data-driven algorithms, dynamic mode decomposition (DMD) and the coarse-fine triangle algorithm (CFTA), are applied for roadside LiDAR moving object detection and tracking. The first step of the background modeling methods is to transform and reorganize the LiDAR data from raw packets into azimuth, range, and intensity matrices.
2.1. Data Transformation
The LiDAR sensor records the distance relative to itself and an intensity value (which depends on the reflectivity of the object and the wavelength used by the LiDAR). Two types of packets are created: data packets and position packets. The position packets are referred to as GPS packets, and the data packets contain the distance and intensity information. The LiDAR system initially uses a spherical coordinate system, and the spherical data are then transformed into XYZ coordinates. The 3D coordinates of each point are obtained from the spherical measurements as follows:

$$x = r\cos\omega\sin\alpha,\qquad y = r\cos\omega\cos\alpha,\qquad z = r\sin\omega,$$

where $r$ is the measured distance, $\alpha$ is the yaw angle around the z-axis (azimuth), and $\omega$ is the fixed pitch angle (elevation) of each laser emitter.
The testing Velodyne LiDAR sensor generates one data frame after completing a full scan, with a theoretical azimuth resolution of 0.2°. However, the azimuth angles between two consecutive firings often vary and deviate from the theoretical resolution. In this step, the original LiDAR data packet is arranged into 1800 grids based on a 0.2° azimuth resolution. The azimuth value ranges from 0° to 360° and needs to be converted to a grid index between 0 and 1799. A hash function $h(\alpha) = \lfloor \alpha / 0.2^{\circ} \rfloor$ rearranges each LiDAR point into its corresponding grid.
The hash function $h$ can produce collisions, in which case we compare the ranges of the two points that fall into the same azimuth grid. The point with the smaller range is preserved because background points are usually farther away than foreground objects, and we want to keep the most informative data.
As shown in Figure 1, the LiDAR streaming data files store points in Cartesian coordinates together with intensity values. The number of beams and the elevation of each LiDAR beam are fixed for the LiDAR device. Therefore, we regard the elevation as a known factor and do not need to keep a separate set of matrices for elevation data.
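For illustration, the following Python sketch shows one way the coordinate conversion and azimuth hashing described above could be implemented; the function names, array layout, and the 128 × 1800 grid dimensions are our own assumptions rather than the exact implementation used in this study.

```python
import numpy as np

def to_spherical(xyz):
    """Convert N x 3 Cartesian points to (range, azimuth in degrees, elevation in degrees)."""
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    r = np.sqrt(x ** 2 + y ** 2 + z ** 2)
    azimuth = np.degrees(np.arctan2(x, y)) % 360.0            # yaw angle around the z-axis
    elevation = np.degrees(np.arcsin(z / np.maximum(r, 1e-9)))
    return r, azimuth, elevation

def fill_grids(r, azimuth, intensity, beam_id, n_beams=128, n_bins=1800):
    """Hash each point into an elevation-azimuth grid; keep the nearer return on collision."""
    range_mat = np.full((n_beams, n_bins), np.nan)
    inten_mat = np.full((n_beams, n_bins), np.nan)
    grid = (azimuth // 0.2).astype(int) % n_bins              # hash: one bin per 0.2 degrees
    for i in range(len(r)):
        b, a = beam_id[i], grid[i]
        if np.isnan(range_mat[b, a]) or r[i] < range_mat[b, a]:  # smaller range wins
            range_mat[b, a] = r[i]
            inten_mat[b, a] = intensity[i]
    return range_mat, inten_mat
```

The per-beam range and intensity matrices produced this way are the inputs to the background models described next.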

2.2. Dynamic Mode Decomposition
Dynamic mode decomposition (DMD) is a data-driven technique for discovering underlying patterns in high-dimensional data. It was first introduced by Schmid and Sesterhenn [41, 42] to extract dynamic information from flow fields that describes the physical mechanisms captured in a data sequence. The DMD method rests on a mathematical foundation that is readily interpretable using standard dynamical systems techniques. The goal of the DMD method here is to extract the background mode for each channel of the LiDAR; we then use the background mode to match background points and filter out moving objects.
The LiDAR data at each beam can be considered one slice of environmental information per spin. The intensity measurement $\mathbf{x}_k$ at frame $k$ is assumed to relate to the previous intensity measurement $\mathbf{x}_{k-1}$ through a linear operator $A$, i.e., $\mathbf{x}_k \approx A\,\mathbf{x}_{k-1}$, where $A$ is a time-independent operator that reflects the time evolution of each beam's intensity values.
The DMD algorithm is a regression method that estimates $A$ to characterize the intensity changes captured by each frame. The problem is formulated as follows:

$$X_1 = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_{m-1}], \qquad X_2 = [\mathbf{x}_2, \mathbf{x}_3, \ldots, \mathbf{x}_m], \qquad X_2 \approx A X_1,$$

where $X_1$ is called the left intensity matrix and $X_2$ is called the right intensity matrix. $X_2$ has a one-frame offset compared with $X_1$, so $A$ represents the time evolution of the matrix $X_1$. The DMD algorithm seeks the best fit between the two matrices using the linear operator $A$.
To solve for $A$, the problem is converted to the following least-squares problem:

$$A = \underset{A}{\arg\min}\,\lVert X_2 - A X_1 \rVert_F.$$

Using the Moore–Penrose pseudoinverse $X_1^{\dagger}$, we obtain the estimator of $A$:

$$A = X_2 X_1^{\dagger}.$$
The DMD modes that contain the intensity information are the eigenvectors of $A$, and each DMD mode corresponds to an eigenvalue of $A$. By computing the eigendecomposition of the matrix $A$,

$$A W = W \Lambda,$$

we obtain the DMD modes $\Phi$ (the columns of $W$).
The columns of $W$ are the eigenvectors comprising the dominant modes, and $\Lambda$ is the diagonal matrix of eigenvalues $\lambda_j$. The spatial-temporal intensity matrix is reconstructed using the first $r$ modes, where $r \le m$:

$$X \approx \Phi B V,$$

where $\Phi = [\phi_1, \ldots, \phi_r]$ contains the $r$ dominant modes of the spatial-temporal map, $B = \operatorname{diag}(b_1, \ldots, b_r)$ is the matrix of amplitudes, and $V$ is the Vandermonde matrix $V_{jk} = \lambda_j^{\,k-1}$ representing the time evolution of the DMD modes.
An intensity measurement at frame $k$ can be estimated as follows:

$$\mathbf{x}_k \approx \sum_{j=1}^{r} b_j \phi_j \lambda_j^{\,k-1},$$

where $b_j$ is the amplitude, $\phi_j$ is each DMD mode, and $\lambda_j^{\,k-1}$ is the time evolution of each intensity mode.
Letting $k = 1$, we obtain

$$\mathbf{x}_1 \approx \Phi \mathbf{b},$$

so that the matrix $B$ can be estimated as a least-squares problem, $\mathbf{b} = \Phi^{\dagger}\mathbf{x}_1$, using the first scanline as the initial state.
Any DMD mode that does not change in time will have $\lvert \lambda_j \rvert \approx 1$, that is, a temporal frequency $\omega_j = \ln(\lambda_j)/\Delta t \approx 0$; such modes form the background of the intensity diagram.
In the intensity diagram, the intensity values of the background are highly correlated from one column vector to the next, suggesting a low-rank structure. The DMD algorithm separates background and foreground by decomposing the intensity diagram into low-rank (background) and sparse (foreground) components [43]:

$$X \approx \underbrace{b_p \phi_p \lambda_p^{\,\mathbf{t}}}_{\text{low-rank background},\ \lvert\lambda_p\rvert \approx 1} \;+\; \underbrace{\sum_{j \ne p} b_j \phi_j \lambda_j^{\,\mathbf{t}}}_{\text{sparse foreground}},$$

where $\mathbf{t} = [0, 1, \ldots, m-1]$ is the data frame sequence and $p$ indexes the background mode.
The separation results are shown in Figure 2. The y-axis shows azimuth units of the LiDAR beam, and the x-axis is the accumulated data frame. The left figure is the original intensity image, the middle image is the background that is time-independent, and the right figure is the foreground moving objects. After obtaining the background intensity modes for all channels, we can use the background intensity value as a filter to detect the moving objects.

The DMD method will be applied to each beam to build a background filter, which will be used to separate moving objects from background objects.
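For concreteness, a minimal sketch of the per-beam background/foreground separation is given below. It uses the SVD-based exact-DMD formulation, which is a standard way to compute the modes defined above; the rank truncation, the near-zero-frequency test for the background mode, and all function names are illustrative assumptions, not the exact implementation used in this study.

```python
import numpy as np

def dmd_background(X, r=10, dt=1.0, eps=1e-2):
    """Split an azimuth-by-frame intensity matrix X into a low-rank background
    and a sparse foreground using rank-r exact DMD."""
    X1, X2 = X[:, :-1], X[:, 1:]                      # left / right intensity matrices
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, Vh = U[:, :r], s[:r], Vh[:r, :]             # rank-r truncation
    Atilde = U.conj().T @ X2 @ Vh.conj().T @ np.diag(1.0 / s)
    lam, W = np.linalg.eig(Atilde)                    # eigenvalues / eigenvectors
    Phi = X2 @ Vh.conj().T @ np.diag(1.0 / s) @ W     # DMD modes
    omega = np.log(lam.astype(complex)) / dt          # continuous-time frequencies
    b = np.linalg.lstsq(Phi, X[:, 0], rcond=None)[0]  # amplitudes from the first frame
    t = np.arange(X.shape[1])
    dynamics = b[:, None] * np.exp(omega[:, None] * t[None, :])
    bg = np.abs(omega) < eps                          # modes that barely change in time
    background = (Phi[:, bg] @ dynamics[bg]).real     # low-rank background
    foreground = X - background                       # sparse residual = moving objects
    return background, foreground
```

In this sketch the background is rebuilt only from modes whose temporal frequency is close to zero, mirroring the $\lvert\lambda_p\rvert \approx 1$ condition stated above.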
2.3. Coarse-Fine Triangle Algorithm (CFTA)
The triangle algorithm is a dynamic clustering method based on histogram analysis, built on the observation that the static infrastructure background is the farthest object hit by the laser beams. The first step is to construct a histogram of range versus frequency for each elevation-azimuth unit. Figure 3 was generated after accumulating 4000 frames from the testing LiDAR, using the range information of Beam ID 90 at Azimuth Grid 80, which was randomly selected. As shown in the histogram, the background range value is around 19 meters, while the foreground moving objects' ranges are about 15 meters; the background/foreground ratio is about 7 : 1 under heavy traffic conditions. A line is drawn between the highest peak of the histogram and the minimum-range bin (for LiDAR data, the minimum-range bin is by default at distance 0, because an emitted laser pulse either hits an object and returns a positive range value or never returns). The algorithm then calculates the point-to-line distance for each histogram bin between these two endpoints, repeating until all bins have been visited. The threshold becomes the range value at the bin edge for which this distance is maximized. For the triangle algorithm, the only concern is the bin size of the histogram, which must ensure that the background counts fall into the same bin while remaining separable from moving objects. We developed a coarse-fine triangle algorithm (CFTA) to automate the bin size selection: in the coarse step, the algorithm estimates where the highest peak is; in the fine step, outliers are removed and a new bin size is determined to find the threshold value.

The triangle thresholding aims to classify point clouds into either background points or moving objects based on the range information. The CFTA can automatically select the threshold range values for moving vehicle detection. The method rests on two assumptions: (1) the static background objects account for most of the observations in the LiDAR point clouds, and (2) the background points have the farthest distance and are distributed normally with standard measurement errors. This technique is particularly effective when the background point clouds produce a dominant peak in the histogram [44]. Figure 4(a) illustrates the situation in which the laser beam reaches the farthest object in the environment, which typically becomes the background; Figure 4(b) illustrates the situation in which the laser beam hits a passenger car, which is our target for the segmentation task.
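The sketch below illustrates the triangle-thresholding idea on the accumulated range samples of a single elevation-azimuth cell; the bin sizes, the outlier-removal rule, and the two-pass coarse/fine structure are illustrative assumptions rather than the exact parameters used in this study.

```python
import numpy as np

def triangle_threshold(ranges, bin_size):
    """Triangle thresholding on a range histogram: return the range value that splits
    foreground (near) returns from the dominant background peak (far)."""
    edges = np.arange(0.0, ranges.max() + 2 * bin_size, bin_size)
    hist, edges = np.histogram(ranges, bins=edges)
    peak = int(np.argmax(hist))                      # dominant (background) bin
    x1, y1 = float(peak), float(hist[peak])          # line from the empty bin at range 0 to the peak
    idx = np.arange(peak + 1)
    # perpendicular distance from each bin (idx, count) to that line
    d = np.abs(y1 * idx - x1 * hist[:peak + 1]) / np.hypot(x1, y1)
    split = int(np.argmax(d))                        # bin where the distance is maximal
    return edges[split + 1]                          # bin edge used as the range threshold

def cfta(ranges, coarse_bin=2.0, fine_bin=0.2):
    """Coarse pass locates the background peak; fine pass drops far outliers and
    re-runs the triangle step with a smaller bin size."""
    hist, edges = np.histogram(ranges, bins=np.arange(0.0, ranges.max() + coarse_bin, coarse_bin))
    peak_hi = edges[np.argmax(hist) + 1]             # upper edge of the coarse background bin
    kept = ranges[ranges <= peak_hi + coarse_bin]    # drop spurious returns beyond the background
    return triangle_threshold(kept, bin_size=fine_bin)
```

A new frame's points in that cell would then be labeled as moving objects when their range falls below the returned threshold.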

2.4. Experimental Design
This data collection tests intelligent transportation system infrastructure at intersections with immediate data collection and analysis capabilities. The project aims to test innovative sensing and detection methods to evaluate and monitor signal performance in real time. The outcome of the project will be used to improve intersection safety, reduce congestion, and reduce environmental impact. The testing site is on a key New Jersey arterial corridor at US-1 & Bakers Basin, where data were collected on October 20, 2021, from 3 to 6 pm. The 3-hour dataset includes high-resolution GoPro video, 128-beam Velodyne Alpha Prime LiDAR data, connected vehicle SPaT, and MAP data. The camera was mounted on a roadside pole, and the LiDAR sensor was mounted on a tripod on the walkway, powered by high-capacity batteries and a solar panel. The GoPro generated 80 GB of data at 60 frames per second during the three-hour period, and the Alpha Prime Velodyne LiDAR generated 70 GB of data at 10 Hz, so the LiDAR's storage requirement is comparable to, and somewhat smaller than, that of the video data.
Figure 5 displays the experimental setup, showing the coverage of the video and LiDAR detection at the same timestamp. Vehicle detection and tracking on the video are processed using YOLOv5 and DeepSORT for comparative analysis. Although the LiDAR sensor was installed at a height of only 1.7 meters, it already provides a wide range of coverage and holistic 3D measurements of the surrounding environment. The camera, installed at a height of about 5 meters, still struggles to cover the entire intersection area from all directions. Compared with the camera detector, the LiDAR sensor shows excellent potential and will play a significant role as an intelligent infrastructure solution in the coming years.

2.5. Model Evaluation
In this section, we break the entire solution into sequential steps and examine the model results in detail. The overall workflow is shown in Figure 6. The ROI filter, noise removal, clustering, and bounding box detector are considered general approaches, and the tracking module was implemented with fine-tuned parameters from off-the-shelf packages. The two new algorithms, DMD intensity background subtraction and CFTA range background subtraction, are integrated as one module. The extracted vehicle movements can be applied to many mobility or safety applications; for example, the vehicle counts for each turning movement could be used in signal optimization to assess whether the phase split is efficient.

2.6. ROI Filter
As the infrastructure LiDAR is static, accurate GPS coordinates can be obtained in practice. Therefore, the non-drivable space within the monitored area can be removed using geofencing methods. In Figure 7, the raw LiDAR data are filtered by projecting all points onto the X-o-Y plane and applying a binary mask.
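As a simple illustration, such a geofence can be realized as a point-in-polygon test on the X-o-Y projection; the polygon vertices below are placeholders, since the real drivable-area boundary would come from the surveyed GPS coordinates.

```python
import numpy as np
from matplotlib.path import Path

def roi_filter(points_xyz, drivable_polygon_xy):
    """Keep only points whose X-o-Y projection lies inside the drivable-area polygon."""
    mask = Path(drivable_polygon_xy).contains_points(points_xyz[:, :2])
    return points_xyz[mask]

# Hypothetical geofence polygon (meters, LiDAR-centered)
drivable_area = np.array([[-40.0, -15.0], [40.0, -15.0], [40.0, 15.0], [-40.0, 15.0]])
# filtered_points = roi_filter(frame_points, drivable_area)
```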

2.7. Background Subtraction
The background subtraction methods are performed directly in spherical coordinates, the original coordinates of the data collected by the LiDAR sensor. This saves the computation of converting spherical data to Cartesian coordinates within the sensor chip. Combining intensity and range information provides redundancy for background filtering. With efficient background subtraction, more than 90% of the data can be eliminated. Using the spherical coordinate system could therefore significantly improve the efficiency of LiDAR point cloud acquisition and transmission.
2.8. Noise Removal and Clustering
The noise removal is based on the local outlier factor (LOF) algorithm. A point is considered noise if its k-distance, the distance between the point and its k-th nearest neighbor, exceeds a threshold, indicating that the point is isolated from its neighbors. The point cloud clustering step is also distance-based: it segments all point cloud data into clusters and returns a cluster label for every point.
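A minimal sketch of this step is shown below, using a k-distance test built on scikit-learn's NearestNeighbors for the noise filter and DBSCAN as a stand-in for the distance-based clustering; all parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import DBSCAN

def denoise_and_cluster(points_xyz, k=4, kdist_thresh=0.8, eps=1.0, min_samples=15):
    """Drop isolated points via a k-distance test, then cluster the remaining cloud."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(points_xyz)   # +1 accounts for the point itself
    dists, _ = nn.kneighbors(points_xyz)
    keep = dists[:, -1] < kdist_thresh           # small k-distance -> dense region -> keep
    clean = points_xyz[keep]
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(clean)
    return clean, labels                         # label -1 marks unclustered points
```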
2.9. Bounding Box Detector and Tracking
After clustering, we fit a bounding box to each cluster containing more than a minimum threshold number of points. The detected object is then encoded into a state-space model that contains the object's measurements and state transition (speed in the x, y, and z dimensions, and turning rate). A joint probabilistic data association (JPDA) tracker with an interacting multiple model (IMM) filter updates the tracked list of objects at each frame. Figure 8 presents the model outputs after all steps, showing the vehicle detection and tracking results for three phases of the signalized intersection. In the first column, the foreground moving vehicles are colored green and the background LiDAR point clouds are colored purple. The middle column contains the tracking module outputs, where the red boxes are detected objects from the detector module and the green boxes are confirmed tracks with sufficient confidence based on the tracking history.
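JPDA and IMM trackers are available in off-the-shelf toolboxes; the sketch below only illustrates how a detected bounding-box centroid might be encoded into a simple constant-velocity state-space model with a single Kalman predict/update step. The noise settings are illustrative, and this is not the full JPDA-IMM pipeline used in the study.

```python
import numpy as np

DT = 0.1  # frame interval for a 10 Hz LiDAR

# State: [x, y, z, vx, vy, vz]; measurement: bounding-box centroid [x, y, z]
F = np.eye(6)
F[0, 3] = F[1, 4] = F[2, 5] = DT                 # constant-velocity state transition
H = np.hstack([np.eye(3), np.zeros((3, 3))])     # position-only measurement model
Q = 0.1 * np.eye(6)                              # process noise (illustrative)
R = 0.2 * np.eye(3)                              # measurement noise (illustrative)

def kalman_step(x, P, z):
    """One predict/update cycle for a tracked object's centroid."""
    x, P = F @ x, F @ P @ F.T + Q                # predict the next state
    S = H @ P @ H.T + R                          # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)               # Kalman gain
    x = x + K @ (z - H @ x)                      # update with the new detection
    P = (np.eye(6) - K @ H) @ P
    return x, P
```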

We also present the video detections at the same timestamp as the LiDAR data, using YOLOv5 trained on the COCO dataset and DeepSORT for real-time vehicle detection and tracking. The LiDAR sensor provides a broader field of view than the GoPro camera. For phase B, the pretrained deep learning model missed three vehicles passing the intersection because of the streetlight pole, whereas the pole has little effect on LiDAR detection. The proposed LiDAR detection model showed excellent reliability and was comparable to one of the most advanced video detectors.
2.10. Path-Level Evaluation
Movement counting is an essential input for optimizing the timing parameters of a signalized intersection. Figure 9 shows LiDAR-detected vehicle trajectories grouped by traveling path in different colors. The second panel of the figure shows vehicle detection and tracking results from a commercial AI data collection platform [45], which generates traffic count reports at 15-minute intervals with more than 95% accuracy.

Table 1 presents the vehicle movement counts for all four directions; the overall counting accuracy is 94.74%. As roadside LiDAR traffic detection is still an emerging application, no previous research has validated roadside LiDAR traffic detection against a commercial AI-based video detection platform. This movement count test assesses the background subtraction performance as well as the detection and tracking modules. The main source of counting errors is that vehicles in the lanes farther from the LiDAR sensor are often blocked by nearby vehicles on other paths. The resulting blind zones in the LiDAR point clouds pose significant challenges to the tracking module because of intermittent vehicle presence and partial occlusions.
2.11. Point-Level Evaluation
Table 2 reports the segmentation results at the point level compared with the SOTA method [17]. In the training process, the baseline model accumulated 2000 frames and applied mean and max values to remove foreground points; the preserved background points are stored and used to judge whether new points belong to moving objects by comparison with the reference points. We then randomly selected data frames and manually performed background removal to generate ground truth data. By comparing model-filtered points with ground truth points, we can determine whether each detected point is a true positive, false positive, or false negative. We report three common classification metrics: precision, recall, and F1 score. The precision score tells us what percentage of the detected points is correct; the recall score tells us what percentage of the true foreground points is detected; and the F1 score is the harmonic mean of precision and recall, balancing the two measurements.
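The point-level metrics can be computed from per-point foreground labels as in the short sketch below; the array names are placeholders for the ground-truth and predicted boolean masks.

```python
import numpy as np

def point_level_scores(gt_foreground, pred_foreground):
    """Precision, recall, and F1 over per-point foreground labels (boolean arrays)."""
    tp = np.sum(gt_foreground & pred_foreground)
    fp = np.sum(~gt_foreground & pred_foreground)
    fn = np.sum(gt_foreground & ~pred_foreground)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```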
Table 2 shows that our new method has the best precision at both evaluation ranges. The high precision and relatively low recall scores suggest that our model is more conservative than the baseline model. Overall, our approach surpassed the SOTA baseline in 4 of the 6 evaluation categories. The reference model takes 2.98 seconds to process one frame because it compares new data points with the accumulated background points, whereas our method runs at 0.26 seconds per frame, roughly an order of magnitude faster. Figure 10(a) shows the manually processed ground truth data; Figures 10(b) and 10(c) show our model's detection results and the baseline model's detection results, respectively. From Figure 10, we can see that the reference model tends to preserve more background points than our proposed method.

3. Discussion
Roadside LiDAR data have different characteristics than mobile LiDAR data. First, most of the point cloud in the roadside LiDAR model is static background, while the mobile LiDAR point cloud mainly captures a changing environment. Second, as the distance between road users and the LiDAR sensor increases, the gaps between laser beams widen, resulting in more unseen area and fewer rings on detected objects. Third, roadside LiDAR is usually installed at an elevated location to monitor a large area, while autonomous driving LiDAR mainly scans side-by-side vehicles. Figure 11 presents experimental results using a PointPillars [3] deep neural network trained on PandaSet [46], which contains 2560 preprocessed, organized LiDAR scans of various driving scenes; the dataset provides 3D bounding box labels for object classes including car, truck, and pedestrian. As shown in Figure 11, PointPillars attains effective results on the mobile LiDAR dataset (Table 3). When the confidence threshold was set to the default value of 0.5, the pretrained deep learning model generated zero detections, which is not applicable to our roadside LiDAR purpose (Figure 11(b)). After lowering the confidence level to 0.3, the pretrained model gave only a few disoriented detections (Figure 11(c)). The autonomous driving training data were generated with a 64-beam device, while the roadside LiDAR has 128 beams. Another observation is that autonomous driving LiDAR datasets are biased toward nearby vehicles traveling in the same direction, while the roadside LiDAR dataset contains vehicles approaching from any direction.

The high-resolution point cloud data will support the next-generation research on 3D big data sensing and analytics by creating the digital twin of infrastructure systems in a holistic 3D environment. The roadside LiDAR object detection could be used to explore many underlying scientific problems, including transportation, infrastructure, energy, public service, and human activity systems and their interactions. To examine the model performance on different scenarios, the method was also implemented in an urban environment as part of the Middlesex County Smart Mobility Testing Ground (SMTG) to establish a living laboratory for smart mobility and smart city technology research. Our model was further tested at an intersection in downtown New Brunswick, New Jersey. The proposed method can adapt effortlessly to a new scenario with less than a couple of hours for preparation and recalibration. Figure 12 shows vehicle detection results from the portable setting and the urban scenario with a permanent power supply and communication cables. The animated visualizations of over 1000 frames can be found in this project’s public repository [47].

4. Concluding Remarks
In this study, we developed a novel background subtraction method with unsupervised learning algorithms for infrastructure LiDAR object detection and tracking. The main contributions of this study to the existing literature are summarized as follows:
(1) Our method integrates range and intensity information for point cloud object detection, and each channel can also be used independently. As a result, the method can remove 90% of redundant background points and increase data acquisition efficiency.
(2) Instead of converting the point clouds into 3D voxels, our method transforms the unstructured point clouds into a structured representation that can be processed by a 3D object detector with reduced dimensions. With proper data transformation, we bridge the gap between image-based background modeling and point cloud background modeling, making a rich body of well-studied image-based techniques applicable to LiDAR data.
(3) The proposed methods are built on unsupervised learning that automatically discovers structure in the data. The two algorithms require very few parameters, which makes autocalibration and deployment more robust and easier. For the intensity-based algorithm, the only parameter is the intensity threshold that separates the sparse foreground from the low-rank background modes; the coarse-fine triangle algorithm is even better as a parameter-free algorithm.
(4) Compared with deep learning-based LiDAR object detection methods, our method is more interpretable and more easily extensible. It does not require large amounts of training data, sophisticated network design, or GPU support.
(5) Compared with the SOTA roadside LiDAR background modeling methods, our method runs faster with better performance, as evidenced by the point-level assessment. In addition, the proposed background model is easy to maintain.
LiDAR-based 3D object detection is a challenging task, which requires high detection performance and fast inference. The occlusion issue is the main challenge of infrastructure LiDAR detection and tracking due to blind zones and increased monitoring area for roadside application, making the inherently sparse point clouds even more complicated to process. A potential solution is to stitch multiple point clouds together to increase data points or fuse information from different sensors.
Data Availability
The data are available upon reasonable request [47].
Disclosure
A preprint version of this article can be found in [48].
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was funded by the New Jersey DOT Real-Time Signal Performance Measures (project no. 2016–14) and New Brunswick Innovation Hub Smart Mobility Testing Ground (SMTG), contract no. 21–60168.