Abstract
Vision impairment is a major challenge faced by a large portion of humanity throughout the world. Affected people find it extremely difficult to navigate independently and detect obstacles. Thus, a potential solution for accurately detecting obstacles requires an integrated deployment of the Internet of Things and predictive analytics. This research introduces “Vision Navigator,” a novel framework for assisting visually impaired users in obstacle analysis and tracking so that they can move independently. An intelligent stick named “Smart-fold Cane” and sensor-equipped shoes called “Smart-alert Walker” are the main constituents of the proposed model. For object detection and classification, the stick uses a single-shot detection (SSD) mechanism, followed by sentence framing using a recurrent neural network (RNN) model. Smart-alert Walker is a lightweight shoe that acts as an emergency unit, notifying the user of any obstacle within a short distance range. This intelligent obstacle detection model using the SSD-RNN approach was deployed in real time and its performance was validated in indoor and outdoor environments. The SSD-RNN model achieved accuracies of 95.06% and 87.68% indoors and outdoors, respectively. The model was also evaluated with respect to users’ distance from obstacles, attaining accuracy rates of 96.4% and 86.8% for close and distant obstacles, respectively, and outperforming other models. The execution time of the SSD-RNN model was 4.82 s with the highest mean accuracy rate of 95.54% over all common obstacles.
1. Introduction
In an extensive worldwide survey, approximately 940 million people were found to have some level of vision-related issues. Approximately 240 million of them suffered from very low vision, while approximately 39 million had complete vision loss [1]. Vision impairment, which restricts the ability to perceive the surroundings and is seldom curable, has become a major global concern. Difficulties in normal movement, perceiving and detecting surrounding objects, and proper indoor and outdoor navigation are some of the basic complexities faced by affected people. In particular, perceiving and identifying obstacles, and reacting to them in real time, is a challenge for them. These vision issues prevail mainly in underdeveloped nations, where affected individuals are often unable to afford the latest assistive devices, which are expensive. The problem is particularly prevalent among older people in densely populated underdeveloped countries [2]. Cataracts, refractive issues, glaucoma, retinal problems, and age-related eye disorders are some common factors that lead to vision impairment [3, 4].
Navigation becomes a major concern for these people in unfamiliar environments. In addition to medical help, affected individuals rely on services such as awareness campaigns, regular rehabilitation programs, and social inclusion initiatives. Many people use a white cane, whose length determines the reach of touch-based sensing. However, its use is limited during travel, and its lack of flexibility leads to cracks with regular use. In many scenarios, people use guide dogs to assist in walking. These dogs alert the user to potential obstacles in the way; however, they may not always give accurate directions in crowded and complicated surroundings. GPS-enabled devices are also used by many people as assistive tools. These devices act as a navigation and orientation interface to locate desired areas. Although they are effective in locating specific regions, they are not precise in identifying and avoiding obstacles. In many cases, an echolocation technique is used by visually disabled communities [5], in which sound echoes of mouth clicks are used to detect obstacles ahead. A short detection range and leakage of data are the two main limitations of this approach. Other obstacle detection approaches, based on Quick Response codes and bar codes, are used to recognize various types of obstacles in crowded areas [6–8]. However, these approaches need a technologically advanced infrastructure and the support of a third person to accomplish tasks. Similarly, many devices exist to help users with visual disabilities, but most of them either detect obstacles through machine vision alone or rely on sensory modules such as GPS and distance sensors. With recent advancements in science and technology, the lives of people with vision impairments can be improved and navigation can be made easier and more effective [9, 10]. Effective integration of sensor methodologies with computational vision and image processing can help in developing a robust, real-time, cost-effective, and reliable model to assist users with vision impairments, thereby making them aware of any potential danger. The development of innovative modern technologies, such as the Internet of Things (IoT) and predictive analytics, has opened up possibilities for providing an interactive system that helps a person with vision concerns navigate independently in all environments. Such a system should be able to process data quickly, have a wide coverage area with enhanced detection of static and dynamic obstacles, and operate indoors and outdoors, depending on the needs of the user.
The major contributions of this article are as follows:
(1) This research addresses the obstacle identification issue for users with major visual concerns and proposes a smart, intelligent obstacle recognition framework for them. A novel obstacle detection model named “Vision Navigator” is designed, which can assist users with vision concerns in detecting and recognizing different types of obstacles while navigating indoors and outdoors.
(2) The integrated framework comprises an intelligent folded stick termed “Smart-fold Cane” and a pair of lightweight sneakers with two built-in ultrasonic sensors called “Smart-alert Walker.” With the help of the Smart-fold Cane, the distance between the user and obstacles can be conveniently computed, and water bodies on the way can be identified. In addition, detected obstacle images can be captured and classified, with the user notified quickly.
(3) The second component, the Smart-alert Walker, is a pair of lightweight sneakers equipped with two ultrasonic sensors located at the front and left end of the sneakers. The sensors form an emergency module that determines the presence of any potential obstacle within a short range so that the user is instantly informed of the danger and can avoid a mishap.
(4) The Smart-fold Cane and Smart-alert Walker together form the pillars of the proposed assistive obstacle detection model.
(5) The model was deployed in real time and its outcome was promising. The combined hybrid SSD-RNN model was able to recognize, classify, and notify users about the presence of obstacles with excellent accuracy. Figure 1 depicts a skeleton model and the real-time usage of the proposed model.

This paper is organized as follows. Section 1 introduces the domain and addresses the importance of a new, efficient model for obstacle detection for people with vision concerns. Section 2 discusses the crucial background studies and related work by different researchers in the domain. Section 3 highlights the proposed working model in detail with the components and its functionalities discussed. Section 4 provides the results of deploying the model in a real-time scenario and analyzes the outcomes. Section 5 concludes and emphasizes the major inferences of the research.
2. Related Work
Detecting and carefully avoiding an obstacle at the right time is critical for visually impaired individuals. With the rise of modern technologies, a variety of working models have been developed and presented in this domain [11]. Many innovations have been proposed during the last two decades to help visually affected people move effectively around their surroundings, and numerous system models have appeared on the market, some of them based on wireless technologies. Visually impaired people face many challenges in their daily lives, but most of these challenges have been addressed through technological advancements [12]. This section reviews the relevant work in the domain. The overall literature survey is partitioned into three subcategories, which are discussed below.
2.1. Sensor-Based Existing Works
This subsection presents some significant existing works based on smart sensors. The use of modern technologies for assisting users with vision impairments has received positive reviews, including wearables [13], electronic traveling support [14], mobile assisting tools [15], machine vision–enabled models [16], haptics usage and design [17], substituting sensory equipment [18], and electronic navigation help [19]. A multi-sensor smart cane and obstacle detection device for visually impaired people has been developed using model-based state-feedback control. In Ref. [20], a white cane system based on IC tags was designed to help visually disabled people walk comfortably in indoor areas. The authors of Ref. [21] proposed a system for determining the time difference of arrival (TDOA) using various signaling techniques. In Ref. [22], the authors determined an object’s indoor position by minimizing a nonlinear cost function, as in least-squares algorithms. Other algorithms for calculating an object’s indoor position include residual weighting and nearest neighbor, which assess the location relative to reference points or base station coordinates [23]. Santos [24] proposed a module that uses a smartphone and an embedded device to communicate with public transit through wireless channels. Mutriara et al. [25] described a tool that uses a GPS module to announce the position of a building that visually impaired users want to enter. Rehabilitative shoes and spectacles constituted another method, introduced by Abu-Faraj et al. [26], in which obstacles in front of users are sensed using ultrasonic transducers. This method was used to determine the thickness of obstacles as well as the presence of any potholes in front of users. In 2018, Hu et al. proposed a method that relied on link models and assigned an equivalent amount of work based on their characteristics [27]. This method eliminates repetition and ensures precision to predetermined levels, and it uses symbols instead of words because the objects are aligned in a 2D scale ratio. Al-Shehabi et al. [28] created a wearable navigation aid to help visually affected people reach their desired destination in a new environment. This module contains a Kinect sensor, a tablet PC, a microcontroller, an IMU, sensors, and vibration actuators for orienting the user in the next direction.
2.2. Computer Vision-Based Existing Works
In this subsection, some vital computer vision–driven relevant works are discussed. In 2005, Chen and Yuille [29] published a cascaded model whose main purpose was to balance time complexity and accuracy over the various tests selected by a greedy method; the model uses an algorithm that detects text through a cascade of classifiers applied to images. In 2015, Wei et al. published a work in which the best performance for single-label image classification is achieved using a convolutional neural network (CNN) model [30]. In 2016, Zhang et al. presented a model for detecting patterns in urban settings such as public streets and restaurants, as well as rainy environments [31]; the method categorized audio recordings to obtain these patterns. In 2015, Mekhalfi et al. [32] published a compressive sensing approach for vision-affected users, in which objects were detected using a camera in various indoor spaces. The study used a multi-labeling strategy based on Euclidean distance and a Gaussian method and checked for the presence of a variety of objects in the collected data. In Ref. [33], the authors provided a survey on electronic travel aids (ETAs) for visually impaired navigation assistance; various ETAs were discussed and compared in terms of their advantages and disadvantages. For activity recognition, a novel deep architecture for visually disabled people, using a late fusion of two parallel CNNs and outperforming state-of-the-art methods, was discussed [34]. Another approach proposed in Ref. [35] used a CNN for object detection, followed by a recurrent neural network (RNN) and softmax classifier with intensity color thresholding for color recognition. In Ref. [36], researchers presented an outdoor navigation assistant for visually impaired users, combining machine vision and deep-learning techniques. The framework tracked objects without prior knowledge using a regression-based mechanism, handled sudden camera movements, and used You Only Look Once (YOLO) for object recognition. In Ref. [37], a mobile app was designed to assist visually disabled people. It has two modes, offline and online, depending on the user’s network access. Faster R-CNN and YOLO are used in the online mode to produce predictions under stable conditions; in the offline mode, a feature recognition module based on Haar features and histograms of oriented gradients serves this function. The ImageNet dataset [38] has been used to develop a CNN for pretrained object recognition. Rajput et al. created a smart obstacle detector device to assist blind people in carrying out their tasks more conveniently and comfortably, since a generic cane-based detection system was ineffective and hindered their movement [39]. With the aid of a camera, the proposed device used video processing to detect objects efficiently and quickly. Visual navigation aids were also developed by Thomas et al. [40]. To provide visual cues, this navigation aid was equipped with a wearable computer system with a see-through monitor, a digital compass, and differential GPS.
2.3. Semantic Segmentation-Based Existing Works
Pixel-by-pixel semantic segmentation is an effective method for detecting and identifying many classes of objects at the same time, and various methods based on it have been used to assist visually impaired people. Deep learning pipelines have spurred the growth of semantic segmentation. Fully convolutional networks (FCNs), which convert CNNs originally created for classification into fully convolutional architectures that produce pixel-wise classification outputs, are a significant part of the literature. Another groundbreaking deep CNN architecture with a topologically symmetrical encoder-decoder design is SegNet. Instead of keeping all feature maps, SegNet up-samples the corresponding feature maps for the decoder using max-pooling indexes obtained from the encoder, drastically reducing memory and computational costs. ENet was offered as an efficient option for real-time semantic segmentation. ENet was built from bottleneck modules, inspired by ResNet, that can be used for either down-sampling or up-sampling images. Unlike SegNet, ENet has a larger encoder than its decoder because it is thought that the initial network layers should not immediately contribute to classification. ERFNet was created with the goal of improving the accuracy/efficiency trade-off and making CNN-based segmentation suitable for existing embedded hardware platforms. SQNet used parallel dilated convolutions and fused them as an element-wise sum to merge low-level knowledge from lower layers of the encoder, which helped categorize object outlines more precisely. By linking the encoder and the accompanying decoder, LinkNet attempted to obtain precise instance-level prediction without sacrificing processing time. In terms of pixel-exact categorization of tiny features, these architectures have outperformed ENet. PSPNet advocated using a decoder with pooling layers of various sizes for large-scale scene-parsing tasks in order to capture varying amounts of context in the final layers.
3. Materials and Methodology
At present, many people with vision-related concerns face challenges in navigating inside or outside their homes. A portable, reliable smart device with intelligent functionality can therefore help them navigate independently and move freely.
3.1. System Requirements
Developing a smart, intelligent model for users with vision concerns requires the systematic integration of various constituents. Table 1 highlights the vital hardware and software components used in configuring the model.
Important constituents of the Vision Navigator include an Arduino board, an audio module, camera modules, a water sensor, an ultrasonic sensor, a push button, and battery units. They are illustrated in a sample circuit diagram in Figure 2.

3.2. Datasets Used
The proposed system requires input data to build its models. The following datasets are used.
3.2.1. MS COCO
Microsoft Common Objects in Context (MS COCO) is an object detection dataset with 80 classes, 80,000 training images, and 40,000 validation images. The dataset was created to advance the state of the art in object recognition by posing detection tasks over complex everyday scenes of common objects captured in their natural context. Objects are labeled at the instance level, which helps locate them precisely. The Single Shot MultiBox Detector (SSD) model uses this dataset for obstacle detection [41].
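For reference, data of this kind can be loaded through torchvision’s COCO wrapper. The sketch below is illustrative only; the local paths are placeholders, and the pycocotools package is assumed to be installed.

```python
from torchvision.datasets import CocoDetection
from torchvision.transforms.functional import to_tensor

# Placeholder paths to a local MS COCO copy; adjust to the actual dataset location.
train_set = CocoDetection(
    root="coco/train2014",
    annFile="coco/annotations/instances_train2014.json",
    transform=to_tensor,
)

image, annotations = train_set[0]  # annotations: list of dicts with 'bbox' and 'category_id'
```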
3.2.2. Flickr30k
The Flickr30k dataset is a widely used benchmark for phrase-based image description; its Flickr30k Entities extension augments the roughly 158k captions, and each image is paired with several different captions. The RNN model uses this dataset for suitable sentence framing from detected obstacle images [42].
3.3. SSD-RNN for Obstacle Recognition
In this research, an SSD algorithm is used to detect and classify obstacles in the path of a person with vision concerns from camera-captured images. An RNN is then used to generate an appropriate sentence for the obstacle detected by the SSD, which is further communicated to the user.
Using MultiBox [43], the SSD takes only one shot to detect multiple objects present in an image. SSD is a substantially faster object detection algorithm with high accuracy. SSD’s high speed and precision when working with low-resolution images can be attributed to the following factors:
(i) No separate bounding-box proposal stage is required.
(ii) Progressively smaller convolutional filters predict object categories and offsets in bounding-box locations.
(iii) Multiple boxes (filters) of various sizes and aspect ratios are used for object detection.
We add additional convolutional layers for detection on top of the base VGG network. The scale of the convolutional layers at the end of the base network decreases gradually, aiding the detection of objects at multiple scales. Figure 3 shows a simple SSD model for obstacle detection.

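As a rough illustration of this detection step (a sketch, not the authors’ exact implementation), a COCO-pretrained SSD with a VGG-16 backbone can be run through torchvision; the image path and the 0.5 confidence threshold are assumptions.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Load a COCO-pretrained SSD300 with a VGG-16 backbone (torchvision >= 0.13).
model = torchvision.models.detection.ssd300_vgg16(weights="DEFAULT")
model.eval()

image = Image.open("frame_from_cane_camera.jpg").convert("RGB")  # hypothetical camera frame
with torch.no_grad():
    prediction = model([to_tensor(image)])[0]  # dict with 'boxes', 'labels', 'scores'

for box, label, score in zip(prediction["boxes"], prediction["labels"], prediction["scores"]):
    if score > 0.5:  # keep confident detections only
        print(f"class {int(label)} at {box.tolist()} (confidence {score:.2f})")
```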
3.3.1. Training of SSD
An input image with ground-truth bounding boxes for each object in the image is fed to the SSD. Feature extraction is based on the VGG-16 base network. Convolutional layers evaluate default boxes of various aspect ratios at each position in several feature maps of different scales. Several default boxes of various sizes and aspect ratios are tiled across the entire image, which helps identify the default box that most strongly matches the ground-truth bounding box containing an object.
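For intuition, a minimal sketch of how default boxes for one feature-map scale might be generated is shown below; the feature-map size, scale, and aspect ratios are illustrative assumptions rather than the exact SSD configuration.

```python
import itertools

def default_boxes(feature_map_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Generate (cx, cy, w, h) default boxes, normalized to [0, 1],
    centered on every cell of a square feature map."""
    boxes = []
    for i, j in itertools.product(range(feature_map_size), repeat=2):
        cx = (j + 0.5) / feature_map_size  # box center, x
        cy = (i + 0.5) / feature_map_size  # box center, y
        for ar in aspect_ratios:
            boxes.append((cx, cy, scale * ar ** 0.5, scale / ar ** 0.5))
    return boxes

# Example: a coarse 5x5 feature map whose boxes cover about 40% of the image side.
boxes = default_boxes(feature_map_size=5, scale=0.4)
```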
3.3.2. Matching Strategy
The default boxes are matched to the ground-truth boxes during training in terms of aspect ratio, position, and size. The boxes that overlap most with the ground-truth bounding boxes are chosen: the predicted box and the ground truth should have an intersection over union (IoU) greater than 0.5, and MultiBox chooses the predicted box with the greatest ground-truth overlap (a minimal matching sketch follows the list below). Each prediction is made up of the following parts:
(i) The offsets from the default box’s center, together with the height and width of the box.
(ii) Confidence scores for all object types or classes, with class 0 reserved for indicating the absence of an object.
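The following sketch illustrates this matching rule in plain Python; the (x1, y1, x2, y2) box format and the function names are assumptions introduced for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def match_default_boxes(default_boxes, gt_boxes, threshold=0.5):
    """Assign each default box to the ground-truth box it overlaps most,
    keeping only matches above the IoU threshold described above."""
    matches = {}
    for d_idx, d in enumerate(default_boxes):
        best_gt, best_iou = None, 0.0
        for g_idx, g in enumerate(gt_boxes):
            overlap = iou(d, g)
            if overlap > best_iou:
                best_gt, best_iou = g_idx, overlap
        if best_iou > threshold:
            matches[d_idx] = best_gt  # positive match; unmatched boxes stay background (class 0)
    return matches
```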
3.3.3. Data Augmentation
Shearing, zooming in and out, rotating, cropping, and other data augmentation techniques are used to handle a wide range of object sizes and shapes. Such augmentation enhances the model’s resilience to a wide range of input object sizes and shapes and thereby improves its accuracy. Each training sample is randomly sampled from the original input image. The operational steps of the SSD model for obstacle detection, as discussed, are shown in Figure 4.

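An illustrative image-only augmentation pipeline covering the operations mentioned above might look as follows; the parameter values are assumptions, and in full SSD training the ground-truth boxes would need to be transformed consistently with the images.

```python
from torchvision import transforms

# Illustrative augmentation pipeline (image-only); values are assumptions.
train_augmentation = transforms.Compose([
    transforms.RandomResizedCrop(300, scale=(0.5, 1.0)),   # random crop / zoom to the SSD input size
    transforms.RandomHorizontalFlip(),                     # mirror the scene
    transforms.RandomRotation(degrees=10),                 # small rotations
    transforms.RandomAffine(degrees=0, shear=10),          # shearing
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # photometric variation
    transforms.ToTensor(),
])
```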
RNNs are a form of neural network in which the output from the previous step is used as input to the current step [44–47]. RNNs have a “memory” that holds information about the preceding computations. They use the same parameters for every input because they perform the same operation on all inputs and hidden states to produce the output. RNNs make decisions based on historical data.
A network with one input layer, three hidden layers, and one output layer, as shown in Figure 5, is considered. Each layer, like other neural networks, has its own collection of weights and biases, such as (W1, b1) for the first hidden layer, (W2, b2) for the second hidden layer, and (W3, b3) for the third hidden layer. By giving the same weights and biases to the layers, the RNN transforms independent activations into dependent activations, minimizing the complexity of increasing parameters and memorizing each previous output by feeding each output into the next hidden layer. As a result, these three layers can be combined into a single recurrent layer with the same weights and biases as the hidden layers.

The current state is calculated as
$$h_t = f(h_{t-1}, x_t),$$
where $h_t$ is the current state, $x_t$ is the input state, and $h_{t-1}$ is the previous state.
Applying the activation function gives
$$h_t = \tanh(W_{hh} h_{t-1} + W_{xh} x_t),$$
where $W_{hh}$ is the recurrent neuron weight and $W_{xh}$ is the input neuron weight.
The output is calculated as
$$y_t = W_{hy} h_t,$$
where $y_t$ is the output and $W_{hy}$ is the weight at the output layer.
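For illustration, a minimal NumPy sketch of this recurrence is given below; the weight names follow the formulas above, while the dimensions and random initialization are arbitrary assumptions.

```python
import numpy as np

# Dimensions are illustrative assumptions, not values from the paper.
input_dim, hidden_dim, output_dim = 8, 16, 4
rng = np.random.default_rng(0)

W_xh = rng.normal(size=(hidden_dim, input_dim))   # input neuron weight
W_hh = rng.normal(size=(hidden_dim, hidden_dim))  # recurrent neuron weight
W_hy = rng.normal(size=(output_dim, hidden_dim))  # output layer weight
b_h = np.zeros(hidden_dim)

def rnn_forward(inputs):
    """Run the recurrence h_t = tanh(W_hh h_{t-1} + W_xh x_t) over a sequence."""
    h = np.zeros(hidden_dim)  # initial previous state
    outputs = []
    for x_t in inputs:        # one time step per input vector
        h = np.tanh(W_hh @ h + W_xh @ x_t + b_h)  # current state
        outputs.append(W_hy @ h)                  # y_t = W_hy h_t
    return outputs, h

# Example: a sequence of 5 random input vectors.
sequence = [rng.normal(size=input_dim) for _ in range(5)]
ys, final_state = rnn_forward(sequence)
```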
3.3.4. Training through RNN
(i) The network receives a single time step of the input.
(ii) The current state is determined using the current input and the previous state.
(iii) For the next time step, the current state $h_t$ becomes the previous state $h_{t-1}$.
(iv) As many time steps as the problem requires can be executed, combining the information from all previous states.
(v) Once all time steps are completed, the final current state is used to calculate the output.
(vi) The error is then calculated by comparing this output with the real (target) output.
(vii) The error is back-propagated through the network, which updates the weights and thus trains the RNN.
3.4. Proposed Model Workflow
The proposed Vision Navigator consists of a Smart-fold Cane combined with a Smart-alert Walker. The Smart-fold Cane is a stick comprising water sensors, camera modules, and an audio module. Unlike a traditional stick, the Cane is a smart blind stick that helps people with vision concerns avoid obstacles that may cause accidents while walking or going out. The stick is equipped with sensors and cameras to detect objects and give alert messages to the user to prevent unnecessary accidents. The Smart-alert Walker has ultrasonic sensors mounted on it to detect obstacles in the path of the visually impaired person at short range; it thus acts as an emergency unit. The overall prototype of the device, which is equipped with IoT sensors and predictive capabilities regarding obstacles in front of the user, is shown in Figure 6. The operation of the system is organized in such a manner that the person has no difficulty walking and reaching their destination.

The system model is activated when a visually impaired user uses it for navigation. An Arduino with an embedded Raspberry Pi camera is used as a single-board computing unit with Bluetooth facilities. It acts as the heart of the system because it controls, processes, and generates all inputs and outputs. The Smart-fold Cane has an ultrasonic sensor, a water sensor, a camera module, and an audio module. The camera module captures frames in real time, and the data are sent to the board, where each frame is checked against the trained model to detect objects in front of the person. The SSD algorithm, trained on the MS COCO dataset, is used to detect potential obstacles such as animals, cars, and doors; the image feed is validated against this trained deep learning model for object detection. Validated obstacles are passed to the RNN for sentence generation; the RNN, trained on the Flickr30k dataset, frames an appropriate sentence for the matched obstacle image. The output of the RNN is then forwarded to a text-to-speech application interface for vocal output. The audio module conveys the image captions to the user in the form of audio alerts: it receives the audio signals from the Arduino once the image caption has been converted into audio using the text-to-speech interface, and this audio message reaches the user through an earpiece. An emergency alert provision is available using the Smart-alert Walker, which helps trace any obstacle close to the user. Two ultrasonic sensors in the Smart-alert Walker alert the user to the presence of any obstacle that is too close, enhancing the accuracy of the system. These ultrasonic sensors are placed on the shoes so that they can obtain data from obstacles that have height. The ultrasonic sensor readings are checked against a 1 m distance threshold and sent to the Arduino board to verify whether the user is about to collide with an obstacle. Furthermore, water sensors are provided in the Smart-fold Cane. Various situations may arise in which the user may get trapped or face water bodies in their path if such bodies are not identified. The water sensor senses any water body and indicates its presence to the user: the reading is sent to the Arduino board, which analyzes it and sends a command to the vibration motor and buzzer. Figure 7 presents the overall functionality of the model with its different interlinked modules.

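As an illustration only, a simplified control loop corresponding to this workflow might look as follows. Every callable passed into the loop (camera capture, SSD detector, RNN caption generator, ultrasonic and water sensor reads, buzzer trigger) is a hypothetical placeholder for the corresponding hardware or model module, not the authors’ implementation; only the 1 m alert threshold and the text-to-speech step follow the description above.

```python
import time
import pyttsx3  # text-to-speech engine (assumed available; pip install pyttsx3)

ALERT_DISTANCE_CM = 100  # "close" obstacle threshold (1 m), as described above

def vision_navigator_loop(capture_frame, detect_obstacles, caption_obstacle,
                          read_distance_cm, water_detected, trigger_buzzer):
    """Hypothetical main loop tying the modules together; all six callables are placeholders."""
    tts = pyttsx3.init()
    while True:
        frame = capture_frame()                 # live frame from the camera module
        obstacles = detect_obstacles(frame)     # SSD detection and classification
        for obstacle in obstacles:
            sentence = caption_obstacle(frame, obstacle)  # RNN sentence framing
            tts.say(sentence)                   # audio alert via text-to-speech
            tts.runAndWait()
        if read_distance_cm() < ALERT_DISTANCE_CM:
            trigger_buzzer()                    # Smart-alert Walker: obstacle within 1 m
        if water_detected():
            trigger_buzzer()                    # water sensor: water body in the path
        time.sleep(0.1)                         # small delay between cycles
```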
4. Results and Analysis
The proposed obstacle detection model, Vision Navigator, was designed as an assistive interface for users with vision concerns or individuals with fluctuating vision for precise navigation in indoor and outdoor surroundings. Apart from providing a dedicated sensory unit, the predictive capability of this model due to the integration of SSD and RNN is a distinguishing feature. The SSD technique was primarily used for object detection, and the RNN helped map the detected obstacles to appropriately generated text. The effectiveness of the model in real-time scenarios was tested in indoor and outdoor environments. The combined SSD-RNN model used in this work was compared with existing models such as Retina Net, Yolo Tiny, and Region-Based Convolutional Neural Networks (R-CNN).
Figure 8 shows the available obstacles in an outdoor real-time roadside environment. The SSD model successfully identified various moving and static obstacles in front of the user; the detection model could observe the cars and motorcycles present in the frame.

Figure 9 shows the available objects with labels in an indoor housing environment. The model detected a table fan, two different chairs, a suitcase, and a study table in front of the person, with their respective accuracy rates labeled.

An overall performance analysis was carried out on various common obstacles, as shown in Figure 10. Classification accuracy was the metric considered for evaluation. General obstacles were grouped into distinct types: human, animal, vehicles, plastics, furniture, house, and others. Detection accuracy was computed using the comparative models discussed earlier. The R-CNN and Yolo Tiny models delivered intermediate performance throughout the process, with 92.98% and 91.35% mean accuracy, respectively. The SSD-RNN model achieved the highest mean accuracy rate of 95.54% over all obstacles.

An accuracy comparison analysis was undertaken using the SSD-RNN model with others in an indoor environment, as shown in Table 2. Various entities, including pets, humans, and furniture, were detected in the indoor scenario. Retina Net used 6 frames to produce a mean accuracy of 93.62%. Yolo Tiny gave 92.26% mean accuracy using 8 frames. R-CNN generated a mean accuracy of 91.48% taking 12 frames. The proposed SSD-RNN model computed the best accuracy of 95.06%. Overall, although Retina Net and Yolo Tiny models performed well on static obstacles like furniture, the SSD-RNN model was more efficient not only in detecting static obstacles but also in accurately detecting human beings.
Obstacle detection outdoors is more challenging because of frequently moving objects such as vehicles and human beings. Table 3 shows the results for different images in the outdoor environment. The Yolo Tiny model had a relatively low accuracy of 81.32% compared with the other models. The R-CNN model produced a good accuracy rate of 86.64%. Our SSD-RNN method gave the best accuracy of 87.68%.
The proposed model was also evaluated in terms of detection accuracy based on the measured distance of obstacles, as shown in Figure 11. Obstacles were categorized as “close” or “distant” on the basis of the distance computed between the user and the obstacle: obstacles within 1 m of the user were tagged as “close” and those beyond 1 m were referred to as “distant.” The analysis was applied to both indoor and outdoor obstacle detection. For “close” obstacles, the Retina Net and R-CNN models performed well, with accuracy rates of 94.5% and 93.8%, respectively. For “distant” obstacle identification, a visible dip in the performance of all models was observed; still, among all methods, Retina Net performed reasonably well with 83.9% accuracy. With accuracy rates of 96.4% and 86.8% in the “close” and “distant” categories, respectively, the proposed SSD-RNN model outperformed the other models.

Latency is an important parameter for determining the efficiency of an obstacle detection model in real time. An implementation analysis was performed in this context using the algorithmic models discussed above for comparison, as depicted in Figure 12. Retina Net and R-CNN exhibited consistent performance, but a slight delay in executing obstacle detection was observed with Yolo Tiny. The SSD-RNN model was comparatively faster in providing the desired result. The recorded latency times for the Retina Net, Yolo Tiny, R-CNN, and SSD-RNN models were 6.35, 8.54, 6.96, and 4.82 s, respectively.

A pilot study was conducted to test the effectiveness of the model in a real-time environment. The system was developed while the world was going through the COVID-19 outbreak, so we met a few people who agreed to test the system. The motivation behind the development of this device was to bring a smile to the faces of such specially abled individuals. The system was tested with a group of specially abled people at the School of Hope, a school for such individuals. The authors recorded the testing process by validating the system with 10 people, collecting their feedback, and recording their confidence quotient. Table 4 highlights the evaluation metrics and the generated outcome. As noted in Table 4, the confidence level of the majority of users is “High,” which validates the effectiveness of the model.
Besides some classical models, the developed SSD-RNN model was also compared with state-of-the-art models such as FCOS, DETR, YOLOv4, and EfficientDet. Accuracy and latency were computed and analyzed, as shown in Figure 13. The SSD-RNN model used in this research generated the best outcome, with 95.5% accuracy and a mean latency of 4.82 seconds. The performance of the other models was comparable but slightly lower than that of the SSD-RNN model for obstacle detection.

5. Conclusion
Proper navigation is important for any individual with vision concerns to detect and avoid potential obstacles. With advancements in technology, hybrid models can be designed to serve this purpose. In this work, Vision Navigator, a smart framework that detects, classifies, and notifies the user of obstacles in real time, has been presented to assist the visually impaired community. The Smart-fold Cane and Smart-alert Walker are the sub-constituents of the model. The Smart-fold Cane is a lightweight stick with built-in sensors and cameras responsible for obstacle image capture and detection. The SSD algorithm is used for obstacle recognition, while the RNN model maps the detected obstacle into text form. Water pits are also detected through water sensors embedded in the stick, and an alert notification is sent to the user through the audio module. The Smart-alert Walker, a pair of sneakers equipped with ultrasonic sensors, acts as an emergency unit and alerts the user if any obstacle is present at a close distance. The SSD-RNN obstacle recognition hybrid framework was evaluated indoors and outdoors against other relevant models, namely, Retina Net, Yolo Tiny, and R-CNN. The SSD-RNN model gave the best performance, generating accuracies of 95.06% and 87.68% indoors and outdoors, respectively. It also recorded accuracies of 96.4% and 86.8% for close and distant obstacle detection, respectively. A minimum latency of 4.82 s was computed for the SSD-RNN model, and an overall accuracy of 95.54% on common obstacles was noted. Thus, the designed model is fairly easy to deploy and use, making it a more generic framework compared with other models and well-equipped with all vital features. The proposed system can constructively serve visually affected users for proper navigation.
Data Availability
No data were used to support this study.
Conflicts of Interest
The authors declare that they have no conflicts of interest.