Research Article
Improvement in Automatic Speech Recognition of South Asian Accent Using Transfer Learning of DeepSpeech2
Algorithm 1
Dataset preprocessing algorithm.
Read the CSV file of the Common Voice dataset
Select the rows with a "Pakistani", "Indian", "Dutch", or "Sri-Lankan" accent from the country column
Select the "filename" column from the CSV file
Create a dictionary for JSON file writing
for i in filename do
  flacConvert(i)
  SilenceRemover(i)
  Duration(i)
  Find the transcript corresponding to the audio file
  for lineIndex in range(x) do
    transcript = split[lineIndex]
    Write JSON file
    data = {"audio_filepath": "1.FLAC", "duration": duration, "text": transcript}
    with open("json-other-train.txt", "a") as outfile:
      json.dump(data, outfile)
  end for
end for
flacConvert(i) (convert mp3 -> FLAC)
Calculate audio file length
Calculate audio file sample rate
Calculate duration = audio file length / sample rate
Calculate loudness
Calculate peak amplitude
Split the audio file into chunks at silences longer than 0.5 seconds (audio quieter than −16 dBFS is treated as silent)
for chunk in enumerate(chunks) do
  List.Append(chunk)
end for
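The core steps of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses only the standard library, omits the mp3-to-FLAC conversion (which needs an external audio tool), and operates on lists of 16-bit sample values instead of audio files. The function names (`select_rows`, `duration_seconds`, `dbfs`, `split_on_silence`, `manifest_line`), the 10 ms analysis frame, and the choice to emit one JSON object per line are all assumptions made for illustration.

```python
import csv
import io
import json
import math

# Accent labels to keep; the exact spellings used in the CSV are an assumption.
ACCENTS = {"pakistani", "indian", "dutch", "sri-lankan"}


def select_rows(csv_text, column):
    """Return the CSV rows whose value in `column` is one of ACCENTS."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [r for r in reader if r[column].strip().lower() in ACCENTS]


def duration_seconds(num_samples, sample_rate):
    """Duration = audio length in samples / sample rate."""
    return num_samples / sample_rate


def dbfs(samples, full_scale=32768):
    """RMS level of 16-bit samples in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / full_scale) if rms > 0 else float("-inf")


def split_on_silence(samples, sample_rate, min_silence_s=0.5,
                     thresh_dbfs=-16.0, frame_s=0.01):
    """Split samples at silences of at least min_silence_s seconds.

    A frame counts as silent when its RMS level is below thresh_dbfs.
    Shorter pauses stay inside a chunk; leading and trailing silence
    is dropped. The frame length (frame_s) is an assumed parameter.
    """
    frame = max(1, round(sample_rate * frame_s))
    min_run = max(1, round(min_silence_s / frame_s))
    chunks, current, pending, quiet = [], [], [], 0
    for start in range(0, len(samples), frame):
        block = samples[start:start + frame]
        if dbfs(block) < thresh_dbfs:
            pending.extend(block)      # maybe silence; decide at next loud frame
            quiet += 1
            continue
        if quiet >= min_run and current:
            chunks.append(current)     # long silence closes the chunk
            current = []
        elif current:
            current.extend(pending)    # short pause stays inside the chunk
        current.extend(block)
        pending, quiet = [], 0
    if current:
        chunks.append(current)
    return chunks


def manifest_line(audio_filepath, duration, text):
    """Serialize one manifest entry with the keys used in Algorithm 1."""
    return json.dumps(
        {"audio_filepath": audio_filepath, "duration": duration, "text": text}
    )
```

In a full pipeline, each selected row would be converted and trimmed, its duration computed, and the result of `manifest_line` appended to the manifest file followed by a newline, so that each entry can be read back independently.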