Audio Recognizer with Pitch, Frequency and Types

Audio Recognizer

An audio recognizer analyzes audio to detect properties such as frequency and to convert speech to text. Apple provides several frameworks for detecting speech, and they are useful in many situations. Speech recognition converts real-time audio to text so that an app can understand what people are saying. An audio recognizer can detect the text, pitch, frequency, and amplitude of the audio. To implement this in an iPhone app, the developer uses frameworks such as AudioKit and Speech, which recognize the audio and populate the resulting data.

To set up an audio recognizer, the developer needs to follow these steps:

1. The developer needs to add privacy usage descriptions to Info.plist so that the user can be asked to grant speech recognition and microphone permission:

  • NSSpeechRecognitionUsageDescription
  • NSMicrophoneUsageDescription 

2. The developer needs to import the frameworks below for speech recognition, frequency and amplitude detection, and conversion to text:

  • AudioKit
  • Foundation
  • Speech
  • AudioToolbox
  • Accelerate
  • MediaPlayer
  • OpenGLES

3. The developer needs to create a controller class and the functions that detect the speech.
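
For the privacy keys in item 1 above: an app that requests a permission without the matching usage description is terminated by the system, so it can be worth checking for both keys during development. The sketch below is one possible way to do that. `missingUsageDescriptions` is a hypothetical helper name, not part of any Apple framework, and it only inspects a dictionary such as `Bundle.main.infoDictionary`.

```swift
import Foundation

// The two Info.plist keys listed above.
let requiredUsageKeys = [
    "NSSpeechRecognitionUsageDescription",
    "NSMicrophoneUsageDescription"
]

// Hypothetical helper: returns the required keys that are missing or
// have an empty description in the given Info.plist dictionary.
func missingUsageDescriptions(in info: [String: Any]) -> [String] {
    return requiredUsageKeys.filter { key in
        guard let text = info[key] as? String else { return true }
        return text.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty
    }
}

// Usage: check the running app's Info.plist (nil outside an app bundle).
let missing = missingUsageDescriptions(in: Bundle.main.infoDictionary ?? [:])
if !missing.isEmpty {
    print("Missing usage descriptions: \(missing)")
}
```

Passing the dictionary in as a parameter, rather than reading `Bundle.main` inside the helper, keeps the check easy to exercise with sample data.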

Step 1:

The developer needs to create a controller to detect speech and import all the necessary libraries and files there.

class ViewController: UIViewController, SFSpeechRecognizerDelegate {

Step 2:

Define the properties and objects that the controller uses.

let speechRecognizer: SFSpeechRecognizer? = SFSpeechRecognizer(locale: Locale(identifier: "en-GB"))

var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?

var recognitionTask: SFSpeechRecognitionTask?

let audioEngine = AVAudioEngine()

Step 3:

Request permission from within the app before recording audio. If the user does not grant permission in the prompt dialog, the audio cannot be recorded or captured for analysis.

override func viewDidLoad() {
    super.viewDidLoad()

    speechRecognizer?.delegate = self

    SFSpeechRecognizer.requestAuthorization { status in
        // Whether the record button should be enabled for this status
        var audiobtnVariable = false

        switch status {
        case .authorized:
            audiobtnVariable = true
            print("Permission received")
        case .denied:
            print("Permission not granted by the user")
        case .notDetermined:
            print("Speech recognition permission has not been determined yet")
        case .restricted:
            print("Speech recognition is restricted on this device")
        @unknown default:
            print("Unknown authorization status")
        }

        // The callback may arrive on a background queue, so update the UI on the main queue
        DispatchQueue.main.async {
            self.speechRecognitionButton.isEnabled = audiobtnVariable
        }
    }

    self.speechRecognitionLabel.frame.size.width = view.bounds.width - 64
}


Step 4:

The Speech framework (the SFSpeech classes) handles the speech recognition. It can recognize speech from live audio buffers as they are recorded, and it can also recognize speech in prerecorded audio files. A recognition task that is in progress can be cancelled or finished early. The framework analyzes the audio internally and converts it to text; frequency and amplitude are measured with AudioKit, as described in the last section of this article.

func startSpeechRecording() {
    // recognitionTask tracks the progress of a transcription; cancel one that is still running
    if recognitionTask != nil {
        recognitionTask?.cancel()
        recognitionTask = nil
    }

    let audioRecordSession = AVAudioSession.sharedInstance()
    do {
        try audioRecordSession.setCategory(.record, mode: .default)
        try audioRecordSession.setMode(.measurement)
        try audioRecordSession.setActive(true, options: .notifyOthersOnDeactivation)
    } catch {
        print("Failed to set up audio session")
    }

    recognitionRequest = SFSpeechAudioBufferRecognitionRequest() // reads from audio buffers
    let inputNode = audioEngine.inputNode
    guard let recognitionRequest = recognitionRequest else {
        fatalError("Could not create request instance")
    }
    recognitionRequest.shouldReportPartialResults = true

    recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest) { res, err in
        // res holds the transcription so far; isFinal marks the last result
        let isLast = res?.isFinal ?? false

        if err != nil || isLast {
            self.audioEngine.stop()
            inputNode.removeTap(onBus: 0)
            self.recognitionRequest = nil
            self.recognitionTask = nil
            self.speechRecognitionButton.isEnabled = true

            // speechDict and audioinput are assumed to be properties declared elsewhere in the controller
            if let bestStr = res?.bestTranscription.formattedString,
               let match = self.speechDict[bestStr] {
                self.speechRecognitionLabel.text = bestStr
                self.audioinput = match
            } else {
                self.speechRecognitionLabel.text = "can't find it in the dictionary"
            }
        }
    }

    // Feed the microphone buffers to the recognition request
    let format = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
        self.recognitionRequest?.append(buffer)
    }

    audioEngine.prepare()
    do {
        try audioEngine.start()
    } catch {
        print("Can't start the engine")
    }
}
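
The handler in this step looks up the transcription in `speechDict` by exact match, so differences in capitalization or trailing punctuation in the recognized text would cause a miss. A minimal sketch of a more tolerant lookup follows. The `normalized` and `lookup` helpers and the sample dictionary are hypothetical, since the article does not show how `speechDict` is declared.

```swift
import Foundation

// Hypothetical helper: normalizes a phrase so that case and surrounding
// whitespace or punctuation do not affect the comparison.
func normalized(_ phrase: String) -> String {
    let ignored = CharacterSet.whitespacesAndNewlines.union(.punctuationCharacters)
    return phrase.lowercased().trimmingCharacters(in: ignored)
}

// Hypothetical helper: looks up a transcription in a dictionary,
// comparing normalized keys instead of exact strings.
func lookup<Value>(_ transcription: String, in dictionary: [String: Value]) -> Value? {
    let wanted = normalized(transcription)
    return dictionary.first(where: { normalized($0.key) == wanted })?.value
}

// Sample data for illustration only.
let sampleDict = ["hello": 1, "good morning": 2]
if let value = lookup("Good morning.", in: sampleDict) {
    print("Matched value: \(value)")
}
```

This scans every key on each lookup, which is fine for a small dictionary; for a large one, storing the keys already normalized would avoid the scan.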



Step 5:

The developer needs to create a button action that starts and stops the audio recording so that the recording can be analyzed.

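@IBAction func speechRecognitionButtonClicked(_ sender: Any) {
    if audioEngine.isRunning {
        // Stop capturing and tell the request that no more audio is coming
        audioEngine.stop()
        recognitionRequest?.endAudio()
        speechRecognitionButton.isEnabled = false
        self.speechRecognitionButton.setTitle("Record", for: .normal)
    } else {
        startSpeechRecording()
        speechRecognitionButton.setTitle("Stop", for: .normal)
    }
}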



Audio Frequency, Amplitude and Pitch Detector

The developer can use AudioKit, which provides many classes, objects, and properties for detecting the frequency and amplitude of audio. The developer needs to create a microphone object to capture the audio session. A frequency tracker then reads the captured audio and reports its frequency (pitch) and amplitude within a lower and upper frequency bound.

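// Microphone input node
let microphoneObject = AKMicrophone()

// Tracks the frequency and amplitude of the microphone signal
var frequencyDetetctor: AKFrequencyTracker!

// Booster with zero gain, so the tracked signal is not played back through the speaker
var mixerNodeDetector: AKBooster!

// The developer needs to set these up inside viewDidLoad
override func viewDidLoad() {
    super.viewDidLoad()

    // Track only frequencies between the lower bound (200 Hz) and the upper bound (2000 Hz)
    frequencyDetetctor = AKFrequencyTracker.init(microphoneObject, minimumFrequency: 200, maximumFrequency: 2000)
    mixerNodeDetector = AKBooster(frequencyDetetctor, gain: 0)

    // Route the silent booster to the output so that the tracker is part of the signal chain.
    // Call AudioKit.start() afterwards to begin processing (a throwing call in later AudioKit 4 releases).
    AudioKit.output = mixerNodeDetector
}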



func findfrequency() {
    // Ignore very quiet input, where the detected frequency is unreliable
    if frequencyDetetctor.amplitude > 0.1 {
        frequencyLabel.text = String(format: "%0.1f", frequencyDetetctor.frequency)

        // Shift the frequency by whole octaves into the range covered by noteFrequencies
        var frequency = Float(frequencyDetetctor.frequency)
        while frequency > Float(noteFrequencies[noteFrequencies.count - 1]) {
            frequency = frequency / 2.0
        }
        while frequency < Float(noteFrequencies[0]) {
            frequency = frequency * 2.0
        }

        // Find the entry in noteFrequencies closest to the shifted frequency
        var minDistance: Float = 10000.0
        var index = 0
        for i in 0..<noteFrequencies.count {
            let distance = fabsf(Float(noteFrequencies[i]) - frequency)
            if distance < minDistance {
                index = i
                minDistance = distance
            }
        }

        // Number of octaves the frequency was shifted by
        let octave = Int(log2f(Float(frequencyDetetctor.frequency) / frequency))

        // frequencyDetector and amplitudeDetector are the labels that show the note name
        frequencyDetector.text = "\(noteNamesWithSharps[index])\(octave)"
        amplitudeDetector.text = "\(noteNamesWithFlats[index])\(octave)"
    }

    amplitudeLabel.text = String(format: "%0.2f", frequencyDetetctor.amplitude)
}
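
`findfrequency()` depends on three arrays that the article does not show: `noteFrequencies`, `noteNamesWithSharps`, and `noteNamesWithFlats`. The sketch below assumes they hold the twelve equal-tempered notes of the lowest octave (C0 to B0, tuned so that A4 is 440 Hz). That is an assumption, not something the article states. It repeats the same octave shifting and closest-entry search as a standalone function; `nearestNote` is a hypothetical name.

```swift
import Foundation

// Assumed contents: equal-tempered frequencies in Hz for C0 through B0.
let noteFrequencies: [Double] = [16.35, 17.32, 18.35, 19.45, 20.60, 21.83,
                                 23.12, 24.50, 25.96, 27.50, 29.14, 30.87]
let noteNamesWithSharps = ["C", "C♯", "D", "D♯", "E", "F", "F♯", "G", "G♯", "A", "A♯", "B"]
let noteNamesWithFlats = ["C", "D♭", "D", "E♭", "E", "F", "G♭", "G", "A♭", "A", "B♭", "B"]

// Hypothetical helper: returns the name of the closest table entry (sharp and
// flat spellings) with its octave number, or nil for an unusable frequency.
func nearestNote(to hertz: Double) -> (sharp: String, flat: String)? {
    // Zero, negative, NaN, or infinite input would never leave the loops below.
    guard hertz > 0, hertz.isFinite else { return nil }

    // Shift the frequency by whole octaves into the range of the table.
    var folded = hertz
    while folded > noteFrequencies[noteFrequencies.count - 1] { folded /= 2 }
    while folded < noteFrequencies[0] { folded *= 2 }

    // Pick the table entry closest to the shifted frequency.
    var index = 0
    var minDistance = Double.greatestFiniteMagnitude
    for (i, candidate) in noteFrequencies.enumerated() {
        let distance = abs(candidate - folded)
        if distance < minDistance {
            index = i
            minDistance = distance
        }
    }

    // Each halving or doubling is one octave, so the ratio is a power of two.
    let octave = Int(log2(hertz / folded).rounded())
    return ("\(noteNamesWithSharps[index])\(octave)", "\(noteNamesWithFlats[index])\(octave)")
}

if let note = nearestNote(to: 440) {
    print(note.sharp) // prints "A4"
}
```

Because the search only compares against this one-octave table, a pitch slightly below a C can be reported as the B beneath it, so results close to an octave boundary are approximate.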