
It provides a method for predicting the next word given a set of previous words. It also provides a method for calculating the Perplexity score of a set of words, and a method for calculating the probability of a given word given a set of previous words.

Super class

wordpredictor::Base -> ModelPredictor

Methods

Inherited methods


    Method new()

    It initializes the current object. It is used to set the model file name and verbose options.

    Usage

    ModelPredictor$new(mf, ve = 0)

    Arguments

    mf

    The model file name.

    ve

    The level of detail in the information messages.


    Method get_model()

    Returns the Model class object.

    Usage

    ModelPredictor$get_model()

    Returns

    The Model class object is returned.


    Method calc_perplexity()

    Calculates the Perplexity of the given sentence. For each word, the probability of the word given the previous words is calculated. The probabilities are multiplied and the product is inverted. The nth root of the result is the Perplexity, where n is the number of words in the sentence. If the stem_words tokenization option was specified when creating the given model file, then the previous words are converted to their stems.
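    The arithmetic described above can be sketched in base R. This is only an illustration, not the package's implementation: the function name `perplexity` and the input vector `probs` (one conditional probability per word) are assumptions made for the example. The log form is used because it gives the same value as the nth root of the inverted product while avoiding underflow on long sentences.

```r
# Perplexity from a vector of per-word conditional probabilities.
# `probs` is a hypothetical input: probs[i] = P(word_i | previous words).
perplexity <- function(probs) {
  stopifnot(is.numeric(probs), length(probs) > 0, all(probs > 0))
  n <- length(probs)
  # Equivalent to (1 / prod(probs))^(1 / n), computed in log space
  exp(-sum(log(probs)) / n)
}

# Made-up probabilities for a three-word sentence
probs <- c(0.5, 0.25, 0.125)
pp <- perplexity(probs)
print(pp)
```

    A lower value means the model finds the sentence less surprising; a sentence in which every word has probability 1 would have a Perplexity of 1.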

    Usage

    ModelPredictor$calc_perplexity(words)

    Arguments

    words

    The list of words.

    Returns

    The perplexity of the given list of words.

    Examples

    # Start of environment setup code
    # The level of detail in the information messages
    ve <- 0
    # The name of the folder that will contain all the files. It will be
    # created in the current directory. NULL implies tempdir will be used
    fn <- NULL
    # The required files. They are default files that are part of the
    # package
    rf <- c("def-model.RDS")
    # An object of class EnvManager is created
    em <- EnvManager$new(ve = ve, rp = "./")
    # The required files are downloaded
    ed <- em$setup_env(rf, fn)
    # End of environment setup code
    
    # The model file name
    mfn <- paste0(ed, "/def-model.RDS")
    # ModelPredictor class object is created
    mp <- ModelPredictor$new(mf = mfn, ve = ve)
    # The sentence whose Perplexity is to be calculated
    l <- "last year at this time i was preparing for a trip to rome"
    # The line is split into words
    w <- strsplit(l, " ")[[1]]
    # The Perplexity of the sentence is calculated
    p <- mp$calc_perplexity(w)
    # The sentence Perplexity is printed
    print(p)
    # The test environment is removed. Comment out the line below to keep
    # the files generated by the function so they can be viewed
    em$td_env()


    Method predict_word()

    Predicts the next word given a list of previous words. It checks the last n previous words in the transition probabilities data, where n is equal to the n-gram size of the model minus 1. If there is a match, the next words with the highest probabilities are returned (the top 3 by default; see the count argument). If there is no match, then the last n-1 previous words are checked. This process continues until only the last word is checked. If there is still no match, then an empty result is returned. The given words may optionally be stemmed.
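    The back-off search described above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's code: the toy table `tp`, the function `predict_next` and its `ngram_size` argument are hypothetical, and prefixes are stored as plain strings instead of the model's internal representation.

```r
# Toy transition table (hypothetical data): each prefix maps to a named
# vector of next-word probabilities.
tp <- list(
  "today is" = c(a = 0.5, the = 0.3, not = 0.2),
  "is"       = c(the = 0.6, a = 0.4)
)

# Back-off prediction: try the longest allowed prefix first, then drop the
# earliest word until a prefix matches or no words remain.
predict_next <- function(words, tp, ngram_size = 3, count = 3) {
  # Only the last (ngram_size - 1) words can form a prefix
  words <- tail(words, ngram_size - 1)
  while (length(words) > 0) {
    key <- paste(words, collapse = " ")
    if (!is.null(tp[[key]])) {
      top <- head(sort(tp[[key]], decreasing = TRUE), count)
      return(list(found = TRUE, words = names(top), probs = unname(top)))
    }
    # No match: back off to a shorter prefix
    words <- words[-1]
  }
  # Nothing matched, so an empty result is returned
  list(found = FALSE, words = character(0), probs = numeric(0))
}

r1 <- predict_next(c("what", "today", "is"), tp, count = 2)
r2 <- predict_next(c("this", "is"), tp)
r3 <- predict_next(c("hello"), tp)
print(r1)
```

    In this sketch `r1` matches the two-word prefix directly, `r2` backs off from "this is" to "is", and `r3` finds no prefix at all.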

    Usage

    ModelPredictor$predict_word(words, count = 3, dc = NULL)

    Arguments

    words

    A character vector of previous words, or a single string containing the text of the previous words.

    count

    The number of results to return.

    dc

    A DataCleaner object. If it is given, then the given words

    Returns

    The top predicted words (3 by default, as set by count) along with their probabilities.

    Examples

    # Start of environment setup code
    # The level of detail in the information messages
    ve <- 0
    # The name of the folder that will contain all the files. It will be
    # created in the current directory. NULL implies tempdir will be used
    fn <- NULL
    # The required files. They are default files that are part of the
    # package
    rf <- c("def-model.RDS")
    # An object of class EnvManager is created
    em <- EnvManager$new(ve = ve, rp = "./")
    # The required files are downloaded
    ed <- em$setup_env(rf, fn)
    # End of environment setup code
    
    # The model file name
    mfn <- paste0(ed, "/def-model.RDS")
    # ModelPredictor class object is created
    mp <- ModelPredictor$new(mf = mfn, ve = ve)
    # The next word is predicted
    nws <- mp$predict_word("today is", count = 10)
    # The predicted next words are printed
    print(nws)
    
    # The test environment is removed. Comment out the line below to keep
    # the files generated by the function so they can be viewed
    em$td_env()


    Method get_word_prob()

    Calculates the probability of the given word given the previous words. The last n previous words are converted to a numeric hash using the digest2int function, where n is equal to the size of the n-gram model minus 1. All other words are ignored. The hash is looked up in a data frame of transition probabilities. The given word is converted to a number by checking its position in a list of unique words. If the hash and the word position are found, then the corresponding probability is returned. If they are not found, then the hash of the last n-1 previous words is taken and the process is repeated. If the data is not found in the data frame, then the word probability is returned. This is known as back-off. If the word probability cannot be found, then the default probability is returned. The default probability is calculated as 1/(N+V), where N is the number of words in the corpus and V is the number of dictionary words.
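    The lookup order described above (n-gram probability, then shorter prefixes, then the word probability, then the default) can be sketched in base R. This is an illustration only: the tables `tp` and `word_probs`, the counts `N` and `V`, and the function `word_prob` are hypothetical, and plain strings are used as keys where the description uses numeric hashes.

```r
# Toy model (hypothetical data). Keys are plain strings for readability;
# the description above uses numeric hashes instead.
tp <- list(
  "how are" = c(you = 0.6, we = 0.4),
  "are"     = c(you = 0.3, they = 0.7)
)
word_probs <- c(you = 0.01, they = 0.02)  # assumed single-word probabilities
N <- 1000  # assumed number of words in the corpus
V <- 200   # assumed number of dictionary words
default_prob <- 1 / (N + V)

word_prob <- function(word, pw, ngram_size = 3) {
  # Only the last (ngram_size - 1) previous words are considered
  pw <- tail(pw, ngram_size - 1)
  while (length(pw) > 0) {
    key <- paste(pw, collapse = " ")
    nxt <- tp[[key]]
    if (!is.null(nxt) && word %in% names(nxt)) return(nxt[[word]])
    # Back off: drop the earliest previous word and retry
    pw <- pw[-1]
  }
  # No prefix matched: fall back to the word probability, then the default
  if (word %in% names(word_probs)) return(word_probs[[word]])
  default_prob
}

p1 <- word_prob("you", c("how", "are"))    # found with the full prefix
p2 <- word_prob("they", c("how", "are"))   # found after backing off to "are"
p3 <- word_prob("they", c("hello"))        # word probability
p4 <- word_prob("zebra", c("how", "are"))  # default probability
print(c(p1, p2, p3, p4))
```

    The default 1/(N+V) guarantees that an unseen word still gets a small non-zero probability, which keeps a product of probabilities, such as the one used for Perplexity, from collapsing to zero.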

    Usage

    ModelPredictor$get_word_prob(word, pw)

    Arguments

    word

    The word whose probability is to be calculated.

    pw

    The previous words.

    Returns

    The probability of the word given the previous words.

    Examples

    # Start of environment setup code
    # The level of detail in the information messages
    ve <- 0
    # The name of the folder that will contain all the files. It will be
    # created in the current directory. NULL implies tempdir will be used
    fn <- NULL
    # The required files. They are default files that are part of the
    # package
    rf <- c("def-model.RDS")
    # An object of class EnvManager is created
    em <- EnvManager$new(ve = ve, rp = "./")
    # The required files are downloaded
    ed <- em$setup_env(rf, fn)
    # End of environment setup code
    
    # The model file name
    mfn <- paste0(ed, "/def-model.RDS")
    # ModelPredictor class object is created
    mp <- ModelPredictor$new(mf = mfn, ve = ve)
    # The probability that the next word is "you" given the prev words
    # "how" and "are"
    prob <- mp$get_word_prob(word = "you", pw = c("how", "are"))
    # The probability is printed
    print(prob)
    
    # The test environment is removed. Comment out the line below to keep
    # the files generated by the function so they can be viewed
    em$td_env()


    Method clone()

    The objects of this class are cloneable with this method.

    Usage

    ModelPredictor$clone(deep = FALSE)

    Arguments

    deep

    Whether to make a deep clone.

    Examples

    
    ## ------------------------------------------------
    ## Method `ModelPredictor$calc_perplexity`
    ## ------------------------------------------------
    
    # Start of environment setup code
    # The level of detail in the information messages
    ve <- 0
    # The name of the folder that will contain all the files. It will be
    # created in the current directory. NULL implies tempdir will be used
    fn <- NULL
    # The required files. They are default files that are part of the
    # package
    rf <- c("def-model.RDS")
    # An object of class EnvManager is created
    em <- EnvManager$new(ve = ve, rp = "./")
    # The required files are downloaded
    ed <- em$setup_env(rf, fn)
    # End of environment setup code
    
    # The model file name
    mfn <- paste0(ed, "/def-model.RDS")
    # ModelPredictor class object is created
    mp <- ModelPredictor$new(mf = mfn, ve = ve)
    # The sentence whose Perplexity is to be calculated
    l <- "last year at this time i was preparing for a trip to rome"
    # The line is split into words
    w <- strsplit(l, " ")[[1]]
    # The Perplexity of the sentence is calculated
    p <- mp$calc_perplexity(w)
    # The sentence Perplexity is printed
    print(p)
    #> [1] 1767
    # The test environment is removed. Comment out the line below to keep
    # the files generated by the function so they can be viewed
    em$td_env()
    
    ## ------------------------------------------------
    ## Method `ModelPredictor$predict_word`
    ## ------------------------------------------------
    
    # Start of environment setup code
    # The level of detail in the information messages
    ve <- 0
    # The name of the folder that will contain all the files. It will be
    # created in the current directory. NULL implies tempdir will be used
    fn <- NULL
    # The required files. They are default files that are part of the
    # package
    rf <- c("def-model.RDS")
    # An object of class EnvManager is created
    em <- EnvManager$new(ve = ve, rp = "./")
    # The required files are downloaded
    ed <- em$setup_env(rf, fn)
    # End of environment setup code
    
    # The model file name
    mfn <- paste0(ed, "/def-model.RDS")
    # ModelPredictor class object is created
    mp <- ModelPredictor$new(mf = mfn, ve = ve)
    # The next word is predicted
    nws <- mp$predict_word("today is", count = 10)
    # The predicted next words are printed
    print(nws)
    #> $found
    #> [1] TRUE
    #> 
    #> $words
    #>  [1] "a"      "the"    "used"   "better" "hard"   "rare"   "to"     "best"  
    #>  [9] "carved" "dry"   
    #> 
    #> $probs
    #>  [1] 0.17460317 0.11111111 0.06349206 0.03174603 0.03174603 0.03174603
    #>  [7] 0.03174603 0.01587302 0.01587302 0.01587302
    #> 
    
    # The test environment is removed. Comment out the line below to keep
    # the files generated by the function so they can be viewed
    em$td_env()
    
    ## ------------------------------------------------
    ## Method `ModelPredictor$get_word_prob`
    ## ------------------------------------------------
    
    # Start of environment setup code
    # The level of detail in the information messages
    ve <- 0
    # The name of the folder that will contain all the files. It will be
    # created in the current directory. NULL implies tempdir will be used
    fn <- NULL
    # The required files. They are default files that are part of the
    # package
    rf <- c("def-model.RDS")
    # An object of class EnvManager is created
    em <- EnvManager$new(ve = ve, rp = "./")
    # The required files are downloaded
    ed <- em$setup_env(rf, fn)
    # End of environment setup code
    
    # The model file name
    mfn <- paste0(ed, "/def-model.RDS")
    # ModelPredictor class object is created
    mp <- ModelPredictor$new(mf = mfn, ve = ve)
    # The probability that the next word is "you" given the prev words
    # "how" and "are"
    prob <- mp$get_word_prob(word = "you", pw = c("how", "are"))
    # The probability is printed
    print(prob)
    #> [1] 0.0024581
    
    # The test environment is removed. Comment out the line below to keep
    # the files generated by the function so they can be viewed
    em$td_env()