
Think Stats note 3: CDF

categories | book 
tags | probability  statistics  python 

Cumulative distribution functions

PMF paradox

When some values are oversampled, the data suffer from selection bias.

Percentile rank

def PercentileRank(scores, your_score):
    count = 0
    for score in scores:
        if score <= your_score:
            count += 1
    
    percentile_rank = 100.0 * count / len(scores)
    return percentile_rank
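
As a quick check of the definition, here is a compact sketch of the same calculation with a small worked example. The name percentile_rank and the list of scores are made up for this note.

```python
def percentile_rank(scores, your_score):
    """Percentage of scores that are less than or equal to your_score."""
    count = sum(1 for score in scores if score <= your_score)
    return 100.0 * count / len(scores)


# Illustrative scores, not real data
scores = [55, 66, 77, 88, 99]
rank = percentile_rank(scores, 88)  # 4 of the 5 scores are <= 88, so 80.0
```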

Percentiles

def Percentile(scores, percentile_rank):
    # Work on a sorted copy so the caller's list is not reordered.
    ordered = sorted(scores)
    for score in ordered:
        if PercentileRank(ordered, score) >= percentile_rank:
            return score
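
The loop above calls PercentileRank once for every score, so the work grows with the square of the number of scores. A minimal sketch of a faster alternative, assuming it is acceptable to turn the percentile rank directly into an index into the sorted list. The name percentile_by_index is made up for this note, and for some inputs its result can differ slightly from Percentile.

```python
def percentile_by_index(scores, percentile_rank):
    """Return the score at the given percentile rank (0 to 100).

    Hypothetical helper: maps the rank straight to a position in the
    sorted list instead of scanning every score.
    """
    ordered = sorted(scores)  # sorted copy; the caller's list is untouched
    index = int(percentile_rank * (len(ordered) - 1) // 100)
    return ordered[index]


# Illustrative scores, not real data
scores = [55, 66, 77, 88, 99]
median_score = percentile_by_index(scores, 50)   # index 50 * 4 // 100 = 2
top_score = percentile_by_index(scores, 100)     # index 4, the largest score
```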

Cumulative distribution functions

def Cdf(t, x):
    count = 0.0
    for value in t:
        if value <= x:
            count += 1.0
    
    prob = count / len(t)
    return prob

Prob(x): Given a value x, computes the probability p = CDF(x).

Value(p): Given a probability p, computes the corresponding value, x; that is, the inverse CDF of p.
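
One way these two methods might look, as a rough sketch rather than the book's actual Cdf class. The class name EmpiricalCdf is an assumption for this note; it stores the sorted sample and uses the standard bisect module for both lookups.

```python
import bisect


class EmpiricalCdf:
    """Hypothetical stand-in for a Cdf object built from a sample."""

    def __init__(self, sample):
        self.xs = sorted(sample)
        n = len(self.xs)
        # ps[i] = (i + 1) / n, the fraction of the sample up to position i
        self.ps = [(i + 1) / n for i in range(n)]

    def Prob(self, x):
        """Return CDF(x), the fraction of values <= x."""
        # bisect_right counts how many sorted values are <= x
        return bisect.bisect_right(self.xs, x) / len(self.xs)

    def Value(self, p):
        """Return the smallest value x with CDF(x) >= p (inverse CDF)."""
        if not 0 <= p <= 1:
            raise ValueError("p must be between 0 and 1")
        index = bisect.bisect_left(self.ps, p)
        return self.xs[min(index, len(self.xs) - 1)]


cdf = EmpiricalCdf([1, 2, 2, 3, 5])
p = cdf.Prob(2)     # 3 of the 5 values are <= 2
x = cdf.Value(0.5)  # smallest value whose cumulative probability reaches 0.5
```

Here Prob(2) is 0.6 and Value(0.5) is 2, because the CDF jumps from 0.2 to 0.6 at the value 2.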

Conditional distributions

A conditional distribution is the distribution of a subset of the data which is selected according to a condition.
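
A small sketch of the idea: select the subset that meets the condition, then compute the CDF of that subset alone. The records below are made up purely for illustration, and the helper name cdf is an assumption for this note.

```python
def cdf(values, x):
    """Fraction of values that are less than or equal to x."""
    return sum(1 for v in values if v <= x) / len(values)


# Made-up (group, weight) records, only to illustrate the selection step
records = [("a", 3.1), ("b", 3.4), ("a", 2.9),
           ("b", 3.8), ("a", 3.6), ("b", 3.0)]

all_weights = [w for _, w in records]
group_a = [w for g, w in records if g == "a"]  # the condition: group is "a"

p_all = cdf(all_weights, 3.1)  # 3 of 6 values are <= 3.1
p_a = cdf(group_a, 3.1)        # 2 of 3 values are <= 3.1
```

The same value 3.1 sits at a different point of the conditional distribution than of the full distribution.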

Random numbers

CDFs are useful for generating random numbers with a given distribution. Here’s how: Choose a random probability in the range 0–1. Use Cdf.Value to find the value in the distribution that corresponds to the probability you chose.
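
The two steps can be sketched as follows for an empirical distribution. This is an illustration under assumptions, not the book's Cdf.Value: the function name sample_from_cdf and the data are made up, and the inverse CDF is computed directly from the sorted sample.

```python
import math
import random


def sample_from_cdf(sample, rng=random):
    """Draw one value that follows the sample's empirical CDF."""
    ordered = sorted(sample)
    p = rng.random()  # step 1: random probability in [0, 1)
    # step 2: inverse CDF, the smallest value whose cumulative
    # fraction is at least p
    index = max(math.ceil(p * len(ordered)) - 1, 0)
    return ordered[index]


rng = random.Random(0)  # fixed seed so the run is repeatable
data = [1, 2, 2, 3, 5]  # illustrative values; 2 appears twice
draws = [sample_from_cdf(data, rng) for _ in range(1000)]
```

Because 2 makes up two fifths of the sample, it should come up roughly twice as often as each of the other values.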

Resampling

Resampling is the process of generating a random sample from a distribution that was computed from a sample.

In Python, sampling with replacement can be implemented with random.random to choose a percentile rank, or with random.choice to choose an element from a sequence.

Sampling without replacement is provided by random.sample.
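
A short sketch of both kinds of sampling with the standard random module. The sample values and the fixed seed are arbitrary choices for illustration.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable
sample = [10, 20, 30, 40, 50]

# With replacement: each draw is independent, so values can repeat
with_replacement = [rng.choice(sample) for _ in range(len(sample))]

# Without replacement: each element is used at most once
without_replacement = rng.sample(sample, 3)
```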

The numbers generated by random.random are supposed to be uniformly distributed between 0 and 1 (0 included, 1 excluded).

Summary statistics

The median is just the 50th percentile.

The 25th and 75th percentiles are often used to check whether a distribution is symmetric.

Their difference, which is called the interquartile range, measures the spread.
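
A minimal sketch of these summary statistics, assuming a simple index-based percentile. The helper name percentile and the data are made up for this note, and other percentile definitions can give slightly different quartiles.

```python
def percentile(values, rank):
    """Value at the given percentile rank (0 to 100), by index."""
    ordered = sorted(values)
    index = int(rank * (len(ordered) - 1) // 100)
    return ordered[index]


# Illustrative data, not real measurements
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

q1 = percentile(data, 25)      # index 2, value 3
median = percentile(data, 50)  # index 4, value 5
q3 = percentile(data, 75)      # index 6, value 7
iqr = q3 - q1                  # interquartile range: 4

# Rough symmetry check: the quartiles are equally far from the median
is_symmetric = (median - q1) == (q3 - median)
```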

