| Title: | Descriptive Statistics Functions for Numeric Data | 
| Version: | 0.1.2 | 
| Description: | Provides fundamental functions for descriptive statistics, including MODE(), estimate_mode(), center_stats(), position_stats(), pct(), spread_stats(), kurt(), skew(), and shape_stats(), which assist in summarizing the center, spread, and shape of numeric data. For more details, see McCurdy (2025), "Introduction to Data Science with R" https://jonmccurdy.github.io/Introduction-to-Data-Science/. | 
| License: | MIT + file LICENSE | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.3.2 | 
| Depends: | R (≥ 3.5) | 
| LazyData: | true | 
| Suggests: | roxygen2 | 
| NeedsCompilation: | no | 
| Packaged: | 2025-07-20 21:01:46 UTC; lukepapayoanou | 
| Author: | Luke Papayoanou [aut], Jon McCurdy [aut, cre] | 
| Maintainer: | Jon McCurdy <j.r.mccurdy@msmary.edu> | 
| Repository: | CRAN | 
| Date/Publication: | 2025-07-22 11:01:57 UTC | 
MSMU: Fundamental Data Functions Package
Description
The MSMU package provides core functions for descriptive statistics and exploratory data analysis. It includes functions for computing central tendency, spread, shape, and position statistics, along with utility functions for estimating modes and standardized ranges. The package contains:
Functions
Datasets
Author(s)
Luke Papayoanou, Jon McCurdy
Find the Mode of a Numeric Vector
Description
Calculates the mode (most frequent value) of a numeric vector. If there is a tie, returns all values that share the highest frequency.
Usage
MODE(x)
Arguments
| x | A numeric vector. | 
Value
A numeric value (or vector) representing the mode(s) of x.
Examples
# Mode of a Numeric Vector
MODE(c(1,2,3,3,3,4,5,5,3,8))
# Mode of the number of cylinders in mtcars dataset
data("mtcars")
MODE(mtcars$cyl)
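The tie-handling behaviour described above can be illustrated with a minimal base R sketch. The helper name mode_sketch() is hypothetical and this is not the package's source for MODE(); it assumes missing values should be ignored.

```r
# Hypothetical helper, not the package's MODE() implementation.
# Tabulate the values and keep every value whose count equals the maximum.
mode_sketch <- function(x) {
  x <- x[!is.na(x)]                 # assumption: missing values are ignored
  counts <- table(x)                # frequency of each distinct value
  as.numeric(names(counts)[counts == max(counts)])
}

mode_sketch(c(1, 2, 3, 3, 3, 4, 5, 5, 3, 8))  # 3 (appears four times)
mode_sketch(c(1, 1, 2, 2, 3))                 # 1 2 (tie, both returned)
```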
Professional baseball teams data
Description
This dataset contains historical performance and statistics for professional baseball teams across multiple seasons from 2000-2020.
Usage
baseball_teams
Format
A data frame with 630 rows and 12 columns:
- year
- Year (integer) 
- team_name
- Team (character) 
- games_played
- Number of games played (integer) 
- wins
- Number of wins (integer) 
- losses
- Number of losses (integer) 
- world_series
- World series winner that specific year (character) 
- runs_scored
- Number of total runs scored during season (integer) 
- hits
- Number of total hits during season (integer) 
- homeruns
- Number of total homeruns during season (integer) 
- earned_run_average
- Team earned run average per 9 innings (numeric) 
- fielding_percentage
- Team fielding percentage (numeric) 
- home_attendance
- Average home game attendance (integer) 
Source
Data retrieved from Lahman's Baseball Database with alterations made for educational purposes.
College basketball data
Description
This dataset contains performance statistics for 363 men’s college basketball teams from the 2022-23 season.
Usage
basketball
Format
A data frame with 363 rows and 18 columns:
- School
- School (character) 
- State
- State (character) 
- W
- Wins (integer) 
- L
- Losses (integer)
- W.L.
- Win Loss percentage (numeric) 
- SRS
- Simple Rating System (numeric) 
- SOS
- Strength of Schedule (numeric) 
- Points.Scored
- Points scored (integer) 
- Points.Allowed
- Points allowed (integer) 
- FG.
- Team field goal percentage (numeric) 
- X3P.
- Three point percentage (numeric) 
- FT.
- Free throw percentage (numeric) 
- Rebounds
- Number of rebounds (integer) 
- AST
- Number of assists (integer) 
- STL
- Number of steals (integer) 
- Blocks
- Number of blocks (integer) 
- Turn.Overs
- Number of turnovers (integer)
- Fouls
- Number of fouls (integer) 
Source
Data retrieved from Sports Reference with alterations made for educational purposes.
Summary of Central Tendency
Description
Computes a variety of center statistics for a numeric vector, including:
mean, median, trimmed means (10% and 25%), and estimated mode (via the probability density function,
using estimate_mode()).
Usage
center_stats(x)
Arguments
| x | A numeric vector. | 
Value
A named numeric vector with values for:
- mean
- Arithmetic mean 
- median
- Median 
- trim25
- 25% trimmed mean 
- trim10
- 10% trimmed mean 
- est_mode
- Estimated mode from estimate_mode()
Examples
# Center Stats of continuous random data
set.seed(123)
x <- rnorm(1000, mean=50, sd=10)
center_stats(x)
# Center Stats of Sepal Length in iris data set
data("iris")
center_stats(iris$Sepal.Length)
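To make the trimmed means concrete, the sketch below drops a fraction of the sorted observations from each end before averaging and compares the result with base R's mean(x, trim = ...). The helper trimmed_mean_sketch() is hypothetical; whether center_stats() computes its trimmed means this way is an assumption.

```r
# Hypothetical helper showing what a trimmed mean does (for trim < 0.5).
trimmed_mean_sketch <- function(x, trim) {
  x <- sort(x[!is.na(x)])        # assumption: missing values are ignored
  n <- length(x)
  k <- floor(n * trim)           # observations dropped from each end
  mean(x[(k + 1):(n - k)])
}

x <- c(1:9, 100)                 # one large outlier
mean(x)                          # 14.5, pulled up by the outlier
median(x)                        # 5.5
trimmed_mean_sketch(x, 0.10)     # 5.5, same as mean(x, trim = 0.10)
trimmed_mean_sketch(x, 0.25)     # 5.5, same as mean(x, trim = 0.25)
```

With the outlier trimmed away, both trimmed means agree with the median, which is why they are reported alongside the ordinary mean.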
Christmas data
Description
Santa's dataset, exploring whether Santa gives children presents based on a variety of variables!
Usage
christmas
Format
A data frame with 1000 rows and 15 columns:
- Gender
- Gender (character) 
- Toy_Count
- Number of toys (integer) 
- Chores_Completed
- Number of chores completed (numeric)
- Favorite_Color
- Child's favorite color (character)
- Helping_Hand
- Child's helping hand number/score (integer)
- Complaints_Received
- Number of complaints the child makes (numeric)
- Tantrum_Count
- Number of tantrums the child has (integer)
- Rule_Breaks
- Number of rules the child breaks (numeric)
- Sharing_Behavior
- Child's willingness to share (numeric)
- Hours_of_Sleep
- Child's average hours of sleep per night (numeric)
- Screen_Time
- Child's average hours of screen time (numeric)
- School_Grade
- Child's school grade (numeric)
- Parent_Presence
- Child's parent presence (numeric)
- Greed_Score
- Santa's numeric system for labeling children's greed (numeric)
- Outcome
- Whether a child gets a present or coal (character) 
Source
Santa
Class demographics
Description
A sample dataset representing demographic and academic information for 50 college students.
Usage
class_demographics
Format
A data frame with 50 rows and 6 columns:
- names
- Person's name (character)
- ages
- Person's age (integer)
- state
- Person's state (character)
- year
- Person's year in college (character)
- majors
- Person's major (character)
- sport
- Binary sport indicator: 1 (yes) or 0 (no) (integer)
Source
Synthetic Data
College data
Description
This dataset provides detailed information on 777 U.S. colleges and universities from 1995, covering aspects of admissions, academics, finances, and student demographics.
Usage
college_data
Format
A data frame with 777 rows and 16 columns:
- Name
- College name (character) 
- Region
- US region (character) 
- Accept
- Acceptance (integer) 
- Enroll
- Enrollment (integer) 
- Top10perc
- Percent of students who were in the top 10% of their high school class (integer)
- Top25perc
- Percent of students who were in the top 25% of their high school class (integer)
- F.Undergrad
- Full time undergrad (integer) 
- P.Undergrad
- Part time undergrad (integer) 
- Outstate
- Number of Out of state students (integer) 
- Room.Board
- Annual room and board price (integer) 
- PhD
- Percentage of Faculty with a PhD (integer) 
- Terminal
- Percentage of Faculty with a terminal degree (integer) 
- S.F.Ratio
- Student Faculty ratio (numeric) 
- perc.alumni
- Percent of alumni who donate to the college (integer) 
- Expend
- Instructional expenditure per student (integer) 
- Grad.Rate
- Graduation Rate (integer) 
Source
This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University. Adapted from the College data set in the ISLR library with alterations made for educational purposes.
County data
Description
Data for 3142 counties in the United States containing demographic, educational, economic, and technological statistics.
Usage
county_data
Format
A data frame with 3142 rows and 17 columns:
- state
- State (character) 
- name
- County name (character) 
- fips
- County level FIPS code (integer) 
- pop
- County population (integer) 
- households
- Number of households (integer) 
- median_age
- Median age of people in county (numeric) 
- age_over_18
- Percentage of people over 18 (numeric)
- age_over_65
- Percentage of people over 65 (numeric)
- hs_grad
- Percent of high school graduates (numeric)
- bachelors
- Percent of people with bachelor's degrees (numeric)
- white
- Percent of population that is white (numeric)
- black
- Percent of population that is black (numeric)
- hispanic
- Percent of population that is Hispanic (numeric)
- household_has_smartphone
- Percent of households who have a smartphone (numeric) 
- mean_household_income
- Average household income (integer) 
- median_household_income
- Median household income (integer) 
- unemployment_rate
- Unemployment rate (numeric) 
Source
Adapted from the county_complete data set in the usdata library with alterations made for educational purposes.
Course scores data
Description
This dataset contains academic performance records for 200 students across four years of high school, with scores or letter grades in English and Math.
Usage
course_scores
Format
A data frame with 200 rows and 10 columns:
- student
- Student ID (integer) 
- type
- Grade type (character) 
- Freshman_English
- Freshman English Score/letter grade (character) 
- Freshman_Math
- Freshman Math Score/letter grade (character) 
- Sophomore_English
- Sophomore English Score/letter grade (character) 
- Sophomore_Math
- Sophomore Math Score/letter grade (character) 
- Junior_English
- Junior English Score/letter grade (character) 
- Junior_Math
- Junior Math Score/letter grade (character) 
- Senior_English
- Senior English Score/letter grade (character) 
- Senior_Math
- Senior Math Score/letter grade (character) 
Source
Synthetic Data
Synthetic Census dataset
Description
A synthetic dataset containing demographic and socioeconomic information for 1,000 individuals.
Usage
data_210_census
Format
A data frame with 1000 rows and 5 columns:
- age
- Person's age (integer)
- gender
- Person's gender (character)
- degree
- Person's level of education (character)
- salary
- Person's yearly salary (integer)
- height
- Person's height in inches (integer)
Source
Synthetic Data
2020 election data
Description
Dataset providing detailed results from the 2020 U.S. presidential election at the county level.
Usage
election_2020
Format
A data frame with 32177 rows and 7 columns:
- state
- State (character) 
- state_ev
- State electoral votes (integer) 
- county
- County name (character) 
- candidate
- Candidate name (character) 
- party
- Candidate party (character) 
- total_votes
- Total number of votes (integer) 
- won
- True or false for the candidate to win the county (logical) 
Source
Data retrieved from MIT Election Data and Science Lab, 2018, "County Presidential Election Returns 2000-2020", with alterations made for educational purposes.
Estimate Mode using Density function to find Mode of continuous data
Description
Estimates the mode of a numeric vector by identifying the value corresponding to the peak of its estimated probability density function.
Usage
estimate_mode(x)
Arguments
| x | A numeric vector. Missing values ( | 
Value
A single numeric value representing the estimated mode.
Examples
# Estimate the mode of continuous random data
set.seed(123)
x <- rnorm(1000, mean=5, sd=2)
estimate_mode(x)
# Estimate the mode of miles-per-gallon (mpg) in the mtcars dataset
data("mtcars")
estimate_mode(mtcars$mpg)
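The idea of taking the peak of an estimated density can be sketched with base R's density(). The helper estimate_mode_sketch() is hypothetical, and the default Gaussian kernel and bandwidth used here are assumptions; the package's estimate_mode() may be configured differently.

```r
# Hypothetical helper, not the package's estimate_mode() implementation.
estimate_mode_sketch <- function(x) {
  dens <- density(x, na.rm = TRUE)   # kernel density estimate (default settings)
  dens$x[which.max(dens$y)]          # grid point where the density is highest
}

set.seed(123)
x <- rnorm(1000, mean = 5, sd = 2)
estimate_mode_sketch(x)              # expected to land near the true mode, 5
```

Because continuous data rarely repeat exact values, a frequency-based mode is uninformative there; the density peak gives a usable stand-in.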
Exam data
Description
Synthetic dataset containing academic performance and background information for 1,000 students.
Usage
exam_data
Format
A data frame with 1000 rows and 8 columns:
- gender
- Student's gender (character)
- race.ethnicity
- Student's race/ethnicity (character)
- parental.level.of.education
- Parent's level of education (character)
- lunch
- Student's lunch plan (character)
- test.preparation.course
- Student's test prep level (character)
- math.score
- Student's math score (integer)
- reading.score
- Student's reading score (integer)
- writing.score
- Student's writing score (integer)
Source
Data retrieved from roycekimmons generated data
Football/Quarterback data
Description
Dataset containing performance statistics for 106 football players who attempted a pass in the NFL for the 2022 season.
Usage
football
Format
A data frame with 106 rows and 17 columns:
- Player
- Player's name (character)
- Tm
- Player's team (character)
- Age
- Player's age (integer)
- Pos
- Player's position (character)
- G
- Number of games (integer)
- GS
- Number of games started (integer)
- Wins
- Number of wins (integer) 
- Cmp
- Number of completions (integer) 
- Att
- Number of throwing attempts (integer) 
- Cmp.
- Completion percentage (numeric) 
- Yds
- Number of yards thrown (integer) 
- TD
- Number of touchdowns (integer) 
- Int
- Number of interceptions thrown (integer) 
- Y.A
- Yards per Attempt (numeric) 
- Y.G
- Yards per Game (numeric) 
- Rate
- Passer rating (numeric) 
- QBR
- Total Quarterback Rating (numeric) 
Source
Data retrieved from Pro Football Reference with alterations made for educational purposes.
Heart data
Description
Dataset containing medical and diagnostic information for 303 patients, used to study the presence of Atherosclerotic Heart Disease (AHD).
Usage
heart
Format
A data frame with 303 rows and 14 columns:
- Age
- Patient's age (integer)
- Sex
- Patient's sex (1 = Male, 0 = Female) (integer)
- ChestPain
- Chest pain type (character) 
- RestBP
- Resting blood pressure (in mm Hg on admission to the hospital) (integer) 
- Chol
- Serum cholesterol in mg/dl (integer) 
- Fbs
- Fasting blood sugar > 120 mg/dl (1 = true; 0 = false) (integer)
- RestECG
- Resting electrocardiographic results (integer) 
- MaxHR
- Maximum heart rate achieved (integer) 
- ExAng
- Exercise induced angina (1 = yes; 0 = no) (integer) 
- Oldpeak
- ST depression induced by exercise relative to rest (numeric) 
- Slope
- The slope of the peak exercise ST segment (integer) 
- Ca
- Number of major vessels (0-3) colored by fluoroscopy (integer) 
- Thal
- Thal condition (character) 
- AHD
- Atherosclerotic Heart Disease condition (character)
Source
Data retrieved from UC Irvine Machine Learning Repository.
Housing data
Description
Data on houses that were recently sold in the Duke Forest neighborhood of Durham, NC in November 2020.
Usage
housing_data
Format
A data frame with 98 rows and 6 columns:
- price
- Home price (numeric) 
- bed
- Number of bedrooms (integer) 
- bath
- Number of bathrooms (numeric) 
- area
- Square footage (integer) 
- year_built
- Year the house was built (integer)
- lot
- Lot size (numeric)
Source
Adapted from the duke_forest dataset in the openintro library with alterations made for educational purposes.
Income data
Description
Dataset containing basic demographic and financial information for 20 individuals.
Usage
income_data
Format
A data frame with 20 rows and 5 columns:
- ID
- ID (integer) 
- Ages
- Age (integer)
- Years_til_Retirement.65
- Years until retirement at 65 (integer) 
- Salary
- Salary (integer) 
- Birth_weight
- Birth weight (integer) 
Source
Synthetic Data
Compute Sample Kurtosis
Description
Calculates the kurtosis of a numeric vector. A value near 0 suggests normal kurtosis (mesokurtic), positive values indicate heavier tails (leptokurtic), and negative values indicate lighter tails (platykurtic).
Usage
kurt(x)
Arguments
| x | A numeric vector. | 
Details
The z-scores are computed as:
z_i = \frac{x_i - \bar{x}}{sd}
The kurtosis is then calculated as:
\text{Kurtosis} = \frac{1}{n} \sum_{i=1}^{n} z_i^4 - 3
Where:
- \bar{x} is the mean of x,
- sd is the standard deviation of x,
- and n is the number of observations.
Value
A single numeric value representing the kurtosis.
Examples
# Kurtosis of mpg in mtcars
data("mtcars")
kurt(mtcars$mpg)
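The formula in Details translates almost line for line into base R. The helper kurt_sketch() below is hypothetical and assumes that sd is R's sd(), which uses an n - 1 denominator; the package's kurt() may differ in that detail.

```r
# Hypothetical helper following the Details formula: mean(z^4) - 3.
kurt_sketch <- function(x) {
  x <- x[!is.na(x)]              # assumption: missing values are ignored
  z <- (x - mean(x)) / sd(x)     # z-scores; sd() divides by n - 1
  mean(z^4) - 3                  # (1/n) * sum(z_i^4) - 3
}

kurt_sketch(1:5)                 # -1.912 (flat data, lighter tails than normal)
kurt_sketch(mtcars$mpg)
```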
Ledger data
Description
Dataset mimicking a ledger showing the price an item was bought and sold for, the date it occurred, and the color of the product.
Usage
ledger_data
Format
A data frame with 4 rows and 104 columns:
- color
- colors (character) 
- type
- age (integer) 
- Jan_08
- Price on date (numeric) 
- Jan_15
- Price on date (numeric) 
- Jan_16
- Price on date (numeric) 
- Jan_31
- Price on date (numeric) 
- Feb_02
- Price on date (numeric) 
- Feb_03
- Price on date (numeric) 
- Feb_04
- Price on date (numeric) 
- Feb_14
- Price on date (numeric) 
- Feb_20
- Price on date (numeric) 
- Feb_22
- Price on date (numeric) 
- Feb_25
- Price on date (numeric) 
- Feb_27
- Price on date (numeric) 
- Feb_28
- Price on date (numeric) 
- Mar_01
- Price on date (numeric) 
- Mar_05
- Price on date (numeric) 
- Mar_09
- Price on date (numeric) 
- Mar_12
- Price on date (numeric) 
- Mar_16
- Price on date (numeric) 
- Mar_20
- Price on date (numeric) 
- Mar_21
- Price on date (numeric) 
- Mar_22
- Price on date (numeric) 
- Mar_24
- Price on date (numeric) 
- Mar_27
- Price on date (numeric) 
- Mar_28
- Price on date (numeric) 
- Mar_31
- Price on date (numeric) 
- Apr_06
- Price on date (numeric) 
- Apr_08
- Price on date (numeric) 
- Apr_10
- Price on date (numeric) 
- Apr_18
- Price on date (numeric) 
- Apr_19
- Price on date (numeric) 
- Apr_24
- Price on date (numeric) 
- Apr_26
- Price on date (numeric) 
- Apr_29
- Price on date (numeric) 
- May_01
- Price on date (numeric) 
- May_04
- Price on date (numeric) 
- May_12
- Price on date (numeric) 
- May_17
- Price on date (numeric) 
- May_24
- Price on date (numeric) 
- May_25
- Price on date (numeric) 
- May_28
- Price on date (numeric) 
- Jun_01
- Price on date (numeric) 
- Jun_04
- Price on date (numeric) 
- Jun_11
- Price on date (numeric) 
- Jun_16
- Price on date (numeric) 
- Jun_25
- Price on date (numeric) 
- Jun_28
- Price on date (numeric) 
- Jul_03
- Price on date (numeric) 
- Jul_04
- Price on date (numeric) 
- Jul_08
- Price on date (numeric) 
- Jul_10
- Price on date (numeric) 
- Jul_11
- Price on date (numeric) 
- Jul_13
- Price on date (numeric) 
- Jul_18
- Price on date (numeric) 
- Jul_23
- Price on date (numeric) 
- Jul_25
- Price on date (numeric) 
- Aug_05
- Price on date (numeric) 
- Aug_12
- Price on date (numeric) 
- Aug_13
- Price on date (numeric) 
- Aug_24
- Price on date (numeric) 
- Aug_26
- Price on date (numeric) 
- Sep_02
- Price on date (numeric) 
- Sep_06
- Price on date (numeric) 
- Sep_07
- Price on date (numeric) 
- Sep_08
- Price on date (numeric) 
- Sep_16
- Price on date (numeric) 
- Sep_21
- Price on date (numeric) 
- Sep_22
- Price on date (numeric) 
- Sep_23
- Price on date (numeric) 
- Sep_27
- Price on date (numeric) 
- Oct_07
- Price on date (numeric) 
- Oct_09
- Price on date (numeric) 
- Oct_10
- Price on date (numeric) 
- Oct_15
- Price on date (numeric) 
- Oct_16
- Price on date (numeric) 
- Oct_17
- Price on date (numeric) 
- Oct_19
- Price on date (numeric) 
- Oct_20
- Price on date (numeric) 
- Oct_21
- Price on date (numeric) 
- Oct_22
- Price on date (numeric) 
- Oct_29
- Price on date (numeric) 
- Oct_30
- Price on date (numeric) 
- Oct_31
- Price on date (numeric) 
- Nov_03
- Price on date (numeric) 
- Nov_04
- Price on date (numeric) 
- Nov_12
- Price on date (numeric) 
- Nov_13
- Price on date (numeric) 
- Nov_14
- Price on date (numeric) 
- Nov_16
- Price on date (numeric) 
- Nov_18
- Price on date (numeric) 
- Nov_23
- Price on date (numeric) 
- Nov_24
- Price on date (numeric) 
- Dec_02
- Price on date (numeric) 
- Dec_03
- Price on date (numeric) 
- Dec_06
- Price on date (numeric) 
- Dec_11
- Price on date (numeric) 
- Dec_12
- Price on date (numeric) 
- Dec_13
- Price on date (numeric) 
- Dec_16
- Price on date (numeric) 
- Dec_17
- Price on date (numeric) 
- Dec_18
- Price on date (numeric) 
- Dec_19
- Price on date (numeric) 
- Dec_26
- Price on date (numeric) 
Source
Synthetic Data
MLB data
Description
Batter statistics for the 2018 Major League Baseball season.
Usage
mlb_eda
Format
A data frame with 1270 rows and 13 columns:
- name
- Player's name (character)
- team
- Player's team (character)
- position
- Player's position (character)
- games
- Number of games (integer) 
- AB
- Number of at bats (integer) 
- R
- Number of runs (integer) 
- H
- Number of hits (integer) 
- doubles
- Number of doubles (integer) 
- HR
- Number of Home runs (integer) 
- RBI
- Number of Runs Batted In (integer) 
- AVG
- Player's batting average (numeric)
- SLG
- Player's slugging percentage (numeric)
- OPS
- Player's On-base Plus Slugging (numeric)
Source
Data retrieved from MLB, with alterations made for educational purposes.
Mount St. Mary's dorm data
Description
Dataset summarizing the distribution of male and female students across various dormitories at Mount College, categorized by academic year.
Usage
mount_dorms
Format
A data frame with 4 rows and 11 columns:
- year
- Student's year (character)
- m_Pangborn
- Males living in Pangborn (integer) 
- m_Sheridan
- Males living in Sheridan (integer) 
- m_Terrace
- Males living in Terrace (integer) 
- m_Powell
- Males living in Powell (integer) 
- m_Towers
- Males living in the Towers (integer) 
- f_Pangborn
- Females living in Pangborn (integer) 
- f_Sheridan
- Females living in Sheridan (integer) 
- f_Terrace
- Females living in Terrace (integer) 
- f_Powell
- Females living in Powell (integer) 
- f_Towers
- Females living in the Towers (integer) 
Source
Synthetic Data
Percent Within N Standard Deviations of the Mean
Description
Calculates the percentage of values in a numeric vector that fall within
n standard deviations of the mean.
Usage
pct(x, n)
Arguments
| x | A numeric vector. | 
| n | A positive numeric value indicating how many standard deviations from the mean to use as bounds. | 
Value
A single numeric value representing the percentage (0–100) of values within the specified range.
Examples
# Percentage of values that fall within 2 sds of the mean in random normal data
set.seed(123)
x <- rnorm(1000)
pct(x,2)
# Percentage of values that fall within 2 sds of the mean in iris Sepal Lengths
data("iris")
pct(iris$Sepal.Length, 2)
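A minimal base R sketch of this calculation follows. The helper pct_sketch() is hypothetical; treating the bounds as inclusive and using sd() with an n - 1 denominator are assumptions about how pct() behaves.

```r
# Hypothetical helper, not the package's pct() implementation.
pct_sketch <- function(x, n) {
  x <- x[!is.na(x)]                          # assumption: missing values are ignored
  inside <- abs(x - mean(x)) <= n * sd(x)    # assumption: bounds are inclusive
  100 * mean(inside)                         # share of TRUE values as a percentage
}

pct_sketch(1:5, 1)     # 60 (2, 3 and 4 lie within one sd of the mean, 3)
pct_sketch(1:5, 2)     # 100

set.seed(123)
x <- rnorm(1000)
pct_sketch(x, 2)       # the empirical rule suggests roughly 95 for normal data
```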
Computes Position Statistics, Quintiles and Quartiles
Description
Calculates position statistics for a numeric vector, namely quartiles (data split into 4 equal parts) and quintiles (data split into 5 equal parts), using the 'quantile()' function. NA values are removed.
Usage
position_stats(x)
Arguments
| x | A numeric vector. | 
Details
Percentiles are values that divide a dataset into 100 equal parts, each representing 1% of the distribution. For example, the 25th percentile is the value below which 25% of the data fall.
Quartiles are special percentiles that divide the data into four equal groups: Q1 (25th percentile), Q2 (50th percentile or median), Q3 (75th percentile).
Quintiles divide data into five equal groups, each representing 20% of the distribution: the 20th, 40th, 60th, and 80th percentiles mark the boundaries between the groups.
Value
A list with two elements:
- quint
- Numeric vector of quintiles (0%, 20%, 40%, ..., 100%) 
- quart
- Numeric vector of quartiles (0%, 25%, 50%, 75%, 100%) 
Examples
# Position stats of random data
set.seed(123)
x <- rnorm(1000)
position_stats(x)
# Position stats of MPG in mtcars data set
data("mtcars")
position_stats(mtcars$mpg)
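The two elements of the returned list can be reproduced with base R's quantile(). The helper position_sketch() is hypothetical, and relying on quantile()'s default interpolation method (type 7) is an assumption.

```r
# Hypothetical helper, not the package's position_stats() implementation.
position_sketch <- function(x) {
  list(
    quint = quantile(x, probs = seq(0, 1, by = 0.20), na.rm = TRUE),  # 0%, 20%, ..., 100%
    quart = quantile(x, probs = seq(0, 1, by = 0.25), na.rm = TRUE)   # 0%, 25%, ..., 100%
  )
}

# For the integers 0 to 100 each cut point equals its percentile.
position_sketch(0:100)
```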
Reaction Data
Description
This dataset contains synthetic reaction time measurements for 100 individuals under different conditions.
Usage
reaction_time
Format
A data frame with 100 rows and 6 columns:
- person
- Person id (integer) 
- color
- color (character) 
- left
- left (numeric) 
- right
- right (numeric) 
- age
- Person age (numeric) 
- gender
- Person gender (character) 
Source
Synthetic Data
Computes Sample Skew and Kurtosis
Description
Calculates the skewness of a numeric vector (via skew()).
A positive value indicates right skew (long right tail), while a negative value
indicates left skew (long left tail). A zero value represents symmetry.
Calculates the kurtosis of a numeric vector (via kurt()).
A value near 0 suggests normal kurtosis (mesokurtic),
positive values indicate heavier tails (leptokurtic), and negative
values indicate lighter tails (platykurtic).
Usage
shape_stats(x)
Arguments
| x | A numeric vector. | 
Value
A list with two elements:
- skew
- Skew of data from skew()
- kurt
- Kurtosis of data from kurt()
Examples
# Shape stats of mpg in mtcars
data("mtcars")
shape_stats(mtcars$mpg)
Compute Sample Skewness
Description
Calculates the skewness of a numeric vector. A positive value indicates right skew (long right tail), while a negative value indicates left skew (long left tail). A zero value represents symmetry.
Usage
skew(x)
Arguments
| x | A numeric vector. | 
Value
A single numeric value representing the skewness of the distribution.
Examples
# Skew of Sepal Lengths in iris
data("iris")
skew(iris$Sepal.Length)
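This page does not give a formula for skewness. One common moment-based definition, the mean of the cubed z-scores, is sketched below. The helper skew_sketch() is hypothetical, and both the formula and the use of sd() with an n - 1 denominator are assumptions rather than the package's documented method.

```r
# Hypothetical helper; assumes skewness = mean(z^3), which may not match skew().
skew_sketch <- function(x) {
  x <- x[!is.na(x)]              # assumption: missing values are ignored
  z <- (x - mean(x)) / sd(x)     # z-scores
  mean(z^3)                      # sign shows the direction of the longer tail
}

skew_sketch(1:5)                 # symmetric data, so approximately 0
skew_sketch(c(1, 1, 1, 10))      # positive: long right tail
skew_sketch(c(-10, 1, 1, 1))     # negative: long left tail
```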
Historic soccer data
Description
This dataset contains historical match results from various international soccer games between different countries for the years 1872-2024.
Usage
soccer
Format
A data frame with 13750 rows and 5 columns:
- date
- Date of match (character) 
- home_team
- Home team name (character) 
- away_team
- Away team name (character) 
- home_score
- Home team's goal count (integer)
- away_score
- Away team's goal count (integer)
Source
Data retrieved from Kaggle International football results dataset with alterations made for educational purposes.
Summary of Spread Statistics
Description
Computes a variety of spread statistics for a numeric vector, including:
standard deviation, IQR, the normalized minimum, maximum,
and range, as well as the percentage of data within 1, 2,
and 3 standard deviations (via pct()).
Usage
spread_stats(x)
Arguments
| x | A numeric vector. |
Value
- sd
- Standard Deviation 
- iqr
- Inter Quartile Range 
- minz
- Normalized Minimum 
- maxz
- Normalized Maximum 
- diffz
- Normalized Range 
- pct1
- Percent of data within 1 standard deviation from pct()
- pct2
- Percent of data within 2 standard deviations from pct()
- pct3
- Percent of data within 3 standard deviations from pct()
Examples
# Spread stats of random normal data
set.seed(123)
x <- rnorm(1000)
spread_stats(x)
# Spread stats of mpg in mtcars
data("mtcars")
spread_stats(mtcars$mpg)
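The listed values can be approximated in base R as sketched below. The helper spread_sketch() is hypothetical. Reading "normalized" as a z-score, (value - mean) / sd, is an assumption, as are the inclusive bounds for the percentages; spread_stats() may define these differently.

```r
# Hypothetical helper, not the package's spread_stats() implementation.
spread_sketch <- function(x) {
  x <- x[!is.na(x)]                                  # assumption: missing values are ignored
  z <- (x - mean(x)) / sd(x)                         # assumption: "normalized" means z-score
  pct_within <- function(n) 100 * mean(abs(z) <= n)  # percent within n standard deviations
  c(
    sd    = sd(x),
    iqr   = IQR(x),
    minz  = min(z),
    maxz  = max(z),
    diffz = max(z) - min(z),
    pct1  = pct_within(1),
    pct2  = pct_within(2),
    pct3  = pct_within(3)
  )
}

spread_sketch(1:5)
spread_sketch(mtcars$mpg)
```

For roughly normal data, pct1, pct2 and pct3 should sit near 68, 95 and 99.7, so large departures from those figures hint at heavy tails or skew.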