Introduction to Data Analysis with R and dplyr
In this article, we will explore how to analyze data using the R programming language and the dplyr package. We will use a small example dataset to demonstrate techniques for filtering, grouping, aggregating, and reshaping data.
Installing and Loading Required Libraries
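Before we begin, make sure the required packages are installed. dplyr supplies the data-manipulation verbs used throughout, and tidyr supplies pivot_wider(), which we use for the wide-format step. You can install and load them with the following commands:
# Install required packages (only needed once)
install.packages(c("dplyr", "tidyr"))
# Load the packages
library(dplyr)
library(tidyr)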
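If you rerun the script on several machines, a common base R idiom is to install only the packages that are missing and then load them all. This is a sketch; the package vector (dplyr plus tidyr, whose pivot_wider() is the usual tool for reshaping to wide format) is an assumption you can adjust.

```r
# Packages this article relies on (an assumption; adjust as needed)
pkgs <- c("dplyr", "tidyr")

# requireNamespace() returns FALSE, without attaching anything, if a package is absent
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
to_install <- pkgs[!installed]
if (length(to_install) > 0) install.packages(to_install)

# Attach every package; character.only = TRUE lets library() take a string
invisible(lapply(pkgs, library, character.only = TRUE))
```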
Creating a Sample Dataset
We will create a sample dataset to demonstrate our techniques.
# Create a sample dataset
data <- data.frame(
  Stock = c("Pfizer", "Yahoo", "Facebook", "Amazon", "Merck", "Ford"),
  Date = c("18-Aug-2009", "19-Aug-2012", "20-Aug-2014", "21-Aug-2014", "22-Aug-2005", "23-Aug-2003"),
  Price = c(18.8, 27.1, 77.14, 683.66, 22.9, 20.1)
)
# Print the dataset
print(data)
Data Preprocessing
In this section, we will demonstrate how to extract the year from the date column and filter the data based on a specific time frame.
# Extract the year from the Date column: in "dd-Mon-yyyy" strings the year occupies characters 8 to 11
data$Year <- as.integer(substr(data$Date, start = 8, stop = 11))
# Keep only rows whose Year is within 10 years of year_zero (2010), i.e. 2000 to 2020 inclusive
year_zero <- 2010
data_filtered <- data[data$Year >= year_zero - 10 & data$Year <= year_zero + 10, ]
# Print the filtered dataset
print(data_filtered)
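The two base R preprocessing steps can also be written as a single dplyr pipeline. This is a sketch rather than the article's original code: it swaps the fixed-position substr() for a regular expression, sub(".*-", "", Date), which keeps everything after the last hyphen and therefore does not assume a two-digit day (it would also handle a hypothetical "5-Aug-2009"). dplyr's between() is inclusive at both ends, matching the >= and <= comparisons above.

```r
library(dplyr)

# Same sample data as in the article
data <- data.frame(
  Stock = c("Pfizer", "Yahoo", "Facebook", "Amazon", "Merck", "Ford"),
  Date = c("18-Aug-2009", "19-Aug-2012", "20-Aug-2014", "21-Aug-2014", "22-Aug-2005", "23-Aug-2003"),
  Price = c(18.8, 27.1, 77.14, 683.66, 22.9, 20.1)
)

year_zero <- 2010

data_filtered <- data %>%
  # Take everything after the last "-" as the year (no fixed character positions)
  mutate(Year = as.integer(sub(".*-", "", Date))) %>%
  # between() keeps rows with year_zero - 10 <= Year <= year_zero + 10
  filter(between(Year, year_zero - 10, year_zero + 10))

print(data_filtered)
```

With this sample data every year falls between 2000 and 2020, so all six rows are kept.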
Grouping and Aggregating Data
In this section, we will demonstrate how to group the data by stock and year and calculate the maximum price for each stock-year combination.
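# Group the data by Stock and Year, and calculate the maximum price per group
# (.groups = "drop" returns an ungrouped result, so a separate ungroup() is not needed)
data_grouped <- data_filtered %>%
  group_by(Stock, Year) %>%
  summarise(Max_Price = max(Price), .groups = "drop")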
# Print the grouped dataset
print(data_grouped)
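In the sample data each stock appears only once, so every group holds a single row and max() has nothing to compare. The sketch below uses a small hypothetical table in which Pfizer has two 2009 prices (the 25.0 value is made up purely for illustration) so the aggregation visibly collapses rows; it also counts the rows in each group with n().

```r
library(dplyr)

# Hypothetical toy data: two Pfizer prices in 2009 (25.0 is illustrative only)
prices <- data.frame(
  Stock = c("Pfizer", "Pfizer", "Ford"),
  Year = c(2009L, 2009L, 2003L),
  Price = c(18.8, 25.0, 20.1)
)

summary_tbl <- prices %>%
  group_by(Stock, Year) %>%
  summarise(
    Max_Price = max(Price),  # largest price within each Stock-Year group
    N_Obs = n(),             # number of rows that were collapsed
    .groups = "drop"         # return an ungrouped tibble
  )

print(summary_tbl)
```

The two Pfizer rows collapse into one with Max_Price 25 and N_Obs 2, while Ford keeps its single value.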
Converting to Wide Format
In this section, we will demonstrate how to convert the grouped data to wide format, with one row per stock and one column per year.
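# Convert the data to wide format: one row per Stock, one column per Year.
# Stock-year combinations with no price are filled with NA (pivot_wider's default).
# pivot_wider() comes from the tidyr package, which must be installed.
data_wide <- data_grouped %>%
  tidyr::pivot_wider(names_from = Year, values_from = Max_Price, names_sort = TRUE)
# Print the wide dataset
print(data_wide)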
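To see the NA filling in action, the sketch below reshapes a small toy table (values borrowed from the sample data, table itself illustrative) in which Ford has no 2014 row, then reverses the reshape with tidyr's pivot_longer(). names_sort = TRUE sorts the year columns, and values_drop_na = TRUE discards the NA cells on the way back.

```r
library(tidyr)

# Toy long-format data: Ford has no 2014 row
long <- data.frame(
  Stock = c("Amazon", "Facebook", "Ford"),
  Year = c(2014L, 2014L, 2003L),
  Max_Price = c(683.66, 77.14, 20.1)
)

# One row per Stock, one column per Year; absent combinations become NA
wide <- pivot_wider(long, names_from = Year, values_from = Max_Price,
                    names_sort = TRUE)
print(wide)

# Back to long format; the NA cells created above are dropped
back <- pivot_longer(wide, cols = -Stock, names_to = "Year",
                     values_to = "Max_Price", values_drop_na = TRUE)
print(back)
```

Note that the year columns are named "2003" and "2014", which are non-syntactic in R, so refer to them with backticks (for example wide$`2014`). After pivot_longer() the Year column is character rather than integer.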
Conclusion
In this article, we demonstrated how to analyze data using R and dplyr. We created a sample dataset, performed data preprocessing techniques, grouped and aggregated data, and converted it to wide format. These are just some of the many techniques available for data analysis in R.
By following these steps and practicing with your own datasets, you will be able to tackle more complex data analysis tasks with confidence. Remember to always explore and visualize your data before attempting to analyze it, as this can help you identify patterns and relationships that may not be immediately apparent.
Last modified on 2023-09-24