Understanding Log-Transformed Axes and Units in R for Data Analysis

The units package is a powerful tool for working with units in R. It lets users easily attach unit information to the columns of their data frames and carry it through statistical analyses. Plotting variables with units on log-transformed axes is less straightforward, however: the usual ways of requesting a log axis in base graphics and in ggplot2 fail with an error.

Background: Understanding the Units Package

Before we dive into solving this problem, let’s take a brief look at how the units package works in R. The package attaches units to numeric vectors, which can then be stored as data frame columns, using the set_units() function. This function takes two arguments: the values and the unit.

For example, if we want to set the units of the dist column in the cars dataset to feet (ft), we would use the following code:

library(units)
Distance = set_units(cars$dist, ft)

This creates a numeric vector with the same values as the original dist column, with feet (ft) attached as its unit. It does not create a new data frame.
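To make this concrete, here is a small sketch showing that set_units() returns a unit-carrying vector rather than a data frame, and that the unit travels with the values through conversions. It assumes a reasonably recent units package. The functions deparse_unit() and drop_units() and the variable names Distance_m and plain are additions that the rest of this article does not use.

```r
library(units)

# Attach feet to the stopping distances of the built-in cars data
Distance = set_units(cars$dist, ft)

class(Distance)          # a "units" vector, not a data.frame
deparse_unit(Distance)   # the unit as a character string

# Conversion is unit-aware: the first distance (2 ft) becomes 0.6096 m
Distance_m = set_units(Distance, m)

# drop_units() strips the unit and returns the plain numbers
plain = drop_units(Distance)
```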

Plotting Variables with Units on Log-Transformed Axes

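When we try to plot variables with units on log-transformed axes using base plot or ggplot2’s standard log scales, we encounter an error. The code that sets up a log axis compares or transforms the values using plain, unitless numbers. The units package, at least in the version that produced the errors below, refuses to mix units objects with unitless numbers in this way.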

For instance, if we try to plot Speed vs Distance with log="y" in base plot, we get an error message:

Error in Ops.units(y, 0) : 
  both operands of the expression should be "units" objects

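When log = "y" is set, base graphics first checks the y values for non-positive entries with a comparison along the lines of y <= 0. Because y is a units object and 0 is a plain number, Ops.units() stops with the error above. The log argument offers no way to make that comparison unit-aware.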
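If base graphics are a must, one workaround is to strip the units before plotting and write them into the axis labels by hand. This swaps in a different technique from the approach recommended later in this article. The sketch assumes the units package’s drop_units() and deparse_unit(). The trade-off is that the plot no longer tracks units automatically.

```r
library(units)

df = cars
df$Distance = set_units(df$dist, ft)   # stopping distance in feet
df$Speed = set_units(df$speed, mph)    # speed in miles per hour

# Plain numbers avoid the units-vs-unitless comparison made for log = "y"
x = drop_units(df$Speed)
y = drop_units(df$Distance)

# Put the (now dropped) units back into the axis labels by hand
plot(x, y, log = "y",
     xlab = paste0("Speed (", deparse_unit(df$Speed), ")"),
     ylab = paste0("Distance (", deparse_unit(df$Distance), ")"))
```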

Similarly, when using ggplot2’s standard scale_y_log10() function, we also get an error:

# Error in Ops.units(y, 0) : 
#  both operands of the expression should be "units" objects

Here the log10 transformation inside the scale runs into the same restriction: the units values end up being compared with a plain 0, and scale_y_log10() offers no way to make that comparison unit-aware.
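The same workaround carries over to ggplot2. Map plain numbers so that scale_y_log10() never receives a units object, and put the units into the axis titles with labs(). This is a sketch under that assumption, not the coord_trans() approach discussed below. The hard-coded titles have to be kept in sync with the data by hand.

```r
library(units)
library(ggplot2)

df = cars
df$Distance = set_units(df$dist, ft)
df$Speed = set_units(df$speed, mph)

# drop_units() inside aes() hands plain numbers to the log10 scale
p = ggplot(df, aes(x = drop_units(Speed), y = drop_units(Distance))) +
  geom_point() +
  scale_y_log10() +
  labs(x = "Speed (mph)", y = "Distance (ft)")   # units written by hand

print(p)
```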

Using the scale_y_unit() Function

One option is the scale_y_unit() function from the ggforce package. It is a unit-aware position scale: it lets us choose the unit in which the axis is displayed, and it accepts a trans argument for transformations such as log10.

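However, scale_y_unit(trans="log10") causes a different problem. The plot is drawn, but the axis title becomes cryptic. Because the logarithm is applied to the unit-carrying values themselves, the title reports a logarithmic unit, lg(re 0.3048 m), instead of plain feet. This means log10 relative to 0.3048 m, which is 1 ft.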

# Y-axis is cryptically labelled with "Distance (lg(re 0.3048 m))"
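For reference, the call that produces this label can be sketched as follows. It builds the same df as the coord_trans example below. It also assumes that the installed ggforce version still exports scale_y_unit() with a trans argument, as the article’s own examples do.

```r
library(units)
library(ggplot2)
library(ggforce)

df = cars
df$Distance = set_units(df$dist, ft)
df$Speed = set_units(df$speed, mph)

# The log10 transform lives in the unit-aware scale, so the axis title
# is built from a logarithmic unit rather than from plain feet
p = ggplot(df, aes(x = Speed, y = Distance)) +
  geom_point() +
  scale_y_unit(trans = "log10")
```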

Applying Log Transformation via coord_trans

A better approach is to leave the scale untransformed and apply the log transformation with ggplot2’s coord_trans() instead. coord_trans() transforms the coordinate system after the statistics and scales have been computed. The unit-aware scale therefore keeps working with ordinary feet, while the axis itself is drawn on a logarithmic scale.

Here’s an example:

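library(units)
library(ggplot2)
library(ggforce)

df = cars
df$Distance = set_units(df$dist, ft)   # stopping distance in feet
df$Speed = set_units(df$speed, mph)    # speed in miles per hour

# ggplot() + geom_point() in place of the deprecated qplot();
# the scale stays untransformed and the coordinate system applies log10
ggplot(df, aes(x = Speed, y = Distance)) +
  geom_point() +
  scale_y_unit() +
  coord_trans(y = "log10")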

With coord_trans(y = "log10"), the Distance values stay in feet until they are drawn, so the unit-aware scale never sees a logarithmic unit. In the resulting plot, the y-axis is spaced logarithmically and its ticks are labelled with plain distance values rather than the cryptic lg(re 0.3048 m) unit.

This approach provides a clean and readable solution to plotting variables with units on log-transformed axes.

Best Practices

When working with units in R, here are some best practices to keep in mind:

  • Attach unit information with the set_units() function.
  • When a variable with units needs a log axis, apply the transformation with coord_trans() rather than with scale_y_log10() or a transformed unit scale.
  • Use the trans argument of scale_y_unit() with caution, since it relabels the axis with a logarithmic unit.

By following these guidelines and using the techniques outlined above, you can effectively work with log-transformed axes and unit information in R.

Conclusion

In this article, we explored the challenges of plotting variables with units on log-transformed axes in R. By understanding how to apply the log transformation using coord_trans, we can achieve clean and readable solutions that preserve unit information.

Remember to attach units with the set_units() function, to apply log transformations with coord_trans() rather than with transformed scales, and to use the trans argument of scale_y_unit() with caution. With these techniques, variables with units can be shown on log-transformed axes without losing their unit information.


Last modified on 2024-01-19