Data Wrangling

Getting Data into Useable Formats
Author

W. Joel Schneider

Published

March 1, 2024

Getting Used to Working with Data in R

In programs like SPSS, the data seem to live in a spreadsheet that is always open and available to you. In R, you can see the data whenever you want, but usually it sits unseen inside a variable. It can be disconcerting at first, but think of your data as living in a file somewhere on your hard drive (or online!), and then the data just come for a short visit in R.

The major benefit of working with data in R is that all of the changes, transformations, and restructuring happens in code—which can be recreated at any time. There is usually no need to “save” the data after you have transformed it. The next time you work with the data, you just run your code and all the calculations will transform the data exactly the same way as before.

Why does this matter? If you feel the need to save your data all the time, you end up having multiple copies of it: data.sav, data_restructured.sav, data_new.sav, data_new_final.sav, fixed_data_new_final.sav, restructued_final_with_missing_cases_removed.sav, and so forth and so on. It can be hard to figure out where to start the next time you work with your data. You might not remember which version has errors and which version has what you need.

In general, start with a completely raw data file (one that is exactly the way you started). Resist the temptation to make any changes to it directly. If you must, save a pristine copy somewhere and only then change it. Import your data and make all changes to it with code. One benefit of doing so is that the code documents any changes you would otherwise need to record in your lab notebook.

All the code in one place:

I am going to walk through these steps. You can run this all at once to make sure it works. Then I will explain it step by step.

# Load packages
library(tidyverse)

# Import data
d <- read_csv("https://github.com/wjschne/EDUC5529/raw/master/on_task_multiple_baseline.csv")

# View data
d
glimpse(d)
View(d)

# Filtering

## Cases with fewer than 8 baseline sessions
d %>% 
  filter(baseline_sessions_n < 8)  %>% 
  arrange(-baseline_sessions_n)


## Students with `id` equal to 7
d %>% 
  filter(id == 7)

## Students with scores above 10 on `time_1` AND `time_2`:
d %>% 
  filter(time_1 > 10 & time_2 > 10)

## Students with scores above 10 on `time_1` OR `time_2`:
d %>% 
  filter(time_1 > 10 | time_2 > 10)

# Selecting columns
## Select the id and time_1 variables
d %>% 
  select(id, time_1)

## Select 5 adjacent variables: time_1 throught time_5
d %>% 
  select(time_1:time_5)

## Remove the id variable
d %>% 
  select(-id)

## Select all variables that start with "time" 
d %>% 
  select(starts_with("time"))

# Select all variables end with 0:

d %>% 
  select(ends_with("0"))

# Select all variables that contain "_2"
d %>% 
  select(contains("_2"))

# Renaming variables
## Rename id to become student_id, rename baseline_sessions_n to baseline
d_new <- d %>% 
  rename(student_id = id, 
         baseline = baseline_sessions_n) 

## Rename rename all variables that start with "time_" 
## so that they start with "T_" instead
d %>% 
  rename_with(str_replace, pattern = "time_", replace = "T_")

# Make small data
# Select variables from id to time_3 and filter rows so that 
# id is less than 3 (not inclusive)
d_small <- d %>% 
  select(id:time_3) %>% 
  filter(id < 3)

d_small

# Restructuring from wide to long format:
d_small_longer <- d_small %>%
  pivot_longer(cols = time_1:time_3,
               names_to = "time",
               values_to = "on_task",
               names_prefix = "time_", 
               names_transform = list(time = as.integer)) 

# Restructing from long to wide format:
d_small_longer %>% 
  pivot_wider(names_from = time, 
              values_from = on_task, 
              names_prefix = "time_")


# Restructure whole data from wide to long and create new variables
d_longer <- d %>%
  pivot_longer(cols = time_1:time_60,
               names_to = "time",
               values_to = "on_task",
               names_prefix = "time_", 
               names_transform = list(time = as.integer)) %>% 
    mutate(phase = case_when(
    time <= baseline_sessions_n ~ "Pre-Intervention",
    time <= baseline_sessions_n + intervention_sessions_n ~ "Intervention",
    time > baseline_sessions_n + intervention_sessions_n ~ "Post-Intervention" 
  )) %>% 
  mutate(phase = fct_inorder(phase))

# Plot
d_longer %>% 
  mutate(id = factor(id) %>% fct_reorder(baseline_sessions_n)) %>% 
  ggplot(aes(time, on_task)) +
  geom_line(aes(group = id, color = phase)) + 
  facet_grid(rows = vars(id))

Importing Data from a file

For this task we will use a file that lives here on the web:

on_task_multiple_baseline.csv

You can download it if you want, but you do not need to. You can import it over the web right into R.

We have a .csv file, so we will use the tidyverse function read_csv, from the readr package, which can be loaded with the tidyverse package:

library(tidyverse)
d <- read_csv("https://github.com/wjschne/EDUC5529/raw/master/on_task_multiple_baseline.csv")
Rows: 8 Columns: 63
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (63): id, baseline_sessions_n, intervention_sessions_n, time_1, time_2, ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Note that we have to assign the data to a variable—d in this case. Otherwise, the data will spill into the console, but R will not remember the data.

Inspecting Data

The read_csv function imports a special data structure called a tibble which is a variation on the traditional data.frame structure. It has some useful defaults that make working with it more predictable and practical.

If the data is too big to print in your console, the tibble will just show the first few rows, the first few columns, and then the remaining column names. The output also says which kind of data is in each column and how many rows and columns are in the data.

Whoa! That dataset has a lot of variables! When the there are more variables than rows, we say that the data is “wide.”

An alternate way to view the data is to use the glimpse function:

glimpse(d)

If you really need to see all the data, use the View function:

View(d)
d
id baseline_sessions_n intervention_sessions_n time_1 time_2 time_3 time_4 time_5 time_6 time_7 time_8 time_9 time_10 time_11 time_12 time_13 time_14 time_15 time_16 time_17 time_18 time_19 time_20 time_21 time_22 time_23 time_24 time_25 time_26 time_27 time_28 time_29 time_30 time_31 time_32 time_33 time_34 time_35 time_36 time_37 time_38 time_39 time_40 time_41 time_42 time_43 time_44 time_45 time_46 time_47 time_48 time_49 time_50 time_51 time_52 time_53 time_54 time_55 time_56 time_57 time_58 time_59 time_60
1 22 20 11.481390 14.401475 6.150868 9.254059 9.579971 8.160149 10.124808 11.120275 9.202528 10.838353 9.181093 8.436128 9.977338 13.242464 9.317091 8.275580 9.207833 10.888414 7.225326 11.472184 8.570382 10.582867 11.03763 13.12591 17.99503 12.965307 23.83989 21.39564 22.652730 30.186799 29.49430 31.359282 29.31638 33.78262 38.94499 37.28258 42.56795 37.73358 44.34489 44.60556 49.51082 49.60417 48.72288 50.88207 51.22964 49.03761 51.27490 48.99979 51.13479 48.72249 50.63808 51.23539 54.45665 48.83957 50.35392 54.53799 52.76513 50.67702 50.17861 50.03028
2 18 19 9.114530 8.586483 11.009756 9.355582 8.094828 8.771899 8.953120 7.442692 11.056727 7.590044 11.086753 10.670066 7.842831 9.018026 11.266130 5.886133 7.387846 8.107445 12.111284 14.825195 13.106666 18.156120 21.65248 22.81874 24.84652 25.152196 29.22987 31.49869 30.092382 34.848345 29.83119 42.884047 40.93578 45.12095 48.62162 48.62702 47.45525 47.03118 48.66027 47.54405 50.79882 50.57596 49.92920 48.24046 49.40579 48.22480 47.97522 47.58815 44.74564 43.74299 49.14827 48.02379 44.89332 48.27857 47.22880 48.91035 44.84299 45.47214 49.48035 50.34650
3 5 18 7.243792 14.309134 7.978843 11.175362 12.287030 13.145834 11.694320 18.998364 16.125544 20.296444 21.905149 21.952354 27.114283 27.932517 27.447843 30.746875 36.685054 37.336880 39.197311 43.844702 44.751691 43.357550 46.34139 47.00592 44.49681 46.506577 44.03419 47.01678 48.690672 43.648066 47.70016 49.477448 47.99977 46.17302 48.51305 43.80151 48.55182 48.29836 43.90161 47.10333 51.73539 40.91064 45.88078 49.40663 45.65272 43.96433 46.60210 44.56747 42.59574 47.05461 46.58171 43.50139 42.44631 47.22240 44.16290 49.38115 44.99909 46.10792 47.08795 44.91932
4 32 20 11.296579 14.307367 14.810658 8.677719 9.533368 7.822653 9.561326 8.208895 12.897751 8.243799 8.547692 8.616214 7.586228 8.920517 9.198943 8.711324 11.026336 10.804031 11.591157 10.519060 9.027216 7.848482 12.31915 10.06748 12.19758 7.872384 8.83143 10.43239 8.886314 8.552978 11.83615 7.607858 11.13613 13.53312 14.89085 19.39188 22.94677 23.98795 20.03181 27.08285 28.20672 27.31684 34.86907 33.64589 36.78373 37.47689 40.52695 41.81391 40.74878 45.50657 45.75414 47.57176 56.02767 51.27312 51.97966 47.74709 48.65355 50.63484 48.44609 50.34281
5 19 20 10.667172 6.803895 8.303144 14.497614 6.867806 12.367478 9.413963 9.444261 8.744703 14.304808 10.960766 10.451372 14.827904 6.272452 10.096415 12.403348 8.463626 8.628488 9.839363 14.449422 13.048669 16.131181 18.19289 17.92118 23.90188 22.576320 23.05887 25.57293 29.038100 31.426959 33.96424 36.342143 38.46910 41.43815 40.76039 44.75332 47.46249 48.81603 52.23999 50.28782 51.07685 48.51878 51.01045 49.58853 50.63677 47.93697 45.46109 52.17792 48.00961 47.00786 50.80013 47.70737 54.89829 49.20719 52.96849 51.58046 47.74537 53.55861 48.21013 50.76616
6 24 20 13.027050 7.471434 13.679538 5.768483 10.788906 8.698977 9.322671 8.311932 12.865342 9.966903 6.926024 7.546386 10.175361 12.911465 5.829294 8.554141 10.513759 12.225228 9.298788 9.089913 9.930872 10.489717 12.34784 11.75163 10.44748 13.390600 16.23082 16.95081 21.799091 21.528074 21.87916 25.988930 26.66509 32.06949 32.79460 35.01925 37.48007 32.86721 35.36552 41.40063 46.80009 49.05867 46.94410 50.26605 54.22345 50.03206 51.80317 49.74147 48.77986 48.08299 48.62034 48.06852 48.06212 46.87093 50.26182 44.33751 51.32408 48.03641 54.78962 48.73745
7 10 19 9.695231 9.174433 10.636559 9.135522 7.362022 14.266673 9.447716 12.377972 6.093100 5.853903 9.539306 14.262163 13.727533 19.424575 20.669408 22.576336 26.741461 24.728567 28.852438 30.760060 31.627913 34.360552 36.09521 40.91820 39.43172 41.137066 43.04933 46.76154 46.199305 48.768979 47.36646 49.477421 52.47127 49.36562 44.54360 48.07476 46.38418 47.66100 45.35457 49.09778 48.01635 47.49901 49.15443 48.99818 47.30794 46.44190 49.66045 48.06729 46.28851 45.44424 48.75336 49.89107 49.41766 47.99784 50.40713 46.83389 46.45106 48.04787 48.95078 52.75730
8 13 20 10.596034 11.746395 10.903057 11.402866 8.263436 7.589180 8.232499 7.294619 10.028391 10.105379 9.796763 11.889425 7.041182 12.705354 13.625825 14.989743 19.613571 23.575270 22.594344 25.971062 26.860383 30.064456 29.58605 31.22675 34.26105 36.188052 38.32964 42.61264 43.274969 42.503742 42.50890 51.437160 53.72465 50.30605 48.78798 51.40765 53.23059 50.96061 53.23112 48.39759 48.72193 50.48096 51.47567 50.61417 48.02566 50.54120 50.11116 49.68551 50.81838 45.58281 50.14003 49.66069 47.78592 49.98868 52.46699 50.67714 49.14738 51.38654 47.67108 50.32172

Study Design

This dataset has 8 children, one in each row. Each child was given an intervention designed to increase “time on task” in the classroom so that the child will stay focused and complete his or her work. The study has a multiple baseline design. That is, each child was observed for a “baseline” period for a number days before the intervention was implemented. Each child’s baseline lasted a different length of time. Some children were observed for a short time before the intervention, and some were observed much longer. After the intervention ended, each child was observed for a number of days to see if the intervention’s effect diminished.

The id column is just a number to identify each child. The baseline_sessions_n variable indicates how many days the child was observed before the intervention began. The intervention_sessions_n variable indications the number days the intervention lasted. The remainding variables time_1time_60 are the number of minutes the child stayed on task during the observation period each day for 60 days.

Tidy Data

You might think that the data is ready to be analyzed, but it isn’t. The data might be perfectly structured for data entry, but it is not quite “tidy” yet.

The phrase tidy data was popularized by Hadley Wickham, but it mostly refers to something a little less catchy called database normalization. In data analysis, we want:

  1. Each variable must have its own column.
  2. Each observation must have its own row.
  3. Each value must have its own cell.

—From Wickham & Grolemund (2017):

This is how R likes its data. Whenever you find yourself fighting with confusing or tedious tasks in R, there is a good chance your data has violated one of these rules. Converting your data to this “tidy” format will likely simplify your task.

In the current data, there are two primary sources of untidiness.

  • We need a single variable to indicate time on task. Unfortunately, the time on task variable we would like to have is currently spread across 60 variables, time_1time_60.
  • We also would like a variable called “time” to indicate which day the observation took place. Right now, the time variable is lurking in the column names time_1time_60

Later we need to identify which treatment phase the observation was conducted in, “pre-intervention,” “intervention,” or “post-intervention.” That information is sitting in an unusual form right right now in the baseline_sessions_n and intervention_session_n variables.

Data Filtering

The filter function selects rows that meet certain conditions. For example, suppose we want to see rows with baseline_sessions_n less than 8:

The %>% is the “pipe” function. It means “and then.” It inserts the output of the previous function into the next function. If we did not use the pipe, we would type filter(d, baseline_sessions_n < 8) instead.

d %>% 
  filter(baseline_sessions_n < 8)
id baseline_sessions_n intervention_sessions_n time_1 time_2 time_3 time_4 time_5 time_6 time_7 time_8 time_9 time_10 time_11 time_12 time_13 time_14 time_15 time_16 time_17 time_18 time_19 time_20 time_21 time_22 time_23 time_24 time_25 time_26 time_27 time_28 time_29 time_30 time_31 time_32 time_33 time_34 time_35 time_36 time_37 time_38 time_39 time_40 time_41 time_42 time_43 time_44 time_45 time_46 time_47 time_48 time_49 time_50 time_51 time_52 time_53 time_54 time_55 time_56 time_57 time_58 time_59 time_60
3 5 18 7.243792 14.30913 7.978843 11.17536 12.28703 13.14583 11.69432 18.99836 16.12554 20.29644 21.90515 21.95235 27.11428 27.93252 27.44784 30.74687 36.68505 37.33688 39.19731 43.8447 44.75169 43.35755 46.34139 47.00592 44.49681 46.50658 44.03419 47.01678 48.69067 43.64807 47.70016 49.47745 47.99977 46.17302 48.51305 43.80151 48.55182 48.29836 43.90161 47.10333 51.73539 40.91064 45.88078 49.40663 45.65272 43.96433 46.6021 44.56747 42.59574 47.05461 46.58171 43.50139 42.44631 47.2224 44.1629 49.38115 44.99909 46.10792 47.08795 44.91932

In this example, there is little advantage to using the pipe because there is only one step. However, when many steps are strung together, the pipe makes the code much easier to understand.

The double equal sign == performs a test to see if two things are the same. Forgetting that you need two equal signs is a really common error.

Students with id equal to 7:

d %>% 
  filter(id == 7)
id baseline_sessions_n intervention_sessions_n time_1 time_2 time_3 time_4 time_5 time_6 time_7 time_8 time_9 time_10 time_11 time_12 time_13 time_14 time_15 time_16 time_17 time_18 time_19 time_20 time_21 time_22 time_23 time_24 time_25 time_26 time_27 time_28 time_29 time_30 time_31 time_32 time_33 time_34 time_35 time_36 time_37 time_38 time_39 time_40 time_41 time_42 time_43 time_44 time_45 time_46 time_47 time_48 time_49 time_50 time_51 time_52 time_53 time_54 time_55 time_56 time_57 time_58 time_59 time_60
7 10 19 9.695231 9.174433 10.63656 9.135522 7.362022 14.26667 9.447716 12.37797 6.0931 5.853903 9.539306 14.26216 13.72753 19.42458 20.66941 22.57634 26.74146 24.72857 28.85244 30.76006 31.62791 34.36055 36.09521 40.9182 39.43172 41.13707 43.04933 46.76154 46.19931 48.76898 47.36646 49.47742 52.47127 49.36562 44.5436 48.07476 46.38418 47.661 45.35457 49.09778 48.01635 47.49901 49.15443 48.99818 47.30794 46.4419 49.66045 48.06729 46.28851 45.44424 48.75336 49.89107 49.41766 47.99784 50.40713 46.83389 46.45106 48.04787 48.95078 52.7573

You can filter for more than one condition.

Students with scores above 10 on time_1 AND time_2:

d %>% 
  filter(time_1 > 10 & time_2 > 10)
id baseline_sessions_n intervention_sessions_n time_1 time_2 time_3 time_4 time_5 time_6 time_7 time_8 time_9 time_10 time_11 time_12 time_13 time_14 time_15 time_16 time_17 time_18 time_19 time_20 time_21 time_22 time_23 time_24 time_25 time_26 time_27 time_28 time_29 time_30 time_31 time_32 time_33 time_34 time_35 time_36 time_37 time_38 time_39 time_40 time_41 time_42 time_43 time_44 time_45 time_46 time_47 time_48 time_49 time_50 time_51 time_52 time_53 time_54 time_55 time_56 time_57 time_58 time_59 time_60
1 22 20 11.48139 14.40148 6.150868 9.254059 9.579971 8.160149 10.124808 11.120275 9.202528 10.838353 9.181093 8.436128 9.977338 13.242464 9.317091 8.275580 9.207833 10.88841 7.225326 11.47218 8.570382 10.582867 11.03763 13.12591 17.99503 12.965307 23.83989 21.39564 22.652730 30.186799 29.49430 31.359282 29.31638 33.78262 38.94499 37.28258 42.56795 37.73358 44.34489 44.60556 49.51082 49.60417 48.72288 50.88207 51.22964 49.03761 51.27490 48.99979 51.13479 48.72249 50.63808 51.23539 54.45665 48.83957 50.35392 54.53799 52.76513 50.67702 50.17861 50.03028
4 32 20 11.29658 14.30737 14.810658 8.677719 9.533368 7.822653 9.561326 8.208895 12.897751 8.243799 8.547692 8.616214 7.586228 8.920517 9.198943 8.711324 11.026336 10.80403 11.591157 10.51906 9.027216 7.848482 12.31915 10.06748 12.19758 7.872384 8.83143 10.43239 8.886314 8.552978 11.83615 7.607858 11.13613 13.53312 14.89085 19.39188 22.94677 23.98795 20.03181 27.08285 28.20672 27.31684 34.86907 33.64589 36.78373 37.47689 40.52695 41.81391 40.74878 45.50657 45.75414 47.57176 56.02767 51.27312 51.97966 47.74709 48.65355 50.63484 48.44609 50.34281
8 13 20 10.59603 11.74640 10.903057 11.402866 8.263436 7.589180 8.232499 7.294619 10.028391 10.105379 9.796763 11.889425 7.041182 12.705354 13.625825 14.989743 19.613571 23.57527 22.594344 25.97106 26.860383 30.064456 29.58605 31.22675 34.26105 36.188052 38.32964 42.61264 43.274969 42.503742 42.50890 51.437160 53.72465 50.30605 48.78798 51.40765 53.23059 50.96061 53.23112 48.39759 48.72193 50.48096 51.47567 50.61417 48.02566 50.54120 50.11116 49.68551 50.81838 45.58281 50.14003 49.66069 47.78592 49.98868 52.46699 50.67714 49.14738 51.38654 47.67108 50.32172

Students with scores above 10 on time_1 OR time_2:

d %>% 
  filter(time_1 > 10 | time_2 > 10)
id baseline_sessions_n intervention_sessions_n time_1 time_2 time_3 time_4 time_5 time_6 time_7 time_8 time_9 time_10 time_11 time_12 time_13 time_14 time_15 time_16 time_17 time_18 time_19 time_20 time_21 time_22 time_23 time_24 time_25 time_26 time_27 time_28 time_29 time_30 time_31 time_32 time_33 time_34 time_35 time_36 time_37 time_38 time_39 time_40 time_41 time_42 time_43 time_44 time_45 time_46 time_47 time_48 time_49 time_50 time_51 time_52 time_53 time_54 time_55 time_56 time_57 time_58 time_59 time_60
1 22 20 11.481390 14.401475 6.150868 9.254059 9.579971 8.160149 10.124808 11.120275 9.202528 10.838353 9.181093 8.436128 9.977338 13.242464 9.317091 8.275580 9.207833 10.888414 7.225326 11.472184 8.570382 10.582867 11.03763 13.12591 17.99503 12.965307 23.83989 21.39564 22.652730 30.186799 29.49430 31.359282 29.31638 33.78262 38.94499 37.28258 42.56795 37.73358 44.34489 44.60556 49.51082 49.60417 48.72288 50.88207 51.22964 49.03761 51.27490 48.99979 51.13479 48.72249 50.63808 51.23539 54.45665 48.83957 50.35392 54.53799 52.76513 50.67702 50.17861 50.03028
3 5 18 7.243792 14.309134 7.978843 11.175362 12.287030 13.145834 11.694320 18.998364 16.125544 20.296444 21.905149 21.952354 27.114283 27.932517 27.447843 30.746875 36.685054 37.336880 39.197311 43.844702 44.751691 43.357550 46.34139 47.00592 44.49681 46.506577 44.03419 47.01678 48.690672 43.648066 47.70016 49.477448 47.99977 46.17302 48.51305 43.80151 48.55182 48.29836 43.90161 47.10333 51.73539 40.91064 45.88078 49.40663 45.65272 43.96433 46.60210 44.56747 42.59574 47.05461 46.58171 43.50139 42.44631 47.22240 44.16290 49.38115 44.99909 46.10792 47.08795 44.91932
4 32 20 11.296579 14.307367 14.810658 8.677719 9.533368 7.822653 9.561326 8.208895 12.897751 8.243799 8.547692 8.616214 7.586228 8.920517 9.198943 8.711324 11.026336 10.804031 11.591157 10.519060 9.027216 7.848482 12.31915 10.06748 12.19758 7.872384 8.83143 10.43239 8.886314 8.552978 11.83615 7.607858 11.13613 13.53312 14.89085 19.39188 22.94677 23.98795 20.03181 27.08285 28.20672 27.31684 34.86907 33.64589 36.78373 37.47689 40.52695 41.81391 40.74878 45.50657 45.75414 47.57176 56.02767 51.27312 51.97966 47.74709 48.65355 50.63484 48.44609 50.34281
5 19 20 10.667172 6.803895 8.303144 14.497614 6.867806 12.367478 9.413963 9.444261 8.744703 14.304808 10.960766 10.451372 14.827904 6.272452 10.096415 12.403348 8.463626 8.628488 9.839363 14.449422 13.048669 16.131181 18.19289 17.92118 23.90188 22.576320 23.05887 25.57293 29.038100 31.426959 33.96424 36.342143 38.46910 41.43815 40.76039 44.75332 47.46249 48.81603 52.23999 50.28782 51.07685 48.51878 51.01045 49.58853 50.63677 47.93697 45.46109 52.17792 48.00961 47.00786 50.80013 47.70737 54.89829 49.20719 52.96849 51.58046 47.74537 53.55861 48.21013 50.76616
6 24 20 13.027050 7.471434 13.679538 5.768483 10.788906 8.698977 9.322671 8.311932 12.865342 9.966903 6.926024 7.546386 10.175361 12.911465 5.829294 8.554141 10.513759 12.225228 9.298788 9.089913 9.930872 10.489717 12.34784 11.75163 10.44748 13.390600 16.23082 16.95081 21.799091 21.528074 21.87916 25.988930 26.66509 32.06949 32.79460 35.01925 37.48007 32.86721 35.36552 41.40063 46.80009 49.05867 46.94410 50.26605 54.22345 50.03206 51.80317 49.74147 48.77986 48.08299 48.62034 48.06852 48.06212 46.87093 50.26182 44.33751 51.32408 48.03641 54.78962 48.73745
8 13 20 10.596034 11.746395 10.903057 11.402866 8.263436 7.589180 8.232499 7.294619 10.028391 10.105379 9.796763 11.889425 7.041182 12.705354 13.625825 14.989743 19.613571 23.575270 22.594344 25.971062 26.860383 30.064456 29.58605 31.22675 34.26105 36.188052 38.32964 42.61264 43.274969 42.503742 42.50890 51.437160 53.72465 50.30605 48.78798 51.40765 53.23059 50.96061 53.23112 48.39759 48.72193 50.48096 51.47567 50.61417 48.02566 50.54120 50.11116 49.68551 50.81838 45.58281 50.14003 49.66069 47.78592 49.98868 52.46699 50.67714 49.14738 51.38654 47.67108 50.32172

Column selection

We can select columns by naming just the columns we want:

d %>% 
  select(id, time_1)
id time_1
1 11.481390
2 9.114530
3 7.243792
4 11.296579
5 10.667172
6 13.027050
7 9.695231
8 10.596034

If the columns we want are adjacent, we can use the : operator to select a sequence:

d %>% 
  select(time_1:time_5)
time_1 time_2 time_3 time_4 time_5
11.481390 14.401475 6.150868 9.254059 9.579971
9.114530 8.586483 11.009756 9.355582 8.094828
7.243792 14.309134 7.978843 11.175362 12.287030
11.296579 14.307367 14.810658 8.677719 9.533368
10.667172 6.803895 8.303144 14.497614 6.867806
13.027050 7.471434 13.679538 5.768483 10.788906
9.695231 9.174433 10.636559 9.135522 7.362022
10.596034 11.746395 10.903057 11.402866 8.263436

If want everything but a particular column (or columns), use the - operator:

d %>% 
  select(-id)
baseline_sessions_n intervention_sessions_n time_1 time_2 time_3 time_4 time_5 time_6 time_7 time_8 time_9 time_10 time_11 time_12 time_13 time_14 time_15 time_16 time_17 time_18 time_19 time_20 time_21 time_22 time_23 time_24 time_25 time_26 time_27 time_28 time_29 time_30 time_31 time_32 time_33 time_34 time_35 time_36 time_37 time_38 time_39 time_40 time_41 time_42 time_43 time_44 time_45 time_46 time_47 time_48 time_49 time_50 time_51 time_52 time_53 time_54 time_55 time_56 time_57 time_58 time_59 time_60
22 20 11.481390 14.401475 6.150868 9.254059 9.579971 8.160149 10.124808 11.120275 9.202528 10.838353 9.181093 8.436128 9.977338 13.242464 9.317091 8.275580 9.207833 10.888414 7.225326 11.472184 8.570382 10.582867 11.03763 13.12591 17.99503 12.965307 23.83989 21.39564 22.652730 30.186799 29.49430 31.359282 29.31638 33.78262 38.94499 37.28258 42.56795 37.73358 44.34489 44.60556 49.51082 49.60417 48.72288 50.88207 51.22964 49.03761 51.27490 48.99979 51.13479 48.72249 50.63808 51.23539 54.45665 48.83957 50.35392 54.53799 52.76513 50.67702 50.17861 50.03028
18 19 9.114530 8.586483 11.009756 9.355582 8.094828 8.771899 8.953120 7.442692 11.056727 7.590044 11.086753 10.670066 7.842831 9.018026 11.266130 5.886133 7.387846 8.107445 12.111284 14.825195 13.106666 18.156120 21.65248 22.81874 24.84652 25.152196 29.22987 31.49869 30.092382 34.848345 29.83119 42.884047 40.93578 45.12095 48.62162 48.62702 47.45525 47.03118 48.66027 47.54405 50.79882 50.57596 49.92920 48.24046 49.40579 48.22480 47.97522 47.58815 44.74564 43.74299 49.14827 48.02379 44.89332 48.27857 47.22880 48.91035 44.84299 45.47214 49.48035 50.34650
5 18 7.243792 14.309134 7.978843 11.175362 12.287030 13.145834 11.694320 18.998364 16.125544 20.296444 21.905149 21.952354 27.114283 27.932517 27.447843 30.746875 36.685054 37.336880 39.197311 43.844702 44.751691 43.357550 46.34139 47.00592 44.49681 46.506577 44.03419 47.01678 48.690672 43.648066 47.70016 49.477448 47.99977 46.17302 48.51305 43.80151 48.55182 48.29836 43.90161 47.10333 51.73539 40.91064 45.88078 49.40663 45.65272 43.96433 46.60210 44.56747 42.59574 47.05461 46.58171 43.50139 42.44631 47.22240 44.16290 49.38115 44.99909 46.10792 47.08795 44.91932
32 20 11.296579 14.307367 14.810658 8.677719 9.533368 7.822653 9.561326 8.208895 12.897751 8.243799 8.547692 8.616214 7.586228 8.920517 9.198943 8.711324 11.026336 10.804031 11.591157 10.519060 9.027216 7.848482 12.31915 10.06748 12.19758 7.872384 8.83143 10.43239 8.886314 8.552978 11.83615 7.607858 11.13613 13.53312 14.89085 19.39188 22.94677 23.98795 20.03181 27.08285 28.20672 27.31684 34.86907 33.64589 36.78373 37.47689 40.52695 41.81391 40.74878 45.50657 45.75414 47.57176 56.02767 51.27312 51.97966 47.74709 48.65355 50.63484 48.44609 50.34281
19 20 10.667172 6.803895 8.303144 14.497614 6.867806 12.367478 9.413963 9.444261 8.744703 14.304808 10.960766 10.451372 14.827904 6.272452 10.096415 12.403348 8.463626 8.628488 9.839363 14.449422 13.048669 16.131181 18.19289 17.92118 23.90188 22.576320 23.05887 25.57293 29.038100 31.426959 33.96424 36.342143 38.46910 41.43815 40.76039 44.75332 47.46249 48.81603 52.23999 50.28782 51.07685 48.51878 51.01045 49.58853 50.63677 47.93697 45.46109 52.17792 48.00961 47.00786 50.80013 47.70737 54.89829 49.20719 52.96849 51.58046 47.74537 53.55861 48.21013 50.76616
24 20 13.027050 7.471434 13.679538 5.768483 10.788906 8.698977 9.322671 8.311932 12.865342 9.966903 6.926024 7.546386 10.175361 12.911465 5.829294 8.554141 10.513759 12.225228 9.298788 9.089913 9.930872 10.489717 12.34784 11.75163 10.44748 13.390600 16.23082 16.95081 21.799091 21.528074 21.87916 25.988930 26.66509 32.06949 32.79460 35.01925 37.48007 32.86721 35.36552 41.40063 46.80009 49.05867 46.94410 50.26605 54.22345 50.03206 51.80317 49.74147 48.77986 48.08299 48.62034 48.06852 48.06212 46.87093 50.26182 44.33751 51.32408 48.03641 54.78962 48.73745
10 19 9.695231 9.174433 10.636559 9.135522 7.362022 14.266673 9.447716 12.377972 6.093100 5.853903 9.539306 14.262163 13.727533 19.424575 20.669408 22.576336 26.741461 24.728567 28.852438 30.760060 31.627913 34.360552 36.09521 40.91820 39.43172 41.137066 43.04933 46.76154 46.199305 48.768979 47.36646 49.477421 52.47127 49.36562 44.54360 48.07476 46.38418 47.66100 45.35457 49.09778 48.01635 47.49901 49.15443 48.99818 47.30794 46.44190 49.66045 48.06729 46.28851 45.44424 48.75336 49.89107 49.41766 47.99784 50.40713 46.83389 46.45106 48.04787 48.95078 52.75730
13 20 10.596034 11.746395 10.903057 11.402866 8.263436 7.589180 8.232499 7.294619 10.028391 10.105379 9.796763 11.889425 7.041182 12.705354 13.625825 14.989743 19.613571 23.575270 22.594344 25.971062 26.860383 30.064456 29.58605 31.22675 34.26105 36.188052 38.32964 42.61264 43.274969 42.503742 42.50890 51.437160 53.72465 50.30605 48.78798 51.40765 53.23059 50.96061 53.23112 48.39759 48.72193 50.48096 51.47567 50.61417 48.02566 50.54120 50.11116 49.68551 50.81838 45.58281 50.14003 49.66069 47.78592 49.98868 52.46699 50.67714 49.14738 51.38654 47.67108 50.32172

If columns have similar names, we can use the starts_with, ends_width, or contains functions.

Select all variables that start with the phrase time:

d %>% 
  select(starts_with("time"))
time_1 time_2 time_3 time_4 time_5 time_6 time_7 time_8 time_9 time_10 time_11 time_12 time_13 time_14 time_15 time_16 time_17 time_18 time_19 time_20 time_21 time_22 time_23 time_24 time_25 time_26 time_27 time_28 time_29 time_30 time_31 time_32 time_33 time_34 time_35 time_36 time_37 time_38 time_39 time_40 time_41 time_42 time_43 time_44 time_45 time_46 time_47 time_48 time_49 time_50 time_51 time_52 time_53 time_54 time_55 time_56 time_57 time_58 time_59 time_60
11.481390 14.401475 6.150868 9.254059 9.579971 8.160149 10.124808 11.120275 9.202528 10.838353 9.181093 8.436128 9.977338 13.242464 9.317091 8.275580 9.207833 10.888414 7.225326 11.472184 8.570382 10.582867 11.03763 13.12591 17.99503 12.965307 23.83989 21.39564 22.652730 30.186799 29.49430 31.359282 29.31638 33.78262 38.94499 37.28258 42.56795 37.73358 44.34489 44.60556 49.51082 49.60417 48.72288 50.88207 51.22964 49.03761 51.27490 48.99979 51.13479 48.72249 50.63808 51.23539 54.45665 48.83957 50.35392 54.53799 52.76513 50.67702 50.17861 50.03028
9.114530 8.586483 11.009756 9.355582 8.094828 8.771899 8.953120 7.442692 11.056727 7.590044 11.086753 10.670066 7.842831 9.018026 11.266130 5.886133 7.387846 8.107445 12.111284 14.825195 13.106666 18.156120 21.65248 22.81874 24.84652 25.152196 29.22987 31.49869 30.092382 34.848345 29.83119 42.884047 40.93578 45.12095 48.62162 48.62702 47.45525 47.03118 48.66027 47.54405 50.79882 50.57596 49.92920 48.24046 49.40579 48.22480 47.97522 47.58815 44.74564 43.74299 49.14827 48.02379 44.89332 48.27857 47.22880 48.91035 44.84299 45.47214 49.48035 50.34650
7.243792 14.309134 7.978843 11.175362 12.287030 13.145834 11.694320 18.998364 16.125544 20.296444 21.905149 21.952354 27.114283 27.932517 27.447843 30.746875 36.685054 37.336880 39.197311 43.844702 44.751691 43.357550 46.34139 47.00592 44.49681 46.506577 44.03419 47.01678 48.690672 43.648066 47.70016 49.477448 47.99977 46.17302 48.51305 43.80151 48.55182 48.29836 43.90161 47.10333 51.73539 40.91064 45.88078 49.40663 45.65272 43.96433 46.60210 44.56747 42.59574 47.05461 46.58171 43.50139 42.44631 47.22240 44.16290 49.38115 44.99909 46.10792 47.08795 44.91932
11.296579 14.307367 14.810658 8.677719 9.533368 7.822653 9.561326 8.208895 12.897751 8.243799 8.547692 8.616214 7.586228 8.920517 9.198943 8.711324 11.026336 10.804031 11.591157 10.519060 9.027216 7.848482 12.31915 10.06748 12.19758 7.872384 8.83143 10.43239 8.886314 8.552978 11.83615 7.607858 11.13613 13.53312 14.89085 19.39188 22.94677 23.98795 20.03181 27.08285 28.20672 27.31684 34.86907 33.64589 36.78373 37.47689 40.52695 41.81391 40.74878 45.50657 45.75414 47.57176 56.02767 51.27312 51.97966 47.74709 48.65355 50.63484 48.44609 50.34281
10.667172 6.803895 8.303144 14.497614 6.867806 12.367478 9.413963 9.444261 8.744703 14.304808 10.960766 10.451372 14.827904 6.272452 10.096415 12.403348 8.463626 8.628488 9.839363 14.449422 13.048669 16.131181 18.19289 17.92118 23.90188 22.576320 23.05887 25.57293 29.038100 31.426959 33.96424 36.342143 38.46910 41.43815 40.76039 44.75332 47.46249 48.81603 52.23999 50.28782 51.07685 48.51878 51.01045 49.58853 50.63677 47.93697 45.46109 52.17792 48.00961 47.00786 50.80013 47.70737 54.89829 49.20719 52.96849 51.58046 47.74537 53.55861 48.21013 50.76616
13.027050 7.471434 13.679538 5.768483 10.788906 8.698977 9.322671 8.311932 12.865342 9.966903 6.926024 7.546386 10.175361 12.911465 5.829294 8.554141 10.513759 12.225228 9.298788 9.089913 9.930872 10.489717 12.34784 11.75163 10.44748 13.390600 16.23082 16.95081 21.799091 21.528074 21.87916 25.988930 26.66509 32.06949 32.79460 35.01925 37.48007 32.86721 35.36552 41.40063 46.80009 49.05867 46.94410 50.26605 54.22345 50.03206 51.80317 49.74147 48.77986 48.08299 48.62034 48.06852 48.06212 46.87093 50.26182 44.33751 51.32408 48.03641 54.78962 48.73745
9.695231 9.174433 10.636559 9.135522 7.362022 14.266673 9.447716 12.377972 6.093100 5.853903 9.539306 14.262163 13.727533 19.424575 20.669408 22.576336 26.741461 24.728567 28.852438 30.760060 31.627913 34.360552 36.09521 40.91820 39.43172 41.137066 43.04933 46.76154 46.199305 48.768979 47.36646 49.477421 52.47127 49.36562 44.54360 48.07476 46.38418 47.66100 45.35457 49.09778 48.01635 47.49901 49.15443 48.99818 47.30794 46.44190 49.66045 48.06729 46.28851 45.44424 48.75336 49.89107 49.41766 47.99784 50.40713 46.83389 46.45106 48.04787 48.95078 52.75730
10.596034 11.746395 10.903057 11.402866 8.263436 7.589180 8.232499 7.294619 10.028391 10.105379 9.796763 11.889425 7.041182 12.705354 13.625825 14.989743 19.613571 23.575270 22.594344 25.971062 26.860383 30.064456 29.58605 31.22675 34.26105 36.188052 38.32964 42.61264 43.274969 42.503742 42.50890 51.437160 53.72465 50.30605 48.78798 51.40765 53.23059 50.96061 53.23112 48.39759 48.72193 50.48096 51.47567 50.61417 48.02566 50.54120 50.11116 49.68551 50.81838 45.58281 50.14003 49.66069 47.78592 49.98868 52.46699 50.67714 49.14738 51.38654 47.67108 50.32172

Select all variables end with 0:

d %>% 
  select(ends_with("0"))
time_10 time_20 time_30 time_40 time_50 time_60
10.838353 11.472184 30.186799 44.60556 48.72249 50.03028
7.590044 14.825195 34.848345 47.54405 43.74299 50.34650
20.296444 43.844702 43.648066 47.10333 47.05461 44.91932
8.243799 10.519060 8.552978 27.08285 45.50657 50.34281
14.304808 14.449422 31.426959 50.28782 47.00786 50.76616
9.966903 9.089913 21.528074 41.40063 48.08299 48.73745
5.853903 30.760060 48.768979 49.09778 45.44424 52.75730
10.105379 25.971062 42.503742 48.39759 45.58281 50.32172

Select all variables that contain the phrase “_2”

d %>% 
  select(contains("_2"))
time_2 time_20 time_21 time_22 time_23 time_24 time_25 time_26 time_27 time_28 time_29
14.401475 11.472184 8.570382 10.582867 11.03763 13.12591 17.99503 12.965307 23.83989 21.39564 22.652730
8.586483 14.825195 13.106666 18.156120 21.65248 22.81874 24.84652 25.152196 29.22987 31.49869 30.092382
14.309134 43.844702 44.751691 43.357550 46.34139 47.00592 44.49681 46.506577 44.03419 47.01678 48.690672
14.307367 10.519060 9.027216 7.848482 12.31915 10.06748 12.19758 7.872384 8.83143 10.43239 8.886314
6.803895 14.449422 13.048669 16.131181 18.19289 17.92118 23.90188 22.576320 23.05887 25.57293 29.038100
7.471434 9.089913 9.930872 10.489717 12.34784 11.75163 10.44748 13.390600 16.23082 16.95081 21.799091
9.174433 30.760060 31.627913 34.360552 36.09521 40.91820 39.43172 41.137066 43.04933 46.76154 46.199305
11.746395 25.971062 26.860383 30.064456 29.58605 31.22675 34.26105 36.188052 38.32964 42.61264 43.274969

The - operator works with these selection functions to exclude variables:

d %>% 
  select(-contains("time_"))
id baseline_sessions_n intervention_sessions_n
1 22 20
2 18 19
3 5 18
4 32 20
5 19 20
6 24 20
7 10 19
8 13 20

Renaming variables

d %>% 
  rename(student_id = id, 
         baseline = baseline_sessions_n)
student_id baseline intervention_sessions_n time_1 time_2 time_3 time_4 time_5 time_6 time_7 time_8 time_9 time_10 time_11 time_12 time_13 time_14 time_15 time_16 time_17 time_18 time_19 time_20 time_21 time_22 time_23 time_24 time_25 time_26 time_27 time_28 time_29 time_30 time_31 time_32 time_33 time_34 time_35 time_36 time_37 time_38 time_39 time_40 time_41 time_42 time_43 time_44 time_45 time_46 time_47 time_48 time_49 time_50 time_51 time_52 time_53 time_54 time_55 time_56 time_57 time_58 time_59 time_60
1 22 20 11.481390 14.401475 6.150868 9.254059 9.579971 8.160149 10.124808 11.120275 9.202528 10.838353 9.181093 8.436128 9.977338 13.242464 9.317091 8.275580 9.207833 10.888414 7.225326 11.472184 8.570382 10.582867 11.03763 13.12591 17.99503 12.965307 23.83989 21.39564 22.652730 30.186799 29.49430 31.359282 29.31638 33.78262 38.94499 37.28258 42.56795 37.73358 44.34489 44.60556 49.51082 49.60417 48.72288 50.88207 51.22964 49.03761 51.27490 48.99979 51.13479 48.72249 50.63808 51.23539 54.45665 48.83957 50.35392 54.53799 52.76513 50.67702 50.17861 50.03028
2 18 19 9.114530 8.586483 11.009756 9.355582 8.094828 8.771899 8.953120 7.442692 11.056727 7.590044 11.086753 10.670066 7.842831 9.018026 11.266130 5.886133 7.387846 8.107445 12.111284 14.825195 13.106666 18.156120 21.65248 22.81874 24.84652 25.152196 29.22987 31.49869 30.092382 34.848345 29.83119 42.884047 40.93578 45.12095 48.62162 48.62702 47.45525 47.03118 48.66027 47.54405 50.79882 50.57596 49.92920 48.24046 49.40579 48.22480 47.97522 47.58815 44.74564 43.74299 49.14827 48.02379 44.89332 48.27857 47.22880 48.91035 44.84299 45.47214 49.48035 50.34650
3 5 18 7.243792 14.309134 7.978843 11.175362 12.287030 13.145834 11.694320 18.998364 16.125544 20.296444 21.905149 21.952354 27.114283 27.932517 27.447843 30.746875 36.685054 37.336880 39.197311 43.844702 44.751691 43.357550 46.34139 47.00592 44.49681 46.506577 44.03419 47.01678 48.690672 43.648066 47.70016 49.477448 47.99977 46.17302 48.51305 43.80151 48.55182 48.29836 43.90161 47.10333 51.73539 40.91064 45.88078 49.40663 45.65272 43.96433 46.60210 44.56747 42.59574 47.05461 46.58171 43.50139 42.44631 47.22240 44.16290 49.38115 44.99909 46.10792 47.08795 44.91932
4 32 20 11.296579 14.307367 14.810658 8.677719 9.533368 7.822653 9.561326 8.208895 12.897751 8.243799 8.547692 8.616214 7.586228 8.920517 9.198943 8.711324 11.026336 10.804031 11.591157 10.519060 9.027216 7.848482 12.31915 10.06748 12.19758 7.872384 8.83143 10.43239 8.886314 8.552978 11.83615 7.607858 11.13613 13.53312 14.89085 19.39188 22.94677 23.98795 20.03181 27.08285 28.20672 27.31684 34.86907 33.64589 36.78373 37.47689 40.52695 41.81391 40.74878 45.50657 45.75414 47.57176 56.02767 51.27312 51.97966 47.74709 48.65355 50.63484 48.44609 50.34281
5 19 20 10.667172 6.803895 8.303144 14.497614 6.867806 12.367478 9.413963 9.444261 8.744703 14.304808 10.960766 10.451372 14.827904 6.272452 10.096415 12.403348 8.463626 8.628488 9.839363 14.449422 13.048669 16.131181 18.19289 17.92118 23.90188 22.576320 23.05887 25.57293 29.038100 31.426959 33.96424 36.342143 38.46910 41.43815 40.76039 44.75332 47.46249 48.81603 52.23999 50.28782 51.07685 48.51878 51.01045 49.58853 50.63677 47.93697 45.46109 52.17792 48.00961 47.00786 50.80013 47.70737 54.89829 49.20719 52.96849 51.58046 47.74537 53.55861 48.21013 50.76616
6 24 20 13.027050 7.471434 13.679538 5.768483 10.788906 8.698977 9.322671 8.311932 12.865342 9.966903 6.926024 7.546386 10.175361 12.911465 5.829294 8.554141 10.513759 12.225228 9.298788 9.089913 9.930872 10.489717 12.34784 11.75163 10.44748 13.390600 16.23082 16.95081 21.799091 21.528074 21.87916 25.988930 26.66509 32.06949 32.79460 35.01925 37.48007 32.86721 35.36552 41.40063 46.80009 49.05867 46.94410 50.26605 54.22345 50.03206 51.80317 49.74147 48.77986 48.08299 48.62034 48.06852 48.06212 46.87093 50.26182 44.33751 51.32408 48.03641 54.78962 48.73745
7 10 19 9.695231 9.174433 10.636559 9.135522 7.362022 14.266673 9.447716 12.377972 6.093100 5.853903 9.539306 14.262163 13.727533 19.424575 20.669408 22.576336 26.741461 24.728567 28.852438 30.760060 31.627913 34.360552 36.09521 40.91820 39.43172 41.137066 43.04933 46.76154 46.199305 48.768979 47.36646 49.477421 52.47127 49.36562 44.54360 48.07476 46.38418 47.66100 45.35457 49.09778 48.01635 47.49901 49.15443 48.99818 47.30794 46.44190 49.66045 48.06729 46.28851 45.44424 48.75336 49.89107 49.41766 47.99784 50.40713 46.83389 46.45106 48.04787 48.95078 52.75730
8 13 20 10.596034 11.746395 10.903057 11.402866 8.263436 7.589180 8.232499 7.294619 10.028391 10.105379 9.796763 11.889425 7.041182 12.705354 13.625825 14.989743 19.613571 23.575270 22.594344 25.971062 26.860383 30.064456 29.58605 31.22675 34.26105 36.188052 38.32964 42.61264 43.274969 42.503742 42.50890 51.437160 53.72465 50.30605 48.78798 51.40765 53.23059 50.96061 53.23112 48.39759 48.72193 50.48096 51.47567 50.61417 48.02566 50.54120 50.11116 49.68551 50.81838 45.58281 50.14003 49.66069 47.78592 49.98868 52.46699 50.67714 49.14738 51.38654 47.67108 50.32172

We can rename many columns at once with a function. The str_replace function usually works like this:

x <- "time_1"
str_replace(x, pattern = "time_", replace = "Time ")
[1] "Time 1"

The rename_with is a function that applies a function to all variable names at once. It can take that function’s arguments, too. For example,

d %>% rename_with(str_replace, pattern = "time_", replace = "Time ")
id baseline_sessions_n intervention_sessions_n Time 1 Time 2 Time 3 Time 4 Time 5 Time 6 Time 7 Time 8 Time 9 Time 10 Time 11 Time 12 Time 13 Time 14 Time 15 Time 16 Time 17 Time 18 Time 19 Time 20 Time 21 Time 22 Time 23 Time 24 Time 25 Time 26 Time 27 Time 28 Time 29 Time 30 Time 31 Time 32 Time 33 Time 34 Time 35 Time 36 Time 37 Time 38 Time 39 Time 40 Time 41 Time 42 Time 43 Time 44 Time 45 Time 46 Time 47 Time 48 Time 49 Time 50 Time 51 Time 52 Time 53 Time 54 Time 55 Time 56 Time 57 Time 58 Time 59 Time 60
1 22 20 11.481390 14.401475 6.150868 9.254059 9.579971 8.160149 10.124808 11.120275 9.202528 10.838353 9.181093 8.436128 9.977338 13.242464 9.317091 8.275580 9.207833 10.888414 7.225326 11.472184 8.570382 10.582867 11.03763 13.12591 17.99503 12.965307 23.83989 21.39564 22.652730 30.186799 29.49430 31.359282 29.31638 33.78262 38.94499 37.28258 42.56795 37.73358 44.34489 44.60556 49.51082 49.60417 48.72288 50.88207 51.22964 49.03761 51.27490 48.99979 51.13479 48.72249 50.63808 51.23539 54.45665 48.83957 50.35392 54.53799 52.76513 50.67702 50.17861 50.03028
2 18 19 9.114530 8.586483 11.009756 9.355582 8.094828 8.771899 8.953120 7.442692 11.056727 7.590044 11.086753 10.670066 7.842831 9.018026 11.266130 5.886133 7.387846 8.107445 12.111284 14.825195 13.106666 18.156120 21.65248 22.81874 24.84652 25.152196 29.22987 31.49869 30.092382 34.848345 29.83119 42.884047 40.93578 45.12095 48.62162 48.62702 47.45525 47.03118 48.66027 47.54405 50.79882 50.57596 49.92920 48.24046 49.40579 48.22480 47.97522 47.58815 44.74564 43.74299 49.14827 48.02379 44.89332 48.27857 47.22880 48.91035 44.84299 45.47214 49.48035 50.34650
3 5 18 7.243792 14.309134 7.978843 11.175362 12.287030 13.145834 11.694320 18.998364 16.125544 20.296444 21.905149 21.952354 27.114283 27.932517 27.447843 30.746875 36.685054 37.336880 39.197311 43.844702 44.751691 43.357550 46.34139 47.00592 44.49681 46.506577 44.03419 47.01678 48.690672 43.648066 47.70016 49.477448 47.99977 46.17302 48.51305 43.80151 48.55182 48.29836 43.90161 47.10333 51.73539 40.91064 45.88078 49.40663 45.65272 43.96433 46.60210 44.56747 42.59574 47.05461 46.58171 43.50139 42.44631 47.22240 44.16290 49.38115 44.99909 46.10792 47.08795 44.91932
4 32 20 11.296579 14.307367 14.810658 8.677719 9.533368 7.822653 9.561326 8.208895 12.897751 8.243799 8.547692 8.616214 7.586228 8.920517 9.198943 8.711324 11.026336 10.804031 11.591157 10.519060 9.027216 7.848482 12.31915 10.06748 12.19758 7.872384 8.83143 10.43239 8.886314 8.552978 11.83615 7.607858 11.13613 13.53312 14.89085 19.39188 22.94677 23.98795 20.03181 27.08285 28.20672 27.31684 34.86907 33.64589 36.78373 37.47689 40.52695 41.81391 40.74878 45.50657 45.75414 47.57176 56.02767 51.27312 51.97966 47.74709 48.65355 50.63484 48.44609 50.34281
5 19 20 10.667172 6.803895 8.303144 14.497614 6.867806 12.367478 9.413963 9.444261 8.744703 14.304808 10.960766 10.451372 14.827904 6.272452 10.096415 12.403348 8.463626 8.628488 9.839363 14.449422 13.048669 16.131181 18.19289 17.92118 23.90188 22.576320 23.05887 25.57293 29.038100 31.426959 33.96424 36.342143 38.46910 41.43815 40.76039 44.75332 47.46249 48.81603 52.23999 50.28782 51.07685 48.51878 51.01045 49.58853 50.63677 47.93697 45.46109 52.17792 48.00961 47.00786 50.80013 47.70737 54.89829 49.20719 52.96849 51.58046 47.74537 53.55861 48.21013 50.76616
6 24 20 13.027050 7.471434 13.679538 5.768483 10.788906 8.698977 9.322671 8.311932 12.865342 9.966903 6.926024 7.546386 10.175361 12.911465 5.829294 8.554141 10.513759 12.225228 9.298788 9.089913 9.930872 10.489717 12.34784 11.75163 10.44748 13.390600 16.23082 16.95081 21.799091 21.528074 21.87916 25.988930 26.66509 32.06949 32.79460 35.01925 37.48007 32.86721 35.36552 41.40063 46.80009 49.05867 46.94410 50.26605 54.22345 50.03206 51.80317 49.74147 48.77986 48.08299 48.62034 48.06852 48.06212 46.87093 50.26182 44.33751 51.32408 48.03641 54.78962 48.73745
7 10 19 9.695231 9.174433 10.636559 9.135522 7.362022 14.266673 9.447716 12.377972 6.093100 5.853903 9.539306 14.262163 13.727533 19.424575 20.669408 22.576336 26.741461 24.728567 28.852438 30.760060 31.627913 34.360552 36.09521 40.91820 39.43172 41.137066 43.04933 46.76154 46.199305 48.768979 47.36646 49.477421 52.47127 49.36562 44.54360 48.07476 46.38418 47.66100 45.35457 49.09778 48.01635 47.49901 49.15443 48.99818 47.30794 46.44190 49.66045 48.06729 46.28851 45.44424 48.75336 49.89107 49.41766 47.99784 50.40713 46.83389 46.45106 48.04787 48.95078 52.75730
8 13 20 10.596034 11.746395 10.903057 11.402866 8.263436 7.589180 8.232499 7.294619 10.028391 10.105379 9.796763 11.889425 7.041182 12.705354 13.625825 14.989743 19.613571 23.575270 22.594344 25.971062 26.860383 30.064456 29.58605 31.22675 34.26105 36.188052 38.32964 42.61264 43.274969 42.503742 42.50890 51.437160 53.72465 50.30605 48.78798 51.40765 53.23059 50.96061 53.23112 48.39759 48.72193 50.48096 51.47567 50.61417 48.02566 50.54120 50.11116 49.68551 50.81838 45.58281 50.14003 49.66069 47.78592 49.98868 52.46699 50.67714 49.14738 51.38654 47.67108 50.32172

If you have a vector with names in it, you can rename many variables at once:

v_names <- c(ID = "id", 
             Baseline = "baseline_sessions_n",
             Intervention = "intervention_sessions_n")

d %>% 
  rename(any_of(v_names))
ID Baseline Intervention time_1 time_2 time_3 time_4 time_5 time_6 time_7 time_8 time_9 time_10 time_11 time_12 time_13 time_14 time_15 time_16 time_17 time_18 time_19 time_20 time_21 time_22 time_23 time_24 time_25 time_26 time_27 time_28 time_29 time_30 time_31 time_32 time_33 time_34 time_35 time_36 time_37 time_38 time_39 time_40 time_41 time_42 time_43 time_44 time_45 time_46 time_47 time_48 time_49 time_50 time_51 time_52 time_53 time_54 time_55 time_56 time_57 time_58 time_59 time_60
1 22 20 11.481390 14.401475 6.150868 9.254059 9.579971 8.160149 10.124808 11.120275 9.202528 10.838353 9.181093 8.436128 9.977338 13.242464 9.317091 8.275580 9.207833 10.888414 7.225326 11.472184 8.570382 10.582867 11.03763 13.12591 17.99503 12.965307 23.83989 21.39564 22.652730 30.186799 29.49430 31.359282 29.31638 33.78262 38.94499 37.28258 42.56795 37.73358 44.34489 44.60556 49.51082 49.60417 48.72288 50.88207 51.22964 49.03761 51.27490 48.99979 51.13479 48.72249 50.63808 51.23539 54.45665 48.83957 50.35392 54.53799 52.76513 50.67702 50.17861 50.03028
2 18 19 9.114530 8.586483 11.009756 9.355582 8.094828 8.771899 8.953120 7.442692 11.056727 7.590044 11.086753 10.670066 7.842831 9.018026 11.266130 5.886133 7.387846 8.107445 12.111284 14.825195 13.106666 18.156120 21.65248 22.81874 24.84652 25.152196 29.22987 31.49869 30.092382 34.848345 29.83119 42.884047 40.93578 45.12095 48.62162 48.62702 47.45525 47.03118 48.66027 47.54405 50.79882 50.57596 49.92920 48.24046 49.40579 48.22480 47.97522 47.58815 44.74564 43.74299 49.14827 48.02379 44.89332 48.27857 47.22880 48.91035 44.84299 45.47214 49.48035 50.34650
3 5 18 7.243792 14.309134 7.978843 11.175362 12.287030 13.145834 11.694320 18.998364 16.125544 20.296444 21.905149 21.952354 27.114283 27.932517 27.447843 30.746875 36.685054 37.336880 39.197311 43.844702 44.751691 43.357550 46.34139 47.00592 44.49681 46.506577 44.03419 47.01678 48.690672 43.648066 47.70016 49.477448 47.99977 46.17302 48.51305 43.80151 48.55182 48.29836 43.90161 47.10333 51.73539 40.91064 45.88078 49.40663 45.65272 43.96433 46.60210 44.56747 42.59574 47.05461 46.58171 43.50139 42.44631 47.22240 44.16290 49.38115 44.99909 46.10792 47.08795 44.91932
4 32 20 11.296579 14.307367 14.810658 8.677719 9.533368 7.822653 9.561326 8.208895 12.897751 8.243799 8.547692 8.616214 7.586228 8.920517 9.198943 8.711324 11.026336 10.804031 11.591157 10.519060 9.027216 7.848482 12.31915 10.06748 12.19758 7.872384 8.83143 10.43239 8.886314 8.552978 11.83615 7.607858 11.13613 13.53312 14.89085 19.39188 22.94677 23.98795 20.03181 27.08285 28.20672 27.31684 34.86907 33.64589 36.78373 37.47689 40.52695 41.81391 40.74878 45.50657 45.75414 47.57176 56.02767 51.27312 51.97966 47.74709 48.65355 50.63484 48.44609 50.34281
5 19 20 10.667172 6.803895 8.303144 14.497614 6.867806 12.367478 9.413963 9.444261 8.744703 14.304808 10.960766 10.451372 14.827904 6.272452 10.096415 12.403348 8.463626 8.628488 9.839363 14.449422 13.048669 16.131181 18.19289 17.92118 23.90188 22.576320 23.05887 25.57293 29.038100 31.426959 33.96424 36.342143 38.46910 41.43815 40.76039 44.75332 47.46249 48.81603 52.23999 50.28782 51.07685 48.51878 51.01045 49.58853 50.63677 47.93697 45.46109 52.17792 48.00961 47.00786 50.80013 47.70737 54.89829 49.20719 52.96849 51.58046 47.74537 53.55861 48.21013 50.76616
6 24 20 13.027050 7.471434 13.679538 5.768483 10.788906 8.698977 9.322671 8.311932 12.865342 9.966903 6.926024 7.546386 10.175361 12.911465 5.829294 8.554141 10.513759 12.225228 9.298788 9.089913 9.930872 10.489717 12.34784 11.75163 10.44748 13.390600 16.23082 16.95081 21.799091 21.528074 21.87916 25.988930 26.66509 32.06949 32.79460 35.01925 37.48007 32.86721 35.36552 41.40063 46.80009 49.05867 46.94410 50.26605 54.22345 50.03206 51.80317 49.74147 48.77986 48.08299 48.62034 48.06852 48.06212 46.87093 50.26182 44.33751 51.32408 48.03641 54.78962 48.73745
7 10 19 9.695231 9.174433 10.636559 9.135522 7.362022 14.266673 9.447716 12.377972 6.093100 5.853903 9.539306 14.262163 13.727533 19.424575 20.669408 22.576336 26.741461 24.728567 28.852438 30.760060 31.627913 34.360552 36.09521 40.91820 39.43172 41.137066 43.04933 46.76154 46.199305 48.768979 47.36646 49.477421 52.47127 49.36562 44.54360 48.07476 46.38418 47.66100 45.35457 49.09778 48.01635 47.49901 49.15443 48.99818 47.30794 46.44190 49.66045 48.06729 46.28851 45.44424 48.75336 49.89107 49.41766 47.99784 50.40713 46.83389 46.45106 48.04787 48.95078 52.75730
8 13 20 10.596034 11.746395 10.903057 11.402866 8.263436 7.589180 8.232499 7.294619 10.028391 10.105379 9.796763 11.889425 7.041182 12.705354 13.625825 14.989743 19.613571 23.575270 22.594344 25.971062 26.860383 30.064456 29.58605 31.22675 34.26105 36.188052 38.32964 42.61264 43.274969 42.503742 42.50890 51.437160 53.72465 50.30605 48.78798 51.40765 53.23059 50.96061 53.23112 48.39759 48.72193 50.48096 51.47567 50.61417 48.02566 50.54120 50.11116 49.68551 50.81838 45.58281 50.14003 49.66069 47.78592 49.98868 52.46699 50.67714 49.14738 51.38654 47.67108 50.32172

In this example, there is no advantage over just applying the transformations directly in rename:

d %>% 
  rename(ID = "id", 
         Baseline = "baseline_sessions_n",
         Intervention = "intervention_sessions_n")
ID Baseline Intervention time_1 time_2 time_3 time_4 time_5 time_6 time_7 time_8 time_9 time_10 time_11 time_12 time_13 time_14 time_15 time_16 time_17 time_18 time_19 time_20 time_21 time_22 time_23 time_24 time_25 time_26 time_27 time_28 time_29 time_30 time_31 time_32 time_33 time_34 time_35 time_36 time_37 time_38 time_39 time_40 time_41 time_42 time_43 time_44 time_45 time_46 time_47 time_48 time_49 time_50 time_51 time_52 time_53 time_54 time_55 time_56 time_57 time_58 time_59 time_60
1 22 20 11.481390 14.401475 6.150868 9.254059 9.579971 8.160149 10.124808 11.120275 9.202528 10.838353 9.181093 8.436128 9.977338 13.242464 9.317091 8.275580 9.207833 10.888414 7.225326 11.472184 8.570382 10.582867 11.03763 13.12591 17.99503 12.965307 23.83989 21.39564 22.652730 30.186799 29.49430 31.359282 29.31638 33.78262 38.94499 37.28258 42.56795 37.73358 44.34489 44.60556 49.51082 49.60417 48.72288 50.88207 51.22964 49.03761 51.27490 48.99979 51.13479 48.72249 50.63808 51.23539 54.45665 48.83957 50.35392 54.53799 52.76513 50.67702 50.17861 50.03028
2 18 19 9.114530 8.586483 11.009756 9.355582 8.094828 8.771899 8.953120 7.442692 11.056727 7.590044 11.086753 10.670066 7.842831 9.018026 11.266130 5.886133 7.387846 8.107445 12.111284 14.825195 13.106666 18.156120 21.65248 22.81874 24.84652 25.152196 29.22987 31.49869 30.092382 34.848345 29.83119 42.884047 40.93578 45.12095 48.62162 48.62702 47.45525 47.03118 48.66027 47.54405 50.79882 50.57596 49.92920 48.24046 49.40579 48.22480 47.97522 47.58815 44.74564 43.74299 49.14827 48.02379 44.89332 48.27857 47.22880 48.91035 44.84299 45.47214 49.48035 50.34650
3 5 18 7.243792 14.309134 7.978843 11.175362 12.287030 13.145834 11.694320 18.998364 16.125544 20.296444 21.905149 21.952354 27.114283 27.932517 27.447843 30.746875 36.685054 37.336880 39.197311 43.844702 44.751691 43.357550 46.34139 47.00592 44.49681 46.506577 44.03419 47.01678 48.690672 43.648066 47.70016 49.477448 47.99977 46.17302 48.51305 43.80151 48.55182 48.29836 43.90161 47.10333 51.73539 40.91064 45.88078 49.40663 45.65272 43.96433 46.60210 44.56747 42.59574 47.05461 46.58171 43.50139 42.44631 47.22240 44.16290 49.38115 44.99909 46.10792 47.08795 44.91932
4 32 20 11.296579 14.307367 14.810658 8.677719 9.533368 7.822653 9.561326 8.208895 12.897751 8.243799 8.547692 8.616214 7.586228 8.920517 9.198943 8.711324 11.026336 10.804031 11.591157 10.519060 9.027216 7.848482 12.31915 10.06748 12.19758 7.872384 8.83143 10.43239 8.886314 8.552978 11.83615 7.607858 11.13613 13.53312 14.89085 19.39188 22.94677 23.98795 20.03181 27.08285 28.20672 27.31684 34.86907 33.64589 36.78373 37.47689 40.52695 41.81391 40.74878 45.50657 45.75414 47.57176 56.02767 51.27312 51.97966 47.74709 48.65355 50.63484 48.44609 50.34281
5 19 20 10.667172 6.803895 8.303144 14.497614 6.867806 12.367478 9.413963 9.444261 8.744703 14.304808 10.960766 10.451372 14.827904 6.272452 10.096415 12.403348 8.463626 8.628488 9.839363 14.449422 13.048669 16.131181 18.19289 17.92118 23.90188 22.576320 23.05887 25.57293 29.038100 31.426959 33.96424 36.342143 38.46910 41.43815 40.76039 44.75332 47.46249 48.81603 52.23999 50.28782 51.07685 48.51878 51.01045 49.58853 50.63677 47.93697 45.46109 52.17792 48.00961 47.00786 50.80013 47.70737 54.89829 49.20719 52.96849 51.58046 47.74537 53.55861 48.21013 50.76616
6 24 20 13.027050 7.471434 13.679538 5.768483 10.788906 8.698977 9.322671 8.311932 12.865342 9.966903 6.926024 7.546386 10.175361 12.911465 5.829294 8.554141 10.513759 12.225228 9.298788 9.089913 9.930872 10.489717 12.34784 11.75163 10.44748 13.390600 16.23082 16.95081 21.799091 21.528074 21.87916 25.988930 26.66509 32.06949 32.79460 35.01925 37.48007 32.86721 35.36552 41.40063 46.80009 49.05867 46.94410 50.26605 54.22345 50.03206 51.80317 49.74147 48.77986 48.08299 48.62034 48.06852 48.06212 46.87093 50.26182 44.33751 51.32408 48.03641 54.78962 48.73745
7 10 19 9.695231 9.174433 10.636559 9.135522 7.362022 14.266673 9.447716 12.377972 6.093100 5.853903 9.539306 14.262163 13.727533 19.424575 20.669408 22.576336 26.741461 24.728567 28.852438 30.760060 31.627913 34.360552 36.09521 40.91820 39.43172 41.137066 43.04933 46.76154 46.199305 48.768979 47.36646 49.477421 52.47127 49.36562 44.54360 48.07476 46.38418 47.66100 45.35457 49.09778 48.01635 47.49901 49.15443 48.99818 47.30794 46.44190 49.66045 48.06729 46.28851 45.44424 48.75336 49.89107 49.41766 47.99784 50.40713 46.83389 46.45106 48.04787 48.95078 52.75730
8 13 20 10.596034 11.746395 10.903057 11.402866 8.263436 7.589180 8.232499 7.294619 10.028391 10.105379 9.796763 11.889425 7.041182 12.705354 13.625825 14.989743 19.613571 23.575270 22.594344 25.971062 26.860383 30.064456 29.58605 31.22675 34.26105 36.188052 38.32964 42.61264 43.274969 42.503742 42.50890 51.437160 53.72465 50.30605 48.78798 51.40765 53.23059 50.96061 53.23112 48.39759 48.72193 50.48096 51.47567 50.61417 48.02566 50.54120 50.11116 49.68551 50.81838 45.58281 50.14003 49.66069 47.78592 49.98868 52.46699 50.67714 49.14738 51.38654 47.67108 50.32172

The reason one might use the named-vector method is that one already has the named vector handy, and retyping it would be an error-prone waste of time.

Data Restructuring

We need to make our wide data long. To make the illustration simple to see, let’s make a smaller version of the data. We will select just times 1–3 for the first 2 children:

d_small <- d %>% 
  select(id:time_3) %>% 
  filter(id < 3)

Breaking down each step, the code above does these three things:

  1. Create a new tibble called d_small starting with the tibble d “and then” (i..e, %>%)…
  2. select variables id through time_3 “and then”…
  3. filter the rows to find only the cases in which id is less than 3.

Let’s display d_small:

d_small
id baseline_sessions_n intervention_sessions_n time_1 time_2 time_3
1 22 20 11.48139 14.401475 6.150868
2 18 19 9.11453 8.586483 11.009756

What we need is:

id baseline_sessions_n intervention_sessions_n time on_task
1 22 20 1 11.481390
1 22 20 2 14.401475
1 22 20 3 6.150868
2 18 19 1 9.114530
2 18 19 2 8.586483
2 18 19 3 11.009756

Restructuring from wide to long format with pivot_longer

Use the pivot_longer function to pivot the three time variables to long format:

d_small %>% 
  pivot_longer(cols = time_1:time_3)
id baseline_sessions_n intervention_sessions_n name value
1 22 20 time_1 11.481390
1 22 20 time_2 14.401475
1 22 20 time_3 6.150868
2 18 19 time_1 9.114530
2 18 19 time_2 8.586483
2 18 19 time_3 11.009756

To avoid the hassle of renaming our columns, we can specify what the name and value columns should be called:

d_small %>%
  pivot_longer(cols = time_1:time_3,
               names_to = "time",
               values_to = "on_task")
id baseline_sessions_n intervention_sessions_n time on_task
1 22 20 time_1 11.481390
1 22 20 time_2 14.401475
1 22 20 time_3 6.150868
2 18 19 time_1 9.114530
2 18 19 time_2 8.586483
2 18 19 time_3 11.009756

Notice that the time variable is text, not a number, like we want. There are many ways to get rid of the prefix. The simplest is to tell pivot_longer to strip away the prefix “time_”.

d_small %>%
  pivot_longer(cols = time_1:time_3,
               names_to = "time",
               values_to = "on_task",
               names_prefix = "time_")
id baseline_sessions_n intervention_sessions_n time on_task
1 22 20 1 11.481390
1 22 20 2 14.401475
1 22 20 3 6.150868
2 18 19 1 9.114530
2 18 19 2 8.586483
2 18 19 3 11.009756

Unfortunately, R thinks that time is a text variable. We want it to be an integer, so we transform it using the as.integer function:

d_small %>%
  pivot_longer(cols = time_1:time_3,
               names_to = "time",
               values_to = "on_task",
               names_prefix = "time_", 
               names_transform = list(time = as.integer)) 
id baseline_sessions_n intervention_sessions_n time on_task
1 22 20 1 11.481390
1 22 20 2 14.401475
1 22 20 3 6.150868
2 18 19 1 9.114530
2 18 19 2 8.586483
2 18 19 3 11.009756

Alternatively, we can do all these transformations after pivoting:

d_small %>%
  pivot_longer(cols = time_1:time_3,
               names_to = "time",
               values_to = "on_task") %>% 
  mutate(time = str_remove(time, "time_") %>% 
           as.integer())
id baseline_sessions_n intervention_sessions_n time on_task
1 22 20 1 11.481390
1 22 20 2 14.401475
1 22 20 3 6.150868
2 18 19 1 9.114530
2 18 19 2 8.586483
2 18 19 3 11.009756

Restructuring from long to wide format with pivot_wider

Let’s pivot the data from long format to wide. First let’s assign the restructured d_small data as d_small_longer:

d_small_longer <- d_small %>%
  pivot_longer(cols = time_1:time_3,
               names_to = "time",
               values_to = "on_task",
               names_prefix = "time_",
               names_transform = list(time = as.integer))

Now let’s move it back to where it was with pivot_wider:

d_small_longer %>% 
  pivot_wider(names_from = time, 
              values_from = on_task, 
              names_prefix = "time_")
id baseline_sessions_n intervention_sessions_n time_1 time_2 time_3
1 22 20 11.48139 14.401475 6.150868
2 18 19 9.11453 8.586483 11.009756

Perfect! Now lets move back to the the original large tibble d.

Create new variables with mutate and case_when

First, let’s pivot the entire data and assign it to d_longer. We need to select time_1time_60:

d_longer <- d %>%
  pivot_longer(cols = time_1:time_60,
               names_to = "time",
               values_to = "on_task",
               names_prefix = "time_",
               names_transform = list(time = as.integer))

To create a new variable in d_longer, we “mutate” the tibble. For example, if we want to add 1 to on_task,

d_longer %>% 
  mutate(ontask_plus_1 = on_task + 1) %>% 
  head()
id baseline_sessions_n intervention_sessions_n time on_task ontask_plus_1
1 22 20 1 11.481390 12.481390
1 22 20 2 14.401475 15.401475
1 22 20 3 6.150868 7.150868
1 22 20 4 9.254059 10.254059
1 22 20 5 9.579971 10.579971
1 22 20 6 8.160149 9.160149

Notice that the variable is not saved anywhere because we did not assign the output to a variable:

d_longer %>% 
  head()
id baseline_sessions_n intervention_sessions_n time on_task
1 22 20 1 11.481390
1 22 20 2 14.401475
1 22 20 3 6.150868
1 22 20 4 9.254059
1 22 20 5 9.579971
1 22 20 6 8.160149

We need to identify which rows are part of the pre-intervention phase, which are in the intervention phase, and which are in the post-intervention phase. Let’s use the case_when function, which is like a series of “if-then” statements followed by the desired result.

d_longer <- d_longer %>% 
  mutate(phase = case_when(
    time <= baseline_sessions_n ~ "Pre-Intervention",
    time <= baseline_sessions_n + intervention_sessions_n ~ "Intervention",
    time > baseline_sessions_n + intervention_sessions_n ~ "Post-Intervention" 
  ))

d_longer %>% 
  head()
id baseline_sessions_n intervention_sessions_n time on_task phase
1 22 20 1 11.481390 Pre-Intervention
1 22 20 2 14.401475 Pre-Intervention
1 22 20 3 6.150868 Pre-Intervention
1 22 20 4 9.254059 Pre-Intervention
1 22 20 5 9.579971 Pre-Intervention
1 22 20 6 8.160149 Pre-Intervention

Plot the results:

Plotting is rarely done all at once. We usually build it step-by-step and layer-by-layer. Figure 1 is the first attempt.

d_longer %>% 
  ggplot(aes(time, on_task)) +
  geom_line(aes(group = id, color = phase))
Figure 1: First attempt at plotting. Messy, messy, messy!

That was not so great! Let’s make a separate plot for each person in Figure 2.

d_longer %>% 
  ggplot(aes(time, on_task)) +
  geom_line(aes(group = id, color = phase)) + 
  facet_grid(rows = vars(id))
Figure 2: Plot each person’s data on separate facet plots

Better!

Let’s reorder the cases by the order in which the intervention is first implemented. The fct_reorder function is incredibly useful for creating plots. Because the id variable is just a number, we need to change it to a factor variable first. A factor is R’s way of grouping categorical data in any order we wish.

We need to reorder the id variable so that people with shorter baselines come before people with longer baselines. The first slot in the fct_reorder function is the variable we want to transform into a reordered factor. The second slot is another variable we want to order the factor factor by. Thus, the fct_reorder function usually takes this form: fct_reorder(variable_to_be_reordered, variable_to_sort_by)

In this case, we are assigning the reordered variable back to the same variable id that we started with. Putting it all together, we get Figure 3.

d_longer %>% 
  mutate(id = fct_reorder(factor(id), baseline_sessions_n)) %>% 
  ggplot(aes(time, on_task)) +
  geom_line(aes(group = id, color = phase)) + 
  facet_grid(rows = vars(id))
Figure 3: Reordered facet plots

Looks even better!

If I wanted a more polished plot for publication, I might create something like Figure 4. My priorities for a plot in a publication are quite different than a plot I am making just for my own use. I want the plot to be nearly self-explanatory. I want reader to understand the plot without needing to work hard. Thus, I prefer direct labels rather than plot legends that require looking things up. I also want my plot to be inclusive, so I choose a large font size from a legible font family. I also choose colors that are distinguishable by people different kinds of color blindness.

Code
# Some pre-processing for easier plotting
d_longer_processed <- d_longer %>% 
    mutate(id = factor(id) %>% fct_reorder(baseline_sessions_n),
         phase = fct_inorder(phase)) %>% 
  arrange(phase) %>% 
  mutate(id = factor(id, labels = paste0("Child ", LETTERS[1:8])))

# data for phase rectangles
d_phases <- d_longer_processed %>% 
  summarise(.by = c(id, phase),
            begin = min(time),
            end = max(time)) %>% 
  mutate(begin = ifelse(begin == 1, 0, begin),
         end = end + 1)

d_longer_processed %>%
  ggplot(aes(time, on_task)) +
    geom_vline(xintercept = rep(1:4, 12) + rep(seq(0,55,5), each = 4), linewidth = unit(.1, "mm"), color = "gray80") +
  ggtext::geom_richtext(
    data = . %>% filter(id == "Child E", time %in% c(10, 30, 50)),
    aes(label = phase, color = phase),
    label.color = NA,
    fill = scales::alpha("white", .5),
    label.padding = unit(0, "mm"),
    label.margin = unit(2, "mm"),
    vjust = c(0, 0, 1),
    angle = c(0, 24, 0),
    size = 5.5,
    family = "Roboto Condensed"
  ) +
  geom_rect(
    data = d_phases,
    aes(
      xmin = begin,
      xmax = end,
      ymin = 0,
      ymax = 60,
      fill = phase
    ),
    inherit.aes = FALSE
  ) +
  geom_line(aes(group = id, color = phase), linewidth = 1) +
  facet_grid(rows = vars(id)) +
  theme(strip.text.y = element_text(angle = 0),
        legend.position = "none") +
  labs(x = "Day", y = "Time on Task")  +
  theme_light(base_family = "Roboto Condensed", base_size = 18) +
  theme(legend.position = "none",
        panel.border = element_blank(),
        strip.text.y = element_text(angle = 0), 
        axis.text.y = element_text(vjust = c(0))) +
  scale_color_viridis_d(alpha = .9, begin = .1, end = .7) +
  scale_fill_viridis_d(alpha = .15, begin = .1, end = .7) +
  scale_y_continuous(
    limits = c(0, 60),
    breaks = seq(0, 50, 10),
    expand = expansion(0)
  ) +
  scale_x_continuous(limits = c(0, 61), breaks = seq(0, 60, 10), minor_breaks = seq(0,60, 5), expand = expansion(0)) +
   guides(
    x = guide_axis(minor.ticks = TRUE)
    # y = guide_axis(minor.ticks = TRUE)
  ) + 
  coord_cartesian(clip = "off")
Figure 4: Effect on Intervention on Time on Task for Eight Children

For practice data wrangling, download this quarto document and complete the data-wrangling exercises therein.