People didn't always celebrate the new year on January 1. The earliest recording of a new year celebration is believed to have been in Mesopotamia, circa 2000 B.C.
16.1 Introduction

This chapter will show you how to work with dates and times in R.
At first glance, dates and times seem simple. You use them all the time in your regular life, and they don't seem to cause much confusion. However, the more you learn about dates and times, the more complicated they seem to get. To warm up, try these three seemingly simple questions:

- Does every year have 365 days?
- Does every day have 24 hours?
- Does every minute have 60 seconds?

I'm sure you know that not every year has 365 days, but do you know the full rule for determining if a year is a leap year? (It has three parts.) You might have remembered that many parts of the world use daylight saving time (DST), so that some days have 23 hours, and others have 25. You might not have known that some minutes have 61 seconds, because every now and then leap seconds are added to compensate for the gradual slowing of the Earth's rotation.

Dates and times are hard because they have to reconcile two physical phenomena (the rotation of the Earth and its orbit around the sun) with a whole raft of geopolitical phenomena including months, time zones, and DST.
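The three-part leap year rule mentioned above can be written down in a few lines. This is a minimal sketch in base R, not code from the chapter: the helper names `is_leap_year` and `days_in_year` are made up for illustration, and the second helper simply cross-checks the rule against R's own calendar arithmetic.

```r
# Gregorian leap-year rule in three parts (hypothetical helper name):
#   1. years divisible by 4 are leap years,
#   2. except years divisible by 100,
#   3. unless they are also divisible by 400.
is_leap_year <- function(year) {
  divisible_by_4   <- year %% 4 == 0
  divisible_by_100 <- year %% 100 == 0
  divisible_by_400 <- year %% 400 == 0
  (divisible_by_4 & !divisible_by_100) | divisible_by_400
}

is_leap_year(c(1900, 2000, 2015, 2016))
#> [1] FALSE  TRUE FALSE  TRUE

# Cross-check: count the days from January 1 of one year to January 1 of the next.
days_in_year <- function(year) {
  start <- as.Date(paste0(year, "-01-01"))
  end   <- as.Date(paste0(year + 1, "-01-01"))
  as.numeric(end - start)
}

days_in_year(c(1900, 2000, 2015, 2016))
#> [1] 365 366 365 366
```

The years 1900 and 2000 are the interesting cases: both are divisible by 100, but only 2000 is also divisible by 400.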
This chapter won't teach you every last detail about dates and times, but it will give you a solid grounding of practical skills that will help you with common data analysis challenges.

16.2 Creating date/times

There are three types of date/time data that refer to an instant in time:

- A date. Tibbles print this as `<date>`.
- A time within a day. Tibbles print this as `<time>`.
- A date-time is a date plus a time: it uniquely identifies an instant in time (typically to the nearest second). Tibbles print this as `<dttm>`. Elsewhere in R these are called POSIXct, but I don't think that's a very useful name.

In this chapter we are only going to focus on dates and date-times as R doesn't have a native class for storing times. If you need one, you can use the hms package.

You should always use the simplest possible data type that works for your needs. That means if you can use a date instead of a date-time, you should. Date-times are substantially more complicated because of the need to handle time zones, which we'll come back to at the end of the chapter.

To get the current date or date-time you can use today() or now():

    today()
    #> [1] "2019-01-08"
    now()
    #> [1] "2019-01-08 15:37:18 UTC"

Otherwise, there are three ways you're likely to create a date/time:

- From a string.
- From individual date-time components.
- From an existing date/time object.

They work as follows.
16.2.1 From strings

Date/time data often comes as strings. You may already have seen one approach to parsing strings into date-times. Another approach is to use the helpers provided by lubridate. They automatically work out the format once you specify the order of the components.
To use them, identify the order in which year, month, and day appear in your dates, then arrange "y", "m", and "d" in the same order. That gives you the name of the lubridate function that will parse your date. For example:

    ymd("2017-01-31")
    #> [1] "2017-01-31"
    mdy("January 31st, 2017")
    #> [1] "2017-01-31"
    dmy("31-Jan-2017")
    #> [1] "2017-01-31"

These functions also take unquoted numbers. This is the most concise way to create a single date/time object, as you might need when filtering date/time data. ymd() is short and unambiguous:

    ymd(20170131)
    #> [1] "2017-01-31"

ymd() and friends create dates. To create a date-time, add an underscore and one or more of "h", "m", and "s" to the name of the parsing function:

    ymd_hms("2017-01-31 20:11:59")
    #> [1] "2017-01-31 20:11:59 UTC"
    mdy_hm("01/31/2017 08:01")
    #> [1] "2017-01-31 08:01:00 UTC"

You can also force the creation of a date-time from a date by supplying a time zone:

    ymd(20170131, tz = "UTC")
    #> [1] "2017-01-31 UTC"

16.3.1 Getting components

You can pull out individual parts of the date with the accessor functions year(), month(), mday() (day of the month), yday() (day of the year), wday() (day of the week), hour(), minute(), and second().
    datetime <- ymd_hms("2016-07-08 12:34:56")

    year(datetime)
    #> [1] 2016
    month(datetime)
    #> [1] 7
    mday(datetime)
    #> [1] 8
    yday(datetime)
    #> [1] 190
    wday(datetime)
    #> [1] 6

For month() and wday() you can set label = TRUE to return the abbreviated name of the month or day of the week. Set abbr = FALSE to return the full name.

    month(datetime, label = TRUE)
    #> [1] Jul
    #> 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
    wday(datetime, label = TRUE, abbr = FALSE)
    #> [1] Friday
    #> 7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday

We can use wday() to see how departures are distributed across the days of the week:

    flights_dt %>%
      mutate(wday = wday(dep_time, label = TRUE)) %>%
      ggplot(aes(x = wday)) +
      geom_bar()

There's an interesting pattern if we look at the average delay by minute within the hour. It looks like flights leaving in minutes 20-30 and 50-60 have much lower delays than the rest of the hour!

    flights_dt %>%
      mutate(minute = minute(dep_time)) %>%
      group_by(minute) %>%
      summarise(
        avg_delay = mean(arr_delay, na.rm = TRUE),
        n = n()
      ) %>%
      ggplot(aes(minute, avg_delay)) +
      geom_line()

Interestingly, if we look at the scheduled departure time we don't see such a strong pattern:

    sched_dep <- flights_dt %>%
      mutate(minute = minute(sched_dep_time)) %>%
      group_by(minute) %>%
      summarise(
        avg_delay = mean(arr_delay, na.rm = TRUE),
        n = n()
      )

    ggplot(sched_dep, aes(minute, avg_delay)) +
      geom_line()

So why do we see that pattern with the actual departure times? Well, like much data collected by humans, there's a strong bias towards flights leaving at "nice" departure times.
Always be alert for this sort of pattern whenever you work with data that involves human judgement!

    ggplot(sched_dep, aes(minute, n)) +
      geom_line()

16.3.2 Rounding

An alternative approach to plotting individual components is to round the date to a nearby unit of time, with floor_date(), round_date(), and ceiling_date().
Each function takes a vector of dates to adjust and then the name of the unit to round down to (floor), round up to (ceiling), or round to. This, for example, allows us to plot the number of flights per week:

    flights_dt %>%
      count(week = floor_date(dep_time, "week")) %>%
      ggplot(aes(week, n)) +
      geom_line()

Computing the difference between a rounded and unrounded date can be particularly useful.

16.3.3 Setting components

You can also use each accessor function to set the components of a date/time:

    (datetime <- ymd_hms("2016-07-08 12:34:56"))
    #> [1] "2016-07-08 12:34:56 UTC"

    year(datetime) <- 2020
    datetime
    #> [1] "2020-07-08 12:34:56 UTC"
    month(datetime) <- 01
    datetime
    #> [1] "2020-01-08 12:34:56 UTC"
    hour(datetime) <- hour(datetime) + 1
    datetime
    #> [1] "2020-01-08 13:34:56 UTC"

Alternatively, rather than modifying in place, you can create a new date-time with update().
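As a rough illustration of creating a modified copy rather than modifying in place, the sketch below calls lubridate's update() on a fresh date-time. It assumes the lubridate package is installed; the component values (year 2020, month 2, and so on) and the variable name `changed` are arbitrary choices for this example.

```r
library(lubridate)

datetime <- ymd_hms("2016-07-08 12:34:56")

# update() returns a new date-time with the named components replaced.
changed <- update(datetime, year = 2020, month = 2, mday = 2, hour = 2)
changed
#> [1] "2020-02-02 02:34:56 UTC"

# The original object is left untouched.
datetime
#> [1] "2016-07-08 12:34:56 UTC"
```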
This also allows you to set multiple values at once.

16.3.4 Exercises

1. How does the distribution of flight times within a day change over the course of the year?
2. Compare dep_time, sched_dep_time and dep_delay. Are they consistent? Explain your findings.
3. Compare air_time with the duration between the departure and arrival. Explain your findings. (Hint: consider the location of the airport.)
4. How does the average delay time change over the course of a day? Should you use dep_time or sched_dep_time? Why?
5. On what day of the week should you leave if you want to minimise the chance of a delay?
6. What makes the distribution of diamonds$carat and flights$sched_dep_time similar?
7. Confirm my hypothesis that the early departures of flights in minutes 20-30 and 50-60 are caused by scheduled flights that leave early. Hint: create a binary variable that tells you whether or not a flight was delayed.
16.4.1 Durations

In R, when you subtract two dates, you get a difftime object:

    # How old is Hadley?
    h_age <- today() - ymd(19791014)
    h_age
    #> Time difference of 14331 days

A difftime class object records a time span of seconds, minutes, hours, days, or weeks. This ambiguity can make difftimes a little painful to work with, so lubridate provides an alternative which always uses seconds: the duration.

    as.duration(h_age)
    #> [1] "1238198400s (~39.24 years)"

Durations come with a bunch of convenient constructors:

    dseconds(15)
    #> [1] "15s"
    dminutes(10)
    #> [1] "600s (~10 minutes)"
    dhours(c(12, 24))
    #> [1] "43200s (~12 hours)" "86400s (~1 days)"
    ddays(0:5)
    #> [1] "0s"                "86400s (~1 days)"  "172800s (~2 days)"
    #> [4] "259200s (~3 days)" "345600s (~4 days)" "432000s (~5 days)"
    dweeks(3)
    #> [1] "1814400s (~3 weeks)"
    dyears(1)
    #> [1] "31536000s (~52.14 weeks)"

Durations always record the time span in seconds. Larger units are created by converting minutes, hours, days, weeks, and years to seconds at the standard rate (60 seconds in a minute, 60 minutes in an hour, 24 hours in a day, 7 days in a week, 365 days in a year).

You can add and multiply durations:

    2 * dyears(1)
    #> [1] "63072000s (~2 years)"
    dyears(1) + dweeks(12) + dhours(15)
    #> [1] "38847600s (~1.23 years)"

You can add and subtract durations to and from days:

    tomorrow <- today() + ddays(1)
    last_year <- today() - dyears(1)

However, because durations represent an exact number of seconds, sometimes you might get an unexpected result:

    one_pm <- ymd_hms("2016-03-12 13:00:00", tz = "America/New_York")

    one_pm
    #> [1] "2016-03-12 13:00:00 EST"
    one_pm + ddays(1)
    #> [1] "2016-03-13 14:00:00 EDT"

Why is one day after 1pm on March 12, 2pm on March 13?! If you look carefully at the date you might also notice that the time zones have changed. Because of DST, March 13 only has 23 hours, so if we add a full day's worth of seconds we end up with a different time.
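The same effect can be reproduced without lubridate, which may help show that it comes from the calendar and not from the package. The sketch below uses only base R and assumes the system time zone database includes "America/New_York"; the variable names are illustrative.

```r
one_pm <- as.POSIXct("2016-03-12 13:00:00", tz = "America/New_York")

# Adding a number to a POSIXct adds that many seconds: one "duration day".
later <- one_pm + 24 * 60 * 60

format(one_pm, "%Y-%m-%d %H:%M")
#> [1] "2016-03-12 13:00"
format(later, "%Y-%m-%d %H:%M")
#> [1] "2016-03-13 14:00"

# Exactly 24 hours have elapsed, even though the clock has advanced by 25.
difftime(later, one_pm, units = "hours")

# Length of the calendar day on which the clocks change, midnight to midnight.
day_start <- as.POSIXct("2016-03-13 00:00:00", tz = "America/New_York")
day_end   <- as.POSIXct("2016-03-14 00:00:00", tz = "America/New_York")
difftime(day_end, day_start, units = "hours")
#> Time difference of 23 hours
```

The missing hour is the one skipped at 2am on the day the clocks move forward, which is why a fixed number of seconds lands on a different wall-clock time.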
16.4.2 Periods

To solve this problem, lubridate provides periods. Periods are time spans but don't have a fixed length in seconds; instead they work with "human" times, like days and months. That allows them to work in a more intuitive way:

    one_pm
    #> [1] "2016-03-12 13:00:00 EST"
    one_pm + days(1)
    #> [1] "2016-03-13 13:00:00 EDT"

Like durations, periods can be created with a number of friendly constructor functions.
    seconds(15)
    #> [1] "15S"
    minutes(10)
    #> [1] "10M 0S"
    hours(c(12, 24))
    #> [1] "12H 0M 0S" "24H 0M 0S"
    days(7)
    #> [1] "7d 0H 0M 0S"
    months(1:6)
    #> [1] "1m 0d 0H 0M 0S" "2m 0d 0H 0M 0S" "3m 0d 0H 0M 0S" "4m 0d 0H 0M 0S"
    #> [5] "5m 0d 0H 0M 0S" "6m 0d 0H 0M 0S"
    weeks(3)
    #> [1] "21d 0H 0M 0S"
    years(1)
    #> [1] "1y 0m 0d 0H 0M 0S"

You can add and multiply periods:

    10 * (months(6) + days(1))
    #> [1] "60m 10d 0H 0M 0S"
    days(50) + hours(25) + minutes(2)
    #> [1] "50d 25H 2M 0S"

And of course, add them to dates. Compared to durations, periods are more likely to do what you expect:

    # A leap year
    ymd("2016-01-01") + dyears(1)
    #> [1] "2016-12-31"
    ymd("2016-01-01") + years(1)
    #> [1] "2017-01-01"

    # Daylight saving time
    one_pm + ddays(1)
    #> [1] "2016-03-13 14:00:00 EDT"
    one_pm + days(1)
    #> [1] "2016-03-13 13:00:00 EDT"

Let's use periods to fix an oddity related to our flight dates.
Some planes appear to have arrived at their destination before they departed from New York City.

    flights_dt %>%
      filter(arr_time < dep_time)
    #> # A tibble: 10,633 x 9
    #>   origin dest  dep_delay arr_delay dep_time            sched_dep_time
    #> 1 EWR    BQN           9        -4 2013-01-01 19:29:00 2013-01-01 19:20:00
    #> 2 JFK    DFW          59        NA 2013-01-01 19:39:00 2013-01-01 18:40:00
    #> 3 EWR    TPA          -2         9 2013-01-01 20:58:00 2013-01-01 21:00:00
    #> 4 EWR    SJU          -6       -12 2013-01-01 21:02:00 2013-01-01 21:08:00
    #> 5 EWR    SFO          11       -14 2013-01-01 21:08:00 2013-01-01 20:57:00
    #> 6 LGA    FLL         -10        -2 2013-01-01 21:20:00 2013-01-01 21:30:00
    #> # ... with 1.063e+04 more rows, and 3 more variables: arr_time,
    #> #   sched_arr_time, air_time

These are overnight flights.
We used the same date information for both the departure and the arrival times, but these flights arrived on the following day. We can fix this by adding days(1) to the arrival time of each overnight flight.

    flights_dt <- flights_dt %>%
      mutate(
        overnight = arr_time < dep_time,
        arr_time = arr_time + days(overnight * 1),
        sched_arr_time = sched_arr_time + days(overnight * 1)
      )

After this fix, no flight arrives before it departs:

    flights_dt %>%
      filter(overnight, arr_time < dep_time)
    #> # A tibble: 0 x 10
    #> # ... with 10 variables: origin, dest, dep_delay, arr_delay, dep_time,
    #> #   sched_dep_time, arr_time, sched_arr_time, air_time, overnight

16.4.3 Intervals

It's obvious what dyears(1) / ddays(365) should return: one, because durations are always represented by a number of seconds, and a duration of a year is defined as 365 days' worth of seconds.

What should years(1) / days(1) return?
Well, if the year was 2015 it should return 365, but if it was 2016, it should return 366! There’s not quite enough information for lubridate to give a single clear answer.
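The dependence on the starting point can be checked directly. This is a small sketch in base R; the helper name `days_in_following_year` is made up for illustration, and it relies on seq() with by = "year" to find the same calendar date one year later.

```r
# Number of days between a start date and the same calendar date a year later.
days_in_following_year <- function(start) {
  start <- as.Date(start)
  end <- seq(start, by = "year", length.out = 2)[2]
  as.numeric(end - start)
}

days_in_following_year("2015-01-01")
#> [1] 365
days_in_following_year("2016-01-01")
#> [1] 366

# The answer depends on the exact start, not just the calendar year:
# this span begins in 2015 but still contains February 29, 2016.
days_in_following_year("2015-03-01")
#> [1] 366
```

Without a starting point, "one year" has no single length in days, which is the information a bare period is missing.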
What lubridate does instead is give an estimate, with a message:

    years(1) / days(1)
    #> estimate only: convert to intervals for accuracy
    #> [1] 365

If you want a more accurate measurement, you'll have to use an interval. An interval is a duration with a starting point: that makes it precise so you can determine exactly how long it is:

    next_year <- today() + years(1)
    (today() %--% next_year) / ddays(1)
    #> [1] 365

To find out how many periods fall into an interval, you need to use integer division:

    (today() %--% next_year) %/% days(1)
    #> Note: method with signature 'Timespan#Timespan' chosen for function '%/%',
    #>  target signature 'Interval#Period'.
    #>  "Interval#ANY", "ANY#Period" would also be valid
    #> [1] 365

16.5 Time zones

Time zones are an enormously complicated topic because of their interaction with geopolitical entities. Fortunately we don't need to dig into all the details as they're not all important for data analysis, but there are a few challenges we'll need to tackle head on.

The first challenge is that everyday names of time zones tend to be ambiguous. For example, if you're American you're probably familiar with EST, or Eastern Standard Time.
However, both Australia and Canada also have EST! To avoid confusion, R uses the international standard IANA time zones. These use a consistent naming scheme "`<area>/<location>`", typically in the form "`<continent>/<city>`" (there are a few exceptions because not every country lies on a continent). Examples include "America/New_York", "Europe/Paris", and "Pacific/Auckland".

You might wonder why the time zone uses a city, when typically you think of time zones as associated with a country or region within a country. This is because the IANA database has to record decades' worth of time zone rules. In the course of decades, countries change names (or break apart) fairly frequently, but city names tend to stay the same. Another problem is that the name needs to reflect not only the current behaviour, but also the complete history.
For example, there are time zones for both "America/New_York" and "America/Detroit". These cities both currently use Eastern Standard Time but in 1969-1972 Michigan (the state in which Detroit is located) did not follow DST, so it needs a different name. It's worth reading the raw IANA time zone database just to read some of these stories!

You can find out what R thinks your current time zone is with Sys.timezone():

    Sys.timezone()
    #> [1] "UTC"

(If R doesn't know, you'll get an NA.)

And see the complete list of all time zone names with OlsonNames():

    length(OlsonNames())
    #> [1] 606
    head(OlsonNames())
    #> [1] "Africa/Abidjan"     "Africa/Accra"       "Africa/Addis_Ababa"
    #> [4] "Africa/Algiers"     "Africa/Asmara"      "Africa/Asmera"

In R, the time zone is an attribute of the date-time that only controls printing. For example, these three objects represent the same instant in time:

    (x1 <- ymd_hms("2015-06-01 12:00:00", tz = "America/New_York"))
    #> [1] "2015-06-01 12:00:00 EDT"
    (x2 <- ymd_hms("2015-06-01 18:00:00", tz = "Europe/Copenhagen"))
    #> [1] "2015-06-01 18:00:00 CEST"
    (x3 <- ymd_hms("2015-06-02 04:00:00", tz = "Pacific/Auckland"))
    #> [1] "2015-06-02 04:00:00 NZST"

You can verify that they're the same time using subtraction:

    x1 - x2
    #> Time difference of 0 secs
    x1 - x3
    #> Time difference of 0 secs

Unless otherwise specified, lubridate always uses UTC. UTC (Coordinated Universal Time) is the standard time zone used by the scientific community and roughly equivalent to its predecessor GMT (Greenwich Mean Time). It does not have DST, which makes it a convenient representation for computation.
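To see that the time zone only affects how an instant is printed, it can help to look at the number stored underneath. The following is a sketch in base R, assuming the listed zone names are available in the system time zone database.

```r
x <- as.POSIXct("2015-06-01 12:00:00", tz = "America/New_York")

# Underneath, a POSIXct is a count of seconds since 1970-01-01 00:00:00 UTC.
as.numeric(x)

# Formatting the same instant for different zones changes only the text.
format(x, tz = "America/New_York", usetz = TRUE)
format(x, tz = "Europe/Paris", usetz = TRUE)
format(x, tz = "Pacific/Auckland", usetz = TRUE)
format(x, tz = "UTC", usetz = TRUE)
```

None of the format() calls modifies `x`; each one renders the same stored number according to a different set of zone rules.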
Operations that combine date-times, like c(), will often drop the time zone. In that case, the date-times will display in your local time zone:

    x4 <- c(x1, x2, x3)
    x4
    #> [1] "2015-06-01 12:00:00 EDT" "2015-06-01 12:00:00 EDT"
    #> [3] "2015-06-01 12:00:00 EDT"

You can change the time zone in two ways:

1. Keep the instant in time the same, and change how it's displayed. Use this when the instant is correct, but you want a more natural display.

        x4a <- with_tz(x4, tzone = "Australia/Lord_Howe")
        x4a
        #> [1] "2015-06-02 02:30:00 +1030" "2015-06-02 02:30:00 +1030"
        #> [3] "2015-06-02 02:30:00 +1030"
        x4a - x4
        #> Time differences in secs
        #> [1] 0 0 0

   (This also illustrates another challenge of time zones: they're not all integer hour offsets!)

2. Change the underlying instant in time. Use this when you have an instant that has been labelled with the incorrect time zone, and you need to fix it.

        x4b <- force_tz(x4, tzone = "Australia/Lord_Howe")
        x4b
        #> [1] "2015-06-01 12:00:00 +1030" "2015-06-01 12:00:00 +1030"
        #> [3] "2015-06-01 12:00:00 +1030"
        x4b - x4
        #> Time differences in hours
        #> [1] -14.5 -14.5 -14.5
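The two kinds of change can also be sketched in base R, which makes the difference between them concrete. This is an illustration under assumptions, not lubridate's implementation: it changes the display zone by setting the "tzone" attribute, and it relabels the clock time by re-parsing the printed text in the new zone. The variable names `shown` and `relabelled` are made up, and "Australia/Lord_Howe" is assumed to be present in the system time zone database.

```r
x <- as.POSIXct("2015-06-01 12:00:00", tz = "America/New_York")

# 1. Same instant, different display: change only the tzone attribute.
shown <- x
attr(shown, "tzone") <- "Australia/Lord_Howe"
format(shown, "%Y-%m-%d %H:%M")
#> [1] "2015-06-02 02:30"
as.numeric(shown) == as.numeric(x)
#> [1] TRUE

# 2. Same clock time, different instant: re-parse the text in the new zone.
clock_text <- format(x, "%Y-%m-%d %H:%M:%S")
relabelled <- as.POSIXct(clock_text, tz = "Australia/Lord_Howe")
format(relabelled, "%Y-%m-%d %H:%M")
#> [1] "2015-06-01 12:00"
difftime(relabelled, x, units = "hours")
#> Time difference of -14.5 hours
```

In the first case the stored number of seconds is unchanged; in the second it moves by the difference between the two zone offsets.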
'Tis the season for reading flurries of year-end round-ups, stocking up on champagne, and planning lofty resolutions you'll inevitably break just a couple of short weeks into January. That's right, New Year's is coming. The new year has to begin sometime, but why pick January 1?

Unlike celebrations that are tied to astronomical phenomena, New Year's Day is a purely human artifact. Of course, it reflects a real astronomical fact: Earth traces a complete orbit around the sun roughly once every 365.25 days. But there's nothing celestially special about January 1; March 6 or August 24 would do the trick just as well.

Most of the world goes by the Gregorian calendar, which Pope Gregory XIII established through a papal bull. The scheme aimed to solve liturgical problems surrounding the date of Easter. The calendar was only adopted gradually: Italy, for example, converted in 1582, England in 1752, and Russia in 1918.

But the roots of marking the new year on January 1 go farther back even than Christianity. In 46 B.C., Julius Caesar took it upon himself to try to sort out the deeply complicated Roman calendar, and his solution, which stuck despite his assassination, started the year on January 1. That was in part on account of January's namesake, the god Janus, who oversaw beginnings.

But after Christianity came into popularity, Europe saw many centuries of backlash against so-called pagan traditions like those established by Caesar. So dates like March 25, also known as Lady Day or the Feast of the Annunciation, were officially used to mark the new year, although January 1 was still a popular day for less formal celebrations and regained its status with Gregory's reform.

Of course, January 1 still hasn't quite nailed down its exclusive status as the first day of the new year. In the Chinese calendar, the new year is celebrated between January 21 and February 21 based on the lunar cycle. The Hebrew calendar and the pre-Christian inhabitants of Britain had starting points of their own. Every option gets the job done.

Despite everyone's comfort with the calendar as it currently stands, some would still like to see it change. Reform advocates have spent more than six years lobbying for a tweaked calendar that would keep January 1 as the first day of the year but chop the year off after a round number of weeks. It's likely to stick just as well as your resolutions.