Software needs a way to represent a point in time, a ‘timestamp’, and the difference between two points in time.
- TimeStamps: What are the problems to be solved?
- Absolute Time: UTC
- Time Zones: Time is Relative
- Date/Time from which Perspective?
- Extra Data: Local Time
- Original Event Time: Key To Reporting.
- Local Time Calculation & Complications.
- Timezone Solutions
- DateTime Properties & Usage
- Implementation
- General
- Python
- Kotlin
- Conclusion: Combining Python and Kotlin
Timestamps: What are the problems to be solved?
There is a need to record, store, view timestamps and formulate reports of when events occurred. To solve these needs, the following is required:
- There is a need to record the point in time an event occurs either:
- by recording data when the event occurs
- by converting date-times from existing data in other formats
- Points in time should be able to be sorted into a sequence of events, with time difference between events able to be calculated.
- Points in time need to be able to be displayed/reported from perspective as either:
- UTC
- in the local time of an observer
- in the local time of the original event
- The ‘point in time’ should provide for filtering events for calculations and reporting, so that events may be categorised by time/date characteristics of the event.
Absolute Time: UTC
“What is the time” may seem a simply question, but once ‘where’ is added it becomes a little more complex. When it is 6am in New York, it is already 11am in London. This means 10am in London occurs before 9am in New York, so 10am can occur after 9am, depending on where the two times are recorded. Reducing all times to UTC ( Coordinated Universal Time?) is the only way to ensure times can be correctly sequenced.
Operating systems solve the time sequence problem by keeping all file attribute date-times as UTC. This is easy to test. Create a new file, and check the created date. Manually change your computer time zone, for example forward one hour (on windows, right click on the time and select ‘adjust date/time’, then turn of ‘set the time zone automatically’ and change the time Zone). The clock on your computer will then move forward one hour. Check the file creation date time again, and it will have also moved forward one hour! The file still shows as being created a few moments ago, even against the now wrong time displayed on your computer. This happens because the file creation date/time is recorded as UTC, but then always displayed as to you as the ‘observer’, in your local time. This means that changing the time zone will change the time shown for file date-times.
Simply storing date-times as UTC is considered a solution for file timestamps, as used by operating systems. However, for many applications there is a compelling case allowing for displaying/reporting the time of the original event, which requires an associated event time-zone in addition to a simple UTC timestamp.
Time-zones: Time is relative
Some background. A universal time such as UTC is a new concept. The basis of our time keeping is the ‘day’: the time it takes the earth to rotate 361 degrees from one noon until the next, a time interval which varies slightly throughout the year.
Historically, time at any given location was based on the local ‘solar noon‘, determined by calculation when when the sun is at its highest point in the sky. The fact that this mean to moved even one meter to the east or west would change the time was countered by having a town clock with a standard time. The fact the each town still had a different time began to become a problem once railways started to operate, resulting in watches or ‘portable clocks’ which were already common being moved from town to town. As early as 1847 railway companies in Britain adopted “Greenwich mean time”, based on the solar noon at the British Observatory in in England, as a standard time. This first time zone was unofficial until 1880 when Greenwich Mean Time (GMT) was adopted for the island of Britain, creating the first official time zone, with a standard time in use over a large area. With the telegraph having been around since 1852, the time in one location could be relayed almost instantly over long distances, and the idea of time zones caught, with 26 countries attending the International Meridian Conference and adopting the concept of time zones ‘globally’.
The innovation of the time zone is a standard time for an entire region. UTC takes that one step forward as a standard time for the entire earth.
Date/Time from what Perspective?
While we can record events in UTC or ‘Earth Time’, then as operating systems do, translate those times most times into the time zone of the observer. However, some events are best expressed in the local time of the event, and not in the time zone of the the ‘observer’. For example, most international news stories will quote the local time of the event, because the local time of day is relevant to the story. “The accident occurred at 4pm our time” may lose the perspective of “the accident occurred as 3am local time in <location>”. When displaying events, the question arises, “the date/time relative to which local time”?
Perspectives when dealing with date/times include:
- Time Zone Independent Data: no specific time zone – UTC is appropriate
- time should be present relative to the ‘observer’: the time zone of the observer will be required
- time should be presented relative to the original event: this requires events to be tagged with the time zone of the event
1. Time Zone Independent Data
For data to be discussed during international conference calls or to viewed as is internationally, presenting events in UTC time may be the most appropriate format. Data related to time zone independent events such as on satellites, or for collections of data, perhaps even from many time zones but where it the relationship of the absolute date/times to each other that is considered important rather than the relationship to any specific local time, UTC as a ‘neutral’ time zone is the most appropriate format.
2. Time Relative To The Observer.
Consider the example of the file creation time. These date/times are always presented in the time zone of the observer. As per the experiment of changing the time zone of the observer, resulting in the change in the reported time. The same happens for example with a messaging program such as Facebook messenger or Skype, or twitter. Each observer seems the event of the message being posted displayed relative to the time zone where they make the observation. If they again observe the message at another time when they have changed times zones, they see a different time for the posting of the message. Where time relative to the observer is required, this is the time relative to the observe at the time of observation, not time relative to where the observer was when the original event occurred. This means the time zone of the observer must be determined at the time the observation takes place, using the location where the observation takes place, and thus the observer time zone is not part of the data.
3. Time Relative To Original Event.
Imagine a computer system accessible from two time zones. If someone in the other time zone creates a file and you check the file attributes 5 minutes later, you can easily see the file was created 5 minutes ago. Now consider, what if the time in your time zone is 2pm, so the file shows as created at 1:55pm (5 minutes ago), but the person who created the file is in a time zone 1hr ahead. If you ask that person about the file they created at 1:55pm, they could say ‘I was not even in the office at 1:55pm!’, as in their time one hour ahead, the file shows as created at 2:55pm.
What was the time of day when the file was created? To know that, the time zone for the event must have been recorded, otherwise, while the time zone of the observer at the time of the observation is known, nothing is known about time zones back when the event took place. As in the previous experiment on changing the clock after creating a file, where the actual time in operation when the file was created is not available, only the time with respect to the new ‘observer’ time zone.
This also can be relevant with timestamped messages. If I send a skype message from Sydney at 2:00pm to a person in Auckland, then the receiver (observer) of the message will see the message as sent at 4:00pm. The receiver of the message may know that because I am in the Sydney time zone, it was 2:00pm my time when I sent the message and direct me to ‘the message you sent at 2:00pm so I can find it, but this requires the knowledge what time zone is relevant for the sender. For a computer to program to have that knowledge, the time zone of the sender, or time zone of the original event, must be data available in addition to the UTC time of the event.
Original Event Time: Key to Reporting
Keeping times as just a UTC timestamp provides for both filtering and displaying time as UTC, or as times relative to the observer. Filtering or displaying the time of an event as it happened, the ‘original event time’ , requires additional information sufficient to determine what local time was at the time of the original event.
So is storing this extra data worthwhile? Answer: yes, if the events being recorded may possibly require being analysed as a data set, and may contain data from multiple times zones, or even one timezone that is subject to Daylight Saving Time. Operating systems record file create/modify dates using UTC time stamp alone and display using using observer time, but operating systems do not produce reports analysing file create or modify events.
Considering again any news story, as in hypothetical story about “the accident” above. On the news, any story about an event from another time zone will normally also report what time the event happened in local time. Now consider analysis of events such as a news worthy ‘accident’. One of the key statistics in analysing such events would be to created statistics on what time they must frequently occur. The time of day for such reports needs to be the time of day of the original event.
The same usually applies to business data. Consider, as opposed to a file creation ‘timestamp’, a ‘timestamp’ for an order for a meal. These timestamps are recording ‘order time’. Now consider a case where data has been is collected from a number of stores located in 4 different time zones, each 1hour apart: head office on US Pacific time, some stores 1hr ahead on US Mountain time, others on US Central time 2 hours ahead the remainder on US Eastern time and 3 hours ahead. Reports run separately for the stores of each times zone show:
- Pacific Time Stores: Noon to 1pm
- Mountain Time Stores: 11am to Noon
- Central Time Stores: 10am to 11am
- Eastern Time Stores: 9am to 10am
Using UTC times and adjusting for local time of the computer running the reports in California, would show a lunch order at 12:15pm in New York (Easter time) as being at 9:15am, because the order was at 9:15am California time. In fact for all stores, the busiest time is between Noon and 1pm. A reader seeing such reports could adjust themselves, but if a single report added all transactions it could misleadingly report transitions as equally spread between 9am and 1pm, which is useful for calculating how busy the server in California will be, but of little use at stores for planning when local staff are needed.
The keep point is that for some reporting absolute times are appropriate, but there are also other reports where local times are essential.
This means that for any application where reporting and analysis will in some cases require local times, additional data is required. A generalised timestamp format needs to provide for both UTC time, and local time.
In the example above, recording which time zone (Eastern, Central, Mountain, Pacific) is related to the order, together with to the UTC time stamp, would provide for both types of reporting:
- chronological sequence
- local time: time of day/day of week style analysis
There are several possible ways to record the additional data required an association to a local time in addition to the core UTC time.
- Possible ways to add local time information:
- time zone (e.g. Eastern, Central…)
- an independent timestamp field for the local time
- a difference value between UTC time and local time
- local
'time of day'
and'day of week'
Any of these values, combined with the UTC for an event, could meet the requirements for the problem to be solved, and each has advantages and limitations.
Operating Systems use UTC time and time zone, but as already discussed, do not save time zone with file timestamps, and as a consequence file timestamps cannot report the original local time when a file event (create/modify/access) took place.
Following conventions desirable, storing this extra data as ‘time-zone’ had appeal since it is consistent with operating system conventions, but it is not proven this is suitable for reporting local times for events.
Local Time Calculation & Complications.
Calculation of local time zone, whether for ‘observer’ time zone, or ‘original event time zone’, is complex for the reasons outlined in this section. While observert time zones report relative to the time zone in effect now, “Original event’ time zones have the additional complication of needing to calculate the time zone in operation at a given point of time in the past.
In the ‘Local Time’ section, the example considered how multiple time zones results in the need for storing data additional data in order to be able to report both absolute time and local time. A generalised solution needs to able to work for local times from the anywhere in the entire world, and as events may be retrieved at a future time, also be as ‘future-proof’ as possible. So what are the possible complications? These include:
- Daylight Saving Time
- Varying Dates and Rules For Daylight Saving Time
- Varying Amounts of Daylight Saving
- Changes to Time zones
Daylight Saving Time.
In the example for ‘more data‘, the report considered 4 time zones. If each timestamp contains an identifier of the time zone, then local time could be calculated from the UTC timestamp. However, each of those time zones also has daylight saving. Operating systems further divide time zones to into times following the same daylight saving rules. For example:
- US Mountain Time: Divided Into
- Arizona: Following ‘Mountain Standard Time’ all year.
- US Mountain Time: Changing to ‘Mountain Summer Time’ according to rules common to other US locations.
- Australian Eastern Standard Time: Divided Into
- Brisbane: Australian Eastern Standard Time all year
- Sydney, Canberra, Melbourne: Changes to Australian Eastern Summer Time
- Hobart: Changes to Australian Eastern Summer Time, but for a longer time
Note that the divisions are on the basis of the rules for applying daylight saving (or not). There are locations in Arizona the choose to follow US Mountain Time rules, and the whole state of Queensland follows the same rules as Brisbane. For any location, it is a matter of selecting the set of rules matching those applying locally, and the labels serve as a ‘we are like them’, not as an actual location.
Varying Dates And Rules For Daylight Saving Time.
The ‘divided time zones’ as described above allows programs to calculate the current local time from the UTC time, but calculation for past dates becomes more complex. This year the US started daylight saving at 2:00 am on Sunday, 8 March, and ends at 2:00 am on Sunday, 1 November. In the UK, daylight saving time 2020 in United Kingdom began at 1:00 am on
Sunday, 29 March and ends at 2:00 am on Sunday, 25 October. Not only do the dates change every year, usually in the west to fall on a Sunday, but which Sunday is governed rules that change from location to location. Iraq did follow daylight saving until 2007 and the last change was on a Monday. This year Iran started daylight saving on a Saturday and ends on a Sunday. An implementation of the rules for each time zone would too complex. However, a table of start and end dates for each time zone for every years since daylight saving began sounds possible, until the fact that states and even cities can change which set of rules they follow. In place of dividing time zones by which rules are currently followed, locations would have to be divided by locations that have historically followed the same rules. Who is even going to know which ‘zone’ to choose! Did my city follow the same rules as CityA in 1920, or as CityB at that that time?
Varying Amounts of Daylight Saving
It even turns out not every location that has daylight saving changes the clocks by one hour. Instead of a simple Boolean flag to ‘add one hour, daylight saving applies’, the time difference could be 20 minutes, 30 minutes, 40 minutes or two hours. Usage of times other than one hour are rare, but they have happened and even still happen, so any system planning to be universal must cope.
Changing Time Zones.
Historically countries/cities change not only their daylight saving rules, but even their local time itself. Country/State boundaries do not follow lines of longitude, and changing time mid country or mid-state would just be impractical. But which state/country does a location wish to share time with? As allegiances change, so do actual time zones. The YouTube video here gives a useful introduction to how things change. Clearly, over time, the meaning of a time zone (e.g. US Pacific Daylight Time), could change.
Time Zone Solutions.
Time Zone GPS DataBases
There are databases and web services that allow calculating time zone from GPS coordinates. Although theoretically possible, I have not found a database or service that calculates part time zones for any given date/time in the past. Given time zone boundaries can change, there is no guarantee the time zone for a given GPS location will be able to be easily determined in the future, so recording GPS coordinates for the purpose of reports being able to use the GPS coordinates to calculate time zone creates an element of risk. However, these databases or services do provide for either:
- calculate observer time zone
- calculate local event time zone at the time of the event in order to record time zone information
The alternative method to determine either observer time zone, or original event time zone at the time of the event, is to have the time zone information collected by a web browser or client application used by the observer or used to record the original event. Browsers, mobile devices and PCs are all normally ‘local time zone aware’.
Time Zone Offset Databases.
At some some point, knowing which time zone is insufficient and an offset from UTC must be calculated.
Again, there are also databases and web services that enable finding the offset from UTC for any given time zone. Notably the Tz Offset database, also known as the Olsen time zone database, provides for calculating the offset for any given date/time and time zone since January 1st 1970.
Any web browser, mobile or desktop app can retrieve the current time zone offset for the local time zone of the device without using any database. While there are libraries with function to calculate time zone offset from a database, the calculation is not insignificant and difficult to incorporate in a database query.
All factors considered, offsets for an ‘original event time’, as usually best stored with the time at the time the event is recorded. This still requires calculating the offset at the time of storing the event, but saves calculations at reporting time, which would result in calculation for each event in the report.
For calculating an ‘original event time’ offset, or an offset for the current local time of an observer, the choice is(in order of preference):
- capture the current local time offset at the client device
- use a database the calculate the relevenat offset
Stored Time Zone Offset
- Possible ways to add local time information:
time zone (e.g. Eastern, Central…)not workable due to level of calculation- an independent timestamp field for the local time
- a difference value between UTC time and local time
- local
'time of day'
and'day of week'
While there is a simple list of time zone codes, and each has a current ‘offset’ from UTC, but there is no rule that over time the offset will not change, hence the need for more complex tools such at the tz database mentioned above. However, when storing data in a database, performing lookups and calculations from another database is inefficient. This suggests the use of either 2,3 or 4 when storing original event time offsets.
Full DateTime Properties & Usage
The first step is to consider all ‘properties’, then the next step is to consider for a given implementation, which properties would be stored in ‘backing fields’, and which are best calculated as needed.
The full list of attributes of a datetime that could be used or reported, together with their ‘types’ are:
- UTCDateTime: TimeStamp
- LocalTimeStamp: TimeStamp
- TimeZone
- Label: String
- Offset: Int
- DaylightSavingOffset: Int
- FilterData (read only properties)
- Day of Week: ShortInt
- Day Of Month: ShortInt
- Day Of Year: ShortInt
- MonthNumber: ShortInt
- Time Of Day: TimeDelta
Actual types are language dependant.
The list could be expanded to include 'day of the week name'
and 'month name'
, and functions should be available to provide these, but as these are just simple mappings of the numerical values, they have not been listed as data.
Many of these attributes can be derived from other attributes by simple calculations. However an attribute would be the basis of selection by a database query, there could be advantages to including even redundant data to save on calculations.
Many values would logically be derived by calculations using other properties, but implementation is discussed in the next section, with this section focusing on the values and their uses.
UTCDateTime: TimeStamp
In simple terms, this is the actual time of the event. All other data is purely for reporting and conventient representation and/or display of this data.
This is recorded as a ‘timestamp’, or number of number of seconds/milliseconds that have passed since 00:00:00 UTC on 1 January 1970. There is an ambiguity, as some languages/systems use seconds, while others use milliseconds.
The suggestion is to use milliseconds, with a 64bit integer.
Note: Python uses seconds, but as a float, allowing resolution to go to microseconds. But milliseconds is more common.
LocalTimeStamp: TimeStamp
Storing this information as a timestamp is of questionable value, and can confuse the concept of ‘single source of truth’ being the UTCDataTime. The two main reasons for using local time, it is show (print/display) times in a convenient format, and for grouping events based on similar time patterns when represented in local time. Neither the patterns, groupings or convenient format apply to data in TimeStamp format.
The value ‘localtimestamp’ can provide for an alternative constructors for DateTime objects, and haveing the value as a property provides for using library functions that convert
TimeZone-String: String
A string recording the time zone is useful to confirm to someone seeing the data in what timezone the datetime is being presented. Calculations cannot be done on the basis of timezone codes stored with datetimes, as meanings may change over time.
TimeZone-Offset: TimeDelta
This provides the offset value needed to present an event in the format of a local time.
The calculation should be
LocalTimeStamp = UYTCDateTime + TimeZoneOffset
On this basis the simplest way to store time zone offset would be milliseconds, even though actual time would be an exact number of minutes. If 14 hours is the largest possible offset, then the largest value as minutes would be 840, requiring a 16 bit integer and as milliseconds the value would be 14*60*60*1_000 = 50_400_000
, which would require a 32 bit integer. If storage is critical, storage in minutes would allow using a single byte and converting to milliseconds in the ‘get’ method is possible.
The type is labelled ‘TimeDelta’, which ‘TimeStamps’ then being a TimeDelta relative to Epoch Start (1 January 1970 00:00:00 UT, a point in time known as the Unix epoch)..
DayLightSavingAdjustment: TimeDelta
Do we really need this? The answer is more ‘it is nice to have but not necessary, which means the field should be considered as optional. This means the field should not part of the calcualtion of local time, which time-zone offset being the complete offset including any daylight saving adjustment. This field provides a way of indicating daylight saving is active, together with what the adjustment is for daylight saving for those rare cases where it could other than one hour.
If 2 hours is the largest possible daylight adjustment, then the largest value as minutes would be 120, requiring an 8 bit integer, and as milliseconds the value would be 2*60*60*1_000 = 7_200_000
, which would also require a 32 bit integer. Given that the intent is this field is not to be included in calculations, and most often be effecitvely binary (as any value other than one hour is very rare) in this case storing and presenting the value as minutes is suggested.
FilterData
The main use of these fields is for filtering. With a series of events, it can be useful to filter events collect those that meet local date and local time based criteria within, say a six month period, such as:
- monday morning
- evening peak hour
- week-end lunch time
Consider working from UTCDateTime and timeZone. Events occurring on a Monday morning would mean 26 different time periods to consider, even if there data from only one time zone. Four timezones would mean 4 times as many ranges. Evening peak hour occurs 5 days a week, giving 4*5*26=
416 time ranges, even when only working with six months of data and limited to just four time zones.
Display- Day of Week: ShortInt
The recommendation is to use iso-week day format which, counts from 1, not zero with Monday as the first day. This value used as a filter allows selecting events that occur on specified day(s). Also useful for text representation of dates.
Display-Day Of Month: ShortInt
As in the date. Usefull in building text representations of dates, but less useful for filtering. It is possible data for ‘early in the month’ (day < somevalue) could be useful or compared with late in the month.
Day Of Year: ShortInt
is this useful? This one is of less value.
Month: ShortInt
1 to 12. As in the date. Useful in building text representations of dates, but less useful for filtering. With data collected over many years, use for filtering would increase. How do sales in June compare with August from all previous years? A more common filter would be to chart sales by month. This could be done from a local time-datetime as a time delta, but it much simpler using month number.
Time Of Day: TimeDelta
There is no clear best way to store this, but rather than risk another data type, the suggestion is to return a timedelta. The main use of this field is for reporting/grouping filtering events that occur between specific times on some form of repreated basis.
The data could be done by converting a local time in to hour-minute-second then converting that to seconds, but a property standardises which format to use when there is no clear obvious choice.
Week-Number: ShortInt
This should be the ISO-8601 week numbering.
It is assumed this is mostly useful in presentation of day.
Implementation.
General.
It is suggested to have three main elements:
- the UTC time stored as 64 bit integer.
- the timezone, stored as an object with three attributes (timezonelable, offset, daylight saving)
- filter-display properties (read only)
All of the filter-display data could be calculated on demand. However, since calculation on demand can introduce additional pipeline stages to database retrievals, consideration should be given to whether to store calculated values when storing in a database.
Python.
Python boasts ‘batteries included’: and there ample Date and Time support in the form of libraries included with the Python standard installation. The Recommendation is the use datetime.datetime objects, but to avoid the use of ‘timezone
The Zen of Python states:
There should be one– and preferably only one –obvious way to do it.
The Zen of Python (PEP20)
It is not so clear this goal is met. There are three different standard library modules that can be used to represent date and time data: time, datetime and calendar. In addition, there are numerous packages available on PyPi.
To quote the python calendar page:
See also
Moduledatetime
Object-oriented interface to dates and times with similar functionality to thetime
module.Module
Python Calendar Page.time
Low-level time related functions.
There is sufficient confusion on which is most suited to provide a solution, and some of the confusion is arises from a lack of understanding that there are three different perspectives to reporting dates/times, with different needs. Simplistically:
- time: provides support for time from a UTC or observer perspective
- datetime: adds support for ‘original event’ time
- dateutil / PyTz: improve support for Observer perspective across time zones
I suggest Python datetime.datetime achieves all reporting from any perspective, althouth dateutil can be of use when the need arises to present data in ‘observer’ time for additional time zones.
There are ambiguities, quirks and limitations which results in the datetime
module giving being quite obtuse in what it actually supports. But these can be worked through, an when used well datetime gives as standard library support for key features missing from the time library.
That discussion aside, some helper functions are needed to produce filter/display properties.
These could extension are available through the DateTimeField of the viewmodel class, or could be provided as an independent set of functions, or by way of a subclass of datetime.
Kotlin
Kotlin does not have its own implementation of a date and time. JVM based Kotlin has used typically used java classes for handling date and time. This leaves JavaScript and Kotlin native each using their own underlying platform implementation. There is a need for a cross platform Kotlin Date Time, and this will soon be added.
In the interim, KotlinIdiom provides its own spec, which can be implemented for each platform.
The ‘common main’ definition is as follows:
typealias TimeDelta = Long // count of milliseconds
const val defaultFmt = "YYYY:MM:dd HH:MM:SS"
expect class KIDateTime {
val utcDateTime: TimeDelta // delta from Unix Epoch
val timezone: KITimeZone // UTC as default
/** Local time in minutes since midnight. */
val timeOfDay: Int
/** From 1-7 starting Monday, matching ISO standard. */
val dayOfWeek: Int
fun toStr(format: String = defaultFmt, timeZone: KITimeZone=timezone):String
override fun toStr(): String // = toStr(defaultFmt)
}
expect class KITimeZone {
/** Only value needed for by KIDaDifference calculation */
val offset: TimeDelta
/** Minutes of daylight saving in offset. Usually 0 or 60, null means unknown. */
val dstInOffset: Int?
/** display string for TZ, can be TZ name or TZ abbreviation */
val label: String?
}
The following content is still a work in progress…. come back in 24 hours (or July 20 UTC)
Implemented as
A key question is should datetimes be immutable?
class DateTime(val stampUTC: Long, val timeZone: TimeZone){
// code to be updated
}
Databases: MongoDB
Some databases can have calculated-value fields or ‘computed fields by various techniques, however, generally complex expressions can be add complexity. For example, MongoDB has the ‘aggregation pipeline’ – a process that creates interim values from calculated fields.
It is recommended to store the three key filter properties ‘time of day’ and ‘day of week’ and ‘month number’ as separate values, unless database filter expressions for these values have been verified as simple with the relevant database.
Conclusion: A solution for Python, Kotlin and DB
The exercise of examining dates and time highlights a key difference between Python and and Kotlin. Python comes with ‘batteries included’, while Kotlin is still building the batteries, often using Java ‘batteries’ and a fall-back. Koltin Java DateTime libraries have limitations an