Reply To: Time intervals

Dan Damelin

Tim, I’m not sure we can adequately address this without a much more robust way of identifying and interpreting date-time formats. As Bill mentioned, if we can interpret both values as dates, we can find the difference between them (in seconds) without the number() function. And as you pointed out, number() only works when the value can be correctly interpreted as a date in the first place, so I guess we don’t really need to go the number() route.
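To illustrate the point outside CODAP, here is a minimal Python sketch (not CODAP's formula language). It treats "can be interpreted as a date" as "parses with `datetime.fromisoformat`". If both values parse, the difference in seconds falls out of a plain subtraction. If either one doesn't, there is nothing meaningful to compute. The function name `seconds_between` is hypothetical.

```python
from datetime import datetime
from typing import Optional


def seconds_between(start: str, end: str) -> Optional[float]:
    """Return (end - start) in seconds, or None if either string
    can't be parsed as an ISO 8601-style date-time.
    Assumes both values are naive (no time zone offset)."""
    try:
        t0 = datetime.fromisoformat(start)
        t1 = datetime.fromisoformat(end)
    except ValueError:
        # At least one value isn't recognizable as a date, so there is
        # nothing meaningful to subtract.
        return None
    return (t1 - t0).total_seconds()


print(seconds_between("2017-03-01T08:00:00", "2017-03-01T09:30:00"))  # 5400.0
print(seconds_between("2017-03-01 08:00", "March 1st, 9:30"))         # None
```

No separate conversion step is needed. The hard part is entirely in recognizing the date.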

This date-time stuff is complicated. A good percentage of the questions I field from teachers are about date-time issues. The solution usually falls into one of a few options:

  1. If most of the entries are recognized as dates, either fix the ones that don’t conform or set the attribute type to “date”, which works for every value that can be interpreted as a date and ignores the rest.
  2. If none of the values can be interpreted as dates, reformat all of them. Sometimes this involves combining a date attribute with a time attribute to get a single date-time that can be used on graphs (I usually suggest using the concat() function).
  3. If it’s a real mixed bag of formats, some recognized as dates and others not, then I suggest they really need to dive in and make the formats consistent.
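Option 2 can be sketched in Python as follows. Separate date and time strings are joined with a space, much as concat() would do in a CODAP formula, and the result is parsed as a single date-time. The input formats (`%m/%d/%Y` for the date, `%H:%M` for the time) and the function name `combine_date_time` are assumptions chosen for illustration. Real classroom data will vary.

```python
from datetime import datetime
from typing import Optional

# Assumed formats for the two source attributes; adjust to match the data.
DATE_FORMAT = "%m/%d/%Y"
TIME_FORMAT = "%H:%M"


def combine_date_time(date_str: str, time_str: str) -> Optional[datetime]:
    """Join a date string and a time string with a space (the same idea as
    concat() in a CODAP formula) and parse the result as one date-time."""
    combined = date_str.strip() + " " + time_str.strip()
    try:
        return datetime.strptime(combined, DATE_FORMAT + " " + TIME_FORMAT)
    except ValueError:
        # Either half didn't match the assumed format.
        return None


rows = [("3/1/2017", "08:15"), ("3/1/2017", "14:40"), ("3/2/2017", "n/a")]
for date_str, time_str in rows:
    print(combine_date_time(date_str, time_str))
```

The first two rows become `2017-03-01 08:15:00` and `2017-03-01 14:40:00`. The third prints `None` because "n/a" isn't a time, which is exactly the kind of value that has to be cleaned up by hand.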

It seems we could make this easier in a number of ways:

  • Recognize many more date-time formats than we currently do, perhaps with some fuzzy logic that takes anything that might be a date and treats it as one.
  • Ease up on how we treat mixed data types within a single attribute. Right now, if an attribute has 2000 numbers and one text string, CODAP treats it as categorical by default. Perhaps we could tolerate a certain percentage of nonconforming values and make the default attribute type whichever one matches at least 95% of the cases. This would accommodate the occasional missing-data indicator (sometimes a string in an otherwise numeric attribute) or the occasional typo in date-time values.
  • Be smarter about how we respond when someone tries to make a graph in which almost every case is a unique category. When most of the values could be interpreted as numbers or dates but a few are strings, we could default to the numeric or date type, or we could point out the issue and suggest some ways to resolve it.
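The first two suggestions above could be combined roughly as in the following Python sketch. It tries each value against a list of candidate date formats and picks a default type by majority instead of requiring every value to conform. This is not how CODAP currently infers types. The format list, the `threshold` parameter (defaulting to the 95% figure above), and the function names are all hypothetical.

```python
from datetime import datetime
from typing import List, Optional

# Hypothetical list of accepted date formats; a real importer would need many more.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%Y-%m-%d %H:%M", "%m/%d/%Y %H:%M", "%d %b %Y"]


def is_number(value: str) -> bool:
    """True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def parse_date(value: str) -> Optional[datetime]:
    """Try each candidate format in turn; return None if none matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue
    return None


def infer_type(values: List[str], threshold: float = 0.95) -> str:
    """Return 'numeric', 'date', or 'categorical'. A type wins if at least
    `threshold` of the non-empty values conform to it."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "categorical"
    n = len(non_empty)
    numeric_share = sum(is_number(v) for v in non_empty) / n
    date_share = sum(parse_date(v) is not None for v in non_empty) / n
    if numeric_share >= threshold:
        return "numeric"
    if date_share >= threshold:
        return "date"
    return "categorical"


# 2000 numbers plus one missing-data marker.
mostly_numbers = [str(i) for i in range(2000)] + ["missing"]
print(infer_type(mostly_numbers))                 # numeric
print(infer_type(mostly_numbers, threshold=1.0))  # categorical (all-or-nothing rule)

# 20 dates plus one typo with a two-digit year.
mostly_dates = [f"3/{d}/2017" for d in range(1, 21)] + ["3/21/17"]
print(infer_type(mostly_dates))                   # date
```

Setting `threshold=1.0` reproduces the all-or-nothing behavior described above, where one stray string makes the whole attribute categorical. The values that fail to parse are also exactly the ones worth pointing out to the user, as the third suggestion proposes.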