extract-date: Extracts Date from an Arbitrary Text Input

source link: https://www.tuicool.com/articles/hit/NNvMnm3

extract-date

Extracts date from an arbitrary text input.

Features

  • Deterministic and unambiguous date parsing (input must include year; see Date resolution without year).
  • No date format configuration.
  • Recognises relative dates (yesterday, today, tomorrow).
  • Recognises weekdays (Monday, Tuesday, etc.).
  • Supports timezones (for relative date resolution) and locales.

Motivation

I am creating a large-scale data aggregation platform ( https://applaudience.com/ ). I have observed that the date-matching patterns and site-specific date validation logic are repetitive and could be abstracted into a universal function, as long as minimal information about the expected pattern is provided (such as the direction configuration). My motivation for creating this abstraction is to reduce the amount of repetitive logic that we use to extract dates from multiple sources.

Use case

The intended use case is extracting the dates of future events from blobs of text that may contain auxiliary information, e.g. 'Event at 14:00 2019-01-01 (2D)'.

The emphasis on future events is because resolving dates such as 'today' (relative dates) and 'Wednesday' (weekday dates) requires knowing the offset date. If your input sources refer predominantly to future events, then the ambiguity can be resolved using the present date.
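To illustrate why an offset date is needed, here is a minimal sketch of resolving relative and weekday dates against a reference date. It is not the library's implementation: `resolveRelative`, `toIsoDate` and the rule that a weekday name means the next occurrence on or after the reference date are assumptions made for this example. The reference date is passed in explicitly; in practice it would be derived from the configured timezone.

```javascript
// Hypothetical helper: formats a Date as YYYY-MM-DD using its UTC fields.
const toIsoDate = (date) => date.toISOString().slice(0, 10);

// Day offsets for the relative dates named in the feature list.
const RELATIVE_OFFSETS = new Map([
  ['yesterday', -1],
  ['today', 0],
  ['tomorrow', 1],
]);

// Index matches Date.prototype.getUTCDay(), where 0 is Sunday.
const WEEKDAYS = ['sunday', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday'];

// Resolves a relative date or weekday name against a reference date (YYYY-MM-DD).
// Assumption: a weekday name refers to its next occurrence on or after the reference date.
const resolveRelative = (word, referenceDate) => {
  const key = word.toLowerCase();
  const date = new Date(referenceDate + 'T00:00:00Z');

  if (RELATIVE_OFFSETS.has(key)) {
    date.setUTCDate(date.getUTCDate() + RELATIVE_OFFSETS.get(key));
    return toIsoDate(date);
  }

  const weekday = WEEKDAYS.indexOf(key);
  if (weekday === -1) {
    return null; // Not a relative date or weekday name.
  }

  const daysAhead = (weekday - date.getUTCDay() + 7) % 7;
  date.setUTCDate(date.getUTCDate() + daysAhead);
  return toIsoDate(date);
};

console.log(resolveRelative('tomorrow', '2000-01-02'));
console.log(resolveRelative('Wednesday', '2000-01-02'));
```

Without the reference date, neither call could produce a calendar date, which is why the use case assumes inputs that refer to future events.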

Usage

import extractDate from 'extract-date';

extractDate('extracts date from anywhere within the input 2000-01-02');
// 2000-01-02

extractDate('extracts only the first date from the input 2000-01-02, 2000-01-03');
// 2000-01-02

extractDate('produces a null when date is ambiguous 02/01/2000');
// null

extractDate('uses `format` to resolve ambiguous dates 02/01/2000', {format: 'DMY'});
// 2000-01-02

extractDate('uses `timezone` to resolve relative dates such as today or tomorrow', {timezone: 'Europe/London'});
// 2000-01-02 (assuming today is 2000-01-02)

Configuration

| Name | Description | Default |
| --- | --- | --- |
| format | Token identifying the order of numeric date attributes within the string. Possible values: DMY, DYM, YDM, YMD. Used to resolve ambiguous dates, e.g. DD/MM/YYYY and MM/DD/YYYY. | N/A |
| maximumAge | See Date resolution without year. | Infinity |
| minimumAge | See Date resolution without year. | Infinity |
| timezone | TZ database name. Used to resolve relative dates ("Today", "Tomorrow"). | N/A |

Resolution of ambiguous dates

Date resolution without year

When the year is not part of the input (e.g. March 2nd), the minimumAge and maximumAge configuration determines the year value.

minimumAge
maximumAge

Example:

  • If the current date is 2000-12-01 and the parsed date is 10-01, then the month difference is -2.

    minimumAge
    minimumAge
    
  • If the current date is 2000-01-01 and the input date is 10-01, then the month difference is 9.

    maximumAge
    maximumAge
    

Note: The minimumAge comparison is done using the absolute value of the difference.
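The month differences quoted in the examples above are consistent with subtracting the current month from the parsed month. The sketch below reproduces only that arithmetic. `getMonthDifference` is a hypothetical helper inferred from the two examples, not a function exported by the library, and it does not decide the final year.

```javascript
// Hypothetical helper: month difference = parsed month - current month (both 1-12).
// `currentDate` is a YYYY-MM-DD string; `parsedMonth` is the month found in the input.
const getMonthDifference = (currentDate, parsedMonth) => {
  const currentMonth = Number(currentDate.slice(5, 7));

  return parsedMonth - currentMonth;
};

console.log(getMonthDifference('2000-12-01', 10)); // -2
console.log(getMonthDifference('2000-01-01', 10)); // 9

// The note about minimumAge refers to the absolute value of this difference.
console.log(Math.abs(getMonthDifference('2000-12-01', 10))); // 2
```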

Implementation

Note: This section of the documentation is included for contributors.

  • extract-date includes a collection of formats ( ./src/formats.js ).
  • The formats are attempted in the order of their specificity, i.e. "YYYY-MM-DD" is attempted before "MM-DD".
  • Formats are attempted against a tokenised version of the input (see Input tokenisation).
  • The first format that can extract the date is used.
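The matching order described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in ./src/formats.js: the `pattern` regular expressions stand in for strict moment parsing, `bySpecificity` is a hypothetical ordering rule (explicit year first, then longer format strings), and only single-word formats are handled.

```javascript
// Hypothetical format descriptors; `pattern` stands in for strict parsing with `momentFormat`.
const formats = [
  {momentFormat: 'MM-DD', pattern: /^\d{2}-\d{2}$/, yearIsExplicit: false},
  {momentFormat: 'YYYY-MM-DD', pattern: /^\d{4}-\d{2}-\d{2}$/, yearIsExplicit: true},
];

// Assumption: formats with an explicit year are more specific; ties go to the longer format string.
const bySpecificity = (a, b) =>
  Number(b.yearIsExplicit) - Number(a.yearIsExplicit) ||
  b.momentFormat.length - a.momentFormat.length;

// Tries each format, most specific first, and returns the first one that matches a word.
const findFirstMatch = (input, candidateFormats) => {
  const words = input.split(/\s+/).filter(Boolean);

  for (const format of [...candidateFormats].sort(bySpecificity)) {
    const match = words.find((word) => format.pattern.test(word));

    if (match !== undefined) {
      return {momentFormat: format.momentFormat, match};
    }
  }

  return null; // No format could extract a date.
};

console.log(findFirstMatch('event on 2000-01-02', formats));
console.log(findFirstMatch('event on 01-02', formats));
```

Trying the more specific format first keeps a short format such as "MM-DD" from claiming input that a fuller format could interpret.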

Input tokenisation

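extract-date splits the input into overlapping phrases, each containing as many consecutive words as the wordCount of the format being attempted.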

Example:

Given input "foo bar baz qux" and format:

{
  direction: 'YMD',
  localised: false,
  momentFormat: 'YYYY MM.DD',
  wordCount: 2,
  yearIsExplicit: true
}

Input is broken down into:

  • "foo bar"
  • "bar baz"
  • "baz qux"

The format is attempted against each phrase in this collection until a match is found.
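The split shown in the example above can be reproduced with a simple sliding window over the words of the input. This is a minimal sketch: `createPhrases` is a hypothetical name, and splitting on whitespace is an assumption about how words are delimited.

```javascript
// Hypothetical helper: returns every run of `wordCount` consecutive words in `input`.
const createPhrases = (input, wordCount) => {
  const words = input.split(/\s+/).filter(Boolean);
  const phrases = [];

  // Slide a window of `wordCount` words across the input, one word at a time.
  for (let index = 0; index + wordCount <= words.length; index++) {
    phrases.push(words.slice(index, index + wordCount).join(' '));
  }

  return phrases;
};

console.log(createPhrases('foo bar baz qux', 2));
// [ 'foo bar', 'bar baz', 'baz qux' ]
```

An input with fewer words than `wordCount` produces an empty collection, so a format of that length is never attempted against it.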

Format specification

| Field | Description |
| --- | --- |
| direction | Identifies the order of numeric date attributes within the string. Possible values: DMY, DYM, YDM, YMD. Used to resolve ambiguous dates, e.g. DD/MM/YYYY and MM/DD/YYYY. |
| localised | Identifies if the date is localised, i.e. includes names of the week day or month. A format that is localised is used only when locale configuration is provided. |
| momentFormat | Identifies moment format used to attempt date extraction. moment is evaluated using the strict parser option. |
| wordCount | Identifies how many words make up the date format. |
| yearIsExplicit | Identifies whether the date format includes year. |

Example formats:

{
  direction: 'YMD',
  localised: false,
  momentFormat: 'YYYY.MM.DD',
  wordCount: 1,
  yearIsExplicit: true
},
{
  direction: 'DD MMMM',
  localised: true,
  momentFormat: 'DD MMMM',
  wordCount: 2,
  yearIsExplicit: false
},
