Data science API documentation (1.0.0)

Data science team group email: ml-user@yapily.com

This is the documentation for data science services. Currently, these all run on AWS and can be accessed through a single gateway.

Services currently offered:

  1. Transaction categorisation
  2. Merchant extraction
  3. Transaction stream classification
  4. Balance Prediction

Server authentication uses an API key. Please contact a member of the team to obtain the key.
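The following is a minimal sketch of attaching the key to a request, using only the Python standard library. Several details are assumptions, not documented here: the base URL, the example paths `/categories` and `/merchants`, and the header name. AWS API Gateway conventionally reads API keys from an `x-api-key` header, so the sketch uses that, but confirm the real header and paths with the team. The request is only built, not sent.

```python
import json
import urllib.request

# Hypothetical placeholders: get the real gateway URL and key from the team.
BASE_URL = "https://example-gateway.invalid"
API_KEY = "replace-with-your-key"


def build_request(path, payload=None, api_key=API_KEY):
    """Build (but do not send) a request to the data science gateway.

    Assumes the key travels in an `x-api-key` header, the usual convention
    for AWS API Gateway; confirm this with the team before relying on it.
    """
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    method = "GET" if data is None else "POST"
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("x-api-key", api_key)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req
```

`Request.add_header` normalises header names with `str.capitalize()`, so the header reads back as `X-api-key`. HTTP header names are case-insensitive, so this does not affect the server. Pass the returned object to `urllib.request.urlopen` to actually send it.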

Merchants and Categories

Allows clients to get merchant and category information for their transactions. This information is extracted primarily from the transactionInformation string, but we may also process the proprietary and ISO bank transaction codes to determine whether a transaction is, for example, cash or a bank fee.

Get a list of the categories we output

Returns a list of all categories, subcategories and IDs output by the data science services.

Authorizations:
DataScienceApiKey

Responses

Response samples

Content type
application/json
{
  "statusCode": 0,
  "body": {
  }
}

Get aggregated categories

Aggregates categories from all relevant sources. Use this as the primary endpoint for categories.

Authorizations:
DataScienceApiKey
header Parameters
application-id (required): string <uuid>

The ID of the calling application.

Request Body schema: application/json
Array of transaction objects with the fields:
  hash (required): string (Transaction hash)
  reference: string (Transaction reference)
  description (required): string (Transaction description)
  institution (required): string (Institution)
  bookingDateTime (required): string (Booking date time)
  amount: number (Transaction amount)
  proprietaryBankTransactionCode: string (Proprietary bank transaction code)
  isoBankFamilyCode: string (ISO bank family code)
  isoBankSubFamilyCode: string (ISO bank subfamily code)
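The request is a JSON array of transaction objects, sent with an `application-id` header. The sketch below shows one way to assemble it in Python. `Transaction` and `build_aggregated_categories_call` are hypothetical names. Omitting optional fields that are `None`, rather than sending explicit nulls, is an assumption, since the schema does not say which is accepted. The example values come from the Transaction Model sample in this document.

```python
import json
import uuid
from dataclasses import asdict, dataclass
from typing import Optional

REQUIRED = ("hash", "description", "institution", "bookingDateTime")


@dataclass
class Transaction:
    """One element of the request array; field names follow the schema above."""
    hash: str
    description: str
    institution: str
    bookingDateTime: str
    reference: Optional[str] = None
    amount: Optional[float] = None
    proprietaryBankTransactionCode: Optional[str] = None
    isoBankFamilyCode: Optional[str] = None
    isoBankSubFamilyCode: Optional[str] = None


def build_aggregated_categories_call(application_id, transactions):
    """Return (headers, json_body) for the aggregated categories endpoint."""
    uuid.UUID(application_id)  # raises ValueError if application-id is not a UUID
    for t in transactions:
        for field in REQUIRED:
            if not getattr(t, field):
                raise ValueError(f"required field {field!r} is empty")
    # Assumption: optional fields left as None are dropped, not sent as null.
    body = [{k: v for k, v in asdict(t).items() if v is not None} for t in transactions]
    headers = {"application-id": application_id, "Content-Type": "application/json"}
    return headers, json.dumps(body)
```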

Responses

Request samples

Content type
application/json
[
  {
  }
]

Response samples

Content type
application/json
[
  {
  }
]

Get categories page

Gets categories for a page of transactions using a machine learning classifier based on the transaction description, amount and date. This endpoint is no longer used by the main Yapily gateway and is consumed only by other data science applications.

Authorizations:
DataScienceApiKey
Request Body schema: application/json
required
Array of objects (Data to be categorised)

Responses

Request samples

Content type
application/json
{
  "instances": [
  ]
}

Response samples

Content type
application/json
{
  "Data": {
  }
}

Get merchants

Returns merchants and categories for a page of transactions and, where possible:

  • Locations
  • Payment processors
  • Corrected dates

This endpoint should not be consumed directly. Use the aggregated categories endpoint to fetch category information, as it correctly combines this data with other sources.

    Authorizations:
    DataScienceApiKey
    header Parameters
    application-id (required): string <uuid>

    The ID of the calling application.

    Request Body schema: application/json
    Array of transaction objects with the fields:
      hash (required): string (Transaction hash)
      reference: string (Transaction reference)
      description (required): string (Transaction description)
      institution (required): string (Institution)
      bookingDateTime (required): string (Booking date time)
      amount: number (Transaction amount)
      proprietaryBankTransactionCode: string (Proprietary bank transaction code)
      isoBankFamilyCode: string (ISO bank family code)
      isoBankSubFamilyCode: string (ISO bank subfamily code)

    Responses

    Request samples

    Content type
    application/json
    [
      {
      }
    ]

    Response samples

    Content type
    application/json
    [
      {
      }
    ]

    The Transaction Stream Classifier

    The Transaction Stream Classifier identifies a user's transaction streams. From a given user's transactions, it determines which ones form regular streams rather than one-off incomes or expenditures, and for each stream it isolates the transactions that belong to it.

    It also determines:

  • The frequency of the transactions (e.g. "monthly")
  • A more detailed frequency (e.g. "monthly on the last working day of the month")
  • A score that describes how well the transaction stream fits that schedule

    The classifier also predicts when the next transaction will be (or would have been, if the stream is no longer current). For convenience, it also returns the dates of the first and last transactions in the stream.

    Finally, the classifier returns the average amount of the transactions in the stream and an amount consistency score, calculated as 1 - v, where v is the coefficient of variation (standard deviation divided by mean) of the amounts in the stream.
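The amount consistency score can be reproduced from a stream's amounts as sketched below. Two details are assumptions, not documented here. First, the sketch uses the population standard deviation, although the service may use the sample version. Second, it takes absolute values so that outgoing (negative) streams score the same way as incoming ones. `amount_consistency` is a hypothetical helper.

```python
import statistics


def amount_consistency(amounts):
    """Return (average amount, consistency score) for one stream.

    score = 1 - v, where v = standard deviation / mean (coefficient of variation).
    Assumptions: population standard deviation, absolute amounts.
    """
    values = [abs(a) for a in amounts]
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    cv = statistics.pstdev(values) / mean
    return mean, 1 - cv
```

For example, payments of 8 and 12 give an average of 10 and a score of 0.8, and perfectly constant amounts score 1. The score becomes negative once the standard deviation exceeds the mean.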

    Identify transaction streams for a user

    This endpoint identifies transaction streams from a payload of transactions. It returns the transactions with an additional transaction streams object containing:

  • The transaction schedule
  • The frequency
  • A consistency score that describes how well the transaction stream fits that schedule

    PLEASE NOTE: the transaction stream classifier requires a contiguous block of transactions covering at least three months, and ideally a year. It also benefits from receiving transactions from all of a user's accounts. This allows us to:

    1. Remove internal transfers by matching them across the user's accounts, and
    2. Track transaction streams across multiple accounts, which will happen if a user changes bank.
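One simple way to spot an internal transfer is to pair an outgoing payment with an incoming payment of exactly the same amount on another of the user's accounts, booked a few days apart. The sketch below implements that as a greedy match. It is a hypothetical heuristic for illustration, not the classifier's actual matching logic. The `Txn` record, the sign convention (negative means money out) and the 3-day window are all assumptions.

```python
from datetime import date
from typing import NamedTuple


class Txn(NamedTuple):
    account_id: str
    booked: date
    amount: float  # assumption: negative = money out, positive = money in


def match_internal_transfers(txns, max_days=3):
    """Greedily pair each outgoing transaction with an equal incoming one on a
    different account, booked within `max_days` days. Each transaction is
    used at most once. Returns (outgoing_index, incoming_index) pairs."""
    used = set()
    pairs = []
    for i, out in enumerate(txns):
        if out.amount >= 0:
            continue  # only start from outgoing payments
        for j, inc in enumerate(txns):
            if (
                j not in used
                and inc.amount == -out.amount
                and inc.account_id != out.account_id
                and abs((inc.booked - out.booked).days) <= max_days
            ):
                used.update((i, j))
                pairs.append((i, j))
                break
    return pairs
```

Amounts are compared exactly. This works for values that arrive as the same decimal string with opposite signs, but real data may need a tolerance.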
    Authorizations:
    DataScienceApiKey
    Request Body schema: application/json
    required
    Array of objects (User accounts) non-empty

    Responses

    Request samples

    Content type
    application/json
    {
      "accounts": {
      }
    }
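Because the classifier needs at least three months of contiguous history, a client may want to check a payload's coverage before sending it. The sketch below is one way to do that. It assumes three months can be approximated as 90 days and a year as 365 days, and that bookingDateTime uses the format shown in the Transaction Model sample. `history_coverage` is a hypothetical helper. It only measures the span between the earliest and latest transaction and does not detect gaps inside that span.

```python
from datetime import datetime

MIN_DAYS = 90    # assumption: "three months" approximated as 90 days
IDEAL_DAYS = 365  # assumption: "a year" taken as 365 days


def history_coverage(booking_date_times):
    """Classify a non-empty list of bookingDateTime strings
    (e.g. "2020-08-01 00:00:00") by the span they cover."""
    dates = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in booking_date_times]
    span_days = (max(dates) - min(dates)).days
    if span_days < MIN_DAYS:
        return "insufficient"
    if span_days < IDEAL_DAYS:
        return "minimum"
    return "ideal"
```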

    Response samples

    Content type
    application/json
    {
      "enrichment": {
      }
    }

    The Balance Prediction Engine

    The Balance Prediction Engine predicts a user's balances over the next 30 days. It consumes raw transactions: ideally as many as possible, but if necessary it can function with only two months of data.

    For each day, it outputs the median predicted balance (the prediction itself) together with the 10% and 90% confidence bounds, which give a sense of how accurate the prediction is expected to be. For convenience, it also outputs the historic balances for the last 60 days.
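The three daily figures can be read as percentiles of a distribution of possible balances for that day. On this reading, the balance is expected to fall below the lower bound about 10% of the time and above the upper bound about 10% of the time. The sketch below derives the three figures from a set of simulated balances for one day, using Python's `statistics.quantiles`. It illustrates what the outputs mean and is not the engine's method. The simulated input and the dictionary keys are hypothetical, not the response's field names.

```python
import statistics


def summarise_day(simulated_balances):
    """Reduce many possible balances for one day to the three reported figures.

    statistics.quantiles(n=10) returns the nine decile cut points, so index 0
    is the 10th percentile, index 4 the median and index 8 the 90th percentile.
    """
    deciles = statistics.quantiles(simulated_balances, n=10)
    return {"lower_10": deciles[0], "median": deciles[4], "upper_90": deciles[8]}
```

For example, `summarise_day(list(range(1, 100)))` returns a lower bound of 10, a median of 50 and an upper bound of 90.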

    Get balance predictions

    Returns balance predictions for an array of accounts.

    Authorizations:
    DataScienceApiKey
    Request Body schema: application/json
    required
    Array of objects (Entire balance prediction user)

    Responses

    Request samples

    Content type
    application/json
    {
      "accounts": [
      ]
    }

    Response samples

    Content type
    application/json
    {
      "enrichedAccounts": [
      ]
    }

    The Category Model

    category_2
    required
    string (Category 2)
    Enum: "BILLS" "CHARITY & GIFTS" "EATING OUT" "ENTERTAINMENT" "INVESTMENT & SAVINGS" "GENERAL" "GROCERIES" "PERSONAL CARE" "HOME" "INSURANCE" "OTHER" "SHOPPING" "TRANSPORT" "TRAVEL & HOLIDAY" "FEES & CHARGES"

    The category of the transaction (called category_2 here for historical reasons).

    category_3
    string (Category 3)

    The subcategory of the transaction (called category_3 here for historical reasons).

    source
    string (Category Source)
    Enum: "MODEL" "MERCHANT" "CODE" "KEYWORD"

    The source of the categorisation for the transaction.

    {
      "categories": {
      }
    }
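A client can check a returned category object against this model using the enum values listed above. `validate_category` is a hypothetical helper. It assumes it is given the object that directly holds `category_2`, `category_3` and `source`.

```python
# Values copied from the category_2 and source enums above.
CATEGORY_2_VALUES = frozenset({
    "BILLS", "CHARITY & GIFTS", "EATING OUT", "ENTERTAINMENT",
    "INVESTMENT & SAVINGS", "GENERAL", "GROCERIES", "PERSONAL CARE", "HOME",
    "INSURANCE", "OTHER", "SHOPPING", "TRANSPORT", "TRAVEL & HOLIDAY",
    "FEES & CHARGES",
})
SOURCE_VALUES = frozenset({"MODEL", "MERCHANT", "CODE", "KEYWORD"})


def validate_category(obj):
    """Raise ValueError if a category object does not match the Category Model."""
    if obj.get("category_2") not in CATEGORY_2_VALUES:  # required field
        raise ValueError(f"unexpected category_2: {obj.get('category_2')!r}")
    source = obj.get("source")  # optional field
    if source is not None and source not in SOURCE_VALUES:
        raise ValueError(f"unexpected source: {source!r}")
    sub = obj.get("category_3")  # optional free-text subcategory
    if sub is not None and not isinstance(sub, str):
        raise ValueError("category_3 must be a string")
```

Rejecting unknown values is deliberately strict. If the team may add categories over time, logging unexpected values instead of raising may be the safer choice.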

    The Transaction Model

    hash (required): string (Transaction hash)
    reference: string (Transaction reference)
    description (required): string (Transaction description)
    institution (required): string (Institution)
    bookingDateTime (required): string (Booking date time)
    amount: number (Transaction amount)
    proprietaryBankTransactionCode: string (Proprietary bank transaction code)
    isoBankFamilyCode: string (ISO bank family code)
    isoBankSubFamilyCode: string (ISO bank subfamily code)

    {
      "hash": "f409f4c937d84cf0311561404f026d54",
      "description": "TESCO PFS BASINGSTOKE 2020/07/30 3803",
      "reference": "TESCO PFS",
      "institution": "lloyds",
      "bookingDateTime": "2020-08-01 00:00:00",
      "amount": 20.1,
      "proprietaryBankTransactionCode": "DEB",
      "isoBankFamilyCode": "CCRD",
      "isoBankSubFamilyCode": "POSD"
    }

    The Merchant Model

    merchantName
    required
    string (Merchant name)

    Name of the merchant

    categories
    required
    object (categories)
    parentGroup
    string (Parent group)

    Wider group of merchants to which the merchant belongs. Could be a brand such as Virgin, or a single company that spans multiple categories such as Tesco (Tesco Petrol, Tesco Bank etc.)

    object (Thumbnail Urls)
    {
      "merchantName": "TESCO PETROL",
      "parentGroup": "TESCO",
      "categories": {
      }
    }

    The Enrichment Model

    hash
    required
    string (Transaction Hash)
    merchant
    object (Merchant)
    location
    string (Location)

    The location, if one can be extracted from the transaction information.

    paymentProcessor
    string (Payment processor)

    The payment processor (such as PayPal or Square), if one can be extracted from the transaction information.

    correctedDate
    string (Corrected date)

    The date contained in the transaction information, if there is one. This date is generally pre-clearing, so it is likely to be a more accurate record than bookingDateTime of when the transaction actually occurred.

    categories
    object (categories)
    {
      "hash": "f409f4c937d84cf0311561404f026d54",
      "merchant": {
      },
      "location": "Basingstoke",
      "correctedDate": "2020-07-30"
    }
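Enrichments share the transaction's hash, so a client can join them back onto the original transactions. The sketch below does that and prefers correctedDate over bookingDateTime when both exist, following the note above that correctedDate is usually closer to when the transaction actually occurred. The helpers and the added keys `enrichment` and `effectiveDate` are hypothetical. The date formats are taken from the Transaction Model and Enrichment Model samples.

```python
from datetime import date, datetime


def effective_date(transaction, enrichment):
    """Prefer the enrichment's correctedDate (YYYY-MM-DD) when present;
    otherwise fall back to the transaction's bookingDateTime."""
    corrected = enrichment.get("correctedDate")
    if corrected:
        return date.fromisoformat(corrected)
    return datetime.strptime(transaction["bookingDateTime"], "%Y-%m-%d %H:%M:%S").date()


def join_enrichments(transactions, enrichments):
    """Attach each enrichment to the transaction with the same hash."""
    by_hash = {e["hash"]: e for e in enrichments}
    joined = []
    for t in transactions:
        e = by_hash.get(t["hash"], {})  # empty dict when no enrichment was returned
        joined.append({**t, "enrichment": e, "effectiveDate": effective_date(t, e)})
    return joined
```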