
Step 4: Use Your Own Data

Source: https://blogs.sap.com/2022/06/25/step-4-use-your-own-data/
June 25, 2022 | 6 minute read


In this part of our guide, we’ll repeat all the Personalized Recommendation processes from the preceding steps – but this time using your own dataset.

Prepare the Data

The training endpoint can only receive an input in the form of a zipped file. The zipped file contains two files: the Clickstream file and the Item Catalogue file. Here’s an overview of these files.

  1. Clickstream file

    Overview

    Clickstream data is essential in understanding customer behavior based on the sequences of historical item interactions for each user. This data enables the model to do the following:
    • Learn relations between items that appear in the same context – for example, user session
    • Identify browsing patterns for each user, providing the basis for personalized recommendations – for example, for users with different histories interacting with the same item
    • (If the item catalogue is provided) learn relations between items and their attributes – for example, from consecutive items in the same session belonging to the same category or having a similar name
    • (If the user metadata is provided) learn relations between users’ attributes and item interaction sequences – for example, users with the same attribute values have different histories of item interactions

To cover a wide range of recommendation domains or business scenarios, the clickstream schema (presented in the following section) is domain agnostic. Data size limits and content recommendations are provided in the subsequent sections.


Data Schema

The clickstream is a CSV file with the following columns:

    • userId – raw user ID (string)
    • itemId – raw item ID (string)
    • timestamp – timestamp (float)

Each row represents an event defined by the tuple (user, item, time). Note the agnostic nature of the event definition: it may refer to any type of user-item interaction – for example, view, add to cart, subscribe, rate, and so on. When preparing the clickstream data to fit this format, it’s up to you to determine which types of events to export. (All events are treated as one type by the model.)

Click this link to download a sample clickstream file for your reference.
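
To make the format concrete, here is a minimal sketch that writes a valid clickstream file in Python; the user IDs, item IDs, and timestamps are invented for illustration:

import csv

# Each row is one (user, item, time) event matching the schema above.
# IDs and timestamps are illustrative; all event types are treated alike.
events = [
    ("user_1", "2858", 1656115200.0),
    ("user_1", "1196", 1656115260.0),
    ("user_2", "2858", 1656118800.0),
]

with open("clickstream.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["userId", "itemId", "timestamp"])
    writer.writerows(events)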

  2. Item catalogue

    Overview

    The item catalogue provides additional information on the items in the form of categorical, text, and numerical features. This information can be harnessed by the model in the following ways:
    • To improve the quality of the predictions: More item information results in more patterns in the clickstream data being identified and forecasted.
    • To support item cold start: Item representations (for example, embeddings) can be approximated from attribute representations.
    • To give flexibility during inference: Item attributes may be customized (modified attribute values and/or weights), influencing the recommendations.
    • To enable smart search: As the model learns to represent words and item attributes, it can take them as inputs to recommend relevant items.

To avoid ambiguity in interpreting the attributes and to standardize the catalogue format, the data schema is fixed, as described in the next section. Additional information on size limits and the recommended content of the catalogue is provided in the subsequent sections.

Therefore, while not mandatory, we highly recommend having the item catalogue accompany the clickstream data to unlock the full potential of the recommendation engine.


Data Schema

The item catalogue is a JSON file with the following structure:

{
    raw item ID (string): {
        'categoricalFeatures': {
            categorical attribute (string): {
                'values': attribute values (list of strings),
                'weights' (optional): attribute weights (list of float)
            },
            ...
        },
        'textFeatures': {
            text attribute (string): attribute value (string),
            ...
        },

        'numericalFeatures': {
            numerical attribute (string): attribute value (float),
            ...
        },
        'unavailable' (optional): availability flag (bool)
    }
}

item_catalogue schema in JSON format
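
As a minimal sketch of one catalogue entry that follows this schema (the attribute names and values are invented for illustration; weights and unavailable are optional, as noted above):

import json

# One illustrative entry, keyed by the raw item ID.
catalogue = {
    "2858": {
        "categoricalFeatures": {
            "genre": {
                "values": ["comedy", "drama"],
                "weights": [0.7, 0.3],  # optional; omit for equal weighting
            }
        },
        "textFeatures": {
            "title": "Example Item Title",
        },
        "numericalFeatures": {
            "releaseYear": 1999.0,
        },
        "unavailable": False,  # optional availability flag
    }
}

with open("item_catalogue.json", "w") as f:
    json.dump(catalogue, f, indent=4)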

Training the Model

To train the model, navigate to the Training section, click the file-upload dropdown, and then click the Try it out button.


Training Trigger API in the Swagger API page

Some actions are required for this:

  1. Enter a tenant name.
    You must enter a name for the tenant, so input a name of your choice; you’ll reuse it in later calls.
  2. Upload the data.
    Upload the newly created zipped dataset.
  3. Enter a site name.
    A site name is also required, but if you leave this field blank, it’s automatically filled with “default”.

    Tenant, training data and site fields in the training trigger API request

  4. Set serve_model=True.
    Set this flag to automatically deploy the real-time model-serving instance.

    serve_model field in the training trigger API request

  5. Click the Execute button.
    Bear in mind that users on the SAP BTP free tier are limited to only two training endpoint executions. When you call the training API a third time, you get a 403 response stating that you have exceeded your quota for the month.
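
If you prefer to script this step rather than use the Swagger UI, here is a minimal sketch in Python. The base URL, path, and field names below are placeholders inferred from the form fields described above (tenant, data, site, serve_model); copy the real values from your service key and the Swagger page.

import zipfile
import requests

# Placeholders: take the real URL and token from your service key.
BASE_URL = "https://<your-service-url>"
HEADERS = {"Authorization": "Bearer <your-access-token>"}

# The training endpoint expects a single zipped file containing both
# the clickstream file and the item catalogue file.
with zipfile.ZipFile("training_data.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("clickstream.csv")
    zf.write("item_catalogue.json")

# Illustrative path; the field names mirror the Swagger form above.
with open("training_data.zip", "rb") as data:
    resp = requests.post(
        f"{BASE_URL}/tenants/my-tenant/jobs/file-upload",
        headers=HEADERS,
        params={"site": "default", "serve_model": "true"},
        files={"data": data},
    )
print(resp.status_code, resp.json())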

After you trigger the training, there are two possible outcomes.

  1. Still ongoing: the status is “PENDING” and the code is 200.

    Status PENDING in the API response

  2. Conflict: the trigger clashed with ongoing training from a previous trigger. The status is “Previously submitted job still in progress” and the code is 409.

    Error Conflict in the API response

To check the status of the ongoing job, navigate to the “latest” path, click “Try it out”, and enter the tenant name and site name (these must be exactly the same as in the previous step).


GET training job status API request

After you click Execute, there are three possible outcomes:

  1. The job is still ongoing, with the status “SUBMITTED” and code 200. Please wait around 5–10 minutes before rechecking the progress.

    Status SUBMITTED in the API response

  2. The job is completed, with the status “SUCCEEDED” and code 200. You can now proceed to the next step.

    Status SUCCEEDED in the API response

  3. The job failed, with the status “FAILED” and code 200. Retrigger the training and make sure that you entered all information correctly.

    Status FAILED in the API response
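
The status check can also be scripted; here is a polling sketch, assuming the same placeholder URL and token as before and that the response carries a status field as shown in the screenshots:

import time
import requests

BASE_URL = "https://<your-service-url>"  # placeholder, as before
HEADERS = {"Authorization": "Bearer <your-access-token>"}

# Tenant and site must match the values used to trigger the training.
while True:
    resp = requests.get(
        f"{BASE_URL}/tenants/my-tenant/jobs/latest",  # illustrative path
        headers=HEADERS,
        params={"site": "default"},
    )
    status = resp.json().get("status")
    if status not in ("PENDING", "SUBMITTED"):
        break
    time.sleep(300)  # recheck every 5 minutes, per the guidance above

print("Final status:", status)  # expect "SUCCEEDED" or "FAILED"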

Calling an Inference

For the real-time inference call, there are three available inference endpoints, each with a different use case. For our example, however, we’ll use next-item: navigate to the next-items endpoint, click it, and then click the Try it out button.

next-item recommendation API in the Swagger API page

After you open the next-items dropdown, you must complete some actions similar to the ones you completed in the training steps:

  1. Enter tenant name.
    You must use the same tenant name that you entered during the training process.
  2. Insert Payload.
    Here, you provide all the relevant inference input data in the payload. Each inference endpoint has different requirements: for “next_items”, the items_ls parameter is required, while the other parameters are optional. The items_ls parameter is a list of item IDs representing the user’s past item interactions (clickstream), from which the recommendations are generated.

For this parameter to be valid, the input must meet at least one of the following requirements:

  • Correspond to an object entry in the item_catalogue training data used to train the model, or
  • Be provided as an entry in the metadata parameter as a cold start item, or
  • Be provided as a cold start item via the ‘metadata update’ feature

Taking an example from our sample dataset, insert a payload with the following content:

{ "items_ls": ["2858"] }

For more details regarding the payload input, please refer to this documentation.
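
Scripted, the same call might look like this sketch (placeholder URL and path again; items_ls is the only required field for next_items):

import requests

BASE_URL = "https://<your-service-url>"  # placeholder, as before
HEADERS = {"Authorization": "Bearer <your-access-token>"}

# The user's past item interactions, used to generate recommendations.
payload = {"items_ls": ["2858"]}

resp = requests.post(
    f"{BASE_URL}/tenants/my-tenant/recommendations/next-items",  # illustrative path
    headers=HEADERS,
    params={"site": "default"},
    json=payload,
)
print(resp.status_code, resp.json())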

After clicking Execute, you can expect the following responses:

  1. The training process has not finished yet. This returns a 404 code, stating that no model instances were found.

    Error Not Found in the API response

  2. The user entered an incorrect payload. This returns a 400 code, stating that the model doesn’t understand the payload request.

    Error Bad Request in the API response

  3. The model understands the request and successfully returns a set of recommendations. This returns a 200 code with the recommended items and their respective confidence scores.

    Recommendation results in the API response

  4. Forbidden. The user has exceeded their inference quota for the month. A short message is displayed with code 403.

    Error Forbidden in the API response
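
Putting these four outcomes together, here is a sketch of handling the response from the call above (the recommendations field name is an assumption; check your actual response body):

# resp is the response object returned by the next-items call above.
if resp.status_code == 200:
    # Recommended items with their confidence scores (field name assumed).
    for item in resp.json().get("recommendations", []):
        print(item)
elif resp.status_code == 400:
    print("Bad request: check the payload format.")
elif resp.status_code == 403:
    print("Forbidden: monthly inference quota exceeded.")
elif resp.status_code == 404:
    print("No model instance found yet: wait for training to finish.")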

Cheers

At this point, we have successfully completed all steps. If you have encountered any issues, feel free to leave a comment below. My team will definitely help you out. Alternatively, check out the Q&A area in the community or visit our community page to browse our use cases and learning materials.

Feel free to follow my profile and stay tuned for the next blog post. See you in the next blog!

