Vertex AI Matching Engine
=========================

Vertex AI is a Google Cloud service that provides a variety of machine learning tools. One of these tools is the Matching Engine,
a vector similarity search service. CAS uses the Matching Engine to find cells that are close, in a vector space, to the user's submitted cells.
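
To make "close in a vector space" concrete, here is a minimal brute-force sketch of what a similarity search computes, assuming dot-product similarity (the measure configured for the index later in this document). The function names and toy vectors are illustrative only; the Matching Engine itself uses an approximate algorithm rather than this exhaustive scan.

.. code-block:: python

    def dot(a, b):
        """Dot product of two equal-length vectors."""
        return sum(x * y for x, y in zip(a, b))

    def top_k_dot_product(index_vectors, query, k):
        """Return (row_index, score) pairs for the k highest-scoring vectors."""
        scored = [(i, dot(vec, query)) for i, vec in enumerate(index_vectors)]
        # Larger dot product means more similar, so sort best-first
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]

    # Toy "index" of three 2-dimensional vectors and one query vector
    index_vectors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    query = [1.0, 0.2]
    print(top_k_dot_product(index_vectors, query, k=2))

An exhaustive scan like this costs one dot product per indexed vector per query, which is why a dedicated approximate search service becomes attractive at millions of cells.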

Summary - The good, the bad and the ugly
----------------------------------------

Good
^^^^
- Autoscaling (min/max nodes) is a nice feature (when it works)
- Has the ability to restrict matches based on user-defined labels on the vectors
- Potentially supports incremental updates

Bad
^^^
- Index creation took a long time (about 30 minutes), even for only 500 cells
- Performance and reliability with just one node are inconsistent

Ugly
^^^^
- UI is missing a lot of features
- Network configuration is needlessly complicated
- Some calls use project_id and others use project_number; some take fully-qualified resource paths and others take short names

Configuration, Creating and Deploying an Index
----------------------------------------------

Using Vertex AI Matching Engine requires several steps before matching:

#. Network Configuration (one time)
#. Create Endpoint (one time)
#. Create Index
#. Deploy Index

Constants
^^^^^^^^^

.. code-block:: bash

    # Google project, project number and region to host Vertex AI
    export PROJECT_ID="dsp-cell-annotation-service"
    export PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")
    export REGION="us-central1"

    # Bucket containing CSV/AVRO of vectors to be searched
    export BUCKET_URI="gs://dsp-cell-annotation-service/demo_4m_v2/new_embeddings_for_loading/"
    export DIMENSIONS=512
    export APPROX_NEIGHBORS_COUNT=100

    # Constants, not necessary to change
    export VPC_NETWORK="ai-matching"
    export PEERING_RANGE_NAME="ann-haystack-range"
    export INDEX_ENDPOINT_NAME="casp_index_endpoint"
    export INDEX_NAME="casp_index_v1"

Network Configuration
^^^^^^^^^^^^^^^^^^^^^

.. note::

    You can skip this if network configuration has already been done. It should be done once per project.

Create the VPC Network
~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

    gcloud compute networks create ${VPC_NETWORK} --bgp-routing-mode=regional --subnet-mode=auto --project=${PROJECT_ID}

Add necessary firewall rules
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

    gcloud compute firewall-rules create ${VPC_NETWORK}-allow-icmp --network ${VPC_NETWORK} --priority 65534 --project ${PROJECT_ID} --allow icmp

    gcloud compute firewall-rules create ${VPC_NETWORK}-allow-internal --network ${VPC_NETWORK} --priority 65534 --project ${PROJECT_ID} --allow all --source-ranges 10.128.0.0/9

    gcloud compute firewall-rules create ${VPC_NETWORK}-allow-rdp --network ${VPC_NETWORK} --priority 65534 --project ${PROJECT_ID} --allow tcp:3389

    gcloud compute firewall-rules create ${VPC_NETWORK}-allow-ssh --network ${VPC_NETWORK} --priority 65534 --project ${PROJECT_ID} --allow tcp:22
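
The ``--source-ranges 10.128.0.0/9`` value in the internal rule corresponds to the block from which auto-mode VPC networks allocate their regional subnets. As a quick sanity check, the following sketch uses Python's standard ``ipaddress`` module to test whether an address falls inside that block. The helper name and the sample addresses are made up for illustration.

.. code-block:: python

    import ipaddress

    # Block covered by the allow-internal firewall rule above
    INTERNAL_RANGE = ipaddress.ip_network("10.128.0.0/9")

    def is_internal(address):
        """Return True if the address is inside the allow-internal source range."""
        return ipaddress.ip_address(address) in INTERNAL_RANGE

    print(is_internal("10.128.0.5"))    # an address inside the block
    print(is_internal("192.168.1.10"))  # an address outside the block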

Reserve IP range
~~~~~~~~~~~~~~~~

.. code-block:: bash

    gcloud compute addresses create ${PEERING_RANGE_NAME} --global --prefix-length=16 --network=${VPC_NETWORK} --purpose=VPC_PEERING --project=${PROJECT_ID}

Set up peering with service networking
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. note::

    Your account must have the "Compute Network Admin" role to run the following.

.. code-block:: bash

    gcloud services vpc-peerings connect --service=servicenetworking.googleapis.com --network=${VPC_NETWORK} --ranges=${PEERING_RANGE_NAME} --project=${PROJECT_ID}

Managing Indexes
^^^^^^^^^^^^^^^^

Create Index Endpoint (to serve the index)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This step takes several minutes to complete.

.. note::

    You can skip this if the endpoint has already been created. New indexes can be deployed to existing endpoints.

.. code-block:: bash

    gcloud ai index-endpoints create --display-name ${INDEX_ENDPOINT_NAME} --network projects/${PROJECT_NUMBER}/global/networks/${VPC_NETWORK} --region ${REGION} --project $PROJECT_ID

Create Index
~~~~~~~~~~~~

Creating the index takes a long time (~30 minutes, even for a small dataset).

.. code-block:: bash

    # save configuration to a local file
    export LOCAL_PATH_TO_METADATA_FILE=/tmp/metadata.json
    cat << EOF > ${LOCAL_PATH_TO_METADATA_FILE}
    {
      "contentsDeltaUri": "${BUCKET_URI}",
      "config": {
        "dimensions": ${DIMENSIONS},
        "approximateNeighborsCount": ${APPROX_NEIGHBORS_COUNT},
        "distanceMeasureType": "DOT_PRODUCT_DISTANCE",
        "algorithm_config": {
          "treeAhConfig": {
          }
        }
      }
    }
    EOF

    gcloud ai indexes create \
      --metadata-file=${LOCAL_PATH_TO_METADATA_FILE} \
      --display-name=${INDEX_NAME} \
      --project=${PROJECT_ID} \
      --region=${REGION}
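
If the heredoc is awkward to maintain, the same metadata document can be generated from Python. This sketch mirrors the JSON above field for field; ``build_index_metadata`` is a hypothetical helper name, and the positive-value check is an added assumption rather than a documented requirement.

.. code-block:: python

    import json

    def build_index_metadata(bucket_uri, dimensions, approx_neighbors_count):
        """Build the metadata document passed to `gcloud ai indexes create`."""
        if dimensions <= 0 or approx_neighbors_count <= 0:
            raise ValueError("dimensions and approx_neighbors_count must be positive")
        return {
            "contentsDeltaUri": bucket_uri,
            "config": {
                "dimensions": dimensions,
                "approximateNeighborsCount": approx_neighbors_count,
                "distanceMeasureType": "DOT_PRODUCT_DISTANCE",
                # Empty treeAhConfig, as in the heredoc above
                "algorithm_config": {"treeAhConfig": {}},
            },
        }

    metadata = build_index_metadata(
        "gs://dsp-cell-annotation-service/demo_4m_v2/new_embeddings_for_loading/",
        dimensions=512,
        approx_neighbors_count=100,
    )
    print(json.dumps(metadata, indent=2))

Redirect the printed JSON to the file referenced by ``LOCAL_PATH_TO_METADATA_FILE`` to use it with the ``gcloud`` command.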

Index creation is an async operation; poll for completion using the command that the create command above prints.

For example:

.. code-block:: bash

    gcloud ai operations describe 2843220864793575424 --index=7139735929568100352 --region us-central1 --project=dsp-cell-annotation-service
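
Rather than re-running the describe command by hand, the polling can be scripted. The sketch below is one possible approach, not an official client: it assumes ``gcloud`` is on the ``PATH`` and that the JSON output of ``gcloud ai operations describe`` carries a ``done`` field once the long-running operation has finished. The helper names ``operation_done`` and ``wait_until`` are hypothetical.

.. code-block:: python

    import json
    import subprocess
    import time

    def operation_done(describe_args):
        """Run `gcloud ai operations describe ... --format=json` and read the `done` flag.

        Assumption: the field is absent or false while the operation is running.
        """
        result = subprocess.run(
            ["gcloud", "ai", "operations", "describe", *describe_args, "--format=json"],
            check=True, capture_output=True, text=True,
        )
        return bool(json.loads(result.stdout).get("done", False))

    def wait_until(check, interval_s=60.0, timeout_s=3600.0, sleep=time.sleep):
        """Call check() until it returns True or timeout_s elapses; return success."""
        deadline = time.monotonic() + timeout_s
        while True:
            if check():
                return True
            if time.monotonic() >= deadline:
                return False
            sleep(interval_s)

    # Example usage (needs gcloud credentials and a real operation ID):
    # wait_until(lambda: operation_done([
    #     "2843220864793575424", "--index=7139735929568100352",
    #     "--region=us-central1", "--project=dsp-cell-annotation-service",
    # ]))

A finished operation can still have failed, so inspect the full describe output once ``wait_until`` returns ``True``.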

Deploy Index
~~~~~~~~~~~~

Deploy the index to the endpoint so it can be searched. Several non-intuitive IDs are required to run this step.

.. code-block:: bash

    # This is an identifier and a display name YOU give for this deployed index (can be the same)
    export DEPLOYED_INDEX_ID="deployed_4m_${INDEX_NAME}"
    export DISPLAY_NAME=$DEPLOYED_INDEX_ID

    # Then we need the endpoint id with a little JQ magic
    export ENDPOINT_ID=$(gcloud ai index-endpoints list --region $REGION --project $PROJECT_ID --format json | jq -r ".[] | select (.displayName == \"$INDEX_ENDPOINT_NAME\") | .name ")

    # and the id of the index to be deployed
    export INDEX_ID=$(gcloud ai indexes list --region $REGION --project $PROJECT_ID --format json | jq -r ".[] | select (.displayName == \"$INDEX_NAME\") | .name ")

    gcloud ai index-endpoints deploy-index $ENDPOINT_ID \
      --deployed-index-id=$DEPLOYED_INDEX_ID \
      --display-name=$DISPLAY_NAME \
      --index=$INDEX_ID \
      --min-replica-count 2 \
      --max-replica-count 2

Deployment is an async operation; poll for completion using the command that the deploy command above prints.

For example:

.. code-block:: bash

    gcloud ai operations describe 1574402038526115840 --index-endpoint=82032363525111808 --project $PROJECT_ID --region $REGION
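
The ``jq`` filters used in the deploy step select a resource by display name. If more than one resource shares that display name, the filter prints several lines and the exported variable ends up holding all of them. The following sketch performs the same lookup in Python and fails loudly in that case; ``resource_name_by_display_name`` is a hypothetical helper, and the sample listing only reuses the example endpoint name from this document.

.. code-block:: python

    import json

    def resource_name_by_display_name(listing_json, display_name):
        """Return the `name` of the single resource whose displayName matches."""
        matches = [
            resource["name"]
            for resource in json.loads(listing_json)
            if resource.get("displayName") == display_name
        ]
        if len(matches) != 1:
            raise LookupError(
                f"expected exactly one resource named {display_name!r}, found {len(matches)}"
            )
        return matches[0]

    # Stand-in for the output of `gcloud ai index-endpoints list --format json`
    sample_listing = json.dumps([
        {
            "name": "projects/350868384795/locations/us-central1/indexEndpoints/82032363525111808",
            "displayName": "casp_index_endpoint",
        },
    ])
    print(resource_name_by_display_name(sample_listing, "casp_index_endpoint"))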

Search!
-------

Searching can only be performed from compute on the same network that was configured above with the proper peering settings. The easiest way to do this is to create a Notebook instance and under the Networking configuration choose the VPC network created in the above steps (i.e., ``ai-matching`` in this example).

The ``DIMENSIONS``, ``ENDPOINT_ID``, and ``DEPLOYED_INDEX_ID`` variables should have the same values as in the steps above.

Then from that notebook VM:

.. code-block:: python

    from google.cloud import aiplatform
    import numpy as np

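    # these must match the values used when creating and deploying the index
    DIMENSIONS = 512
    ENDPOINT_ID = "projects/350868384795/locations/us-central1/indexEndpoints/82032363525111808"
    DEPLOYED_INDEX_ID = "deployed_4m_casp_index_v1"

    # locate the endpoint
    index_endpoint = aiplatform.MatchingEngineIndexEndpoint(index_endpoint_name=ENDPOINT_ID)

    # generate a random vector to search with (its length must match the index dimensions)
    emb1 = np.random.randn(DIMENSIONS)

    # perform the query; queries is a list of float lists, so convert the numpy array
    response = index_endpoint.match(deployed_index_id=DEPLOYED_INDEX_ID, queries=[emb1.tolist()], num_neighbors=25)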

    # response is an array of results where each result is an array of MatchNeighbor objects
    for result in response:
        for match in result:
            print(f"ID:{match.id} DISTANCE:{match.distance}")
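
``match`` takes its queries as a list of float lists, and a vector whose length differs from the index dimensions cannot be matched, so it can help to validate queries before sending them. The helper below is a hypothetical convenience (``to_queries`` is not part of the SDK) that accepts any iterable of vectors, including the rows of a 2-D numpy array.

.. code-block:: python

    def to_queries(vectors, dimensions):
        """Convert an iterable of vectors into a list of float lists of fixed length."""
        queries = []
        for i, vec in enumerate(vectors):
            row = [float(x) for x in vec]
            if len(row) != dimensions:
                raise ValueError(
                    f"query {i} has {len(row)} dimensions, expected {dimensions}"
                )
            queries.append(row)
        return queries

    # Two toy 3-dimensional queries; real queries would use the index dimensions
    queries = to_queries([[0.1, 0.2, 0.3], [1, 2, 3]], dimensions=3)
    print(queries)

The result can be passed directly as the ``queries`` argument, which also lets several cells be searched in a single call.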

Evaluating Performance
----------------------

Aspects to consider:

#. Throughput (overall matches per second)
#. Latency (response time per request)
#. Scalability (with respect to index size)
#. Accuracy
#. Cost

TBD
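
As a starting point for the throughput, latency and accuracy items, here is a minimal measurement sketch. It is not tied to the Matching Engine: ``query_fn`` is any zero-argument callable (for example a lambda wrapping a call to ``match``), and the exact neighbor IDs used for recall are assumed to come from a brute-force search you run yourself. All function names are illustrative, and requests are issued sequentially, so the throughput figure is a single-client lower bound.

.. code-block:: python

    import statistics
    import time

    def measure_latency(query_fn, n_requests):
        """Call query_fn() n_requests times in sequence; report latency and throughput."""
        latencies = []
        start = time.perf_counter()
        for _ in range(n_requests):
            t0 = time.perf_counter()
            query_fn()
            latencies.append(time.perf_counter() - t0)
        elapsed = time.perf_counter() - start
        latencies.sort()
        # Nearest-rank style p95, clamped to the last element for small samples
        p95_index = min(len(latencies) - 1, int(0.95 * len(latencies)))
        return {
            "p50_s": statistics.median(latencies),
            "p95_s": latencies[p95_index],
            "requests_per_s": n_requests / elapsed if elapsed > 0 else float("inf"),
        }

    def recall_at_k(approx_ids, exact_ids, k):
        """Fraction of the exact top-k neighbors that the approximate search returned."""
        exact = set(exact_ids[:k])
        if not exact:
            return 0.0
        return len(exact & set(approx_ids[:k])) / len(exact)

    # Toy usage with a stand-in query function and hand-written ID lists
    print(measure_latency(lambda: sum(range(1000)), n_requests=20))
    print(recall_at_k(["a", "b", "c"], ["a", "c", "d"], k=3))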

Cleaning Up (excluding the network setup)
-----------------------------------------

To remove everything, undo the steps above, starting with the most recent one.

.. code-block:: bash

    # Undeploy Index from Endpoint
    gcloud ai index-endpoints undeploy-index ${ENDPOINT_ID} --project ${PROJECT_ID} --region ${REGION} --deployed-index-id=${DEPLOYED_INDEX_ID}

    # Delete Endpoint
    gcloud ai index-endpoints delete ${ENDPOINT_ID} --project ${PROJECT_ID} --region ${REGION}

    # Delete Index
    gcloud ai indexes delete ${INDEX_ID} --project ${PROJECT_ID} --region ${REGION}