Vertex AI Matching Engine

Summary - The good, the bad and the ugly

Good

  • Autoscaling (min/max nodes) is a nice feature (assuming it works)

  • Has the ability to restrict matches based on user-defined labels on the vectors

  • Potentially supports incremental updates

Bad

  • Index creation took about 30 minutes, even for only 500 cells

Ugly

  • UI is missing a lot of features

  • Network configuration is needlessly complicated

  • The API is inconsistent: some calls use project_id and others project_number; some take fully qualified paths and others short names

Configuration, Creating and Deploying an Index

Using Vertex AI Matching Engine requires several steps before matching:

  1. Network Configuration (one time)

  2. Create Endpoint (one time)

  3. Create Index

  4. Deploy Index

Constants

# Google project, project number and region to host Vertex AI
export PROJECT_ID="dsp-cell-annotation-service"
export PROJECT_NUMBER=$(gcloud projects describe "$PROJECT_ID" --format="value(projectNumber)")
export REGION="us-central1"

# Bucket containing CSV/AVRO of vectors to be searched
export BUCKET_URI="gs://dsp-cell-annotation-service/demo_4m_v2/new_embeddings_for_loading/"
export DIMENSIONS=512
export APPROX_NEIGHBORS_COUNT=100

# Constants, not necessary to change
export VPC_NETWORK="ai-matching"
export PEERING_RANGE_NAME="ann-haystack-range"
export INDEX_ENDPOINT_NAME="casp_index_endpoint"
export INDEX_NAME="casp_index_v1"
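
Every command in the rest of this guide expands these variables, and an empty one (for example PROJECT_NUMBER, if the gcloud lookup fails) tends to surface later as a confusing error. A minimal sketch of a guard that could be run first, assuming a POSIX-style shell; require_vars is a hypothetical helper name, not part of gcloud:

```shell
# Hypothetical helper (not part of gcloud): fail early if any of the named
# variables is unset or empty.
require_vars() {
  local name value missing=0
  for name in "$@"; do
    # Indirect lookup: read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing required variable: $name" >&2
      missing=1
    fi
  done
  # 0 if everything is set, 1 if at least one variable is missing.
  return "$missing"
}
```

It could be called as: require_vars PROJECT_ID PROJECT_NUMBER REGION BUCKET_URI DIMENSIONS || exit 1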

Network Configuration

Note

You can skip this if network configuration has already been done. It should be done once per project.

Create the VPC Network

gcloud compute networks create ${VPC_NETWORK} --bgp-routing-mode=regional --subnet-mode=auto --project=${PROJECT_ID}

Add necessary firewall rules

gcloud compute firewall-rules create ${VPC_NETWORK}-allow-icmp --network ${VPC_NETWORK} --priority 65534 --project ${PROJECT_ID} --allow icmp

gcloud compute firewall-rules create ${VPC_NETWORK}-allow-internal --network ${VPC_NETWORK} --priority 65534 --project ${PROJECT_ID} --allow all --source-ranges 10.128.0.0/9

gcloud compute firewall-rules create ${VPC_NETWORK}-allow-rdp --network ${VPC_NETWORK} --priority 65534 --project ${PROJECT_ID} --allow tcp:3389

gcloud compute firewall-rules create ${VPC_NETWORK}-allow-ssh --network ${VPC_NETWORK} --priority 65534 --project ${PROJECT_ID} --allow tcp:22

Reserve IP range

gcloud compute addresses create ${PEERING_RANGE_NAME} --global --prefix-length=16 --network=${VPC_NETWORK} --purpose=VPC_PEERING --project=${PROJECT_ID}

Set up peering with service networking

Note

Your account must have the “Compute Network Admin” role to run the following.

gcloud services vpc-peerings connect --service=servicenetworking.googleapis.com --network=${VPC_NETWORK} --ranges=${PEERING_RANGE_NAME} --project=${PROJECT_ID}
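
Because the network setup is meant to be done once per project, it can help to check whether the peering already exists before repeating it. A rough sketch, assuming the reserved range name appears somewhere in the output of the peerings list command; peering_range_connected is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if the reserved range name shows up in the
# service-networking peerings listed for the VPC network.
# Assumption: the range name appears verbatim in the list output.
peering_range_connected() {
  gcloud services vpc-peerings list \
    --network="$VPC_NETWORK" \
    --project="$PROJECT_ID" \
    --format=json | grep -q -F -- "$PEERING_RANGE_NAME"
}
```

For example: peering_range_connected && echo "peering already configured"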

Managing Indexes

Create Index Endpoint (to serve the index)

This step takes several minutes to complete.

Note

You can skip this if the endpoint has already been created. New indexes can be deployed to existing endpoints.

gcloud ai index-endpoints create --display-name ${INDEX_ENDPOINT_NAME} --network projects/${PROJECT_NUMBER}/global/networks/${VPC_NETWORK} --region ${REGION} --project $PROJECT_ID

Create Index

Creating the index takes a long time (about 30 minutes, even for a small dataset).

# save configuration to a local file
export LOCAL_PATH_TO_METADATA_FILE=/tmp/metadata.json
cat << EOF > ${LOCAL_PATH_TO_METADATA_FILE}
{
  "contentsDeltaUri": "${BUCKET_URI}",
  "config": {
    "dimensions": ${DIMENSIONS},
    "approximateNeighborsCount": ${APPROX_NEIGHBORS_COUNT},
    "distanceMeasureType": "DOT_PRODUCT_DISTANCE",
    "algorithm_config": {
      "treeAhConfig": {
      }
    }
  }
}
EOF

gcloud ai indexes create \
  --metadata-file=${LOCAL_PATH_TO_METADATA_FILE} \
  --display-name=${INDEX_NAME} \
  --project=${PROJECT_ID} \
  --region=${REGION}

This is an async operation; you will have to poll for success (the create command above prints the exact command to run).

For example:

gcloud ai operations describe 2843220864793575424 --index=7139735929568100352 --region us-central1 --project=dsp-cell-annotation-service
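
Rather than re-running the describe command by hand, the polling can be wrapped in a loop. This is a sketch under stated assumptions: wait_for_operation is a hypothetical helper, and it assumes the operation's done field is rendered as "True" by --format="value(done)" once the operation has finished and is empty before that. A finished operation is not necessarily a successful one, so the full describe output is still worth checking for an error.

```shell
# Hypothetical helper: poll a long-running operation until it reports done.
# Usage: wait_for_operation OPERATION_ID RESOURCE_FLAG [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# RESOURCE_FLAG is e.g. --index=INDEX_ID or --index-endpoint=ENDPOINT_ID.
wait_for_operation() {
  local operation_id="$1" resource_flag="$2"
  local interval="${3:-60}" max_attempts="${4:-60}"
  local attempt=1 done_flag
  while [ "$attempt" -le "$max_attempts" ]; do
    # Assumption: "done" prints as "True" when finished, empty while running.
    done_flag=$(gcloud ai operations describe "$operation_id" "$resource_flag" \
      --region "$REGION" --project "$PROJECT_ID" --format="value(done)")
    if [ "$done_flag" = "True" ]; then
      echo "Operation $operation_id finished after $attempt poll(s)"
      return 0
    fi
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  echo "Operation $operation_id still running after $max_attempts polls" >&2
  return 1
}
```

With the IDs from the example above, the call would be: wait_for_operation 2843220864793575424 --index=7139735929568100352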

Deploy Index

Deploy the index to the endpoint so it can be searched. Several non-intuitive IDs are required to run this step.

# This is an identifier and a display name YOU give for this deployed index (can be the same)
export DEPLOYED_INDEX_ID="deployed_4m_${INDEX_NAME}"
export DISPLAY_NAME=$DEPLOYED_INDEX_ID

# Then we need the endpoint id, extracted from the list output with jq
export ENDPOINT_ID=$(gcloud ai index-endpoints list --region $REGION --project $PROJECT_ID --format json | jq -r ".[] | select (.displayName == \"$INDEX_ENDPOINT_NAME\") | .name ")

# and the id of the index to be deployed
export INDEX_ID=$(gcloud ai indexes list --region $REGION --project $PROJECT_ID --format json | jq -r ".[] | select (.displayName == \"$INDEX_NAME\") | .name ")

gcloud ai index-endpoints deploy-index "$ENDPOINT_ID" \
  --deployed-index-id="$DEPLOYED_INDEX_ID" \
  --display-name="$DISPLAY_NAME" \
  --index="$INDEX_ID" \
  --min-replica-count 2 \
  --max-replica-count 2 \
  --project "$PROJECT_ID" \
  --region "$REGION"

This is an async operation; you will have to poll for success (the deploy-index command above prints the exact command to run).

For example:

gcloud ai operations describe 1574402038526115840 --index-endpoint=82032363525111808 --project $PROJECT_ID --region $REGION
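
The ENDPOINT_ID and INDEX_ID values captured from the .name field are fully qualified resource names, whereas the example above passes a bare numeric ID. A small sketch for extracting the trailing ID with shell parameter expansion; short_id is a hypothetical helper, and the project number in the sample resource name is made up for illustration:

```shell
# Hypothetical helper: print the last path segment of a resource name.
short_id() {
  # ${1##*/} removes the longest prefix ending in "/".
  printf '%s\n' "${1##*/}"
}

# Sample value with a made-up project number, for illustration only.
SAMPLE_ENDPOINT="projects/123456789/locations/us-central1/indexEndpoints/82032363525111808"
short_id "$SAMPLE_ENDPOINT"
```

A value that contains no slash is printed unchanged, so the helper is safe to apply to IDs that are already short.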

Evaluating Performance

Aspects to consider:

  1. Throughput (overall matches per second)

  2. Latency (response time per request)

  3. Scalability (with respect to index size)

  4. Accuracy

  5. Cost

TBD

Cleaning Up (excluding the network setup)

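If you want to remove everything, delete the resources in the reverse order of creation. The commands below assume ENDPOINT_ID, INDEX_ID and DEPLOYED_INDEX_ID are still set from the Deploy Index step.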

# Undeploy Index from Endpoint
gcloud ai index-endpoints undeploy-index ${ENDPOINT_ID} --project ${PROJECT_ID} --region ${REGION} --deployed-index-id=${DEPLOYED_INDEX_ID}

# Delete Endpoint
gcloud ai index-endpoints delete ${ENDPOINT_ID} --project ${PROJECT_ID} --region ${REGION}

# Delete Index
gcloud ai indexes delete ${INDEX_ID} --project ${PROJECT_ID} --region ${REGION}
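
The three steps can also be chained so that a later delete is not attempted once an earlier step has failed. This is only a sketch: cleanup_matching_engine is a hypothetical wrapper name, and the global --quiet flag is added on the assumption that confirmation prompts should be skipped.

```shell
# Hypothetical wrapper around the three cleanup commands above. Each step
# runs only if the previous one succeeded; --quiet skips confirmation prompts.
cleanup_matching_engine() {
  gcloud ai index-endpoints undeploy-index "$ENDPOINT_ID" \
      --deployed-index-id="$DEPLOYED_INDEX_ID" \
      --project "$PROJECT_ID" --region "$REGION" --quiet &&
    gcloud ai index-endpoints delete "$ENDPOINT_ID" \
      --project "$PROJECT_ID" --region "$REGION" --quiet &&
    gcloud ai indexes delete "$INDEX_ID" \
      --project "$PROJECT_ID" --region "$REGION" --quiet
}
```

The function returns the exit status of the first command that fails, or zero if all three succeed.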