Updated Jun-2024 Databricks-Machine-Learning-Professional Exam Practice Test Questions [Q28-Q49]

Databricks Databricks-Machine-Learning-Professional Exam Syllabus Topics:

Topic	Details
Topic 1	Identify less performant data storage as a solution for other use cases. Describe why complex business logic must be handled in streaming deployments.
Topic 2	Identify the requirements for tracking nested runs. Describe an MLflow flavor and the benefits of using MLflow flavors.
Topic 3	Test whether the updated model performs better on the more recent data. Identify when retraining and deploying an updated model is a probable solution to drift.
Topic 4	Identify live serving benefits of querying precomputed batch predictions. Describe Structured Streaming as a common processing tool for ETL pipelines.
Topic 5	Identify JIT feature values as a need for real-time deployment. Describe how to list all webhooks and how to delete a webhook.
Topic 6	Create, overwrite, merge, and read Feature Store tables in machine learning workflows. View Delta table history and load a previous version of a Delta table.
Topic 7	Describe how model serving deploys an endpoint for every stage. Identify scenarios in which feature drift and/or label drift are likely to occur.
Topic 8	Identify that data can arrive out of order with Structured Streaming. Identify how model serving uses one all-purpose cluster for a model deployment.

QUESTION 28
Which of the following is a benefit of logging a model signature with an MLflow model?

The model will have a unique identifier in the MLflow experiment

The schema of input data can be validated when serving models

The model can be deployed using real-time serving tools

The model will be secured by the user that developed it

The schema of input data will be converted to match the signature

QUESTION 29
A data scientist would like to enable MLflow Autologging for all machine learning libraries used in a notebook. They want to ensure that MLflow Autologging is used no matter what version of the Databricks Runtime for Machine Learning is used to run the notebook and no matter what workspace-wide configurations are selected in the Admin Console.
Which of the following lines of code can they use to accomplish this task?

mlflow.sklearn.autolog()

mlflow.spark.autolog()

spark.conf.set("autologging", True)

It is not possible to automatically log MLflow runs.

mlflow.autolog()

QUESTION 30
Which of the following operations of the Feature Store Client fs can be used to return a Spark DataFrame containing the data set associated with a Feature Store table?

fs.create_table

fs.write_table

fs.get_table

There is no way to accomplish this task with fs

fs.read_table

QUESTION 31
A machine learning engineer has registered an sklearn model in the MLflow Model Registry using the sklearn model flavor with URI model_uri.
Which of the following operations can be used to load the model as an sklearn object for batch deployment?

mlflow.spark.load_model(model_uri)

mlflow.pyfunc.read_model(model_uri)

mlflow.sklearn.read_model(model_uri)

mlflow.pyfunc.load_model(model_uri)

mlflow.sklearn.load_model(model_uri)

QUESTION 32
A machine learning engineer is migrating a machine learning pipeline to use Databricks Machine Learning. They have programmatically identified the best run from an MLflow Experiment and stored its URI in the model_uri variable and its Run ID in the run_id variable. They have also determined that the model was logged with the name “model”. Now, the machine learning engineer wants to register that model in the MLflow Model Registry with the name “best_model”.
Which of the following lines of code can they use to register the model to the MLflow Model Registry?

mlflow.register_model(model_uri, "best_model")

mlflow.register_model(run_id, "best_model")

mlflow.register_model(f"runs:/{run_id}/best_model", "model")

mlflow.register_model(model_uri, "model")

mlflow.register_model(f"runs:/{run_id}/model")

QUESTION 33
A machine learning engineer is in the process of implementing a concept drift monitoring solution. They are planning to use the following steps:
1. Deploy a model to production and compute predicted values
2. Obtain the observed (actual) label values
3. _____
4. Run a statistical test to determine if there are changes over time
Which of the following should be completed as Step #3?

Obtain the observed (actual) feature values

Measure the latency of the prediction time

Retrain the model

None of these should be completed as Step #3

Compute the evaluation metric using the observed and predicted values
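A plain-Python sketch of the step between collecting labels and running a statistical test: an evaluation metric (RMSE here, as one possible choice) computed from observed and predicted values for each time window. The window names and values are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between observed labels and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

# Hypothetical observed labels and logged predictions, grouped by time window
windows = {
    "week_1": {"observed": [3.0, 5.0, 2.0, 4.0], "predicted": [2.5, 5.0, 2.0, 4.5]},
    "week_2": {"observed": [3.0, 5.0, 2.0, 4.0], "predicted": [1.0, 7.0, 4.0, 2.0]},
}

# One metric value per window, which a statistical test can then compare over time
rmse_by_window = {name: rmse(w["observed"], w["predicted"]) for name, w in windows.items()}
```

Here the error grows from about 0.35 in week_1 to 2.0 in week_2, the kind of change the statistical test in the next step is meant to flag.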

QUESTION 34
A data scientist has created a Python function compute_features that returns a Spark DataFrame with the following schema:

The resulting DataFrame is assigned to the features_df variable. The data scientist wants to create a Feature Store table using features_df.
Which of the following code blocks can they use to create and populate the Feature Store table using the Feature Store Client fs?

features_df.write.mode("fs").path("new_table")

features_df.write.mode("feature").path("new_table")

QUESTION 35
Which of the following is a simple statistic to monitor for categorical feature drift?

Mode

None of these

Mode, number of unique values, and percentage of missing values

Percentage of missing values

Number of unique values
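A plain-Python sketch of the simple categorical statistics listed above (mode, number of unique values, percentage of missing values), computed for a baseline window and a current window so they can be compared. The category values are hypothetical, and None stands in for a missing value.

```python
from collections import Counter

def categorical_summary(values):
    """Mode, number of distinct non-missing values, and percentage of missing values."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {
        "mode": counts.most_common(1)[0][0] if counts else None,
        "n_unique": len(counts),
        "pct_missing": 100.0 * (len(values) - len(present)) / len(values),
    }

baseline = categorical_summary(["red", "red", "blue", None])
current = categorical_summary(["blue", "blue", "green", "yellow", None, None])
```

All three statistics shift between the windows here (mode red to blue, 2 to 3 categories, 25% to about 33% missing).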

QUESTION 36
A machine learning engineer needs to select a deployment strategy for a new machine learning application. The feature values are not available until the time of delivery, and results are needed exceedingly fast for one record at a time.
Which of the following deployment strategies can be used to meet these requirements?

Edge/on-device

Streaming

None of these strategies will meet the requirements.

Batch

Real-time

QUESTION 37
A machine learning engineer and data scientist are working together to convert a batch deployment to an always-on streaming deployment. The machine learning engineer has expressed that rigorous data tests must be put in place as a part of their conversion to account for potential changes in data formats.
Which of the following describes why these data type tests and checks are particularly important for streaming deployments?

All of these statements

Because the streaming deployment is always on, there is no practitioner to debug poor model performance

None of these statements

Because the streaming deployment is always on, there is a need to confirm that the deployment can autoscale

Because the streaming deployment is always on, all types of data must be handled without producing an error
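A plain-Python sketch of the kind of data check the question describes: malformed records are routed aside rather than raising an error that would stop an always-on pipeline. The field names and expected types are hypothetical; in a Structured Streaming job, similar checks would typically be written as DataFrame filters.

```python
# Hypothetical expected schema for incoming records
EXPECTED_TYPES = {"customer_id": int, "amount": float}

def is_valid(record):
    """True only if record is a dict with exactly the expected fields and types."""
    return (
        isinstance(record, dict)
        and set(record) == set(EXPECTED_TYPES)
        and all(isinstance(record[field], t) for field, t in EXPECTED_TYPES.items())
    )

def split_records(records):
    """Separate valid records from malformed ones instead of raising an error."""
    valid, quarantined = [], []
    for record in records:
        (valid if is_valid(record) else quarantined).append(record)
    return valid, quarantined

batch = [
    {"customer_id": 1, "amount": 9.5},    # well-formed
    {"customer_id": "1", "amount": 9.5},  # wrong type
    {"customer_id": 2},                   # missing field
    None,                                 # null record
]
valid, quarantined = split_records(batch)
```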

QUESTION 38
Which of the following deployment paradigms can centrally compute predictions for a single record with exceedingly fast results?

Streaming

Batch

Edge/on-device

None of these strategies will accomplish the task.

Real-time

QUESTION 39
A machine learning engineer is manually refreshing a model in an existing machine learning pipeline. The pipeline uses the MLflow Model Registry model “project”. The machine learning engineer would like to add a new version of the model to “project”.
Which of the following MLflow operations can the machine learning engineer use to accomplish this task?

mlflow.register_model

MlflowClient.update_registered_model

mlflow.add_model_version

MlflowClient.get_model_version

The machine learning engineer needs to create an entirely new MLflow Model Registry model

QUESTION 40
A machine learning engineer is using the following code block as part of a batch deployment pipeline:

Which of the following changes needs to be made so this code block will work when the inference table is a stream source?

Replace “inference” with the path to the location of the Delta table

Replace schema(schema) with option("maxFilesPerTrigger", 1)

Replace spark.read with spark.readStream

Replace format("delta") with format("stream")

Replace predict with a stream-friendly prediction function

QUESTION 41
Which of the following MLflow Model Registry use cases requires the use of an HTTP Webhook?

Starting a testing job when a new model is registered

Updating data in a source table for a Databricks SQL dashboard when a model version transitions to the Production stage

Sending an email alert when an automated testing Job fails

None of these use cases require the use of an HTTP Webhook

Sending a message to a Slack channel when a model version transitions stages

QUESTION 42
A data scientist has developed a scikit-learn random forest model model, but they have not yet logged model with MLflow. They want to obtain the input schema and the output schema of the model so they can document what type of data is expected as input.
Which of the following MLflow operations can be used to perform this task?

mlflow.models.schema.infer_schema

mlflow.models.signature.infer_signature

mlflow.models.Model.get_input_schema

mlflow.models.Model.signature

There is no way to obtain the input schema and the output schema of an unlogged model.

QUESTION 43
A data scientist set up a machine learning pipeline to automatically log a data visualization with each run. They now want to view the visualizations in Databricks.
Which of the following locations in Databricks will show these data visualizations?

The MLflow Model Registry Model page

The Artifacts section of the MLflow Experiment page

Logged data visualizations cannot be viewed in Databricks

The Artifacts section of the MLflow Run page

The Figures section of the MLflow Run page

QUESTION 44
In a continuous integration, continuous deployment (CI/CD) process for machine learning pipelines, which of the following events commonly triggers the execution of automated testing?

The launch of a new cost-efficient SQL endpoint

CI/CD pipelines are not needed for machine learning pipelines

The arrival of a new feature table in the Feature Store

The launch of a new cost-efficient job cluster

The arrival of a new model version in the MLflow Model Registry

QUESTION 45
A machine learning engineering team wants to build a continuous pipeline for data preparation of a machine learning application. The team would like the data to be fully processed and made ready for inference in a series of equal-sized batches.
Which of the following tools can be used to provide this type of continuous processing?

Spark UDFs

Structured Streaming

MLflow

Delta Lake

AutoML

QUESTION 46
A machine learning engineer is attempting to create a webhook that will trigger a Databricks Job job_id when a model version for model model transitions into any MLflow Model Registry stage.
They have the following incomplete code block:

Which of the following lines of code can be used to fill in the blank so that the code block accomplishes the task?

"MODEL_VERSION_CREATED"

"MODEL_VERSION_TRANSITIONED_TO_PRODUCTION"

"MODEL_VERSION_TRANSITIONED_TO_STAGING"

"MODEL_VERSION_TRANSITIONED_STAGE"

"MODEL_VERSION_TRANSITIONED_TO_STAGING", "MODEL_VERSION_TRANSITIONED_TO_PRODUCTION"

QUESTION 47
Which of the following is a reason for using Jensen-Shannon (JS) distance over a Kolmogorov-Smirnov (KS) test for numeric feature drift detection?

All of these reasons

JS is not normalized or smoothed

None of these reasons

JS is more robust when working with large datasets

JS does not require any manual threshold or cutoff determinations
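A sketch of one commonly cited motivation, assuming NumPy and SciPy are installed: on very large samples the KS test's p-value can become tiny even for a practically negligible shift, while JS distance (base 2) stays bounded in [0, 1] and reflects how small the shift is. The sample sizes, the 0.03 shift, and the 50-bin histogram are assumptions chosen for illustration.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 200_000)
current = rng.normal(0.03, 1.0, 200_000)  # a tiny shift in the mean

# KS test on the raw samples: with n this large, tiny shifts become "significant"
ks_pvalue = ks_2samp(baseline, current).pvalue

# JS distance on shared-bin histograms; jensenshannon normalizes the counts
bins = np.histogram_bin_edges(np.concatenate([baseline, current]), bins=50)
p, _ = np.histogram(baseline, bins=bins)
q, _ = np.histogram(current, bins=bins)
js_distance = jensenshannon(p, q, base=2)
```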

QUESTION 48
A data scientist has developed a model model and computed the RMSE of the model on the test set. They have assigned this value to the variable rmse. They now want to manually store the RMSE value with the MLflow run.
They write the following incomplete code block:

Which of the following lines of code can be used to fill in the blank so the code block can successfully complete the task?