4) ML Engineer
Test title: 4) ML Engineer. Description: 4) ML Engineer. Created: 2024/02/13. Category: Other. Number of questions: 25.




1. You are working on a classification problem with time series data. After conducting just a few experiments using random cross-validation, you achieved an Area Under the Receiver Operating Characteristic Curve (AUC ROC) value of 99% on the training data. You haven’t explored using any sophisticated algorithms or spent any time on hyperparameter tuning. What should your next step be to identify and fix the problem?
A. Address the model overfitting by using a less complex algorithm and use k-fold cross-validation.
B. Address data leakage by applying nested cross-validation during model training.
C. Address data leakage by removing features highly correlated with the target value.
D. Address the model overfitting by tuning the hyperparameters to reduce the AUC ROC value.

2. You need to execute a batch prediction on 100 million records in a BigQuery table with a custom TensorFlow DNN regressor model, and then store the predicted results in a BigQuery table. You want to minimize the effort required to build this inference pipeline. What should you do?
A. Import the TensorFlow model with BigQuery ML, and run the ml.predict function.
B. Use the TensorFlow BigQuery reader to load the data, and use the BigQuery API to write the results to BigQuery.
C. Create a Dataflow pipeline to convert the data in BigQuery to TFRecords. Run a batch inference on Vertex AI Prediction, and write the results to BigQuery.
D. Load the TensorFlow SavedModel in a Dataflow pipeline. Use the BigQuery I/O connector with a custom function to perform the inference within the pipeline, and write the results to BigQuery.

3. You are creating a deep neural network classification model using a dataset with categorical input values. Certain columns have a cardinality greater than 10,000 unique values. How should you encode these categorical values as input into the model?
A. Convert each categorical value into an integer value.
B. Convert the categorical string data to one-hot hash buckets.
C. Map the categorical variables into a vector of boolean values.
D. Convert each categorical value into a run-length encoded string.

4. You need to train a natural language model to perform text classification on product descriptions that contain millions of examples and 100,000 unique words. You want to preprocess the words individually so that they can be fed into a recurrent neural network. What should you do?
A. Create a one-hot encoding of words, and feed the encodings into your model.
B. Identify word embeddings from a pre-trained model, and use the embeddings in your model.
C. Sort the words by frequency of occurrence, and use the frequencies as the encodings in your model.
D. Assign a numerical value to each word from 1 to 100,000 and feed the values as inputs in your model.
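Option B of the question above describes reusing pre-trained word embeddings as the input to a recurrent network. A minimal Keras sketch of that idea, in which the vocabulary, the source of the pre-trained vectors, and all layer sizes are illustrative assumptions, might look like this:

```python
import numpy as np
import tensorflow as tf

# Hypothetical vocabulary and pre-trained vectors (in practice ~100,000 words
# loaded from a word2vec/GloVe file); random values stand in for real vectors.
vocab = ["water", "bottle", "flavor", "natural"]
pretrained = {word: np.random.rand(100) for word in vocab}
embedding_matrix = np.stack([pretrained[word] for word in vocab])

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None,), dtype="int32"),  # word indices per description
    tf.keras.layers.Embedding(
        input_dim=len(vocab),
        output_dim=100,
        embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
        trainable=False,  # keep the pre-trained embeddings frozen
    ),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # assumed binary category label
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```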
5. You work for an online travel agency that also sells advertising placements on its website to other companies. You have been asked to predict the most relevant web banner that a user should see next. Security is important to your company. The model latency requirements are 300ms@p99, the inventory is thousands of web banners, and your exploratory analysis has shown that navigation context is a good predictor. You want to implement the simplest solution. How should you configure the prediction pipeline?
A. Embed the client on the website, and then deploy the model on AI Platform Prediction.
B. Embed the client on the website, deploy the gateway on App Engine, deploy the database on Firestore for writing and for reading the user’s navigation context, and then deploy the model on AI Platform Prediction.
C. Embed the client on the website, deploy the gateway on App Engine, deploy the database on Cloud Bigtable for writing and for reading the user’s navigation context, and then deploy the model on AI Platform Prediction.
D. Embed the client on the website, deploy the gateway on App Engine, deploy the database on Memorystore for writing and for reading the user’s navigation context, and then deploy the model on Google Kubernetes Engine.

6. Your data science team has requested a system that supports scheduled model retraining, Docker containers, and a service that supports autoscaling and monitoring for online prediction requests. Which platform components should you choose for this system?
A. Vertex AI Pipelines and App Engine.
B. Vertex AI Pipelines, Vertex AI Prediction, and Vertex AI Model Monitoring.
C. Cloud Composer, BigQuery ML, and Vertex AI Prediction.
D. Cloud Composer, Vertex AI Training with custom containers, and App Engine.

7. You are profiling your TensorFlow model’s training time and notice a performance issue caused by inefficiencies in the input data pipeline for a single 5 terabyte CSV file dataset on Cloud Storage. You need to optimize the input pipeline performance. Which action should you try first to increase the efficiency of your pipeline?
A. Preprocess the input CSV file into a TFRecord file.
B. Randomly select a 10 gigabyte subset of the data to train your model.
C. Split into multiple CSV files and use a parallel interleave transformation.
D. Set the reshuffle_each_iteration parameter to true in the tf.data.Dataset.shuffle method.

8. You need to design an architecture that serves asynchronous predictions to determine whether a particular mission-critical machine part will fail. Your system collects data from multiple sensors from the machine. You want to build a model that will predict a failure in the next N minutes, given the average of each sensor’s data from the past 12 hours. How should you design the architecture?
A. 1. HTTP requests are sent by the sensors to your ML model, which is deployed as a microservice and exposes a REST API for prediction. 2. Your application queries a Vertex AI endpoint where you deployed your model. 3. Responses are received by the caller application as soon as the model produces the prediction.
B. 1. Events are sent by the sensors to Pub/Sub, consumed in real time, and processed by a Dataflow stream processing pipeline. 2. The pipeline invokes the model for prediction and sends the predictions to another Pub/Sub topic. 3. Pub/Sub messages containing predictions are then consumed by a downstream system for monitoring.
C. 1. Export your data to Cloud Storage using Dataflow. 2. Submit a Vertex AI batch prediction job that uses your trained model in Cloud Storage to perform scoring on the preprocessed data. 3. Export the batch prediction job outputs from Cloud Storage and import them into Cloud SQL.
D. 1. Export the data to Cloud Storage using the BigQuery command-line tool. 2. Submit a Vertex AI batch prediction job that uses your trained model in Cloud Storage to perform scoring on the preprocessed data. 3. Export the batch prediction job outputs from Cloud Storage and import them into BigQuery.
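Option C of the input-pipeline question above (splitting the 5 TB CSV into shards and reading them with a parallel interleave) can be sketched with tf.data roughly as follows; the file pattern, column schema, and batch size are assumptions for illustration:

```python
import tensorflow as tf

# Hypothetical shards produced by splitting the single large CSV on Cloud Storage.
FILE_PATTERN = "gs://my-bucket/training-data/part-*.csv"  # assumed path

def parse_line(line):
    # Assumed schema: 10 numeric feature columns followed by a numeric label.
    fields = tf.io.decode_csv(line, record_defaults=[0.0] * 11)
    return tf.stack(fields[:-1]), fields[-1]

files = tf.data.Dataset.list_files(FILE_PATTERN, shuffle=True)
dataset = (
    files.interleave(
        lambda path: tf.data.TextLineDataset(path).skip(1),  # skip per-shard header row
        cycle_length=16,                      # read this many shards concurrently
        num_parallel_calls=tf.data.AUTOTUNE,  # parallelize the interleave
    )
    .map(parse_line, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(1024)
    .prefetch(tf.data.AUTOTUNE)
)
```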
9. Your company manages an application that aggregates news articles from many different online sources and sends them to users. You need to build a recommendation model that will suggest articles to readers that are similar to the articles they are currently reading. Which approach should you use?
A. Create a collaborative filtering system that recommends articles to a user based on the user’s past behavior.
B. Encode all articles into vectors using word2vec, and build a model that returns articles based on vector similarity.
C. Build a logistic regression model for each user that predicts whether an article should be recommended to a user.
D. Manually label a few hundred articles, and then train an SVM classifier based on the manually classified articles that categorizes additional articles into their respective categories.

10. You work for a large social network service provider whose users post articles and discuss news. Millions of comments are posted online each day, and more than 200 human moderators constantly review comments and flag those that are inappropriate. Your team is building an ML model to help human moderators check content on the platform. The model scores each comment and flags suspicious comments to be reviewed by a human. Which metric(s) should you use to monitor the model’s performance?
A. Number of messages flagged by the model per minute.
B. Number of messages flagged by the model per minute confirmed as being inappropriate by humans.
C. Precision and recall estimates based on a random sample of 0.1% of raw messages each minute sent to a human for review.
D. Precision and recall estimates based on a sample of messages flagged by the model as potentially inappropriate each minute.

11. You are a lead ML engineer at a retail company. You want to track and manage ML metadata in a centralized way so that your team can have reproducible experiments by generating artifacts. Which management solution should you recommend to your team?
A. Store your tf.logging data in BigQuery.
B. Manage all relational entities in the Hive Metastore.
C. Store all ML metadata in Google Cloud’s operations suite.
D. Manage your ML workflows with Vertex ML Metadata.

12. You have been given a dataset with sales predictions based on your company’s marketing activities. The data is structured and stored in BigQuery, and has been carefully managed by a team of data analysts. You need to prepare a report providing insights into the predictive capabilities of the data. You were asked to run several ML models with different levels of sophistication, including simple models and multilayered neural networks. You only have a few hours to gather the results of your experiments. Which Google Cloud tools should you use to complete this task in the most efficient and self-serviced way?
A. Use BigQuery ML to run several regression models, and analyze their performance.
B. Read the data from BigQuery using Dataproc, and run several models using SparkML.
C. Use Vertex AI Workbench user-managed notebooks with scikit-learn code for a variety of ML algorithms and performance metrics.
D. Train a custom TensorFlow model with Vertex AI, reading the data from BigQuery featuring a variety of ML algorithms.
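Option A of the question above relies on BigQuery ML, which lets you train and evaluate a model with plain SQL from a notebook or script. A minimal sketch using the google-cloud-bigquery client, with hypothetical dataset, table, and label names, could be:

```python
from google.cloud import bigquery

client = bigquery.Client()  # assumes application-default credentials and a default project

# Hypothetical dataset/table; 'sales' is the assumed label column.
client.query("""
    CREATE OR REPLACE MODEL `report_dataset.sales_lr`
    OPTIONS (model_type = 'LINEAR_REG', input_label_cols = ['sales']) AS
    SELECT * FROM `report_dataset.marketing_activity`
""").result()  # wait for training to finish

evaluation = client.query(
    "SELECT * FROM ML.EVALUATE(MODEL `report_dataset.sales_lr`)"
).result()
for row in evaluation:
    print(dict(row))  # mean_squared_error, r2_score, etc.
```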
13. You are an ML engineer at a bank. You have developed a binary classification model using AutoML Tables to predict whether a customer will make loan payments on time. The output is used to approve or reject loan requests. One customer’s loan request has been rejected by your model, and the bank’s risk department is asking you to provide the reasons that contributed to the model’s decision. What should you do?
A. Use local feature importance from the predictions.
B. Use the correlation with target values in the data summary page.
C. Use the feature importance percentages in the model evaluation page.
D. Vary features independently to identify the threshold per feature that changes the classification.

14. You work for a magazine distributor and need to build a model that predicts which customers will renew their subscriptions for the upcoming year. Using your company’s historical data as your training set, you created a TensorFlow model and deployed it to AI Platform. You need to determine which customer attribute has the most predictive power for each prediction served by the model. What should you do?
A. Use AI Platform notebooks to perform a Lasso regression analysis on your model, which will eliminate features that do not provide a strong signal.
B. Stream prediction results to BigQuery. Use BigQuery’s CORR(X1, X2) function to calculate the Pearson correlation coefficient between each feature and the target variable.
C. Use the AI Explanations feature on AI Platform. Submit each prediction request with the ‘explain’ keyword to retrieve feature attributions using the sampled Shapley method.
D. Use the What-If Tool in Google Cloud to determine how your model will perform when individual features are excluded. Rank the feature importance in order of those that caused the most significant performance drop when removed from the model.

15. You are working on a binary classification ML algorithm that detects whether an image of a classified scanned document contains a company’s logo. In the dataset, 96% of examples don’t have the logo, so the dataset is very skewed. Which metrics would give you the most confidence in your model?
A. F-score where recall is weighed more than precision.
B. RMSE.
C. F1 score.
D. F-score where precision is weighed more than recall.

16. You work on the data science team for a multinational beverage company. You need to develop an ML model to predict the company’s profitability for a new line of naturally flavored bottled waters in different locations. You are provided with historical data that includes product types, product sales volumes, expenses, and profits for all regions. What should you use as the input and output for your model?
A. Use latitude, longitude, and product type as features. Use profit as model output.
B. Use latitude, longitude, and product type as features. Use revenue and expenses as model outputs.
C. Use product type and the feature cross of latitude with longitude, followed by binning, as features. Use profit as model output.
D. Use product type and the feature cross of latitude with longitude, followed by binning, as features. Use revenue and expenses as model outputs.
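The skewed logo-detection question above turns on how an F-score can weight precision versus recall. In scikit-learn this is the beta parameter of fbeta_score: beta < 1 favors precision, beta > 1 favors recall. A minimal sketch with made-up labels:

```python
from sklearn.metrics import fbeta_score, precision_score, recall_score

# Made-up labels: 1 = image contains the logo (the rare class).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 0, 0, 1, 0]

print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("F0.5 (precision-weighted):", fbeta_score(y_true, y_pred, beta=0.5))
print("F2   (recall-weighted):   ", fbeta_score(y_true, y_pred, beta=2.0))
```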
17. You work as an ML engineer at a social media company, and you are developing a visual filter for users’ profile photos. This requires you to train an ML model to detect bounding boxes around human faces. You want to use this filter in your company’s iOS-based mobile phone application. You want to minimize code development and want the model to be optimized for inference on mobile phones. What should you do?
A. Train a model using AutoML Vision and use the “export for Core ML” option.
B. Train a model using AutoML Vision and use the “export for Coral” option.
C. Train a model using AutoML Vision and use the “export for TensorFlow.js” option.
D. Train a custom TensorFlow model and convert it to TensorFlow Lite (TFLite).

18. You have been asked to build a model using a dataset that is stored in a medium-sized (~10 GB) BigQuery table. You need to quickly determine whether this data is suitable for model development. You want to create a one-time report that includes both informative visualizations of data distributions and more sophisticated statistical analyses to share with other ML engineers on your team. You require maximum flexibility to create your report. What should you do?
A. Use Vertex AI Workbench user-managed notebooks to generate the report.
B. Use Google Data Studio to create the report.
C. Use the output from TensorFlow Data Validation on Dataflow to generate the report.
D. Use Dataprep to create the report.

19. You work on an operations team at an international company that manages a large fleet of on-premises servers located in a few data centers around the world. Your team collects monitoring data from the servers, including CPU/memory consumption. When an incident occurs on a server, your team is responsible for fixing it. Incident data has not been properly labeled yet. Your management team wants you to build a predictive maintenance solution that uses monitoring data from the VMs to detect potential failures and then alerts the service desk team. What should you do first?
A. Train a time-series model to predict the machines’ performance values. Configure an alert if a machine’s actual performance values significantly differ from the predicted performance values.
B. Implement a simple heuristic (e.g., based on z-score) to label the machines’ historical performance data. Train a model to predict anomalies based on this labeled dataset.
C. Develop a simple heuristic (e.g., based on z-score) to label the machines’ historical performance data. Test this heuristic in a production environment.
D. Hire a team of qualified analysts to review and label the machines’ historical performance data. Train a model based on this manually labeled dataset.

20. You are developing an ML model that uses sliced frames from a video feed and creates bounding boxes around specific objects. You want to automate the following steps in your training pipeline: ingestion and preprocessing of data in Cloud Storage, followed by training and hyperparameter tuning of the object model using Vertex AI jobs, and finally deploying the model to an endpoint. You want to orchestrate the entire pipeline with minimal cluster management. What approach should you use?
A. Use Kubeflow Pipelines on Google Kubernetes Engine.
B. Use Vertex AI Pipelines with TensorFlow Extended (TFX) SDK.
C. Use Vertex AI Pipelines with Kubeflow Pipelines SDK.
D. Use Cloud Composer for the orchestration.
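Options B and C of the predictive-maintenance question above mention a z-score heuristic for weakly labeling historical monitoring data. A minimal NumPy sketch of that idea, with made-up CPU readings and an arbitrary threshold, might be:

```python
import numpy as np

# Made-up CPU-utilization history for one server (one value per minute).
cpu = np.array([22.0, 25.0, 24.0, 23.0, 26.0, 95.0, 24.0, 22.0, 97.0, 25.0])

# Standardize and flag points far from the mean as anomalous (weak labels).
z_scores = (cpu - cpu.mean()) / cpu.std()
threshold = 1.5  # arbitrary cut-off; tune per metric and machine type
weak_labels = (np.abs(z_scores) > threshold).astype(int)

print(weak_labels)  # [0 0 0 0 0 1 0 0 1 0] for the values above
```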
21. You are training an object detection machine learning model on a dataset that consists of three million X-ray images, each roughly 2 GB in size. You are using Vertex AI Training to run a custom training application on a Compute Engine instance with 32 cores, 128 GB of RAM, and 1 NVIDIA P100 GPU. You notice that model training is taking a very long time. You want to decrease training time without sacrificing model performance. What should you do?
A. Increase the instance memory to 512 GB and increase the batch size.
B. Replace the NVIDIA P100 GPU with a v3-32 TPU in the training job.
C. Enable early stopping in your Vertex AI Training job.
D. Use the tf.distribute.Strategy API and run a distributed training job.

22. You are a data scientist at an industrial equipment manufacturing company. You are developing a regression model to estimate the power consumption in the company’s manufacturing plants based on sensor data collected from all of the plants. The sensors collect tens of millions of records every day. You need to schedule daily training runs for your model that use all the data collected up to the current date. You want your model to scale smoothly and require minimal development work. What should you do?
A. Train a regression model using AutoML Tables.
B. Develop a custom TensorFlow regression model, and optimize it using Vertex AI Training.
C. Develop a custom scikit-learn regression model, and optimize it using Vertex AI Training.
D. Develop a regression model using BigQuery ML.

23. You built a custom ML model using scikit-learn. Training time is taking longer than expected. You decide to migrate your model to Vertex AI Training, and you want to improve the model’s training time. What should you try out first?
A. Migrate your model to TensorFlow, and train it using Vertex AI Training.
B. Train your model in a distributed mode using multiple Compute Engine VMs.
C. Train your model with DLVM images on Vertex AI, and ensure that your code utilizes NumPy and SciPy internal methods whenever possible.
D. Train your model using Vertex AI Training with GPUs.

24. You are an ML engineer at a travel company. You have been researching customers’ travel behavior for many years, and you have deployed models that predict customers’ vacation patterns. You have observed that customers’ vacation destinations vary based on seasonality and holidays; however, these seasonal variations are similar across years. You want to quickly and easily store and compare the model versions and performance statistics across years. What should you do?
A. Store the performance statistics in Cloud SQL. Query that database to compare the performance statistics across the model versions.
B. Create versions of your models for each season per year in Vertex AI. Compare the performance statistics across the models in the Evaluate tab of the Vertex AI UI.
C. Store the performance statistics of each pipeline run in Kubeflow under an experiment for each season per year. Compare the results across the experiments in the Kubeflow UI.
D. Store the performance statistics of each version of your models using seasons and years as events in Vertex ML Metadata. Compare the results across the slices.

25. You are an ML engineer at a manufacturing company. You need to build a model that identifies defects in products based on images of the product taken at the end of the assembly line. You want your model to preprocess the images with lower computation to quickly extract features of defects in products. Which approach should you use to build the model?
A. Reinforcement learning.
B. Recommender system.
C. Recurrent Neural Networks (RNN).
D. Convolutional Neural Networks (CNN).
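Option D of the last question refers to convolutional networks, which extract local image features with relatively little computation compared with fully connected alternatives. A minimal Keras sketch for binary defect classification, with the input resolution and layer widths chosen purely for illustration, might look like this:

```python
import tensorflow as tf

# Minimal CNN for classifying assembly-line product images as defective or not.
# Input resolution and layer widths are illustrative assumptions.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128, 128, 3)),
    tf.keras.layers.Rescaling(1.0 / 255),            # scale pixel values to [0, 1]
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(defect)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```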