Free Access Google.Professional-Data-Engineer.v2022-07-07.q155 Practice Test (Page 3)

Question 6

You work on a regression problem in a natural language processing domain, and you have 100M labeled examples in your dataset. You have randomly shuffled your data and split your dataset into train and test samples (in a 90/10 ratio). After training the neural network and evaluating your model on the test set, you discover that the root-mean-squared error (RMSE) of your model is twice as high on the train set as on the test set. How should you improve the performance of your model?

A.Try to collect more data and increase the size of your dataset.

B.Increase the share of the test sample in the train-test split.

C.Try out regularization techniques (e.g., dropout or batch normalization) to avoid overfitting.

D.Increase the complexity of your model by, e.g., introducing an additional layer or increasing the size of the vocabularies or n-grams used.
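
The train/test comparison described in the question can be reproduced on toy numbers. The following is a minimal Python sketch and is not part of the original question: the helper names `rmse` and `generalization_gap` and the sample values are hypothetical, chosen so that the train RMSE comes out at twice the test RMSE.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def generalization_gap(train_rmse, test_rmse):
    """Test RMSE minus train RMSE (hypothetical helper for illustration)."""
    return test_rmse - train_rmse


# Hypothetical toy targets and predictions, not taken from any real model.
train_rmse = rmse([1.0, 2.0], [3.0, 0.0])  # both errors have magnitude 2
test_rmse = rmse([2.0, 4.0], [1.0, 5.0])   # both errors have magnitude 1

print(train_rmse, test_rmse, generalization_gap(train_rmse, test_rmse))
```

A negative gap (train error above test error) is the opposite of the usual overfitting signature, in which the train error is much lower than the test error.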

Question 7

You are migrating your data warehouse to BigQuery. You have migrated all of your data into tables in a dataset. Multiple users from your organization will be using the data. They should only see certain tables based on their team membership. How should you set user permissions?

A.Assign the users/groups data viewer access at the table level for each table

B.Create SQL views for each team in the same dataset in which the data resides, and assign the users/groups data viewer access to the SQL views

C.Create authorized views for each team in datasets created for each team. Assign the authorized views data viewer access to the dataset in which the data resides. Assign the users/groups data viewer access to the datasets in which the authorized views reside

D.Create authorized views for each team in the same dataset in which the data resides, and assign the users/groups data viewer access to the authorized views

Question 8

You are building a model to make clothing recommendations. You know a user's fashion preference is likely to change over time, so you build a data pipeline to stream new data back to the model as it becomes available. How should you use this data to train the model?

A.Continuously retrain the model on just the new data.

B.Continuously retrain the model on a combination of existing data and the new data.

C.Train on the existing data while using the new data as your test set.

D.Train on the new data while using the existing data as your test set.

Question 9

Your company is using wildcard tables to query data across multiple tables with similar names. The SQL statement is currently failing with the following error:
# Syntax error : Expected end of statement but got "-" at [4:11]
SELECT age
FROM
bigquery-public-data.noaa_gsod.gsod
WHERE
age != 99
AND _TABLE_SUFFIX = '1929'
ORDER BY
age DESC
Which table name will make the SQL statement work correctly?

A.bigquery-public-data.noaa_gsod.gsod*

B.`bigquery-public-data.noaa_gsod.gsod*`

C.'bigquery-public-data.noaa_gsod.gsod'

D.'bigquery-public-data.noaa_gsod.gsod'*
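
In BigQuery standard SQL, a table name that contains hyphens or the `*` wildcard has to be enclosed in backticks, which is what the error at the `-` character is about. The following is a minimal Python sketch of that quoting rule, not part of the original question: `quote_table` and `wildcard_query` are hypothetical helpers that only build the query text and do not call BigQuery. The column and table names are the ones used in the question.

```python
def quote_table(table_id: str) -> str:
    """Wrap a table ID in backticks (hypothetical helper)."""
    if "`" in table_id:
        raise ValueError("table ID must not contain a backtick")
    return "`" + table_id + "`"


def wildcard_query(table_prefix: str, suffix: str) -> str:
    """Build the question's query against a wildcard table (text only)."""
    # Restrict the suffix to digits so it is safe to embed in the SQL text.
    if not suffix.isdigit():
        raise ValueError("suffix is expected to be a year such as '1929'")
    table = quote_table(table_prefix + "*")
    return "\n".join([
        "SELECT age",
        "FROM " + table,
        "WHERE age != 99",
        "  AND _TABLE_SUFFIX = '" + suffix + "'",
        "ORDER BY age DESC",
    ])


query = wildcard_query("bigquery-public-data.noaa_gsod.gsod", "1929")
print(query)
```

The `_TABLE_SUFFIX` pseudo-column holds the part of each table name matched by `*`, so the filter above limits the scan to the table whose name ends in `1929`.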

Question 10

You want to build a managed Hadoop system as your data lake. The data transformation process is composed of a series of Hadoop jobs executed in sequence. To accomplish the design of separating storage from compute, you decided to use the Cloud Storage connector to store all input data, output data, and intermediary data. However, you noticed that one Hadoop job runs very slowly with Cloud Dataproc, when compared with the on-premises bare-metal Hadoop environment (8-core nodes with 100-GB RAM).
Analysis shows that this particular Hadoop job is disk I/O intensive. You want to resolve the issue. What should you do?

A.Allocate sufficient memory to the Hadoop cluster, so that the intermediary data of that particular Hadoop job can be held in memory

B.Allocate sufficient persistent disk space to the Hadoop cluster, and store the intermediate data of that particular Hadoop job on native HDFS

C.Allocate more CPU cores of the virtual machine instances of the Hadoop cluster so that the networking bandwidth for each instance can scale up

D.Allocate additional network interface card (NIC), and configure link aggregation in the operating system to use the combined throughput when working with Cloud Storage

