You want to store your team's shared tables in a single dataset to make data easily accessible to various analysts. You want to make this data readable but unmodifiable by analysts. At the same time, you want to provide the analysts with individual workspaces in the same project, where they can create and store tables for their own use, without the tables being accessible by other analysts. What should you do?
Correct Answer: C
The BigQuery Data Viewer role allows users to read data and metadata from tables and views, but not to modify or delete them. By giving analysts this role on the shared dataset, you ensure that they can access the data for analysis but cannot change it.
The BigQuery Data Editor role allows users to create, update, and delete tables and views, as well as read and write data. By giving each analyst this role at the dataset level on their own assigned dataset, you provide individual workspaces where they can store their own tables and views without affecting the shared dataset or other analysts' datasets. This achieves both data protection and data isolation for your team.
References:
* BigQuery IAM roles and permissions
* Basic roles and permissions
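The role layout described above can be sketched as a small helper that generates BigQuery `GRANT ... ON SCHEMA` statements. This is a minimal sketch, not a definitive setup: the function name `build_grants`, the project ID, the dataset names, and the analyst emails are all hypothetical, and the statements are only built as strings, not executed.

```python
from typing import Dict, List


def build_grants(project: str, shared_dataset: str,
                 workspaces: Dict[str, str]) -> List[str]:
    """Build GRANT statements for the shared dataset and per-analyst workspaces.

    workspaces maps an analyst email to that analyst's own dataset name
    (all names here are hypothetical placeholders).
    """
    statements = []
    for email, workspace in workspaces.items():
        # Read-only access to the shared dataset.
        statements.append(
            f'GRANT `roles/bigquery.dataViewer` '
            f'ON SCHEMA `{project}`.{shared_dataset} TO "user:{email}";'
        )
        # Read/write access only to the analyst's own dataset.
        statements.append(
            f'GRANT `roles/bigquery.dataEditor` '
            f'ON SCHEMA `{project}`.{workspace} TO "user:{email}";'
        )
    return statements


if __name__ == "__main__":
    for stmt in build_grants("my-project", "team_shared",
                             {"ana@example.com": "ws_ana",
                              "raj@example.com": "ws_raj"}):
        print(stmt)
```

Because the editor role is granted per dataset rather than at the project level, no analyst receives any role on another analyst's workspace.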
Question 22
Which of these rules apply when you add preemptible workers to a Dataproc cluster (select 2 answers)?
Correct Answer: B,D
Explanation
The following rules apply when you use preemptible workers with a Cloud Dataproc cluster:
* Processing only: Since preemptibles can be reclaimed at any time, preemptible workers do not store data. Preemptibles added to a Cloud Dataproc cluster only function as processing nodes.
* No preemptible-only clusters: To ensure clusters do not lose all workers, Cloud Dataproc cannot create preemptible-only clusters.
* Persistent disk size: By default, all preemptible workers are created with the smaller of 100GB or the primary worker boot disk size. This disk space is used for local caching of data and is not available through HDFS.
The managed group automatically re-adds workers lost due to reclamation as capacity permits.
Reference: https://cloud.google.com/dataproc/docs/concepts/preemptible-vms
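The three rules above can be modeled locally as a few small checks. This is an illustrative sketch only: `ClusterPlan` and the helper functions are hypothetical names, not part of any Dataproc API, and the 100GB cap is taken from the explanation above.

```python
from dataclasses import dataclass
from typing import List

# Default cap for preemptible worker disks, as stated in the explanation above.
PREEMPTIBLE_DISK_CAP_GB = 100


@dataclass
class ClusterPlan:
    """Hypothetical local description of a planned cluster."""
    primary_workers: int
    preemptible_workers: int
    primary_boot_disk_gb: int


def default_preemptible_disk_gb(plan: ClusterPlan) -> int:
    """Persistent disk size rule: the smaller of 100GB or the primary boot disk."""
    return min(PREEMPTIBLE_DISK_CAP_GB, plan.primary_boot_disk_gb)


def hdfs_storage_workers(plan: ClusterPlan) -> int:
    """Processing-only rule: preemptible workers never hold HDFS data."""
    return plan.primary_workers


def validate(plan: ClusterPlan) -> List[str]:
    """No-preemptible-only rule: reject plans with no primary workers."""
    problems = []
    if plan.preemptible_workers > 0 and plan.primary_workers == 0:
        problems.append("preemptible-only clusters are not allowed")
    return problems


if __name__ == "__main__":
    plan = ClusterPlan(primary_workers=2, preemptible_workers=4,
                       primary_boot_disk_gb=500)
    print(default_preemptible_disk_gb(plan), hdfs_storage_workers(plan),
          validate(plan))
```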
Question 23
You are working on a sensitive project involving private user dat
Correct Answer: D
Question 24
You are planning to load some of your existing on-premises data into BigQuery on Google Cloud. You want to either stream or batch-load data, depending on your use case. Additionally, you want to mask some sensitive data before loading into BigQuery. You need to do this in a programmatic way while keeping costs to a minimum. What should you do?
Correct Answer: B
To load on-premises data into BigQuery while masking sensitive data, you need a solution that supports both streaming and batch processing and can mask data programmatically. Option B is the best choice for the following reasons:
* Apache Beam and Dataflow:
  * The Apache Beam SDK provides a unified programming model for both batch and stream data processing.
  * Google Cloud Dataflow is a fully managed service for executing Apache Beam pipelines, offering scalability and ease of use.
* Customization for Different Use Cases:
  * By using the Apache Beam SDK, you can write custom pipelines that handle both streaming and batch processing within the same framework.
  * This allows you to switch between streaming and batch modes based on your use case without changing the core logic of your data pipeline.
* Data Masking with Cloud DLP:
  * The Google Cloud Data Loss Prevention (DLP) API can be integrated into your Apache Beam pipeline to de-identify and mask sensitive data programmatically before loading it into BigQuery.
  * This ensures that sensitive data is handled securely and complies with privacy requirements.
* Cost Efficiency:
  * Using Dataflow can be cost-effective because it is a fully managed service, reducing the operational overhead associated with managing your own infrastructure.
  * The pay-as-you-go model ensures you only pay for the resources you consume, which can help keep costs under control.
Implementation Steps:
* Set up Apache Beam Pipeline:
  * Write a pipeline using the Apache Beam SDK for Python that reads data from your on-premises storage.
  * Add transformations for data processing, including the integration with Cloud DLP for data masking.
* Configure Dataflow:
  * Deploy the Apache Beam pipeline on Google Cloud Dataflow.
  * Customize the pipeline options for both streaming and batch use cases.
* Load Data into BigQuery:
  * Set BigQuery as the sink for your data in the Apache Beam pipeline.
  * Ensure the processed and masked data is loaded into the appropriate BigQuery tables.
Reference Links:
* Apache Beam Documentation
* Google Cloud Dataflow Documentation
* Google Cloud DLP Documentation
* BigQuery Documentation
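The idea of keeping one masking step that serves both batch and streaming inputs can be illustrated with plain Python. This sketch deliberately uses no Beam, Dataflow, or DLP calls: a regular expression stands in for the Cloud DLP de-identification step, a generator stands in for the pipeline, and all names (`mask_record`, `run_pipeline`, `EMAIL_RE`) are hypothetical. A real pipeline would replace these with Beam transforms and a BigQuery sink.

```python
import re
from typing import Dict, Iterable, Iterator

# Stand-in for a DLP info type: a simple email pattern (assumption for illustration).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_record(record: Dict[str, str]) -> Dict[str, str]:
    """Replace email addresses in every field with a fixed placeholder."""
    return {key: EMAIL_RE.sub("[EMAIL]", value) for key, value in record.items()}


def run_pipeline(source: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Apply the same masking logic to any iterable, bounded or unbounded."""
    for record in source:
        yield mask_record(record)


def event_stream() -> Iterator[Dict[str, str]]:
    """Hypothetical streaming source that yields records one at a time."""
    yield {"note": "reply to raj@example.com"}
    yield {"note": "no personal data here"}


if __name__ == "__main__":
    # Batch mode: a bounded list of records.
    batch = [{"note": "contact ana@example.com"}]
    print(list(run_pipeline(batch)))
    # Streaming mode: records consumed lazily as they arrive.
    for masked in run_pipeline(event_stream()):
        print(masked)
```

The masking function is identical in both modes; only the source changes, which mirrors the point above about switching modes without touching the core logic.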
Question 25
You are working on a linear regression model on BigQuery ML to predict a customer's likelihood of purchasing your company's products. Your model uses a city name variable as a key predictive component. In order to train and serve the model, your data must be organized in columns. You want to prepare your data using the least amount of coding while maintaining the predictive variables. What should you do?