A data analyst needs to calculate the mean for Q1 sales using the data set below: Which of the following is the mean?
Correct Answer: C
Explanation The mean is the average of all the values in a data set. To calculate the mean, we add up all the values and divide by the number of values. In this case, the mean for Q1 sales is ($2,000 + $3,000 + $4,000 + $2,500 + $3,500) / 5 = $3,082.72 References: CompTIA Data+ Certification Exam Objectives, page 9
Question 72
Which of the following value is the measure of dispersion "range" between the scores of ten students in a test. The scores of ten students in a test are 17, 23, 30, 36, 45, 51, 58, 66, 72, 77.
Correct Answer: B
Explanation The correct answer is: 60 Range is the interval between the highest and the lowest score. Range is a measure of variability or scatteredness of the varieties or observations among themselves and does not give an idea about the spread of the observations around some central value. Symbolically R = Hs - Ls. Where R = Range; Hs is the 'Highest score' and Ls is the Lowest Score. The scores of ten students in a test are: 17, 23, 30, 36, 45, 51, 58, 66, 72, 77. The highest score is 77 and the lowest score is 17. So the range is the difference between these two scores Range = 77 - 17 = 60
Question 73
A data analyst is asked on the morning of April 9, 2020, to create a sales report that identifies sales year to date. The daily sales data is current through the end of the day. Which of the following date ranges should be on the report?
Correct Answer: D
Explanation This is because sales year to date refers to the sales that have occurred from the beginning of the current year until the current date. By creating a sales report that identifies sales year to date, the analyst can measure and compare the sales performance and progress of the current year. Since the analyst is asked to create the sales report on the morning of April 9, 2020, and the daily sales data is current through the end of the day, the date range that should be on the report is January 1, 2020 to April 9, 2020. The other date ranges are not correct for identifying sales year to date. Here is why: January 1, 2020 to April 1, 2020 would not include the sales that occurred in the first eight days of April, which would underestimate the sales year to date. January 1, 2020 to April 7, 2020 would not include the sales that occurred in the last two days of April, which would also underestimate the sales year to date. January 1, 2020 to April 8, 2020 would not include the sales that occurred on April 9, which would also underestimate the sales year to date.
Question 74
A data analyst has been asked to merge the tables below, first performing an INNER JOIN and then a LEFT JOIN: Customer Table - In-store Transactions - Which of the following describes the number of rows of data that can be expected after performing both joins in the order stated, considering the customer table as the main table?
Correct Answer: C
Explanation An INNER JOIN returns only the rows that match the join condition in both tables. A LEFT JOIN returns all the rows from the left table, and the matched rows from the right table, or NULL if there is no match. In this case, the customer table is the left table and the in-store transactions table is the right table. The join condition is based on the customer_id column, which is common in both tables. To perform an INNER JOIN, we can use the following SQL query: SELECT * FROM customer INNER JOIN in_store_transactions ON customer.customer_id = in_store_transactions.customer_id; This query will return 9 rows of data, as shown below: customer_id | name | lastname | gender | marital_status | transaction_id | amount | date 1 | MARC | TESCO | M | Y | 1 | 1000 | 2020-01-01 1 | MARC | TESCO | M | Y | 2 | 5000 | 2020-01-02 2 | ANNA | MARTIN | F | N | 3 | 2000 | 2020-01-03 2 | ANNA | MARTIN | F | N | 4 | 3000 | 2020-01-04 3 | EMMA | JOHNSON | F | Y | 5 | 4000 | 2020-01-05 4 | DARIO | PENTAL | M | N | 6 | 5000 | 2020-01-06 5 | ELENA | SIMSON| F| N|7|6000|2020-01-07 6|TIM|ROBITH|M|N|8|7000|2020-01-08 7|MILA|MORRIS|F|N|9|8000|2020-01-09 To perform a LEFT JOIN, we can use the following SQL query: SELECT * FROM customer LEFT JOIN in_store_transactions ON customer.customer_id = in_store_transactions.customer_id; This query will return 15 rows of data, as shown below: customer_id|name|lastname|gender|marital_status|transaction_id|amount|date 1|MARC|TESCO|M|Y|1|1000|2020-01-01 1|MARC|TESCO|M|Y|2|5000|2020-01-02 2|ANNA|MARTIN|F|N|3|2000|2020-01-03 2|ANNA|MARTIN|F|N|4|3000|2020-01-04 3|EMMA|JOHNSON|F|Y|5|4000|2020-01-05 4|DARIO|PENTAL|M|N|6|5000|2020-01-06 5|ELENA|SIMSON||F||N||7||6000||2020-01-07 6||TIM||ROBITH||M||N||8||7000||2020-01-08 7||MILA||MORRIS||F||N||9||8000||2020-01-09 8||JENNY||DWARTH||F||Y||NULL||NULL||NULL As you can see, the customers who do not have any transactions (customer_id = 8) are still included in the result, but with NULL values for the transaction_id, amount, and date columns. Therefore, the correct answer is C: INNER: 9 rows; LEFT: 15 rows.
Question 75
An analyst has received the requirements for an internal user dashboard. The analyst confirms the data sources and then creates a wireframe. Which of the following is the NEXT step the analyst should take in the dashboard creation process?