
Domain 2 - Data Collection, Processing and Engineering
MCQ
1. What happens if data exists in multiple locations when building AI-specific data sets?
The data needs to be imported and transformed into a single dataset
New data needs to be collected from a single location instead
One data set needs to be chosen as a primary data set
New data needs to be created
2. After data is collected for use with an AI model, what is the next step?
Determining the data type
Determining the characteristics of the data
Looking for missing or corrupt data elements
Assessing the quality of the data
3. Why would data collected for an AI model need to be converted into binary?
To make looking for corrupt or missing data easier
So the data is a more manageable size
To ensure there is enough data
So the AI model understands it
4. When might one choose a local AI hosting solution for their data?
When data is particularly sensitive and requires stringent security controls
When a project requires extensive computational power
When everyone using the data is in one location
When they want lower initial cost and lower maintenance
5. What is a white paper?
A comprehensive document serving as a complete reference for an AI project, outlining its
design, implementation, and outcomes.
A troubleshooting guide for AI models
A procedure with a sequence of operations
A report identifying relevant correlations in a data set
6. What does processing a data set involve?
Transforming and manipulating data to ensure it is ready to be used in an AI model
Building the initial vector features for an AI model
Creating a true picture of a real-life situation in which data is entered into an AI model
Ensuring that data sets are large enough to build an unbiased AI model
7. What is a feature vector?
An ordered list of numerical properties of observed phenomena
A characteristic of data, such as numeric, string, or date
A repository used to store code for development projects
A tool used to cleanse data to ready it for an AI learning model
8. Why are feature vectors needed?
To ensure data integrity so an AI model can make accurate predictions
To ensure a data set is large enough to create an unbiased AI model
To ensure data is balanced so an AI model can make accurate predictions
To convert data into a machine-readable format so that an AI model can make informed predictions
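As a quick illustration of the feature vector concept in questions 7 and 8, here is a minimal Python sketch (the record fields and category mapping are made up for the example) showing how one raw observation becomes an ordered list of numbers a model can consume.

```python
# Hypothetical raw observation; the field names are illustrative only.
raw_record = {"age": 34, "height_cm": 172.0, "membership": "premium"}

# Categorical values must be mapped to numbers before joining the vector.
membership_levels = {"basic": 0, "standard": 1, "premium": 2}

# The feature vector: an ordered list of numerical properties of the observation.
feature_vector = [
    float(raw_record["age"]),
    raw_record["height_cm"],
    float(membership_levels[raw_record["membership"]]),
]
print(feature_vector)  # [34.0, 172.0, 2.0]
```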
9. What are some important questions to ask when assessing data quality? Choose 3 answers.
Does it contain personal information?
Is it complete?
Is it large enough to produce the results without being too large?
Is it balanced?
Does it have corrupt elements?
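The quality checks in question 9 (completeness, size, balance, corrupt elements) can be sanity-checked in code. Below is a minimal pandas sketch, assuming an illustrative file named collected_data.csv with a target column named label; both names are assumptions for the example.

```python
import pandas as pd

# Illustrative file and column names; substitute your own data set.
df = pd.read_csv("collected_data.csv")

# Is it complete? Count missing values per column.
print(df.isna().sum())

# Is it large enough? A simple row count as a first sanity check.
print(f"rows: {len(df)}")

# Is it balanced? Inspect the class distribution of the target column.
print(df["label"].value_counts(normalize=True))
```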
10. What is the most important question to answer when determining which features to use in
AI machine learning?
Is this feature expensive to use?
Is this feature big enough?
Does this feature impact the outcome and help the model's performance?
Does this feature make the AI model look better?
11. Which of the following are considerations for data when working with an AI model?
Data quality
Data corruption
Data representation
Data type
12. Which statements regarding data set size are true? Choose 3 answers.
There is a set percentage of statistical data that one must adhere to when collecting data
The size of a data set is important
The size of a data set is not important
The amount of data collected must be sufficient to build an unbiased AI model
If there is not enough data for a data set, one should consider increasing the number of
sources used
13. A recreational area is starting a new hikers' programme and wants to group people based on
experience level. No data currently exists to show these categories. What is the best way to
collect this data?
Surveys from existing hikers
Data from an IoT device
A web crawler to extract data from websites
Records of the number of existing hikers
14. Which three statements regarding a data test set are true?
It should include proper representation of each category or class of data
It should be used to train an AI model
It should be updated accordingly as data changes
It should include random sampling of data
Its records should be kept with the training set records throughout the AI building process
15. The data set an AI model accesses usually comes in which two forms?
Training and assessment
Numerical and alphabetical
Beta and production
Training and testing
16. Match the area of importance in documenting data decisions to its definition.
Assumptions → Beliefs or conditions taken to be true for a system to work as intended
Predicates → Logical statements or conditions defining properties or relationships between
different entities in an AI system
Constraints → Restrictions on an AI system
17. Which three statements regarding data for the data collection process are true?
It must be a good starting point for an AI model or it cannot be used
It should be as free from bias as possible
It should be relevant to the problem being solved
It needs to be large enough to produce solid outcomes
It should contain personal information
18. Match the type of data collection bias to its description.
Selection bias → When the data collected does not represent the entire population intended
to be analyzed
Digital divide bias → When the needs and opinions of those with limited access to or ability
with technology are overlooked
Observer bias → When subjective interpretation influences the data being collected
Historical bias → When data is only collected at a specific time rather than at all times of the
year
19. What is the purpose of randomizing data when building training and testing data sets?
To get a true picture of a real-life situation for data entry into an AI model
To ensure the AI model does not become confused when making predictions
To ensure data is error-free
To ensure the data sets are large enough to build an unbiased AI model
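To make the idea of randomized training and testing sets in questions 14, 15, and 19 concrete, here is a minimal sketch using scikit-learn's train_test_split (the toy X and y values are made up): shuffling gives a realistic mix of records, and stratifying keeps each class properly represented in both sets.

```python
from sklearn.model_selection import train_test_split

# Toy feature matrix X and binary labels y, for illustration only.
X = [[i] for i in range(100)]
y = [i % 2 for i in range(100)]

# shuffle=True randomizes the records before splitting; stratify=y keeps
# each class proportionally represented in the training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, stratify=y, random_state=42
)
print(len(X_train), len(X_test))  # 80 20
```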
20. What is involved in deep transparency?
Retraining a model to use different settings, such as parameters
Keeping track of important details such as algorithms, data, interpretability, auditing, and
accountability
Managing customer expectations by not overpromising
Building one or more connections between AI and data, specifically between the applications
and the data being used to develop an AI model
21. Arrange the steps for feature engineering in the correct order.
Double check that the features used are relevant to the solution → position 1
Categorize features into different types → position 2
Transform any data not in the best format → position 3
Validate transformations to ensure they will help build as accurate an AI model as possible
→ position 4
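A minimal pandas sketch of those four feature engineering steps, using a made-up customer table; the column names and the churn scenario are assumptions for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [34, 41, 29],
    "signup_date": ["2021-01-03", "2020-11-20", "2022-06-15"],
    "plan": ["basic", "premium", "basic"],
    "favourite_colour": ["red", "blue", "green"],  # likely irrelevant to churn
})

# 1. Keep only features believed relevant to the solution.
df = df.drop(columns=["favourite_colour"])

# 2. Categorize features into different types.
numeric_cols = ["age"]
date_cols = ["signup_date"]
categorical_cols = ["plan"]

# 3. Transform data not in the best format: dates become tenure in days,
#    categories become numeric codes.
df["signup_date"] = pd.to_datetime(df["signup_date"])
df["tenure_days"] = (pd.Timestamp("2023-01-01") - df["signup_date"]).dt.days
df["plan_code"] = df["plan"].astype("category").cat.codes
df = df.drop(columns=["signup_date", "plan"])

# 4. Validate the transformations: everything should now be numeric and non-null.
assert df.select_dtypes(include="number").shape[1] == df.shape[1]
assert not df.isna().any().any()
print(df)
```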
22. Match the appropriate technique or tool with the data quality issue it handles.
Imputation methods → Missing data
Consistency checks → Misaligned data
Anomaly detection techniques → Data corruption
Cybersecurity measures → External threats such as viruses
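As a small illustration of the first and third pairings in question 22, the sketch below imputes a missing value with the column median and flags a corrupt reading with a simple interquartile-range rule; the temperature values are invented, and a basic IQR rule stands in for a full anomaly-detection technique.

```python
import pandas as pd

# Illustrative data with a missing value and an obviously corrupt reading.
df = pd.DataFrame({"temperature": [21.0, 22.5, None, 23.1, 980.0]})

# Imputation method for missing data: fill gaps with the column median.
median = df["temperature"].median()
df["temperature"] = df["temperature"].fillna(median)

# Simple anomaly check for data corruption: flag values far outside
# the interquartile range.
q1, q3 = df["temperature"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["temperature"] < q1 - 1.5 * iqr) | (df["temperature"] > q3 + 1.5 * iqr)]
print(outliers)
```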
23. Why is it important to consult a relevant subject matter expert in the field an AI model's
solution is designed for? Choose 2 answers.
To identify any risky features involving demographic information
To verify that the features selected for the model are valid
To build initial vector features for an AI model
To ask them to write programming code for the AI model
To have them test the AI model for us
24. What are tokens in relation to AI building?
Smaller units of words and sentences
Smaller units of time
Numerical representations of data
Part of the Python programming language
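A minimal sketch of word-level tokenization in plain Python, showing tokens as smaller units of a sentence; real AI systems use dedicated tokenizers that also split words into sub-word units.

```python
# Naive word-level tokenization: split on whitespace and strip punctuation.
sentence = "Data must be tokenized before a language model can use it."
tokens = [word.strip(".,!?").lower() for word in sentence.split()]
print(tokens)
# ['data', 'must', 'be', 'tokenized', 'before', 'a', 'language', 'model', 'can', 'use', 'it']
```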
25. Which statements regarding feature vectors are true? Choose 3 answers.
They help ensure a data set is large enough to create an unbiased AI model
They can be numerical or categorical
Inconsistencies in vectors will cause inconsistencies within AI predictions
Multiple feature vectors across features need to be scaled properly
They can only be numerical
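The scaling point in question 25 can be illustrated with scikit-learn's StandardScaler; the feature vectors below (age in years, income in dollars) are made up for the example.

```python
from sklearn.preprocessing import StandardScaler

# Features on very different scales can dominate a model unless the
# vectors are scaled consistently across all records.
feature_vectors = [[25, 40000.0], [39, 85000.0], [51, 62000.0]]

scaler = StandardScaler()
scaled = scaler.fit_transform(feature_vectors)
print(scaled)  # each column now has mean 0 and unit variance
```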
26. Which statements regarding cloud-based AI hosting solutions are true? Choose 3 answers.
There is no need to maintain physical servers
They offer scalable resources
They have the advantage of built-in tools and services for AI and machine learning
They offer greater control and data security
They come with a higher initial investment and maintenance cost
27. What are three budget considerations when planning an AI project?
Legal requirements specific to the industry for which the AI model is being used
Cost-benefit analysis to ensure a solid return on investment
Equipment for testing, processing, and running the AI model
Technological and human resources needed
Guidelines used for algorithm selection
28. Which three statements are true regarding converting data to a format AI can process?
Data needs to be converted to all share the same data type
Many systems require images to be converted to binary to recognize them
Conversion may be done for you by some software
AI systems can be programmed to take images and render them into binary numbers for
proper rendering
Data needs to be turned into tokens
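A minimal sketch of rendering an image into binary numbers, as described in question 28, using the Pillow library; the file name photo.png is an assumption, and many frameworks perform this conversion for you.

```python
from PIL import Image

# Hypothetical file name; any image on disk would do.
img = Image.open("photo.png")

# Convert to a 1-bit black-and-white image, then read the raw pixel values,
# which is the kind of numerical/binary form many AI systems expect.
binary_img = img.convert("1")
pixel_values = list(binary_img.getdata())  # 0 or 255 per pixel
print(pixel_values[:20])
```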
