NOTES:
Databases: MySQL and MongoDB. They hold raw data in various
formats, which has to be cleaned before use.
Websites with data: Open Data Baltimore, infochimps/marketplace,
[Link], asdfree, Kaggle.
Order of cleaning:
1. Raw data
2. Tidy data
3. Code book explaining each variable; its values and units.
4. Instruction list: a record of each step involved in going from 1 to 2
and 3 (an R script).
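The four deliverables above can be sketched with a tiny example. This is only an illustration in Python (the notes' own scripts are in R); the records, the field names and the `tidy_row` helper are all hypothetical.

```python
# 1. Raw data (hypothetical): stray whitespace, mixed case, a missing value.
raw_rows = [
    {"id": "1", "name": "  Alice ", "height_cm": "170"},
    {"id": "2", "name": "BOB", "height_cm": ""},
    {"id": "3", "name": "carol", "height_cm": "165.5"},
]


def tidy_row(row):
    """Return a cleaned copy of one raw record (hypothetical helper)."""
    height = row["height_cm"].strip()
    return {
        "id": int(row["id"]),
        "name": row["name"].strip().title(),
        # An empty string becomes None so missing values are explicit.
        "height_cm": float(height) if height else None,
    }


# 2. Tidy data: one row per observation, one column per variable.
tidy_rows = [tidy_row(r) for r in raw_rows]

# 3. Code book: one entry per variable, with its meaning and units.
codebook = {
    "id": "Record identifier (integer, no units)",
    "name": "Name, trimmed and title-cased (text)",
    "height_cm": "Height in centimetres (float, None if missing)",
}

# 4. The instruction list is this script itself: running it reproduces 2 and 3.
print(tidy_rows)
```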
<<Different types of files: binary, Excel, XML, JSON, HDF5, data from
APIs, manually entered data>>
Tip: Include a column that links all the data together, such as an
ID number.
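A minimal sketch of why the shared ID matters, in Python for illustration: two hypothetical tables (`people` and `scores`) can only be combined reliably because both carry the same `id` column.

```python
# Two hypothetical tables that share an "id" column.
people = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
scores = [{"id": 2, "score": 88}, {"id": 1, "score": 92}]

# Index one table by id so rows can be matched regardless of their order.
scores_by_id = {row["id"]: row for row in scores}

# Join: keep each person who has a matching score row.
merged = [
    {**person, **scores_by_id[person["id"]]}
    for person in people
    if person["id"] in scores_by_id
]

print(merged)
```

Without the ID, the only way to match rows would be by position, which breaks as soon as one table is sorted differently.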
CODEBOOK: (Word or txt file)
Info about variables and units
Summary choices made, e.g. mean
Info about study design or where you got the data from
R Code: in the Data Cleaning folder
Local flat files = text, comma-delimited, etc.
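As an illustration, a comma-delimited flat file can be read with Python's standard `csv` module. The file contents here are made up and held in memory with `io.StringIO` so the sketch runs without a file on disk; with a real file, `open(path, newline="")` would take its place.

```python
import csv
import io

# Hypothetical comma-delimited content standing in for a local flat file.
flat_file = io.StringIO("id,item,price\n1,pen,1.50\n2,notebook,3.25\n")

# DictReader uses the first line as column names.
reader = csv.DictReader(flat_file)

# Everything is read as text, so convert the types explicitly.
rows = [
    {"id": int(r["id"]), "item": r["item"], "price": float(r["price"])}
    for r in reader
]

print(rows)
```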
Reading XML file: Markup
Start tags: <text>
End tags: </text>
Empty tags: <line-break />
Elements are specific examples of tags:
<greeting> hello </greeting>
Attributes are components of the label: <step number="3"> connect A to
B </step>
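The pieces above (elements, attributes, empty tags) can be read back with Python's standard `xml.etree.ElementTree`. This is a sketch only: the enclosing `<instructions>` root is an assumption added so the fragment is a complete document, and the empty tag is written `<line-break />` because XML tag names cannot contain spaces.

```python
import xml.etree.ElementTree as ET

# The <instructions> wrapper is hypothetical; XML needs a single root element.
xml_text = """
<instructions>
  <greeting> hello </greeting>
  <step number="3"> connect A to B </step>
  <line-break />
</instructions>
"""

root = ET.fromstring(xml_text)

# Element text sits between the start and end tags.
greeting = root.find("greeting").text.strip()

# Attributes live on the start tag and are always read as strings.
step = root.find("step")
step_number = step.get("number")
step_text = step.text.strip()

# An empty tag has no text at all.
empty_text = root.find("line-break").text

# Tag names of the root's children, in document order.
child_tags = [child.tag for child in root]

print(greeting, step_number, step_text, empty_text, child_tags)
```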
MySQL database:
Rows are called records.
Structure: different tables linked together
Setting it up on Windows is complicated, so I skipped this one.
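The idea of records in linked tables can still be tried without a MySQL install. The sketch below uses Python's built-in `sqlite3` as a stand-in (it is not MySQL, though the SQL shown is plain enough to carry over); the `students` and `grades` tables and their contents are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a MySQL server.
conn = sqlite3.connect(":memory:")

# Two tables linked by the student id.
conn.executescript(
    """
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE grades (
        student_id INTEGER REFERENCES students(id),
        course TEXT,
        grade REAL
    );
    INSERT INTO students VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO grades VALUES (1, 'Stats', 3.7), (2, 'Stats', 3.1);
    """
)

# Each row returned is one record; the JOIN follows the link between tables.
records = conn.execute(
    """
    SELECT s.name, g.course, g.grade
    FROM students AS s
    JOIN grades AS g ON g.student_id = s.id
    ORDER BY s.id
    """
).fetchall()

conn.close()
print(records)
```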
HDF5: Hierarchical Data Format
Stores large data sets
Web scraping
Extracting data from the HTML of a website
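A minimal sketch of the extraction step, using Python's standard `html.parser`. The HTML string is made up and the `LinkCollector` class is a hypothetical helper; in practice the HTML would first be downloaded from the site, which is left out here so the example runs offline.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Made-up page content standing in for a downloaded website.
html = (
    "<html><body>"
    '<a href="/data.csv">Data</a> <p>text</p> <a href="/about">About</a>'
    "</body></html>"
)

collector = LinkCollector()
collector.feed(html)
print(collector.links)
```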
APIs: Application Programming Interfaces
Given the current situation, the only way to expand academic resources
is through online sources. The existing list collated by the Ministry does
not seem to include websites that host country-wise databases, such as
[Link]
[Link]
which contain data useful for country-wise research.
Any additional databases that include separate and comparative
country-wise data would be helpful for the student body.
1. For CWC and the e-library, the department could create detailed
instructions like the ones given for the Summer Ball event in Minecraft:
pictures of the respective sites accompanied by step-by-step
instructions. Since the registration process for CWC requires
Ashoka email IDs, exceptional cases are highly unlikely. A document
with pictures and instructions would be easier to download and refer to
than videos in case any student faces connectivity issues.
2. Any concerns or queries could be addressed in the respective cohort
sessions, since learning from your peers is the most thorough way to
understand. For larger concerns, an Excel sheet could be circulated
and the queries answered via a mass email.
Rather than the resource itself, I was dissatisfied with the information I
was given about the CWC instructors in my first year. There were so many
instructors I could go to, but I didn't know who would be able to guide
me best. I understand that they all could help me with my
writing (which they did!), but if I had had some background information on
their fields of expertise, I could've made a better choice. I think
incoming first years would find it easier to approach CWC for help if
they could find an instructor suited to their specific problems.
Collating the fields of experience and interests of individual instructors
and updating the CWC website with this information could help.