Speller

cs50 notes

Uploaded by

danicalili

Speller

Problem to Solve
For this problem, you'll implement a program that spell-checks a file, a la the
below, using a hash table.

WORDS MISSPELLED:     718
WORDS IN DICTIONARY:  143091
WORDS IN TEXT:        103614
TIME IN load:         0.05
TIME IN check:        0.09
TIME IN size:         0.00
TIME IN unload:       0.01
TIME IN TOTAL:        0.15

Background
Theoretically, on input of size n, an algorithm with a running time of n is
"asymptotically equivalent," in terms of O, to an algorithm with a running
time of 2n. Indeed, when describing the running time of an algorithm, we
typically focus on the dominant (i.e., most impactful) term (i.e., n in this
case, since n could be much larger than 2). In the real world, though, the
fact of the matter is that 2n feels twice as slow as n.

The challenge ahead of you is to implement the fastest spell checker you can!
By "fastest," though, we're talking actual "wall-clock," not asymptotic, time.

In speller.c, we've put together a program that's designed to spell-check a
file after loading a dictionary of words from disk to memory. That dictionary,
meanwhile, is implemented in a file called dictionary.c. (It could just be
implemented in speller.c, but as programs get more complex, it's often
convenient to break them into multiple files.) The prototypes for the functions
therein, meanwhile, are defined not in dictionary.c itself but in dictionary.h
instead. That way, both speller.c and dictionary.c can #include the file.
Unfortunately, we didn't quite get around to implementing the loading part. Or
the checking part. Both (and a bit more) we leave to you! But first, a tour.

Understanding
dictionary.h

Open dictionary.h, and you'll see some new syntax, including a few lines that
mention DICTIONARY_H. No need to worry about those, but, if curious, those
lines just ensure that, even though dictionary.c and speller.c (which you'll
see in a moment) #include this file, clang will only compile it once.

Next, notice how we #include a file called stdbool.h. That's the file in which
bool itself is defined. You've not needed it before, since the CS50 Library
used to #include that for you.
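
For the curious, the DICTIONARY_H lines follow a common pattern usually called an include guard. The sketch below pastes the same guarded text twice into one file to imitate a header being included twice; struct guard_demo and guard_demo_value are hypothetical names used only for illustration and are not part of dictionary.h.

```c
// Hypothetical stand-in for the contents of a header such as dictionary.h.
// First inclusion: DICTIONARY_H is not yet defined, so this text is compiled.
#ifndef DICTIONARY_H
#define DICTIONARY_H

struct guard_demo
{
    int value;
};

#endif // DICTIONARY_H

// Second inclusion of the same text: DICTIONARY_H is now defined, so the
// preprocessor skips everything up to #endif and the struct is not redefined.
#ifndef DICTIONARY_H
#define DICTIONARY_H

struct guard_demo
{
    int value;
};

#endif // DICTIONARY_H

// Uses the struct once to show that exactly one definition survived.
int guard_demo_value(void)
{
    struct guard_demo d = {7};
    return d.value;
}
```

Without the guard, the second copy would define struct guard_demo again, which the compiler would reject.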

Also notice our use of #define, a "preprocessor directive" that defines a
"constant" called LENGTH that has a value of 45. In other words, it's not a
variable, just a find-and-replace trick.
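
To make the find-and-replace idea concrete, here is a minimal sketch. LENGTH and its value come from the text above; the helper word_buffer_size is a hypothetical name introduced only for this example.

```c
#include <stddef.h>

// Same name and value the text describes; the preprocessor replaces every
// later occurrence of LENGTH with 45 before the compiler sees the code.
#define LENGTH 45

// Returns the size of a buffer big enough for the longest word plus the
// terminating '\0'. After preprocessing, the declaration below reads
// char word[45 + 1].
size_t word_buffer_size(void)
{
    char word[LENGTH + 1];
    return sizeof(word);
}
```

Because sizeof(char) is 1, word_buffer_size() returns 46: 45 characters plus one byte for the terminator.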

Finally, notice the prototypes for five functions: check, hash, load, size, and
unload. Notice how three of those take a pointer as an argument, per the *:

bool check(const char *word);
unsigned int hash(const char *word);
bool load(const char *dictionary);

And const, meanwhile, just says that those strings, when passed in as
arguments, must remain constant; you won't be able to change them,
accidentally or otherwise!
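
As a small illustration of what const buys you, the sketch below reads a string through a const char * parameter. The function count_apostrophes is a hypothetical helper, not one of the five functions in dictionary.h.

```c
#include <stddef.h>

// Hypothetical helper: counts the apostrophes in a word it promises not to
// modify. Reading through the pointer is fine; writing through it is not.
size_t count_apostrophes(const char *word)
{
    size_t count = 0;
    for (size_t i = 0; word[i] != '\0'; i++)
    {
        if (word[i] == '\'')
        {
            count++;
        }
    }
    // word[0] = 'x';  // would not compile: word points to const char
    return count;
}
```

Uncommenting the assignment makes the compiler reject the function, which is exactly the protection the const in the prototypes provides.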

dictionary.c

Now open up dictionary.c. Notice how, atop the file, we've defined a struct
called node that represents a node in a hash table. And we've declared a
global pointer array, table, which will (soon) represent the hash table you will
use to keep track of words in the dictionary. The array contains N node
pointers, and we've set N equal to 26 for now, to match with the default
hash function as described below. You will likely want to increase this
depending on your own implementation of hash.
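
A minimal sketch of what such a table might look like follows. The field names word and next, the use of a macro for N, and the helper bucket_insert are all assumptions made for illustration; the distributed dictionary.c may declare things differently.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define LENGTH 45 // longest word, as described for dictionary.h
#define N 26      // number of buckets; a macro here so the array size is a constant

// One entry in a bucket's linked list (field names are assumptions).
typedef struct node
{
    char word[LENGTH + 1];
    struct node *next;
} node;

// Globals are zero-initialized, so every bucket starts out NULL (empty).
node *table[N];

// Hypothetical helper: prepends a copy of word to the list in one bucket.
// Returns false if bucket is out of range, word is too long, or malloc fails.
bool bucket_insert(unsigned int bucket, const char *word)
{
    if (bucket >= N || strlen(word) > LENGTH)
    {
        return false;
    }
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    strcpy(n->word, word);
    n->next = table[bucket]; // new node points at the old head
    table[bucket] = n;       // new node becomes the head
    return true;
}
```

Prepending keeps insertion constant-time regardless of how long a bucket's list has grown, which is one reasonable choice when the order within a bucket does not matter.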

Next, notice that we've implemented load, check, size, and unload, but only
barely, just enough for the code to compile. Notice too that we've
implemented hash with a sample algorithm based on the first letter of the
word. Your job, ultimately, is to re-implement those functions as cleverly as
possible so that this spell checker works as advertised. And fast!
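
A first-letter hash of the kind described might look like the sketch below. The name first_letter_hash is hypothetical, and the code assumes the word is non-empty and begins with a letter; the distributed hash may differ in detail.

```c
#include <ctype.h>

// Sketch of a first-letter hash: 'a' or 'A' maps to 0, ..., 'z' or 'Z' to 25.
// Assumes word is non-empty and starts with an alphabetical character.
unsigned int first_letter_hash(const char *word)
{
    return (unsigned int) (toupper((unsigned char) word[0]) - 'A');
}
```

With only 26 buckets and 143,091 words in the default dictionary, each bucket would hold roughly 5,500 words on average, and a lookup might have to walk a long list. That is why a hash of your own with more buckets tends to pay off.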

speller.c

Okay, next open up speller.c and spend some time looking over the code and
comments therein. You won't need to change anything in this file, and you
don't need to understand its entirety, but try to get a sense of its
functionality nonetheless. Notice how, by way of a function called getrusage,
we'll be "benchmarking" your implementations of load, check, size, and unload.
Also notice how we go about passing check, word by word, the contents of
some file to be spell-checked. Ultimately, we report each misspelling in that
file along with a bunch of statistics.
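
To give a feel for this style of benchmarking, here is a rough sketch built on the POSIX getrusage call. The helpers cpu_seconds_between and time_call are hypothetical names, and counting user plus system CPU time is an assumption of this sketch; speller.c itself may compute its figures differently.

```c
#include <sys/resource.h>
#include <sys/time.h>

// Hypothetical helper: seconds of CPU time (user + system) that elapsed
// between two resource-usage snapshots.
double cpu_seconds_between(const struct rusage *before, const struct rusage *after)
{
    double user = (double) (after->ru_utime.tv_sec - before->ru_utime.tv_sec) +
                  (double) (after->ru_utime.tv_usec - before->ru_utime.tv_usec) / 1e6;
    double sys = (double) (after->ru_stime.tv_sec - before->ru_stime.tv_sec) +
                 (double) (after->ru_stime.tv_usec - before->ru_stime.tv_usec) / 1e6;
    return user + sys;
}

// Hypothetical helper: times a single call to work().
// Returns -1.0 if either snapshot could not be taken.
double time_call(void (*work)(void))
{
    struct rusage before, after;
    if (getrusage(RUSAGE_SELF, &before) != 0)
    {
        return -1.0;
    }
    work();
    if (getrusage(RUSAGE_SELF, &after) != 0)
    {
        return -1.0;
    }
    return cpu_seconds_between(&before, &after);
}
```

Taking a snapshot immediately before and after each call means only the function under test is charged for the time, not the surrounding file reading or printing.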

Notice, incidentally, that we have defined the usage of speller to be

Usage: ./speller [dictionary] text

where dictionary is assumed to be a file containing a list of lowercase words,
one per line, and text is a file to be spell-checked. As the brackets suggest,
provision of dictionary is optional; if this argument is omitted, speller will use
dictionaries/large by default. In other words, running

./speller text

will be equivalent to running

./speller dictionaries/large text

where text is the file you wish to spell-check. Suffice it to say, the former is
easier to type! (Of course, speller will not be able to load any dictionaries
until you implement load in dictionary.c! Until then, you'll see Could not load.)
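
The optional-argument behavior described above can be sketched as follows. The names DEFAULT_DICTIONARY, choose_dictionary, and choose_text are hypothetical and only illustrate the idea; speller.c may organize this logic differently.

```c
#include <stddef.h>

// Default path taken from the usage description above.
#define DEFAULT_DICTIONARY "dictionaries/large"

// Hypothetical helper: returns the dictionary path implied by the arguments,
// or NULL if the argument count matches neither accepted form.
const char *choose_dictionary(int argc, char *argv[])
{
    if (argc != 2 && argc != 3)
    {
        return NULL;
    }
    // With three arguments the dictionary is given explicitly as argv[1].
    return (argc == 3) ? argv[1] : DEFAULT_DICTIONARY;
}

// Hypothetical helper: the text to check is always the last argument.
const char *choose_text(int argc, char *argv[])
{
    if (argc != 2 && argc != 3)
    {
        return NULL;
    }
    return argv[argc - 1];
}
```

Given the arguments ./speller texts/lalaland.txt, choose_dictionary falls back to dictionaries/large, while ./speller dictionaries/small texts/lalaland.txt selects dictionaries/small.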

Within the default dictionary, mind you, are 143,091 words, all of which must
be loaded into memory! In fact, take a peek at that file to get a sense of its
structure and size. Notice that every word in that file appears in lowercase
(even, for simplicity, proper nouns and acronyms). From top to bottom, the
file is sorted lexicographically, with only one word per line (each of which
ends with \n). No word is longer than 45 characters, and no word appears
more than once. During development, you may find it helpful to provide speller
with a dictionary of your own that contains far fewer words, lest you struggle
to debug an otherwise enormous structure in memory. In dictionaries/small is
one such dictionary. To use it, execute

./speller dictionaries/small text

where text is the file you wish to spell-check. Don't move on until you're
sure you can understand how speller itself works!
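
Since a dictionary is just one word per line, reading it can be sketched with fscanf. The function count_words below is a hypothetical helper that only counts words; it assumes, as stated above, that no word exceeds 45 characters.

```c
#include <stdio.h>

#define LENGTH 45 // longest word, as stated above

// Hypothetical helper: counts the whitespace-separated words in an already
// opened file. Returns -1 if file is NULL.
long count_words(FILE *file)
{
    if (file == NULL)
    {
        return -1;
    }
    char word[LENGTH + 1];
    long count = 0;
    // %45s stores at most LENGTH characters, leaving room for the '\0'.
    while (fscanf(file, "%45s", word) == 1)
    {
        count++;
    }
    return count;
}
```

The width in the format string matters: a bare %s would keep writing past the end of word if a line were ever longer than expected.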

Odds are, you didn't spend enough time looking over speller.c. Go back one
square and walk yourself through it again!

texts/

So that you can test your implementation of speller, we've also provided you
with a whole bunch of texts, among them the script from La La Land, the text
of the Affordable Care Act, three million bytes from Tolstoy, some excerpts
from The Federalist Papers and Shakespeare, and more. So that you know
what to expect, open and skim each of those files, all of which are in a
directory called texts within your pset5 directory.

Now, as you should know from having read over speller.c carefully, the output
of speller, if executed with, say,

./speller texts/lalaland.txt

will eventually resemble the below.

Below's some of the output you'll see. For information's sake, we've
excerpted some examples of "misspellings." And lest we spoil the fun, we've
omitted our own statistics for now.

MISSPELLED WORDS

[...]
AHHHHHHHHHHHHHHHHHHHHHHHHHHHT
[...]
Shangri
[...]
fianc
[...]
Sebastian's
[...]

WORDS MISSPELLED:
WORDS IN DICTIONARY:
WORDS IN TEXT:
TIME IN load:
TIME IN check:
TIME IN size:
TIME IN unload:
TIME IN TOTAL:

TIME IN load represents the number of seconds that speller spends executing
your implementation of load. TIME IN check represents the number of seconds
that speller spends, in total, executing your implementation of check. TIME IN
size represents the number of seconds that speller spends executing your
implementation of size. TIME IN unload represents the number of seconds
that speller spends executing your implementation of unload. TIME IN TOTAL is
the sum of those four measurements.

Note that these times may vary somewhat across executions of speller,
depending on what else your codespace is doing, even if you don’t change your
code.

Incidentally, to be clear, by “misspelled” we simply mean that some word is
not in the dictionary provided.

Specification

Alright, the challenge now before you is to implement, in order, load, hash,
size, check, and unload as efficiently as possible using a hash table in such
a way that TIME IN load, TIME IN check, TIME IN size, and TIME IN unload are
all minimized. To be sure, it's not obvious what it even means to
be minimized, inasmuch as these benchmarks will certainly vary as you feed
speller different values for dictionary and for text. But therein lies the
challenge, if not the fun, of this problem. This problem is your chance to
design. Although we invite you to minimize space, your ultimate enemy is
time. But before you dive in, some specifications from us.

You may not alter speller.c or Makefile.

You may alter dictionary.c (and, in fact, must in order to complete the
implementations of load, hash, size, check, and unload), but you may not
alter the declarations of those functions. You may, though, add new
functions and (local or global) variables to dictionary.c.

You may change the value of N in dictionary.c, so that your hash table
can have more buckets.

You may alter dictionary.h, but you may not alter the declarations of the
functions.

Your implementation of check must be case-insensitive. In other words, if
foo is in dictionary, then check should return true given any capitalization
thereof; none of foo, foO, fOo, fOO, Foo, FoO, FOo, and FOO should be
considered misspelled.

Capitalization aside, your implementation of check should only return true
for words actually in dictionary. Beware hard-coding common words
(e.g., the), lest we pass your implementation a dictionary without those
same words. Moreover, the only possessives allowed are those actually in
dictionary. In other words, even if foo is in dictionary, check should
return false given foo's if foo's is not also in dictionary.

You may assume that any dictionary passed to your program will be
structured exactly like ours, alphabetically sorted from top to bottom with
one word per line, each of which ends with \n. You may also assume that
dictionary will contain at least one word, that no word will be longer than
LENGTH (a constant defined in dictionary.h) characters, that no word will
appear more than once, that each word will contain only lowercase
alphabetical characters and possibly apostrophes, and that no word will
start with an apostrophe.

You may assume that check will only be passed words that contain
(uppercase or lowercase) alphabetical characters and possibly
apostrophes.

Your spell checker may only take text and, optionally, dictionary as input.
Although you might be inclined (particularly among those more comfortable)
to "pre-process" our default dictionary in order to derive an "ideal hash
function" for it, you may not save the output of any such pre-processing
to disk in order to load it back into memory on subsequent runs of your
spell checker in order to gain an advantage.

Your spell checker must not leak any memory. Be sure to check for leaks
with valgrind.

The hash function you write should ultimately be your own, not one you
search for online.
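
Two of the requirements above, case-insensitive matching and freeing every allocation, can be illustrated on a single linked list. This is only a sketch under stated assumptions: the node layout and the helpers equal_ignoring_case, chain_push, chain_contains, and chain_free are hypothetical names, and a real check would first hash the word to pick which bucket's list to search.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define LENGTH 45 // longest word, per the specification above

// One entry in a bucket's linked list (field names are assumptions).
typedef struct node
{
    char word[LENGTH + 1];
    struct node *next;
} node;

// Compares two strings while ignoring the case of letters.
static bool equal_ignoring_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0')
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
        {
            return false;
        }
        a++;
        b++;
    }
    return *a == *b; // equal only if both strings ended together
}

// Prepends a copy of word to the list; false if too long or out of memory.
bool chain_push(node **head, const char *word)
{
    if (strlen(word) > LENGTH)
    {
        return false;
    }
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    strcpy(n->word, word);
    n->next = *head;
    *head = n;
    return true;
}

// Returns true if some node holds word, in any capitalization.
bool chain_contains(const node *head, const char *word)
{
    for (const node *cursor = head; cursor != NULL; cursor = cursor->next)
    {
        if (equal_ignoring_case(cursor->word, word))
        {
            return true;
        }
    }
    return false;
}

// Frees every node in the list and returns how many were freed.
size_t chain_free(node *head)
{
    size_t freed = 0;
    while (head != NULL)
    {
        node *next = head->next; // save the link before freeing the node
        free(head);
        head = next;
        freed++;
    }
    return freed;
}
```

With foo in the list, chain_contains accepts FoO but rejects foo's, matching the rule about possessives. Running a finished program under valgrind is still the way to confirm that every node really was freed.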
