Data Structures University of San Diego Programming Abstraction Methods HW Problem

In the file imdb.py, write a program that:

Reads a file containing movie information, and stores the information in appropriate data structures, which are specified further in this document. (You will only store the information from the file that is relevant to answering questions of the form described above.)
Reads in from the user (with a prompt) the name of an actor or actress.
Determines the actor or actress who has appeared in the most movies with the named actor/actress, and prints out the result.
Prompts the user again for the name of another actor/actress. The program continues until the user enters an empty string in response to the prompt.
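
The prompt-and-answer cycle in the steps above can be sketched as follows. This is only a sketch under assumptions: prompt_loop, read, and write are hypothetical names chosen for illustration, and the prompt text is copied from the sample run. The query argument stands in for whatever code answers a single question.

```python
def prompt_loop(query, read=input, write=print):
    """Keep prompting for names until the user enters an empty string.

    query: a function that takes a name and returns the text to print.
    read/write: hypothetical hooks (defaulting to input/print) so the
    loop can be exercised without a keyboard.
    """
    while True:
        name = read("Enter a name: ")
        if name == "":
            break                 # empty response ends the program
        write(query(name))
```

Passing read and write as parameters is a design choice for testability; a version that calls input and print directly behaves the same way at the terminal.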

Your program should behave exactly as shown further down in this problem statement. That is, the wording and spacing of what is printed by your program should be the same as shown below.

The dataset files maintained by the IMDb are very large and somewhat tedious to read, so you will be working with a single smaller file, named “imdb_data.csv”, that contains information on about 10000 popular movies released since around 1960. The information includes the names of the movies and the principal actors/actresses in them.

“imdb_data.csv” is, as the extension implies, a csv (comma-separated values) file. csv files are plain text files that can be viewed with a plain text editor. Each record in a csv file is contained on a separate line, and the fields in a line are separated by commas. Fields containing commas must be escaped. This is typically done by enclosing the field in double quotes. The first line of a csv file typically contains the labels of the fields. You can view a csv file using a spreadsheet program, like Excel, or Numbers (on a Mac).

The first line of “imdb_data.csv” contains the labels of the fields, and fields containing commas are embedded in double quotes. I suggest that you open “imdb_data.csv” with a spreadsheet program and look at the fields. For purposes of this PSA, the only fields you need to be concerned with are the “original_title” field and the “cast” field. (Note that even with the limited information presented in this file, there are lots of interesting questions you can pose about the data. This PSA has you answering just one such question.)

Since a csv file is a plain text file, having your program read it should be straightforward, but there is one complication: those embedded commas within a field. If you read a line into a string, and use the Python split method to break it into fields (using the comma as the separator character), you will break some fields into multiple fields. With some careful programming, you can get around this problem, but it is tedious. To avoid this tedium, you can (and you will) let the Python module csv handle reading from a csv file for you. Since it is good practice for you to learn to read documentation, I refer you to https://docs.python.org/3/library/csv.html#csv-fmt… to learn about how to use the csv module to read a csv file. You can also Google the topic to find tutorials on it.
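
To make the split problem concrete, here is a small sketch comparing a naive split with csv.DictReader. The sample rows, the extra column, and the read_records helper are made up for illustration. In particular, the "|" separator between names in the cast field is an assumption; open the real file and check how its cast field is actually laid out.

```python
import csv
import io

# Hypothetical sample data; the real file has different rows and more columns.
SAMPLE = (
    "original_title,cast,other\n"
    '"A Title, With a Comma",Actor One|Actor Two,x\n'
    "Plain Title,Actor Two|Actor Three,y\n"
)

def read_records(csv_file, cast_separator="|"):
    """Return a list of (title, [actor names]) pairs from an open csv file.

    cast_separator is an ASSUMPTION about how names are packed into the
    cast field; it is not stated in the assignment.
    """
    records = []
    for row in csv.DictReader(csv_file):   # first line supplies the field labels
        title = row["original_title"]
        cast = [name for name in row["cast"].split(cast_separator) if name]
        records.append((title, cast))
    return records

# A naive split breaks the quoted title into two pieces (4 fields, not 3).
naive_fields = SAMPLE.splitlines()[1].split(",")

# The csv module keeps the quoted field whole.
records = read_records(io.StringIO(SAMPLE))
```

For a real file, the csv documentation recommends opening it with newline="", for example open(filename, newline="", encoding="utf-8"); the encoding shown here is also an assumption.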

The program you write should behave (the wording of what it prints, including space characters) exactly as illustrated in this sample run:

Enter a name: Jennifer Lawrence
4 actor(s) have been in 4 common movies with Jennifer Lawrence. They are:
Bradley Cooper:
American Hustle
Joy
Serena
Silver Linings Playbook

Josh Hutcherson:
The Hunger Games
The Hunger Games: Catching Fire
The Hunger Games: Mockingjay – Part 1
The Hunger Games: Mockingjay – Part 2

Liam Hemsworth:
The Hunger Games
The Hunger Games: Catching Fire
The Hunger Games: Mockingjay – Part 1
The Hunger Games: Mockingjay – Part 2

Woody Harrelson:
The Hunger Games
The Hunger Games: Catching Fire
The Hunger Games: Mockingjay – Part 1
The Hunger Games: Mockingjay – Part 2

Enter a name: Emma Watson
2 actor(s) have been in 8 common movies with Emma Watson. They are:
Daniel Radcliffe:
Harry Potter and the Chamber of Secrets
Harry Potter and the Deathly Hallows: Part 1
Harry Potter and the Deathly Hallows: Part 2
Harry Potter and the Goblet of Fire
Harry Potter and the Half-Blood Prince
Harry Potter and the Order of the Phoenix
Harry Potter and the Philosopher’s Stone
Harry Potter and the Prisoner of Azkaban

Rupert Grint:
Harry Potter and the Chamber of Secrets
Harry Potter and the Deathly Hallows: Part 1
Harry Potter and the Deathly Hallows: Part 2
Harry Potter and the Goblet of Fire
Harry Potter and the Half-Blood Prince
Harry Potter and the Order of the Phoenix
Harry Potter and the Philosopher’s Stone
Harry Potter and the Prisoner of Azkaban

Enter a name: amy adams
1 actor(s) have been in 3 common movies with amy adams. They are:
Philip Seymour Hoffman:
Charlie Wilson’s War
Doubt
The Master

Enter a name: Gwyneth Paltrow
2 actor(s) have been in 3 common movies with Gwyneth Paltrow. They are:
Jude Law:
Contagion
Sky Captain and the World of Tomorrow
The Talented Mr. Ripley

Robert Downey Jr.:
Iron Man
Iron Man 2
Iron Man 3

Enter a name: John Doe
John Doe is not a known actor

Enter a name: moRgaN FReEmAn
1 actor(s) have been in 4 common movies with Morgan Freeman. They are:
Ashley Judd:
Dolphin Tale
Dolphin Tale 2
High Crimes
Kiss the Girls

Enter a name:

Some things to note about how your program must work:

If multiple actors have appeared in the most common movies with the query actor, then all of those actors must be listed.
For each actor that has appeared in the most common movies with the query actor, all common movies must be listed.
Actor names in the database file are capitalized (first letter uppercase, remaining letters lowercase), but when entering a query, the user can enter the letters of the name in any case, and your program should find the actor. (For examples, see the queries for Amy Adams and Morgan Freeman above.)
If multiple actors have appeared in the most movies with an actor, then those actors should be listed in lexicographic order of their names. (For examples, see the queries above except Amy Adams and Morgan Freeman.) (A lexicographic ordering is like the ordering of words in a dictionary. In Python, the relational operators, like <, give the lexicographic order of strings. So if you have two strings s1 and s2, and if s1 < s2, then s1 appears before s2 in a lexicographic ordering. This means that if you have a list of strings in Python, and you sort it, then they will be put in lexicographic order.)
The common movies an actor has with the query actor should be listed in lexicographic order. (As illustrated by all of the examples above.)
The wording and spacing in the output of your program must be exactly as illustrated in the examples above. I will be testing your programs with software, and any deviations from the output shown above will cause your output to be flagged as incorrect. The imdbtest.py program included in the repository will check the output of your program. Make sure all test cases are flagged as correct.

The program you write should conform to the following requirements:

The name of the file containing your program should be imdb.py. The repository contains this file, and this file contains some starter code, which you should not modify.
It should take one command line argument that is the name of the csv file containing the movie information. The starter code in imdb.py reflects this requirement.
It should contain the definitions of a Movie class and an Actor class. The instance variables of the Movie class should be the name of the movie, and a list of actors who appear in the movie. This actor list should be a list of Actor objects, and not a list of actor names. The instance variables of the Actor class should include the name of the actor, and a list of movies that the actor has appeared in. This movie list should be a list of Movie objects, and not a list of movie names. You may want to add additional instance variables to support a user query.
It should contain the definition of an Imdb class. The instance variables of this class should be a dictionary of movies, where the keys are the names of movies and the values are the Movie objects, and a dictionary of actors, where the keys are the names of actors and the values are the Actor objects.
The Imdb class should have a run method that reads in from the user (with prompting) the name of an actor/actress, and calls another method, query, with that name as its parameter. query should return a string representation of the result of the query, which run should print out. run continues to prompt the user for actor names until the user enters an empty string in response to the prompt.
The query method of the Imdb class should be as efficient as possible (in terms of running time).

Why store information about movies and actors in dictionaries? Because storing a new value in a dictionary and looking up a key in a dictionary are, on average, constant time operations, and you need that efficiency when building your data structures, especially when the database file is large.
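
The requirements above can be sketched roughly as follows. This is one possible arrangement under stated assumptions, not the starter code or a reference solution. The add_movie helper and the by_lower index (used for case-insensitive lookup) are hypothetical additions. The output lines follow the sample run as it appears in this document, so the exact whitespace should be confirmed with imdbtest.py. The sketch echoes the stored spelling of the name in the header, which is also an assumption.

```python
class Movie:
    def __init__(self, name):
        self.name = name
        self.actors = []      # Actor objects, not names


class Actor:
    def __init__(self, name):
        self.name = name
        self.movies = []      # Movie objects, not names


class Imdb:
    def __init__(self):
        self.movies = {}      # movie name -> Movie
        self.actors = {}      # actor name -> Actor
        self.by_lower = {}    # ASSUMPTION: extra index, lowercased name -> Actor

    def add_movie(self, title, cast_names):
        """Hypothetical helper: link one movie and its cast in both directions."""
        movie = Movie(title)
        self.movies[title] = movie
        for name in cast_names:
            actor = self.actors.get(name)
            if actor is None:
                actor = Actor(name)
                self.actors[name] = actor
                self.by_lower[name.lower()] = actor
            actor.movies.append(movie)
            movie.actors.append(actor)

    def query(self, name):
        actor = self.by_lower.get(name.lower())
        if actor is None:
            return f"{name} is not a known actor\n"
        shared = {}           # co-star name -> titles of common movies
        for movie in actor.movies:
            for other in movie.actors:
                if other is not actor:
                    shared.setdefault(other.name, []).append(movie.name)
        # The case of an actor with no co-stars is not specified; default=0 avoids an error.
        best = max((len(titles) for titles in shared.values()), default=0)
        top = sorted(n for n, titles in shared.items() if len(titles) == best)
        lines = [f"{len(top)} actor(s) have been in {best} common movies "
                 f"with {actor.name}. They are:"]
        for co_star in top:                     # lexicographic order of names
            lines.append(co_star + ":")
            lines.extend(sorted(shared[co_star]))  # lexicographic order of titles
            lines.append("")
        return "\n".join(lines)


# Tiny made-up data set to exercise the sketch.
db = Imdb()
db.add_movie("M1", ["Actor A", "Actor B", "Actor C"])
db.add_movie("M2", ["Actor A", "Actor B"])
db.add_movie("M0", ["Actor A", "Actor C"])
result = db.query("actor a")
```

Because each Actor holds its Movie objects and each Movie holds its Actor objects, query only visits the cast lists of the query actor's own movies rather than scanning the whole database, which is the kind of efficiency the dictionaries and object links are meant to provide.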
