Do some research & identify a big data application that you find interesting. First, share your understanding of the application. Describe why you have chosen that particular application and make sure to include a link or citation to the application.
Read the statements posted by your peers. Engage with them by responding with thoughtful comments and questions to deepen the discussion.
I've worked on software for retail companies before, mainly on the frontend. The data collected by retailers can be huge. You need the order history of customers. The supplier history of pricing. The customer habits on the store. There's many other sources but I'll focus on these.
The primary objectives of Big Data in retail is to analyze customer behavior and improves sales with that information. This data can then be used to gain insights in customer preferences, buying patterns, supplier pricing, supplier inventory and market trends. This data then gets used to: personalize marketing efforts, improve inventory management, and optimize pricing and sales strategies.
The store site, customer order system and inventory management side of the system will typically consist of some relational database. These datasets are usually highly related and need quick access via software. They might even include in memory key value databases to manage user sessions.
The Big Data will typically get warehoused by the likes of Hadoop, MongoDB, or Spark. This data is less live like the kinds stored in relational databases. Typically the data warehouses are used to periodically perform analytics whose results end up in the faster databases.
Since the lectures have mentioned the use of IoT for data collection, I personally brainstormed an idea of using more sensors in retail stores to get better insights into customer behavior to better optimize the physical stores. Does this new layout hurt the likelihood that someone will find a product they want? Do these displays improve the likelihood of a purchase? These are the kinds of retail applications that could be improved by employing Big Data analytics with data from new sources like IoT.
At the end of Video 4.5, Dr. Sanchez challenges you to come up with an example of a simple database with entities and properties that suits an area of your interest. Following the example presented in the video about colleges and students, find an application that you are interested in and come up with at least two entities, each with at least three corresponding properties. In your post, include:
For example, you may want to define a Houses table with properties SquareFootage, NumOfBedrooms, and Basement that describe, respectively, the square footage of the house, how many bedrooms it has, and whether it has a basement or not. Next, you could define another table, Neighborhoods, with properties Houses, Schools, and DistanceFromDowntown. In this second table, each property describes a particular house that belongs to that neighborhood, how many schools are in the neighborhood, and the average distance of the house in the neighborhood to downtown. A potential application for this example would be to help a real estate agent keep track of the types of listings they have available.
Your initial response should be 100 to 125 words.
Read the statements posted by your peers. Engage with them by responding with thoughtful comments and questions to deepen the discussion.
Suggested Time: 45 minutes
Suggested Length: 100-125 words
This is a required activity and will count toward course completion.
I have worked a lot on frontends in the e-commerce sector so my thoughts were on a set of database entities and properties that often get used there. So it's interesting to me to potentially get into the data layer of all the front end code I've written in this field over the years. Below is a nested list version of them for easier reading.
Customers
Name
Address
Email
Products
SKU
ProductName
Price
Stock
The three most important entities I thought of are:
Customers
, Products
, and Orders
.
These three together can represent a lot about an online store.
Customers
will hold information about the customer.
Information like:
Name
, Address
, and Email
would be included as properties.
This data would depend on which Customer
is on that record.
Products
would represent all the products in the store.
Each product would have:
an SKU
, ProductName
, Price
, and Stock
.
An SKU
is a Stock Keeping Unit and
it is a common numbering system which gets used in retail.
Almost any physical product in a store will have one or something like it.
This will be important to track each individual product.
The ProductName
should be seperate from the SKU
since
it has a tendency to change,
believe me it can be a pain when it does,
so it should be a seperate property from SKU
.
The Price
& Stock
fields are fairly simple.
Price
is there to track a product's ever changing price.
The Stock
is there to track how many of them are available for order.
Which industry in Video 4.1 proves to be data-rich but struggles to obtain good predictions based on data?
Which industry or domain is not mentioned in Video 4.1 that has many data applications?
If a website generates data in the form of JSON documents, what type of database would you use to store these JSON documents?
What type of database would you use to store airline data containing passangers' travel history information with columns showing date of travel origin, destination, etc...?
Which database can be used to store data with key-value pair format?
What are entities in a data model?
What are referred to as properties?
What is the purpose of a unique identifier?
What should you do if a table doesn't contain a unique identifier?
Suppose there are two entities called Employees
and Organizations
.
The entity Employees
has a primary key called OrganizationId
.
The OrganizationId
column is also present in the entity Employees
.
In the entity Employees
, the OrganizationId
column is treated as what?
What is the meaning of cardinality?
What are the possible relationships between two tables?
Suppose there are two tables called Employee
and Organization
.
Suppose further that one employee can only work for a single company at a time.
Employee
has a primary key of EmployeeId
and Organization
has a
primary key of OrganizationId
.
Additionally,
OrganizationId
is a foreign key in the Employee
table.
What kind of relationship exists between the Organization
and Employee
tables?
Suppose you have two tables: Students
and Results
.
Students
has a primary key of roll_num
and
Results
has a primary key of result_id
.
Additionally, result_id
is a foreign key in the Students
table.
The result_id
is unique for ea. student.
No two students can have the same result_id
.
One fundamental step when designing a database is to identify its entities and (?).
An entity can be thought of as (?) defined within a database. On the other hand, properties are simply defined as the (?), or attributes, that each table (?).
When defining the entities and properties of a database, it’s important to check if any table has been (?) in the design process and if all the properties are (?) to the correct table.
The next step in database design is to identify a (?) for each table. Primary keys are also called (?) and they are useful to access any (?) record in your table.
Defining a primary key is of paramount importance. In fact, failing to define one can cause problems in your database (?).
A strategy to ensure that a unique identifier is present is to add an extra (?) to your table that will act as a unique identifier.
Finally, before physically designing a database, it’s important to understand the (?) between the different entities in your database. This is particularly useful because it allows you to establish how (?) between the tables can be (?).
There are three types of relationships: 1-to-1 (one-to-one), (?) (one-to-many), and N-to-N (?).
One fundamental step when designing a database is to identify its entities and properties.
An entity can be thought of as a table defined within a database. On the other hand, properties are simply defined as the columns, or attributes, that each table captures.
When defining the entities and properties of a database, it’s important to check if any table has been forgotten in the design process and if all the properties are attached to the correct table.
The next step in database design is to identify a primary key for each table. Primary keys are also called unique identifiers and they are useful to access any unique record in your table.
Defining a primary key is of paramount importance. In fact, failing to define one can cause problems in your database manipulation.
A strategy to ensure that a unique identifier is present is to add an extra column to your table that will act as a unique identifier.
Finally, before physically designing a database, it’s important to understand the relationships between the different entities in your database. This is particularly useful because it allows you to establish how records between the tables can be accessed.
There are three types of relationships: 1-to-1 (one-to-one), 1-to-N (one-to-many), and N-to-N (many-to-many).
In Video 4.10, what noun form is recommended for use in table names?
In Video 4.10, the professor suggested using what type of form for column names?
One of the table names in a database is Books
&
the db designer would like to have a book ID
as
the primary key of this table.
Which of these names follow
the naming convention that was reviewed in Video 4.10?
bookID
Bookid
BookID
BooksID
There are two tables in a database:
Books
& Authors
.
Which name is most likely to be a foreign key name in the Books
table?
authorID
AuthorID
authorsID
AuthorsID
As part of a self learning exercise from the subject of database normalization, Follow along this Mini Lesson.
Normalizing data involves a set of practices that primarily aim to avoid redundancy in the data, improve integrity, and eliminate duplicate records.
When building out the details of a table, you may run into anomaly issues whenever adding, updating, or deleting data. Normalization, with its set of database organization practices, can help prevent that.
A database that has been normalized is considered a Normalized Form.
There are three types of normalized forms: First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF). Let's review ea. of them in more detail.
The 1NF ensures that any cell in your table contains only a single value. This can also be referred to as atomicity and it ensures that records can be accessed individually without the risk of unintentionally retrieving multiple values.
The 2NF builds on the first form. The 2NF can't be enforced w/o having implemented the 1NF. The goal of the 2NF is to guarantee that every non-key depends on the entire PK. In certain cases, one may want to separate a single table into smaller tables and assign a PK to ea. table. Doing so guarantees the dependence of ea. value on a key.
Like the relation between the 2NF & the 1NF,
the 3NF builds on the 2NF.
In other words,
if a column,
say A
,
depends on the PK column,
then you cannot have another column,
say B
,
also depend on column A
&
indirectly on the PK column.
As with 2NF,
a popular strategy to avoid this issue is to separate the original table into
smaller ones and assign new PKs to guarantee independence.
More can be learned about the nuances of normalization in SQL and its many types, by reading this edureka article on the subject.
Suppose that you have two tables called Students
& Subjects
,
where Students
has a primary key of studentId
and
Subjects
has a primary key of subjectId
.
Each student can opt for multiple subjects and
each subject can be opted for by multiple students.
What kind of relationship can be established between
the tables Students
and Subjects
?
Suppose that you have two tables called Employees
& Vehicles
,
where Employees
has a PK of employeeId
and
Vehicles
has a primary key of vehicleId
.
Each employee can have multiple vehicles &
each vehicle belongs to only one employee.
What kind of relationship can be established between
the tables Employess
& Vehicles
?
Suppose you have two tables called Publishers
& Books
,
with PK set as pubId
& bookId
respectively.
Also,
assume that a 1-to-N relationship exists between the two tables.
How would you map and represent the data in
these tables to reduce redundancy in the data?
bookDate
to the Books
tablepubId
to the Books
table
bookId
to the Publishers
table.What is a foreign key?
What is the goal of normalization?
Which normal form requires that each cell in the table should have an atomic value?
Which of the following is related to the Second Normal Form?
What does the Third Normal Form require?
For the rest of the course, a simple example database for tracking students and colleges will be used.
For this discussion, you will continue to add more complex design aspects to the database that you started in Discussion 4.2.
In your discussion post, identify and describe the relationship between the entities and properties in your database. Then define and include details about the primary keys and foreign keys of all of your tables.
Finally, identify and discuss three potential use cases of the database that you designed.
Read the statements posted by your peers. Engage with them by responding with thoughtful comments and questions to deepen the discussion.
Suggested Time: 60 minutes
Suggested Length: 150 words
This is a required activity and will count toward course completion.
I wrote about retail being a common use of big data, so I'll design a database around a theoretical online retailer. To me the most important tables or entities sounds like:
Customers
CustomerID
(PK)Name
Address
Email
Products
SKU
- Or Stock Keeping Unit - (PK)ProductName
Price
Stock
Orders
OrderID
(PK)CustomerID
(FK)ProductOrders
Products
& Orders
ProductOrderID
(PK) = (FK) + (FK)OrderID
(FK)ProductID
(FK)Quantity
A quick summary about the entities, properties and relations.
In the Customer
entity, which represents each customer with an account.
Its primary key is the CustomerID
.
Although the Email
property is unique for every Customer
,
it makes sense to have a separate primary key in case they want to change email.
Online retailers also often allow associating via multiple email addresses.
This means the second normal form is broken because each primary key needs to
be unique entirely to one entity.
The rest of the properties are non-keys describing the Customer
,
that are entirely dependent on the primary key.
Next the Proucts
entity,
The first property is the SKU
.
In retail,
almost everyone uses some SKU
or Stock Keeping Unit system.
Each SKU
is unique by design so it only makes sense to make this
the primary key for Products
.
The rest of the properties are non-key properties describing the product.
There are no foreign keys because Products
are many-to-many related to
Orders
which necessitates a join table ProductOrders
.
The relationship for a many-to-many entity and its join table is 1-to-many.
Meaning the foreign key should reside in the join table.
Now for the Order
entity, which describes any given order a customer has made.
Its primary key is an OrderID
.
It has a 1-to-many relationship with customers.
One CustomerID
can have many orders,
but single Orders
don't make sense being related to multiple Customers
.
Because Orders
is the many in 1-to-many it makes most sense to
put the foreign key to Customers
,
CustomerID
inside Orders
to avoid duplication of data.
Just like with Products
which Orders
has a many-to-many relation with,
the relationship for an entity to its join table is 1-to-many.
This means Orders
only duplicates data by
putting a Product
related foreign key within it.
Finally, the ProductOrder
join table.
This is the join table that maps Products
& Orders
together.
Because ProductID
& OrderID
are each unique within their tables,
compositing their values by concatenation guarantees ProductOrderID
,
the primary key,
is also unique.
And since the relation between a join table and
its joining entities is 1-to-many with the many on the join table side,
this means the foreign keys in the relationship should reside here.
So this table will have two foreign keys;
the ProductID
and OrderID
being mapped together.
There's also another piece of information that
depends on both Products
& Orders
, Quantity
.
The Quantity
of Products
in Orders
depends on both Entities
.
This means in the third normal form it makes most sense to
place that property here.
There's tons more complexity involved in any e-commerce site. However, I thought these most simply describe some of the database modeling topics we've encountered so far.
While creating a table in a database, which of the following needs to be defined?
What can happen if a field has the wrong datatype in a table?
Which datatype is the best when there are leading zeroes in the data, such as ZIP codes used in U.S. addresses?
Which keyword is used to order the dataset in ascending order in a MySQL query?