The Mars Craters project comprises my week 1, 2, 3 and 4 submissions for the “Data management and data visualisation” course on Coursera.

The codebook: Mars Craters (week 1)

For this project, I have decided to use the Mars Crater dataset. According to the Mars Crater Codebook, the heavily cratered terrain on Mars was created during the period of “heavy bombardment”, between 4.2 and 3.8 billion years ago. The surface has changed little since then and therefore remains heavily cratered.

Such unique conditions help us understand crustal properties and surface modification events, and make inferences about the planet's climatic and hydrologic history.

This project is based on a study by Stuart Robbins: a global database of almost 400K craters that is statistically complete for D ≥ 1 km.

Step1: My research questions


Based on the above information, I have decided to investigate whether there is a correlation between:

  1. The crater diameter and the depth of the rim
  2. The morphology of the ejecta and the crater diameter
  3. The longitude or latitude and the depth of the rim
  4. The rim morphology and the rim depth

Depending on the research question, I would select the following datasets (each dataset number corresponds to the above research question):


Step2: My first topic of interest

I selected the Mars Craters because I’m fascinated by space and how physical laws that apply to the micro also apply to the macro (and vice versa).

In this particular case, I want to understand how an apparently random and chaotic event, the meteorite bombardment, has generated a landscape that follows a predictable pattern.

I would also be fascinated to find out whether location has influenced how the crater patterns formed. For instance, do craters with consecutive CRATER_ID values have more similar parameters (diameter, longitude, latitude, rim depth and rim morphology)?

Step3: My first codebook on Mars Craters

In my first analysis, I will investigate the correlation between the rim depth and the ejecta morphology. In particular, I want to determine if the ejecta morphology influences the rim depth.

In this codebook, I will include two parameters: MORPHOLOGY_EJECTA_1 and DEPTH_RIMFLOOR_TOPOG.

Step4: My second topic of choice

In my second analysis, I will investigate if different classification methods have an effect on the final result.

Do we see correlations, or the absence thereof, because of how the data is classified? Would a different data collection method influence the final result?

Step5: My second codebook on Mars Craters

I will use MORPHOLOGY_EJECTA_2 and the rim depth data to establish if the result from the previous codebook holds true when a different classification method is applied.

In this second codebook on Mars Craters, I will include MORPHOLOGY_EJECTA_2 and DEPTH_RIMFLOOR_TOPOG.

The second codebook will differ from the first one only in the ejecta morphology classification used (MORPHOLOGY_EJECTA_2 instead of MORPHOLOGY_EJECTA_1).
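The comparison between the two codebooks can be sketched as one helper applied to each classification column. This is only a minimal sketch on made-up rows: the class labels and depth values are placeholders, and the helper name mean_depth_by is hypothetical, not part of the course material.

```python
import pandas as pd

# Toy rows standing in for the real crater table (labels and depths are made up).
craters = pd.DataFrame({
    "MORPHOLOGY_EJECTA_1": ["Rd", "Rd", "SLEPS", "SLEPS", "Rd"],
    "MORPHOLOGY_EJECTA_2": ["Hu", "Sm", "Hu", "Hu", "Sm"],
    "DEPTH_RIMFLOOR_TOPOG": [0.2, 0.4, 1.0, 1.2, 0.3],
})

def mean_depth_by(df, class_column):
    """Return the mean rim floor depth (km) for each class in class_column."""
    return df.groupby(class_column)["DEPTH_RIMFLOOR_TOPOG"].mean()

# First codebook: depth grouped by the first ejecta classification.
depth_by_ejecta_1 = mean_depth_by(craters, "MORPHOLOGY_EJECTA_1")
# Second codebook: same depth data, grouped by the second classification.
depth_by_ejecta_2 = mean_depth_by(craters, "MORPHOLOGY_EJECTA_2")

print(depth_by_ejecta_1)
print(depth_by_ejecta_2)
```

Because only the grouping column changes, any difference between the two outputs comes from the classification itself rather than from the depth data.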

Step6: The scientific Literature

Based on the available scientific research, it appears that craters have been studied intensively, both on Earth and in space. Examples include volcanoes, the Moon, and planets in the solar system. Here are some scientific papers that studied the correlation among crater parameters (rim height, diameter and depth, among others).

  2. Geometry of Martian impact craters: First results from an interactive software package. Mouginis-Mark et al., 2004
  3. The shape and appearance of craters formed by oblique impact on the Moon and Venus. Herrick et al., 2003
  4. Ejecta thickness and structural rim uplift measurements of Martian impact craters: Implications for the rim formation of complex impact craters. Sturm et al., 2016
  5. Martian impact craters: Correlations of ejecta and interior morphologies with diameter, latitude, and terrain. Barlow et al., 1990

Step7: My hypothesis on Mars Craters

Based on my literature review, I believe there’s a direct association between the rim depth and the crater morphology, especially considering the Mouginis-Mark paper of 2004. Interestingly, the paper of 2016 describes the usual ejecta pattern of the Zunil Crater on Mars, which unexpectedly doesn’t correlate with other parameters such as rim depth and crater diameter.

Data Analysis (week 2)

Step8: Frequency Distributions

Variable 1: Crater diameter distribution’s frequency

In my first code block, I’m analyzing the frequency of each crater diameter.

Python code1:

print("Frequency of each crater's diameter")
c4=data["DIAM_CIRCLE_IMAGE"].value_counts(sort=True, normalize=True)
print(c4)

Note: I have added sort=True because I want to order craters’ diameters from the most frequent to the least frequent.
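To see what sort=True and normalize=True do, here is a minimal sketch on a made-up series (the values are illustrative and not taken from the real dataset):

```python
import pandas as pd

# Made-up diameters in km: 1.0 occurs three times, 2.5 twice, 52.0 once.
diameters = pd.Series([1.0, 1.0, 1.0, 2.5, 2.5, 52.0], name="DIAM_CIRCLE_IMAGE")

# normalize=True returns fractions of the total instead of raw counts;
# sort=True orders the result from most to least frequent.
freq = diameters.value_counts(sort=True, normalize=True)
print(freq)

# Multiply by 100 to read the fractions as percentages.
percent = freq * 100
print(percent)
```

The fractions always sum to 1, so multiplying by 100 gives the percentages quoted in the conclusion.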

Python Result1:

Conclusion1: The most common diameter is around 1 km, and the least common diameters are above 52 km. As a percentage, craters of 1 km in diameter represent 1.5-1.6% of the total, while each diameter value above 50 km represents less than 0.003% of the entire crater population.

Variable 2: Rim depth distribution’s frequency

In my second code block, I’m investigating the frequency of each crater’s rim depth.

Python code2:

print("Frequency in percentage of each crater's rim depth")
c7=data.groupby('DEPTH_RIMFLOOR_TOPOG').size()*100 /len(data)
print(c7)

Python Results2:

Conclusion2: 80% of the craters do not have a rim, while each rim depth value above 3.80 km represents less than 0.003% of the entire crater population.
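The groupby-based percentage used here is equivalent to value_counts with normalize=True multiplied by 100. A minimal sketch on made-up depths (the frame name data_toy and its values are assumptions for illustration only):

```python
import pandas as pd

# Made-up rim floor depths in km; four of the six craters have depth 0.
data_toy = pd.DataFrame({"DEPTH_RIMFLOOR_TOPOG": [0.0, 0.0, 0.0, 0.0, 0.5, 3.8]})

# Approach used in the post: group by value, count rows, convert to percent.
pct_groupby = data_toy.groupby("DEPTH_RIMFLOOR_TOPOG").size() * 100 / len(data_toy)

# Equivalent approach: value_counts with normalize=True, sorted by depth value.
pct_counts = (
    data_toy["DEPTH_RIMFLOOR_TOPOG"].value_counts(normalize=True).sort_index() * 100
)

# Share of craters with a depth of exactly 0, as a percentage.
zero_share = (data_toy["DEPTH_RIMFLOOR_TOPOG"] == 0).mean() * 100

print(pct_groupby)
print(pct_counts)
print(zero_share)
```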

Variable 3: What’s the frequency?

In my third code block, I’m analyzing how many craters have both a diameter and a rim floor depth equal to or above specific values.

First scenario: how many craters have a diameter equal to or above 1km and a rim floor equal to or deeper than 2km?

Python code3.1:

sub1=data[(data['DIAM_CIRCLE_IMAGE']>=1) & (data['DEPTH_RIMFLOOR_TOPOG']>=2)]
print(len(sub1))

Python Results3.1:

Conclusion3.1: Only 332 craters out of 384343 craters have a diameter that is equal to or higher than 1km and a rim floor depth equal to or higher than 2km.

Second scenario: how many craters have a diameter equal to or above 50km and a rim floor equal to or deeper than 1km?

Python code3.2:

sub3=data[(data['DIAM_CIRCLE_IMAGE']>=50) & (data['DEPTH_RIMFLOOR_TOPOG']>=1)]
print(len(sub3))

Python Results3.2:

Conclusion3.2: 993 out of 2017 craters with a diameter equal to or higher than 50km have a rim floor (depth equal to or higher than 1km).

Third scenario: how many craters have a diameter equal to or above 50km and no rim floor?

Python code3.3:

sub7=data[(data['DIAM_CIRCLE_IMAGE']>=50) & (data['DEPTH_RIMFLOOR_TOPOG']==0)]
print(len(sub7))

Python Results3.3:

Conclusion3.3: Only 69 craters out of 2071 craters with a diameter that is equal to or higher than 50km have no rim floor.
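The three scenarios share the same pattern: a diameter threshold combined with a condition on the rim floor depth. A minimal sketch of that pattern on made-up rows follows; the helper name count_craters and all values are hypothetical.

```python
import pandas as pd

# Made-up craters: diameter and rim floor depth, both in km.
toy = pd.DataFrame({
    "DIAM_CIRCLE_IMAGE":    [1.2, 3.0, 55.0, 60.0, 75.0, 0.9, 52.0],
    "DEPTH_RIMFLOOR_TOPOG": [0.0, 2.1, 1.5, 0.0, 0.8, 2.5, 2.2],
})

def count_craters(df, min_diameter, depth_mask):
    """Count rows with diameter >= min_diameter that also satisfy depth_mask."""
    big_enough = df["DIAM_CIRCLE_IMAGE"] >= min_diameter
    # & combines the two boolean Series row by row.
    return int((big_enough & depth_mask).sum())

depth = toy["DEPTH_RIMFLOOR_TOPOG"]
n_scenario_1 = count_craters(toy, 1, depth >= 2)    # diameter >= 1, depth >= 2
n_scenario_2 = count_craters(toy, 50, depth >= 1)   # diameter >= 50, depth >= 1
n_scenario_3 = count_craters(toy, 50, depth == 0)   # diameter >= 50, depth == 0

print(n_scenario_1, n_scenario_2, n_scenario_3)
```

Summing a boolean mask counts the matching rows directly, without building an intermediate sub-frame for each scenario.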

Data Management Decisions (week 3)

Step9: Coding out the missing data

I want to find out how many of the 384343 craters in the database have a name. In this particular case, I don’t need to apply any .replace function, as the craters with no name have empty values in the CRATER_NAME column.

Python code4:

print("Crater name")
c8 = data["CRATER_NAME"].value_counts(sort=True, dropna=True)
print(c8)

Python Results4:

Conclusion4: Out of 384343 only 986 craters have a name.
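Why no .replace call is needed can be shown on a small example: when a CSV is loaded with pandas.read_csv, empty fields are read as NaN by default, so they are already treated as missing. A minimal sketch, assuming made-up rows and placeholder crater names:

```python
import io
import pandas as pd

# Made-up CSV excerpt; the names ExampleA and ExampleB are placeholders.
csv_text = """CRATER_ID,CRATER_NAME,DIAM_CIRCLE_IMAGE
01-000000,,82.10
01-000001,ExampleA,82.02
01-000002,,79.05
01-000003,ExampleB,51.08
"""

toy = pd.read_csv(io.StringIO(csv_text))

# Empty CRATER_NAME fields were read as NaN, so notna() marks named craters.
named = toy["CRATER_NAME"].notna().sum()

# value_counts(dropna=True) lists each distinct name and skips the NaN rows.
name_counts = toy["CRATER_NAME"].value_counts(sort=True, dropna=True)

print(named)
print(name_counts)
```

Note that value_counts counts distinct names, so its length equals the number of named craters only if no name is repeated; notna().sum() counts rows directly.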

Step10: Creating secondary variables

I want to calculate the radius of the 384343 craters in the database. To calculate the value, I divide the diameter by two. I also display only the crater ID, the crater diameter and the crater radius values.

Python code5:

print("Crater Radium")

Python Results5:

Conclusion5: The table above displays a new column called Crater_Radium, which holds the crater radius. The Diam_Circle_Image column confirms that the new values have been calculated correctly (Crater_Radium values are half of the Diam_Circle_Image values).
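The code for this step is not shown in full above. A minimal sketch of how such a secondary variable could be created, assuming made-up rows and reusing the Crater_Radium column name from the results table:

```python
import pandas as pd

# Made-up rows; the IDs and values are placeholders, not real dataset entries.
toy = pd.DataFrame({
    "CRATER_ID": ["a", "b", "c"],
    "DIAM_CIRCLE_IMAGE": [82.0, 51.0, 1.5],
    "DEPTH_RIMFLOOR_TOPOG": [0.2, 1.1, 0.0],
})

# Secondary variable: the radius is half of the diameter.
toy["Crater_Radium"] = toy["DIAM_CIRCLE_IMAGE"] / 2

# Keep only the three columns of interest for display.
radius_table = toy[["CRATER_ID", "DIAM_CIRCLE_IMAGE", "Crater_Radium"]]
print(radius_table)
```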

Step11: Grouping variables

I’m interested to organise all the craters into 10 categories based on their crater diameter. As all the crater values vary from a minimum of 0 to a maximum of 99km, I’m expecting to find how many craters belong to each of these groups (0-10, 11-20, 21-30, 31-40, 41-50, 51-60, 61-70, 71-80, 81-90, 91-100).

Python code6:

print('Crater diameters organised in 10 groups')
sub10['DIAM_CIRCLE_GROUPS']=pandas.qcut(sub10.DIAM_CIRCLE_IMAGE, 10, labels=["1-10", "11-20", "21-30", "31-40", "41-50", "51-60", "61-70", "71-80", "81-90", "91-100"])
c12= sub10['DIAM_CIRCLE_GROUPS'].value_counts(sort=False, dropna=False)
print(c12)

Python Results6:

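Conclusion6: Craters are evenly distributed among the 10 categories of crater diameters. This is expected: qcut splits the data at quantiles, so each group holds roughly the same number of craters and the labels do not necessarily match the actual diameter ranges in km.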
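The difference between quantile-based grouping (pandas.qcut) and fixed-width grouping (pandas.cut) can be seen on a small example. This is a minimal sketch on made-up diameters, not a rerun of the analysis above; the bin edges passed to cut are an assumption chosen to match the group labels.

```python
import pandas as pd

labels = ["1-10", "11-20", "21-30", "31-40", "41-50",
          "51-60", "61-70", "71-80", "81-90", "91-100"]

# Made-up diameters in km: eight small craters and two large ones.
diam = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 50, 95],
                 name="DIAM_CIRCLE_IMAGE", dtype=float)

# qcut: ten groups with (roughly) equal counts, regardless of the values.
by_quantile = pd.qcut(diam, 10, labels=labels)

# cut: ten groups of equal width, with edges at 0, 10, 20, ..., 100 km.
by_width = pd.cut(diam, bins=list(range(0, 101, 10)), labels=labels)

print(by_quantile.value_counts(sort=False))
print(by_width.value_counts(sort=False))
```

With qcut every label receives one crater, so the 8 km crater ends up labelled "71-80". With cut the eight small craters fall into "1-10", and 50 km and 95 km land in "41-50" and "91-100".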

Creating Graphs for your Data (week 4)

Step12: Graphing individual variables

I want to plot the 10 crater categories of step 11 as a bar chart. In step 11, I organised all the craters into 10 groups based on their diameter (0-10, 11-20, 21-30, 31-40, 41-50, 51-60, 61-70, 71-80, 81-90, 91-100). Now, I want to see the number of craters in each group visualised.

Python code7:

seaborn.countplot(x='DIAM_CIRCLE_GROUPS', data=sub10)
plt.xlabel('Size of circle diameter for each crater category')
plt.title('Number of craters for each category')

Python Results7:

To summarise the grouped variable plotted above, I use the pandas .describe() function.

Python code8:

print('Describe Table')
print(sub10['DIAM_CIRCLE_GROUPS'].describe())

Python Results8:

Conclusion7-8: The table contains 384343 elements (count), divided into 10 categories (unique). Among these 10 groups, the most populated is the 31-40 group (top). This most frequent group contains 40849 craters in total (freq).

Step13: Graphing combined variables

I want to find out whether a direct relationship exists between the crater diameter and the rim floor depth. To do this, I’ll use a scatter plot.

Python code9:

scat1= seaborn.regplot(x='DIAM_CIRCLE_IMAGE', y='DEPTH_RIMFLOOR_TOPOG', fit_reg=False, data=data)
plt.title('Relationship diameter and depth rimfloor')

Python Results9:

Conclusion9: Due to the roughly 4,000-fold difference in scale between the crater diameter and the rim floor depth, the scatter plot fails to show a graph that can be interpreted.

I also want to find out whether a direct relationship exists between the longitude and the crater diameter. To do this, I’ll again use a scatter plot.

Python code10:

scat3= seaborn.regplot(x='LONGITUDE_CIRCLE_IMAGE', y='DIAM_CIRCLE_IMAGE', fit_reg=False, data=data)
plt.title('Relationship longitude and crater diameter')

Python Results10:

Conclusion10: As in the previous plot, the roughly 4,000-fold difference in scale between the longitude and the crater diameter means the scatter plot fails to display a graph that can be interpreted. Additional data manipulation is required.
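One common option when two variables span very different ranges is to draw the scatter plot with logarithmic axes. The following is only a sketch of that idea on made-up values, using matplotlib directly; it is not the additional data manipulation carried out in the course, and whether it makes the real plot readable is untested here.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up craters (km); one row has depth 0 to show why filtering is needed.
toy = pd.DataFrame({
    "DIAM_CIRCLE_IMAGE":    [1.0, 2.0, 3.0, 5.0, 20.0, 80.0, 400.0],
    "DEPTH_RIMFLOOR_TOPOG": [0.1, 0.2, 0.0, 0.4, 0.9, 1.8, 2.5],
})

# A log axis cannot display zero or negative values, so drop them first.
positive = toy[(toy["DIAM_CIRCLE_IMAGE"] > 0) & (toy["DEPTH_RIMFLOOR_TOPOG"] > 0)]

fig, ax = plt.subplots()
ax.scatter(positive["DIAM_CIRCLE_IMAGE"], positive["DEPTH_RIMFLOOR_TOPOG"], s=10)
ax.set_xscale("log")
ax.set_yscale("log")
ax.set_xlabel("Crater diameter (km)")
ax.set_ylabel("Rim floor depth (km)")
ax.set_title("Relationship diameter and depth rimfloor (log axes)")
```

Log axes are not meaningful for longitude, which takes negative values; for that plot a different approach, such as restricting the diameter range, would be needed.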
