==PYTHON NOTEBOOK USEFULNESS, CASE STUDY: OPTIMIZATION TECHNIQUES COURSE==

Laura Andrea Rodriguez Rodriguez - [mailto:la.rodriguez@uniandes.edu.co la.rodriguez@uniandes.edu.co]

Cristian José Pardo Mercado - [mailto:crijpardomer@unal.edu.co crijpardomer@unal.edu.co]

Juan Esteban Cepeda Baena - [mailto:jecepedab@unal.edu.co jecepedab@unal.edu.co]

Juan Pablo Gómez - [mailto:jupgomezme@unal.edu.co jupgomezme@unal.edu.co]

Sergio Rivera - [mailto:jupgomezme@unal.edu.co jupgomezme@unal.edu.co]

UNIVERSIDAD NACIONAL DE COLOMBIA

=1. Introduction=

Online code-sharing and documentation platforms have become an essential part of the daily work of analysts and data scientists. Google Colab and Jupyter Notebook are two of the best-known platforms and are used extensively to work with datasets. These platforms allow users to present and manipulate data while adding rich visualizations and explanatory narrative text. They offer a high degree of customization along with collaboration and sharing capabilities. Notebooks also support a variety of languages, such as R, Python, and PySpark, provided the necessary packages are installed. Other platforms for working with big data include R Markdown, Kaggle, Spark Notebook, and many other popular tools. With recent developments in data science and machine learning, adoption of these platforms has increased, which in turn has led to the addition of new features to these tools.

'''JupyterLab''' is a web-based interactive development environment for Jupyter notebooks, code, and data. It is designed to be flexible enough to accommodate workflows in data science, scientific computing, and machine learning and artificial intelligence. It also allows developers to write their own plugins and packages, which can then be integrated seamlessly into existing workflows. Jupyter Notebook, the notebook component that JupyterLab builds on, is an open-source web application that allows users to create and share documents containing live code, equations, visualizations, and narrative text. Its use cases include data cleaning and transformation, numerical simulation, statistical modeling, and data visualization.

'''Google Colab''' (short for Colaboratory) is a free platform developed by Google Research that executes arbitrary Python code through the browser. It is especially well suited for machine learning, data analysis, and education. Colab is a hosted Jupyter Notebook service that requires no setup and provides free access to cloud computing resources such as GPUs. These resources are extended for users who pay for the premium tier, Colab Pro.

=2. Why Jupyter Notebook?=

Some of the features provided by Jupyter are as follows:

:* It records the code and steps performed during data manipulation and data cleaning, which helps users retrace their steps at a later stage

:* Visualizations such as graphs and figures are rendered directly in the notebook, which can easily be exported to HTML or PDF files for sharing

:* Animations, dynamic visualizations, and interactive visuals are available in Jupyter Notebook, which help users interact with the visuals and understand patterns in depth

=3. Why Google Colab?=

Just as Jupyter offers the features above, Google Colab provides its users with the following:

:* It runs on Google Cloud Platform, which provides users with a robust and flexible service in terms of storage and management

:* It is free of charge, despite the high-end GPUs and resources it provides for computation. More recently, Tensor Processing Units (TPUs) have been added alongside the existing GPU and CPU runtimes

:* It integrates easily with Google Drive, which makes it easy to work with large datasets stored there

=4. Google Colab vs Jupyter Notebook: The verdict=

Google Colab is an evolved version of the Jupyter Notebook, with enhancements and additional features. There are multiple reasons why Google Colab is often preferred in colleges and companies, including:

:* Many libraries are pre-installed, which eliminates the need to install software and packages before use. Commonly used packages such as pandas, NumPy, and Matplotlib are already available in the environment

:* Work is saved in the cloud, so there is no need to save it on the local system. Notebooks are stored in the user's Google Drive account and can easily be shared with others over the internet

:* It offers collaborative capabilities: users can share notebooks with teammates, together with comments and notes, which makes it possible to co-code with fellow analysts and data scientists and to review each other's work updates

:* GPU and TPU use is free for everyone, which eliminates the need for expensive local processing power. Google provides the processing power for demanding ML and AI packages, so local systems are not burdened

:* Google Colab also integrates with other platforms such as Google Forms, Stack Overflow, and GitHub Gist

[[Image:Draft_Rivera_196260444-image1.png|600px]]

Figure 1. Google Colab vs Jupyter Notebook

Figure 1 shows both environments and how a basic print statement looks on each platform. Communication is an essential aspect of any project or data science activity: team members share information with one another while developing a model or analyzing data. Imagine a situation in which a developer has written thousands of lines of code that serve the purpose of the project. Sharing that code with the entire team without proper annotations or comments makes it difficult for the team to interpret what each section does. In another scenario, a common past practice was to paste snippets of code into an MS Word document or a Confluence page and then explain what each segment of the code does. This practice of ‘documenting as you code’ has become much simpler with the introduction of environments such as Google Colab and Jupyter Notebook.

=5. Guide to using Google Colab=

We have repeatedly faced the issue of a machine learning model that stops working with a memory error, especially when working with large datasets on a local system. The following are some steps for setting up a project on Google Colab:

:* ''Setting Up a Drive''

One can start by opening this [https://colab.research.google.com/ link] and then create a new notebook or upload a notebook from the local system to the environment. A notebook can also be imported directly from Google Drive.

:* ''Choosing a GPU or TPU''

Users are free to change the runtime of their notebook (Runtime > Change runtime type) and can choose either a GPU or a TPU to run it.

:* ''Using Terminal Code''

Google Colab also allows users to run terminal commands from a code cell by prefixing them with an exclamation mark, and popular libraries are already installed. In addition, pip install (for example, !pip install followed by a package name) adds other packages to the working environment.

:* ''Saving a File''

There are multiple ways to save a Colab file: users can either use terminal commands to save their work or go to File and choose Save a copy in Drive.

These simple steps allow users to navigate the platform easily and make most tasks straightforward. This makes Google Colab an even more attractive choice of environment for data science and machine learning. That said, there are also some downsides to using Google Colab, including:

:* ''Closed Environment''

Google Colab can be used by anyone to run arbitrary Python code in the browser. However, it is a somewhat closed environment for machine learning engineers or data scientists who want to add their own packages. Hence, the platform is great for common tools but offers limited scope for specialized work.

:* ''Repetitive Tasks''

Repetitive tasks are a persistent problem in Google Colab: users need to reinstall packages and libraries every time they start a new session. Once an instance or session is shut down, everything must be reinstalled in the new one to return to the original working state.

:* ''No Live-Editing''

Unlike Google Docs or Google Sheets, Colab does not let multiple users edit the same file at the same time, which limits real-time collaboration. Hence, a team working on a single file has to pass it back and forth, since only one person can make edits at a time.

:* ''Saving & Storage Problem''

Uploaded files are removed when the session is restarted because Google Colab does not provide persistent storage for the runtime. If the session ends, for example because the device is turned off, the data can be lost, which can be a nightmare for many users. Moreover, any file downloaded into the current session that will be needed later must be saved before the session expires. In addition, one must always be logged in to a Google account, since all Colaboratory notebooks are stored in Google Drive.

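One common way to work around the storage problem described above is to write results to Google Drive, which persists between sessions. The following is a minimal sketch assuming it runs either in Colab, where google.colab.drive.mount is available, or in a local Jupyter session, where it falls back to a folder in the working directory. The helper name persistent_dir and the default folder name are hypothetical, and mounting Drive still asks the user to authorize access.

```python
from pathlib import Path


def persistent_dir(subdir="optimization-course"):
    """Return a folder whose contents survive a Colab runtime restart.

    Hypothetical helper: inside Colab it mounts Google Drive and uses
    MyDrive/<subdir>; outside Colab it uses <current directory>/<subdir>.
    """
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        target = Path.cwd() / subdir
    else:
        drive.mount("/content/drive")  # prompts the user for authorization
        target = Path("/content/drive/MyDrive") / subdir
    target.mkdir(parents=True, exist_ok=True)
    return target
```

A file written with, for example, (persistent_dir() / "results.csv").write_text(...) inside Colab lands under the Drive mount point and is therefore still available after the runtime is recycled.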
:* ''Limited Time and Space''

Google Colab stores notebooks in Google Drive, which offers 15 GB of free space; larger datasets require more space, which makes them difficult to handle and can prevent complex functions from executing. Google Colab allows notebooks to run for at most 12 hours, and to work for longer periods users need the paid version, [https://colab.research.google.com/signup Colab Pro], which allows programmers to stay connected for up to 24 hours.

These points are trade-offs of the platform compared with other coding environments for data science and machine learning. For users who are comfortable with them, Google Colab is a great option among the wide range of available coding platforms.

To conclude, although Jupyter Notebook is a robust environment for data scientists and analysts, Google Colab has evolved from it and can make their work much easier. With growing amounts of data and the demand for more accurate predictions, the need for powerful ML and AI techniques has increased. These techniques require high computational power, which Google Colab provides to its users. Compared with Jupyter Notebook, which uses local computational power, this is far more convenient for users who cannot afford high-end processors and GPUs.

In addition, mathematical problems such as convex and non-convex optimization and mixed-integer programming can be solved using the computational resources of Google Colab. For demonstration purposes, convex optimization is used to solve a control problem in this [https://colab.research.google.com/github/cvxgrp/cvx_short_course/blob/master/intro/control.ipynb notebook]. Similarly, many other mathematical models can be solved using Google Colab and its computing capabilities.

Notebooks have evolved over the years, from a rudimentary form (IPython) to Google Colab. While IPython provided only an interactive read-eval-print loop (REPL), Jupyter was developed to combine running code with taking notes, along with other interactive features. Finally, Google Colab added code collaboration as well as hosted hardware and resource allocation for its users.

Hence, for extensive machine learning and artificial intelligence work, complex mathematical model optimization, and other resource-intensive tasks, Google Colab can be preferred over Jupyter Notebook, which relies on local computational power. For universities and highly collaborative workspaces, Google Colab surpasses traditional JupyterLab features for code development and model testing. Google Research is likely to keep improving Google Colab over time. For now, when it comes to working on projects, Google Colab is one of the best options on the market and covers most of its users' needs compared with other available environments.

==CASE STUDY: OPTIMIZATION TECHNIQUES COURSE==

The development of Google Colaboratory notebooks to complement classroom learning falls into the category of educational and didactic resources, particularly university educational resources. Their preparation is not very different from that of the textbooks, books, or lecture notes generally used in courses: the process should go through rigorous phases of planning, development, and evaluation. During the development of this new resource, the bibliography to be used was analyzed first; this bibliography guides the contents of the Google Colab notebooks. The next step was to find analogous material on optimization techniques, in Python or other languages. To this end, we reviewed and evaluated four textbooks on mathematical optimization, engineering optimization, and engineering optimization applications [9]-[12], as well as three blogs with material published on the internet [13]-[15]. We chose two main sources and three secondary sources [9]-[11], [14]-[15]. The first main reference [9] was chosen because its contents are up to date and coincide with the course syllabus [16], while the second [10] was chosen because of its clear explanations and examples. According to the author of reference [9], ''This book introduces all the major metaheuristic algorithms and their applications in optimization. This textbook consists of three parts: Part I: Introduction and fundamentals of optimization and algorithms; Part II: Metaheuristic algorithms; and Part III: applications of metaheuristics in engineering optimization''. The author of reference [10] states: ''The reader is motivated to be engaged with the content via numerous application examples of optimization in the area of electrical engineering throughout the book. This approach not only provides relevant readers with a better understanding of mathematical basics but also reveals a systematic approach of algorithm design in the real world of electrical engineering''. Once these two references had been chosen, the thematic content of the notebooks was structured. Since the objective was to cover all the topics of the syllabus in [16], our proposal was as follows:

:* Mathematics for optimization 1

::* Upper and lower bounds

::* Basic calculus

::* Optimality

::* Norms of vectors and matrices

:* Mathematics for optimization 2

::* Eigenvalues and definiteness

::* Linear and affine functions

::* Gradient vector, Jacobian matrix, and Hessian matrix

::* Convexity (convex sets, convex functions)

:* Convex optimization 1

::* Unconstrained optimization

::* Gradient-based methods

::* Constrained optimization

:* Convex optimization 2

::* Linear programming

::* Simplex method (basic procedure, augmented form)

::* Non-linear optimization

:* Convex optimization 3

::* Penalty method

::* Lagrange multipliers

::* Karush-Kuhn-Tucker conditions

:* Convex optimization 4

::* BFGS method

::* Nelder-Mead method

:* Convex optimization 5

::* Trust-region method

::* Sequential quadratic programming

:* Non-convex optimization 1

::* Non-convex stochastic gradient descent

:* Non-convex optimization 2

::* Gradient descent for principal component analysis

:* Non-convex optimization 3

::* Alternating minimization methods

::* Branch-and-bound methods

:* Metaheuristic optimization 1

::* Simulated annealing

:* Metaheuristic optimization 2

::* Genetic algorithms

:* Metaheuristic optimization 3

::* Tabu search

:* Metaheuristic optimization 4

::* Evolutionary strategies

The contents of the selected references were read beforehand to structure the proposal. The topics were then balanced so that each could be covered in a one-and-a-half-hour lecture.

We expect to contribute from three perspectives: mathematics, computer science, and engineering. The purpose is to capture the rigor of mathematics along with the efficiency of algorithms and their applications in engineering. To this end, we proposed that the first two notebooks have a mathematical focus, laying the foundations necessary for the course. We also ensured that all notebooks included efficient implementations of the optimization algorithms.

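As an illustration of two algorithms from the list above (Convex optimization 4), SciPy's scipy.optimize.minimize exposes both BFGS and Nelder-Mead. The sketch below compares them on the Rosenbrock function (scipy.optimize.rosen and its gradient rosen_der), a standard but non-convex benchmark chosen here for illustration; it shows the library implementations, not the notebooks' own code, and the starting point is an arbitrary assumption.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

# Classic starting point for the 2D Rosenbrock function (minimum at (1, 1)).
x0 = np.array([-1.2, 1.0])

# Quasi-Newton method: builds a Hessian approximation from the analytic gradient.
bfgs = minimize(rosen, x0, method="BFGS", jac=rosen_der)

# Derivative-free simplex search (unrelated to the LP Simplex method).
nelder_mead = minimize(rosen, x0, method="Nelder-Mead")

for name, res in [("BFGS", bfgs), ("Nelder-Mead", nelder_mead)]:
    print(f"{name:12s} x* = {res.x}, f(x*) = {res.fun:.2e}, evaluations = {res.nfev}")
```

Printing the number of function evaluations next to each solution gives students a concrete way to discuss the efficiency of gradient-based versus derivative-free methods.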
The importance of clear concepts for students was always emphasized. For this reason, we decided to use as many graphic resources as possible, whether images loaded into the notebooks by code or images obtained from websites related to the contents.

The preparation of the Google Colab notebooks was based on the pedagogical approach called ''Problem-based learning'' (PBL); this methodology is widely used in the design of educational texts and curricula [17]. It has produced good results at all educational levels, including university [18].

At the beginning of each notebook there is a brief description of the topics to be covered, so that students know the contents of the class beforehand. A brief introduction to these topics follows; it usually contains short historical notes related to the main topic of the notebook and general ideas that help students better understand the phenomena involved. For example, the introduction of notebook 4 (Classic Methods II) states: ''This notebook is focused on studying the Simplex method, developed in 1947 by the American mathematician George Dantzig (known as the father of linear optimization). The Simplex method is one of the canonical methods of optimization theory; it is of great importance to efficiently solve instances of optimization problems in which both the objective function and the constraints are linear functions''.

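The Simplex method described in that introduction can be illustrated with a compact tableau implementation. The following is a minimal sketch, not the notebook's own code: it assumes the problem is already in the form maximize c^T x subject to Ax ≤ b, x ≥ 0 with b ≥ 0, so that the slack variables of the augmented form give a feasible starting basis. It uses Dantzig's most-negative-reduced-cost entering rule and has no anti-cycling safeguard; the function name simplex_max is hypothetical.

```python
import numpy as np


def simplex_max(c, A, b, tol=1e-9, max_iter=100):
    """Maximize c @ x subject to A @ x <= b and x >= 0, assuming b >= 0.

    Tableau Simplex with Dantzig's entering rule; no anti-cycling rule.
    """
    c, A, b = (np.asarray(v, dtype=float) for v in (c, A, b))
    m, n = A.shape
    # Augmented form: one slack variable per constraint, so the initial
    # basis (all slacks) is feasible because b >= 0.
    tableau = np.zeros((m + 1, n + m + 1))
    tableau[:m, :n] = A
    tableau[:m, n:n + m] = np.eye(m)
    tableau[:m, -1] = b
    tableau[-1, :n] = -c  # objective row encodes z - c @ x = 0
    basis = list(range(n, n + m))

    for _ in range(max_iter):
        col = int(np.argmin(tableau[-1, :-1]))  # most negative reduced cost enters
        if tableau[-1, col] >= -tol:
            break  # no improving direction: the current basis is optimal
        column = tableau[:m, col]
        ratios = np.full(m, np.inf)
        positive = column > tol
        ratios[positive] = tableau[:m, -1][positive] / column[positive]
        row = int(np.argmin(ratios))  # minimum ratio test picks the leaving row
        if np.isinf(ratios[row]):
            raise ValueError("the problem is unbounded")
        # Pivot: normalize the pivot row, then eliminate the column elsewhere.
        tableau[row] /= tableau[row, col]
        for r in range(m + 1):
            if r != row:
                tableau[r] -= tableau[r, col] * tableau[row]
        basis[row] = col
    else:
        raise RuntimeError("iteration limit reached")

    x = np.zeros(n + m)
    x[basis] = tableau[:m, -1]
    return x[:n], tableau[-1, -1]


x_opt, z_opt = simplex_max(c=[3, 5], A=[[1, 0], [0, 2], [3, 2]], b=[4, 12, 18])
print("x* =", x_opt, " z* =", z_opt)
```

For the small problem in the usage line (maximize 3x1 + 5x2 subject to x1 ≤ 4, 2x2 ≤ 12, 3x1 + 2x2 ≤ 18), working the two pivots by hand gives x* = (2, 6) with objective value 36. Because scipy.optimize.linprog minimizes, the same answer can be cross-checked by passing -c with method="highs".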
Sometimes, this introduction refers to topics studied in previous notebooks and contrasts methods and results, promoting students’ meaningful learning [19].

After the introduction, students find a dependencies section, which contains the libraries needed for the notebook to run correctly.

Although this section was among the first to be created, the necessary libraries and functionalities were collected throughout the development of the notebook, until it could be executed successfully. Once these libraries were compiled into a single cell, the cell was moved to the beginning of the notebook. This was done for two main reasons. On the one hand, because Google Colab notebooks are executed sequentially, calling a function before importing the library that contains it raises an error; it is therefore advisable to import these functionalities beforehand. On the other hand, since all the libraries are grouped in the same cell, students are always aware of the libraries they will use or learn to use during the class.

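A single dependencies cell of this kind also has to cope with Colab's per-session reinstallation problem. The following minimal sketch assumes the notebook needs NumPy, SciPy, and Matplotlib (the REQUIRED list is hypothetical); it checks which modules are missing with importlib.util.find_spec and installs only those with pip into the running interpreter.

```python
import importlib.util
import subprocess
import sys

# Hypothetical (module name, pip package name) pairs for one notebook.
REQUIRED = [
    ("numpy", "numpy"),
    ("scipy", "scipy"),
    ("matplotlib", "matplotlib"),
]


def missing_packages(required):
    """Return the pip names of required packages whose module cannot be found."""
    return [pip_name for module, pip_name in required
            if importlib.util.find_spec(module) is None]


def ensure_packages(required):
    """Install the missing packages with pip; return the list that was missing."""
    missing = missing_packages(required)
    if missing:
        # Use the interpreter running the notebook so later imports see the packages.
        subprocess.check_call([sys.executable, "-m", "pip", "install", "--quiet", *missing])
    return missing
```

In the first cell of a notebook one would call ensure_packages(REQUIRED) and then place all the import statements, so students see every library in one place. In Colab, !pip install does the same job; running pip through sys.executable simply makes sure the packages go to the interpreter the notebook is using.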
Students then encounter the most important part of the notebook, namely its body. It contains the title of the topic, its theoretical development, formulas, derivations, and pseudocode, all of which are necessary for a deep understanding of the topic. Mathpix Snip, a powerful OCR tool that obtains the LaTeX code of a formula from an image of it, was used to prepare this section. Thanks to this tool, writing a formula inside the notebook becomes a simple task: if the formula is available digitally as an image, a single click is enough to obtain the LaTeX code that generates it [22]. After the theoretical development of the topic, students find a simple example, which is clearly delimited by its associated graphic separator (Figure 2):

[[Image:Draft_Rivera_196260444-image4.png|center|162px]]

Figure 2. Graphic separator.

Sometimes, the examples include a practical section in which students are expected to complete the formulation and solve the problem presented; this is signaled by the graphic separator (Figure 3):

[[Image:Draft_Rivera_196260444-image5.png|center|162px]]

Figure 3. Function of the graphic separator.

Students are told that it is time to work on the exercise themselves; this follows the problem-based learning approach explained above.

The proposed exercises include comments, hints, and aids that allow students to fulfil the objectives satisfactorily. For this reason, the code contains comments and interactive hints. These interactive hints are objects that unfold completely when clicked on, as shown in Figure 4:

[[Image:Draft_Rivera_196260444-image8.png|center|600px]]

Figure 4. Interactive hints.

Interactive hints are created with the HTML element called Details. <details> works as a disclosure widget for revealing information: the additional content is not visible initially and is shown only when users choose to open it. The aim is to encourage students to try to solve the exercise on their own, using the hints only when necessary.

Some exercises require confirming that the result is correct before continuing with the rest of the notebook. To this end, a section marked with the graphic separator shows the answer that students should obtain when solving the proposed exercise.

Then the answer to the exercise is shown.

The answers to the exercises were tested beforehand so that students can verify whether they have assimilated the new knowledge correctly [20].

Inside the body of the notebook, students also find historical notes or curious facts that complement the information presented in the notebook; these are also marked with the graphic separator.

For example, the ''Did You Know?'' section of notebook 1 (Mathematics for Optimization) states: ''The property that every non-empty set of real numbers bounded above has a supremum is known as the “supremum axiom” or “axiom of completeness;” it is logically equivalent to the “intermediate value theorem,” a theorem that we will use later in the course.'' This information is not included in the main references [9] and [10], but it complements the topics addressed.

The header and footer banners, as well as the graphic separators containing ''example'', ''your turn'', ''answers'', and ''did you know that?'', were designed with the web application Canva, a graphic design tool with drag-and-drop features that provides access to more than 60 million photographs and 5 million vectors, graphics, and free and licensed fonts [21] (Figure 5).

[[Image:Draft_Rivera_196260444-image11.png|center|600px]]

Figure 5. Example of banner taken from notebook 1.

Finally, to take full advantage of Google Colab notebooks, Python libraries such as ''Matplotlib'', ''Seaborn'', and ''plotly'', whose main function is to provide data visualization tools [23]-[25], were used in the explanation and example sections. With these libraries, the functions of the examples developed in the notebooks were plotted in both 2D and 3D; functions, contour lines, and even feasible sets can be plotted (Figure 6):

[[Image:Draft_Rivera_196260444-image12.png|center|252px]]

Figure 6. Example taken from notebook 5 (Classic methods III).

Most graphs are static: the user sees the graph as an image that does not change over time and cannot interact with it. However, dynamic and interactive graphs were also added, as in notebook 3 (Optimization methods).

These interactive graphs were made with the plotly library [25]; students can move them with the cursor, check the results, become more involved with the objects of the exercise, and better understand its objectives.

Finally, we would like to mention some tools that we believe would be worth exploring in future developments:

:* TensorFlow [26]: it contains a set of functionalities related to machine learning. Functionalities such as the optimizers inherited from Keras [27] relate to topics already studied in the notebooks, such as SGD or Adagrad [28].

:* R [29]: it is possible to invoke R functionalities and run programs written in R syntax in Google Colab by means of the rpy2 library [30]. In doing so, optimization techniques and the native optimization tools provided by R could be explored. Exercises that take advantage of probability and statistics should also be considered.

:* CVXPY [32]: it is an optimization library for Python; one of the authors of reference [12] participated in its development. Exercises could be proposed that compare the efficiency of the algorithms presented using the SciPy library with their corresponding algorithms in the CVXPY library.

:* plotly [25]: we did not explore the inclusion of interactive graphs in depth. Libraries such as plotly or the classic Matplotlib can provide interesting pedagogical approaches, as they allow users to interact with the mathematical objects they are working with.

==References==
[1] [https://towardsdatascience.com/4-reasons-why-you-should-use-google-colab-for-your-next-project-b0c4aaad39ed https://towardsdatascience.com/4-reasons-why-you-should-use-google-colab-for-your-next-project-b0c4aaad39ed]

[2] [https://dimensionless.in/using-jupyter-notebooks-google-colab/ https://dimensionless.in/using-jupyter-notebooks-google-colab/]

[3] [https://research.google.com/colaboratory/faq.html https://research.google.com/colaboratory/faq.html]

[4] [https://jupyter.org/ https://jupyter.org/]

[5] [https://medium.com/@alexstrebeck/jupiter-notebooks-vs-colaboratory-67bd51803d8 https://medium.com/@alexstrebeck/jupiter-notebooks-vs-colaboratory-67bd51803d8]

[6] [https://towardsdatascience.com/google-colab-jupyter-lab-on-steroids-perfect-for-deep-learning-cdddc174d77a https://towardsdatascience.com/google-colab-jupyter-lab-on-steroids-perfect-for-deep-learning-cdddc174d77a]

[7] [https://analyticsindiamag.com/a-beginners-guide-to-using-google-colab/ https://analyticsindiamag.com/a-beginners-guide-to-using-google-colab/]

[8] [https://analyticsindiamag.com/explained-5-drawback-of-google-colab/ https://analyticsindiamag.com/explained-5-drawback-of-google-colab/]

Document information

Published on 19/10/21
Accepted on 04/10/21
Submitted on 24/07/21

Volume 37, Issue 4, 2021
DOI: 10.23967/j.rimni.2021.10.003
Licence: CC BY-NC-SA license
