==PYTHON NOTEBOOK USEFULNESS, CASE STUDY: OPTIMIZATION TECHNIQUES COURSE==

Laura Andrea Rodriguez Rodriguez - [mailto:la.rodriguez@uniandes.edu.co la.rodriguez@uniandes.edu.co]

Universidad de los Andes

Cristian José Pardo Mercado - [mailto:crijpardomer@unal.edu.co crijpardomer@unal.edu.co]

UNIVERSIDAD NACIONAL DE COLOMBIA

Juan Esteban Cepeda Baena - [mailto:jecepedab@unal.edu.co jecepedab@unal.edu.co]

UNIVERSIDAD NACIONAL DE COLOMBIA

Juan Pablo Gómez - [mailto:jupgomezme@unal.edu.co jupgomezme@unal.edu.co]

UNIVERSIDAD NACIONAL DE COLOMBIA

Sergio Rivera - [mailto:jupgomezme@unal.edu.co jupgomezme@unal.edu.co]

UNIVERSIDAD NACIONAL DE COLOMBIA

=1. Introduction=

Online platforms for sharing and documenting code have become an essential part of the work of analysts and data scientists. Google Colab and Jupyter Notebook are two of the best-known platforms and are used extensively to work on datasets. They allow users to present and manipulate data and to combine code with visualizations and narrative text. They are highly customizable and offer collaboration and sharing capabilities. These notebooks also support several languages and frameworks, such as R, Python, and PySpark, provided the necessary packages are installed. Other platforms for working with big data include R Markdown, Kaggle, and Spark Notebook, among many other popular tools. Recent developments in data science and machine learning have increased the adoption of these platforms, which in turn has led to the addition of new features.

'''JupyterLab''' is a web-based interactive development environment for notebooks, code, and data. It is designed to be flexible enough to accommodate any workflow in data science, scientific computing, machine learning, and artificial intelligence. It also allows developers to write their own plugins and packages and to integrate them seamlessly into their workflows. Jupyter Notebook, which is available as a component of JupyterLab, is an open-source web application that allows users to create and share documents containing live code, equations, visualizations, and narrative text. Use cases include data cleaning and transformation, numerical simulation, statistical modeling, and data visualization.

'''Google Colab''', short for Colaboratory, is a free product developed by Google Research that executes arbitrary Python code through the browser. It is especially well suited to machine learning, data analysis, and education. Colab is a hosted Jupyter Notebook service that requires no setup and provides free access to computing resources in the cloud, including GPUs. Users who pay for the Colab Pro version obtain additional resources.

=2. Why Jupyter Notebook?=

Some of the features provided by Jupyter are the following:

:* Code and the steps performed during data manipulation and cleaning are recorded, which makes it possible to retrace them at a later stage

:* Visualizations such as graphs and figures are rendered directly in the notebook, which can then be exported to HTML or PDF files for sharing

:* Animations and dynamic or interactive visualizations are available, which help users interact with the visuals and understand patterns in depth

=3. Why Google Colab?=

In the same way, these are some of the main features that Google Colab provides to its users:

:* It runs on Google Cloud Platform, which provides a robust and flexible service in terms of storage and management

:* It is free of charge, despite the high-end GPUs and other computing resources it provides. Tensor Processing Units (TPUs) have been added to the existing GPU and CPU options

:* It integrates easily with Google Drive, which simplifies working with large datasets stored there

=4. Google Colab vs Jupyter Notebook: The verdict=

Google Colab can be seen as an evolved version of the Jupyter Notebook that offers enhancements and additional features. There are several reasons why Google Colab is often preferred in colleges and companies:

:* Many libraries are pre-installed, which removes the need to install software and packages before use. Commonly used packages such as pandas, numpy, and matplotlib are already loaded in the environment

:* The work is saved in the cloud, so there is no need to save it on the local system. Notebooks are stored in the user's Google Drive account and can easily be shared with others over the internet

:* Notebooks can be shared with teammates, together with comments and notes. Analysts and data scientists can code together and follow updates and reviews in real time

:* GPUs and TPUs are free for everyone, so users do not need expensive hardware of their own. Google provides the processing power for demanding ML and AI packages, and the local system is not burdened

:* Google Colab also integrates easily with platforms such as Google Forms, Stack Overflow, and GitHub Gist

[[Image:Draft_Rivera_196260444-image1.png|600px]]
Figure 1. Google Colab vs Jupyter Notebook

Figure 1 shows both IDEs and how a basic print statement looks on each platform. Communication is an essential aspect of any project or data science activity: information must be shared while a model is developed or the data is analyzed. Consider a developer who has written thousands of lines of code for a project. Sharing that code with the whole team without proper annotations or comments makes it hard for the team to interpret what each section does. An earlier practice was to copy snippets of code into an MS Word document or a Confluence page, where the developer would then explain what each segment does. This process of ‘documenting as you code’ has become far simpler with the introduction of IDEs like Google Colab and Jupyter Notebook.

=5. Guide to using Google Colab=

A recurring issue when working with large datasets on a local system is that a Machine Learning model stops working and fails with a ‘memory-error’. The following are some of the steps for setting up a project on Google Colab:

:* ''Setting Up a Drive''

One can start by clicking on this [https://colab.research.google.com/ link] and then create a new notebook or upload a notebook from the local system to the environment. A notebook can also be imported directly from Google Drive

:* ''Choosing a GPU or TPU''

Users are free to change the runtime of their notebook as they please. They can choose either a GPU or a TPU to run it

:* ''Using Terminal Code''

Google Colab also allows users to run terminal commands, and popular libraries are already available in the notebook. In addition, pip install can be used to add other packages to the working environment

:* ''Saving a File''

There are several ways to save a Colab file: users can either use terminal commands or go to File and choose Save a Copy in Drive

These simple steps allow a user to navigate the platform easily and make most tasks straightforward, which makes Google Colab even more attractive as an IDE for Data Science and Machine Learning. That said, there are also some downsides to using Google Colab, including the following:

:* ''Closed Environment''

Google Colab can be used by anyone to run arbitrary Python code in the browser. However, it is a relatively closed environment when Machine Learning Engineers or Data Scientists want to add their own packages. The platform is therefore well suited to common tools but offers less scope for specialized work.

:* ''Repetitive Task''

Repetitive tasks are a persistent problem in Google Colab: the user needs to install packages and libraries again every time a new session starts. Once an instance or session is shut down, everything must be re-installed in the next one to return to the original working state

:* ''No Live-Editing''

Unlike Google Docs or Google Sheets, multiple users cannot work on the same file simultaneously, which limits the scope of real-time collaboration. A team working on a single file therefore has to hand it back and forth, since only one person can edit at a time

:* ''Saving & Storage Problem''

Uploaded files are removed when the session is restarted because Google Colab does not provide persistent storage. If the device is turned off, the data can be lost. Moreover, a downloaded file that will be needed later must be saved before the current session expires. In addition, users must always be logged in to their Google account, since all Colaboratory notebooks are stored in Google Drive

:* ''Limited Time and Space''

The Google Colab platform stores files in Google Drive, which offers 15GB of free space; working on bigger datasets requires more space, which makes them difficult to handle. This, in turn, can prevent many complex functions from executing. Google Colab allows users to run their notebooks for at most 12 hours a day; to work for longer periods, users need the paid version, i.e. [https://colab.research.google.com/signup Colab Pro], which allows programmers to stay connected for 24 hours

These points are trade-offs of the platform compared with other coding IDEs for Data Science and Machine Learning. For users who are comfortable with them, Google Colab is a strong option among the wide range of coding platforms.

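The repeated installation described under ''Repetitive Task'' can be reduced to a single cell that is run at the start of every session. The following is a minimal sketch, not part of the original notebooks: the helper name <code>missing_packages</code> and the list of required packages are hypothetical, and the sketch assumes that each import name matches its pip package name, which is not always the case.

```python
import importlib.util


def missing_packages(import_names):
    """Return the top-level import names that are not available in this session."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


# Hypothetical list of libraries that a course notebook might need.
required = ["numpy", "scipy", "cvxpy"]

to_install = missing_packages(required)
if to_install:
    # In a Colab cell the install itself would be run as a terminal command.
    print("Run in a cell: !pip install " + " ".join(to_install))
else:
    print("All required packages are already available.")
```

Checking first avoids reinstalling libraries that Colab already ships, so the cell stays fast when nothing is missing.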
To conclude, even though Jupyter Notebook is a robust IDE for Data Scientists and Analysts, Google Colab has become an evolved form of it that can make the work far easier. With the growing amount of data and the demand for more accurate predictions, the need for powerful ML and AI techniques has increased. These techniques require high computational power, which Google Colab provides to its users. Compared with the Jupyter Notebook, which uses local computational power, this is far more convenient for users who cannot afford high-end processors and GPUs.

In addition, various mathematical problems, including convex and non-convex optimization and mixed-integer programming, can be solved with the computational capabilities of Google Colab. As a demonstration, convex optimization is used to solve a control problem in this [https://colab.research.google.com/github/cvxgrp/cvx_short_course/blob/master/intro/control.ipynb notebook]. Many other mathematical models can be solved in the same way using Google Colab and its computing capabilities.

Notebooks have evolved over the years, from a rudimentary form (IPython) to Google Colab. While the IPython notebook only provided a read-eval-print loop, Jupyter was developed to run code and take notes alongside other interactive features. Finally, Google Colab added real-time code collaboration as well as hardware and resource allocation for its users.

Hence, for extensive Machine Learning and Artificial Intelligence work, the optimization of complex mathematical models, and other resource-intensive tasks, Google Colab can be preferred over the Jupyter Notebook, which relies on local computational power. For universities and highly collaborative workspaces, Google Colab surpasses the traditional JupyterLab features for code development and model testing. Over time, a research organization like Google Research is likely to keep improving Google Colab. At least for now, Google Colab is one of the best options available for working on projects and covers the full set of user needs better than the other IDEs available.

==CASE STUDY: OPTIMIZATION TECHNIQUES COURSE==

The development of Google Colaboratory notebooks to complement classroom learning belongs to the category of educational and didactic resources, particularly university educational resources. Their preparation is not very different from the preparation of the textbooks, books, or lecture notes that are generally used; it should follow rigorous phases of planning, development, and evaluation. During the development of this novel resource, the bibliography that would guide the contents of the Google Colab notebooks was analyzed first. The next step was to find analogous material, in Python or other languages, related to optimization techniques. To this end, we reviewed and evaluated four textbooks on mathematical optimization, engineering optimization, and engineering optimization applications [9]-[12] as well as three blogs with material published on the internet [13]-[15]. We chose two main sources and three secondary sources [9]-[11], [14]-[15]. The first reference was chosen because its contents are up to date and coincide with the course syllabus [16], while the second reference was chosen because of its clear explanations and examples. According to the author of reference [9], ''This book introduces all the major metaheuristic algorithms and their applications in optimization. This textbook consists of three parts: Part I: Introduction and fundamentals of optimization and algorithms; Part II: Metaheuristic algorithms; and Part III: applications of metaheuristics in engineering optimization''. The author of reference [10] states: ''The reader is motivated to be engaged with the content via numerous application examples of optimization in the area of electrical engineering throughout the book. This approach not only provides relevant readers with a better understanding of mathematical basics but also reveals a systematic approach of algorithm design in the real world of electrical engineering''. Once these two references had been chosen, the thematic content to be developed in the notebooks was structured. Since the objective was to address all the topics of the syllabus in [16], our proposal was as follows:

:* Mathematics for optimization 1
::* Upper and lower dimensions
::* Basic calculus
::* Optimality
::* Norms of vectors and matrices
:* Mathematics for optimization 2
::* Eigenvalues and definition
::* Linear and affine functions
::* Gradient vector, Jacobian matrix, and Hessian matrix
::* Convexity (convex sets, convex functions)
:* Convex optimization 1
::* Unconstrained optimization
::* Gradient-based methods
::* Constrained optimization
:* Convex optimization 2
::* Linear programming
::* Simplex method (basic procedure, augmented form)
::* Non-linear optimization
:* Convex optimization 3
::* Penalty method
::* Lagrange multipliers
::* Karush-Kuhn-Tucker conditions
:* Convex optimization 4
::* BFGS method
::* Nelder-Mead method
:* Convex optimization 5
::* Trust-region method
::* Sequential quadratic programming
:* Non-convex optimization 1
::* Non-convex stochastic gradient descent
:* Non-convex optimization 2
::* Gradient descent for principal components analysis
:* Non-convex optimization 3
::* Alternating minimization methods
::* Branch-and-bound methods
:* Metaheuristic optimization 1
::* Simulated annealing
:* Metaheuristic optimization 2
::* Genetic algorithms
:* Metaheuristic optimization 3
::* Tabu search
:* Metaheuristic optimization 4
::* Evolutionary strategies

The contents of the selected references were read beforehand to structure the proposal. The topics were then balanced so that each could be developed in a one-and-a-half-hour masterclass.

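To give an idea of the kind of implementation that fits one of these sessions, the sketch below illustrates the "Gradient-based methods" topic with plain gradient descent on a small convex function. It is not taken from the course notebooks: the objective function, the step size, and the names <code>gradient_descent</code> and <code>grad_f</code> are assumptions chosen for illustration.

```python
import math


def gradient_descent(grad, x0, step=0.1, tol=1e-8, max_iter=10_000):
    """Minimize a differentiable function given its gradient.

    Stops when the Euclidean norm of the gradient falls below tol.
    Returns the final point and the number of iterations performed.
    """
    x = list(x0)
    for iteration in range(max_iter):
        g = grad(x)
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            return x, iteration
        # Move against the gradient with a fixed step size.
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x, max_iter


def grad_f(x):
    """Gradient of the convex function f(x, y) = (x - 1)^2 + 2 (y + 2)^2."""
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 2.0)]


minimum, iterations = gradient_descent(grad_f, [0.0, 0.0])
print(minimum, iterations)
```

The fixed step size works here because the function is a well-conditioned quadratic; for other functions the step would typically need tuning or a line search.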
We expect to contribute from three perspectives: mathematics, computer science, and engineering. The purpose is to capture the rigor of mathematics together with the efficiency of algorithms and their applications in engineering. To this end, we proposed that the first two notebooks should have a mathematical focus to lay the foundations necessary for the course. We also ensured that all notebooks included efficient implementations of the optimization algorithms.

The importance of clear concepts for students was always emphasized. For this reason, we decided to use as many graphic resources as possible, whether images generated in the notebooks by code or images obtained from websites related to the contents.

The preparation of the Google Colab notebooks was based on the pedagogical proposal called ''Problem-based learning'' (PBL); this methodology is widely used in the formulation of educational texts and curricula [17]. It has produced good results at all educational levels, including the university [18].

Each notebook begins with a brief description of the topics to be covered, so that students know the contents of the class beforehand. A brief introduction follows; it usually contains short historical notes related to the main topic of the notebook and general ideas that help students better understand the phenomena related to the topic. For example, the introduction of notebook 4 (Classic Methods II) states: ''This notebook is focused on studying the Simplex method, developed in 1947 by the American mathematician George Dantzig (known as the father of linear optimization). The Simplex method is one of the canonical methods of the optimization theory; it is of great importance to efficiently solve instances of optimization problems with both the objective function and the constraints being linear functions''.

Sometimes, this introduction refers to topics studied in previous notebooks and contrasts methods and results, promoting students’ meaningful learning [19].

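The kind of problem mentioned in that introduction, with a linear objective and linear constraints, can be illustrated with a very small instance. The sketch below does not implement the Simplex method; it uses brute-force vertex enumeration, which relies on the same fact (an optimum of a bounded, feasible linear program lies at a vertex of the feasible region) but is only practical for tiny problems. The function name <code>solve_lp_2d</code> and the numerical data are assumptions chosen for illustration.

```python
from itertools import combinations


def solve_lp_2d(c, constraints, eps=1e-9):
    """Maximize c[0]*x + c[1]*y subject to a*x + b*y <= r for each (a, b, r).

    Every pair of constraint lines is intersected; feasible intersections
    are the vertices of the feasible region, and the best one is returned.
    Assumes the feasible region is non-empty and bounded.
    """
    best_point, best_value = None, float("-inf")
    for (a1, b1, r1), (a2, b2, r2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < eps:
            continue  # parallel lines do not intersect in a single point
        # Cramer's rule for the 2x2 system.
        x = (r1 * b2 - r2 * b1) / det
        y = (a1 * r2 - a2 * r1) / det
        if all(a * x + b * y <= r + eps for a, b, r in constraints):
            value = c[0] * x + c[1] * y
            if value > best_value:
                best_point, best_value = (x, y), value
    return best_point, best_value


# maximize 3x + 5y  subject to  x <= 4, 2y <= 12, 3x + 2y <= 18, x >= 0, y >= 0
constraints = [(1, 0, 4), (0, 2, 12), (3, 2, 18), (-1, 0, 0), (0, -1, 0)]
point, value = solve_lp_2d((3, 5), constraints)
print(point, value)
```

The Simplex method reaches the same vertex without visiting every intersection, which is what makes it efficient on larger instances.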
After the introduction, students find a dependencies section, which lists the libraries that the notebook needs to run correctly.

Although this section was among the first to be created, the required libraries and functionalities were collected throughout the development of each notebook. Once the notebook was complete, they were gathered into a single cell, which was moved to the beginning of the notebook. This was done for two reasons. On the one hand, because cells in a Google Colab notebook are executed sequentially, calling a function before importing the library that contains it raises an error; it is therefore advisable to load these functionalities beforehand. On the other hand, since all the libraries are grouped in the same cell, students always know which libraries they will use or learn to use during the class.

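The first reason can be reproduced in a few lines. The sketch below simulates two cells in one script; the standard-library modules <code>statistics</code> and <code>math</code> are stand-ins chosen for illustration, since the actual libraries of the course notebooks are not listed here.

```python
# A cell that uses a library before it has been imported fails.
try:
    statistics.mean([1, 2, 3])  # 'statistics' is not defined yet
    first_error = None
except NameError as error:
    first_error = type(error).__name__
    print("Error:", error)

# Dependencies cell, intended to sit at the top of the notebook.
import math
import statistics

# Any later cell can now rely on everything imported above.
average = statistics.mean([1, 2, 3])
root = math.sqrt(16.0)
print(average, root)
```

Placing the dependencies cell first means that running the notebook from top to bottom never reaches the failing case.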
Students then reach the most important part of the notebook, namely its body. It contains the title of the topic to be addressed, its theoretical development, formulas, deductions, and pseudocode, all of which are necessary for a deep understanding of the topic. Mathpix Snip, a powerful OCR tool that extracts the LaTeX code from an image of text, was used to prepare this section. Thanks to this tool, writing a formula inside the notebook becomes a simple task: if the formula is available digitally as an image, a single click is enough to obtain the LaTeX code that generates it [22]. After the theoretical development of the topic, students find a simple example, which is clearly delimited by the graphic separator associated with it (Figure 2):
[[Image:Draft_Rivera_196260444-image4.png|center|162px]]
Figure 2. Graphic separator.

Some examples include a practical section in which students are expected to complete the formulation and solve the problem presented; the graphic separator in Figure 3 marks these sections:

[[Image:Draft_Rivera_196260444-image5.png|center|162px]]
Figure 3. Function of the graphic separator.

This separator tells students that it is time to interact with the exercise, in line with the problem-based learning approach explained above.

The proposed exercises include comments, hints, and aids that allow students to fulfil the objectives satisfactorily. For this reason, the code contains comments and interactive hints. An interactive hint is an object that unfolds completely when clicked on, as shown below (Figure 4):

[[Image:Draft_Rivera_196260444-image8.png|center|600px]]
Figure 4. Interactive hints.

Interactive hints are created with an HTML element called Details. <details> is used as a widget (small program) for revealing information. It is useful because the additional information is hidden initially and is shown only when users choose to open it. The aim is to encourage students to try to solve the exercise on their own and to use the hints only when necessary.

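A hint of this kind can be generated with a few lines of Python. The following is a minimal sketch, not the code used in the course notebooks: the function name <code>make_hint</code> and the hint text are hypothetical.

```python
from html import escape


def make_hint(summary, body):
    """Build a collapsible hint as an HTML details element.

    The summary is always visible; the body is shown after a click.
    """
    return (
        "<details>\n"
        f"  <summary>{escape(summary)}</summary>\n"
        f"  <p>{escape(body)}</p>\n"
        "</details>"
    )


hint_html = make_hint("Hint 1", "Compute the gradient and set it to zero.")
print(hint_html)

# In a notebook, the resulting markup can be pasted into a text cell or,
# assuming IPython is available, rendered from a code cell with:
#   from IPython.display import HTML
#   HTML(hint_html)
```

Escaping the text keeps characters such as < in a hint from being interpreted as markup.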
Some questions require students to confirm that their result is correct before continuing with the notebook. For these, a section shows the answer that students should obtain when solving the proposed exercise; a graphic separator introduces it.

Then the answer to the exercise is shown.

The answers to the exercises were tested beforehand so that students can verify whether they have assimilated the new knowledge correctly [20].

Inside the body of the notebook, students also find historical notes or curious facts that complement the information presented; a graphic separator is used for these as well.

For example, the ''Did You Know?'' section of notebook 1 (Mathematics for Optimization) states: ''The property that allows every non-empty set of real numbers bounded above to have a supremum is known as the “supremum axiom” or “axiom of completeness;” it is logically equivalent to the “intermediate value theorem,” a theorem that we will use later in the course.'' This information is not included in the main references [9] and [10], but complements the topics addressed.

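For readers who prefer a formal statement, the property quoted above can be written as follows. This is a sketch in standard notation; the symbols <code>S</code>, <code>M</code>, <code>s</code>, and <code>u</code> are chosen here for illustration and do not come from the notebook.

```latex
\[
\bigl( S \subseteq \mathbb{R},\ S \neq \emptyset,\
\exists\, M \in \mathbb{R}\ \forall x \in S:\ x \le M \bigr)
\;\Longrightarrow\;
\exists\, s \in \mathbb{R}:\ s = \sup S
\]
\[
s = \sup S
\;\Longleftrightarrow\;
\bigl( \forall x \in S:\ x \le s \bigr)
\ \text{and}\
\bigl( \forall u \in \mathbb{R}:\ (\forall x \in S:\ x \le u) \Rightarrow s \le u \bigr)
\]
```

The first line is the axiom itself; the second recalls that the supremum is the least upper bound of the set.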
The header and footer banners, as well as the graphic separators labelled ''example'', ''your turn'', ''answers'', and ''did you know that?'', were designed with the web application Canva, a graphic design tool with drag-and-drop features that provides access to more than 60 million photographs and 5 million vectors, graphs, and free and licensed fonts [21] (Figure 5).

[[Image:Draft_Rivera_196260444-image11.png|center|600px]]
Figure 5. Example of banner taken from notebook 1.

Finally, to take full advantage of Google Colab notebooks, Python libraries such as ''Matplotlib'', ''Seaborn'', and ''plotly'' were used in the explanation and example sections; the main function of these libraries is to provide tools for data visualization [23]-[25]. The functions of the examples developed in the notebooks were plotted in both 2D and 3D with these libraries; functions, contour lines, and even feasible sets can be plotted (Figure 6):

[[Image:Draft_Rivera_196260444-image12.png|center|252px]]
Figure 6. Example taken from notebook 5 (Classic methods III).

Most graphs are static: the user sees the graph as an image that does not change over time and cannot interact with it. However, dynamic and interactive graphs were also added, as in notebook 3 (Optimization methods).

These graphs were made with the plotly library [25]; students can move the graph with their cursor, check the results, become more involved with the objects of the exercise, and understand its objectives.

Finally, we would like to mention the tools that we believe may be worth exploring in future developments:

:* TensorFlow [26]: a library with a set of functionalities related to machine learning. Its optimizers, such as those inherited from Keras [27], relate to topics already studied in the notebooks, for example SGD or Adagrad [28].

:* R [29]: it is possible to invoke functionalities and run programs written in R syntax in Google Colab by means of the rpy2 library [30]. In this way, the optimization techniques and native optimization tools provided by R could be explored. Exercises that take advantage of probability and statistics should also be considered.

:* CVXPY [32]: an optimization library for Python; one of the authors of reference [12] participated in its development. Exercises could be proposed to compare the efficiency of the algorithms presented with the SciPy library against their counterparts in the CVXPY library.

:* plotly [25]: we did not explore the inclusion of interactive graphs in depth. Libraries such as plotly or the classic matplotlib can support interesting pedagogical approaches because they allow users to interact with the mathematical objects they work with.

==References==

[1] [https://towardsdatascience.com/4-reasons-why-you-should-use-google-colab-for-your-next-project-b0c4aaad39ed https://towardsdatascience.com/4-reasons-why-you-should-use-google-colab-for-your-next-project-b0c4aaad39ed]

[2] [https://dimensionless.in/using-jupyter-notebooks-google-colab/ https://dimensionless.in/using-jupyter-notebooks-google-colab/]

[3] [https://research.google.com/colaboratory/faq.html https://research.google.com/colaboratory/faq.html]

[4] [https://jupyter.org/ https://jupyter.org/]

[5] [https://medium.com/@alexstrebeck/jupiter-notebooks-vs-colaboratory-67bd51803d8 https://medium.com/@alexstrebeck/jupiter-notebooks-vs-colaboratory-67bd51803d8]

[6] [https://towardsdatascience.com/google-colab-jupyter-lab-on-steroids-perfect-for-deep-learning-cdddc174d77a https://towardsdatascience.com/google-colab-jupyter-lab-on-steroids-perfect-for-deep-learning-cdddc174d77a]

[7] [https://analyticsindiamag.com/a-beginners-guide-to-using-google-colab/ https://analyticsindiamag.com/a-beginners-guide-to-using-google-colab/]

[8] [https://analyticsindiamag.com/explained-5-drawback-of-google-colab/ https://analyticsindiamag.com/explained-5-drawback-of-google-colab/]

Document information

Published on 19/10/21
Accepted on 04/10/21
Submitted on 24/07/21

Volume 37, Issue 4, 2021
DOI: 10.23967/j.rimni.2021.10.003
Licence: CC BY-NC-SA license
